Difference in performance between mikropml and caret #331
-
Hi! For my MSc thesis, I compared the performance (AUC on the test set) of caret (using the trainControl and train functions) and mikropml (the run_ml function) on the same dataset (abundances from a 16S rRNA sequencing study), fitting a random forest (rf) in both cases. Even though mikropml is built on caret, and I used the exact same seed, LOOCV, and an 80/20 train/test split in both procedures, the AUCs differ and, more importantly, the selected hyperparameter values differ considerably, even though the range of values used for the grid search was also the same in both procedures. Is there a possible explanation for this? This is the code I used for the caret-based RF model:
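(The original code block is not shown in this excerpt. The following is only a rough sketch of what such a caret workflow could look like; the data frame `otu_data`, the outcome column `dx`, the seed value, and the mtry grid are all made-up placeholders, not the poster's actual settings.)

```r
# Rough sketch only -- not the original poster's code. Assumes a data frame
# `otu_data` whose outcome column is `dx`; seed and mtry grid are placeholders.
library(caret)
library(pROC)

set.seed(2022)
train_idx <- createDataPartition(otu_data$dx, p = 0.8, list = FALSE)
train_set <- otu_data[train_idx, ]
test_set  <- otu_data[-train_idx, ]

# Leave-one-out cross-validation for tuning on the training set
ctrl <- trainControl(method = "LOOCV")

rf_fit <- train(dx ~ .,
                data = train_set,
                method = "rf",
                trControl = ctrl,
                tuneGrid = expand.grid(mtry = c(2, 7, 15, 25)))

# AUC on the held-out 20%
test_probs <- predict(rf_fit, newdata = test_set, type = "prob")
auc(roc(test_set$dx, test_probs[[2]]))
```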
This is the code I used for the mikropml-based RF model:
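(Again, the original snippet is not included here; a comparable `run_ml()` call might look roughly like the sketch below, using the same hypothetical `otu_data` / `dx` objects and placeholder grid and seed.)

```r
# Rough sketch only -- not the original poster's code. Same assumed data frame
# `otu_data` / outcome column `dx`; grid and seed are placeholders.
library(mikropml)

rf_results <- run_ml(otu_data,
                     method = "rf",
                     outcome_colname = "dx",
                     training_frac = 0.8,
                     cross_val = caret::trainControl(method = "LOOCV"),
                     hyperparameters = list(mtry = c(2, 7, 15, 25)),
                     find_feature_importance = TRUE,
                     seed = 2022)

rf_results$performance     # test-set performance, including AUC
rf_results$trained_model   # the underlying caret::train() object
```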
The AUCs for the best caret-based model and the best mikropml-based model (best meaning the one with the highest AUC among all models built with LOOCV and grid search) were 0.85 and 0.9, respectively. The selected hyperparameter values were mtry=7 and ntree=4 for the caret-based model and mtry=15 and ntree=17 for the mikropml-based one. What could this difference in hyperparameter values be due to?
Replies: 1 comment
-
There are a few reasons I would expect different performance values and different run times with these two code samples:

- `find_feature_importance = TRUE` takes a considerable amount of run time. Caret's `train()` doesn't do permutation feature importance.
- `calculate_performance = TRUE` calculates the model performance on the test set, although it shouldn't be too slow. Caret's `train()` doesn't do this step.
- You can give `training_frac` a vector of indices if you want to specify the exact training set (see the sketch below). (Note: since you set the s…

Given these major differences, I'm not at all surprised to see different performance values, as well as a longer runtime for `run_ml()`.
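Picking up the third point, one way to make the two pipelines share the exact same 80/20 split is to compute the training indices once and hand them to both tools. This is only a sketch under the same assumptions as the hypothetical snippets above (`otu_data` with outcome column `dx`, placeholder seed); the remaining arguments would need to match whatever the original code used.

```r
# Rough sketch: reuse one set of training indices in both workflows so the
# train/test split is identical. Object and column names are hypothetical.
library(caret)
library(mikropml)

set.seed(2022)
train_idx <- as.integer(createDataPartition(otu_data$dx, p = 0.8, list = FALSE))

# caret: subset the data yourself before calling train()
caret_fit <- train(dx ~ .,
                   data = otu_data[train_idx, ],
                   method = "rf",
                   trControl = trainControl(method = "LOOCV"))

# mikropml: pass the same row indices to training_frac instead of a fraction
mikropml_fit <- run_ml(otu_data,
                       method = "rf",
                       outcome_colname = "dx",
                       training_frac = train_idx,
                       cross_val = caret::trainControl(method = "LOOCV"),
                       find_feature_importance = FALSE,
                       seed = 2022)
```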