Hi,
The next image shows the results of a RandomForestClassifier run with default settings vs. the same data run through a model with hyperparameter tuning, using the code below.
Is the difference because the model with the default settings overfit while the tuned model did not?
Given these results, should I use the model with the lower accuracy, hoping it generalizes better to new data? What could be the next steps?
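For reference, this is roughly how I check the default model for overfitting: compare its training accuracy against a cross-validated estimate (a minimal sketch; X_train and Y_train are the same arrays used in the grid search below):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

default_rf = RandomForestClassifier(random_state=42)
default_rf.fit(X_train, Y_train)

# Accuracy on the data the model was fit on
train_acc = default_rf.score(X_train, Y_train)
# Held-out estimate via 5-fold cross-validation
cv_acc = cross_val_score(default_rf, X_train, Y_train, cv=5).mean()

# A large gap between the two suggests overfitting
print(f"train accuracy: {train_acc:.3f}, CV accuracy: {cv_acc:.3f}")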
A couple of lines regarding the dataframe / exercise:
1700 rows
3 classes
no imbalance
31 features after selecting them with Lasso and correlation (see the sketch below)
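In case it helps, a minimal sketch of how I did the selection (X_full, y, the 0.9 correlation threshold, and alpha=0.01 are illustrative placeholders, not the exact values I used):

import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Correlation filter: drop one feature from each highly correlated pair
corr = X_full.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
X_reduced = X_full.drop(columns=to_drop)

# Lasso-based selection: keep features with nonzero coefficients
# (the integer class labels serve as the regression target here)
selector = SelectFromModel(Lasso(alpha=0.01))
selector.fit(X_reduced, y)
X_selected = X_reduced.loc[:, selector.get_support()]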
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

random_forest = RandomForestClassifier()

# Number of trees in the random forest
n_estimators = [int(x) for x in np.linspace(start=100, stop=300, num=3)]
# Fraction of features to consider at every split
max_features = [round(x, 2) for x in np.linspace(start=0.3, stop=1.0, num=3)]
# Minimum number of samples required at each leaf node
min_samples_leaf = [int(x) for x in np.linspace(start=200, stop=600, num=3)]
# Whether to bootstrap the training subset for each tree
bootstrap = [True, False]

# Create the grid
param_grid = {'n_estimators': n_estimators,
              'max_features': max_features,
              'min_samples_leaf': min_samples_leaf,
              'bootstrap': bootstrap}

# Grid search over all possible parameter combinations
rf_grid = GridSearchCV(estimator=random_forest, param_grid=param_grid, cv=5)

# Fit the model to find the best hyperparameter values
rf_grid.fit(X_train, Y_train)

# Best hyperparameter values
rf_grid.best_params_
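After the search I evaluate the winning model on held-out data (a minimal sketch; X_test and Y_test are assumed to come from a train/test split not shown above):

from sklearn.metrics import classification_report

# GridSearchCV refits the best parameter combination on all of X_train
best_rf = rf_grid.best_estimator_
print("best CV accuracy:", rf_grid.best_score_)
print(classification_report(Y_test, best_rf.predict(X_test)))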