Hi,
The next image shows the results of a RandomForestClassifier run with default settings vs. the same data run through a model with hyperparameter tuning, using the code below.
Is the difference because the model with the default settings overfit while the tuned model did not?
Given these results, should I use the model with the lower accuracy, hoping it generalizes better to new data? What could be the next steps?
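For reference, this is roughly how I check the default model for overfitting: compare its training accuracy against a cross-validated estimate (a minimal sketch; X_train and Y_train are the same arrays used in the grid search below):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

default_rf = RandomForestClassifier(random_state=42)
default_rf.fit(X_train, Y_train)

# Accuracy on the data the model was fit on
train_acc = default_rf.score(X_train, Y_train)
# Held-out estimate via 5-fold cross-validation
cv_acc = cross_val_score(default_rf, X_train, Y_train, cv=5).mean()

# A large gap between the two suggests overfitting
print(f"train accuracy: {train_acc:.3f}, CV accuracy: {cv_acc:.3f}")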
A couple of lines regarding the dataframe / exercise:
1700 rows
3 classes
no imbalance
31 features after selecting them with Lasso and correlation (see the sketch below)
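In case it helps, a minimal sketch of how I did the selection (X_full, y, the 0.9 correlation threshold, and alpha=0.01 are illustrative placeholders, not the exact values I used):

import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Correlation filter: drop one feature from each highly correlated pair
corr = X_full.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
X_reduced = X_full.drop(columns=to_drop)

# Lasso-based selection: keep features with nonzero coefficients
# (the integer class labels serve as the regression target here)
selector = SelectFromModel(Lasso(alpha=0.01))
selector.fit(X_reduced, y)
X_selected = X_reduced.loc[:, selector.get_support()]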
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

random_forest = RandomForestClassifier()

# Number of trees in the random forest
n_estimators = [int(x) for x in np.linspace(start=100, stop=300, num=3)]
# Fraction of features to consider at every split
max_features = [round(x, 2) for x in np.linspace(start=0.3, stop=1.0, num=3)]
# Minimum number of samples required at each leaf node
min_samples_leaf = [int(x) for x in np.linspace(start=200, stop=600, num=3)]
# Whether to bootstrap the training subset for each tree
bootstrap = [True, False]

# Create the grid
param_grid = {'n_estimators': n_estimators,
              'max_features': max_features,
              'min_samples_leaf': min_samples_leaf,
              'bootstrap': bootstrap}

# Grid search over all possible parameter combinations
rf_grid = GridSearchCV(estimator=random_forest, param_grid=param_grid, cv=5)

# Fit the model to find the best hyperparameter values
rf_grid.fit(X_train, Y_train)

# Best hyperparameter values
rf_grid.best_params_
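After the search I evaluate the winning model on held-out data (a minimal sketch; X_test and Y_test are assumed to come from a train/test split not shown above):

from sklearn.metrics import classification_report

# GridSearchCV refits the best parameter combination on all of X_train
best_rf = rf_grid.best_estimator_
print("best CV accuracy:", rf_grid.best_score_)
print(classification_report(Y_test, best_rf.predict(X_test)))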