Cross-Validation
Cross-validation is a technique used to validate a model by checking how the results of a statistical analysis generalize to an independent dataset. It is one of the standard methods for assessing a model and choosing the best parameters in a prediction or machine learning task. The process of cross-validation involves setting aside a sample of the data, training the model on the remaining data, and finally using the held-out sample to test how well the model performs.
There are three main methods used for cross-validation, namely:
> k-fold cross-validation: In k-fold cross-validation, the dataset is divided into k subsets (folds). In each iteration, one of the k subsets is used as the test set and the remaining k-1 subsets are put together to form the training set. Every data point appears in the test set exactly once and in a training set k-1 times. The results from the k folds are then averaged to produce a single performance estimate (see the first sketch after this list).
> Validation set approach: In this approach, 50% of the dataset is used for training the model, while the remaining 50% is held out for validation (see the second sketch below).
> Leave-one-out cross-validation: As the name suggests, only one data point is reserved for testing while the model is trained on the rest of the dataset, and this iteration is repeated for each and every data point (see the final sketch below).
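A minimal sketch of k-fold cross-validation, using scikit-learn as an assumed library choice (the text above does not prescribe any particular implementation) and the iris dataset purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)          # toy dataset for illustration
model = LogisticRegression(max_iter=1000)  # any estimator would do here

# Split the data into k=5 folds; each fold serves as the test set exactly once.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kf)

# Average the k per-fold scores into a single performance estimate.
print(scores, scores.mean())
```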
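The validation set approach reduces to a single 50/50 hold-out split. One way to sketch it, again assuming scikit-learn:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# Hold out half of the data for validation; train on the other half.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(model.score(X_val, y_val))  # accuracy on the held-out half
```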
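Leave-one-out cross-validation is the extreme case where each fold contains a single data point, so a dataset of n points yields n train/test iterations. A sketch under the same scikit-learn assumption:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# LeaveOneOut yields one split per data point: the model is trained
# n times, each time on n-1 points, and tested on the remaining point.
scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print(scores.mean())  # fraction of points predicted correctly
```

Note that leave-one-out is simply k-fold with k equal to the number of data points, which makes it expensive on large datasets since the model must be refit once per point.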