Data preprocessing for option trading using machine learning

Kirti · August 19, 2025, 1:38pm

Is it right to use StandardScaler or MinMaxScaler on columns that are already bounded by an upper or lower limit, or on columns whose values are already normalised?
Or should we apply these scalers only to the columns that are not normalised?

Mohak_Pachisia · August 20, 2025, 1:42pm

Hi Kirti,

When deciding whether to apply scaling methods or not, the choice depends on the type of feature.

Features that are already stationary, or already scaled, or bounded within a fixed range, such as ratios or values between 0 and 1, typically do not require additional scaling. These features already carry their meaning in a normalized space, and rescaling them may add no value or even distort interpretation.

The exception is when you use models that are highly sensitive to feature scale, such as support vector machines, k-means, or neural networks. In such cases, it can be useful to rescale all features uniformly, even those that are already bounded, so that every feature enters the optimization on a comparable scale.
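As a rough illustration of rescaling everything uniformly for a scale-sensitive model, here is a minimal sketch using a scikit-learn pipeline. The data is synthetic and the feature names (`volume`, `ratio`) and the label rule are assumptions made up for this example, not something from your dataset.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)
n = 300

# Synthetic, hypothetical features: one unbounded, one already in [0, 1].
volume = rng.lognormal(mean=8.0, sigma=1.0, size=n)
ratio = rng.uniform(0.0, 1.0, size=n)
X = np.column_stack([volume, ratio])
y = (ratio > 0.5).astype(int)  # made-up label, for illustration only

# Chronological split: earlier rows train, later rows test.
X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

# The pipeline scales both columns the same way and fits the scaler
# on the training rows only when model.fit is called.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)
acc = model.score(X_test, y_test)
```

Wrapping the scaler and the model in one pipeline also means the test rows are transformed with the training statistics automatically.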

Raw, unbounded features, such as volume, are the ones that benefit most from scaling. Without it, their much larger magnitudes can dominate model training.
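If you decide to scale only the unbounded columns and leave the bounded ones untouched, one possible way is scikit-learn's `ColumnTransformer`. This is a minimal sketch on synthetic data; the column names (`volume`, `open_interest`, `delta`, `iv_percentile`) and the 150-row training cutoff are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200

# Synthetic, hypothetical option features.
X = pd.DataFrame({
    "volume": rng.lognormal(mean=8.0, sigma=1.0, size=n),         # unbounded
    "open_interest": rng.lognormal(mean=9.0, sigma=1.0, size=n),  # unbounded
    "delta": rng.uniform(0.0, 1.0, size=n),                       # already in [0, 1]
    "iv_percentile": rng.uniform(0.0, 1.0, size=n),               # already in [0, 1]
})

unbounded_cols = ["volume", "open_interest"]
bounded_cols = [c for c in X.columns if c not in unbounded_cols]

# Scale only the unbounded columns; pass the rest through unchanged.
preprocess = ColumnTransformer(
    transformers=[("scale", StandardScaler(), unbounded_cols)],
    remainder="passthrough",
)

# Fit on the training rows only (assumed here to be the first 150).
n_train = 150
preprocess.fit(X.iloc[:n_train])

# Output column order: transformed columns first, then passthrough columns.
scaled = pd.DataFrame(
    preprocess.transform(X),
    columns=unbounded_cols + bounded_cols,
    index=X.index,
)
```

After this, `scaled["delta"]` and `scaled["iv_percentile"]` hold the original values, while `volume` and `open_interest` are standardized with the training-set mean and standard deviation.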

Additional tip:

Data leakage in scaling occurs when you fit a scaler on the full dataset, because the scaled value of every point (even the first) then depends on future values. In time series this introduces lookahead bias, since the normalization uses information that was not available at that time. The fix is to fit scalers only on training data, never on validation or test data. For time series, use rolling or expanding windows: fit the scaler on the data available up to a given point, then transform the next window. For cross-sectional problems, fitting on the whole training set is fine, since temporal ordering is not critical.
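To make the two options above concrete, here is a small sketch with pandas on a synthetic series. The series name, the 200-row split, and `min_periods = 20` are assumptions for illustration. The expanding version below is a one-step variant of the idea: each point is scaled using only strictly earlier observations.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic, hypothetical daily volume series in chronological order.
volume = pd.Series(rng.lognormal(mean=8.0, sigma=1.0, size=300), name="volume")

# Option 1: fit on the training period only, reuse those statistics on test.
split = 200
train, test = volume.iloc[:split], volume.iloc[split:]
mu, sigma = train.mean(), train.std(ddof=0)
train_z = (train - mu) / sigma
test_z = (test - mu) / sigma  # no test-period statistics are used

# Option 2: expanding window. shift(1) drops the current observation,
# so each z-score uses only data available before that point.
min_periods = 20
past_mean = volume.expanding(min_periods=min_periods).mean().shift(1)
past_std = volume.expanding(min_periods=min_periods).std(ddof=0).shift(1)
expanding_z = (volume - past_mean) / past_std  # first 20 values are NaN
```

For a rolling window instead, replacing `expanding(min_periods=...)` with `rolling(window=...)` follows the same pattern; which one suits you depends on how quickly you expect the feature's distribution to drift.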