Remove missing values

Course Name: Backtesting Trading Strategies, Section No: 4, Unit No: 9, Unit type: Exercise

Is it better to remove the rows with NaN or to fill them with previous / later or zero value?

Hello Apostolos,

In short, both approaches are valid, and each comes with trade-offs:



Removal of rows/columns:

It is a quick and convenient solution, especially if the amount of missing data is very small.

A downside here is that you may also delete useful information in the process.
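
As a rough illustration of the removal approach, here is a minimal pandas sketch. The DataFrame, its column names and its values are all made up for this example and are not taken from the course data.

```python
import numpy as np
import pandas as pd

# Hypothetical price data with two missing closes (values are made up).
prices = pd.DataFrame(
    {
        "close": [100.0, np.nan, 102.0, 103.0, np.nan],
        "volume": [1000, 1100, 1200, 1300, 1400],
    },
    index=pd.date_range("2023-01-02", periods=5, freq="D"),
)

# Share of rows that contain at least one missing value.
missing_share = prices.isna().any(axis=1).mean()

# Drop every row that contains a NaN in any column.
cleaned = prices.dropna()

rows_dropped = len(prices) - len(cleaned)
print(f"missing share: {missing_share:.0%}, rows dropped: {rows_dropped}")
print(cleaned)
```

Note that the valid volume readings on the dropped rows are removed along with the missing closes, which is the loss of useful information mentioned above.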



Imputing the missing value (or simply, replacing the missing value):

If you feel that you can make a reasonably good approximation of what the missing value would be, you can choose to replace it with that estimate.



There are many ways to do this. Some popular methods include:

  • Replace with 0 (this may create discontinuity in time-series data)
  • Replace with previous or next value (to ensure continuity in time-series data)
  • Replace with Mean, Median or Mode of the series
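
The replacement methods listed above could be sketched in pandas roughly as follows. The series and its values are hypothetical, and the mode is left out for brevity; treat this as one possible way to do it rather than the course's own code.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices with two gaps (values are made up).
series = pd.Series(
    [100.0, np.nan, 102.0, np.nan, 104.0],
    index=pd.date_range("2023-01-02", periods=5, freq="D"),
    name="close",
)

# Replace with 0: simple, but creates artificial jumps in a price series.
filled_zero = series.fillna(0)

# Replace with the previous value (forward fill).
filled_prev = series.ffill()

# Replace with the next value (backward fill). This uses data from the
# future, so in a backtest it can leak information into earlier dates.
filled_next = series.bfill()

# Replace with the mean or median of the observed values.
filled_mean = series.fillna(series.mean())
filled_median = series.fillna(series.median())

# Put the results side by side for comparison.
comparison = pd.DataFrame(
    {
        "original": series,
        "zero": filled_zero,
        "previous": filled_prev,
        "next": filled_next,
        "mean": filled_mean,
        "median": filled_median,
    }
)
print(comparison)
```

Printing the variants side by side makes the discontinuity from the zero fill easy to spot next to the forward and backward fills.
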

I recommend checking out the blog on Data Preprocessing linked below:
Data Preprocessing: Python, Machine Learning, Examples and more

You will find the section on Missing Values very interesting as it explores a lot of these concepts in detail.

I hope this answers your question.
Feel free to reach out if you face any challenges in the future.

Thank you, Kevin.