The GC1 dataset from Quandl contains zero values in the OHL (Open, High, Low) columns and wrong values in the Close column from Oct 2017 to Dec 2017, which skews the graph. I'm trying to drop the rows where any column value is zero, but the code is not working correctly. Can you point out where I'm going wrong? Below is the code snippet:
import quandl
Data = quandl.get("CHRIS/MCX_GC1", start_date="2017-1-1", api_key=api_key)  # get Gold prices from Quandl; api_key holds your Quandl API key
The code is not working. The problem is that although the OHL columns have 0 values, the values in the Close column are non-zero but incorrect. I want to either backfill or forward-fill the values in all OHLC columns (including Close) wherever any of the OHL values is zero. I've also tried the replace method, but that doesn't work either.
Set Close to zero wherever Open is 0. You can do the same for the other columns (High, Low) when they are zero:
Data.loc[Data.Open==0.0, 'Close'] = 0.0
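To cover Open, High and Low in one step, one option is a row mask built with `any(axis=1)`. This is a minimal sketch that assumes the columns are named `Open`, `High`, `Low` and `Close` (the snippet above only confirms `Open` and `Close`). The small frame is made-up data standing in for the Quandl download:

```python
import pandas as pd

# Made-up sample prices; the third row has zeros in Open/High/Low and a bogus Close.
Data = pd.DataFrame(
    {
        "Open": [100.0, 101.0, 0.0, 103.0],
        "High": [102.0, 103.0, 0.0, 105.0],
        "Low": [99.0, 100.0, 0.0, 102.0],
        "Close": [101.0, 102.0, 5.0, 104.0],
    },
    index=pd.date_range("2017-10-02", periods=4, freq="D"),
)

# True for every row where at least one of Open/High/Low is zero.
bad_rows = (Data[["Open", "High", "Low"]] == 0).any(axis=1)

# Zero the Close on those rows too, so the replace/fill steps treat the whole row as missing.
Data.loc[bad_rows, "Close"] = 0.0
print(Data)
```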
Replace the zeros in every column of the DataFrame with NaN (missing values):
import numpy as np
Data = Data.replace(0, np.nan)
Replace the NaN values with the previous valid values (forward fill). Recent pandas versions deprecate fillna(method='ffill') in favor of ffill():
Data = Data.ffill()
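The question also mentions backfilling. Here is a small comparison of forward fill and back fill on a made-up Close series with two missing days, using pandas' `ffill()` and `bfill()`; the dates and prices are illustrative only:

```python
import numpy as np
import pandas as pd

# Made-up closing prices with two missing (formerly zero) days in the middle.
close = pd.Series(
    [100.0, np.nan, np.nan, 106.0],
    index=pd.date_range("2017-10-02", periods=4, freq="D"),
    name="Close",
)

forward = close.ffill()   # carry the last valid price forward
backward = close.bfill()  # carry the next valid price backward

print(pd.DataFrame({"raw": close, "ffill": forward, "bfill": backward}))
```

One edge case: ffill() cannot fill a gap at the very start of the data because there is no earlier value, so chaining .ffill().bfill() is a common way to cover it.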
Print the top 5 rows (a bare Data.head() only displays in an interactive session):
print(Data.head())
I would recommend printing Data after each step to see what changes. Also, instead of forward-filling the values, you could consider dropping those rows.
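A minimal sketch of the dropping alternative, assuming the same `Open`/`High`/`Low`/`Close` column names; the frame is made-up data with one bad row:

```python
import pandas as pd

# Made-up sample with one bad row (zeros in Open/High/Low, wrong Close).
Data = pd.DataFrame(
    {
        "Open": [100.0, 0.0, 103.0],
        "High": [102.0, 0.0, 105.0],
        "Low": [99.0, 0.0, 102.0],
        "Close": [101.0, 5.0, 104.0],
    },
    index=pd.date_range("2017-10-02", periods=3, freq="D"),
)

# Keep only rows where Open, High and Low are all non-zero.
cleaned = Data[(Data[["Open", "High", "Low"]] != 0).all(axis=1)]
print(cleaned)
```

If you have already run the replace(0, np.nan) step, Data.dropna(subset=["Open", "High", "Low"]) drops the same rows.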
Yes, this works. However, forward-filling and dropping the rows containing 0 values make little difference to the graph: either way, the portion where values are missing is still smoothed out. But it is much better than a graph with an abnormal dip in it.