Missing 2017 & 2018 data on strategy performance charts

Hi,
Under Analyse The Strategy Performance, the analyse_strategy function seems to leave out the 2017 & 2018 data, as those years don’t appear on either chart. Could you please explain why? Also, could I have the code to forecast the next day’s GARCH_predicted_volatility? E.g., if today is 1st April, the code should print the GARCH_predicted_volatility for 2nd April.

Course Name: Financial Time Series Analysis for Trading, Section No: 25, Unit No: 9, Unit type: Notebook

Hi RR,

The data we are using is from 2020-12-21.

Understanding Model Training and Future Predictions:

When you’re building a machine learning model, especially for time-series or financial data,
the process generally involves two main stages:

Step 1: Training the Model

We start with a dataset that has:

  • Features (X): things the model can “look at” to make a decision
    (e.g., current and past prices of Nifty and Gold).
  • Target (y): what we want the model to predict
    (e.g., the future price of gold).

We split the data into a training set and a testing set, then fit the model using:

model.fit(X_train, y_train)

This tells the model: “Here’s a set of inputs (X_train) and the correct answers (y_train). Learn the pattern.”

Step 2: Making Predictions on New Data

Once trained, the model can now make predictions using:

model.predict(X_test)

This returns predicted values for the test data. These can be compared to actual values (since y_test is known),
and you can calculate how well the model did using metrics like RMSE.
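
As a rough illustration of the fit, predict and RMSE steps described above, here is a minimal sketch on synthetic data. The data, the 80/20 chronological split and the use of ordinary least squares (via NumPy) as a stand-in for model.fit / model.predict are all assumptions made for the example, not the model used in the course notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, for illustration only: 100 rows, 2 features, linear target plus noise
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Chronological split (no shuffling): first 80 rows to train, last 20 to test
split = 80
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Stand-in for model.fit(X_train, y_train): least squares with an intercept column
A_train = np.column_stack([np.ones(len(X_train)), X_train])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

# Stand-in for model.predict(X_test)
A_test = np.column_stack([np.ones(len(X_test)), X_test])
y_pred = A_test @ coef

# RMSE: square root of the mean squared difference between actual and predicted
rmse = float(np.sqrt(np.mean((y_test - y_pred) ** 2)))
print(f"Test RMSE: {rmse:.3f}")
```

The split is kept in time order because shuffling time-series rows would let the model see data from after the period it is tested on.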

You can also pass in just one new row as long as it contains the same features the model was trained on.

Example:

X_new_test = pd.DataFrame([{
    'nifty': 22000,
    'nifty_lag1': 21800,
    'gold_lag1': 190,
    'gold': 192
}])

prediction = model.predict(X_new_test)

This might be today’s market data, and you’re using it to predict next month’s gold price.

Important Note: Predicting the Future

When you make a prediction for the future (like next month), the actual value hasn’t happened yet.
So you can’t compare it with anything right now.

You’re forecasting based on patterns in past data, but you’ll only know how accurate it is once the real data for next month becomes available.

This is different from evaluating your model on test data, where both input and output already exist.

Summary:

  • model.fit(X_train, y_train): trains your model.
  • model.predict(X_new_test): makes predictions on any new data, even a single row.
  • Ensure X_new_test has the same features the model was trained on.
  • If you’re predicting the future, you have to wait for real data to evaluate the prediction.

I think you are not answering my question, so let me rephrase it:

(1) Under Analyse The Strategy Performance, the analyse_strategy function seems to leave out the 2017 & 2018 data, as those years don’t appear on either chart. Could you please explain why?

(2) Also, could I have the code to forecast the next day’s GARCH_predicted_volatility? E.g., if today is 1st April, the code should print the GARCH_predicted_volatility for 2nd April.

Hi RR,

Thanks for clarifying:

For part 1:

While exploring the data, we found that VXX data is only available from 2019 onwards, even though SP500 and VIX start in 2017:

import pandas as pd

# The data is stored in the directory 'data_modules'
path = '../data_modules/'
# Read the csv file using read_csv method of pandas
data = pd.read_csv(path + 'SP500_VXX_price_2017_2020_GARCH.csv', index_col=0)
data.index = pd.to_datetime(data.index, format="%d-%m-%Y")
# Find the first non-NaN date in each column
for i in data.columns:
    print(i, " ", data[i].first_valid_index())

SP500   2017-01-03 00:00:00
VIX     2017-01-03 00:00:00
VXX     2019-01-02 00:00:00

The same check at the step where the signal and strategy returns are generated:

# Generate the trading signal
data['signal'] = np.where(data['GARCH_predicted_volatility'] > data['actual_historical_volatility'], 1, -1)

# Calculate the strategy returns
data['strategy_returns'] = data['VXX'].pct_change() * data.signal.shift(1)

# Finding first non na values in each column
for i in data.columns:
    print(i," " ,data[i].first_valid_index())

data.dropna(inplace=True)
data.head()

SP500                         2017-01-03 00:00:00
VIX                           2017-01-03 00:00:00
VXX                           2019-01-02 00:00:00
log_returns                   2017-01-04 00:00:00
actual_historical_volatility  2017-01-24 00:00:00
GARCH_predicted_volatility    2018-01-03 00:00:00
signal                        2017-01-03 00:00:00
strategy_returns              2019-01-03 00:00:00

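Because VXX has no values before 2019-01-02, and strategy_returns needs the previous day’s VXX price (pct_change), the first row with no missing values is 2019-01-03. data.dropna() removes every row that has a NaN in any column, so all 2017 and 2018 rows are dropped and the charts start from 2019-01-03.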
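
To see this effect in isolation, here is a small sketch with made-up prices (the numbers and the `demo` DataFrame are purely illustrative, not the course data). It shows that dropna starts the data at the latest first-valid date across all columns.

```python
import numpy as np
import pandas as pd

# Business-day index covering 2017-2019 (exchange holidays are ignored here)
idx = pd.bdate_range("2017-01-02", "2019-12-31")
n = len(idx)

demo = pd.DataFrame({"SP500": 100.0 + np.arange(n)}, index=idx)
# Hypothetical VXX series that only has prices from 2019-01-02 onwards
demo["VXX"] = np.where(idx >= "2019-01-02", 50.0 + np.arange(n), np.nan)
# Returns need the previous day's price, so they start one day later
demo["strategy_returns"] = demo["VXX"].pct_change(fill_method=None)

# First non-NaN date per column, and the latest of those dates
first_valid = {col: demo[col].first_valid_index() for col in demo.columns}
latest_start = max(first_valid.values())

# dropna() removes every row with a NaN in any column
trimmed = demo.dropna()
print("Latest first-valid date:", latest_start.date())
print("First row after dropna: ", trimmed.index[0].date())
```

If you want to keep the 2017 and 2018 rows for columns that do have data, one option is to restrict the drop to specific columns with `dropna(subset=[...])`, though the strategy returns themselves cannot exist before VXX prices do.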

For part 2:

Let’s reduce the complexity of the code:

import numpy as np
from arch import arch_model

# Calculate log returns (in percent) and 14-day annualised historical volatility
data['log_returns'] = np.log(data['SP500'].pct_change() + 1) * 100
data['actual_historical_volatility'] = data['log_returns'].rolling(14).std() * np.sqrt(252)

# Display first 5 rows 
print(data.round(2).head())

# For GARCH predicted volatility, we'll use a rolling window approach without a separate function
garch_volatility = []

# Loop through the data with a 252-day rolling window
for i in range(252, len(data)):
    window_returns = data['log_returns'].iloc[i-252:i].values

    # Define and fit GARCH model
    gm = arch_model(window_returns, vol='GARCH', p=1, q=1, dist='skewt')
    gm_fit = gm.fit(disp='off')

    # Forecast one step ahead; values[-1, 0] picks the next-day variance as a scalar
    forecasted_variance = gm_fit.forecast(horizon=1).variance.values[-1, 0]
    annualized_vol = np.sqrt(forecasted_variance) * np.sqrt(252)

    garch_volatility.append(annualized_vol)

# Fill with NaN for the first 252 days
garch_volatility = [np.nan] * 252 + garch_volatility
data['GARCH_predicted_volatility'] = garch_volatility

# Generate trading signals (1 for long VXX, -1 for short VXX)
data['signal'] = np.where(data['GARCH_predicted_volatility'] > data['actual_historical_volatility'], 1, -1)
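
For intuition about what the one-step forecast in the loop computes, a GARCH(1,1) model updates the variance as omega + alpha * (last residual)^2 + beta * (last variance). The sketch below applies that formula by hand. The parameter values and inputs are hypothetical, chosen only for illustration; in practice they come from the fitted model.

```python
import numpy as np

def garch11_next_variance(omega, alpha, beta, last_residual, last_variance):
    """One-step-ahead GARCH(1,1) variance: omega + alpha * e_t^2 + beta * sigma_t^2."""
    return omega + alpha * last_residual ** 2 + beta * last_variance

# Hypothetical parameters (not estimated from any data)
omega, alpha, beta = 0.02, 0.10, 0.85
last_residual = -1.5   # yesterday's return minus the mean, in percent
last_variance = 1.0    # yesterday's conditional variance, in percent squared

next_var = garch11_next_variance(omega, alpha, beta, last_residual, last_variance)

# Daily volatility is the square root of variance; annualise with sqrt(252)
daily_vol = np.sqrt(next_var)
annualized_vol = daily_vol * np.sqrt(252)
print(f"Next-day variance: {next_var:.3f}")
print(f"Annualised volatility: {annualized_vol:.2f}%")
```

Because the log returns are multiplied by 100 before fitting, the resulting volatility is already in percent.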

Now, for the next-day forecast, we can fit the model on the latest 252 returns (ending today) and forecast one step ahead:

# Use the most recent 252 valid returns, up to and including today
latest_returns = data['log_returns'].dropna().tail(252).values
next_day_gm = arch_model(latest_returns, vol='GARCH', p=1, q=1, dist='skewt')
next_day_gm_fit = next_day_gm.fit(disp='off')
next_day_forecast = next_day_gm_fit.forecast(horizon=1)
next_day_volatility = np.sqrt(next_day_forecast.variance.values[-1][0]) * np.sqrt(252)
print(f"Forecasted GARCH volatility for next trading day: {next_day_volatility:.2f}%")
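
To label the forecast with the date it applies to (e.g. 2nd April when the last row is 1st April), one option is to add one business day to the last date in the index. The helper name `next_trading_day` is hypothetical, and this sketch only skips weekends; it does not know about exchange holidays.

```python
import pandas as pd

def next_trading_day(last_date):
    """Hypothetical helper: next weekday after last_date (exchange holidays not handled)."""
    return pd.Timestamp(last_date) + pd.offsets.BDay(1)

# A Monday rolls to Tuesday; a Friday rolls to the following Monday
print(next_trading_day("2024-04-01").date())  # 2024-04-02
print(next_trading_day("2024-04-05").date())  # 2024-04-08
```

In the notebook, you could call `next_trading_day(data.index[-1])` and include the result in the print statement alongside next_day_volatility.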

Thanks for the reply