Hi,
The code to predict the next day's up and down deviation is missing. It should be available as of the present day. For example, if I run the model today, on 1 April, there should be code that produces the prediction for 2 April, and so on. Which function should I use for this? Thanks.
The following code shows how to predict from the last row. The model is for demonstration only:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import yfinance as yf
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
import matplotlib.dates as mdates
from datetime import datetime
# Define date range
start_date = '2000-01-01'
end_date = datetime.now().strftime('%Y-%m-%d')
# Download data
print(f"Downloading data from {start_date} to {end_date}")
nifty = yf.download('^NSEI', start=start_date, end=end_date)
gold = yf.download('GLD', start=start_date, end=end_date)
# Handle MultiIndex columns if they exist
if isinstance(nifty.columns, pd.MultiIndex):
    nifty.columns = nifty.columns.droplevel(1)
if isinstance(gold.columns, pd.MultiIndex):
    gold.columns = gold.columns.droplevel(1)
# Calculate monthly average prices
nifty_monthly_avg = nifty['Close'].resample('M').mean()
gold_monthly_avg = gold['Close'].resample('M').mean()
# Combine data and drop any rows with missing values
data = pd.DataFrame({
    'nifty': nifty_monthly_avg,
    'gold': gold_monthly_avg
}).dropna()
# Create lag features and target variable
data['nifty_lag1'] = data['nifty'].shift(1)
data['gold_lag1'] = data['gold'].shift(1)
data['future_gold_price'] = data['gold'].shift(-1)
# Save last row for prediction
last_row = data[['nifty', 'nifty_lag1', 'gold_lag1', 'gold']].iloc[-1]
last_date = data.index[-1]
# Create dataset for predictions
features_df = data[['nifty', 'nifty_lag1', 'gold_lag1', 'gold']].copy()
features_df_clean = features_df.dropna()
# Remove rows with NaN values after creating features
data_clean = data.dropna()
# Split the data into features and target
X = data_clean[['nifty', 'nifty_lag1', 'gold_lag1', 'gold']]
y = data_clean['future_gold_price']
# Split into training and testing sets (80% train, 20% test)
train_size = int(len(data_clean) * 0.8)
X_train, X_test = X.iloc[:train_size], X.iloc[train_size:]
y_train, y_test = y.iloc[:train_size], y.iloc[train_size:]
# Create pipeline with scaling and linear regression
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('linear', LinearRegression())
])
# Define parameters for grid search
parameters = {'linear__fit_intercept': [True, False]}
# Use TimeSeriesSplit for cross-validation
tscv = TimeSeriesSplit(n_splits=5)
# Perform grid search with cross-validation
model = GridSearchCV(
    pipeline,
    parameters,
    scoring='neg_mean_squared_error',
    cv=tscv
)
model.fit(X_train, y_train)
# Get best parameters
best_params = model.best_params_
print("Best parameters:", best_params)
# Reuse the best pipeline (scaler + regression), which GridSearchCV has already
# refit on X_train, so the final forecast comes from the same model that is
# evaluated on the test set
final_model = model.best_estimator_
# Make predictions on the test set
y_pred = model.predict(X_test)
# Calculate RMSE for the test set
rmse = np.sqrt(np.mean((y_test - y_pred) ** 2))
print(f"Test RMSE: {rmse:.2f}")
# Compare predicted vs actual values
comparison = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
print(comparison.tail()) # Show the last few predictions vs actual values
# === Final Prediction for Next Month ===
# Prepare input for the final prediction (only last row)
last_row_df = pd.DataFrame([last_row])
# Predict the future gold price
predicted_next_gold_price = final_model.predict(last_row_df)[0]
# Compute the prediction date (month-end label of the month after the last date)
prediction_date = last_date + pd.offsets.MonthEnd(1)
# Display result
print(f"\nPredicted Gold Price for {prediction_date.strftime('%Y-%m-%d')}: ${predicted_next_gold_price:.2f}")
Thanks,
AJ
[quote="Red Red, post:1, topic:26381, full:true, username:Red_Red"]
Hi,
The code to predict the next day's up and down deviation is missing. It should be available as of the present day. For example, if I run the model today, on 1 April, there should be code that produces the prediction for 2 April, and so on. Which function should I use for this? Thanks.
[/quote]
You can refer to the notebook “Predict the Next Day’s High and Low” (Section 6 Unit 5) of the course.
Ideally, your dataset spans several years and has the data for 1 April 2025 as its last row.
You then divide it into train and test sets, where the last row of the test dataset contains the data for 1 April.
If you fit the regression model on the train dataset and use the trained model to predict the up and down deviation, you will be able to use the predicted upside deviation values to calculate the High price and the predicted downside deviation values to calculate the Low price as shown in the notebook.
The predicted high and predicted low are stored in the P_H and P_L columns of the dataframe X_test. You can find the predicted high and low for the last day of the test dataset by printing the last row, or simply with X_test.P_H.tail(1) and X_test.P_L.tail(1).
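As a rough illustration of the step above, here is a minimal sketch of how P_H and P_L could be built and read back. It assumes the upside deviation is High minus Open and the downside deviation is Open minus Low. The column names U_predict and D_predict and all the numbers are made up for the example; the notebook may name and compute things differently.

```python
import pandas as pd

# Hypothetical test-set rows; the values are invented for illustration only.
X_test = pd.DataFrame(
    {
        "Open": [100.0, 102.0, 101.0],
        # Stand-ins for the regression outputs (assumed column names).
        "U_predict": [1.5, 2.0, 1.0],  # predicted upside deviation (High - Open)
        "D_predict": [1.0, 1.5, 0.5],  # predicted downside deviation (Open - Low)
    },
    index=pd.to_datetime(["2025-03-28", "2025-03-31", "2025-04-01"]),
)

# Predicted High = Open + predicted upside deviation
X_test["P_H"] = X_test["Open"] + X_test["U_predict"]
# Predicted Low = Open - predicted downside deviation
X_test["P_L"] = X_test["Open"] - X_test["D_predict"]

# Last row of the test set, i.e. the most recent day in the data
last = X_test[["P_H", "P_L"]].tail(1)
print(last)
```

Here the last row is dated 1 April, so tail(1) returns the predicted high and low for that day.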
But you should be using a dataset which contains the data of April 1, and you should follow all the steps to train and test the model.
Here, the point is that we have predicted the high and low based on the open price of the day. As the video in Section 7 Unit 1 shows, we are creating a trading strategy based on these values for the next day, not actually predicting the prices for the next day. The conditions for buying or selling are given in the notebook in Section 7 Unit 6.
I hope this clarifies your point. If you want to predict the prices for the next day, you will have to change the output or target variables.
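One possible way to change the target is to shift the deviation columns up by one row, so that each day's features are paired with the following day's deviations. The sketch below is only an illustration under stated assumptions: the OHLC data is synthetic, the feature choice (today's deviations) is arbitrary, and the column names are hypothetical rather than the ones used in the course notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic daily OHLC data standing in for the real dataset (assumption).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2024-01-01", periods=300)
close = 100 + np.cumsum(rng.normal(0, 1, len(dates)))
df = pd.DataFrame({"Close": close}, index=dates)
df["Open"] = df["Close"].shift(1).fillna(100.0)
df["High"] = df[["Open", "Close"]].max(axis=1) + rng.uniform(0.1, 1.0, len(dates))
df["Low"] = df[["Open", "Close"]].min(axis=1) - rng.uniform(0.1, 1.0, len(dates))

# Today's deviations from the open.
df["up_dev"] = df["High"] - df["Open"]
df["down_dev"] = df["Open"] - df["Low"]

# Targets: the NEXT day's deviations, obtained by shifting one row up.
df["next_up_dev"] = df["up_dev"].shift(-1)
df["next_down_dev"] = df["down_dev"].shift(-1)

features = ["up_dev", "down_dev"]
# Only the last row has no target yet, so only that row is dropped for training.
train = df.dropna(subset=["next_up_dev", "next_down_dev"])

up_model = LinearRegression().fit(train[features], train["next_up_dev"])
down_model = LinearRegression().fit(train[features], train["next_down_dev"])

# The last row has features but no target: it is the input for the next-day forecast.
latest = df[features].tail(1)
pred_up = up_model.predict(latest)[0]
pred_down = down_model.predict(latest)[0]
next_day = latest.index[0] + pd.offsets.BDay(1)
print(f"{next_day.date()}: predicted up deviation {pred_up:.2f}, "
      f"down deviation {pred_down:.2f}")
```

Note that these are still deviations from the open, so turning them into a predicted High and Low requires the next day's open price once it is known.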