Time series resampling

philip_hoy · December 5, 2024, 2:08am

I am resampling 1min OHLC data to several higher timeframes so I can run machine learning on how a 30min calculation and a 60min calculation affect, say, the 240min. I understand how to group the data, but when I want to do column-to-column calculations the rows don't match: I need 30 rows of 1min data to get 1 row of 30min. If I group by and write the result back into the 1min data frame, I get 30 rows of repeating values in my 30min column. If I instead build a separate 30min data frame from the 1min data, do the calculations there, and rejoin it, the row counts are not equal and I have to create repeating values that way too. How is this usually dealt with? Say I have a column of 1min data and a column of 30min data, roughly 158,000 rows for a year: if I merge them, something will have repeating values one way or the other just to get the rows to match for column-to-column calculations. How will this affect the outcome? And are there any resources that show the options?

I suppose I could do this with an iteration or a vector and create a step in some way to match… there must be a simpler way?

Ajay_Pawar · December 5, 2024, 7:01am

You’re trying to:

- Resample 1-minute OHLC data to higher intervals (30 minutes, 60 minutes, 240 minutes).
- Perform column-to-column calculations across different timeframes.
- Align rows from different intervals without unnecessary repetition or mismatches.

The main issue is that higher intervals have fewer rows (e.g., 1 row for 30 minutes vs. 30 rows for 1 minute), which makes merging and calculations tricky.

How This Code Solves the Problem

Step-by-Step Explanation

Fetch 1-Minute Data

import yfinance as yf

symbol = "AAPL"
data = yf.download(symbol, start="2024-12-03", end="2024-12-04", interval="1m")
df = data.reset_index()  # the later steps use df with a 'Datetime' column

This provides 1-minute OHLC data for your desired timeframe.
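If you want to try the later steps without a network call, a synthetic frame with the same shape works too. This is only a sketch: the seeded random walk, the 120-bar length, and the column layout are my assumptions, not real AAPL prices.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the downloaded data: 120 one-minute bars
# starting at 09:30, built from a seeded random walk (not real prices).
rng = np.random.default_rng(0)
idx = pd.date_range("2024-12-03 09:30", periods=120, freq="min", name="Datetime")

close = 100 + rng.normal(0, 0.05, size=len(idx)).cumsum()
open_ = np.r_[close[0], close[:-1]]          # each bar opens at the previous close
spread = rng.uniform(0.0, 0.05, size=len(idx))

df = pd.DataFrame(
    {
        "Open": open_,
        "High": np.maximum(open_, close) + spread,  # High is at least max(Open, Close)
        "Low": np.minimum(open_, close) - spread,   # Low is at most min(Open, Close)
        "Close": close,
    },
    index=idx,
).reset_index()  # move the timestamps into a 'Datetime' column

print(df.head())
```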

Create Batches for 30-Minute Intervals

df['30min_batch'] = df['Datetime'].dt.floor('30min')  # '30T' is a deprecated alias in recent pandas

Group rows into 30-minute intervals using dt.floor(). This ensures all timestamps in a batch are aligned to a common 30-minute interval.
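To see what the floor does, here is a small check on a few hand-picked timestamps (the timestamps are made up for illustration). It also shows one thing to watch for when you get to 240 minutes: floor() buckets are anchored to midnight, not to the market open.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime([
    "2024-12-03 09:30",
    "2024-12-03 09:59",
    "2024-12-03 10:00",
    "2024-12-03 10:29",
    "2024-12-03 10:30",
]))

# Every timestamp is mapped to the start of its 30-minute bucket.
batch = ts.dt.floor("30min")
print(batch.dt.strftime("%H:%M").tolist())
# ['09:30', '09:30', '10:00', '10:00', '10:30']

# 240-minute buckets start at 00:00, 04:00, 08:00, ... so a 09:30 bar
# lands in the bucket that starts at 08:00.
print(pd.Timestamp("2024-12-03 09:30").floor("240min"))
# 2024-12-03 08:00:00
```

If you need 240-minute bars that start at the session open instead, resample() takes origin and offset arguments for that; dt.floor() has no equivalent, as far as I know.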

Calculate Running OHLC Values

df['30min_open'] = df.groupby('30min_batch')['Open'].transform('first')  # first Open of the batch, repeated on every row
df['30min_close'] = df['Close']  # close of the bar so far = current 1-minute Close
df['30min_high'] = df.groupby('30min_batch')['High'].cummax()  # running high within the batch
df['30min_low'] = df.groupby('30min_batch')['Low'].cummin()  # running low within the batch

For each 30-minute batch:
- 30min_open: The first Open value in the interval.
- 30min_close: Dynamically updates with the current Close value in the interval.
- 30min_high and 30min_low: Dynamically update with the cumulative max/min values within the interval.
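A tiny worked example makes the behaviour of these columns easier to check by eye. The four bars below are invented numbers that straddle the 10:00 boundary, so the first two rows fall in the 09:30 batch and the last two in the 10:00 batch.

```python
import pandas as pd

# Four made-up 1-minute bars: 09:58, 09:59, 10:00, 10:01
df = pd.DataFrame({
    "Datetime": pd.date_range("2024-12-03 09:58", periods=4, freq="min"),
    "Open":  [10.0, 10.2, 10.1, 10.4],
    "High":  [10.3, 10.5, 10.2, 10.6],
    "Low":   [9.9, 10.1, 9.8, 10.3],
    "Close": [10.2, 10.1, 10.4, 10.5],
})

df["30min_batch"] = df["Datetime"].dt.floor("30min")
g = df.groupby("30min_batch")

df["30min_open"] = g["Open"].transform("first")  # constant within a batch
df["30min_high"] = g["High"].cummax()            # resets at each new batch
df["30min_low"] = g["Low"].cummin()              # resets at each new batch
df["30min_close"] = df["Close"]                  # latest close seen so far

print(df[["Datetime", "30min_open", "30min_high", "30min_low", "30min_close"]])
```

Only 30min_open repeats inside a batch; the high, low, and close columns change minute by minute because they describe the bar as it is forming, not the finished bar.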

Align Resampled Data with 1-Minute Data

The resulting 30min_* columns are directly aligned with the original 1-minute data because .transform() and the cumulative groupby methods return results with the same index as the original dataframe, so no merge or re-join is needed.
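The other route you mentioned, building a separate 30-minute frame and joining it back, also works, and the repeated values are expected there. Below is a rough sketch of that approach on synthetic data (the ramp prices and the prev30_ prefix are placeholders I made up). One assumption I am making for the machine learning use case: each 1-minute row should only see the last completed 30-minute bar, so the bars are shifted by one before they are broadcast back. Without that shift, the 09:30 row would already carry the high and close of a bar that does not finish until 09:59.

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute data: 90 bars on a simple ramp (not real prices).
idx = pd.date_range("2024-12-03 09:30", periods=90, freq="min", name="Datetime")
price = pd.Series(np.arange(90, dtype=float), index=idx)
df = pd.DataFrame({
    "Open": price,
    "High": price + 0.5,
    "Low": price - 0.5,
    "Close": price + 0.25,
})

# One row per completed 30-minute bar (90 rows -> 3 rows).
bars_30 = df.resample("30min").agg(
    {"Open": "first", "High": "max", "Low": "min", "Close": "last"}
)

# Shift by one bar so a row never sees values from its own unfinished bar.
prev_30 = bars_30.shift(1).add_prefix("prev30_")

# Broadcast back to 1-minute rows: each row takes the latest bar label
# at or before its own timestamp, so values repeat 30 times per bar.
aligned = df.join(prev_30.reindex(df.index, method="ffill"))

# Column-to-column calculations now line up row for row.
aligned["close_vs_prev30_close"] = aligned["Close"] - aligned["prev30_Close"]

print(len(df), len(bars_30), len(aligned))
# 90 3 90
```

The first 30 rows are NaN because no 30-minute bar has completed yet. The same pattern should carry over to 60min and 240min by changing the resample frequency and the prefix.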