Hello there:
I wrote some Python code to evaluate the Hurst exponent over 9 periods. The code reads an Excel sheet into a pandas DataFrame and evaluates the Hurst exponent for the column of lows and the column of highs (using only 9 periods of data), but I get this error:
```
  File "C:\Users\Ghery\anaconda3\lib\site-packages\pandas\core\indexing.py", line 761, in _validate_key_length
    raise IndexingError("Too many indexers")
IndexingError: Too many indexers
```
The code I used was this:
```python
from scipy import stats
import pandas as pd
import numpy as np

def Hurst9(df):
    # calculate returns and eliminate the first row
    df = df.pct_change()
    df = df.iloc[1:]
    # split the dataframe in 2 dataframes: the first 4 rows and the last 4 rows
    df1 = df.iloc[:4,:]
    df2 = df.iloc[4:,:]
    # split each of those in 2 more dataframes of 2 rows
    df1_1 = df1.iloc[:2,:]
    df1_2 = df1.iloc[2:,:]
    df2_1 = df2.iloc[:2,:]
    df2_2 = df2.iloc[2:,:]
    # calculate the standard deviation of every dataframe
    stdev = df.std()
    stdev1 = df1.std()
    stdev2 = df2.std()
    stdev1_1 = df1_1.std()
    stdev1_2 = df1_2.std()
    stdev2_1 = df2_1.std()
    stdev2_2 = df2_2.std()
    # Subtract the mean from every column
    df = df.add(-df.mean())
    df1 = df1.add(-df1.mean())
    df2 = df2.add(-df2.mean())
    df1_1 = df1_1.add(-df1_1.mean())
    df1_2 = df1_2.add(-df1_2.mean())
    df2_1 = df2_1.add(-df2_1.mean())
    df2_2 = df2_2.add(-df2_2.mean())
    # Calculate the cumulative sum of every dataframe
    df = df.cumsum()
    df1 = df1.cumsum()
    df2 = df2.cumsum()
    df1_1 = df1_1.cumsum()
    df1_2 = df1_2.cumsum()
    df2_1 = df2_1.cumsum()
    df2_2 = df2_2.cumsum()
    # Calculate the range for each column
    r = df.max() - df.min()
    r1 = df1.max() - df1.min()
    r2 = df2.max() - df2.min()
    r1_1 = df1_1.max() - df1_1.min()
    r1_2 = df1_2.max() - df1_2.min()
    r2_1 = df2_1.max() - df2_1.min()
    r2_2 = df2_2.max() - df2_2.min()
    # Calculate the rescaled range
    rs = r / stdev
    rs1 = r1 / stdev1
    rs2 = r2 / stdev2
    rs1_1 = r1_1 / stdev1_1
    rs1_2 = r1_2 / stdev1_2
    rs2_1 = r2_1 / stdev2_1
    rs2_2 = r2_2 / stdev2_2
    # Calculate the average rescaled range for each chunk size
    ave_RS = float(rs)
    ave_RS_1 = float(0.5*(rs1 + rs2))
    ave_RS_2 = float(0.25*(rs1_1 + rs1_2 + rs2_1 + rs2_2))
    # Take the natural logarithm of each chunk size and each average rescaled range
    x = np.log(np.array([8,4,2]))
    y = np.log(np.array([ave_RS,ave_RS_1,ave_RS_2]))
    slope, intercept, r_value, p_value, std_err = stats.linregress(x,y)
    return slope

data = pd.read_excel('file.xlsx')
Hurst_low = Hurst9(data['low'])
Hurst_high = Hurst9(data['high'])
print('Hurst low is', Hurst_low)
print('Hurst high is', Hurst_high)
```
Can anyone tell me how to solve this? What's wrong with my code?
Hi Ghery,
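This error is raised when you pass more indexers than the object has axes. `data['low']` is a one-dimensional Series, so `df.iloc[:4,:]` inside `Hurst9` asks for a row position and a column position, and a Series has no column axis.
For more details, you can also refer to the following thread: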
https://stackoverflow.com/questions/30781037/too-many-indexers-with-dataframe-loc
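A minimal reproduction of the error, using made-up prices in place of `data['low']` (the values are illustrative only). The same `(row, column)` slice fails on the Series but works on a one-column DataFrame. Because the module that defines `IndexingError` has moved between pandas versions, the sketch just prints the exception's class name instead of importing it.

```python
import pandas as pd

# Made-up prices standing in for the 'low' column read from the Excel file
low = pd.Series([10.0, 10.5, 10.2, 10.8, 11.0, 10.7, 10.9, 11.3, 11.1], name="low")

try:
    # Two indexers (rows, columns) on a one-dimensional Series
    low.iloc[:4, :]
except Exception as exc:
    print(type(exc).__name__, "-", exc)  # IndexingError - Too many indexers

# A row-only indexer works on a Series
print(low.iloc[:4].tolist())  # [10.0, 10.5, 10.2, 10.8]

# A DataFrame has two axes, so the (row, column) form is valid there
print(low.to_frame().iloc[:4, :].shape)  # (4, 1)
```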
Regards,
Akshay
OK, thanks, but I reviewed the link and I am still not sure how to fix it.
Hi Ghery,
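One possible solution is to split the rows with a row-only indexer, `df.iloc[:X]`. Inside `Hurst9`, `df` is the Series `data['low']` (or `data['high']`), which has no column axis for the `, :` part. The same change applies to the inner splits (`df1.iloc[:2]` and so on).
For example:
```python
df1 = df.iloc[:4]
df2 = df.iloc[4:]
```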
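Applied to the whole function from the question, a sketch could look like the following. Every `iloc[..., :]` becomes a row-only `iloc[...]`, and the repeated std / mean / cumsum / range steps are folded into a helper, `rescaled_range` (a name introduced here, not from the original code). The length check is an added assumption, taken from `x = np.log(np.array([8,4,2]))`, that the function should receive exactly 9 prices, i.e. 8 returns. With Series input the rescaled ranges are plain floats, so the `float(...)` wrappers can be dropped.

```python
import numpy as np
import pandas as pd
from scipy import stats


def rescaled_range(chunk):
    """R/S of one chunk of returns: range of the mean-adjusted cumulative sum over the std."""
    deviations = (chunk - chunk.mean()).cumsum()
    return (deviations.max() - deviations.min()) / chunk.std()


def Hurst9(prices):
    """Hurst estimate from 9 prices (8 returns) split into chunks of 8, 4 and 2 returns."""
    # Returns; the first value of pct_change() is NaN, so drop it
    returns = prices.pct_change().iloc[1:]
    # Assumption: exactly 8 returns, to match the chunk sizes 8, 4 and 2 below
    if len(returns) != 8:
        raise ValueError("expected 9 prices (8 returns)")
    # Row-only indexing works on a Series
    df1, df2 = returns.iloc[:4], returns.iloc[4:]
    df1_1, df1_2 = df1.iloc[:2], df1.iloc[2:]
    df2_1, df2_2 = df2.iloc[:2], df2.iloc[2:]
    # Average rescaled range for each chunk size
    ave_RS = rescaled_range(returns)
    ave_RS_1 = 0.5 * (rescaled_range(df1) + rescaled_range(df2))
    ave_RS_2 = 0.25 * sum(rescaled_range(c) for c in (df1_1, df1_2, df2_1, df2_2))
    # Slope of log(R/S) against log(chunk size) is the Hurst estimate
    x = np.log([8, 4, 2])
    y = np.log([ave_RS, ave_RS_1, ave_RS_2])
    return stats.linregress(x, y).slope
```

It is called the same way as before, e.g. `Hurst9(data['low'])` and `Hurst9(data['high'])`. If the sheet has more than 9 rows, slice the 9 periods you want first, for instance `data['low'].iloc[-9:]` if the last nine are the ones you need.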
I hope this helps.
Thanks,
Akshay
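Alternatively, the original two-part `iloc[rows, :]` indexing is valid if the function receives a DataFrame instead of a Series, which also lets both columns be handled in one call. The sketch below keeps the same assumption of exactly 9 rows of prices. `hurst9_columns` is a hypothetical name, and the loop over chunk sizes is a generalization of the hand-written halves and quarters, not code from this thread.

```python
import numpy as np
import pandas as pd
from scipy import stats


def hurst9_columns(prices: pd.DataFrame) -> pd.Series:
    """Hurst estimate per column from 9 rows of prices, using chunks of 8, 4 and 2 returns."""
    returns = prices.pct_change().iloc[1:]
    # Assumption: exactly 8 returns, to match the chunk sizes below
    if len(returns) != 8:
        raise ValueError("expected 9 rows of prices (8 returns)")
    sizes = [8, 4, 2]
    avg_rs = []
    for size in sizes:
        chunk_rs = []
        for start in range(0, len(returns), size):
            # The (row, column) indexer is valid here because this is a DataFrame
            chunk = returns.iloc[start:start + size, :]
            dev = (chunk - chunk.mean()).cumsum()
            chunk_rs.append((dev.max() - dev.min()) / chunk.std())
        # Average the per-column R/S values over all chunks of this size
        avg_rs.append(pd.concat(chunk_rs, axis=1).mean(axis=1))
    # Rows: the price columns; columns: the chunk sizes 8, 4, 2
    log_rs = np.log(pd.concat(avg_rs, axis=1))
    x = np.log(sizes)
    # Regress log(R/S) on log(size) separately for each price column
    return log_rs.apply(lambda y: stats.linregress(x, y.to_numpy()).slope, axis=1)
```

For example, `hurst9_columns(data[['low', 'high']])` would return a Series with one Hurst estimate per column, indexed by `'low'` and `'high'`.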