BaggingRegressor

aytac_kolukisa_3nGcF · April 16, 2020, 3:11pm

Hi, I am trying to use decision trees for trading, and I am working with BaggingRegressor. I try to fit it like this:

bagging_reg.fit(x_train,y_train)

but it throws this error:

raise ValueError(msg_err.format(type_err, X.dtype))

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

My full code is below:

import pandas as pd

import numpy as np

#%%

# IMPORT DATA AND DROP UNNEEDED COLUMNS

data=pd.read_csv("AAPL.csv")

data.info()

data.drop(["Adj_Volume","Adj_Low","Adj_High","Adj_Open","Split","Dividend"],axis=1,inplace=True)

#%%

# DEFINE PREDICTOR VARIABLES AND A TARGET VARIABLE

#RETURNS

data["ret1"]=data.Adj_Close.pct_change()

data["ret5"]=data.ret1.rolling(5).sum()

data["ret10"]=data.ret1.rolling(10).sum()

data["ret20"]=data.ret1.rolling(20).sum()

data["ret40"]=data.ret1.rolling(40).sum()

# STANDARD DEVIATION

data["std5"]=data.ret1.rolling(5).std()

data["std10"]=data.ret1.rolling(10).std()

data["std20"]=data.ret1.rolling(20).std()

data["std40"]=data.ret1.rolling(40).std()

# The target variable is the future (next-period) return, so we use the shift() function.

data["retFut1"]=data.ret1.shift(-1)

#Drop nan

data.dropna()

predictor_list=["ret5","ret10","ret20","ret40","std5","std10","std20","std40","ret1","Volume"]

x=data[predictor_list]

y=data.retFut1

# SPLIT DATA (chronological 80/20 split)

train_length=int(len(data)*0.8)

x_train=x[:train_length]

x_test=x[train_length:]

y_train=y[:train_length]

y_test=y[train_length:]

# CREATE REGRESSION MODEL

# Base estimator: BaggingRegressor fits one decision tree regressor on each random subset of the data

from sklearn.tree import DecisionTreeRegressor

# Import the BaggingRegressor

from sklearn.ensemble import BaggingRegressor

# Fix the random seed so the randomly drawn subsets are reproducible

seed=42

# Create our BaggingRegressor model

# (the keyword was called base_estimator in scikit-learn versions before 1.2)
bagging_reg=BaggingRegressor(estimator=DecisionTreeRegressor(min_samples_leaf=400),
    n_estimators=10,
    random_state=seed)

bagging_reg

#Fit

bagging_reg.fit(x_train,y_train)

How can I handle this? Thanks.
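Because the error message mentions infinity as well as NaN, a small pre-fit check can show which columns are to blame. This is a minimal sketch on a made-up price series (not the AAPL data); the helper name `count_non_finite` is hypothetical. The zero price is there on purpose, since `pct_change()` turns a division by zero into `inf`, which `dropna()` does not remove.

```python
import numpy as np
import pandas as pd


def count_non_finite(frame: pd.DataFrame) -> pd.Series:
    """Count NaN, +inf and -inf entries in each column of a numeric DataFrame."""
    # np.isfinite is False for NaN, +inf and -inf
    bad = ~np.isfinite(frame.to_numpy(dtype=float))
    return pd.Series(bad.sum(axis=0), index=frame.columns)


# Made-up prices (not the AAPL data); the 0.0 makes pct_change() return inf
prices = pd.Series([10.0, 0.0, 12.0, 12.5, 13.0, 12.8])
demo = pd.DataFrame({"ret1": prices.pct_change()})
demo["retFut1"] = demo.ret1.shift(-1)

print(count_non_finite(demo))

# dropna() would keep the inf rows, so filter on np.isfinite instead
clean = demo[np.isfinite(demo.to_numpy(dtype=float)).all(axis=1)]
print(len(clean))  # 2 rows survive
```

Running the same `count_non_finite` check on `x_train` and `y_train.to_frame()` before calling `fit` would point at the offending columns.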

Ishan_Shah · April 16, 2020, 5:29pm

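It looks like there are NaN values in the data you are passing to the Bagging algorithm. The rolling windows leave the first 40 rows of ret40 and std40 empty, shift(-1) leaves the last row of retFut1 empty, and data.dropna() on its own only returns a new DataFrame without changing data. You can change the code below so the NaN values are actually removed. Thanks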

#Drop nan

data.dropna(inplace=True)
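A minimal sketch of why the original line had no effect, using 50 made-up prices in place of Adj_Close (not the AAPL data) and the same rolling(40) and shift(-1) steps as the question:

```python
import numpy as np
import pandas as pd

# Made-up closing prices standing in for Adj_Close (not the AAPL data)
data = pd.DataFrame({"Adj_Close": np.linspace(100.0, 120.0, 50)})
data["ret1"] = data.Adj_Close.pct_change()
data["ret40"] = data.ret1.rolling(40).sum()
data["retFut1"] = data.ret1.shift(-1)

# pct_change + rolling(40) leave the first 40 rows NaN; shift(-1) leaves the last row NaN
print(data.isna().any(axis=1).sum())  # 41

data.dropna()  # returns a new DataFrame; `data` itself is unchanged
print(data.isna().any(axis=1).sum())  # still 41

data.dropna(inplace=True)  # modifies `data` in place
print(len(data))  # 9
```

Reassigning with data = data.dropna() works just as well as inplace=True.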

aytac_kolukisa_3nGcF · April 17, 2020, 7:34am

Thanks for your response. I used your code and all the NaN values were removed. Then, when I ran the fit code bagging_reg.fit(x_train,y_train), it didn't throw any error. But after I ran this code to visualize the model:

from sklearn import tree

import graphviz

dot_data = tree.export_graphviz(bagging_reg,

out_file=None,

filled=True,

feature_names=predictor_list)

graphviz.Source(dot_data)

it throws:

raise NotFittedError(msg % {'name': type(estimator).__name__})

NotFittedError: This BaggingRegressor instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

I really don't understand why I get an error even though I wrote the code correctly.

Ishan_Shah · April 17, 2020, 7:46am

Could you please share your full code with data on quantra@quantinsti.com? Thanks

aytac_kolukisa_3nGcF · April 17, 2020, 12:56pm

OK, I will send it.

aytac_kolukisa_3nGcF · April 17, 2020, 9:37pm

Did you get my email?

Akshay_Nautiyal_2ycOh · April 20, 2020, 10:16am

Hello Aytac,

Your code looks fine; there is just one mistake at the end.

The bagging_reg object is not a single tree: the BaggingRegressor class builds an ensemble of decision trees (similar in spirit to a random forest).
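To make "ensemble" concrete: with its defaults, BaggingRegressor fits each decision tree on a bootstrap sample of the rows (using the columns recorded in estimators_features_) and predicts by averaging the trees. This minimal sketch uses synthetic data; X, y and the parameter values are made up for illustration. It checks that averaging the individual trees reproduces predict():

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, not the AAPL features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Estimator passed positionally, so this works whether the keyword is
# named base_estimator (older scikit-learn) or estimator (newer)
bag = BaggingRegressor(DecisionTreeRegressor(min_samples_leaf=20), n_estimators=10, random_state=42)
bag.fit(X, y)

# Each tree predicts on the columns it was trained on; the ensemble averages them
per_tree = np.array([
    est.predict(X[:, feats])
    for est, feats in zip(bag.estimators_, bag.estimators_features_)
])
manual = per_tree.mean(axis=0)
print(np.allclose(manual, bag.predict(X)))  # True
```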

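The sklearn function tree.export_graphviz can only export a single decision tree, so it cannot print an entire ensemble at once. When you pass it the whole BaggingRegressor, it does not find the fitted tree it expects, which is why it raises NotFittedError even though fit() ran without errors.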

So you will have to either drop the idea or print the individual decision trees in the ensemble. These trees are stored in the bagging_reg.estimators_ attribute, which is a list of all the fitted decision trees. You can print one of them like this:

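# Each tree may see the columns in a different order (stored in estimators_features_)
tree_features = [predictor_list[i] for i in bagging_reg.estimators_features_[0]]
dot_data = tree.export_graphviz(bagging_reg.estimators_[0],
    out_file=None,
    filled=True,
    feature_names=tree_features)
graphviz.Source(dot_data)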
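To look at every tree rather than only the first one, you can loop over estimators_ together with estimators_features_. This is a minimal sketch on synthetic data; x_train, y_train and the three column names are made up and only mimic the question's setup. It keeps the DOT sources as strings, so it does not need the graphviz package.

```python
import numpy as np
import pandas as pd
from sklearn import tree
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for x_train / y_train; the column names are illustrative
rng = np.random.default_rng(1)
feature_names = ["ret1", "ret5", "std5"]
x_train = pd.DataFrame(rng.normal(size=(300, 3)), columns=feature_names)
y_train = 0.5 * x_train["ret1"] + rng.normal(scale=0.1, size=300)

bagging_reg = BaggingRegressor(DecisionTreeRegressor(max_depth=2), n_estimators=5, random_state=42)
bagging_reg.fit(x_train, y_train)

dot_sources = []
for est, feats in zip(bagging_reg.estimators_, bagging_reg.estimators_features_):
    # Feature names in the column order this particular tree was trained on
    names = [feature_names[i] for i in feats]
    dot_sources.append(
        tree.export_graphviz(est, out_file=None, filled=True, feature_names=names)
    )

print(len(dot_sources))  # one DOT string per tree
# With the graphviz package installed: graphviz.Source(dot_sources[0])
```

A shallow max_depth keeps each exported tree readable; with min_samples_leaf=400 on real data the trees should also stay fairly small.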