Hi,
I am doing the Machine learning courses by Dr. Chan. I was following along with the first example of a classification tree model. I was able to run the notebook correctly (where they use the example of ACC stock), but when trying to do the same steps myself for a different stock, I get an error in the training/testing split.
My code-
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42, stratify= y)
print (X_train.shape, Y_train.shape)
print (X_test.shape, Y_test.shape)
Error-
NameError Traceback (most recent call last)
<ipython-input-32-d399d509db2c> in <module>()
1 from sklearn.model_selection import train_test_split
2
----> 3 X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42, stratify= y)
4
5 print (X_train.shape, Y_train.shape)
NameError: name 'y' is not defined
I definitely have scikit-learn installed. Not sure why I am getting this error. Can anyone help please?
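The NameError occurs because the call passes stratify=y (lower-case) while the target variable is defined as Y (upper-case). Python names are case-sensitive, so these are two different names. To resolve this error, please define the target variable in lower-case, 'y' instead of upper-case 'Y', and use that same name everywhere in the call.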
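A minimal sketch of the corrected call, using made-up stand-in data rather than the course's stock data (the random feature array and the alternating 0/1 target below are assumptions for illustration only):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 samples, 3 features, binary target.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = np.array([0, 1] * 50)

# The target is named y (lower-case) everywhere, including in stratify.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
```

Keeping the upper-case name and passing stratify=Y would work just as well; what matters is that the name passed to stratify matches a variable that actually exists.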
Thank you for this response. Now I get a new error. Please see the answer below, as I am unable to paste code properly in this prompt. Thanks for your help.
I still get an error, although a different one-
My code-
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
print (X_train.shape, y_train.shape)
print (X_test.shape, y_test.shape)
Error-
ValueError Traceback (most recent call last)
<ipython-input-41-496d06629092> in <module>()
1 from sklearn.model_selection import train_test_split
2
----> 3 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
4
5 print (X_train.shape, y_train.shape)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\model_selection\_split.py in train_test_split(*arrays, **options)
2029 test_size = 0.25
2030
-> 2031 arrays = indexable(*arrays)
2032
2033 if shuffle is False:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py in indexable(*iterables)
227 else:
228 result.append(np.array(X))
--> 229 check_consistent_length(*result)
230 return result
231
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
202 if len(uniques) > 1:
203 raise ValueError("Found input variables with inconsistent numbers of"
--> 204 " samples: %r" % [int(l) for l in lengths])
205
206
ValueError: Found input variables with inconsistent numbers of samples: [3, 129]
Please help!
Thanks
I actually managed to resolve this issue (I think). It seems I wasn't defining my X properly. I do have another issue with running the machine learning part, but I will ask about that in a separate question. Thank you!
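For anyone who hits the same ValueError: the lengths [3, 129] in the message mean that X was seen as having 3 samples while y had 129. One plausible cause (an assumption, since the thread does not show how X was built) is that three feature series were stacked as rows instead of columns. A small sketch with hypothetical feature names:

```python
import numpy as np

# Hypothetical data: three feature series of 129 observations each.
n_obs = 129
feature_a = np.arange(n_obs, dtype=float)
feature_b = feature_a ** 2
feature_c = np.sqrt(feature_a)
y = (feature_a % 2).astype(int)

# Stacking the series as rows gives shape (3, 129),
# which scikit-learn reads as 3 samples with 129 features each.
X_wrong = np.array([feature_a, feature_b, feature_c])

# Stacking them as columns gives shape (129, 3): one row per observation,
# so the first dimension matches len(y).
X = np.column_stack([feature_a, feature_b, feature_c])

print(X_wrong.shape, X.shape, y.shape)
```

Printing X.shape and y.shape before calling train_test_split is a quick way to confirm that both have the same number of rows.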