Question about classification tree course from Dr. Chan

Hi, 



I am taking the machine learning courses by Dr. Chan and was following along with the first classification tree example. I was able to run the course notebook (which uses ACC stock as the example) without problems, but when I try the same steps myself on a different stock, I get an error at the train/test split.



My code:

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42, stratify= y)

print (X_train.shape, Y_train.shape)
print (X_test.shape, Y_test.shape)



Error:

NameError                                 Traceback (most recent call last)
<ipython-input-32-d399d509db2c> in <module>()
      1 from sklearn.model_selection import train_test_split
      2 
----> 3 X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42, stratify= y)
      4 
      5 print (X_train.shape, Y_train.shape)

NameError: name 'y' is not defined

I definitely have scikit-learn installed, so I am not sure why I am getting this error. Can anyone help, please?

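Python names are case-sensitive: your target variable is defined as upper-case 'Y', but the stratify argument refers to lower-case 'y', which was never defined. To resolve this error, use the same name everywhere, for example by defining the target variable in lower case as 'y' instead of upper-case 'Y' (or by passing stratify=Y). The error is unrelated to your scikit-learn installation; the import on line 1 succeeded.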
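A minimal sketch of the fix, assuming a small synthetic feature matrix and label array (the data is made up for illustration and is not the course's ACC stock data). The only point it shows is that the target is referred to by one consistent lower-case name in every argument, including stratify:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples, 3 features, binary labels (0 = down, 1 = up)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0, 1] * 50)  # target defined once, in lower case

# Every reference uses the same lower-case name, including stratify=y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
```

With stratify=y, the train and test sets keep roughly the same proportion of each class as the full data.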

Thank you for this response. Now I get a new error; please see my reply below, as I am unable to paste code properly in this box. Thanks for your help.

I still get an error, although a different one.

My code:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

print (X_train.shape, y_train.shape)
print (X_test.shape, y_test.shape)



Error:

ValueError                                Traceback (most recent call last)
<ipython-input-41-496d06629092> in <module>()
      1 from sklearn.model_selection import train_test_split
      2 
----> 3 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
      4 
      5 print (X_train.shape, y_train.shape)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\model_selection\_split.py in train_test_split(*arrays, **options)
   2029         test_size = 0.25
   2030 
-> 2031     arrays = indexable(*arrays)
   2032 
   2033     if shuffle is False:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py in indexable(*iterables)
    227         else:
    228             result.append(np.array(X))
--> 229     check_consistent_length(*result)
    230     return result
    231 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
    202     if len(uniques) > 1:
    203         raise ValueError("Found input variables with inconsistent numbers of"
--> 204                          " samples: %r" % [int(l) for l in lengths])
    205 
    206 

ValueError: Found input variables with inconsistent numbers of samples: [3, 129]

Please help!
Thanks
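The two numbers in the message are the sample counts scikit-learn found for each input: X has 3 samples and y has 129. One plausible way to get this (an assumption, since the original definition of X isn't shown) is building X as a Python list of three feature columns, which scikit-learn reads as 3 samples rather than 3 features. The sketch below uses a hypothetical DataFrame `df` with made-up column names `f1`, `f2`, `f3` and `target` to reproduce the error and show one fix:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 129 rows, three feature columns and a binary target
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(129, 3)), columns=["f1", "f2", "f3"])
df["target"] = (df["f1"] > 0).astype(int)
y = df["target"]

# Wrong: a list of three Series is read as 3 samples, not 3 features
X_wrong = [df["f1"], df["f2"], df["f3"]]
try:
    train_test_split(X_wrong, y, test_size=0.3, random_state=42, stratify=y)
except ValueError as err:
    print(err)  # inconsistent numbers of samples: [3, 129]

# Right: select the columns as a 2-D DataFrame with one row per sample
X = df[["f1", "f2", "f3"]]
print(X.shape, y.shape)  # first dimensions must match before splitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
```

Printing X.shape and y.shape just before the split is a quick way to catch this kind of mismatch.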
 

I think I have managed to resolve this issue: it seems I wasn't defining my X properly. I have another issue with running the machine learning part, but I will ask that as a separate question. Thank you!
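For anyone hitting the same error, a common pattern is to compute both X and y from the same DataFrame and trim rows that can't be labelled before selecting them, so the two always have the same number of rows. The sketch below assumes a hypothetical price table with `Open`, `High`, `Low`, `Close` columns filled with random values; the features (`Open-Close`, `High-Low`) and the next-day up/down target are illustrative choices, not necessarily the ones used in the course notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical daily prices (random walk) standing in for a real stock
rng = np.random.default_rng(2)
n = 200
close = 100 + rng.normal(size=n).cumsum()
prices = pd.DataFrame({
    "Open": close + rng.normal(scale=0.5, size=n),
    "High": close + rng.uniform(0, 1, size=n),
    "Low": close - rng.uniform(0, 1, size=n),
    "Close": close,
})

# Illustrative features computed row by row from the same table
prices["Open-Close"] = prices["Open"] - prices["Close"]
prices["High-Low"] = prices["High"] - prices["Low"]

# Target: 1 if the next day's close is higher than today's, else 0
prices["Target"] = (prices["Close"].shift(-1) > prices["Close"]).astype(int)

# The last row has no next-day close (its comparison is False, not a real
# label), so drop it before selecting X and y
prices = prices.iloc[:-1]

X = prices[["Open-Close", "High-Low"]]  # 2-D: one row per day
y = prices["Target"]                    # 1-D: one label per day

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
```

Because X and y are sliced from the same trimmed DataFrame, their lengths cannot drift apart.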