AI SVM model

Hello!

I have been trying to get an AI algo working, closely following the tutorial.

Here's the relevant machine learning part: 

def test_svm(df):
    df['Open-Close'] = df['open'] - df['close']
    df['High-Low'] = df['high'] - df['low']
    X = df[['Open-Close', 'High-Low']]
    y = np.where(df['close'].shift(-1) > df['close'], 1, 0)
    split_percentage = 0.8
    split = int(split_percentage*len(df))

    # Train data set
    X_train = X[:split]
    y_train = y[:split]

    # Test data set
    X_test = X[split:]
    y_test = y[split:]
    cls = SVC().fit(X_train, y_train)
    accuracy_train = accuracy_score(y_train, cls.predict(X_train))
    accuracy_test = accuracy_score(y_test, cls.predict(X_test))
    return accuracy_train, accuracy_test, cls, X, y

Running this in Blueshift, I get this error:

line 29, in rebalance
The number of classes has to be greater than one; got 1 class

I searched Stack Exchange and Stack Overflow, but none of the answers there quite match my case. I also looked at the Quantra Jupyter notebook and found that the array contained both 0s and 1s when cls.predict() was run.



Could someone help me out here, please? What is the best way to fix this error?



Entire code: 

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# For data manipulation
import pandas as pd
import numpy as np

# Zipline
from zipline.api import(    symbol,
                            order_target_percent,
                            schedule_function,
                            date_rules,
                            time_rules,
                       )

def initialize(context):
    context.spy = symbol('SPY')
    schedule_function(rebalance, date_rules.week_start(), time_rules.market_open(hours=1))
def rebalance(context, data):
    df = data.history(context.spy, ['low','high','close','open','price'], 100, '1m')
    accuracy_train, accuracy_test,cls,X,y = test_svm(df)
    df['predicted_signal'] = cls.predict(X)
    if accuracy_test > 0.5:
        for val in df['predicted_signal'].values:
            if val == 1:
                if data.can_trade(context.spy):
                    order_target_percent(context.spy, 0.6)
            elif val == 0:
                if data.can_trade(context.spy):
                    order_target_percent(context.spy,0.0)
            else:
                pass
    else: 
        print('Null hypothesis. Model failure.')
def test_svm(df):
    df['Open-Close'] = df['open'] - df['close']
    df['High-Low'] = df['high'] - df['low']
    X = df[['Open-Close', 'High-Low']]
    y = np.where(df['close'].shift(-1) > df['close'], 1, 0)
    split_percentage = 0.8
    split = int(split_percentage*len(df))

    # Train data set
    X_train = X[:split]
    y_train = y[:split]

    # Test data set
    X_test = X[split:]
    y_test = y[split:]
    cls = SVC().fit(X_train, y_train)
    accuracy_train = accuracy_score(y_train, cls.predict(X_train))
    accuracy_test = accuracy_score(y_test, cls.predict(X_test))
    return accuracy_train, accuracy_test, cls, X, y

 

Hi Tim,



The issue comes from the line below:

y = np.where(df['close'].shift(-1) > df['close'], 1, 0)
You are using SPY as your security. However, at present SPY is used just as a benchmark, so the data fetch for it at minute frequency fails.

As a result, the condition df['close'].shift(-1) > df['close'] is never satisfied (if the fetch returns missing values, any comparison involving them evaluates to False), and the line above assigns all 0s to y. The target variable therefore contains only one label, which causes the error: y must have at least two labels.
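To see the effect in isolation, here is a minimal sketch. It assumes the failed fetch shows up as NaN close prices; that is an assumption for illustration, since the actual return value of the failed fetch is not shown in this thread. The DataFrame below is a made-up stand-in, not real market data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a failed data fetch: every close price is NaN.
# (Assumption: the platform returns missing values when the fetch fails.)
df = pd.DataFrame({"close": [np.nan] * 5})

# Same labelling rule as in the question. Any comparison with NaN is False,
# so np.where falls through to 0 for every row.
y = np.where(df["close"].shift(-1) > df["close"], 1, 0)

# Collect the distinct labels that ended up in the target.
labels = np.unique(y)
print(labels)  # only the label 0 is present
```

With a single distinct label in y, a classifier has nothing to separate, which matches the "got 1 class" message in the error.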

You should therefore switch to a different asset. Make sure that both the training data and the test data contain two labels.
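One way to check the label counts before fitting is sketched below. The helper name both_splits_have_two_labels is hypothetical (it is not part of zipline, Blueshift or scikit-learn), and it simply reuses the 80/20 split from your test_svm function.

```python
import numpy as np


def both_splits_have_two_labels(y, split_percentage=0.8):
    """Hypothetical helper: True only if the train and test parts of y
    each contain exactly two distinct labels."""
    split = int(split_percentage * len(y))
    y_train, y_test = y[:split], y[split:]
    return len(np.unique(y_train)) == 2 and len(np.unique(y_test)) == 2


# Made-up targets for illustration only.
all_zeros = np.zeros(10, dtype=int)                 # what a failed fetch produces
mixed = np.array([0, 1, 0, 1, 1, 0, 1, 0, 0, 1])    # both labels in each part

print(both_splits_have_two_labels(all_zeros))
print(both_splits_have_two_labels(mixed))
```

You could call a check like this on y inside test_svm, or in rebalance before calling SVC().fit, and skip trading for that period when it returns False instead of letting the fit raise.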

Hope this helps.

Thanks!