NLP in Trading - BERT implementation

Hi, 



I have an important question. I am trying to run BERT. I created the environment and downloaded the pre-trained BERT model, but when I try to run the following command, I get an error:



bert-serving-start -model_dir "C:/Users/sharma/Desktop/Saurabh/downloads/uncased_L-12_H-768_A-12" --num_worker=2



TypeError: 'NoneType' object is not iterable



I have tried different versions of the same command, but then I get different error messages.



I am stuck and not able to move forward.



I would really appreciate your reply on this problem.



Thanks & Regards

Saurabh Kamal



bert-serving-start: error: unrecognized arguments: --num_worker=2 



I am using a CPU machine, so I also tried -cpu, but it is still not working.



I am not able to move forward. I would really appreciate it if you could provide a solution to this.



Thanks & Regards

Saurabh Kamal

 

Hello Saurabh,



Could you please check the versions of Python and TensorFlow on your local system?



Please note that the BERT server only runs on Python 3.5 to Python 3.7.13 with TensorFlow 1.10 or above. Also, TensorFlow 2.0 or above may result in logging errors, so it is best to install a TensorFlow version from 1.10 to 1.15. Without this combination of Python and TensorFlow versions, you may encounter the TypeError about a non-iterable NoneType object.
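If helpful, you can check your current versions and set up a compatible environment roughly as follows (a sketch assuming conda is available; the environment name bert_env is just an example):

```shell
# Check the currently installed versions
python --version
python -c "import tensorflow as tf; print(tf.__version__)"

# Create a fresh environment with a compatible Python/TensorFlow combination
# ("bert_env" is an example name)
conda create -n bert_env python=3.7
conda activate bert_env
pip install "tensorflow>=1.10,<2.0" bert-serving-server bert-serving-client
```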



Please ensure that the mentioned versions are installed on your local system and then try starting the BERT server again. Please let us know if this resolves the issue.

Hi Varun,



Thanks for your reply.



I set the environment again. 



The only problem I found was in the bert-serving-start command, which didn't match the one provided in the environment set-up document. The difference was the inverted commas (quotation marks) around the model path.



bert-serving-start -model_dir "C:\Users\shefali sharma\Desktop\Saurabh\downloads\uncased_L-12_H-768_A-12" -num_worker=2 -cpu



However, after seeing the "ready and listening" message, my system hung at the Anaconda Prompt. Is there any way I can try this on Google Colab, i.e. create an environment there and implement BERT?



Thanks & Regards

Saurabh Kamal

Hello Saurabh, 



Yes, you can implement the same in Google Colab using the code below. This is an example implementation of the word embedding explained in Section 13, Unit 5 of the course.



The code in the course uses the bert_serving library for BERT embeddings. This library starts a BERT server that handles the computation of BERT embeddings.



The following code for Google Colab uses the transformers library, which is a more modern and widely used library for working with transformer models like BERT. It directly utilises the BERT model available within the transformers library without the need for a separate server.

!pip install transformers
import torch
import numpy as np
from transformers import BertTokenizer, BertModel

def bert_encode(sentences, model_name='bert-base-uncased'):
    # Load the pre-trained BERT model and its tokenizer
    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertModel.from_pretrained(model_name)
    model.eval()  # inference mode: disables dropout

    # Tokenize and encode the sentences (padded/truncated to a common length)
    inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)

    # Pass the tokens through the BERT model without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)

    # Extract the [CLS] token embedding of each sentence as a NumPy array
    cls_embeddings = outputs.last_hidden_state[:, 0, :].numpy()
    return cls_embeddings

# Example sentences
news_headlines = [
    "Amazon's Getting Aggressive This Holiday Season",
    "Morgan Stanley Analysts Give Microsoft (NASDAQ:MSFT) a $155.00 Price Target"
]

# Encode the sentences; the result has shape (2, 768) for bert-base
embeddings = bert_encode(news_headlines)
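Once you have the [CLS] embeddings, a common next step (for example, before feeding them into a trading signal or classifier) is to compare headlines by cosine similarity. Here is a minimal sketch using plain NumPy; the short 4-dimensional vectors are hypothetical stand-ins for the 768-dimensional BERT embeddings returned by bert_encode:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two 1-D embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-dimensional embeddings standing in for 768-dim BERT [CLS] vectors
emb_a = np.array([0.2, 0.1, 0.9, 0.3])
emb_b = np.array([0.25, 0.05, 0.85, 0.4])

# Similar headlines produce a similarity close to 1.0
print(cosine_similarity(emb_a, emb_b))
```

In practice you would call cosine_similarity on rows of the embeddings array, e.g. cosine_similarity(embeddings[0], embeddings[1]).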

I hope this helps! Feel free to comment in case of any queries on this.