Importing data from Yahoo Finance shows an extra Ticker row below the column headings

Amish_Shah · December 13, 2024, 6:45am

When I download data from Yahoo Finance in a Jupyter notebook, an extra Ticker row appears below the OHLCV column headings, so the extra row also ends up in the file when I convert the data to CSV. How do I delete the Ticker row?

Ajay_Pawar · December 13, 2024, 12:37pm

Problem

When yfinance downloads data for multiple tickers, the resulting DataFrame has MultiIndex (two-level) columns. With group_by='ticker', as in the code below, the levels are:

Level 0: “Ticker” (e.g., RELIANCE.NS, TCS.NS).
Level 1: Column headings (Open, High, Low, Close, Volume).

This structure makes it challenging to analyze or export the data in a flat format.
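
To see the structure without hitting the network, here is a minimal sketch that builds a small DataFrame by hand with the same two-level column layout. The prices and dates are made-up placeholders, not real market data, and the level names Ticker and Price are assumptions. Writing it to CSV shows how each column level becomes its own header row.

```python
import pandas as pd

# Hand-built stand-in for a yfinance result; all values are made-up placeholders.
# The level names ("Ticker", "Price") are assumptions for illustration.
columns = pd.MultiIndex.from_product(
    [["RELIANCE.NS", "TCS.NS"], ["Open", "Close"]],
    names=["Ticker", "Price"],
)
sample = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
    index=pd.to_datetime(["2024-12-12", "2024-12-13"]).rename("Date"),
    columns=columns,
)

print(sample.columns.nlevels)  # number of header levels
print(sample.columns.names)    # names of those levels

# to_csv() with no path returns the CSV text; each column level becomes a header row.
csv_text = sample.to_csv()
header_rows = csv_text.splitlines()[:2]
print(header_rows)
```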

Solution

The following steps address the issue:

Fetch Data: Use yfinance to download the stock data.
Melt Multi-Index Data: Use stack(level=0) to move the ticker level from the columns into the rows, giving a long-format table.
Rename Columns: After reset_index(), rename level_1 to Ticker if the ticker level came through unnamed.
Export Clean Data: Save the cleaned dataset to a CSV file for further analysis.

Code

import yfinance as yf
import pandas as pd


tickers = ['RELIANCE.NS', 'TCS.NS', 'INFY.NS', 'HDFCBANK.NS', 'ICICIBANK.NS']

# Define the time period
period = "5d"

# Fetch data for all tickers in a single call
# The 'group_by="ticker"' parameter organizes the data by ticker symbols,
# making it easier to process multiple stocks in one DataFrame.
combined_data = yf.download(tickers, period=period, group_by='ticker')

# Print the columns to verify the structure of the DataFrame
# This helps ensure that the DataFrame has the expected MultiIndex columns.
print(combined_data.columns)

# Convert MultiIndex columns into a long format using stack
# Level 0 in the MultiIndex represents the tickers (e.g., 'TCS.NS', 'RELIANCE.NS').
# The stack(level=0) function moves these tickers into rows, creating a long-format DataFrame.
melted_data = combined_data.stack(level=0)
print(melted_data.head())

# Reset the index to flatten the DataFrame
melted_data = melted_data.reset_index()
print(melted_data.head())

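# Rename 'level_1' to 'Ticker' for clarity. This applies when the ticker level
# is unnamed; if yfinance already named it 'Ticker', the rename is a no-op.
melted_data = melted_data.rename(columns={'level_1': 'Ticker'})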

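# Display the cleaned and processed DataFrame
print(melted_data.head())

# Export the flat table to CSV (the file name here is only an example)
melted_data.to_csv('stock_data_long.csv', index=False)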
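
If a wide table with one row per date is preferred over the long format, one alternative (not part of the answer above) is to join the two header levels into a single label per column. This is a minimal sketch on hand-built placeholder data; the underscore separator and the variable names are arbitrary choices.

```python
import pandas as pd

# Hand-built stand-in for a multi-ticker download; values are placeholders.
columns = pd.MultiIndex.from_product(
    [["RELIANCE.NS", "TCS.NS"], ["Open", "Close"]]
)
wide = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]], columns=columns)

# Each MultiIndex column label is a tuple such as ("TCS.NS", "Open");
# join its parts into one string so the header fits on a single row.
flat = wide.copy()
flat.columns = ["_".join(label) for label in wide.columns]
print(list(flat.columns))
```

The resulting frame has ordinary single-level columns, so to_csv() writes one header row.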

Amish_Shah · December 13, 2024, 1:15pm

The issue occurs when I use yfinance to download data for a single ticker.

For example, after downloading Nifty 50 data, the header looks like this:

Date    Adj Close  Open   High   Low    Close  Volume
Ticker  ^NSEI      ^NSEI  ^NSEI  ^NSEI  ^NSEI  ^NSEI

Why is this happening, and how do I remove the second row?

Ajay_Pawar · December 13, 2024, 1:39pm

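Try dropping the ticker level from the columns. In the output you posted, the ticker is the second header row, i.e. column level 1 (level 0 holds Open, High, Low, etc.):

df.columns = df.columns.droplevel(1)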
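
Which position the ticker level occupies can depend on the group_by setting and the yfinance version, so dropping the level by name may be more robust than dropping it by position. The sketch below assumes the level is named Ticker, as the header row in the posted output suggests. It uses a hand-built frame with placeholder values in place of a real download, and the Price level name is an assumption.

```python
import pandas as pd

# Hand-built stand-in for a single-ticker download; values are placeholders.
# The level name "Price" is an assumption; "Ticker" matches the posted output.
columns = pd.MultiIndex.from_product(
    [["Open", "High", "Low", "Close", "Volume"], ["^NSEI"]],
    names=["Price", "Ticker"],
)
df = pd.DataFrame([[1.0, 2.0, 0.5, 1.5, 100.0]], columns=columns)

# Drop the ticker level by name only when the columns really are a MultiIndex,
# so the snippet is safe to re-run on an already flattened frame.
if isinstance(df.columns, pd.MultiIndex) and "Ticker" in df.columns.names:
    df.columns = df.columns.droplevel("Ticker")

print(list(df.columns))
```

Once the columns are single-level, exporting with to_csv() writes only one header row.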

Amish_Shah · December 14, 2024, 12:52pm

Yes, it is working now. Thank you.