Data mismatch between Fundamental data sources

Shreyas_Balakrishna · February 27, 2025, 7:13pm

Course Name: [Data

Hi,

1. I compared the fundamental data across two to three data sources and I see differences. How should we handle these cases?
2. Many features in the fundamental data have a lot of missing values (more than 10-20%). How should we handle this?

Regards
Shreyas Balakrishna

Ajay_Pawar · February 28, 2025, 10:42am

Hi Shreyas,

For each fundamental metric (e.g., revenue, EPS) for a single company and period:

If Ratio > Threshold, verify with official filings.
If Ratio ≤ Threshold, take the mean of available values as a reasonable estimate.
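
A minimal sketch of the rule above. The reply does not define Ratio, so this assumes it is the relative spread between sources, (max - min) / |mean|. The function name `reconcile` and the 5% threshold are hypothetical choices for illustration, not values from the course.

```python
from statistics import mean


def reconcile(values, threshold=0.05):
    """Return (estimate, needs_review) for one metric, company, and period.

    `values` holds the figure reported by each data source; None marks a
    source that has no value. The ratio definition is an assumption.
    """
    available = [v for v in values if v is not None]
    if not available:
        # Nothing to average: the value has to come from the official filing.
        return None, True

    avg = mean(available)
    if avg == 0:
        # Relative spread is undefined; flag unless every source agrees.
        return avg, max(available) != min(available)

    ratio = (max(available) - min(available)) / abs(avg)
    if ratio > threshold:
        # Sources disagree too much: verify with official filings.
        return None, True
    # Sources roughly agree: use the mean of the available values.
    return avg, False


print(reconcile([100.0, 101.0, 99.5]))   # sources agree within 5%
print(reconcile([100.0, 120.0, None]))   # sources disagree, flagged for review
```

Returning a review flag instead of raising keeps the check usable in a loop over many company, period, and metric combinations.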

Shreyas_Balakrishna · February 28, 2025, 10:59am

Ok thanks

Ajay_Pawar · February 28, 2025, 11:37am

Hi Shreyas,

For query 2:
If the fundamental data has many missing values, here is how to handle it:

Try to update missing values using official sources.
Perform exploratory data analysis (EDA) to look for patterns:
- Which companies have missing values?
- Are there specific periods when this happens?
- Are certain features consistently missing across companies?
If a feature has more than X% missing data, consider dropping it unless it’s crucial.
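
The EDA questions and the drop rule above could be sketched with pandas roughly as follows. The toy DataFrame, its column names, and the 50% cutoff standing in for X are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format panel: one row per company and period.
df = pd.DataFrame({
    "company": ["A", "A", "B", "B", "C", "C"],
    "period": ["2024Q1", "2024Q2"] * 3,
    "revenue": [10.0, 11.0, np.nan, 21.0, 30.0, 31.0],
    "eps": [1.0, np.nan, np.nan, np.nan, 3.0, np.nan],
    "roe": [0.10, 0.12, 0.08, 0.09, np.nan, 0.15],
})
features = ["revenue", "eps", "roe"]
is_missing = df[features].isna()

# Share of missing values per feature, per company, and per period.
missing_by_feature = is_missing.mean()
missing_by_company = is_missing.groupby(df["company"]).mean()
missing_by_period = is_missing.groupby(df["period"]).mean()

# Drop features whose missing share exceeds the cutoff (assumed X = 50%).
max_missing = 0.5
to_drop = missing_by_feature[missing_by_feature > max_missing].index.tolist()
reduced = df.drop(columns=to_drop)

print(missing_by_feature)
print(missing_by_company)
print(missing_by_period)
print("Dropped:", to_drop)
```

A feature that is crucial to the model can simply be removed from `to_drop` before the final step.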
Forward fill (ffill) is not ideal for this case.
Some models like XGBoost can handle missing values internally, so imputation may not always be necessary.
Create a custom score using the Z-score approach within the same period across the industry.
- Standardize key metrics (e.g., ROE, EBITDA margin) using Z-scores.
- Skip NA values and compute the average Z-score of the available metrics.
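
One way the Z-score approach above might look in pandas, as a sketch rather than a reference implementation. The column names (`industry`, `roe`, `ebitda_margin`) and the sample values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical cross-section: four companies, one period, one industry.
df = pd.DataFrame({
    "company": ["A", "B", "C", "D"],
    "period": ["2024Q4"] * 4,
    "industry": ["Tech"] * 4,
    "roe": [0.10, 0.20, 0.30, np.nan],
    "ebitda_margin": [0.15, np.nan, 0.25, 0.35],
})
metrics = ["roe", "ebitda_margin"]


def zscore(x):
    # pandas mean/std skip NaN by default, so missing values stay NaN
    # without distorting the group statistics. A group with a single
    # observation, or with zero spread, yields NaN or infinite values.
    return (x - x.mean()) / x.std()


# Standardize each metric within the same period and industry.
z = df.groupby(["period", "industry"])[metrics].transform(zscore)

# Average whichever Z-scores are available for each company.
df["score"] = z.mean(axis=1, skipna=True)
print(df[["company", "score"]])
```

A company with every metric missing still ends up with a NaN score, so those rows need separate handling.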
Consider using:

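from sklearn.experimental import enable_iterative_imputer  # noqa: F401, needed because IterativeImputer is experimental
from sklearn.impute import IterativeImputer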

When imputing, include additional features like:
- % change in stock price for the respective period (e.g., quarter).
- Industry averages for the same and other related metrics.
- Financial ratios to improve imputation accuracy.
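
A minimal sketch of imputing one metric with help from related features. The data here is synthetic and the column names (`price_change`, `industry_avg_roe`) are hypothetical stand-ins for the features listed above. Note that scikit-learn requires the `enable_iterative_imputer` import before `IterativeImputer` can be imported.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Synthetic data for illustration only: ROE loosely tied to the quarterly
# price change and to the industry average ROE.
rng = np.random.default_rng(0)
n = 200
price_change = rng.normal(0.0, 0.1, n)
industry_avg_roe = rng.normal(0.12, 0.02, n)
roe = industry_avg_roe + 0.3 * price_change + rng.normal(0.0, 0.005, n)

X = pd.DataFrame({
    "roe": roe,
    "price_change": price_change,
    "industry_avg_roe": industry_avg_roe,
})

# Knock out roughly 20% of the ROE values to mimic missing fundamentals.
mask = rng.random(n) < 0.2
X.loc[mask, "roe"] = np.nan

# Each feature with gaps is modelled from the other columns, iteratively.
imputer = IterativeImputer(max_iter=10, random_state=0)
X_imputed = pd.DataFrame(
    imputer.fit_transform(X), columns=X.columns, index=X.index
)

print(X_imputed.loc[mask, "roe"].head())
```

In a real pipeline the imputer would be fitted on training periods only and then applied to later periods, to avoid leaking future information into the imputed values.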

Also, the best approach depends on the model you’re building.

Shreyas_Balakrishna · February 28, 2025, 12:03pm

Sure. Thanks a lot!