Course Name: [Data
Hi,
- I compared the fundamental data across two or three data sources and I see differences. How should these cases be handled?
- Many features in the fundamental data have a lot of missing values (more than 10-20%). How should this be handled?
  - Does forward filling help?
Regards
Shreyas Balakrishna
Ajay_Pawar
(Ajay Pawar)
2
Hi Shreyas,
Approach to Resolve Discrepancies
A. Prioritize Official Sources (If Accuracy is Critical)
- Verify with SEC filings (10-K, 10-Q) or equivalent official reports.
- If discrepancies exist, trust official filings over third-party data vendors.
B. Ballpark Estimate (If Precision is Not Critical, e.g., for ML modeling or quick analysis)
For each fundamental metric (e.g., revenue, EPS) for a single company and period:
- Gather fundamental values from different sources (typically 3-4 values).
- Compute the max/min ratio: Ratio = max(values) / min(values).
- Set a threshold (e.g., 1.1 or 1.2 based on acceptable variance).
- If Ratio > Threshold, verify with official filings.
- If Ratio ≤ Threshold, take the mean of available values as a reasonable estimate.
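The ballpark check in B can be sketched in a few lines of Python. This is only an illustration: the function name `reconcile`, the sample figures, and the handling of zero or negative values (always sent for review, since a max/min ratio is not meaningful there) are my assumptions, not part of the steps above.

```python
from statistics import mean


def reconcile(values, threshold=1.1):
    """Return (estimate, needs_review) for one metric, one company, one period.

    values: figures reported by different sources; None marks a missing source.
    threshold: acceptable max/min ratio (e.g., 1.1 or 1.2).
    """
    present = [v for v in values if v is not None]
    if not present:
        return None, True  # nothing to compare, so check the official filing

    lo, hi = min(present), max(present)
    # Assumption: the ratio is only used when every value is strictly positive.
    # Zero or negative figures (e.g., a loss-making EPS) are flagged for review.
    if lo <= 0:
        return None, True

    ratio = hi / lo
    if ratio > threshold:
        return None, True  # sources disagree: verify with official filings
    return mean(present), False  # sources agree: use the mean as the estimate


# Hypothetical revenue figures (in millions) from three vendors.
print(reconcile([1020.0, 1000.0, 1010.0]))
print(reconcile([1000.0, 1300.0, 1010.0]))
```

The first call stays within the 1.1 threshold and returns the mean; the second exceeds it and is flagged for a manual check against the filing.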
Ajay_Pawar
(Ajay Pawar)
4
Hi Shreyas,
For query 2:
If fundamental data has many missing values, here’s how to handle it:
- Try to update missing values using official sources.
- Perform exploratory data analysis (EDA) to look for patterns:
  - Which companies have missing values?
  - Are there specific periods when this happens?
  - Are certain features consistently missing across companies?
- If a feature has more than X% missing data, consider dropping it unless it’s crucial.
- Forward fill (ffill) is not ideal for this case.
- Some models like XGBoost can handle missing values internally, so imputation may not always be necessary.
- Create a custom score using the Z-score approach within the same period across the industry.
  - Standardize key metrics (e.g., ROE, EBITDA margin) using Z-scores.
  - Skip NA values and compute the average Z-score of available metrics.
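- Consider using IterativeImputer. It is still experimental in scikit-learn, so it has to be enabled explicitly before the import:
  from sklearn.experimental import enable_iterative_imputer
  from sklearn.impute import IterativeImputer
  - When imputing, include additional features like:
    - % change in stock price for the respective period (e.g., quarter).
    - Industry averages for the same and other related metrics.
    - Financial ratios to improve imputation accuracy.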
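A minimal sketch of the imputation idea, assuming a small made-up table: the column names (`roe`, `ebitda_margin`, `price_pct_chg`, `industry_roe`) and all values are hypothetical and only illustrate how helper features sit next to the columns with gaps.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables the next import)
from sklearn.impute import IterativeImputer

# Hypothetical data: two fundamentals with gaps plus two helper features.
df = pd.DataFrame({
    "roe":           [0.12, np.nan, 0.15, 0.08, np.nan, 0.11],
    "ebitda_margin": [0.30, 0.25, np.nan, 0.18, 0.22, 0.27],
    "price_pct_chg": [0.05, -0.02, 0.08, -0.06, 0.01, 0.03],  # % change in stock price for the period
    "industry_roe":  [0.10, 0.10, 0.13, 0.09, 0.09, 0.12],    # industry average for the same metric
})

# Remember which cells were missing so they can be inspected or flagged later.
was_missing = df.isna()

# Each column with gaps is modelled from the other columns, round-robin.
imputer = IterativeImputer(max_iter=10, random_state=0)
imputed = pd.DataFrame(
    imputer.fit_transform(df), columns=df.columns, index=df.index
)

print(imputed.round(3))
```

Observed values are left unchanged; only the missing cells are filled. If the imputed data feeds a predictive model, it is usually safer to fit the imputer on the training period only and then call `transform` on later data, so that future information does not leak into the past.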
Also, the best approach depends on the model you’re building.
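The Z-score scoring idea mentioned above could look roughly like this in pandas. The tickers, industries, and numbers are made up, and the population standard deviation (`ddof=0`) is my choice for the sketch, not something the approach requires.

```python
import numpy as np
import pandas as pd

# Hypothetical cross-section for one period, with some missing metrics.
df = pd.DataFrame({
    "ticker":        ["A", "B", "C", "D", "E", "F"],
    "period":        ["2024Q1"] * 6,
    "industry":      ["Tech", "Tech", "Tech", "Banks", "Banks", "Banks"],
    "roe":           [0.10, 0.20, 0.30, 0.08, np.nan, 0.12],
    "ebitda_margin": [0.25, np.nan, 0.35, 0.40, 0.50, 0.60],
})
metrics = ["roe", "ebitda_margin"]


def zscore(s):
    # mean() and std() skip NaN by default; a NaN input stays NaN in the output.
    return (s - s.mean()) / s.std(ddof=0)


# Standardize each metric within the same period and industry.
z = df.groupby(["period", "industry"])[metrics].transform(zscore)

# Average the Z-scores that are available for each company, skipping NA.
df["score"] = z.mean(axis=1, skipna=True)

print(df[["ticker", "industry", "score"]])
```

A company with one metric missing still gets a score from the metric it does have. Two edge cases are worth checking on real data: a group where every company reports the same value (the standard deviation is zero, so the Z-score is undefined) and a company with every metric missing (the score stays NaN).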