Data & Feature Engineering for Trading
Skills Covered
Data Engineering
- Financial data cleaning
- Exploratory data analysis
- Data types nuances
- Data merging
- Survivorship & Look-ahead Bias
Feature Engineering
- Triple barrier method
- Dollar and volume bars
- Information bars
- Stationarity
- Fractional differentiation
Python
- Itertools
- Numpy
- Pandas
- Matplotlib
- Pickle

Course Features
- Community
Faculty Support on Community
- Interactive Coding Exercises
Interactive Coding Practice
- Get Certified
This course is a part of the Learning Track: Machine Learning & Deep Learning in Trading Beginners
These courses are specially curated to help you with end-to-end learning of the subject.
Prerequisites
There are no strict prerequisites, and anyone who is familiar with financial markets data can enroll in the course. Familiarity with basic machine learning principles, such as train and test datasets, is recommended.
After this course you’ll be able to
- Preprocess price data to resolve outliers, duplicate values, multiple stock classes, survivorship bias, and look-ahead bias issues.
- Work with sentiment data to identify structural breaks and aggregate categorical features.
- Examine fundamental data and resolve multiple data merging issues.
- Create features and target variables for machine learning models.
- Explain the various challenges associated with financial data.
Syllabus
- Introduction to the Course
In this introductory section, you will learn why data engineering and feature engineering matter, whether you trade personally or in an institutional setting. Preprocessing a financial dataset is essential to make it suitable for analysis. Extracting features from the dataset to feed into machine learning algorithms, and setting the target variable for a particular ML problem, can increase the predictive power of your algorithm.
- Challenges in Financial Data Engineering
Trading strategies often look great in backtesting but fail to live up to expectations in live trading. Incorrect financial data can produce inaccurate inferences, and data with unidentified flaws is useless for analysis. Learn the six most common challenges in financial datasets.
- Exploratory Data Analysis in Finance
Exploring the data builds familiarity with it. After exploring the data, you will be able to describe its contents and characteristics. It also helps you identify irregularities and anomalies and discover patterns and relationships in the data.
Lessons: Closer Look At the Data (2m 4s); Importance of EDA (2m); Python Pickle (2m); Adjusted Close Price (2m); How to Use Jupyter Notebook? (1m 54s); Working With Pickle File (5m); Examining the OHLCV Data (10m); Read a Pickle File (5m); Find Null Values (5m); Generate Descriptive Statistics (5m); Irregularities (2m 56s); Stock Classes (2m); Minimum Value of Adjusted Close (2m); Dataframe Profiling (2m); Test on Challenges in Data Engineering and Exploratory Analysis (14m)
- Survivorship Bias for Stock Data
We often backtest on the stock universe that survived until today and ignore the stocks that no longer exist. This introduces survivorship bias into the backtest. In this section, you will learn the concept of survivorship bias, why it is important to use survivorship bias-free data in backtesting, and how to deal with it. You will also learn to identify delisted stocks in the stock universe.
- Redundant Stocks Data
Learn to check for data redundancy. It is highly unlikely that two stocks or financial instruments will have the same prices across many dates. It can happen on a few dates by coincidence, but if it happens on many consecutive dates, something is probably wrong with the data.
Lessons: Dealing With Redundant Stocks (2m 43s); Effects of Redundant Data (2m); Steps to Find Redundant Data (2m); Handling Duplicate Stock Data (10m); Create Stock Pairs (5m); Compare the Stock Prices (5m); Calculate Number of Duplicates (5m); Reasons for Redundancy (2m)
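The redundancy check described above (create stock pairs, compare prices, count duplicates) could be sketched as follows. This is a minimal illustration and not the course notebook: the symbols, prices, the helper `count_identical_days`, and the 80% cut-off are all assumptions made for the example.

```python
from itertools import combinations

import pandas as pd

# Hypothetical adjusted close prices; symbols and values are made up.
prices = pd.DataFrame(
    {
        "AAA": [10.0, 10.5, 10.2, 10.8, 11.0],
        "BBB": [20.0, 19.5, 19.8, 20.1, 20.4],
        "AAA_DUP": [10.0, 10.5, 10.2, 10.8, 11.0],  # same series under another symbol
    },
    index=pd.date_range("2021-01-04", periods=5, freq="B"),
)


def count_identical_days(prices: pd.DataFrame) -> pd.Series:
    """Count, for every pair of columns, the dates on which the prices are equal."""
    counts = {
        (a, b): int((prices[a] == prices[b]).sum())
        for a, b in combinations(prices.columns, 2)
    }
    return pd.Series(counts).sort_values(ascending=False)


identical_days = count_identical_days(prices)

# Flag pairs that match on most dates; the 80% cut-off is an arbitrary assumption.
suspects = identical_days[identical_days >= 0.8 * len(prices)]
print(suspects)
```

A pair that shows up in `suspects` is only a candidate for redundancy. Whether to drop one of the two series still depends on why the duplication occurred.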
- Multiple Stock Classes: One or All?
A listed company can issue stock in multiple classes. These stock classes have different voting rights. Learn whether you should keep the data for all the stock classes or only one and, if only one, which stock class to keep and which to remove.
Lessons: Dealing With Multiple Stock Classes (2m 1s); One Stock Class (2m); Stock Class (2m); Retain All Classes (2m); Multiple Stock Classes (10m); Identify Stock Classes (5m); Unique Symbols (5m)
- Outliers: How to Identify and Deal With Them?
In this section, we talk about outliers. An outlier is a data point that is significantly different from other data points. It can be due to data quality issues or can be real. Learn how to identify and deal with outliers.
Lessons: Dealing With Outliers (2m); Outliers (2m); Dealing With Outliers (2m); Inflated Profits (2m); Dealing With Outliers (10m); Number of Trading Days With Zero Volume (5m); Sort Dataframe by Returns (5m); Test on Pre-Processing of Data (14m)
- News Data: Numerical Features
This section covers how news data can be sourced, within the notebook, via webhose.io.
Lessons: Overview of the News Data (1m 20s); Numerical Features (2m 34s); Relevance (2m); Novelty (2m); Combine Numerical Features (2m 15s); Combine Numerical Features (2m); Calculate Feature Score (2m); Aggregate News Items Daily (2m); Numerical Features (10m); Calculate Feature Score (5m); Calculate Trading Date for Each Headline (5m); Calculate Daily Feature Score (5m)
- News Data: Categorical Features
Lessons: Categorical Features (2m 9s); Categorical Features (2m); One-Hot Encoding (2m); Aggregating Categorical Attributes (2m 16s); Aggregate Categorical Features (2m); Issues With Mean Aggregation (2m); Limitations of One-Hot Encoding (2m); Aggregating Categorical Features (10m); One-Hot Encoding (5m); Aggregate Using Mean (5m); Recap (1m 44s)
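The lessons on one-hot encoding and mean aggregation can be illustrated with a small pandas sketch. The column names `trading_date` and `topic` and the sample rows are hypothetical, and the course notebook may structure the news data differently.

```python
import pandas as pd

# Hypothetical news items; the column names and categories are illustrative.
news = pd.DataFrame(
    {
        "trading_date": ["2021-01-04", "2021-01-04", "2021-01-04", "2021-01-05"],
        "topic": ["earnings", "earnings", "merger", "merger"],
    }
)

# One-hot encode the categorical column: one 0/1 column per category.
one_hot = pd.get_dummies(news["topic"], prefix="topic", dtype=float)

# Aggregate per trading date with the mean, giving the share of items per category.
daily = one_hot.groupby(news["trading_date"]).mean()
print(daily)
```

One caveat is visible even in this toy data: the mean discards how many items arrived on a day, so a day with a single merger headline and a day with a hundred of them produce the same row.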
- Structural Breaks in Financial Data
Sometimes there is an unexpected and prolonged change in the structure of the time-series data. This is called a structural break. Learn to identify structural breaks in the sentiment data and the possible ways to deal with them.
Lessons: Structural Breaks (2m 48s); Structural Breaks in Time Series Data (2m); Dealing With Structural Breaks (2m); Effects of Structural Breaks (2m); Test on News Data and Structural Breaks (16m)
- Fundamental Data: Merge Them Correctly
This section covers merging fundamental data from two popular data sources, Sharadar and Wall Street Horizon (WSH). Although these sources are not free, the notebook also explains what the data looks like and how to parse it.
Lessons: Precap of Fundamental Data (2m 58s); Sources of Fundamental Data (3m 18s); Sharadar Data (2m); Announcement and Filing Date (2m); Actual Vs Expected Earnings Date (2m); Sharadar Data (10m); Dimension Fields (2m); Why Dimension Fields? (2m); Wall Street Horizon Data (10m); Which Format? (2m); Examining the Data (2m 18s); Challenges in the Datasets (2m); Identify the Issues (2m); Multiple EPS Values (2m); Challenges in Merging Dataset (1m 5s); Common Tickers (2m); Investigate the Issues (2m); Test on Merging of Fundamental Data (16m)
- Look-ahead Bias: Deceptive Returns
Get introduced to scenarios where data from the future is used in backtesting and the issues this causes. Using future data leads to deceptive returns while testing. Learn ways to get around this ubiquitous bias.
Lessons: Futures Prerequisite (10m); Futures Contract (2m); Margin Requirements (2m); Settlement Price (2m); Roll Return (2m); Calculate the Roll Returns (2m); Look-ahead Bias in Futures (3m 9s); What is Look Ahead Bias? (2m); Good Results (2m); Futures' Mean (2m); Remove Bias (2m); Calendar Spread Strategy Prerequisite (10m); Calendar Spread (2m); Disadvantages of CS Strategy (2m); Look-ahead Bias in CS Strategy (2m); Problem With Two Instruments (2m); Solving the Problem (2m); Illiquid Futures (2m); Bid-Ask Time Quote (2m); Liquid Futures (2m)
- Types of Bars: Features Extraction
Market transaction data can be sampled in a variety of ways. For example, bars can be formed by time, by number of transactions, by volume, or by value traded. Some ways might be more useful than others. Get introduced to the criteria that can be used to sample the transaction data, and learn how these bars differ in their statistical properties.
Lessons: Tick and Time Bars (3m 20s); True for a Bar (2m); Difference Between Time and Tick Bar (2m); Limitation of Time Bar (2m); Creating Time Bars (10m); Resample Price Data (5m); Calculate Open Price of Time Bar (5m); Calculate Total Volume of Bars (5m); Creating Tick Bars (10m); Aggregate Price Data (5m); Aggregate Volume Data (5m); Volume Bars (2m); Limitations of Volume Bars (2m); Volume Bar (2m); High Value of Volume Bar (2m); Limitation of Tick Bar (2m); Creating Volume Bars (10m); Create New Group ID (5m); Dollar Bars (1m 48s); What Are Dollar Bars? (2m); Identical Bars (2m); Advantages of Dollar Bars (2m); Creating Dollar Bars (10m)
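Volume and dollar bars can be sketched by assigning each tick a group ID from a cumulative measure and aggregating per group. The tick data, the helper `make_bars`, and the thresholds below are assumptions for illustration. The sketch also simplifies by never splitting a large tick across two bars.

```python
import pandas as pd

# Hypothetical tick data; prices and sizes are made up.
ticks = pd.DataFrame(
    {
        "price": [100.0, 101.0, 102.0, 101.0, 103.0, 104.0],
        "volume": [50, 50, 30, 70, 100, 100],
    }
)


def make_bars(ticks: pd.DataFrame, measure: pd.Series, threshold: float) -> pd.DataFrame:
    """Group ticks into bars each time the cumulative `measure` fills another `threshold`."""
    # Using the cumulative total before each tick keeps a tick that exactly
    # completes a bar inside that bar.
    group_id = ((measure.cumsum() - measure) // threshold).astype(int).rename("bar_id")
    return ticks.groupby(group_id).agg(
        open=("price", "first"),
        high=("price", "max"),
        low=("price", "min"),
        close=("price", "last"),
        volume=("volume", "sum"),
    )


# Volume bars: a new bar roughly every 100 units traded.
volume_bars = make_bars(ticks, ticks["volume"], threshold=100)

# Dollar bars: a new bar roughly every 15,000 of value traded (price * volume).
dollar_bars = make_bars(ticks, ticks["price"] * ticks["volume"], threshold=15_000)

print(volume_bars)
print(dollar_bars)
```

The same helper covers both bar types because only the measure being accumulated changes: units traded for volume bars and traded value for dollar bars.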
- Information Bars: Market Order Imbalances
In this section, you will get introduced to some of the advanced ways of sampling transaction data based on market order imbalances. You will also learn about market imbalances and run bars, and how to implement them.
Lessons: Information Bars (2m 22s); Measure of Information (2m); Imbalance Bar (2m); Difference Between Run and Tick Bars (2m); Imbalance Bars (10m); Calculate Rolling Imbalance (5m); Additional Reading (10m); Types of Bars (12m)
- Data Labelling for Better Outcomes
Supervised machine learning algorithms need both an input and a label to learn the nuances of real data. In a financial time series, the input is generally a window of price data, whereas the ground truth, or labels, must be generated explicitly based on the position to be taken. Learn methods such as the fixed-time horizon and triple barrier methods for labelling your data.
Lessons: Fixed-Time Horizon (3m 28s); ML Paradigm Labelling (2m); Labelling Fixed Threshold (2m); The Fixed-Time Horizon Method (10m); Calculate Future Returns (5m); Labelling the Target Class (2m); Calculate Daily Returns (5m); Calculate Rolling Standard Deviation (5m); Triple Barrier Method (2m); Fixed-Horizon V/s Triple-Barrier (2m); Calculating Horizontal Bars (2m); Horizontal Bars and Volatility (2m); Finding the Target Class (2m); Vertical Bar (2m); The Triple Barrier Method (10m); Calculate Daily Returns (5m); Call Triple Barrier Method (5m)
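The difference between the two labelling methods can be sketched in a few lines of plain Python. The function names, the price path, and the fixed 2% barriers are assumptions for illustration. The lesson titles suggest the course sizes the horizontal barriers from rolling volatility rather than from a fixed percentage.

```python
def fixed_horizon_label(prices, start, horizon, threshold):
    """Label by the return over a fixed number of bars, compared with +/- threshold."""
    future_return = prices[start + horizon] / prices[start] - 1
    if future_return > threshold:
        return 1
    if future_return < -threshold:
        return -1
    return 0


def triple_barrier_label(prices, start, upper_pct, lower_pct, max_holding):
    """Return 1 if the upper barrier is hit first, -1 if the lower, 0 if time runs out."""
    entry = prices[start]
    upper = entry * (1 + upper_pct)  # upper horizontal barrier (profit-taking)
    lower = entry * (1 - lower_pct)  # lower horizontal barrier (stop-loss)
    end = min(start + max_holding, len(prices) - 1)  # vertical barrier
    for t in range(start + 1, end + 1):
        if prices[t] >= upper:
            return 1
        if prices[t] <= lower:
            return -1
    return 0


# Hypothetical price path: rallies, sells off, and ends close to where it began.
prices = [100.0, 103.0, 99.0, 97.0, 100.5]

fixed_label = fixed_horizon_label(prices, start=0, horizon=4, threshold=0.02)
barrier_label = triple_barrier_label(
    prices, start=0, upper_pct=0.02, lower_pct=0.02, max_holding=4
)
print(fixed_label, barrier_label)
```

On this path the fixed-horizon label looks only at the end point, while the triple barrier label depends on which barrier the path touches first.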
- Why Stationary Features?
The right input into a machine learning model can make all the difference. Learn about the need for stationary features. Understand the trade-off between preserving price-level information and achieving stationarity. Learn about fractional differentiation to create effective features.
Lessons: Dealing With Features Selection (3m 10s); Price Series (2m); Series Stationarity (2m); Inferential Analysis (5m); Adjusted Close Price (2m); Fractional Differentiation (10m); Concept of Fractional Differentiation (5m); Dilemma (5m); Calculating Binomial Distribution Weights (2m); Calculate the ADF Statistics (5m); Test on Data Labelling and Stationary Features (14m)
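Fractional differentiation applies binomial-series weights to lagged values, with a differencing order `d` that need not be an integer. The sketch below uses the standard recursion for those weights and a fixed-width window. The function names and the window length are assumptions, and the course notebook may truncate the weights differently.

```python
def frac_diff_weights(d, size):
    """Weights of (1 - B)^d: w_0 = 1 and w_k = -w_{k-1} * (d - k + 1) / k."""
    weights = [1.0]
    for k in range(1, size):
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights


def frac_diff(series, d, window):
    """Apply a fixed-width window of weights; the first window - 1 points are dropped."""
    weights = frac_diff_weights(d, window)
    return [
        sum(w * series[t - k] for k, w in enumerate(weights))
        for t in range(window - 1, len(series))
    ]


# With d = 1 the weights reduce to [1, -1], i.e. ordinary first differences.
print(frac_diff_weights(1.0, 3))
# With 0 < d < 1 the weights decay slowly, so older prices still contribute.
print(frac_diff_weights(0.5, 4))
print(frac_diff([1.0, 3.0, 6.0, 10.0], d=1.0, window=2))
```

A smaller `d` keeps more of the price-level memory but may leave the series non-stationary. In practice, a stationarity test such as the ADF test (for example `statsmodels.tsa.stattools.adfuller`) is typically run for several values of `d` to find a workable compromise.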
- Run Codes Locally on Your Machine
Learn to install the Python environment on your local machine.
Lessons: Uninterrupted Learning Journey with Quantra (2m); Python Installation Overview (2m 18s); Flow Diagram (10m); Install Anaconda on Windows (10m); Install Anaconda on Mac (10m); Know your Current Environment (2m); Troubleshooting Anaconda Installation Problems (10m); Creating a Python Environment (10m); Changing Environments (2m); Quantra Environment (2m); Troubleshooting Tips For Setting Up Environment (10m); How to Run Files in Downloadable Section? (10m); Troubleshooting For Running Files in Downloadable Section (10m)
- Summary
This section consists of a summary of the course along with the downloadable files, which include the data modules as well as the strategy notebooks.
Lessons: Summary (2m 50s); Downloadable Code (2m)
Why quantra®?
- More in Less Time
Gain more in less time
- Expert Faculty
Get taught by practitioners
- Self-paced
Learn at your own pace
- Data & Strategy Models
Get data & strategy models to practice on your own
FAQs
- When will I have access to the course content, including videos and strategies?
You will gain access to the entire course content including videos and strategies, as soon as you complete the payment and successfully enroll in the course.
- Will I get a certificate at the completion of the course?
Yes, you will be awarded a certificate from QuantInsti after successfully completing the online learning units.
- Are there any webinars, live or classroom sessions available in the course?
No, there are no live or classroom sessions in the course. You can ask your queries on community and get responses from fellow learners and faculty members.
- Is there any support available after I purchase the course?
Yes, you can ask your queries related to the course on the community: https://quantra.quantinsti.com/community
- What are the system requirements to do this course?
Fast-speed internet connection and a browser application are required for this course. For best experience, use Chrome.
- What are the admission criteria?
There are no admission criteria. We recommend going through the prerequisites section so that you know which skills are required and which you will gain, and can get the most from the course.
- Is there a refund available?
We respect your time, and hence, we offer concise but effective short-term courses created under professional guidance. We try to offer the most value within the shortest time. There are a few courses on Quantra which are free of cost. Please check the price of the course before enrolling in it. Once a purchase is made, we offer complete course content. For paid courses, we follow a 'no refund' policy.
- Is the course downloadable?
Some of the course material is downloadable, such as the Python notebooks with strategy code. We also guide you on how to use this code on your own system to practice further.
- Can the python strategies provided in the course be immediately used for trading?
We focus on teaching these quantitative and machine learning techniques and how learners can use them for developing their own strategies. You may or may not be able to directly use them in your own system. Please do note that we are not advising or offering any trading/investment services. The strategies are used for learning & understanding purposes and we don't take any responsibility for the performance or any profit or losses that using these techniques results in.
- I want to develop my own algorithmic trading strategy. Can I use a Quantra course notebook for the same?
The Quantra environment is a zero-installation solution that lets beginners start coding in Python. While learning, you won't have to download or install anything. However, if you later wish to apply what you have learnt on your own system, you can do that. All the notebooks in the Quantra portal are available for download at the end of each course, and they run on a local system just as they run in the portal. You can modify, tweak, or rework all such code files as needed. We encourage you to combine concepts from different learning tracks in your trading strategy to make it better suited to real-world scenarios.
- If I plug in the Quantra code to my trading system, am I sure to make money?
No. We provide guidance on how to create a strategy using different techniques and indicators, but no strategy is plug and play. A lot of effort is required to backtest any strategy, after which we fine-tune the strategy parameters and check the performance in paper trading before finally moving to live execution of trades.
- Do you need to have knowledge of coding in order to learn through Quantra courses?
You can learn with or without coding knowledge. If you would like to do the analysis in Excel, we suggest starting with the course on Statistical Arbitrage in Trading, where you can create and test your trading strategies using Excel.
Alternatively, you can take the course on Python for Trading, which will help you gain knowledge in Python, analysis, and financial markets.
- What does "lifetime access" mean?
Lifetime access means that once you enroll in the course, you will have unlimited access to all course materials, including videos, resources, readings, and other learning materials for as long as the course remains available online. There are no time limits or expiration dates on your access, allowing you to learn at your own pace and revisit the content whenever you need it, even after you've completed the course. It's important to note that "lifetime" refers to the lifetime of the course itself—if the platform or course is discontinued for any reason, we will inform you in advance. This will allow you enough time to download or access any course materials you need for future use.
- What is Data Engineering?
Data engineering is the set of pipelines and processes that produce production-ready, high-quality data for planning and for gaining insights. As the name suggests, the data is engineered, or modified, so that better insights can be drawn from it. Many data engineering methods specific to financial data are covered in various sections of this course.
- What is Survivorship Bias and why is it dangerous?
Survivorship bias is the error of looking only at entities that survived a situation while overlooking those that did not, and then building strategies and making decisions on that incomplete picture. It often arises when you study a group of listed stocks that did well over a period, ignore the ones that did not, and build strategies for future periods on that basis. Because the failed stocks are missing from the data, backtested returns are overstated.
This has been covered in section 6 in detail.
- What is alternative data in finance?
Alternative data in finance is any non-market dataset that can be used to enhance market prediction accuracy. For example, news data, which is neither price nor market data, is increasingly being used to predict the price movement of financial instruments. News data, a form of alternative data, is covered in sections 8 and 9 of this course.
- How to find outliers in data?
Technically, outliers are data points that lie abnormally far from the other points in the population or dataset. Depending on the algorithm used to create the model, they might affect its predictive power. There are a couple of ways to identify and remove them. One way is to calculate the z-score of a point to judge how likely it is to belong to the given population. Another way is to measure how far the point lies beyond the quartiles, in multiples of the inter-quartile range.
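Both approaches can be sketched with NumPy. The sample returns, the z-score cut-off of 2, and the 1.5 multiplier on the inter-quartile range are assumptions chosen for illustration, and suitable thresholds depend on the data.

```python
import numpy as np

# Hypothetical daily returns; the 0.45 entry is a planted outlier.
returns = np.array([0.01, -0.02, 0.015, 0.005, -0.01, 0.45, 0.0, -0.005])

# Z-score: distance from the mean in units of the sample standard deviation.
z_scores = (returns - returns.mean()) / returns.std(ddof=1)
z_outliers = np.abs(z_scores) > 2.0  # the cut-off of 2 is an assumption

# IQR rule: flag points more than 1.5 * IQR outside the first or third quartile.
q1, q3 = np.percentile(returns, [25, 75])
iqr = q3 - q1
iqr_outliers = (returns < q1 - 1.5 * iqr) | (returns > q3 + 1.5 * iqr)

print(np.flatnonzero(z_outliers))
print(np.flatnonzero(iqr_outliers))
```

Note that an extreme value inflates the mean and standard deviation used in its own z-score, which is one reason the quartile-based rule is often preferred for small samples.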
- How to get data for sentiment analysis?
Data for sentiment analysis can be procured from online vendors such as webhose. Similar data can also be scraped from social media sites such as Twitter, or collected on the fly.
This has been covered in sections 8 and 9 of this course.
- How to analyse fundamental data?
Fundamental analysis is a method of measuring an instrument's value by examining related economic and financial factors. These range from macroeconomic factors, such as the state of the economy and industry conditions, to microeconomic factors, such as the earnings of the company and its sector. Indicators such as the P/E ratio are used for this purpose.
This has been covered in section 7 of this course.
- What is the difference between technical and fundamental analysis?
Fundamental analysis examines the financial aspects of a business, such as its financial statements and financial ratios, along with economic and other factors affecting the business, to estimate the fair market value of its share or security. Technical analysis estimates the fair price of a share or security by examining past trends and changes in its price and other historical information.
This has been covered in various sections of this course.
- What is look-ahead bias?
Look-ahead bias occurs when a backtest or simulation uses data that would not have been available during the period being analysed. This often produces good-looking results that are misleading.
This is covered in section 12 of this course.
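A common source of look-ahead bias is a signal that is computed from a bar's own return and then applied to that same bar. The sketch below contrasts this with a signal lagged by one bar. The prices and the naive trading rule are hypothetical and are not taken from the course's futures example.

```python
import pandas as pd

# Hypothetical closing prices.
close = pd.Series([100.0, 102.0, 101.0, 105.0, 104.0, 108.0])
returns = close.pct_change()

# Biased: the signal for day t is built from day t's own return,
# which is only known after the close of day t.
biased_signal = (returns > 0).astype(int)
biased_pnl = (biased_signal * returns).sum()

# Unbiased: shift the signal by one bar so that day t only uses data up to day t-1.
lagged_signal = biased_signal.shift(1).fillna(0)
unbiased_pnl = (lagged_signal * returns).sum()

print(round(biased_pnl, 4), round(unbiased_pnl, 4))
```

The biased version captures every up day by construction, which is why its result looks so good. Once the signal is lagged, the same rule loses money on this sample.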