MLNeurons Artificial Intelligence 🤖, Machine Learning ⚙

Fintech
Algorithmic trading and machine learning †
Bird's eye view

Some basic definitions first. Machine learning (ML) learns rules or patterns from data to achieve a goal, such as minimizing a prediction error.

A factor is a quantifiable signal, attribute, or other variable that has historically correlated with future stock returns and is expected to remain correlated in the future. Alpha is the portfolio return in excess of the benchmark used for evaluation. Alpha factors extract signals from data to predict returns for a given investment universe over the trading horizon. Beta tells us how risky an asset is relative to the market by measuring the asset's sensitivity to market movements.

Electronic communication networks (ECNs) are automated alternative trading systems (ATS) that match buy and sell orders at specified prices, primarily for equities and currencies, and are registered as broker-dealers. Dark pools are another type of private ATS that lets institutional investors trade large orders without publishing pre-trade bids and offers; trade prices are published only some time after execution.

Algorithmic trading is the use of algorithms to automatically make trading decisions, submit orders, and manage those orders after submission. High-frequency trading (HFT) is a subset of algorithmic trading that uses sophisticated algorithms to make trading decisions in milliseconds or microseconds.
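To make the definitions of alpha and beta concrete, here is a minimal Python sketch. It assumes alpha is measured as the average per-period return in excess of the benchmark, and beta as the covariance of asset and market returns divided by the variance of market returns. The function names and the sample return series are made up for illustration.

```python
from statistics import mean


def beta(asset_returns, market_returns):
    """Sensitivity of an asset to the market: Cov(asset, market) / Var(market)."""
    if len(asset_returns) != len(market_returns) or len(asset_returns) < 2:
        raise ValueError("need two equally long return series with at least 2 points")
    mean_a = mean(asset_returns)
    mean_m = mean(market_returns)
    # The 1/(n-1) normalisation cancels between covariance and variance.
    cov = sum((a - mean_a) * (m - mean_m)
              for a, m in zip(asset_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    if var == 0:
        raise ValueError("market returns have zero variance")
    return cov / var


def alpha(portfolio_returns, benchmark_returns):
    """Average per-period portfolio return in excess of the benchmark."""
    if len(portfolio_returns) != len(benchmark_returns):
        raise ValueError("return series must have the same length")
    return mean([p - b for p, b in zip(portfolio_returns, benchmark_returns)])


# Hypothetical per-period returns, invented for this example.
market = [0.01, -0.02, 0.03, 0.00]
asset = [0.02, -0.04, 0.06, 0.00]   # moves twice as much as the market

asset_beta = beta(asset, market)
asset_alpha = alpha(asset, market)
```

In this toy series the asset moves exactly twice as far as the market in every period, so its beta comes out at about 2. Real return series are noisy, and practitioners often estimate alpha and beta jointly by regression rather than with the simple averages used here.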

A key driver of the surge in ML in the investment industry is the discovery and successful forecasting of risk factors that, individually or in combination with other risk factors, significantly affect future asset returns across asset classes.

Informational advantage

Hedge funds have long looked for alpha through informational advantage and the ability to uncover new uncorrelated signals. Historically, this included things such as proprietary surveys of shoppers, or of voters ahead of elections or referendums.

In contrast, the informational advantage from exploiting conventional and alternative data sources using ML does not depend on expert and industry networks or access to corporate management. It depends on the ability to collect large quantities of very diverse data and analyze them in real time.

Conventional data includes economic statistics, trading data, or corporate reports. Alternative data is much broader and includes sources such as satellite images, credit card sales, sentiment analysis, mobile geolocation data, and website scraping, as well as the conversion of data generated in the ordinary course of business into valuable intelligence. It includes, in principle, any data source containing (potential) trading signals.

For instance, data from an insurance company on the sales of new car insurance policies captures not only the volumes of new car sales but can be broken down into brands or geographies. Many vendors scrape websites for valuable data, ranging from app downloads and user reviews to airline and hotel bookings. Social media sites can also be scraped for hints on consumer views and trends.

Typically, the datasets are large and require storage, access, and analysis using scalable data solutions for parallel processing, such as Hadoop and Spark.

Real-time insights into a company's prospects, long before its results are released, can be gleaned from a decline in job listings on its website, the internal rating of its chief executive by employees on the recruitment site Glassdoor, or a dip in the average price of clothes on its website. Such information can be combined with satellite images of car parks and geolocation data from mobile phones that indicate how many people are visiting stores. Strategic moves, on the other hand, can be inferred from a jump in job postings for specific functional areas or in certain geographies.

Among the most valuable sources is data that directly reveals consumer expenditures, with credit card information as a primary source. This data offers only a partial view of sales trends, but it can offer vital insights when combined with other data.

More recently, several algorithmic trading firms have begun to offer investment platforms that provide access to data and a programming environment to crowdsource risk factors that become part of an investment strategy or entire trading algorithms.

ML-based trading strategy

An ML-based strategy is driven by data sources that contain predictive signals for the target universe and strategy. After suitable preprocessing and feature engineering, these signals permit an ML model to predict asset returns or other strategy inputs. The model predictions, in turn, translate into buy or sell orders based on human discretion or automated rules. Those rules may be manually encoded or learned by another ML algorithm in an end-to-end approach.
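The flow from features to predictions to orders can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a production design: the model is a one-feature least-squares fit, the order rule is a manually encoded threshold, and the names (fit_linear, predict, to_order), the threshold value, and the data are all hypothetical.

```python
from statistics import mean


def fit_linear(features, next_returns):
    """Least-squares fit of next_return ~ intercept + slope * feature."""
    if len(features) != len(next_returns) or len(features) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mx, my = mean(features), mean(next_returns)
    sxx = sum((x - mx) ** 2 for x in features)
    if sxx == 0:
        raise ValueError("feature has zero variance")
    slope = sum((x - mx) * (y - my)
                for x, y in zip(features, next_returns)) / sxx
    return my - slope * mx, slope


def predict(model, feature):
    """Predicted next-period return for a single feature value."""
    intercept, slope = model
    return intercept + slope * feature


def to_order(predicted_return, threshold=0.001):
    """Manually encoded rule: trade only when the prediction clears a threshold."""
    if predicted_return > threshold:
        return "BUY"
    if predicted_return < -threshold:
        return "SELL"
    return "HOLD"


# Hypothetical engineered feature values and the returns that followed them.
features = [-1.0, 0.0, 1.0, 2.0]
next_returns = [-0.01, 0.0, 0.01, 0.02]

model = fit_linear(features, next_returns)
orders = [to_order(predict(model, x)) for x in (1.5, -0.5, 0.05)]
```

With this made-up data, a strong positive feature value leads to a buy, a negative one to a sell, and a value close to zero falls inside the threshold band and produces no trade. In an end-to-end approach the threshold rule itself would be learned rather than hand-written.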

Strategy backtesting

Incorporating an investment idea into a real-life algorithmic strategy carries significant risk. Managing that risk requires extensive empirical tests that try to reject the idea based on its performance in alternative out-of-sample market scenarios. Testing may also involve simulated data to capture scenarios deemed possible but not reflected in historical data.
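One common way to obtain out-of-sample scenarios from historical data is a rolling walk-forward split, where the model is always fitted on past observations and evaluated on the period that follows. The text above does not prescribe this particular scheme; the sketch below is one possible approach, and the function name and window sizes are assumptions chosen for illustration.

```python
def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train, test) index ranges in which test always follows train in time."""
    if train_size <= 0 or test_size <= 0:
        raise ValueError("train_size and test_size must be positive")
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        # Slide the window forward by one test period.
        start += test_size


# Ten observations, fit on four, evaluate on the next two.
splits = list(walk_forward_splits(n_obs=10, train_size=4, test_size=2))
for train, test in splits:
    print("train", list(train), "test", list(test))
```

Because every test window lies strictly after its training window, the evaluation never uses information from the future. An idea that performs well only in some of the windows is a candidate for rejection.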

ML use cases for trading

ML can be used to identify patterns in data and extract insights from them; supervised learning can be used to generate risk factors or alphas and create trade ideas; algorithms can allocate assets based on risk profiles; and reinforcement learning can be used to optimize trading strategies.

Sudhir Shetty, Mar 09 2025.
† Significant portions adapted from Packt GitHub: Machine Learning for Algorithmic Trading (2nd Edition).