Beating Benchmark Indices with Machine Learning

In less than two weeks, on May 11, many business and technical decision makers will gather at 2ML: Madrid Machine Learning in Madrid, Spain, to discover the impact that ML is going to have on their businesses. The event will be divided into two parts: keynotes before lunch, and four vertical-oriented ML sessions after the lunch break, covering Finance, Telecom and Technology, Marketing, Sales, Sports, and Industry.

This guest post, written by Michael Allan Stafne, Co-Founder and Vice-President of Stats4Trade, and Jean-Marc Guillard, CEO at Stats4Trade, reveals some highlights of their presentation. If you are curious to hear the full story, join 2ML to attend the event and discover all the details of how S4T applies ML to help active investment fund managers select stocks and make buy/sell decisions using a new software-driven approach.

Imagine that you are a typical private investor. You wish to invest some of your money in stocks. However, you are risk-averse, so you want to minimize volatility and any potential losses while maximizing gains. What options are available to you?

Well, you could simply adopt a buy-and-hold approach and invest your money in a low-cost passive fund that tracks a broad index. Such funds are cheap because they simply mimic the indices' portfolios. Or you could invest your money in a traditional active fund that strives to beat a benchmark index. The managers of such funds use a combination of experience and research in an attempt to select winning stocks. Finally, you could adopt a "do-it-yourself" approach and trade stocks using your own expertise and research.

As with any investment approach, however, each of these options comes with trade-offs. For example, the passive approach with index funds assumes that markets rise over time; this is not always a viable assumption in Japanese and some European markets. The two active approaches, on the other hand, rely on humans. Unfortunately, humans (even seasoned fund managers) are frequently emotional, subjective and hence irrational, which often leads to poor investment decisions. For proof, just look at the rather dismal performance of actively managed funds versus their benchmarks. Moreover, active funds charge a wide variety of annual and one-time fees, which are oftentimes quite exorbitant.

But what if there was another option that combines the diversification and low-cost of passive funds with the index-beating performance of active funds – all the while allowing private investors the freedom to design and simulate their own data-driven investment strategies and ultimately trade on their own? Thanks to modern Machine Learning tools and low-cost cloud-computing platforms, such an approach now exists.

The advent of modern Machine Learning technologies and cloud-based platforms now allows the development of data-driven software applications that automatically generate buy/sell signals for a wide range of equity markets. The applications allow individual investors the opportunity to design and simulate various investment scenarios and ultimately choose an optimal strategy that fits their risk-return needs. Investors can then choose to actively trade with no-fee brokerages.

The key to any Machine Learning approach is to statistically increase the odds of making a winning stock trade from roughly chance level (about 50%) to a higher level (say, at least 60%). Of course, statistics implies uncertainty, and some trades will indeed incur losses. Nonetheless, over the long term and with enough trades to be statistically meaningful (at least thirty), this increase in odds yields better performance than benchmark indices with respect to both return and risk.
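To make the arithmetic behind this claim concrete, here is a minimal sketch. It assumes, purely for illustration, that every trade is independent and that each win or loss has the same fixed size; real trades satisfy neither assumption. The function names and the 2% gain/loss figures are hypothetical and do not come from Stats4Trade.

```python
from math import comb


def expected_return_per_trade(p_win, gain, loss):
    """Average return of one trade that wins `gain` with probability p_win
    and otherwise loses `loss` (both given as positive fractions)."""
    return p_win * gain - (1.0 - p_win) * loss


def prob_more_wins_than_losses(p_win, n_trades):
    """Binomial probability that strictly more than half of n_trades
    independent trades are winners."""
    min_wins = n_trades // 2 + 1
    return sum(
        comb(n_trades, k) * p_win ** k * (1.0 - p_win) ** (n_trades - k)
        for k in range(min_wins, n_trades + 1)
    )


if __name__ == "__main__":
    # Hypothetical symmetric trades: +2% on a win, -2% on a loss.
    for p in (0.5, 0.6):
        print(
            p,
            expected_return_per_trade(p, 0.02, 0.02),
            prob_more_wins_than_losses(p, 30),
        )
```

With symmetric wins and losses, a 50% hit rate has an expected return of zero per trade, while a 60% hit rate has a positive one. The binomial sum shows why a minimum number of trades matters: over only a handful of trades, a real edge can easily be masked by bad luck.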

None of this is magic, however. Developing trading applications that integrate machine-learning technologies requires considerable time, first to acquire and prepare the input data, and then to build, train and test data models that balance accuracy and consistency over different time-spans, both short and long.

The general process starts with data acquisition in the form of historical price-data for stocks in a particular market like the Dow Jones 30 in the United States or the CAC 40 in France. Today price-data for many markets over multiple decades is widely available electronically at low cost and updated daily.

These price-data then undergo a time-consuming transformation into time-implicit data-sets, which convert the raw price-data into forms more conducive to multi-frequency statistical analyses. In fact, the transformations are analogous to the Fourier transforms used in digital signal analysis. During this step a balance must be struck between precision and consistency in the transformed data to prevent overfitting.
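The post does not disclose the actual transformations, so the following is only a sketch of one common way to turn a raw price series into rows that no longer depend on the calendar date: trailing returns over several lookback windows. The window lengths (1, 5 and 20 days) and the function names are assumptions chosen for illustration.

```python
def trailing_return(prices, i, window):
    """Fractional price change over the `window` days ending at index i."""
    return prices[i] / prices[i - window] - 1.0


def make_features(prices, windows=(1, 5, 20)):
    """Build one feature row per day, starting at the first day for which
    the longest lookback window is fully available."""
    start = max(windows)
    return [
        [trailing_return(prices, i, w) for w in windows]
        for i in range(start, len(prices))
    ]


if __name__ == "__main__":
    # Synthetic prices growing 1% per day, used only to exercise the code.
    prices = [100.0 * 1.01 ** t for t in range(30)]
    for row in make_features(prices)[:3]:
        print(row)
```

Using several windows at once is a crude stand-in for the multi-frequency idea: short windows capture fast movements and long windows capture slow ones. Adding more windows increases precision on the training data but also the risk of overfitting.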

Next, statistical models are created using tools from recognized providers such as BigML. These models map the underlying and often obscure statistical relationships between the transformed price-data – called “training data” – and user-defined outputs such as stock-price movements over different forecast time-periods. In effect the models are trained to recognize the statistical relationships between prices at one period in time and price movements later in time.
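As an illustration of what a "user-defined output" might look like, the sketch below labels each day by whether the price is higher a fixed number of days later. The binary up/down target and the `horizon` parameter are assumptions made for this example; the post does not say which outputs Stats4Trade actually uses.

```python
def make_labels(prices, horizon):
    """Label day i with 1 if the price `horizon` days later is strictly
    higher than the price on day i, else 0.

    The last `horizon` days get no label because their future price
    is not yet known."""
    return [
        1 if prices[i + horizon] > prices[i] else 0
        for i in range(len(prices) - horizon)
    ]


if __name__ == "__main__":
    prices = [10.0, 10.5, 10.2, 10.8, 10.4]
    print(make_labels(prices, horizon=1))
    print(make_labels(prices, horizon=2))
```

Pairing each day's input features with a label like this gives the training rows from which a model can learn the relationship between prices at one point in time and price movements later. Different horizons give different forecast time-periods.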

Lastly, the models are tested against varying time-spans using "test data", that is, price-data that were not used to train the models. The goal during testing is to refine the models with respect to precision and consistency over any time period and to verify their validity. Only through robust testing can the models be optimized for the two metrics that most interest investors: return and risk (i.e., volatility, defined statistically as variance).
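The testing step can be sketched as follows, assuming a simple chronological split in which the oldest rows train the model and the newest rows test it, so that no future data leaks into training. The function names, the 75/25 split and the sample numbers are hypothetical, and population variance is used as the risk measure only because the post names variance.

```python
from statistics import mean, pvariance


def chronological_split(rows, train_fraction=0.75):
    """Split time-ordered rows into an older training part and a newer
    test part, without shuffling."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def hit_rate(predictions, actuals):
    """Fraction of predictions that match the actual outcomes."""
    matches = sum(1 for p, a in zip(predictions, actuals) if p == a)
    return matches / len(actuals)


def return_and_risk(trade_returns):
    """Mean return per trade and its population variance."""
    return mean(trade_returns), pvariance(trade_returns)


if __name__ == "__main__":
    train, test = chronological_split(list(range(8)))
    print(train, test)
    print(hit_rate([1, 0, 1, 1], [1, 1, 1, 0]))
    print(return_and_risk([0.02, -0.01, 0.02, -0.01]))
```

Repeating this evaluation over several different time-spans, rather than one, is what gives a sense of consistency: a model that scores well in only one period has probably been overfit to it.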

The foregoing description of how to create trading applications that embed Machine Learning technologies admittedly skims over several interesting technical details. However, the ultimate objective for investors remains the same: to increase the odds of making winning trades to at least 60%. Over time and with enough trades, the relative frequency of gains to losses begins to increase. In doing so, we are able to gradually and steadily outperform benchmark indices with higher returns and less risk as measured by variance.

Find out more details by watching this Stats4Trade – BigML Case Study:

Want to know more about ML in Finance?

Join us at #2ML17 on May 11 in Madrid, Spain, and discover the impact of Machine Learning in the financial sector presented by Stats4Trade. Other companies presenting their use cases will be ABN AMRO and Danske Bank, which we will also cover in upcoming blog posts.

If you have any questions about this use case and/or the Machine Learning behind it, please do not hesitate to contact BigML or STATS4TRADE directly. We're looking forward to hearing from you!