In this blog post, we revisit one of the core yet challenging problems that quantitative investment teams face: developing trading strategies by analyzing and forecasting financial time series. Whatever the strategy, algorithmic or fundamental, there are two intuitive goals we would like to achieve:

- Few to no missed trading opportunities;
- Few to no mistakes.

There is an inherent and delicate trade-off between these goals. Intuitively, we can achieve the first one by always trading and bearing the cost of the resulting mistakes; similarly, we can always achieve the second by avoiding trades altogether. Clearly, neither option is optimal, and this is where machine learning (ML) can help.

A successful ML model should thus identify the best spots for making a trade (trading opportunities), the market direction that determines whether to go long or short (side), and the magnitude of the bets (size). Intuitively speaking, side decisions and sizing follow different logic, so trying to capture both with a single ML model may be ill-advised.

Identifying trading opportunities and choosing a position's side require an understanding of markets, whether we follow a quant or a fundamental strategy, whereas deciding how much exposure to take on a trade (the size) is a matter of risk management. For instance, being right on small trades while making several mistakes on large ones may be detrimental. A one-size-fits-all model that aims to predict both the side and the size of a trade may thus fail to strike a balance between the two goals above. In ML terms, this often leads to high Precision (fewer mistakes) at the cost of low Recall (several missed trading opportunities).
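To make the trade-off concrete, here is a minimal sketch of how Precision and Recall could be computed for a stream of trade decisions. The function name `precision_recall` and the toy data are our own illustrative assumptions, not part of any particular library or strategy.

```python
def precision_recall(signals, outcomes):
    """Compute (precision, recall) for a list of trade decisions.

    signals[i]  -- True if we took a trade at point i (hypothetical data)
    outcomes[i] -- True if a trade at point i would have been profitable
    """
    pairs = list(zip(signals, outcomes))
    tp = sum(1 for s, o in pairs if s and o)        # trades that paid off
    fp = sum(1 for s, o in pairs if s and not o)    # mistakes
    fn = sum(1 for s, o in pairs if not s and o)    # missed opportunities
    # By convention we return 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy example: 8 decision points, 5 of which were real opportunities.
outcomes = [True, False, True, True, False, True, False, True]
always_trade = [True] * 8               # never misses, many mistakes
cautious = [True] + [False] * 7         # no mistakes, misses almost everything

print(precision_recall(always_trade, outcomes))
print(precision_recall(cautious, outcomes))
```

On this toy data, always trading gives perfect Recall but mediocre Precision, while the overly cautious trader gets perfect Precision and very low Recall, which mirrors the two extremes described above.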

## Artificial Intelligence benefits from the investor’s intuition

The solution is thus to start off by developing a **base primary strategy**, whether quant or fundamental. Without loss of generality, we call this base primary strategy the **primary model**. The goal of the primary model is to identify trading opportunities and the market direction (side), aiming for high Recall. Likely mistakes, or false positives, are then filtered out by a **secondary ML model**, which increases Precision and ultimately improves the Sharpe ratio. This process, known as **meta-labeling** [1], allows traders to model sides and sizes separately.

Meta-labeling fits a *secondary ML model* that may use the same category of data the primary model relies on, as well as alternative data, to identify the successes and mistakes of the primary model.
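As a rough illustration, the targets for the secondary model can be built by checking whether each bet of the primary model turned out to be profitable. The sketch below assumes we already have the primary model's sides and the realized return over each holding period; the function `meta_labels` and the numbers are hypothetical.

```python
def meta_labels(sides, forward_returns):
    """Label each primary-model bet as a success (1) or a mistake (0).

    sides           -- +1 (long), -1 (short) or 0 (no trade) from the primary model
    forward_returns -- realized return over the holding period (assumed given)

    Points where the primary model did not trade get None: there is no bet
    to judge, so they are not used to train the secondary model.
    """
    labels = []
    for side, ret in zip(sides, forward_returns):
        if side == 0:
            labels.append(None)
        else:
            # A bet succeeds when the side and the realized return agree in sign.
            labels.append(1 if side * ret > 0 else 0)
    return labels


sides = [1, -1, 0, 1, -1]
forward_returns = [0.02, 0.01, -0.03, -0.01, -0.04]
print(meta_labels(sides, forward_returns))  # [1, 0, None, 0, 1]
```

Any binary classifier could then be trained on these labels, using whatever features help to tell the primary model's successes from its mistakes.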

“Confidence is not ‘I will profit on this trade.’ Confidence is ‘I will be fine if I don’t profit from this trade.’”

―Yvan Byeajee

Meta-labeling helps to filter out bets that would likely have led to mistakes (false positives), while sizing positions accordingly: the output of the secondary model is the probability of a positive (profitable) outcome; the larger the probability, the larger the position size.
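One simple way to turn that probability into a position is sketched below. The linear mapping, the `threshold` and `max_size` parameters, and the function name `position` are illustrative assumptions on our side; other monotone mappings from probability to size are equally possible.

```python
def position(side, prob_success, threshold=0.5, max_size=1.0):
    """Combine the primary model's side with the secondary model's probability.

    side         -- +1 (long) or -1 (short) from the primary model
    prob_success -- secondary model's probability that the bet is profitable
    threshold    -- bets at or below this probability are filtered out (assumed)
    max_size     -- largest allowed exposure (assumed, in arbitrary units)
    """
    if prob_success <= threshold:
        return 0.0  # likely false positive: skip the trade
    # Scale linearly from 0 at the threshold to max_size at probability 1.
    scale = (prob_success - threshold) / (1.0 - threshold)
    return side * max_size * scale


print(position(+1, 0.40))  # filtered out
print(position(+1, 0.75))  # half-sized long
print(position(-1, 1.00))  # full-sized short
```

The side still comes entirely from the primary model; the secondary model only decides whether to act on it and how much to bet.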

What seems to be a straightforward process still rests on a rather complex pipeline, which needs to collect raw data, compute bars to engineer features, identify suitable models, perform unbiased backtest assessments, offer opportunities to interpret and explain predictions, and monitor live deployments for execution purposes. Each stage has specific challenges that make FinML difficult yet exciting. Stay tuned for the next posts, in which we will look at these stages in detail.

[1] Marcos Lopez de Prado. *Advances in Financial Machine Learning*. 2018.