Using unsupervised machine learning to label FT articles
FT is one of the largest providers of financial news in the world. We publish hundreds of articles every day. One of the most challenging tasks is consistently categorising these articles.
While FT journalists tag the articles manually, it is hard to ensure that similar articles will have the same tag. Having consistent labels attached to articles is very important when we want to use them for machine learning models and analysis of customer reading trends.
To make these classifications, the only data available to us is article text. …
Identifying signals using time-series analysis and unsupervised machine learning
Understanding the preferences of Financial Times readers is crucial for improving user experience and maintaining engagement with our products. Accurate indicators of which areas are growing in importance can support journalists’ work by helping them focus on topics of interest.
Trending topics prediction is a data science model built using machine learning and time-series analysis. We define article topics with an unsupervised machine learning algorithm and use time-series analysis to flag anomalies in the data.
Over time, different topics arise reflecting the changing interests in society. The streams of data we…
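As a rough illustration of the time-series step described above, the sketch below flags days on which a topic's article count spikes relative to its recent history. It is not the FT model: the rolling z-score rule, the function name `flag_anomalies`, and the `window` and `z_threshold` parameters are all assumptions chosen for this example, and the counts are made-up numbers.

```python
from statistics import mean, stdev


def flag_anomalies(counts, window=7, z_threshold=3.0):
    """Return indices whose count spikes above the trailing window.

    Hypothetical helper: uses a rolling z-score, which is one common
    anomaly rule and not necessarily the one used in the article.
    """
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]  # the `window` values before day i
        mu = mean(history)
        sigma = stdev(history)  # sample standard deviation
        if sigma == 0:
            # A perfectly flat history gives no scale for a z-score; skip it.
            continue
        z = (counts[i] - mu) / sigma
        if z > z_threshold:  # only upward spikes count as "trending"
            flagged.append(i)
    return flagged


# Made-up daily article counts for a single topic.
daily_counts = [10, 12, 11, 9, 10, 11, 12, 40, 11, 10]
spike_days = flag_anomalies(daily_counts)
```

Only upward deviations are flagged here, on the assumption that a trending topic shows up as a surge rather than a drop.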
This story explains how to implement the moving average trading algorithm with R. If you’re interested in setting up your automated trading pipeline, you should first read this article. This story is a purely technical guide focusing on programming and statistics, not financial advice.
Throughout this story, we will build an R function that takes historical stock data and an arbitrary threshold as inputs and, based on them, decides whether it is a good time to purchase a given stock. We will look at Apple stock. This article may require a certain level of statistical knowledge. …
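The article builds this function in R. Purely to illustrate the idea of a moving-average decision rule, here is a minimal Python sketch. The rule itself (buy when the latest close is above the average of the preceding closes by at least a relative threshold), the name `should_buy`, and its parameters are assumptions for this example and may differ from the article's logic. The prices are invented, and nothing here is financial advice.

```python
def should_buy(closes, window=5, threshold=0.02):
    """Decide whether the latest close clears a moving-average threshold.

    Hypothetical rule: compare the latest closing price with the simple
    moving average of the `window` closes that precede it.
    """
    if len(closes) < window + 1:
        raise ValueError("need at least window + 1 closing prices")
    previous = closes[-window - 1:-1]  # the window just before the latest close
    moving_average = sum(previous) / window
    return closes[-1] >= moving_average * (1 + threshold)


# Invented closing prices: the average of the first five is 100.
rising = [100, 101, 99, 100, 100, 105]
flat = [100, 101, 99, 100, 100, 101]
buy_rising = should_buy(rising)
buy_flat = should_buy(flat)
```

With these inputs the moving average is 100, so a 2% threshold requires a latest close of at least 102.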
This article explains how to create a trading pipeline using R. The trading pipeline consists of 4 main elements
Documentation is an important part of being a data scientist. I propose to use R Markdown and LaTeX to document data science models.
There are several desirable properties of good documentation
Using LaTeX inside R Markdown allows users to apply consistent LaTeX formatting across numerous projects, write professional mathematical formulas explaining a given model, consistently reference figures and articles, and dynamically produce graphs from the model's outputs.
LaTeX is a document preparation system originally intended for academics to introduce consistency across formatting of scientific publications. …
Data Scientist. Holding degrees in economics, econometrics, and statistics. Employed in the news industry.