Using unsupervised machine learning to label FT articles

Photo by Pietro Jeng on Unsplash

Why article clustering is important

The FT is one of the largest providers of financial news in the world. We publish hundreds of articles every day, and one of the most challenging tasks is categorising them consistently.

While FT journalists tag the articles manually, it is hard to ensure that similar articles receive the same tag. Consistent labels are very important when we want to use articles in machine learning models and in the analysis of customer reading trends.

Labeling problem

To make these classifications, the only data available to us is article text. …

Photo by Chris Liverani on Unsplash

This story explains how to implement the moving average trading algorithm with R. If you’re interested in setting up your automated trading pipeline, you should first read this article. This story is a purely technical guide focusing on programming and statistics, not financial advice.

Throughout this story, we will build an R function that takes historical stock data and an arbitrary threshold as inputs and, based on them, decides whether it is a good time to purchase a given stock. We will look at Apple stock. This article may require a certain level of statistical knowledge. …
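To make the idea concrete, here is a minimal sketch of such a function in base R. It is not the function from the full article: the name `should_buy`, the `window` parameter, and the rule itself (buy when the latest price sits more than `threshold` below the simple moving average) are all assumptions chosen for illustration.

```r
# Hypothetical helper, not the article's implementation.
# Assumed rule: buy when the latest price is more than `threshold`
# (as a fraction) below the simple moving average of the last `window` prices.
should_buy <- function(prices, window = 20, threshold = 0.05) {
  stopifnot(is.numeric(prices), window >= 1, length(prices) >= window)

  # Simple moving average over the most recent `window` observations
  moving_avg <- mean(tail(prices, window))

  # Most recent observed price
  latest <- prices[length(prices)]

  # Relative gap between the latest price and the moving average
  deviation <- (latest - moving_avg) / moving_avg

  # TRUE when the price has dropped far enough below the average
  deviation < -threshold
}

# Usage with made-up prices: 19 days at 100, then a drop to 90
should_buy(c(rep(100, 19), 90), window = 20, threshold = 0.05)
```

The threshold is kept as an argument so that the sensitivity of the decision can be tuned without touching the rest of the logic.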

Photo by Jason Briscoe on Unsplash

This article explains how to create a trading pipeline using R. The trading pipeline consists of four main elements:

  • Connecting with Google API and loading current holdings data
  • Connecting with Robinhood API and getting current stock prices
  • Getting historic market data using Yahoo API
  • Decision algorithm and executing an order

Photo by Annie Spratt on Unsplash

Documentation is an important part of being a data scientist. I propose using R Markdown and LaTeX to document data science models.

Good documentation has several desirable properties:

  • easily accessible
  • readable
  • consistent across projects
  • reproducible

Using LaTeX inside R Markdown allows users to apply consistent LaTeX formatting across numerous projects, write professional mathematical formulas explaining a given model, consistently reference figures and articles, and dynamically produce graphs from the outputs of the model.
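As an illustration of the formula and referencing points, the fragment below shows the kind of LaTeX that could be placed in the body of an R Markdown document rendered to PDF. The linear model and the label `eq:model` are hypothetical examples, not taken from any particular project.

```latex
% Hypothetical model, used only to illustrate a labelled equation
\begin{equation}
  y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
  \label{eq:model}
\end{equation}

% Elsewhere in the text the equation can be cited by its label
The model in equation \ref{eq:model} is estimated by ordinary least squares.
```

Because the number is generated from the label, the reference stays correct when equations are added or reordered.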

LaTeX is a document preparation system originally intended for academics to introduce consistency across formatting of scientific publications. …

Adam Gajtkowski

Data Scientist. Holding degrees in economics, econometrics, and statistics. Employed in the news industry.
