
Inside Tink: changing our approach to outlier detection

We get into the backstage of Tink tech to learn more about how our teams are tackling challenges and continuously upgrading our solutions – and ways of thinking. Here, Data Scientist Eliisabet Hein explores why Tink’s Enrichment Categorisation team uses outlier detection to filter out transactions with unusually large amounts – and how they improved performance by changing their approach.

TL;DR – Quick summary

  • At Tink, we assign a category to every transaction we process – and our Enrichment team has a goal to ensure that we use the best possible models for categorisation.

  • Accuracy by amount was an important metric for the team – but it’s very sensitive to outliers.

  • Eliisabet Hein, Data Scientist at Tink, goes through how they changed their approach to outlier detection – and how it improved evaluation scores.


By Eliisabet Hein

At Tink, we assign a category to every transaction we process. These categories can be used by end users to set budgets and manage their finances, and feed into Tink’s and our customers’ other products.

We pass the text description attached to the transaction through a machine learning model to identify the ‘domain’ in which the purchase was made – for example, grabbing a coffee at a convenience store should be categorised as Coffee & Snacks, while your monthly electricity bill should go under Utilities.

Our goal in the Enrichment Categorisation team is to ensure that we use the best possible models for categorisation. For this, we use a variety of different metrics to evaluate and compare the quality of different models. One of these metrics is accuracy by amount. You might be familiar with the standard definition of accuracy (which is another metric we use):

$$\text{accuracy} = \frac{\text{number of correctly categorised transactions}}{\text{total number of transactions}}$$

To compute accuracy by amount, we simply weight each transaction by its corresponding amount:

$$\text{accuracy by amount} = \frac{\text{total amount of correctly categorised transactions}}{\text{total amount of all transactions}}$$

This metric allows us to measure how good we are at correctly categorising transactions with higher amounts, like your monthly rent, mortgage and other loan payments. These transactions occur less frequently than lower-amount daily purchases for things like groceries and coffee. However, getting one such high-impact transaction wrong affects the user’s budgeting more than miscategorising a single small purchase, which is why accuracy by amount is an important component of our evaluation.
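As a rough illustration of the two metrics, here is a minimal Python sketch. The `Transaction` class, the function names and the sample amounts are made up for this example; they are not Tink's actual code or data.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float            # amount in EUR (hypothetical, assumed positive)
    true_category: str       # the label we consider correct
    predicted_category: str  # what the model predicted


def accuracy(transactions):
    """Share of transactions whose predicted category is correct."""
    correct = sum(t.predicted_category == t.true_category for t in transactions)
    return correct / len(transactions)


def accuracy_by_amount(transactions):
    """Share of the total amount carried by correctly categorised transactions."""
    total = sum(t.amount for t in transactions)
    correct = sum(
        t.amount for t in transactions
        if t.predicted_category == t.true_category
    )
    return correct / total


# Illustrative data: three small purchases predicted correctly, one large miss.
transactions = [
    Transaction(4.0, "Coffee & Snacks", "Coffee & Snacks"),
    Transaction(6.0, "Coffee & Snacks", "Coffee & Snacks"),
    Transaction(40.0, "Utilities", "Utilities"),
    Transaction(950.0, "Mortgage", "Utilities"),
]

print(accuracy(transactions))            # 0.75
print(accuracy_by_amount(transactions))  # 0.05
```

In this toy example, three out of four predictions are right, yet only 5% of the total amount is categorised correctly, because the single miss is also the largest transaction.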

The outlier problem

If you’re a data scientist, or simply very sharp-eyed, you might be asking yourself: ‘Wait a minute, isn’t this metric very sensitive to outliers?’ Yep. It is. Let’s look at an example:

Imagine that you have 100 transactions in the ‘Mortgage’ category, 99 of which represent monthly installments falling uniformly in a (hypothetical) range of 250 to 2500 EUR. However, in the same category, you have a single transaction for 10,000 EUR, which represents someone paying their remaining mortgage in a single installment.

This one transaction accounts for a far larger share of the total amount than any other single transaction, and that share could reach 20% or more if most installments sit near the lower end of the range. If we want to measure accuracy by amount, the result will be very heavily influenced by whether we got this transaction correct or not. We might even decide whether we replace the existing model with a new one or not based solely on the prediction on this one transaction.

This is why we need outlier detection to find these transactions where the amounts fall outside the normal range for a given category, and filter them out before we perform any evaluation. If we plot this dataset (see below), a human observer would easily be able to identify the outlier, but how can we make our code perform the same analysis automatically?

[Figure: plot of the example Mortgage amounts, with the single 10,000 EUR transaction well separated from the rest.]

How we used to solve it

Our old approach to solving this problem was removing all amounts above the 99.5th percentile. This means that we find the amount in our dataset that is larger than 99.5% of all amounts in the set, and remove all points larger than this. Because we cannot remove only part of a data point, we round up and always remove at least one data point. The number of outliers we remove scales with the amount of data we have (the exact formula is ⌈n × 0.005⌉, where n is the number of data points).

Let’s look at this method in practice. In our earlier example with 100 mortgage transactions, we would remove one data point from the higher end of amounts, which in this case is our outlier – so now we have a clean dataset to use for evaluation. Success!

[Figure: the example dataset with the 99.5th percentile cutoff, which removes the 10,000 EUR outlier.]


The percentile p that we select encodes our assumption that outliers occur with probability (1-p). We selected the cutoff point of 99.5th percentile because we found that it usually worked well empirically with larger test sets, and was even an overestimate (we removed more data points than there were outliers, which is preferable to allowing outliers to slip through).

However, as you might have already noticed, this method has a crucial flaw when the dataset size is small.

As we saw before, if we have a smaller category such as our Mortgage example with only 100 data points, we always remove exactly one data point, which amounts to assuming that there is at most one outlier.

Now, imagine if we had an additional outlier in this set with a transaction amount of 7,500 EUR. Our method would fail:

[Figure: the example dataset with two outliers; the 99.5th percentile cutoff removes only the largest one.]
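The behaviour described above can be reproduced with a short Python sketch. It assumes the ⌈n × 0.005⌉ rule is applied by dropping that many of the largest amounts; the function name and the evenly spaced installments are invented for illustration and only mimic the hypothetical Mortgage example.

```python
import math


def remove_top_percentile(amounts, fraction=0.005):
    """Drop the ceil(n * fraction) largest amounts (at least one if n > 0)."""
    n_remove = math.ceil(len(amounts) * fraction)
    return sorted(amounts)[:len(amounts) - n_remove]


# 99 evenly spaced installments between 250 and 2500 EUR (illustrative data).
installments = [250 + i * (2500 - 250) / 98 for i in range(99)]

# One outlier: 100 points, so exactly one point is removed.
one_outlier = installments + [10_000.0]
print(max(remove_top_percentile(one_outlier)))   # 2500.0, the outlier is gone

# Two outliers: 101 points, still only one point is removed.
two_outliers = installments + [7_500.0, 10_000.0]
print(max(remove_top_percentile(two_outliers)))  # 7500.0, one outlier survives
```

With around 100 data points the rule removes a single point regardless of how many outliers are actually present, which is the flaw in small categories.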


So how can we be better at detecting outliers without depending as much on the dataset size or hard-coding any assumptions about their occurrence frequency?

How we solve it now

After our model evaluation score dropped by more than 10% from one week to the next because we failed to remove an outlier, we set out to improve our algorithm.

After considering a range of alternatives, we turned to the interquartile range (IQR). This is a simple and intuitive way to detect points that are outside the ‘norm’ for a given dataset. Here’s how that works.

The interquartile range is defined as the distance between the 25th and 75th percentiles of the dataset, also known as the first and third quartiles. Here’s what that looks like:

[Figure: the interquartile range, spanning from the first quartile to the third quartile of the dataset.]

Outliers are values that lie more than a specified distance beyond the quartiles that bound the IQR. The cutoff in the positive direction (which is what we are interested in here, although we could also have outliers at the lower end of amounts) can be found using the following formula:

$$\text{cutoff} = Q_3 + k \cdot \text{IQR}, \qquad \text{IQR} = Q_3 - Q_1$$

The default in most applications is to use k=1.5 (for normally distributed data, this places the cutoff roughly 2.7 standard deviations above the mean), but any value can be used to allow for more or less ‘slack’ – essentially, how much we allow the value to deviate from the IQR before considering it an outlier.

We use a higher value of k in production to compensate for the fact that real data is usually noisier, with outliers also at the lower end of the distribution, and doesn't always follow a neat distribution.

Let’s go back to our example dataset to see what this would mean (with k=1.5):

[Figure: the example dataset with one outlier and the IQR cutoff for k=1.5.]

And for the example with two outliers:

[Figure: the example dataset with two outliers and the IQR cutoff for k=1.5.]

As you can see, this method works equally well for more outliers, and the cutoff is not affected significantly by adding another outlier to the dataset.
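For readers who want to try this themselves, here is a minimal Python sketch of the IQR cutoff with k=1.5. The function names and the evenly spaced installments are made up for illustration, and the choice of quantile method (`"inclusive"` in the standard library's `statistics.quantiles`) is an assumption; other quantile definitions give slightly different cutoffs.

```python
import statistics


def iqr_upper_cutoff(amounts, k=1.5):
    """Upper outlier cutoff: Q3 + k * IQR, where IQR = Q3 - Q1."""
    q1, _median, q3 = statistics.quantiles(amounts, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def remove_upper_outliers(amounts, k=1.5):
    """Keep only the amounts at or below the IQR-based cutoff."""
    cutoff = iqr_upper_cutoff(amounts, k)
    return [a for a in amounts if a <= cutoff]


# 99 evenly spaced installments between 250 and 2500 EUR (illustrative data).
installments = [250 + i * (2500 - 250) / 98 for i in range(99)]
one_outlier = installments + [10_000.0]
two_outliers = installments + [7_500.0, 10_000.0]

# The cutoff lands between the regular installments and the outliers,
# and moves only slightly when the second outlier is added.
print(iqr_upper_cutoff(one_outlier))
print(iqr_upper_cutoff(two_outliers))

print(max(remove_upper_outliers(one_outlier)))   # 2500.0
print(max(remove_upper_outliers(two_outliers)))  # 2500.0
```

Because the cutoff depends only on the quartiles, extreme values in the tail have little influence on it. Passing a larger `k` loosens the cutoff, as described for the production setting.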

IQR has a breakdown point of 25%, which means that up to 25% of data points can be outliers before this method begins to fail (we will not cover the mathematical proof here, but it can be easily found by anyone who wants to dig into the details). This easily makes it sufficiently robust for our needs.

Compared to the percentile-based method, we are also much less likely to have false positives – which would remove amounts that are not actually outliers.

Seeing improvements

We have talked about why outlier detection is crucial to accurately evaluate the quality of our categorisation, and seen why it is important to choose a sufficiently flexible and robust approach with fewer assumptions encoded in the algorithm itself.

Since implementing the interquartile range method, we have been able to track the performance of our models with more stability, and can have more confidence in the evaluation scores.
