These are features that bring value to customers and banks alike. But they're only possible if the transaction data is enriched with categorisation, and they're only useful to customers if that categorisation is accurate.
The question has been whether to build a categorisation engine in-house or buy one. As it turns out, the answer is easy, and not because we have a vested interest in it. Great categorisation boils down to scale, and a tech partner has already tackled the challenges and opportunities that come with it.
The more relevant question is: what are your competitors doing? The answer is that they're partnering with fintechs, which are becoming the standard for categorisation. There are myriad reasons for this, but here are the three key ones.
Read on to learn why scale is your friend – not foe – with a tech partner like Tink.
Machine learning vs. rules
If the goal is to build a categorisation engine in-house, you'll likely start with a rule-based system, in which you create a new rule for each merchant. So if a transaction contains the word “Starbucks”, it's categorised as “coffee shop”. Rule-based systems are the natural first step because they're easy to grasp when you're building a proof of concept on a small scale with a small team.
But transactions now come from multiple markets and in multiple languages, and with many European countries moving toward a cashless society, the number of transactions has grown exponentially. That combination of complexity and volume makes it impossible for rule-based systems to categorise transactions correctly.
Instead, categorisation models are now based on machine learning, which can leverage global data sets and text-processing algorithms to learn how to classify transactions accurately. Over time, the model gets smarter and hones its ability to adapt to each customer's preferences.
Unlike a rule-based system, a machine-learning model can categorise a transaction it's never seen. It's what we use here at Tink. We take the data from that transaction, and from others, to estimate the probability that it belongs to each category. Then we apply the user's preferences for certain transactions. For example, a user might eat lunch at the same pub every week, so for them that transaction is “restaurant” and not “bar”. And so it goes for every transaction passing through our platform: about 10 million per day.
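To make the rule-based approach concrete, here is a minimal sketch of keyword rules. The rule table, merchant keywords and function name are hypothetical, chosen only to illustrate the idea; they are not taken from any real engine. Note what happens to a merchant nobody has written a rule for.

```python
from typing import Optional

# Hypothetical rule table: one keyword rule per merchant.
RULES = {
    "starbucks": "coffee shop",
    "ikea": "furniture",
}

def categorise_by_rules(description: str) -> Optional[str]:
    """Return the category of the first rule whose keyword appears in the description."""
    text = description.lower()
    for keyword, category in RULES.items():
        if keyword in text:
            return category
    # No rule matches: a merchant without a rule stays uncategorised.
    return None

print(categorise_by_rules("STARBUCKS #1234 STOCKHOLM"))  # coffee shop
print(categorise_by_rules("Joe's Espresso Bar"))         # None
```

Every new merchant means another entry in the table, which is exactly what stops scaling once transaction volumes grow.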
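The two steps described above (estimate a probability per category, then apply the user's own preferences) can be sketched with a toy model. This is not Tink's model: it swaps in a simple multinomial naive Bayes classifier over description words, and the training set, class and function names, and the idea of keying preferences on the lowercased description are all hypothetical assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def tokenize(description: str) -> List[str]:
    return description.lower().split()

class NaiveBayesCategoriser:
    """Toy multinomial naive Bayes over description words, with add-one smoothing."""

    def __init__(self, examples: List[Tuple[str, str]]):
        self.class_counts = Counter(category for _, category in examples)
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        for description, category in examples:
            self.token_counts[category].update(tokenize(description))
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        self.total = len(examples)

    def probabilities(self, description: str) -> Dict[str, float]:
        """Probability that the description belongs to each category."""
        log_scores = {}
        for category, n in self.class_counts.items():
            counts = self.token_counts[category]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / self.total)  # prior
            for tok in tokenize(description):
                score += math.log((counts[tok] + 1) / denom)  # smoothed likelihood
            log_scores[category] = score
        # Normalise the log scores into probabilities (softmax).
        top = max(log_scores.values())
        exps = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exps.values())
        return {c: v / z for c, v in exps.items()}

def categorise(model: NaiveBayesCategoriser, description: str,
               user_preferences: Dict[str, str]) -> str:
    """Apply the user's own preference first, else take the most probable category."""
    key = description.lower()
    if key in user_preferences:
        return user_preferences[key]
    probs = model.probabilities(description)
    return max(probs, key=probs.get)

# Hypothetical, hand-labelled toy training set.
TRAINING_SET = [
    ("starbucks coffee", "coffee shop"),
    ("espresso house coffee", "coffee shop"),
    ("the red lion pub", "bar"),
    ("pub quiz night bar", "bar"),
    ("pizza restaurant", "restaurant"),
    ("sushi restaurant lunch", "restaurant"),
]

model = NaiveBayesCategoriser(TRAINING_SET)
print(categorise(model, "Corner Coffee", {}))     # coffee shop (never seen in training)
print(categorise(model, "The Red Lion Pub", {}))  # bar
print(categorise(model, "The Red Lion Pub", {"the red lion pub": "restaurant"}))  # restaurant
```

Even this toy model handles “Corner Coffee”, a description it has never seen, while the preference lookup lets the weekly pub lunch count as “restaurant” for that one user.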
With machine learning, it doesn’t matter how many merchants enter the marketplace because the system can learn on its own and keep pace with the increasing scale. Rule-based systems simply can’t compete.
Building the right training set
For any machine-learning algorithm to work well, it needs data to learn from. One source of data is a list of tens of thousands of transaction descriptions, each matched with the right category. It needs to be compiled manually by people so that it reflects how we think. It's then given to the algorithm so it can learn to replicate that intelligence automatically, even for transactions it's never seen.
In our experience, a great training set has tens to hundreds of thousands of categorised transactions. Without the proper tools, it can take weeks to compile. At Tink, we can build one in about 20 hours.
That's because we have developed tools that allow dozens of people to compile data simultaneously. But this doesn't just translate to speed; it also increases the quality of the training set. This kind of repetitive manual work is draining and can lead to mistakes, so we spread out the load. We also review the data twice to make sure the category selection is unanimous.
Collecting good training data, in terms of both quality and size, is a true challenge of scale. It requires tools and processes adapted to modern data-collection techniques, something that's at the core of our data-driven banking approach.
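The two ideas in that paragraph, spreading the labelling load and keeping only unanimous labels, can be sketched in a few lines. This is an illustrative assumption of how such a workflow might look, not a description of Tink's internal tools; the function names and the round-robin split are hypothetical.

```python
from typing import Dict, List, Tuple

def split_workload(descriptions: List[str], n_annotators: int) -> List[List[str]]:
    """Round-robin split so each annotator gets a similar share of the work."""
    return [descriptions[i::n_annotators] for i in range(n_annotators)]

def merge_reviews(first: Dict[str, str],
                  second: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Keep a label only if both reviews chose the same category.

    Returns the agreed labels and the descriptions that need another look.
    """
    agreed: Dict[str, str] = {}
    disputed: List[str] = []
    for description in first.keys() | second.keys():
        a, b = first.get(description), second.get(description)
        if a is not None and a == b:
            agreed[description] = a
        else:
            disputed.append(description)
    return agreed, sorted(disputed)

first = {"starbucks 123": "coffee shop", "red lion pub": "bar", "shell station": "fuel"}
second = {"starbucks 123": "coffee shop", "red lion pub": "restaurant", "shell station": "fuel"}
agreed, disputed = merge_reviews(first, second)
print(agreed)    # {'starbucks 123': 'coffee shop', 'shell station': 'fuel'} (key order may vary)
print(disputed)  # ['red lion pub']
```

Only the unanimous labels go into the training set; disputed ones can be sent back for another round of review.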
The bigger (the data set) the better
The last piece is having a big data set – and the bigger the better. When your algorithms have access to more data – and classify more transactions – they learn faster and offer more accurate categorisation. And to get that kind of scale, you have to go to a tech partner.
Let's imagine you have one million transactions per day with your own categorisation engine. With a tech partner, you'd have 10 million. We sit on a huge amount of data: five billion transactions on our platform, from banks in seven markets, and that's what we use to build our models. And that's just today. Every time a new bank partners with us, we add at least another half a billion transactions.
The question then is, how long would it take you to add a similar amount of data into your own system?
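A back-of-the-envelope answer, using the figures above: at one million transactions a day, how many days would it take to accumulate five billion? This sketch assumes a constant daily volume and ignores that the target itself keeps growing as new banks join, so it understates the gap.

```python
def days_to_reach(target: int, per_day: int) -> int:
    """Whole days needed to accumulate `target` transactions at a constant daily volume."""
    return -(-target // per_day)  # ceiling division on integers

days = days_to_reach(5_000_000_000, 1_000_000)
print(days)                  # 5000
print(round(days / 365, 1))  # 13.7
```

That's roughly 13.7 years of your own transactions just to reach today's data set.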
Let a tech partner do the heavy lifting
The harsh reality is that comparing a bank's ability to build a categorisation engine with that of a tech partner like Tink is unfair. Engineering a machine-learning system requires experience in text processing, automation capabilities and a huge amount of data. These are just three major hurdles, and there are more.
But letting a tech partner do the heavy lifting for you means you can bring services to market in a fraction of the time, gain access to an ever-growing set of machine-learning technologies, and offer a better personal financial management (PFM) experience to your customers.