Categorisation is an essential step in extracting value from account and transactional data. In addition to dates (when) and amounts (what), the key lies in understanding why a particular transaction was made, the transaction’s purpose and what it says about the person making it. Categorisation models that use up-to-date data, straight from the bank, help lenders understand their applicants in real time – all while accurately fulfilling their credit policies.
In consumer lending, sufficiently understanding the person you’re about to enter into a credit agreement with is fundamental. Consequently, lenders are typically faced with two types of questions:
Can we lend to this individual? Does this person have sufficient and verified income – and can this person afford the loan over time?
Should we lend to this individual? What is the probability that this person will repay the loan – and are there other indications of risky behaviour?
These questions are answered through credit policy rules, which usually either require, for example, the applicant’s income to reach a certain threshold before a loan is granted, or rely on a calculated probability of default. Without accurate categorisation of data, lenders can end up making inaccurate assumptions – which can lead to the approval of bad payers or the rejection of good payers.
In many cases transactions have limited information, making it hard even for humans to analyse how a transaction should be categorised. To identify different types of income, most companies work with rule-based logic and regular expressions in their models. But there are some challenges around this approach:
discrepancies in transaction data, such as unusual characters, that make descriptions hard to interpret
the need for extensive human involvement to keep rules up to date, and
difficulty handling multiple languages – which is important in a world with an increasingly international consumer base.
Ultimately, these challenges lead lenders to make credit decisions based on subpar or incorrect data, and to miss out on potential approvals.
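To make the brittleness of rule-based logic concrete, here is a minimal sketch of a regex categoriser. The patterns, category names and sample descriptions are hypothetical, chosen only for illustration, and do not represent Tink's rules or any lender's actual model.

```python
import re

# Hypothetical rules: (category, compiled pattern). Illustrative only.
RULES = [
    ("Salary", re.compile(r"\b(salary|payroll|wages)\b", re.IGNORECASE)),
    ("Pension", re.compile(r"\bpension\b", re.IGNORECASE)),
    ("Benefits", re.compile(r"\b(child benefit|universal credit)\b", re.IGNORECASE)),
]


def categorise(description: str) -> str:
    """Return the first matching category, or 'Other' if no rule matches."""
    for category, pattern in RULES:
        if pattern.search(description):
            return category
    return "Other"


samples = [
    "ACME LTD SALARY MARCH",  # matches the Salary rule
    "ACME LTD S*LARY MARCH",  # unusual character: no rule matches
    "NOMINA ACME SL MARZO",   # Spanish payroll wording: no rule matches
]
for text in samples:
    print(f"{text!r} -> {categorise(text)}")
```

Only the first sample is recognised as salary. Covering the other two would mean someone adding and maintaining more patterns by hand, for every quirk and every language.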
By updating our model, we can solve many of the categorisation challenges facing lenders:
Sub categories – providing greater granularity for more precision when identifying income. This enables lenders to make more informed decisions that align with their credit policy and regulations.
Generative AI – the ability to process larger volumes of data, requiring less human involvement – while increasing the accuracy and robustness of our models.
Multilingual model – mitigating dependencies on local language, enabling credit decisions for a broader, international set of customers.
Our categorisation taxonomy is now even more granular with new ‘sub categories’. Through sub categories we provide more detailed data and context for categorised income streams. For example, lenders will not only be able to understand whether an income stream is a benefit payment – but specify whether it falls under child support, unemployment or another kind of benefit.
Overall, income data is still aggregated and easily identifiable in broad categories (Salary, Pension, Benefits, Cash Deposits and Other). But there are now as many as 21 sub categories at your fingertips – designed to help lenders better understand each income stream – from Insurance, Rental or Educational grants to Gig for any side projects a user may have.
All lenders have different risk appetites and their own credit policies to fulfil when making credit decisions. Some consider less regular income, such as bonuses, to be underwritable, while others do not.
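As a rough sketch of how sub categories could feed different credit policies, the snippet below sums income per broad category while counting only the streams a given policy accepts. The `IncomeStream` type, the sub category labels, the amounts and both policies are assumptions made for illustration, not Tink's data model or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IncomeStream:
    category: str         # broad category, e.g. "Salary" or "Benefits"
    sub_category: str     # hypothetical sub category label
    monthly_amount: float


def underwritable_income(streams, accepted):
    """Sum monthly income per broad category, counting only streams whose
    (category, sub_category) pair is accepted by the lender's policy."""
    totals = defaultdict(float)
    for stream in streams:
        if (stream.category, stream.sub_category) in accepted:
            totals[stream.category] += stream.monthly_amount
    return dict(totals)


# Hypothetical applicant with four income streams.
streams = [
    IncomeStream("Salary", "Regular salary", 2400.0),
    IncomeStream("Salary", "Bonus", 300.0),
    IncomeStream("Benefits", "Child support", 150.0),
    IncomeStream("Other", "Gig", 200.0),
]

# Two hypothetical policies: one ignores bonuses and gig income, one accepts everything.
conservative = {("Salary", "Regular salary"), ("Benefits", "Child support")}
permissive = {(s.category, s.sub_category) for s in streams}

print(underwritable_income(streams, conservative))
print(underwritable_income(streams, permissive))
```

The same categorised data supports both policies. Only the set of accepted sub categories changes, which is the kind of flexibility that broad categories alone cannot offer.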
Tink’s new sub categories optimise income verification tooling for lenders, no matter their credit policies. Thanks to the granular taxonomy created by these sub categories, lenders can obtain the high quality data necessary to make confident, policy-compliant decisions.
With more granular taxonomy, lenders can deepen their understanding of applicants and the type of income they have. These data points can help lenders achieve their target approval rates while simultaneously lowering the probability of default. This is particularly relevant when using risk decisioning products like Tink Income Check – a tool to help lenders verify an applicant’s income digitally, in real time. In other words, a lender gets the full picture of the borrower’s financials at the time of application.
For example, since adopting Tink Income Check, one recent customer has approved 40% of the applicants who would previously have been marginally declined.
As generic categorisation does not allow for the same precision, it could increase risk if a bank or financial institution has specific credit policies around benefits or other income sources. In other words, strengthening generic categorisation with sub categories can enable lenders to apply more precision to their credit decisions – in turn, making it easier to comply with their credit policy and meet local regulations.
Historically, Tink’s categorisation has used machine learning and small language models (SLMs) to identify and categorise transactions more effectively. Thanks to recent AI advancements, we have supplemented our categorisation with large language models (LLMs). In short, LLMs are trained on vast amounts of data to make intelligent predictions. In contrast, SLMs are tailored for a specific purpose or task.
With additional parameters, LLMs can draw broader connections between vast data sets. In this context, LLMs help us “pre-annotate” our data by going through hundreds of millions of transactions and – based on everything they have seen so far – make a prediction on the category to which a given transaction should belong.
This pre-annotated data lets our analysts work through much larger volumes, because they can focus on refining the categorisation that the LLM provides rather than labelling from scratch. It serves as a preliminary step before our analysts validate the predictions.
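The pre-annotate-then-validate flow described above might look roughly like the sketch below. Everything here is an assumption for illustration: the `Annotation` type and function names are hypothetical, and `stub_predict` is a simple keyword check standing in for a real LLM call so that the example stays runnable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Annotation:
    description: str
    predicted: str               # category suggested by the model
    final: Optional[str] = None  # category confirmed by an analyst


def pre_annotate(descriptions, predict: Callable[[str], str]):
    """Attach a model-suggested category to every transaction description."""
    return [Annotation(d, predict(d)) for d in descriptions]


def validate(annotation: Annotation, analyst_label: Optional[str] = None) -> Annotation:
    """Analyst step: accept the suggestion, or override it with a correction."""
    annotation.final = analyst_label or annotation.predicted
    return annotation


def stub_predict(description: str) -> str:
    # Stand-in for a real LLM call (hypothetical); not an actual model.
    return "Salary" if "salary" in description.lower() else "Other"


queue = pre_annotate(["ACME LTD SALARY", "RENT FROM TENANT"], stub_predict)
validate(queue[0])            # analyst agrees with the suggestion
validate(queue[1], "Rental")  # analyst corrects the suggestion

# Validated pairs that could later be used to train a tailored model.
training_set = [(a.description, a.final) for a in queue]
print(training_set)
```

The point of the design is that the analyst only confirms or corrects a suggestion instead of producing every label from scratch.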
As LLMs develop, we estimate that they will be able to reduce the time needed to label transactions by up to 90% – from around 10 seconds per transaction to one second per transaction.
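The arithmetic behind that estimate can be checked in a few lines. The per-transaction times come from the text above; the batch size of one million transactions is a hypothetical figure used only to show the scale of the saving.

```python
def labelling_hours(n_transactions: int, seconds_each: float) -> float:
    """Total labelling time in hours for a batch of transactions."""
    return n_transactions * seconds_each / 3600


before_s, after_s = 10.0, 1.0  # seconds per transaction, from the estimate above
reduction = (before_s - after_s) / before_s

batch = 1_000_000  # hypothetical batch size, for illustration only
hours_before = labelling_hours(batch, before_s)
hours_after = labelling_hours(batch, after_s)

print(f"Reduction per transaction: {reduction:.0%}")
print(f"Hours for {batch:,} transactions: {hours_before:,.0f} before, {hours_after:,.0f} after")
```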
The final, pre-checked and cleaned dataset then serves as the foundation for the tailored machine learning models serving our customers. The more clean data we train our machine learning models on, the better and more accurate they become. We expect the accuracy of our categorisation to keep increasing as LLMs evolve.
The largest benefit for lenders who leverage LLMs is substantial time saved. By using LLMs, we can more efficiently process large volumes of data, ultimately increasing the accuracy and robustness of our models – and resulting in higher quality reports.
"Tink's updated categorisation model aims to surpass human capacity to categorise transactions, empowering lenders to make informed assessments with confidence – and setting them up for success."
– Rickard Arbeus, Senior Product Manager, Tink
As the world grows, we want to help lenders grow with it – so our categorisation model has gone multilingual. A multilingual categorisation model means creating machine learning models trained on data from multiple markets and languages.
For example, if a person in Spain has a salary stream with a UK transaction description, it will be recognised by the multilingual model as a salary stream. Without a multilingual model, the domestic categorisation model could classify the salary stream incorrectly.
With a single, multilingual model powering all of Tink’s risk products across all markets, we can leverage local categorisation intelligence – everywhere. This “cross-pollination” leads to regular improvements. In short, one categorisation model that speaks all languages can outperform models that are domestic only.
With multilingual models, Tink can mitigate dependencies on recognising local languages. Lenders get verified income information on a broader set of customers, and can therefore make better credit decisions on applicants who come from other countries or speak different languages.
As we improve our model in one market, it improves everywhere at the same time. Over time we can accurately categorise transactions – regardless of where they occurred.
In contrast to domestic models, our multilingual model can, for example, learn from transactions made in the UK and apply this to other markets, even before any transactions of that type have actually occurred there.
Intelligent sub categories that create granular taxonomy. Generative AI that increases the accuracy and robustness of Tink’s machine learning models. A multilingual model that helps verify income information on a broader set of customers. These are the updates to our categorisation capabilities that help lenders obtain the high quality data necessary to make confident, policy-compliant decisions. And as our categorisation model continually grows and increases in precision, so will its benefits. Even now, lenders can:
go to market faster: leveraging multilingual categorisation accelerates efficient, effective deployment of risk products across new markets
make more confident decisions: using reliable, consistent data straight from the bank gives lenders the full picture at the time of the borrower’s application
get top-quality reports: enhancing categorisation with large language models (LLMs) achieves more accurate results, improves as models learn over time, builds understanding of edge cases – and ultimately helps the development of risk decisioning overall.
Ready to start reaping the benefits of Income Check like Bank Norwegian, or want to learn even more about our categorisation model? Feel free to get in touch with our team.
—
Case studies, statistics, research and recommendations are provided “AS IS” and intended for informational purposes only and should not be relied upon for operational, marketing, legal, technical, tax, financial or other advice. Tink AB does not make any warranty or representation as to the completeness or accuracy of the Information within this document, nor assume any liability or responsibility that may result from reliance on such Information. The Information contained herein is not intended as legal advice, and readers are encouraged to seek the advice of a competent legal professional where such advice is required.