Catch me if you can….

Shashank R
6 min read · Apr 13, 2021

Frank Abagnale successfully conned his way through millions of dollars' worth of forged checks while posing as a Pan Am pilot, a doctor, and a legal prosecutor. Ironically, most of us were rooting for the fraudster rather than for Hanratty (played by Tom Hanks), without thinking of the immense burden fraudsters create for financial institutions. In addition, there are the hapless victims of fraud, who may run into problems of their own at the receiving end of an identity thief (as Jason Bateman finds out after falling victim to Melissa McCarthy's character in 'Identity Thief').

Financial Institutions and Merchants primarily bear losses through credit losses and fraud losses. With the advent of the internet and the growth of 'card-not-present' transactions, fraud is exploding (card payment fraud alone exceeded $20B in 2018), especially since fraudsters run limited risk hiding behind keyboards in a place far, far away. Broadly, fraud falls into three categories:

  • New Account Fraud: These are instances where fraudsters open new accounts under real identities (stolen, or obtained by tricking the victim), fake identities, or synthetic identities. Frank producing real-looking checks under fake IDs is a prime example of this.
  • Existing Account Fraud: These are instances where the fraudster takes over a legitimate account and then finds ways to extract value (moving money out, buying goods or services, etc.). This is how Frank was able to cash checks drawn on legitimate account holders.
  • Payment Fraud: Here the fraudsters are the account holders themselves (first parties). They 'cash' checks against credit that banks extend in good faith and then keep spending, so that balances end up much higher than credit limits. Frank certainly cut plenty of such checks, which would bounce before the institution that accepted the check heard back from the bank about the insufficient balance. However, this is more a case of credit (first-party) fraud than third-party fraud.

Companies defend against fraud with a four-step Fraud Operations process: they look to Prevent fraud from happening (primarily using fraud models), run trigger processes (Detection) that route cases to a Product team for customer outreach (email/SMS, etc.) and/or to an Operations team for review (Investigation), take the appropriate Action, and try to Recover losses in some marginal cases. The Investigation team also helps label fraud faster (otherwise labeling only happens when the customer notifies the institution of fraud), and those labels feed back into model development.
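To make the routing concrete, here is a minimal sketch of how a model fraud score could be mapped to these operational actions. The thresholds and action names are purely illustrative assumptions, not any institution's actual policy.

```python
# Illustrative Prevent -> Detect -> Investigate routing based on a model fraud score.
# Thresholds and action names are hypothetical.

def route_transaction(fraud_score: float,
                      decline_threshold: float = 0.90,
                      investigate_threshold: float = 0.70,
                      outreach_threshold: float = 0.50) -> str:
    """Map a fraud score to an operational action."""
    if fraud_score >= decline_threshold:
        return "decline"               # Prevention: block the transaction outright
    if fraud_score >= investigate_threshold:
        return "investigation_queue"   # Detection: route to the Operations/Investigation team
    if fraud_score >= outreach_threshold:
        return "customer_outreach"     # Detection: Product team reaches out via email/SMS
    return "approve"

if __name__ == "__main__":
    for score in (0.15, 0.62, 0.78, 0.95):
        print(score, "->", route_transaction(score))
```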

Use of Data Science in the four-step process & what metric to consider

Models are used primarily in Prevention (and in the Detection process) to stop and reduce fraud. However, institutions have to ensure that the way they use fraud models is aligned with their risk appetite and business focus.

Like any other model, a fraud identification model is not going to be perfect: when predicting fraud it will produce a mix of true positives, false positives, false negatives, and true negatives. For example, a false positive is a case where we predict fraud but there is no fraud in reality. Similarly, a false negative is a case we predict as not-fraud but that turns out to be fraudulent.

A high false positive rate may result in a bad customer experience (e.g., blocking legitimate transactions) and collateral damage, while a high false negative rate results in higher fraud losses.
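As a toy illustration of those four outcomes, the sketch below builds a confusion matrix with scikit-learn on made-up labels and computes the two rates just mentioned; the data is invented purely for the example.

```python
# Toy confusion matrix for a fraud classifier (1 = fraud, 0 = legitimate).
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]   # actual outcomes
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]   # model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
false_positive_rate = fp / (fp + tn)   # good customers flagged as fraud
false_negative_rate = fn / (fn + tp)   # fraud the model missed

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"False positive rate: {false_positive_rate:.2f}")
print(f"False negative rate: {false_negative_rate:.2f}")
```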

Note that fraud prevalence is normally quite low (and hopefully that is the case at your financial institution), ranging from single- to double-digit basis points of transactions. This creates two challenges. The first is the class imbalance between what is labeled as fraud and what is not, which has to be managed during the model build using methods such as SMOTE. The second is that it is easy to get, say, 99%+ accuracy (the large number of true negatives drives that), so accuracy may not be the best metric for model evaluation. Given the above, institutions need to figure out the appropriate model evaluation metric (e.g., recall, precision, F1 score, AUROC) as well as the score threshold used to define fraud declines, in a way that aligns with their business objectives.
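The sketch below illustrates both points on synthetic data: it oversamples the minority (fraud) class with SMOTE from the imbalanced-learn package and then compares accuracy against the other metrics listed above. The dataset, prevalence, and model choice are assumptions made just for the example.

```python
# Handling class imbalance with SMOTE and evaluating beyond raw accuracy (synthetic data).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
from imblearn.over_sampling import SMOTE

# Roughly 0.5% fraud prevalence, i.e. the basis-point range described above
X, y = make_classification(n_samples=20_000, n_features=20,
                           weights=[0.995, 0.005], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Oversample the minority (fraud) class on the training data only
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_res, y_res)
y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]

print("Accuracy :", accuracy_score(y_test, y_pred))   # high, driven by true negatives
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1       :", f1_score(y_test, y_pred))
print("AUROC    :", roc_auc_score(y_test, y_score))
```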

One also wants to target high Precision (TP/(TP+FP)) for the accounts that get routed to the Investigation team, so that they review a fraud-rich mix of accounts and the Opex involved earns the right ROI.
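One common way to do this is to pick the Investigation-queue score threshold off a precision-recall curve. The sketch below does that on simulated scores; the 80% precision target and the score distributions are illustrative assumptions.

```python
# Choosing a score threshold so the Investigation queue stays fraud-rich (simulated scores).
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.01, size=20_000)                     # ~1% fraud prevalence
y_score = np.clip(rng.normal(0.2 + 0.6 * y_true, 0.15), 0, 1)   # fraud tends to score higher

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

target_precision = 0.80
candidates = np.where(precision[:-1] >= target_precision)[0]
if candidates.size > 0:
    i = candidates[0]   # lowest threshold that still meets the precision target
    print(f"Route to Investigation when score >= {thresholds[i]:.3f} "
          f"(precision {precision[i]:.2f}, recall {recall[i]:.2f})")
else:
    print("No threshold reaches the target precision on this simulated data")
```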

Data Science & Tools in various Fraud solutions:

  • Rule Engines: Institutions leverage rules engines to review new accounts and transactions. These assess the particular transaction against model scores and other variables (e.g., age of account, phone/device, country of the IP and of the transaction, distance, etc.) and determine the right course of action (approve, decline, a Detection trigger for the Investigation team, etc.); a minimal sketch follows this list.
  • Lists: These are very basic defenses against fraud. With this defense, blocklisted entries (accounts, email addresses, device fingerprints, IP addresses) are prevented from transacting, and the associated account may even be closed. Conversely, one may use approval lists for some high-value customers whose particular pattern keeps triggering models incorrectly on a consistent basis. However, these defenses are limited, since they only cover the fraud history the institution has already seen for that list type, and fraudsters keep changing their modus operandi. Devices and IPs can also be worked around with emulators and VPNs, which makes list-based defenses very limited (the rules-engine sketch after this list includes a simple blocklist check).
  • Supervised machine learning: In this family of fraud defenses, institutions build models trained on prior historical fraud data. Sample features in such fraud models include temporal/spectral features (activity in the last X days, potentially with some decay assumptions; frequency of IP/device/transactions; size of transactions/payments, etc.), distance-based features (location per the IP versus the transaction, etc.), event sequences that are typically seen in fraud, and other domain drivers (email quality, type of goods or services purchased, etc.). Feature engineering using domain knowledge is absolutely critical for developing better models. Models can be built using Random Forests, Neural Networks, Gradient Boosting Classifiers, Support Vector Machines, or, in the simplest form, logistic regression (a small feature-engineering sketch follows this list).
  • A limitation of supervised models is that they can be blind to new fraud trends, since they were trained on historical fraud patterns. To reduce this limitation, institutions need to keep updating their fraud models on a faster rebuild cadence, capturing the risk that emerges from any new fraud labeling.
  • Unsupervised machine learning: These models can identify new fraud trends faster, for example by finding other related accounts using deep learning. This is enabled by massive real-time feature engineering (sequence-based activity, graphs of merchants and consumers, etc.) and is expected to be the next holy grail in driving fraud lower (a simple anomaly-detection stand-in is sketched after this list).
  • A limitation of unsupervised machine learning is that it can take more time and more computational power (models that approve or decline accounts or transactions need to return a fraud score within 100–300 milliseconds), which may create execution challenges.
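Below is a minimal sketch of a rules engine combined with a blocklist (the "Lists" defense above). The field names, thresholds, and blocklist contents are illustrative assumptions, not any real institution's rules; in practice these would reference model scores and list services rather than hard-coded constants.

```python
# Minimal rules engine plus blocklist: return the most severe action any rule triggers.
BLOCKLISTED_DEVICES = {"device-123", "device-456"}   # illustrative blocklist

RULES = [
    # (rule name, predicate over the transaction, action when the rule fires)
    ("blocklisted_device", lambda t: t["device_id"] in BLOCKLISTED_DEVICES, "decline"),
    ("high_fraud_score", lambda t: t["fraud_score"] >= 0.90, "decline"),
    ("new_account_large_amount", lambda t: t["account_age_days"] < 7 and t["amount"] > 1_000, "investigate"),
    ("ip_vs_card_country_mismatch", lambda t: t["ip_country"] != t["card_country"], "investigate"),
]

SEVERITY = {"approve": 0, "investigate": 1, "decline": 2}

def evaluate(transaction: dict) -> str:
    """Apply every rule and keep the most severe action (default: approve)."""
    action = "approve"
    for _name, predicate, rule_action in RULES:
        if predicate(transaction) and SEVERITY[rule_action] > SEVERITY[action]:
            action = rule_action
    return action

if __name__ == "__main__":
    txn = {"device_id": "device-999", "fraud_score": 0.55, "account_age_days": 3,
           "amount": 2_500, "ip_country": "RO", "card_country": "US"}
    print(evaluate(txn))   # -> investigate
```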
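For the supervised approach, the sketch below trains a gradient boosting classifier on a few hand-engineered features of the kinds listed above (velocity, amount, distance, account age). The tiny DataFrame and the feature names are made up purely to illustrate the feature-engineering step.

```python
# Supervised fraud model on hand-engineered features (made-up data for illustration).
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

df = pd.DataFrame({
    "amount":             [25, 900, 30, 4500, 12, 60, 3200, 45],
    "txns_last_24h":      [1, 6, 2, 9, 1, 3, 11, 2],            # temporal / velocity feature
    "ip_geo_distance_km": [5, 800, 2, 4200, 10, 15, 3900, 8],    # distance between IP and billing location
    "account_age_days":   [700, 3, 540, 1, 900, 365, 2, 250],
    "is_fraud":           [0, 1, 0, 1, 0, 0, 1, 0],
})

X, y = df.drop(columns="is_fraud"), df["is_fraud"]
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Inspect which engineered features the model leans on
for name, importance in zip(X.columns, model.feature_importances_):
    print(f"{name:20s} {importance:.2f}")
```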
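For the unsupervised side, the approaches mentioned above lean on deep learning and graphs; as the simplest illustrative stand-in, the sketch below uses an Isolation Forest to flag transactions that look anomalous relative to the bulk of traffic, with no fraud labels required. The data is synthetic.

```python
# Unsupervised anomaly detection with Isolation Forest on synthetic transaction features.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Mostly "normal" behaviour: small amounts, low velocity
normal = np.column_stack([rng.normal(50, 20, 2_000),     # amount
                          rng.poisson(2, 2_000)])        # txns in last 24h
# A handful of unusual transactions: large amounts, high velocity
unusual = np.column_stack([rng.normal(3_000, 500, 10),
                           rng.poisson(15, 10)])
X = np.vstack([normal, unusual])

detector = IsolationForest(contamination=0.01, random_state=0).fit(X)
flags = detector.predict(X)            # -1 = anomalous, 1 = normal
print("Flagged as anomalous:", int((flags == -1).sum()), "of", len(X))
```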

Frank would have struggled significantly in today's world, where a check image can be read in real time by a machine and some supervised or unsupervised model would raise red flags about the check right away. Fraud prevention and the whole ecosystem have come a long way since Frank's check-cashing days in the 1960s; however, with the advent of the internet, fraudsters too continue to look for ways to stay a step ahead of the various institutions and policing bodies. Fighting fraud remains an important need for such a connected world.


Shashank R

Seasoned Expert across the full spectrum of Financial Services / FinTech, experienced in Risk, Credit, Payments, Fraud, Collections, Ops strategies