Fraud detection using machine learning
Almost everyone has received a phone call or a text message from their financial institution after attempting a
financial transaction that looked out of the ordinary. Most of the time no bad actor is involved and the
transaction was attempted by you. So other than a minor annoyance and a delay in completing your transaction,
all is well.
Ever wondered what goes on behind the scenes there? What we typically have
is a rule-based way of working, similar to a risk underwriter who takes a whole range of variables
into consideration. But this rule-based approach has a set of drawbacks:
Rigid definition
Once a rule has been defined, it stays the same until a periodic, human-driven review process
changes it.
Maintainability and relevance
The very fact that we need humans to keep the rules up to date makes the system maintenance-intensive
and leaves it open to being outpaced by bad actors.
More false than true alerts
From practical experience, you have probably noticed that genuine fraud alerts are rare, often close
to zero, and that nearly all the alerts you receive are for legitimate transactions you attempted yourself.
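To make these drawbacks concrete, here is a minimal sketch of a hard-coded rule and of how one might measure the share of its alerts that are real. The threshold, the home country, and the transaction history are all made up for illustration; a real institution's rules would be far more elaborate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transaction:
    amount: float   # transaction amount
    country: str    # country the transaction originated from
    is_fraud: bool  # ground truth, known only in hindsight


# Hypothetical rule parameters: they stay fixed until a person edits them.
AMOUNT_LIMIT = 1000.0
HOME_COUNTRY = "US"


def rule_based_alert(tx: Transaction) -> bool:
    """Raise an alert for large or foreign transactions."""
    return tx.amount > AMOUNT_LIMIT or tx.country != HOME_COUNTRY


def alert_precision(transactions: List[Transaction]) -> float:
    """Share of raised alerts that turned out to be real fraud."""
    alerts = [tx for tx in transactions if rule_based_alert(tx)]
    if not alerts:
        return 0.0
    return sum(tx.is_fraud for tx in alerts) / len(alerts)


# Made-up history, for illustration only.
history = [
    Transaction(1500.0, "US", False),  # large but legitimate purchase
    Transaction(40.0, "FR", False),    # small purchase while travelling
    Transaction(2200.0, "US", False),  # another large legitimate purchase
    Transaction(900.0, "US", True),    # fraud that stays just under the limit
    Transaction(3000.0, "BR", True),   # fraud that the rule catches
    Transaction(25.0, "US", False),    # everyday purchase
]

precision = alert_precision(history)
print(f"Share of alerts that were real fraud: {precision:.2f}")
```

In this toy history most alerts are false alarms, and a fraudster who keeps the amount just under the fixed limit is never flagged, which is exactly the rigidity described above.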
Machine learning could be a good candidate for this use case.
Why machine learning?
We can take every shortcoming of a rule-based
approach and turn it on its head:
Fluid definition
Machine learning models learn patterns from data and can be retrained quickly as those patterns
change. The fact that these models have huge amounts of transaction data at their disposal works in their favor.
Easier to maintain
Because humans no longer have to update every rule by hand, the entire system becomes easier to maintain.
More true alerts
The underlying mechanisms of a machine learning model, namely more training data and better
statistical algorithms to go with it, give it a leg up, so the result is typically much better
performance.
How would we apply machine learning in this case?
It turns out the process is no different from building any conventional ML model:
Gather data
Collect a good amount of data - appropriately labeled - meaning a set of
financial transactions labeled as legitimate and a set of transactions labeled as fraudulent.
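As a rough sketch of what such labeled data might look like, the snippet below builds a synthetic set of records and splits it into training and test sets while keeping both labels represented in each. The record layout, the 96-to-4 class ratio, and the helper name `stratified_split` are assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple

# Each record is (features, label); label 1 = fraudulent, 0 = legitimate.
Record = Tuple[Dict[str, float], int]


def stratified_split(
    records: List[Record], test_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Split records so that each label appears in both train and test sets."""
    rng = random.Random(seed)
    train: List[Record] = []
    test: List[Record] = []
    for label in (0, 1):
        group = [r for r in records if r[1] == label]
        rng.shuffle(group)
        # Keep at least one example of each label in the test set.
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Synthetic data: fraud is rare, so the two classes are heavily imbalanced.
records: List[Record] = (
    [({"amount": 20.0 + i}, 0) for i in range(96)]
    + [({"amount": 900.0 + i}, 1) for i in range(4)]
)

train, test = stratified_split(records)
print(len(train), "training records,", len(test), "test records")
```

Splitting per label matters here because fraud is rare: a purely random split could easily leave the test set with no fraudulent examples at all.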
Feature engineering
Next is the feature engineering step, where we decide which features really matter.
We can group them into two kinds: features pertaining to properties of the
transaction, such as identity and location, and features pertaining to customer
behavior, such as frequency and type of orders. To define these a little better:
- Identity: This could be any variable which uniquely identifies a consumer, for example a cell phone number.
- Orders: Nature and type of orders in terms of kind and quantity of products bought.
- Payment methods: Whether payment is by credit card, debit card, PayPal, Venmo, or another digital wallet.
- Locations: Usually derived from the IP address of the machine. Because a VPN connection can mask the true location, care must be taken here.
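The feature groups above could be turned into a numeric vector roughly as follows. This is only a sketch: the dictionary keys (`phone`, `amount`, `payment_method`, `ip_country`, `home_country`) and the chosen features are hypothetical, not a prescribed schema.

```python
from typing import Dict, List

# Hypothetical list of payment methods used for one-hot encoding.
PAYMENT_METHODS = ["credit", "debit", "paypal", "venmo", "wallet"]


def build_features(tx: Dict, history: List[Dict]) -> List[float]:
    """Turn one raw transaction plus past orders into a numeric feature vector."""
    # Identity: use the phone number to find this customer's past orders.
    past = [h for h in history if h["phone"] == tx["phone"]]

    # Customer-behavior features: order frequency and typical order size.
    order_count = float(len(past))
    avg_amount = sum(h["amount"] for h in past) / len(past) if past else 0.0
    # How unusual is this amount for the customer? Defaults to 1.0 with no history.
    amount_ratio = tx["amount"] / avg_amount if avg_amount > 0 else 1.0

    # Transaction-property features: location mismatch and payment method.
    location_mismatch = 1.0 if tx["ip_country"] != tx["home_country"] else 0.0
    payment_one_hot = [
        1.0 if tx["payment_method"] == method else 0.0 for method in PAYMENT_METHODS
    ]

    return [order_count, amount_ratio, location_mismatch] + payment_one_hot


# Made-up order history and a new transaction to score.
past_orders = [
    {"phone": "555-0100", "amount": 50.0},
    {"phone": "555-0100", "amount": 150.0},
    {"phone": "555-0199", "amount": 999.0},  # a different customer
]
new_tx = {
    "phone": "555-0100",
    "amount": 400.0,
    "payment_method": "paypal",
    "ip_country": "RO",
    "home_country": "US",
}

features = build_features(new_tx, past_orders)
print(features)
```

Since the IP-derived location can be masked by a VPN, the location mismatch flag is best treated as one weak signal among several rather than as a deciding factor.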
Train algorithm
The next logical step is to train an algorithm, that is, to find the model parameters that minimize
a cost (loss) function on the training data.
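As an illustration of what minimizing a cost function means in practice, here is a minimal logistic regression trained by plain gradient descent on the log loss. The two features, the toy data, the learning rate, and the number of epochs are all assumptions chosen for the example; production systems would normally rely on an established library instead.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights: List[float], bias: float, x: List[float]) -> float:
    """Estimated probability that a transaction is fraudulent."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def log_loss(weights: List[float], bias: float, X, y) -> float:
    """Average cross-entropy: the cost that training tries to minimize."""
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict(weights, bias, xi)
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(X)


def train(X, y, lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Batch gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict(weights, bias, xi) - yi
            for j, v in enumerate(xi):
                grad_w[j] += err * v / n
            grad_b += err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Toy data: [amount relative to customer's average, location mismatch flag].
X = [[0.9, 0], [1.1, 0], [1.0, 0], [1.2, 1], [4.0, 1], [5.0, 1], [3.5, 0], [0.8, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 0]  # 1 = fraudulent, 0 = legitimate

initial_loss = log_loss([0.0, 0.0], 0.0, X, y)
weights, bias = train(X, y)
final_loss = log_loss(weights, bias, X, y)
print(f"log loss before training: {initial_loss:.3f}, after: {final_loss:.3f}")
```

The cost drops as training proceeds, which is all that "least cost" means here: the parameters are adjusted until the model's predictions disagree with the labels as little as possible.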
To get a little more technical, the machine learning techniques used here fall into a few kinds:
- Tree-based: Here we are referring to models such as Random Forest and XGBoost, which perform well on many kinds of transaction datasets.
- Neural Networks: These are gaining in popularity and are often used in conjunction with anomaly detection, which brings us to the next techniques.
- Clustering: Data points lying close to the centroid of a cluster are considered safe, while outliers are flagged as potentially fraudulent.
- Nearest Neighbor: Normal data points occur in close proximity to one another, while anomalous data points lie far from their nearest neighbors.
- Classification: Where we use supervised learning on labeled data to distinguish legitimate from fraudulent transactions.
- Statistical: Where a well-defined stochastic model is fitted to the data, so that normal data points fall in high-probability regions of the model and abnormal data points fall in low-probability regions.
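To illustrate one of these, the nearest-neighbor idea can be sketched in a few lines: score each point by its average distance to its k closest neighbors and flag the points with unusually large scores. The sample points, the choice of k, and the cutoff of three times the median score are assumptions made for this example, not recommended settings.

```python
import math
import statistics
from typing import List, Sequence


def knn_anomaly_scores(points: Sequence[Sequence[float]], k: int = 2) -> List[float]:
    """Average distance from each point to its k nearest neighbors."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Made-up, already-scaled feature vectors; the last one sits far from the rest.
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (8.0, 9.0)]

scores = knn_anomaly_scores(points, k=2)

# Hypothetical cutoff: flag anything scoring above three times the median score.
threshold = 3 * statistics.median(scores)
flagged = [i for i, s in enumerate(scores) if s > threshold]
print("Indices flagged as anomalous:", flagged)
```

Distance-based scores like this depend heavily on how the features are scaled, so in practice the inputs would need to be normalized first.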