A Data Science Central Community
Fraud detection is all about connecting the dots. We are going to see how to use graph analysis to identify stolen credit cards and fake identities. For this article we worked with Ralf Becher of irregular.bi. Ralf is a Qlik Luminary and provides solutions that integrate the graph approach into Business Intelligence tools like QlikView and Qlik Sense.
Third party fraud occurs when a criminal uses someone else’s identity to commit fraud. For a typical retail operation this takes the form of individuals or groups of individuals using stolen credit cards to purchase high-value items.
Fighting it is a challenge. It requires the ability to detect potential fraud cases in large datasets and to distinguish real cases from false positives (cases that look suspicious but are legitimate).
Traditional fraud detection systems focus on thresholds applied to each customer’s activity. Suspicious activities include, for example, multiple purchases of the same product or a high number of transactions per person or per credit card.
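As an illustration of the threshold approach, here is a minimal Python sketch. The transactions, the `flag_over_threshold` helper and the threshold value are all hypothetical and are not taken from the dataset used in this article.

```python
from collections import Counter

# Hypothetical transactions: (person, credit_card, product).
# These values are made up for illustration only.
TRANSACTIONS = [
    ("person_a", "card_1", "laptop"),
    ("person_a", "card_1", "laptop"),
    ("person_a", "card_1", "laptop"),
    ("person_b", "card_2", "phone"),
    ("person_c", "card_3", "camera"),
    ("person_c", "card_3", "tablet"),
]


def flag_over_threshold(transactions, key_index, max_allowed):
    """Return the keys (person, card, ...) seen in more than max_allowed transactions."""
    counts = Counter(t[key_index] for t in transactions)
    return sorted(key for key, n in counts.items() if n > max_allowed)


if __name__ == "__main__":
    # A threshold of 2 is an arbitrary illustrative choice.
    print(flag_over_threshold(TRANSACTIONS, key_index=0, max_allowed=2))  # per person
    print(flag_over_threshold(TRANSACTIONS, key_index=1, max_allowed=2))  # per credit card
```

A rule like this looks at each person or card in isolation. It says nothing about whether two customers who each stay under the threshold are connected to one another.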
Graph analysis can add an extra layer of security by focusing on the relationships between fraudsters or fraud cases. It helps identify fraud cases that would otherwise go undetected until it is too late. We recently explained how to use graph analysis to identify stolen credit cards.
For this article, we have prepared a dummy dataset typical of an online retail operation. It includes orders, the people who placed them, and the personal information they used: email addresses, phone numbers, IP addresses, credit cards and postal addresses.
To analyse the connections in our data, we stored it in Neo4j, a leading graph database. The graph approach consists of modelling data as nodes and edges. Here is a schema of our data represented as a graph:
You can download the data here.
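To make the idea of nodes and edges concrete without a database, here is a small Python sketch. The relationship names (`ORDERED`, `USED_EMAIL`, `USED_IP`) mirror those used in the Cypher queries in this article. The node values and the `build_adjacency` helper are hypothetical, and the sketch assumes that personal information is attached to the person who used it.

```python
from collections import defaultdict

# Each edge is (source node, relationship type, target node).
# Node identifiers are invented for illustration.
EDGES = [
    ("person:alice", "ORDERED", "order:1001"),
    ("person:alice", "USED_EMAIL", "email:alice@example.com"),
    ("person:alice", "USED_IP", "ip:192.0.2.10"),
    ("person:bob", "ORDERED", "order:1002"),
    ("person:bob", "USED_EMAIL", "email:alice@example.com"),
]


def build_adjacency(edges):
    """Index the edge list as an undirected adjacency map: node -> set of neighbours."""
    adjacency = defaultdict(set)
    for source, _relationship, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


if __name__ == "__main__":
    graph = build_adjacency(EDGES)
    # Everyone connected to the same email node shows up as a neighbour of it.
    print(sorted(graph["email:alice@example.com"]))
```

Once the data is in this shape, a question such as "who else used this email?" becomes a single neighbour lookup. That is the kind of question a graph database is designed to answer.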
Now that the data is stored in Neo4j, we can analyse it.
First of all we need to set a benchmark for what’s normal. Here is an example of a transaction:
Now that we have an idea of what not to look for, we can start thinking about patterns specifically associated with fraud. One such pattern is a piece of personal information (IP address, email, credit card, address) associated with multiple people.
Neo4j includes a graph query language called Cypher that allows us to detect such a pattern. Here is how to do it:
//-----------------------
//Detect fraud pattern
//-----------------------
MATCH (order:Order)<-[:ORDERED]-(person:Person)
MATCH (order)-[]-(fact)
WITH fact, collect(order) as orders, collect(distinct person) as people
WHERE size(orders) > 1 and size(people) > 1
RETURN fact, orders, people
LIMIT 20
This query searches for shared pieces of personal information. It returns every group of at least two people and two orders connected by a common piece of personal information.
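The same grouping logic can be sketched in plain Python to show what the two conditions in the WHERE clause do. This is an in-memory approximation, not a replacement for the Cypher query. The `ORDERS` records and the `shared_facts` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical orders: an id, the person who placed it, and the
# pieces of personal information ("facts") attached to it.
ORDERS = [
    {"order": "o1", "person": "person_a", "facts": ["email:shared@example.com", "ip:192.0.2.1"]},
    {"order": "o2", "person": "person_b", "facts": ["email:shared@example.com", "ip:192.0.2.2"]},
    {"order": "o3", "person": "person_c", "facts": ["email:c@example.com", "ip:192.0.2.3"]},
    {"order": "o4", "person": "person_c", "facts": ["email:c@example.com", "ip:192.0.2.3"]},
]


def shared_facts(orders):
    """Return facts linked to more than one order AND more than one distinct person."""
    orders_by_fact = defaultdict(list)
    people_by_fact = defaultdict(set)
    for row in orders:
        for fact in row["facts"]:
            orders_by_fact[fact].append(row["order"])
            people_by_fact[fact].add(row["person"])
    return {
        fact: {"orders": orders_by_fact[fact], "people": sorted(people_by_fact[fact])}
        for fact in orders_by_fact
        if len(orders_by_fact[fact]) > 1 and len(people_by_fact[fact]) > 1
    }


if __name__ == "__main__":
    for fact, group in shared_facts(ORDERS).items():
        print(fact, group)
```

In this sample, person_c places two orders with the same email and IP. That is a returning customer, not a shared identity, so the "more than one person" condition filters it out and only the email used by both person_a and person_b is reported.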
To verify the accuracy of our query, fine-tune it or evaluate how to act on the alerts it returns, we will use graph visualization.
The address [email protected] (center) is shared by 3 people (purple nodes)
Here we can see that 3 people are sharing the same email address. Are we looking at a potential fraud? If we expand the graph, we can see that these 3 people have distinct addresses, IPs, phones and credit cards.
Data associated with the 3 distinct people using [email protected]
In isolation, each of these people looks normal. Edmund Cagliostro, for example, seems like a legitimate customer.
The fact that these seemingly distinct accounts share a common email address is suspicious. It justifies further investigation of Edmund Cagliostro and his connections.
Our query also reveals an IP address shared by multiple persons.
An IP address (center) with connections to 5 persons (purple) and orders (orange)
We can see that IP address 0.106.244.75 is shared by 5 people. Once again this is suspicious and should be investigated.
Graph visualization can help us inspect potential fraud cases and quickly evaluate them.
Now that we have found a couple of suspicious fraud cases, it’s time to dig deeper. We want to assess the full impact of an individual fraud to take appropriate actions.
Let’s say we noticed in our dummy dataset that a “Leisa Gugliotta” is involved in a fraud. Not only do we want to block any transactions from her but we also need to identify her potential accomplices. In order to do that, we need to see who else is using the personal information used by Leisa Gugliotta.
Here is how to do that via Cypher:
//-----------------------
//Who are Leisa’s accomplices?
//-----------------------
MATCH (suspect:Person {full_name:"Leisa Gugliotta"})
MATCH (fact)<-[:USED_EMAIL|USED_PHONE|USED_IP|USED_CREDIT_CARD|USED_ADDRESS]-(suspect)
MATCH (fact)<-[:USED_EMAIL|USED_PHONE|USED_IP|USED_CREDIT_CARD|USED_ADDRESS]-(other)
WHERE suspect <> other
RETURN suspect, other, collect(distinct fact) as facts
LIMIT 20
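For readers who want to follow the logic step by step, here is a rough Python equivalent of that lookup. The `USAGE` pairs and the `find_accomplices` helper are hypothetical, and the sample values are invented rather than taken from the dummy dataset.

```python
from collections import defaultdict

# Hypothetical (person, fact) pairs: who used which piece of personal information.
USAGE = [
    ("suspect", "card:1111"),
    ("suspect", "email:ring@example.com"),
    ("person_x", "card:1111"),
    ("person_y", "email:ring@example.com"),
    ("person_z", "email:ring@example.com"),
    ("person_w", "email:w@example.com"),
]


def find_accomplices(usage, suspect):
    """Map every other person to the facts they share with the suspect."""
    # Step 1: collect everything the suspect has used.
    suspect_facts = {fact for person, fact in usage if person == suspect}
    # Step 2: find other people who used any of the same facts.
    shared = defaultdict(set)
    for person, fact in usage:
        if person != suspect and fact in suspect_facts:
            shared[person].add(fact)
    return {person: sorted(facts) for person, facts in shared.items()}


if __name__ == "__main__":
    for person, facts in sorted(find_accomplices(USAGE, "suspect").items()):
        print(person, facts)
```

The two steps correspond to the second and third MATCH clauses of the query: first gather the suspect's facts, then walk back from each fact to anyone else who used it. In this sample, person_w shares nothing with the suspect and is left out.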
We can run the same analysis via Linkurious. The result is the following graph:
This picture makes it easy to see that our retail operation has been targeted by a fraud ring. Leisa Gugliotta shares a credit card with one other person and an email address with 4 people. These fraudsters can all be identified by the connections between them. Now we can freeze their accounts and add their information to our blacklist.
Third party fraud means that pieces of personal information are reused to create fake identities (known as synthetic identities). Graph analysis makes it possible to spot that pattern and prevent fraud. Through graph visualization, we can quickly evaluate potential fraud cases and make informed decisions. Try Linkurious now to learn more!
Comment
Great article - having been a victim of ID theft once, and stolen CC#s at least 3 times, this is great work. Kudos!
© 2021 TechTarget, Inc.