A Data Science Central Community
Original post here: http://sctr7.com/2014/06/27/the-cutting-edge-network-analytics-for-...
The application of network analysis to the growing challenge of fraud and financial crime is a fast-emerging frontier in advanced data analytics. As any good fraud investigator knows, fraud and financial crime are as much deeply social phenomena as they are financial transactions gone awry. Social network analysis can therefore provide deep insights for detecting and preventing tangled and complex cases of fraud.
Fraud is estimated to consume 5% of annual global revenues, a loss of more than €2.6 trillion*. Further, with the rise of globalization and the inexorable advance of communication technology, fraud is growing each year in volume, scope, and sophistication. Traditional fraud detection and mitigation approaches involve highly manual forensics efforts and a ‘roll-up-your-sleeves-and-dig’ approach. However, the growing scale and complexity of fraud schemes mean that fraudsters are increasingly able to circumvent and evade such techniques. The increasingly ineffective strategy of random spot-checking allows profits to be bled from businesses and institutions.
Network analysis addresses the growing challenge of fraud and financial crime. By applying data analytics to networks of relationships and transactions, this approach provides deep insights for detecting and preventing tangled and complex cases of fraud.
Advanced analytics methods such as machine learning are applied to detect fraudulent transactions. With this approach the false positive ratio in fraud detection can be reduced dramatically, resulting in levels of operational efficiency and effectiveness not achievable via traditional fraud detection methods. The results are compelling: firms and agencies that apply machine learning to fraud detection have significantly improved their detection rates.
Along with machine learning, integrated ‘network graph’ analytics can be applied to detect and mitigate fraud. This end-to-end approach results in new insights for detecting and mitigating fraud: storing and retrieving interconnected information in a native ‘network graph’ format, delivering interactive network visualizations to discover hidden structures, locating clusters and patterns, identifying links in transaction chains, and applying specialized network-focused statistical algorithms to identify and extract patterns.
Figure 1 An end-to-end network analytics approach: encoding data in a graph format, producing interactive visualizations, identifying patterns, and conducting statistical analysis.
Fraud is a willful act combining highly social factors: incentives, means, opportunities, and a dose of rationalization. This confluence of enabling factors can be tracked and detected as phenomena that occur in social networks: individuals committing wrongs and breaking rules in highly interconnected webs of trust, institutions, transactions, and exchanges.
Network graph analytics allows for a comprehensive and ‘native’ examination of the world as sets of overlapping networks. The general method works by obtaining seemingly simple heterogeneous datasets describing connections between associated elements. One example is a dataset mixing a large set of tax and banking transaction records, company ownership data, property ownership information, cellphone records, and email exchanges.
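As a rough illustration of the loading step, the sketch below merges a handful of heterogeneous records into one undirected network and groups entities that are linked, directly or indirectly. It is a minimal Python sketch, not a graph database: the records, the `type:name` node labels, and the helper names `build_graph` and `connected_components` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical records: each tuple links two entities through one kind of relation.
records = [
    ("person:Alice", "company:Acme BV", "owns"),
    ("person:Bob", "company:Acme BV", "director_of"),
    ("person:Bob", "phone:+31-600-0001", "uses"),
    ("person:Carol", "phone:+31-600-0001", "uses"),
    ("person:Dave", "account:NL01", "holds"),
]


def build_graph(rows):
    """Return an undirected adjacency map: node -> {neighbour: set of relation labels}."""
    graph = defaultdict(lambda: defaultdict(set))
    for source, target, relation in rows:
        graph[source][target].add(relation)
        graph[target][source].add(relation)
    return graph


def connected_components(graph):
    """Group nodes reachable from one another, using an iterative depth-first search."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            # Every neighbour is itself a key of the graph, so no new keys are created here.
            stack.extend(graph[node].keys() - seen)
        components.append(component)
    return components


graph = build_graph(records)
components = connected_components(graph)
```

With this sample data Alice and Carol land in the same component although no single record connects them: the link runs through a shared company and a shared phone number, which is the kind of indirect association that stays hidden while the datasets are kept apart.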
Figure 2 The pillars of fraud: highly social fraud factors are difficult to identify via structured datasets, but emerge via the agglomeration of data into a network, allowing for latent pattern detection.
By loading such seemingly simple ‘metadata’ into a native network format, insightful and powerful visualizations of hidden patterns and connections in networks of exchanges can be produced. The same approach supports advanced statistical analysis of the network exchanges, identifying ‘normal’ types of transactions and quickly isolating and detecting ‘abnormal’ exchanges.
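One simple way to separate ‘normal’ from ‘abnormal’ exchanges is to score each transaction by how far it sits from the typical value. The sketch below uses the median and the median absolute deviation (MAD), a robust stand-in chosen here for illustration rather than a method named in this article. The transfers, the threshold of 5.0, and the function names are hypothetical.

```python
from statistics import median

# Hypothetical transfers between accounts: (sender, receiver, amount).
transfers = [
    ("A", "B", 120.0), ("A", "C", 95.0), ("B", "C", 110.0),
    ("C", "D", 105.0), ("D", "A", 98.0), ("B", "D", 5000.0),
]


def robust_scores(values):
    """Distance of each value from the median, in units of the median absolute deviation."""
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        # Simplification: with no spread at all, nothing is scored as unusual.
        return [0.0 for _ in values]
    return [abs(v - centre) / mad for v in values]


def flag_abnormal(rows, threshold=5.0):
    """Return the transfers whose amount scores above the (assumed) threshold."""
    scores = robust_scores([amount for _, _, amount in rows])
    return [row for row, score in zip(rows, scores) if score > threshold]


flagged = flag_abnormal(transfers)
```

On this sample only the 5000.0 transfer from B to D is flagged. In practice the score would more plausibly be computed per account, per relationship type, or per neighbourhood in the network rather than over all amounts at once.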
Figure 3 A network representation of email communication patterns preceding the collapse of the U.S. energy trading company Enron. Key actors emerge as central ‘nodes’, along with lesser known facilitators. The same technique can be applied to complex forensics investigations, providing for the detection of communication patterns and identifying key parties. This approach can be enhanced with semantic analytics to highlight key terms and sentiments used in communications.
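The idea of key actors emerging as central ‘nodes’ can be made concrete with degree centrality, one of several possible centrality measures. The sketch below is a minimal version under stated assumptions: the sender and recipient pairs are invented and are not taken from the Enron data, and `degree_centrality` is an illustrative helper rather than a library call.

```python
# Hypothetical (sender, recipient) pairs; not drawn from any real email corpus.
emails = [
    ("ann", "bob"), ("ann", "cat"), ("ann", "dan"),
    ("bob", "ann"), ("cat", "ann"), ("dan", "eve"), ("eve", "ann"),
]


def degree_centrality(pairs):
    """Fraction of the other participants that each person exchanges mail with."""
    neighbours = {}
    for sender, recipient in pairs:
        if sender == recipient:
            continue  # ignore mail sent to oneself
        neighbours.setdefault(sender, set()).add(recipient)
        neighbours.setdefault(recipient, set()).add(sender)
    others = len(neighbours) - 1
    if others <= 0:
        return {}
    return {person: len(contacts) / others for person, contacts in neighbours.items()}


centrality = degree_centrality(emails)
ranking = sorted(centrality, key=centrality.get, reverse=True)
```

In this toy network ann ranks first because she is in contact with all four other participants, while dan and eve, who each talk to two people, sit in the middle. Measures such as betweenness may be better suited to surfacing the lesser-known facilitators mentioned in the caption.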
Traditional SQL-based relational database management system (RDBMS) approaches have inherent limitations in storing and extracting highly interconnected information. While a powerful method for ensuring data integrity and retrieving structured information, RDBMS solutions struggle when attempting to represent networks. For example, identifying chains of friends-of-friends-of-friends, a common feature of social networking sites such as Facebook and LinkedIn, is best served by Not-Only-SQL (NoSQL) solutions such as graph databases.
SQL QUERY: Who co-owns a house with a friend-of-a-friend?
SELECT [1Person].Person_Name, [2Address].Address, [2Address].City, [2Address].Country,
       [4Friend].ContactName, [2Address_2].Address, [2Address_2].City, [2Address_2].Country,
       [6Friend_of_Friend].ContactName, [2Address_1].Address, [2Address_1].City, [2Address_1].Country
FROM (((6Friend_of_Friend
  INNER JOIN ((4Friend
    INNER JOIN (1Person INNER JOIN 3Person_Friends ON [1Person].Person_Key = [3Person_Friends].Person)
      ON [4Friend].Person_Key = [3Person_Friends].Friend)
    INNER JOIN 5Friend_Friends ON [4Friend].Person_Key = [5Friend_Friends].Person)
  ON [6Friend_of_Friend].Person_Key = [5Friend_Friends].Friend)
  INNER JOIN 2Address AS 2Address_1 ON [6Friend_of_Friend].Person_Key = [2Address_1].Person_ForKey)
  INNER JOIN 2Address AS 2Address_2 ON [4Friend].Person_Key = [2Address_2].Person_ForKey)
  INNER JOIN 2Address ON [1Person].Person_Key = [2Address].Person_ForKey
ORDER BY [1Person].Person_Name, [4Friend].ContactName;
Figure 4 Seemingly straight-forward questions, such as “who co-owns a house with a friend-of-a-friend” become quickly complex and computationally intensive when using relational databases.
NoSQL graph databases store and retrieve data in a native network format. With network-native storage, management, and retrieval in place, advanced network analytics can be applied to quickly detect potential fraud. Techniques include advanced network pattern discovery, cluster analysis, applied graph mathematics, and the identification, retrieval, and statistical analysis of transaction chains.
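To show why a network-native representation suits the question posed in Figure 4, the sketch below answers “who co-owns a house with a friend-of-a-friend” by walking two friendship hops directly. It is plain Python over in-memory dictionaries, not a graph database query. The people, addresses, and function names are hypothetical, and ‘friend-of-a-friend’ is assumed to mean someone exactly two hops away who is not also a direct friend.

```python
# Hypothetical data: who is friends with whom, and who owns which property.
friends = {
    "Ann": {"Bob"},
    "Bob": {"Ann", "Cy"},
    "Cy": {"Bob", "Dee"},
    "Dee": {"Cy"},
}
owners_by_address = {
    "1 Canal St": {"Ann", "Cy"},
    "9 Dam Sq": {"Bob"},
    "4 Mill Ln": {"Ann", "Dee"},
}


def friends_of_friends(person, friendships):
    """People two friendship hops away, excluding the person and their direct friends."""
    direct = friendships.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= friendships.get(friend, set())
    return two_hops - direct - {person}


def co_owners_with_fof(friendships, ownership):
    """Return sorted (person, friend_of_friend, address) triples for shared properties."""
    matches = set()
    for address, owners in ownership.items():
        for person in owners:
            for other in friends_of_friends(person, friendships) & owners:
                matches.add((person, other, address))
    return sorted(matches)


matches = co_owners_with_fof(friends, owners_by_address)
```

Here the only hit is 1 Canal St, co-owned by Ann and Cy, who are connected through Bob. Ann and Dee also share a property but are three hops apart, so they are not reported. Extending the search to more hops means repeating the neighbour lookup, rather than adding another layer of joins.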
This approach can be used to detect possible tax fraud schemes, EU VAT carousel fraud for instance, given a set of tax posting, invoicing, banking transaction, and company ownership data supplemented with select third-party data (credit risk and criminal records, for instance). The same approach can be applied to credit card fraud risk. Each additional dataset is ‘layered’ onto the base network, creating increasingly rich patterns. The approach is also useful in identifying structural weaknesses and areas where increased monitoring and control should be applied.
Figure 5 An example of a particular EU cross-border tax fraud scheme encoded as a basic network pattern. Searches can be quickly conducted across very large datasets based on known patterns. As well, discovery-focused statistical analysis can be conducted to identify unusual patterns appropriate for follow-up investigation.
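Searching for a known pattern can be sketched as a search for a specific shape in the invoice network. The example below looks for closed chains of invoices that cross at least one border. This is a deliberately simplified stand-in, since the exact pattern shown in Figure 5 is not spelled out in the text. The company names, the country prefix convention, and the `max_length` limit are all assumptions.

```python
# Hypothetical invoices as (seller, buyer); the prefix before "_" stands for the country.
invoices = [
    ("NL_Trader", "DE_Buffer"), ("DE_Buffer", "DE_Broker"),
    ("DE_Broker", "NL_Trader"), ("DE_Broker", "FR_Retail"),
    ("FR_Retail", "FR_Shop"),
]
country = {name: name.split("_")[0] for pair in invoices for name in pair}


def find_cycles(edges, max_length=5):
    """Enumerate simple directed cycles of up to max_length nodes by depth-first search."""
    successors = {}
    for seller, buyer in edges:
        successors.setdefault(seller, set()).add(buyer)
    cycles = set()

    def walk(start, node, path):
        for nxt in successors.get(node, ()):
            if nxt == start:
                cycles.add(tuple(path))
            # Only visit names sorting after the start, so each cycle is
            # reported once, beginning at its alphabetically smallest member.
            elif nxt > start and nxt not in path and len(path) < max_length:
                walk(start, nxt, path + [nxt])

    for start in successors:
        walk(start, start, [start])
    return sorted(cycles)


def cross_border_cycles(edges, countries):
    """Keep only the cycles whose members sit in more than one country."""
    return [c for c in find_cycles(edges) if len({countries[n] for n in c}) > 1]


suspicious = cross_border_cycles(invoices, country)
```

On the sample data this returns a single chain running DE_Broker, NL_Trader, DE_Buffer and back to DE_Broker. A circular flow of this kind is only a candidate for follow-up investigation, not proof of fraud, and an exhaustive search like this one would need pruning or a graph database index on very large datasets.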
At the leading edge, simulations can be run on network graph data. Once a particular market or set of transactions is sufficiently represented as a network phenomenon, simulations can be run to better understand the nature of the market or phenomenon under investigation.
Figure 6 Multi-agent simulation utilizing network data can be used to examine the dynamic nature of networks, for instance to identify potential structural weaknesses in financial control or compliance systems based on game theory models of behavior.
Techniques such as multi-agent simulation of game theory scenarios can thus be applied to understand structural weaknesses in controls, markets, or transaction chains. For instance, by modeling a large trading operation as a network of transactions, trust, and incentives, trading operations can be simulated in order to detect and better understand the risk of trading fraud, a persistent problem causing ever-spiraling financial institutional losses.
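A toy version of such a simulation is sketched below. Each trader on a small network weighs an expected gain from breaking the rules against an expected penalty, and is nudged by the share of peers already doing so. Every number and name here is a hypothetical assumption (the payoffs, the `peer_weight`, the random ‘integrity’ cost, the four-trader desk); the model is meant only to show the mechanics, not to reproduce the model behind Figure 6.

```python
import random


def simulate(neighbours, detection_prob, gain=10.0, penalty=30.0,
             peer_weight=5.0, rounds=10, seed=42):
    """Return the set of agents breaking the rules once behaviour stops changing."""
    rng = random.Random(seed)
    # Each agent gets a fixed, hypothetical personal cost of breaking the rules.
    integrity = {agent: rng.uniform(0.0, 10.0) for agent in sorted(neighbours)}
    # Expected payoff of cheating: gain if undetected, penalty if caught.
    expected = (1 - detection_prob) * gain - detection_prob * penalty
    cheating = set()
    for _ in range(rounds):
        updated = set()
        for agent, peers in neighbours.items():
            peer_share = len(peers & cheating) / len(peers) if peers else 0.0
            # Cheating peers make rule-breaking easier to rationalise.
            if expected + peer_weight * peer_share > integrity[agent]:
                updated.add(agent)
        if updated == cheating:
            break
        cheating = updated
    return cheating


# Hypothetical trading desk: who works closely with whom.
desk = {
    "t1": {"t2", "t3"},
    "t2": {"t1", "t3"},
    "t3": {"t1", "t2", "t4"},
    "t4": {"t3"},
}
lax = simulate(desk, detection_prob=0.0)
strict = simulate(desk, detection_prob=0.5)
```

With no chance of detection every trader in this toy model ends up cheating, while at a detection probability of 0.5 the expected payoff is negative enough that nobody does. Sweeping `detection_prob` between those extremes, or varying it per agent, is one way to probe where controls are structurally weak.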
WANT TO KNOW MORE? RECENT ACFE PRESENTATION ON ADVANCED ANALYTICS FOR FRAUD DETECTION AND MITIGATION
* Source: ACFE ‘Report to the Nations 2012 Global Fraud Study’
ABOUT THE AUTHOR
Scott Mongeau, MA MA GD MBA PhD (ABD) Analytics Manager, Risk Services Deloitte Netherlands
Scott Mongeau, Analytics Manager at Deloitte, has more than 20 years of experience in project-focused analytics functions in a range of industries. He is an active university researcher, lecturer, conference presenter and writer in the areas of data analytics, fraud analytics, and social network analysis (SNA).