A Data Science Central Community
For quite some time, Intelligence Analysts in key government agencies have used these four techniques to reveal significant insights hidden beneath layers of disparate data. Today, they can be applied to your business. Just follow these guidelines.
Whether you are a fraud analyst, cyber security expert or social networking analyst, you'll reveal essential information to manage your business when you focus on:
1. Data Preparation & Data Connectivity
2. Data Profiling
3. Advanced Analysis Using Relationship Graphs
4. Annotation, Collaboration and Presentation
For this blog entry, let's start with Data Profiling. In future posts, we will cover the other three topics.
During the data profiling stage, the analyst should focus on four stages of profiling: (a) initial data profiles to better understand the data, (b) identification of data anomalies that reveal important insights, (c) a list of questions leading to further analysis, and (d) development of a hypothesis that can be resolved later in the analysis.
To make this concrete, let's use an example from fraud analysis, although the same process can be applied to any set of data. Every day, fraud alerts fire across banking business lines. It's the job of the fraud analyst to profile these alerts to determine which need additional investigation. Here are a few profiles that the analyst generates right away:
In essence, the larger bubbles show larger concentrations of fraud alerts in the Know-Your-Customer, Loan, Checking and Credit Card business lines. Let's explore another profile that focuses on the amount of money at risk to the bank:
Money at Risk by Banking Business Line and Alert Type
Almost all of the money at risk is within the loan business lines.
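The two profiles above boil down to a count of alerts and a sum of money at risk, grouped by business line and alert type. Here is a minimal sketch in Python of how such profiles might be computed. The records, alert type names and dollar amounts are invented for illustration and are not taken from the bank data described in this post.

```python
from collections import Counter, defaultdict

# Hypothetical alert records: (business_line, alert_type, money_at_risk).
# All values are made up for illustration.
alerts = [
    ("Loan", "Income Mismatch", 250_000.0),
    ("Loan", "Income Mismatch", 180_000.0),
    ("Loan", "Missing Documents", 320_000.0),
    ("Checking", "Unusual Withdrawal", 4_000.0),
    ("Credit Card", "Unusual Purchase", 1_500.0),
    ("Know-Your-Customer", "Identity Mismatch", 0.0),
]


def profile_alerts(records):
    """Return (alert counts per line and type, money at risk per line)."""
    counts = Counter()
    money = defaultdict(float)
    for line, alert_type, amount in records:
        counts[(line, alert_type)] += 1   # size of each "bubble"
        money[line] += amount             # money at risk per business line
    return counts, dict(money)


counts, money_by_line = profile_alerts(alerts)
total = sum(money_by_line.values())

# Print business lines from most to least money at risk, with their share.
for line, amount in sorted(money_by_line.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{line:20s} {amount:12,.0f} {amount / total:6.1%}")
```

With data shaped like this, a single business line that dominates the total stands out immediately, which is the kind of anomaly the profile is meant to expose.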
When the fraud analyst looks at loan officer data, a new profile reveals that one or two loan officers have more fraud alerts linked to them than others. It also reveals that many of the loans do not have any loan officer assigned. Here's the profile:
Sum of Money at Risk by Loan Officer and Alert Name
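A profile like this one only exposes loans without an officer if missing values are kept as their own group rather than silently dropped. The sketch below shows one way to do that in Python. The officer labels, alert names, amounts and the "Unassigned" label are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical loan alerts: (loan_officer, alert_name, money_at_risk).
# A missing officer appears as None or an empty string. Values are made up.
loan_alerts = [
    ("Officer A", "Income Mismatch", 250_000.0),
    ("Officer A", "Missing Documents", 320_000.0),
    ("Officer B", "Income Mismatch", 180_000.0),
    (None, "Income Mismatch", 90_000.0),
    (None, "Missing Documents", 60_000.0),
    ("", "Missing Documents", 40_000.0),
]


def money_by_officer(records, missing_label="Unassigned"):
    """Sum money at risk by officer and alert name, keeping missing officers."""
    totals = defaultdict(lambda: defaultdict(float))
    for officer, alert_name, amount in records:
        # Treat None and blank strings as one explicit "Unassigned" group.
        key = officer.strip() if officer and officer.strip() else missing_label
        totals[key][alert_name] += amount
    return {name: dict(by_alert) for name, by_alert in totals.items()}


profile = money_by_officer(loan_alerts)
for officer, by_alert in profile.items():
    print(officer, by_alert)
```

Keeping the unassigned loans visible turns a data quality gap into a finding the analyst can question later.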
This short set of data visualization profiles allows the analyst to become familiar with the data and identify data anomalies that require additional analysis. From this point, the analyst starts to list a set of questions which can be resolved during the advanced analysis phases. Here is a short sample:
Typically, the analyst will use additional profiles to expand on the set of questions. Once this step is completed, the fraud analyst develops a hypothesis for the rest of the analysis. Using a variety of additional profiles that show a disproportionate number of alerts by branch of the bank and loan officer, the analyst decides to investigate the customers linked to specific officers in the California, Florida and DC branches. Loan officers Charles Head and Jack Carnahan are assigned to these branches. Each has many fraud alerts linked to their loans. The analyst hypothesizes that some form of collusion associated with non-standard lending practices may be taking place.
In this short example, a set of data visualization profiles is developed and then used to compile a set of questions requiring additional analysis. Data anomalies are revealed using the profiles. A hypothesis is developed which needs to be addressed in more advanced analysis. We will explore this important step in our next post.
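One simple way to surface a disproportionate number of alerts by branch and loan officer is to compare each officer's alert count with a baseline such as the median. The Python sketch below is only an illustration of that idea. The branch and officer labels are invented, and the rule of flagging anything above twice the median is an assumed threshold, not a method described in this post.

```python
from collections import Counter
from statistics import median

# Hypothetical alert log: one (branch, loan_officer) pair per fraud alert.
alert_log = (
    [("California", "Officer A")] * 6
    + [("Florida", "Officer B")] * 5
    + [("DC", "Officer C")] * 1
    + [("Texas", "Officer D")] * 2
    + [("Ohio", "Officer E")] * 1
)


def flag_outliers(records, factor=2.0):
    """Return (branch, officer) pairs whose alert count exceeds factor * median."""
    counts = Counter(records)
    baseline = median(counts.values())
    return sorted(pair for pair, n in counts.items() if n > factor * baseline)


flagged = flag_outliers(alert_log)
for branch, officer in flagged:
    print(f"Investigate {officer} at the {branch} branch")
```

A flag from a rule like this is a starting point for the hypothesis, not evidence of fraud. The advanced analysis still has to examine the customers and loans behind each flagged officer.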
The full white paper on Data Visualization Techniques for Fraud Analysis is available on www.centrifugesystems.com under Resources-White Papers.