A Data Science Central Community
I have developed a predictive model to identify fraudulent claims in general insurance. The model is based on historical fraud records. Although it has improved the fraud detection rate significantly and reduced the number of cases under investigation, it is not able to identify any new fraudulent behaviour. Can anyone suggest a technique that would let me identify new fraudulent techniques adopted by fraudsters and catch them as and when they enter the system?
Predicting insurance fraud has a number of difficulties, but the main issue is this: when you build a historical model on known frauds, the only frauds you know about are those that were investigated. Typically only a small fraction of claims are investigated, so you are left with a large population of unknowns.
There are a couple of methods I can think of around this problem. First, I would look at what the credit risk industry calls 'reject inference': inferring the fraud performance of the claims that were never investigated, using some combination of models. This should help your model score the kinds of claims that investigators typically never look at. But don't expect it to find completely new modus operandi, as the data is still based on known investigation outcomes.
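Reject inference comes in several flavours; one common variant from credit scoring is "fuzzy augmentation". The sketch below assumes claims have already been turned into numeric, comparably scaled feature vectors. The two features, the toy data and the hand-rolled logistic regression are all hypothetical stand-ins chosen to keep the example self-contained, not a recommended production model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(coef, x):
    """Probability of fraud for one feature vector; coef[0] is the intercept."""
    return sigmoid(coef[0] + sum(c * v for c, v in zip(coef[1:], x)))

def fit_logistic(X, y, weights=None, lr=0.1, epochs=2000):
    """Weighted logistic regression fitted by plain batch gradient descent."""
    weights = weights or [1.0] * len(X)
    coef = [0.0] * (len(X[0]) + 1)
    total = sum(weights)
    for _ in range(epochs):
        grad = [0.0] * len(coef)
        for x, target, w in zip(X, y, weights):
            err = w * (predict(coef, x) - target)
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        coef = [c - lr * g / total for c, g in zip(coef, grad)]
    return coef

def fuzzy_augmentation(X_inv, y_inv, X_uninv):
    """Reject inference by fuzzy augmentation (one of several variants)."""
    # 1. Fit on investigated claims only: the only ones with known outcomes.
    base = fit_logistic(X_inv, y_inv)
    # 2. Score the claims that were never investigated.
    p = [predict(base, x) for x in X_uninv]
    # 3. Add each uninvestigated claim twice: as fraud with weight p and
    #    as genuine with weight 1 - p.
    X_all = X_inv + X_uninv + X_uninv
    y_all = y_inv + [1] * len(X_uninv) + [0] * len(X_uninv)
    w_all = [1.0] * len(X_inv) + p + [1.0 - q for q in p]
    # 4. Refit on the whole (now weighted) claim population.
    return fit_logistic(X_all, y_all, w_all)

# Hypothetical pre-scaled features: [claim size, closeness to policy inception]
X_inv = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
y_inv = [1, 1, 1, 0, 0, 0]                          # 1 = investigated and fraud
X_uninv = [[0.85, 0.75], [0.15, 0.2], [0.5, 0.5]]   # never investigated

model = fuzzy_augmentation(X_inv, y_inv, X_uninv)
for x in X_uninv:
    print(x, round(predict(model, x), 3))
```

The refitted model now "sees" the uninvestigated population, but its labels are still derived from the investigated claims, which is exactly why this approach will not surface a genuinely new modus operandi on its own.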
Second, I would look at combining your supervised analytics with unsupervised analytics. Clustering, self-organising maps, discriminant analysis or peer group analysis (to name only a few of the many options here) could be applied to your data to discover sets of claims that look significantly different from the rest of the population. This may help you find the new modus operandi that your investigators are currently not seeing.
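Peer group analysis, in its simplest form, compares each claim only with the claims that ought to resemble it. The sketch below assumes a hypothetical peer grouping by vehicle class and a single numeric field (claim amount), and scores each claim with the median/MAD-based "modified z-score" using the commonly quoted 3.5 cutoff. Real peer groups, fields and thresholds would have to come from your own data.

```python
from collections import defaultdict
from statistics import median

def peer_group_outliers(claims, group_key, value_key, threshold=3.5):
    """Flag claims whose value deviates strongly from their own peer group.

    Uses the robust modified z-score 0.6745 * (x - median) / MAD, so a few
    extreme claims cannot hide themselves by inflating the group's spread.
    """
    groups = defaultdict(list)
    for claim in claims:
        groups[claim[group_key]].append(claim)

    flagged = []
    for members in groups.values():
        values = [c[value_key] for c in members]
        med = median(values)
        mad = median([abs(v - med) for v in values])
        if mad == 0:
            continue  # no spread in this peer group, so no meaningful score
        for c in members:
            score = 0.6745 * (c[value_key] - med) / mad
            if abs(score) > threshold:
                flagged.append((c["id"], round(score, 2)))
    return flagged

# Hypothetical claims; field names are illustrative only.
claims = [
    {"id": "C1", "vehicle_class": "small", "amount": 1200},
    {"id": "C2", "vehicle_class": "small", "amount": 1100},
    {"id": "C3", "vehicle_class": "small", "amount": 1300},
    {"id": "C4", "vehicle_class": "small", "amount": 1250},
    {"id": "C5", "vehicle_class": "small", "amount": 9800},
    {"id": "C6", "vehicle_class": "luxury", "amount": 9000},
    {"id": "C7", "vehicle_class": "luxury", "amount": 9500},
    {"id": "C8", "vehicle_class": "luxury", "amount": 8700},
]
print(peer_group_outliers(claims, "vehicle_class", "amount"))
```

Note that C5's amount would look unremarkable next to the luxury claims; it only stands out against its own peers, which is the point of grouping before scoring.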
Lastly, if you will pardon me, I would recommend my company's product, Detica NetReveal (http://www.deticanetreveal.com). It uses social network analysis to find hidden connections between claims and is used extensively around the world, most notably by the Insurance Fraud Bureau here in the UK, which looks for organised fraud across the data of all UK insurers.
Hope this helps.
Thank you for your response. I am interested to learn more about Detica NetReveal. Is it possible for you to share an overview of this technology?
Detica NetReveal for insurance fraud uses a combination of an insurer's claims data, policy data and third-party data to create social networks. Social networks are bounded networks of people who share data or links. For example, if individual A was a witness on one motor accident claim and also made a claim of their own, A's presence on both claims would link every person on the two claims together in one network. These links are created in a fuzzy manner to overcome linking problems when fraudsters deliberately try to obfuscate their details.
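The linking idea can be illustrated with a small sketch, assuming each claim lists its parties with a name and date of birth. The matching rule here (identical date of birth plus a `difflib` similarity ratio of at least 0.85 on normalised names) is a hypothetical stand-in and not how NetReveal itself matches entities. Claims that share a fuzzily matched person are merged with a union-find structure, so each resulting set of claims is one network (and the people on those claims are its members).

```python
from difflib import SequenceMatcher
from itertools import combinations

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

def normalise(name):
    return " ".join(name.lower().replace(".", " ").split())

def same_person(p, q, threshold=0.85):
    """Hypothetical fuzzy rule: same date of birth and similar names."""
    if p["dob"] != q["dob"]:
        return False
    ratio = SequenceMatcher(None, normalise(p["name"]), normalise(q["name"])).ratio()
    return ratio >= threshold

def build_networks(parties):
    """Group claims into networks linked by (fuzzily) shared people."""
    uf = UnionFind()
    for p in parties:
        uf.find(p["claim"])  # register every claim, even unlinked ones
    # All-pairs comparison is fine for a sketch; real data needs blocking.
    for p, q in combinations(parties, 2):
        if p["claim"] != q["claim"] and same_person(p, q):
            uf.union(p["claim"], q["claim"])
    networks = {}
    for claim in {p["claim"] for p in parties}:
        networks.setdefault(uf.find(claim), set()).add(claim)
    return sorted(networks.values(), key=sorted)

# Hypothetical parties: "Anna Jones" witnesses M1 and claims on M2 as "Ana Jones".
parties = [
    {"claim": "M1", "role": "claimant", "name": "John Smith", "dob": "1980-02-14"},
    {"claim": "M1", "role": "witness", "name": "Anna Jones", "dob": "1975-07-01"},
    {"claim": "M2", "role": "claimant", "name": "Ana Jones", "dob": "1975-07-01"},
    {"claim": "M2", "role": "driver", "name": "Peter Brown", "dob": "1990-11-30"},
    {"claim": "M3", "role": "claimant", "name": "Mary White", "dob": "1985-03-22"},
]
print(build_networks(parties))
```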
Once these networks have been built, new claims can be scored for fraud in real time. We can train a number of supervised techniques, such as neural networks, regression and decision trees, on historical frauds. Alternatively, we can draw on a set of known frauds discovered through our work with major insurers across the world to find risky claims even if an organisation hasn't seen that modus operandi before. Finally, unsupervised techniques such as clustering can be used to discover networks that look abnormal in some way.
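One simple way to reuse a library of labelled fraud networks gathered elsewhere is nearest-neighbour matching on network-level features. This is a hypothetical illustration of the general idea, not a description of how NetReveal scores networks: the three features, the profile library and k = 3 are all assumptions. Features are min-max scaled so that a count such as network size does not swamp a proportion.

```python
import math

# Hypothetical network features: (claims in network, distinct people,
# share of claims made within 30 days of policy inception)
KNOWN_PROFILES = [
    ((12, 15, 0.70), "fraud"),
    ((8, 9, 0.60), "fraud"),
    ((2, 4, 0.05), "genuine"),
    ((1, 3, 0.00), "genuine"),
    ((3, 6, 0.10), "genuine"),
]

def scale(vec, ranges):
    """Min-max scale each feature so no single feature dominates distances."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(vec, ranges)]

def knn_score(network, profiles, k=3):
    """Share of the k nearest known profiles that are labelled fraud."""
    columns = list(zip(*(vec for vec, _ in profiles)))
    ranges = [(min(c), max(c)) for c in columns]
    target = scale(network, ranges)
    dists = sorted(
        (math.dist(target, scale(vec, ranges)), label) for vec, label in profiles
    )
    return sum(label == "fraud" for _, label in dists[:k]) / k

# A large, fast-claiming network resembles the fraud profiles;
# a small, ordinary one resembles the genuine ones.
print(knn_score((10, 12, 0.65), KNOWN_PROFILES))
print(knn_score((2, 3, 0.0), KNOWN_PROFILES))
```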
All high-scoring cases are fed into our case management system.
I hope this gives you a good overview of the technology, but do contact me if you require any further information.
As Matt pointed out, new fraud entering the system will be hard to detect since no pattern has been established for it yet. One thing you can do, though, is look for outliers in a multivariate sense: cluster the claims on their variables to characterise "normal" behavior, and then pinpoint those clusters whose centroids are relatively far from the others.
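A minimal sketch of the clustering approach, assuming claims have already been reduced to numeric, comparably scaled features. It runs plain k-means (Lloyd's algorithm) and then flags any cluster whose nearest other centroid is more than twice the median such gap. The deterministic farthest-point initialisation, the choice of k = 3 and the "twice the median" cutoff are illustrative assumptions, not part of the suggestion above.

```python
import math
from statistics import median

def kmeans(points, k, iterations=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new == centroids:
            break  # assignments are stable
        centroids = new
    return centroids, clusters

def isolated_clusters(centroids, factor=2.0):
    """Indices of clusters whose nearest other centroid is unusually far away."""
    gaps = [
        min(math.dist(c, o) for j, o in enumerate(centroids) if j != i)
        for i, c in enumerate(centroids)
    ]
    typical = median(gaps)
    return [i for i, g in enumerate(gaps) if g > factor * typical]

# Hypothetical scaled claim features: two "normal" groups and a small odd one.
points = [
    (1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (1.1, 1.2),
    (2.0, 2.1), (2.2, 1.9), (1.9, 2.0), (2.1, 2.2),
    (8.0, 1.0), (8.2, 1.1),
]
centroids, clusters = kmeans(points, k=3)
for i in isolated_clusters(centroids):
    print("review cluster", i, clusters[i])
```

Measuring each centroid's gap to its nearest neighbour means a cluster is only flagged when it is far from every other cluster, not merely from one of them.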