A Data Science Central Community
Dear all,
I have received a data set in which only 0.23% of claims are fraudulent; the remaining 99.73% are legitimate. Can I build a logistic regression model on this data set to identify suspicious or fraudulent claims in the future?
My worry is that such a low percentage of fraudulent claims may not give a reliable model if I use the data as it is.
Can you suggest a particular technique?
Best regards,
Shounak Ghosal
Dear All,
I have been trying to establish the best model for detecting fraud with regard to money laundering activities.
I have found that the following are appropriate tools for this:
1. Logistic regression
2. CART
3. Custom neural models
4. MARS
5. Time series analysis
Kindly advise which one is the most appropriate in this case.
Regards,
prachir
Dear Mr Shounak,
You may try undersampling, oversampling, or SMOTE to bring some balance to the data set. After that, any good algorithm should give reasonably high sensitivity, that is, the proportion of actual frauds that the model correctly identifies as fraud. You need not hesitate to use these techniques, as they would not alter the boundary between the fraudulent and legitimate classes. In other words, they would not tinker with the physics of the system.
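To make the resampling idea concrete, here is a minimal sketch of random undersampling and random oversampling in plain Python. It is only an illustration under stated assumptions: the function names, the toy data (10,000 claims, 23 of them fraudulent) and the label convention (1 = fraud) are made up for the example, and SMOTE, which synthesizes new minority points rather than copying existing ones, is not shown. In practice one would usually resample only the training split and evaluate on untouched data. Probabilities predicted by a model trained on rebalanced data reflect the rebalanced fraud rate rather than the original one, so they are best read as ranking scores.

```python
import random


def _split_indices(labels, minority):
    """Return (minority_idx, majority_idx) for a list of class labels."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    if not minority_idx:
        raise ValueError("no minority-class rows found")
    return minority_idx, majority_idx


def random_undersample(rows, labels, minority=1, ratio=1.0, seed=0):
    """Keep every minority row and a random subset of majority rows.

    ratio is the desired majority:minority count after sampling.
    """
    rng = random.Random(seed)
    minority_idx, majority_idx = _split_indices(labels, minority)
    n_keep = min(len(majority_idx), int(round(ratio * len(minority_idx))))
    kept = minority_idx + rng.sample(majority_idx, n_keep)
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]


def random_oversample(rows, labels, minority=1, seed=0):
    """Duplicate minority rows (with replacement) until both classes are equal in size."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split_indices(labels, minority)
    n_extra = max(0, len(majority_idx) - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    idx = majority_idx + minority_idx + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


# Hypothetical toy data: 10,000 claims, 23 fraudulent (label 1), one dummy feature.
labels = [1] * 23 + [0] * 9977
rows = [[float(i)] for i in range(len(labels))]

X_under, y_under = random_undersample(rows, labels, ratio=1.0)
X_over, y_over = random_oversample(rows, labels)

print(len(y_under), sum(y_under))
print(len(y_over), sum(y_over))
```

Undersampling discards most of the legitimate claims, while oversampling keeps all of them and repeats the fraud cases; which trade-off is acceptable depends on how much data there is.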
In addition, you may discard redundant predictor variables identified by algorithms, by domain knowledge, or by both.
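One simple algorithmic way to spot redundant predictors is to drop any variable that is almost perfectly correlated with one already kept. The sketch below is only one possible approach, not necessarily the one intended here: the 0.95 threshold, the function names and the column names are assumptions chosen for illustration, and a pairwise correlation filter will not catch redundancy that involves several variables at once.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_redundant(columns, threshold=0.95):
    """Greedily keep a column only if |r| with every kept column is below threshold.

    columns maps a predictor name to its list of values; returns the kept names.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical predictors: the second column is the first one in different units.
amounts = [100, 250, 400, 800, 1200]
columns = {
    "claim_amount": amounts,
    "claim_amount_cents": [a * 100 for a in amounts],
    "days_since_policy_start": [30, 400, 12, 250, 90],
}

kept = drop_redundant(columns)
print(kept)
```

Because the filter is greedy, the order of the columns decides which member of a correlated pair survives, so domain knowledge is still useful for choosing which one to keep.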
I am sure these things will help.
Bests,
Dr Ravi
© 2021 TechTarget, Inc.