
10+ Great Metrics and Strategies for Fraud Detection

Emphasis here is on web log data. More than one rule must be triggered to fire an alarm. You may use a system such as hidden decision trees to assign a specific weight to each rule.
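As a rough illustration of the "more than one rule" idea, the sketch below combines rule weights with a plain weighted sum. It is a simplified stand-in, not an implementation of hidden decision trees; the rule names, weights, session fields, and threshold are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Rule:
    name: str
    weight: float                      # hypothetical weight, not from the article
    predicate: Callable[[Dict], bool]  # returns True when the rule is triggered


def score_session(session: Dict, rules: List[Rule],
                  min_rules: int = 2,
                  threshold: float = 1.0) -> Tuple[bool, float, List[str]]:
    """Fire an alarm only if at least `min_rules` rules trigger
    and their combined weight reaches `threshold`."""
    fired = [r for r in rules if r.predicate(session)]
    score = sum(r.weight for r in fired)
    alarm = len(fired) >= min_rules and score >= threshold
    return alarm, score, [r.name for r in fired]


# Illustrative rules loosely based on items in the list below.
RULES = [
    Rule("blacklisted_ip", 0.8, lambda s: s.get("ip_blacklisted", False)),
    Rule("short_session", 0.3, lambda s: s.get("duration_sec", float("inf")) < 15),
    Rule("unknown_referrals", 0.4, lambda s: s.get("unknown_referrals", 0) >= 2),
]

if __name__ == "__main__":
    session = {"ip_blacklisted": True, "duration_sec": 9, "unknown_referrals": 0}
    print(score_session(session, RULES))
```

A single heavily weighted rule never fires the alarm on its own here, which matches the requirement that more than one rule be triggered.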

  1. Monte Carlo simulations to detect extreme events. Example: a large cluster of non-proxy IP addresses that each have exactly 8 clicks per day, day after day. What is the chance of this happening naturally?
  2. IP address or referral domain belongs to a particular type of blacklist or whitelist. Classify the space of IP addresses into major clusters: static IP, anonymous proxy, corporate proxy (white-listed), edu proxy (high risk), highly recycled IP (higher risk), etc.
  3. Referral domain statistics: time to load with variance (based on 3 measurements), page size with variance (based on 3 measurements), text strings found on the web page (either in HTML or JavaScript code). Create a list of suspicious terms (viagra, online casino, etc.). Create a list of suspicious JavaScript tags or codes, but use a white list of referral domains (e.g. top publishers) to eliminate false positives.
  4. Analyze domain name patterns. Example: a cluster of domain names, with exactly identical fraud scores, are all of the form, and their web pages all have the same size (1 char).
  5. Association analysis: buckets of traffic with a huge proportion (>30%) of very short (< 15 seconds) sessions that have two or more unknown referrals (that is, referrals other than Facebook, Google, Yahoo or a top 500 domain). Aggregate all these mysterious referrals across these sessions: chances are that they are all part of the same botnet scheme (used e.g. for click fraud).
  6. Mismatch in credit card fields: phone number in one country, email or IP address from a proxy domain owned by someone located in another country, physical address in yet another state, name (e.g. Amy) and email address (e.g. [email protected]) look very different, and a Google search on the email address reveals previous scams operated from the same account, or nothing at all
  7. Referral web page or search keyword attached to a paid click contains gibberish or text strings made of letters that are very close on the keyboard, such as fgdfrffrft. 
  8. Email address contains digits other than area code, year (e.g. 73) or zip-code (except if from someone in India or China)
  9. Time to 1st transaction after sign-up is very short
  10. Abnormal purchase pattern (Sunday at 2am, buy most expensive product on your e-store, from an IP outside US, on a B2B e-store targeted to US clients)
  11. Same small popular dollar amount (e.g. $9.99) across multiple merchants with same merchant category, with one or two transactions per cardholder
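The question in item 1 (how likely is a cluster of IP addresses with exactly 8 clicks per day, day after day?) can be approximated with a small simulation. The sketch below assumes, purely for illustration, that natural daily clicks per IP are independent and Poisson-distributed; the article does not specify a null model, and all parameter values are hypothetical.

```python
import math
import random


def poisson_sample(rng: random.Random, lam: float) -> int:
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_cluster_probability(n_ips: int, n_days: int, mean_clicks: float,
                                 target_clicks: int, min_cluster: int,
                                 n_trials: int, seed: int = 0) -> float:
    """Estimate the chance that at least `min_cluster` of `n_ips` IPs show
    exactly `target_clicks` clicks on every one of `n_days` days,
    under the assumed Poisson model of natural traffic."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        constant_ips = 0
        for _ in range(n_ips):
            # all() stops at the first day that deviates from the target.
            if all(poisson_sample(rng, mean_clicks) == target_clicks
                   for _ in range(n_days)):
                constant_ips += 1
        if constant_ips >= min_cluster:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    p = simulate_cluster_probability(n_ips=50, n_days=5, mean_clicks=8.0,
                                     target_clicks=8, min_cluster=5,
                                     n_trials=200)
    print(f"Estimated probability under the null model: {p}")
```

An estimate of zero only means the event is rarer than about one in `n_trials` under the assumed model. If such a cluster shows up in real logs anyway, that is the kind of extreme event this rule is meant to flag.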

Comment by Angela Waner on May 31, 2012 at 9:21am

When I read the title of this post, I thought you would be talking about insurance fraud. I have been working on projects related to insurance companies recently, which is why I jumped to this conclusion.

I have found number 6 to be highly useful for website fraud detection.
