
Secret E-Scores Chart Consumers’ Buying Power | New York Times

Interesting article published in the New York Times, discussing how statistical scores are regulated by the government and how they can be used in different contexts. It features a company, eBureau, which (the article claims) has created new types of scores.

A few highlights:

  1. David Vladeck, the director of the Bureau of Consumer Protection at the Federal Trade Commission, is uncomfortable with modern scoring methods based on ensemble techniques; traditionally, a rejection has been based on one single test.
  2. The article presents e-Score (the potential value of, or chance of a purchase by, a user in the context of e-commerce) as something new, taking into account new metrics such as salary, home value, etc. I disagree with this point: there's nothing new here - just a particular selection of metrics, most of which have been used for a long time in other settings. I myself work on scores to predict success at work based on interview metrics, and success at university based on high-school metrics. To put it differently, scores are not used just to predict creditworthiness.
  3. They say that a system with 50,000 variables (per person) is better than one with far fewer (say 50) variables. I strongly disagree with this idea. I'll publish an article showing how approximate, smaller solutions based on a small number of carefully selected metrics work as well as - if not better than, in the sense that they are more stable, robust and less sensitive to noise - large systems using super-big (but sparse) data.

Related article: An alternative to FICO scores?

Here's the article:

AMERICANS are obsessed with their scores. Credit scores, G.P.A.’s, SAT’s, blood pressure and cholesterol levels — you name it.

So here’s a new score to obsess about: the e-score, an online calculation that is assuming an increasingly important, and controversial, role in e-commerce.

These digital scores, known broadly as consumer valuation or buying-power scores, measure our potential value as customers. What’s your e-score? You’ll probably never know. That’s because they are largely invisible to the public. But they are highly valuable to companies that want — or in some cases, don’t want — to have you as their customer.

Online consumer scores are calculated by a handful of start-ups, as well as a few financial services stalwarts, that specialize in the flourishing field of predictive consumer analytics. It is a Google-esque business, one fueled by almost unimaginable amounts of data and powered by complex computer algorithms. The result is a private, digital ranking of American society unlike anything that has come before.

Read full article at http://www.nytimes.com/2012/08/19/business/electronic-scores-rank-c...


Comment by Douglas Dame on August 26, 2012 at 2:31am

I haven't looked for more information on the algorithms and modeling approaches used by eBureau. I'm willing to assume they know much more than I do about industrial scale modeling and are not oblivious to computationally efficient techniques. As do you, of course. So let's assume that your simulated annealing implementation at least matches their modeling approach/es, whatever they are.


One of the intriguing aspects of their business model is that they don't know today what dependent (target) variables they'll be asked to work with tomorrow. "Many and highly diverse" is the impression.


Surely your ability to "match results with 500 predictors" would be somewhat dependent on what additional information content their 50,000 − 500 = 49,500 additional potential predictors could bring to the problem of modeling the next unanticipated target that hits the in-box.


Or, to state this another way, modeling problems have irreducible error (the inherent variance in Y) and reducible error (what the modeling process leaves on the table). Use of optimal algorithms and transformations for a given problem, if discoverable, theoretically reduces their contributions to the total model error to zero. But in practical terms, reducible error is always conditional: it is floored by the information content of the predictors considered.
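This two-component view of error can be made concrete with a tiny simulation (a sketch with made-up numbers, not eBureau's data): even the true model is floored by the noise variance, while a model whose predictor set lacks the relevant information is floored far higher.

```python
import random
import statistics

random.seed(0)
N = 100_000

# True process: Y = 2*X + noise. Var(noise) = 0.25 is the irreducible error.
xs = [random.gauss(0, 1) for _ in range(N)]
ys = [2 * x + random.gauss(0, 0.5) for x in xs]

# Even the TRUE model f(x) = 2x cannot beat the noise floor (~0.25):
mse_true_model = statistics.fmean((y - 2 * x) ** 2 for x, y in zip(xs, ys))

# A model deprived of the predictor X is floored much higher -
# its best constant prediction is the mean of Y:
y_bar = statistics.fmean(ys)
mse_no_predictor = statistics.fmean((y - y_bar) ** 2 for y in ys)
```

Here the gap between the two MSEs is exactly the "information content of the predictors considered" that the reducible error is conditional on.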


"Better ingredients, better pizza." - Papa John's.

"We don't have better algorithms than everyone else; we just have more data." - Google's Chief Scientist Peter Norvig, 3/21/10.


What am I missing or misunderstanding? (Feel free to point me to any references I need to add to my self-study efforts.)

Comment by Vincent Granville on August 25, 2012 at 11:20pm

@Douglas: If you collect (say) 500 attributes on each user, you can derive 500 × 499 combinations of 2 metrics (e.g. income per age), and 500 × 499 × 498 combinations of 3 metrics (e.g. income per age and zip code). This is far more than the 50,000 attributes used by eBureau. Indeed, we are dealing with a computational optimization problem involving trillions of trillions of trillions of (compound) metrics. There are automated techniques, such as simulated annealing, that will quickly give you a robust (local) optimum using fewer than 50 metrics (out of, say, 10^30 compound metric combinations), with a lift superior to the one obtained using 50,000 attributes, and barely below the lift provided by using the entire 10^30 metric combinations mentioned above (the global optimum). There's no hand-crafting involved in my solution; indeed, I plan to offer it as AaaS (Analytics as a Service), which means the solution would be obtained without human interaction of any kind (just machines talking to machines).
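A minimal sketch of the kind of simulated-annealing search over metric subsets described above (a toy, with an invented overlap-based scoring function standing in for a real lift measure computed on holdout data - not the actual AaaS implementation):

```python
import math
import random

def anneal_feature_subset(score, n_features, subset_size, steps=4000,
                          t0=1.0, cooling=0.995, seed=42):
    """Simulated annealing over fixed-size feature subsets.

    score(subset) returns the quality (e.g. lift) of a frozenset of
    feature indices; higher is better. Returns (best_subset, best_score).
    """
    rng = random.Random(seed)
    current = frozenset(rng.sample(range(n_features), subset_size))
    cur_score = score(current)
    best, best_score = current, cur_score
    temp = t0
    for _ in range(steps):
        # Neighbour move: swap one selected feature for an unselected one.
        out = rng.choice(sorted(current))
        swap_in = rng.choice([i for i in range(n_features) if i not in current])
        cand = (current - {out}) | {swap_in}
        cand_score = score(cand)
        delta = cand_score - cur_score
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature cools.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_score = cand, cand_score
            if cur_score > best_score:
                best, best_score = current, cur_score
        temp *= cooling
    return best, best_score

# Toy problem: 50 base attributes, of which only 5 carry signal.
SIGNAL = frozenset({3, 7, 11, 19, 40})

def toy_lift(subset):
    # Stand-in for a real lift measure; counts how many signal
    # features the candidate subset recovers.
    return len(subset & SIGNAL)

best, best_score = anneal_feature_subset(toy_lift, n_features=50, subset_size=5)
```

The search evaluates only a few thousand of the C(50, 5) ≈ 2.1 million possible subsets yet typically recovers most or all of the signal features; the same swap-one-out neighbourhood scales to compound-metric spaces far too large to enumerate.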

I'll soon publish a paper on "how to test trillions of attribute combinations at once to identify great predictors in a robust way". More on this when I have completed the paper.

Comment by Douglas Dame on August 25, 2012 at 2:42pm

Vincent:

I can readily imagine that with (your skills and) your data, your target variables, and your methods, you can show that well-designed small models can equal or out-perform what I'll call massively wide models.

But will that prove that, with eBureau's target variables (scores for propensity to buy, financial quality, potential customer lifetime value, etc.?), their data, and their methods - including any business requirements for turnaround times for training and/or scoring - a small-model approach would also work as well or better for them? I'm inferring from the article that they're doing mongo-scale industrial modeling, highly automated ... so careful "hand-crafting" is not a viable option. (But that's just my guess.)

Per eBureau's website, the eScore is not a single, pre-defined metric. It is specialized to the purpose at hand. ("eScores are highly effective because they are developed and customized for the particular needs of your business.") So as a customer I might have one flavor of eScores in regards to my personal computer purchases, one for my yacht purchases (that would be a zero), another for mail-order dehydrated mangos, and another for the probability that I will renew my PO Box next year, just to make up a few random things.


The cleverness of their approach is that with a humongously wide collection of consumer attributes, and mega-horsepower, they can (apparently) throw everything against the wall as a brute-force approach and see what sticks.

The value of information is a function of its usefulness, cost, and timeliness. At the very least, you have to say they've developed an interesting approach.

(If I were a credit regulator or consumer advocate, I would be very concerned with any scoring schemes for POTENTIAL customers that had the de facto effect of triaging them onto a good/cheap credit or price path vs a bad/expensive credit path. Clearly sometime in the near future society is going to need to revisit the issue of what is, or is not, discriminatory or predatory pricing, because the ability to do those things, instantaneously, is already highly advanced. In one of the cited examples, call-in customers were scored AND triaged before the phone was even answered.) 

I'm just thinking on paper here; I don't recall ever having heard of eScores before.

Comment by Vincent Granville on August 23, 2012 at 2:47pm

Click here to read a potential application to improving ad relevancy.

Comment by Lance Olson on August 23, 2012 at 1:50pm

Nice find.
I would like to add to your statement "...scores are not just used to predict credit worthiness..."
Not long ago (March 2012), I bought a book titled "Who's #1?: The Science of Rating and Ranking" by Amy Langville and Carl Meyer - the same authors who wrote about Google's PageRank. The book gives a good introduction to rating and ranking, which is pretty much scoring. The primary focus is on sports teams, but as the authors note, the mathematical methods can be applied to many other fields. The book also talks about how Qbert is related to ranking. I love the book.
I thought that I would post a partial list of the uses of scores/rating/ranking applications:

  • Products (Amazon)
  • Services
  • Political Candidates
  • Web pages (Google's PageRank)
  • Sales quality or sales location
  • Sales channels 
  • Customer loyalty
  • Sports teams
  • Car routes (based on speed, season, etc.)
  • Preferred Businesses
  • Tourist destinations
  • Business operational performance against some minimum requirement
  • Customer Preferences
  • Personal Recommendation for up sale
  • Clicks (useful clicks OR fraud clicks)

... the list goes on and on.
Maybe some have more ideas to post here.

© 2019 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC