This blog post is to retract an aspect of a story about Chase Bank that I reported in my book, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (published by Wiley, February 2013).
Although predictive analytics professionals usually tell accurate stories about their projects, sometimes the information that comes from human beings is less reliable than the data-driven systems they build.
The Chase story appears in Chapter 4, where I cover predictive models in detail – specifically, decision trees – in a friendly way that is accessible to any reader.
The Machine That Learns: A Look Inside Chase's Prediction of Mortgage Risk
What form of risk has the perfect disguise? How does prediction transform risk to opportunity? What should all businesses learn from insurance companies? Why does machine learning require art in addition to science? What kind of predictive model can be understood by everyone? How can we confidently trust a machine's predictions? Why couldn't prediction prevent the global financial crisis?
To illustrate predictive models by way of a concrete business case, this chapter frames the topic in the context of Chase's use of decision trees to predict which mortgages they will lose by way of prepayment, i.e., when the mortgage holder pays off the entire remaining debt, often because they have defected to a competing bank and refinanced.
When this project was first executed in the late 1990s, such predictions were meant to help Chase in two possible ways: 1) determine which mortgage customers they may lose so they could work to retain them, and 2) forecast more precisely the future value of each mortgage vis-a-vis expected interest payments in order to improve decisions about which individual mortgages to sell to other banks at present market value.
The book newly breaks certain aspects of the business side of this story, which I obtained from a source involved with the Chase mortgage prediction project. This erroneously includes the report that Chase earned an additional profit of $600 million during the first year of employing predictive models for the second of the two business applications above, i.e., driving per-mortgage sale decisions.
Subsequent to the publication of my book, my source unexpectedly came to me and informed me he is now doubtful of the $600 million figure he had previously divulged as truth. The cause of his uncertainty was not made clear to me, although it should be noted that the project took place about 15 years ago. Therefore I must now retract that figure as reported in my book. (Some journalists tell me the word "correct" may be more appropriate than "retract," but my intention here is to be as clear as possible to the non-journalist reader. By "retract" I mean "take back [the pertinent part of the story].")
Further, a second anonymous source who worked on the same project has recently questioned the degree to which prepay prediction was used by Chase to price mortgages for selling decisions (although the source verified Chase's use of prediction to target mortgage retention efforts). One reason the use of prediction to drive mortgage sale decisions may have been lessened or quashed is that this practice might have been considered potentially damaging to Chase's reputation in the eyes of partner banks, the new source told me. Since the degree to which Chase used mortgage prediction for sales decisions is not known, it is also not known how much the bank profited by doing so, if at all.
To correct these factual errors in my book, the mistaken $600 million figure will be removed from future print runs of the book, and other related minor modifications will be made in the book where it describes how Chase made use of its predictive models for mortgage portfolio management (these minor edits are mostly toward the beginning and end of Chapter 4, i.e., the business context surrounding the chapter's central scientific discussion of predictive models).
However, most of the Chase story holds, as presented in the book: Chase employed decision trees to predict mortgage prepayments, and the example decision trees in the chapter were generated over real data from the Chase project. The project was previously presented at a conference (Ali Moazami and Shaolin Li, "Mortgage Business Transformation Program Using CART-based Joint Risk Modeling," Salford Systems Data Mining Conference, 2005. www.salford-systems.com/doc/MoazamiLi.pdf).
This partial retraction was not compelled by any party other than myself as book author.
Among the many case study examples in the book (including 147 mini-case studies compiled in its central glossy table of examples), the book also newly broke two other stories to the general public: 1) The prediction of employee attrition by Hewlett-Packard, as first revealed by HP at Predictive Analytics World London in late 2011 (Chapter 2), and 2) the specific use of uplift modeling (aka, persuasion modeling) by the Obama 2012 presidential campaign to target individual voters more likely to be persuadable (i.e., convinceable by knocks on the door, phone calls, etc.), as reported to me by the campaign's Chief Data Scientist (Chapter 7).
I got these two breaking stories from sources more central to their execution at their respective organizations, and I do not expect any aspects thereof to be retracted. Subsequent to my book's publication, these two stories have been fact-checked and reported on by other publications such as the Wall Street Journal.
For me personally, this has been a surprising and distressing turn of events. Although this book is not centrally a conduit for reporting news – it is a conceptually complete primer for non-scientists (like a textbook) disguised as an entertaining "pop science" business reader – it does endeavor to enact journalism in some passages, since I found relevant newsbreaking stories and took the opportunity to include them.
In so doing, I have learned a hard lesson of journalism. Although I had many compelling reasons to believe my source on the Chase story was entirely credible, such as reputation, pedigree, and standing in the industry, I was mistaken in that belief. When writing, I did not seek a second source to back up all details of the story, as is the standard set internally by many news outlets. These things happen (e.g., the New York Times' "Corrections" webpages appear to report several every day) – however, there is a certain additional layer of skepticism that I lacked but which is found in seasoned journalists. I have come to observe and respect this skepticism in recent months as I interacted with many reporters who interviewed me about the book. I spoke about this turn of events with Columbia University School of Journalism Professor Richard Wald, who reflected, "The most painful lessons are the ones you learn in public."
I regret having reported this apparently false aspect of the Chase Bank story, and I assume responsibility for not further vetting what turned out to be unreliable information, for example by seeking a second source. For this, I apologize to my readers.
This now retracted factoid is not substantially central to the book's (or even chapter's) coverage of predictive analytics, how it works, and why it is valuable. I have been gratified by the response to the book, including certain praises and reviews; its sales success within Amazon's ranking system has unleashed an obsessional need to continually check its status. I hope you will take the time to read it.