A Data Science Central Community

David Hand, a leading statistician, is Emeritus Professor of Mathematics and senior research investigator at Imperial College London. He’s the former president of the Royal Statistical Society and chief scientific advisor to hedge fund Winton Capital.

The title of his interview is Why Coincidences, Miracles And Rare Events Happen Every Day. My comment is that in big data, they happen every second. But what is mistaken for signal (a rare event) is, most of the time, just pure noise. Yes, you can extract signal from big data, and obviously more (or at least as much) than from smaller data sets, but you need to use the right tools and, most importantly, the right methodology. Again, read my article "the curse of big data" for further details; it has little to do with "the curse of dimensionality".

**Here's the Forbes interview, by John Navin**:

**David J. Hand**: The key thing here is that it is clearly a very highly improbable event. But even so, the improbability principle can explain it. It does this mainly via its law of truly large numbers, but also using the law of near enough. Essentially, there are vast numbers of blackjack games played around the world each year, so although the chance of such an event on any particular game is very small, the chance that it occurs somewhere amongst all those millions of games is quite high. And we must also bear in mind that that’s not the only unusual event which would lead to a comment. Other striking configurations of cards would do so also. Add all these things up, and it soon reaches the level at which we should *expect* some apparently extraordinarily unlikely event to occur.

To work out the exact odds, I’d need more information – like how many decks were being used in the shoe for that particular game, whether the hands totaling 21 were all just two cards or occurred later in the game, and so on. But (roughly speaking as you say) here is an approximation to an answer – and as I explain in a moment, an approximation is all we need.

The approximation I’m going to make is that there were many decks in the shoe, so that dealing one card made little difference to the probability that a card of that value would be drawn at the next deal. It’s clear that if there was actually a small number of decks, then when an ace was dealt the probability that the next card would be an ace would be reduced (and the probability that it was *not* an ace would be increased), since there would remain fewer aces to draw. My approximation assumes that there are many decks, and hence many aces, so that dealing one ace only has a negligible effect on the chance that the next card is an ace. As I said, it’s an approximation, but it will do to get a handle on the problem.
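To get a feel for how much the many-decks approximation matters, here is a minimal sketch that compares the chance of a second ace right after one ace has been dealt from a fresh shoe against the fixed 4/52 used in the approximation. The function name `p_next_ace` and the shoe sizes tried are illustrative assumptions, not details from the interview.

```python
from fractions import Fraction


def p_next_ace(decks: int) -> Fraction:
    """Chance the next card is an ace, given that one ace has just been
    dealt from a fresh shoe of `decks` standard 52-card decks."""
    aces_left = 4 * decks - 1
    cards_left = 52 * decks - 1
    return Fraction(aces_left, cards_left)


# Many-decks limit: dealing one card leaves the odds unchanged.
P_ACE_MANY_DECKS = Fraction(4, 52)

for decks in (1, 6, 8):  # hypothetical shoe sizes, chosen for illustration
    exact = float(p_next_ace(decks))
    approx = float(P_ACE_MANY_DECKS)
    print(f"{decks} deck(s): exact {exact:.4f}, approximation {approx:.4f}")
```

The gap shrinks as the number of decks grows, which is why the approximation is good enough to get a handle on the problem.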

I’m also going to assume, again just to give us a handle, that all the hands were just two card hands. That means they each consisted of one of {10, J, Q, K} along with an Ace. The probability that a randomly dealt pair of cards will contain such a pair is 16/52 × 4/52 × 2 = 0.047. In other words, about 1 in 21 randomly dealt hands of two cards sums to 21. (The 16/52 is the probability of drawing one of {10, J, Q, K}, and the 4/52 is the probability of drawing an Ace. The factor of 2 comes in because the Ace might be dealt first or second.)
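The arithmetic for a single two-card hand can be checked with exact fractions. This is a small sketch under the same many-decks assumption; the variable names are illustrative.

```python
from fractions import Fraction

P_TEN_VALUE = Fraction(16, 52)  # one of 10, J, Q, K
P_ACE = Fraction(4, 52)

# The ace can come first or second, hence the factor of 2.
p_two_card_21 = 2 * P_TEN_VALUE * P_ACE

print(f"probability: {float(p_two_card_21):.4f}")
print(f"about 1 in {1 / float(p_two_card_21):.1f} hands")
```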

That’s the probability that a single hand of two cards will sum to 21. The chance that all seven hands (the five other players, you, and the dealer) will each sum to 21 is just that probability multiplied by itself six times: 0.047⁷. That’s about 1 in 1.9 billion. A very improbable event!
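As a quick check of the seven-hand figure, the sketch below raises the single-hand probability to the seventh power, once with the exact fraction and once with the rounded 0.047. It assumes the hands are independent, as the many-decks approximation implies; the names are illustrative.

```python
from fractions import Fraction

P_HAND_EXACT = Fraction(16, 52) * Fraction(4, 52) * 2
P_HAND_ROUNDED = 0.047  # the rounded figure quoted in the text
HANDS = 7  # five other players, the interviewer, and the dealer

# Independent hands: multiply the single-hand probability seven times.
p_all_exact = P_HAND_EXACT ** HANDS
p_all_rounded = P_HAND_ROUNDED ** HANDS

print(f"exact fraction: 1 in {float(1 / p_all_exact):,.0f}")
print(f"rounded 0.047:  1 in {1 / p_all_rounded:,.0f}")
```

Both versions land a little under 2 billion to one, which fits the point that an approximation is all that is needed here.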

*However*, here is where the improbability principle comes in (and why we only needed to calculate things approximately). There are a great many games of blackjack played around the world every day. And they are played day in, day out, year after year. In fact, the website statistica.com says there are about 3,500 casinos in the world. If (on average) each has 10 simultaneous games of blackjack going on, and if each game takes 5 minutes (so 12 games per hour at each table), and the casinos are open for 5 hours on each of, say, 350 days per year, that means there are about 735 million games of blackjack played in the world each year. You can plug in your own numbers if you feel mine are unrealistic, but I doubt it will change the conclusion, that there is *a very large number of games of blackjack* played around the world each year. (And, moreover, this is just counting casinos – not all the private games which go on.)
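For readers who want to plug in their own numbers, the estimate can be written as a small function. The defaults are the figures quoted in the interview; the function and parameter names are hypothetical.

```python
def games_per_year(
    casinos: int = 3_500,
    tables_per_casino: int = 10,
    minutes_per_game: int = 5,
    hours_open_per_day: int = 5,
    days_open_per_year: int = 350,
) -> int:
    """Rough count of casino blackjack games played worldwide per year."""
    games_per_table_per_day = hours_open_per_day * 60 // minutes_per_game
    return (
        casinos
        * tables_per_casino
        * games_per_table_per_day
        * days_open_per_year
    )


print(f"{games_per_year():,} games per year")
```

Changing any single input by a factor of two changes the total by the same factor, so the conclusion that the number is very large does not depend much on the exact assumptions.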

Anyway, the point is that even an event which has a probability as low as 1 in 1.9 billion is quite likely to happen if it has 735 million opportunities to happen. In fact the probability that such a set of hands will be drawn is about 0.31, which means we’d expect to see it happen about once every three years on average.
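The step from "1 in 1.9 billion per game" to "about 0.31 per year" is the usual at-least-once calculation, 1 − (1 − p)^n. Here is a minimal sketch, assuming independent games and the 735 million games per year estimated in the interview; the helper name `p_at_least_once` is illustrative.

```python
import math


def p_at_least_once(p_single: float, trials: int) -> float:
    """Return 1 - (1 - p)^n, computed stably for tiny p and huge n."""
    return -math.expm1(trials * math.log1p(-p_single))


GAMES_PER_YEAR = 735_000_000

# Per-game probability of seven two-card 21s, rounded and unrounded.
candidates = {
    "rounded 0.047": 0.047 ** 7,
    "exact 8/169": (8 / 169) ** 7,
}

for label, p in candidates.items():
    chance = p_at_least_once(p, GAMES_PER_YEAR)
    expected_per_year = p * GAMES_PER_YEAR
    print(f"{label}: P(at least once a year) = {chance:.2f}, "
          f"expected occurrences per year = {expected_per_year:.2f}")
```

With the rounded per-hand figure this reproduces the 0.31 quoted above; the unrounded fraction gives a slightly higher value, which does not change the conclusion.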

We might then factor in another of the laws, the law of near enough. We’ve been talking about one sort of configuration of hands which surprised people. But there are others – again increasing the chance of improbable events occurring.

Finally, I liked the icing on the story, that Stevie Wonder’s “Superstition” began playing while the cards were being dealt. To work out the chance of that happening at the same time, I’d need to know how often that track was played, and so on. Whatever the answer, it’s clear that the joint event of getting both those striking card hands and also the track playing is much more improbable than just getting the cards. *But*, it would have been just as striking if, instead of that track, the sound system had broadcast “Good luck” by Basement Jaxx, or “Get lucky” by Daft Punk, or “Lucky day” by the Rolling Stones, or …. This is another manifestation of the law of near enough: a song about luck, fate, etc. is near enough to one about superstition for you to recognise it as noteworthy. Incidentally, it wouldn’t surprise me if casinos favoured tracks which referred to superstitions, luck, fortune, and so on. So perhaps the joint event of your extraordinary card hands, along with Stevie Wonder, is not so startling after all.

**John Navin**: In your book, you detail five improbability laws. Which laws apply to my blackjack experience and how?
