A Data Science Central Community
Your chance of having a car accident in the next 25,000 miles is independent of how many car accidents you had during the last 25,000 miles, or even during the last 300,000 miles for that matter.
In other words, the car accident process is a memoryless stochastic process. You'd think that if you have been driving for 15 years without a single car accident, your chance of having a crash on your next trip is much higher than that of a good driver who has already had 2 crashes over the last 15 years: you think that you have been very lucky so far, and that your luck won't last forever. In fact, this is not the case. Car crashes (for good drivers as well as for bad drivers) very closely follow a memoryless Poisson process, with low intensity for good drivers and high intensity for bad drivers. The fact that you did not have a car crash over the last 15 years does not make you more likely to have one tomorrow, and conversely, having had crashes recently does not make you less likely to have one. This is easy to verify, either with car crash data or with car driving simulations.
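As a rough illustration of the simulation route, here is a minimal sketch. It assumes a single driver type with a hypothetical crash rate of one crash per 150,000 miles; the function names, the rate, and the number of simulated drivers are all illustrative choices, not figures from the post. Each driver's crashes are generated as one continuous stream over 325,000 miles, and the crash frequency in the last 25,000 miles is compared between drivers with a clean record and drivers with two or more earlier crashes.

```python
import math
import random

def crash_positions(total_miles, rate_per_mile, rng):
    """Mile markers of crashes, built from exponential gaps (a Poisson process)."""
    positions = []
    position = rng.expovariate(rate_per_mile)
    while position <= total_miles:
        positions.append(position)
        position += rng.expovariate(rate_per_mile)
    return positions

def simulate(n_drivers=100_000, rate_per_mile=1 / 150_000,
             past_miles=300_000, next_miles=25_000, seed=0):
    """Share of drivers who crash in the next window, grouped by past record."""
    rng = random.Random(seed)
    # For each group: [number of drivers, number who crash in the next window]
    counts = {"clean past": [0, 0], "2+ past crashes": [0, 0]}
    for _ in range(n_drivers):
        crashes = crash_positions(past_miles + next_miles, rate_per_mile, rng)
        past = sum(1 for p in crashes if p <= past_miles)
        crashed_next = any(p > past_miles for p in crashes)
        if past == 0:
            group = "clean past"
        elif past >= 2:
            group = "2+ past crashes"
        else:
            continue  # drivers with exactly one past crash are not compared
        counts[group][0] += 1
        counts[group][1] += crashed_next
    return {group: hits / drivers for group, (drivers, hits) in counts.items()}

if __name__ == "__main__":
    for group, share in simulate().items():
        print(f"{group}: {share:.3f}")
    # Theoretical chance of at least one crash in the next 25,000 miles
    print(f"theory: {1 - math.exp(-25_000 / 150_000):.3f}")
```

Under these assumptions the two groups should show nearly the same crash frequency, both close to the theoretical value, up to sampling noise.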
On the other hand, it is also very easy to prove that the more you drive, the more you risk having a car crash. Indeed, the expected number of car crashes you will have in your lifetime - given how good or bad a driver you are - is proportional to the number of miles that you will drive.
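Both statements can be written side by side for a Poisson process. The notation is an assumption introduced here for illustration: \lambda is the driver's crash rate per mile and N(m) is the number of crashes in the first m miles.

```latex
% Expected number of crashes grows linearly with the miles driven
N(m) \sim \mathrm{Poisson}(\lambda m), \qquad \mathbb{E}[N(m)] = \lambda m

% Chance of at least one crash in the next m miles, given k crashes
% in the previous s miles: the same for every k and every s
\Pr\bigl(N(s+m) - N(s) \ge 1 \mid N(s) = k\bigr) = 1 - e^{-\lambda m}
```

The first line depends on the total mileage m, while the second depends only on the length of the upcoming window, not on the past record.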
How do you explain this paradox?
Comment
No real paradox. The first scenario is about the probability over a fixed period. The second is about the probability over a growing period. It is like rolling a fair 6-sided die: your chance of getting a "6" on a single roll is 1/6, but your chance of getting at least one "6" in a dozen rolls is quite high (about 89%), even though it remains 1/6 for each single roll. It is the number of exposures that is changing, not the probability of a single exposure.
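The die arithmetic in the comment above can be checked in a few lines. This is only a small sketch; the function name is made up for the example.

```python
from fractions import Fraction

def at_least_one_six(rolls):
    """Probability of at least one 6 in `rolls` independent rolls of a fair die."""
    p_no_six = Fraction(5, 6) ** rolls  # every roll misses with probability 5/6
    return 1 - p_no_six

if __name__ == "__main__":
    for rolls in (1, 12):
        print(rolls, float(at_least_one_six(rolls)))
```

The per-roll probability never changes; only the exponent, the number of exposures, does.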
It is so funny that I was just writing about flawed human thinking. It amazes me how often we do that without realizing it. One would think that logic is simple: once you know the principles, you know all the answers. But that just is not true. That is why human success differs even though a group of people may have the same knowledge.
In this instance, there is no conditional probability. Each coin toss is independent of other coin tosses... collectively though it is another story.
The starting statement here is misleading. It is only accurate if we add, as in the last sentence, 'given how good or bad a driver you are'.
If we do not condition on driving ability, then the number of crashes I have had depends on my driving ability. Therefore (using Bayes' theorem if you like) what can be inferred about my driving ability depends on the number of crashes I have had, and consequently so does my chance of having a crash in the next 25,000 miles.
There are of course other things than your ability which influence your chance of a crash. Maybe you are a good driver but live in an area with a lot of bad ones (or vice versa), and really the conditioning should be on all of those.
Phil
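Phil's point can be sketched with a two-type mixture. Everything numeric here is a hypothetical assumption chosen for illustration: 80% "good" drivers with one crash per 300,000 miles, and 20% "bad" drivers with one crash per 50,000 miles. When the driver's type is unknown, the past crash count updates the belief about the type via Bayes' theorem, and that in turn changes the predicted crash chance.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k events when the expected count is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def crash_chance_next(k_past, past_miles=300_000, next_miles=25_000,
                      rates=(1 / 300_000, 1 / 50_000), prior=(0.8, 0.2)):
    """Chance of a crash in the next window when driver type is unknown.

    `rates` and `prior` are hypothetical values for (good, bad) drivers.
    """
    # Bayes' theorem: prior times likelihood of the observed past crash count
    weights = [p * poisson_pmf(k_past, r * past_miles)
               for r, p in zip(rates, prior)]
    total = sum(weights)
    posterior = [w / total for w in weights]
    # Average each type's crash chance, weighted by the posterior
    return sum(post * (1 - math.exp(-r * next_miles))
               for r, post in zip(rates, posterior))

if __name__ == "__main__":
    for k in (0, 2, 6):
        print(k, round(crash_chance_next(k), 3))
```

Given a known driver type the past record is irrelevant, but without that conditioning more past crashes imply a higher predicted chance of a future crash.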
The paradox you describe reminds me of another "paradox" where a naive demographer concludes from cross-sectional data that people in Florida change their religious preference in their lifetime because the % Catholic among young people in Florida is very high, and the % of Jewish people is very high in the older population. What that demographer didn't take into account in his model is the number of retirees who move to Florida from other states. If that demographer had an additional data point (the % of people born in Florida) for the elderly population, then the chances of making a wrong conclusion would be substantially diminished.
What this "paradox" tells me is that we're missing data/information in the first model that is available to us when we look at driver lifetime data, AND potentially that we are comparing the top predictor from two different modeling passes and asking "why aren't they the same?" My first thought is that we're comparing conclusions drawn from samples that do not contain the same information, perhaps because it isn't available when you try to construct the 25,000 mile increment snapshots out of lifetime data.
© 2020 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC