
Let's say that the average lifespan of a human being is L = 70 years. The probability of dying this year, at any given age y, is p(y). Let's assume that the number of people who are y years old is M(y), and that p(y) is a monotonically decreasing function of age.

Given these assumptions, and with 7 billion human beings on Earth today, how many will die this year?

The answer is N = SUM{M(y) * p(y)}, where the sum runs over all ages y. The question now is: what are the lower and upper bounds for N (the number of people who will die this year), regardless of the functions M() and p(), provided that p() is monotonically decreasing and that the average age at death is L = 70 years?
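To make the two quantities in the question concrete, here is a minimal Python sketch that evaluates N = SUM{M(y) * p(y)} and the average age at death, SUM{y * M(y) * p(y)} / N, for a given pair of functions. The function names, the age grid 1..100 and the example M and p are hypothetical placeholders, not data from the post.

```python
def expected_deaths(M, p, ages):
    """N = sum over ages y of M(y) * p(y)."""
    return sum(M(y) * p(y) for y in ages)


def mean_age_at_death(M, p, ages):
    """Average age of this year's deaths: sum of y * M(y) * p(y), divided by N."""
    return sum(y * M(y) * p(y) for y in ages) / expected_deaths(M, p, ages)


# Hypothetical inputs: 7 billion people spread evenly over ages 1..100,
# each with the same 1% chance of dying this year.
def uniform_M(y):
    return 7e9 / 100


def constant_p(y):
    return 0.01


ages = range(1, 101)
print(expected_deaths(uniform_M, constant_p, ages))    # about 70 million
print(mean_age_at_death(uniform_M, constant_p, ages))  # about 50.5
```

With these placeholder inputs the average age at death is about 50.5, so this particular (M, p) pair does not satisfy the L = 70 constraint; any candidate pair has to be checked against it.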

This is indeed a complex mathematical problem. You might even replace this problem with its continuous version (if that makes it easier to solve), in which M and p are functions of a continuous rather than a discrete argument. It is a typical calculus of variations problem, and not the easiest kind.
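One way to write the continuous version down, assuming ages run over an interval [0, Y] with some maximum age Y (Y is not specified in the original), that M integrates to the 7 billion people on Earth, and relaxing "decreasing" to "non-increasing" so the admissible set is closed:

```latex
N_{\min} = \inf_{(M,p)\in\mathcal{A}} \int_0^{Y} M(y)\,p(y)\,dy,
\qquad
N_{\max} = \sup_{(M,p)\in\mathcal{A}} \int_0^{Y} M(y)\,p(y)\,dy,

\mathcal{A} = \Big\{ (M,p) :\ M \ge 0,\ \int_0^{Y} M(y)\,dy = 7\times 10^{9},\
0 \le p \le 1,\ p \text{ non-increasing},\
\frac{\int_0^{Y} y\,M(y)\,p(y)\,dy}{\int_0^{Y} M(y)\,p(y)\,dy} = L = 70 \Big\}.
```

Multiplying the last condition through by N turns it into the equivalent linear condition \int_0^{Y} (y - L)\,M(y)\,p(y)\,dy = 0, which is often easier to work with.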

Actuaries (who are data scientists) would typically predict N for the next 50 years using birth-and-death processes. Here, however, we assume that you only have access to death statistics, not birth rates. The model can be refined by breaking the population down into a few segments: men versus women, and developed countries versus others. Of course, M is indirectly related to p within each segment (there is a feedback loop), so hierarchical models incorporating birth rates would be ideal and would yield more accurate lower and upper bounds (typically, a narrower confidence interval).
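At its simplest, the segmented refinement means computing N separately for each segment s and adding the results: N = N_1 + N_2 + ... A minimal Python sketch, assuming each segment's M_s and p_s are given as lists over the same age grid; the function name, segment labels and all numbers are hypothetical placeholders, and the sketch does not model the M-p feedback loop or birth rates.

```python
def deaths_by_segment(segments):
    """Per-segment N_s = sum over ages of M_s(y) * p_s(y), plus the total N = sum of N_s.

    `segments` maps a segment label to a pair (M_s, p_s) of equal-length
    lists indexed by age.
    """
    per_segment = {
        label: sum(m * q for m, q in zip(M_s, p_s))
        for label, (M_s, p_s) in segments.items()
    }
    return per_segment, sum(per_segment.values())


# Hypothetical segments on a three-age grid; all numbers are placeholders.
segments = {
    ("women", "developed"): ([100.0, 80.0, 60.0], [0.02, 0.01, 0.005]),
    ("men", "other"): ([300.0, 150.0, 50.0], [0.05, 0.03, 0.02]),
}
per_segment, total = deaths_by_segment(segments)
print(per_segment, total)
```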

Also, the assumption that p is monotonic is a very rough approximation, and reality is different: a 16-year-old might be at a higher risk of death than a 26-year-old, due to car accidents and suicide; in short, p(16) > p(26). Many babies also die before age 1. But it is a good approximation nevertheless.

I have included a spreadsheet with a simulation of a moderately realistic example, excluding people younger than 1 and modeling p and M with exponentially decreasing functions (M is exponentially decreasing in Africa, but no longer in countries such as the US, where the population is aging). I came to the following conclusion: 135 million people die each year (135 million = 1.93% x 7 billion, with 1.93% being cell K2 in the spreadsheet). It turns out that the real value is 53 million.
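A simulation in the same spirit can be sketched in Python: ages 1..100 (babies under 1 excluded), M(y) and p(y) both exponentially decreasing, M rescaled to a total of 7 billion. The decay rates, the starting probability p0, the maximum age and the function name are hypothetical placeholders, not the spreadsheet's actual settings, so the output will not reproduce the 1.93% in cell K2.

```python
import math


def simulate_deaths(population=7e9, max_age=100, m_rate=0.03, p0=0.05, p_rate=0.02):
    """Exponentially decreasing M(y) and p(y) over ages 1..max_age.

    All parameter values are hypothetical placeholders. Returns
    (N, death rate N / population, mean age at death).
    """
    ages = range(1, max_age + 1)               # babies younger than 1 are excluded
    weights = [math.exp(-m_rate * y) for y in ages]
    scale = population / sum(weights)          # rescale so that M sums to the population
    M = [scale * w for w in weights]
    p = [p0 * math.exp(-p_rate * y) for y in ages]
    deaths = [m * q for m, q in zip(M, p)]     # expected deaths at each age: M(y) * p(y)
    N = sum(deaths)
    # Both curves decrease, so deaths decrease with age and the mean age at
    # death stays below the midpoint (1 + max_age) / 2 of the age range.
    mean_age = sum(y * d for y, d in zip(ages, deaths)) / N
    return N, N / population, mean_age


N, rate, mean_age = simulate_deaths()
print(f"N = {N / 1e6:.1f} million ({rate:.2%}), mean age at death {mean_age:.1f}")
```

As the comment notes, when M and p both decrease the expected deaths M(y) * p(y) decrease with age too, so the simulated mean age at death cannot exceed the midpoint of the age range; reaching L = 70 needs a different shape for M or a wider age range. The headline arithmetic is easy to check: 0.0193 x 7 billion is about 135.1 million.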

Here I am asking you to do more than obtain a single estimate: compute the absolute minimum and absolute maximum of N, given that L = 70 years is the average lifespan of the people who will die this year, regardless of the age distribution (the function M) and the death probabilities (the function p).
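One partial way to explore the upper bound numerically: once M is fixed, every constraint on p is linear (p non-increasing, 0 <= p <= 1, and SUM{(y - L) * M(y) * p(y)} = 0, which is the mean-age condition multiplied through by N), and so is N itself. The non-increasing p with values in [0, 1] form a simplex whose corners are step functions (1 for the k youngest ages, 0 for the rest). Adding the single linear constraint leaves a polytope whose corners lie on segments joining two such step functions, so checking every pair finds the maximum. The Python sketch below assumes integer ages 1..100, relaxes "decreasing" to "non-increasing", and uses a hypothetical M proportional to y²; it bounds N only for that one M, not over all M. Scaling any feasible p by a factor in (0, 1] keeps it feasible, so for a fixed M the lower end can be pushed arbitrarily close to 0.

```python
def max_deaths_fixed_M(M, ages, L=70.0):
    """Largest N = sum of M(y) * p(y) for ONE fixed age distribution M.

    Constraints on p: non-increasing in age, 0 <= p <= 1, and mean age at
    death sum(y * M(y) * p(y)) / N equal to L. `ages` must be sorted ascending.
    Returns (N, p) for the best p found.
    """
    n = len(ages)
    # Step vector k: p = 1 for the k youngest ages, 0 afterwards.
    # g[k] = sum of (y - L) * M(y) * p(y) and f[k] = N, for step vector k.
    g = [0.0] * (n + 1)
    f = [0.0] * (n + 1)
    for k in range(1, n + 1):
        g[k] = g[k - 1] + (ages[k - 1] - L) * M[k - 1]
        f[k] = f[k - 1] + M[k - 1]

    best_N, best_p = 0.0, [0.0] * n            # p == 0 is feasible, with N = 0
    for j in range(n + 1):
        if g[j] == 0 and f[j] > best_N:        # a step vector that already satisfies the constraint
            best_N, best_p = f[j], [1.0] * j + [0.0] * (n - j)
        for k in range(j + 1, n + 1):
            if g[j] * g[k] < 0:                # segment between steps j and k crosses g = 0
                lam = g[k] / (g[k] - g[j])     # weight on step j that makes g vanish
                N = lam * f[j] + (1 - lam) * f[k]
                if N > best_N:
                    best_N = N
                    best_p = [1.0] * j + [1.0 - lam] * (k - j) + [0.0] * (n - k)
    return best_N, best_p


# Hypothetical age distribution: M(y) proportional to y**2 on ages 1..100, 7 billion in total.
ages = list(range(1, 101))
weights = [y * y for y in ages]
total_weight = sum(weights)
M = [7e9 * w / total_weight for w in weights]
N_max, p_best = max_deaths_fixed_M(M, ages)
print(f"max N for this M: {N_max / 1e9:.2f} billion")
```

Repeating this over many candidate age distributions M gives a rough feel for how the upper bound depends on M; for larger models, a general-purpose linear programming solver could replace the pairwise enumeration.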

For those interested, the question first arose as I was trying to assess whether the sheer, exploding number of deaths (a result of exponential population growth) contributes to climate change via gases (methane in particular) released by decomposition, especially when you add dogs to the equation (smaller animals, but dying at a faster rate).


Replies to This Discussion

The same models can be used for churn, user retention, and customer lifetime value.



© 2017 is a subsidiary and dedicated channel of Data Science Central LLC
