
Uniquely identify a human being with two questions

Here are two multiple-choice questions that could be used to uniquely characterize each human that will ever exist on Earth. Even twins will have different answers. No two human beings are expected to have the same answers.

First question: Order the following types of food, from your favorite (#1) to the one you like least (#9). Possible choices: fruit, vegetable, dairy, carbohydrate, red meat, poultry, fish, seafood, dessert.

Second question: Order the following types of environment, from your favorite (#1) to the one you like least (#9). Possible choices: beach, mountain, desert, plain / rural, urban, small town, lake / river bank, hills, forest.

The number of potential answers (that is, the number of potential orderings) for each question is 9 factorial, or 362,880. The total number of potential answers for both questions is the square of 9 factorial, that is, about 132 billion.
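The arithmetic can be checked with a few lines of Python. This is only a sketch of the counting argument; the helper name count_ids is an illustrative choice, not part of the original proposal.

```python
from math import factorial

CHOICES_PER_QUESTION = 9  # nine food types, nine environment types
NUM_QUESTIONS = 2

def count_ids(choices: int, questions: int) -> int:
    """Number of distinct IDs when every question is a full ranking of `choices` items."""
    return factorial(choices) ** questions

orderings_per_question = factorial(CHOICES_PER_QUESTION)
total_ids = count_ids(CHOICES_PER_QUESTION, NUM_QUESTIONS)

print(orderings_per_question)  # 362880
print(total_ids)               # 131681894400
```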

Of course some combinations are more likely to appear than others, some people will have a hard time ranking and would rather allow for ties, and if you've lived all your life in the same place eating the same food, you can't correctly answer these questions. Same if you are a little kid. But for most of us, this works and could even be used by companies such as match.com or advertisers. Also, this type of ID has the following advantages:

  • It is universal (it could even apply to dogs),
  • It is personal unlike arbitrary social security numbers,
  • You know what's in your ID (government IDs such as SSN might be hiding some encoded data about you, in your ID, for profiling purposes) 
  • It's easy to retrieve if lost (at least partially, which might be good enough) by answering the two questions
  • Unlike the genome, this ID is (to a large extent) independent of gender and race (or age)

It may change over time as tastes change, but I think this is OK: your ID follows your personality. You might want to add a third question (maybe about favorite colors or climates) to increase the discriminating power, but I think it is not necessary.

Potential Improvement

Another option is to have more questions with fewer choices. For instance, 8 questions each with 4 choices (rather than 2 questions, each with 9 choices) would allow for pretty much the same number of unique IDs (a bit above 100 billion) but would be less error-prone, as people are more likely to correctly remember how they rank 4 items (e.g. colors) than 9 items. If you allow for only 2 choices per question, then you would need to ask 37 questions to cover 100+ billion unique IDs.
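The trade-off between the number of questions and the number of choices can be tabulated as follows. This is a minimal sketch assuming every question is a full ranking of its choices; the function names and the 100 billion target constant are illustrative.

```python
from math import factorial

TARGET = 100_000_000_000  # "100+ billion" unique IDs

def count_ids(choices: int, questions: int) -> int:
    """Number of distinct IDs from `questions` rankings of `choices` items each."""
    return factorial(choices) ** questions

def questions_needed(choices: int, target: int = TARGET) -> int:
    """Smallest number of ranking questions whose ID space reaches `target`."""
    per_question = factorial(choices)
    questions, total = 1, per_question
    while total < target:
        questions += 1
        total *= per_question
    return questions

print(count_ids(4, 8))      # 110075314176
print(questions_needed(9))  # 2
print(questions_needed(4))  # 8
print(questions_needed(2))  # 37
```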

Experimental design to choose good questions and good choices 

The possible choices (answers) should be determined using experimental design and testing, not the other way around. Let's say that your first question is about food, with two choices: fish versus dirt. You run a test and realize that everybody ranks fish as #1. The test tells you that this is not a good pair of choices: there will be lots of people with the same ID. You change your choices from fish/dirt to fish/meat. Now you see that the distribution is more uniform. You continue testing until you have something good enough.

You can even test choice stability: ask a person to rank 9 choices today and again in 7 days, then retain the choices that

  1. are most stable over time and
  2. provide an even distribution (or as close as possible to uniform distribution)
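One possible way to score a candidate set of choices on both criteria is sketched below. The metrics are assumptions made for illustration, not the author's method: uniformity is measured as Shannon entropy relative to its maximum, and stability as the fraction of respondents who give an identical ranking on retest. The survey data is made up. With 9 choices there are 362,880 possible orderings, so a small sample can never reach an entropy score of 1.

```python
from collections import Counter
from math import factorial, log

def normalized_entropy(rankings, n_choices):
    """Shannon entropy of the observed rankings divided by its maximum, log(n_choices!).

    0.0 means everybody gave the same ranking; 1.0 means all orderings are equally common.
    """
    counts = Counter(tuple(r) for r in rankings)
    total = sum(counts.values())
    entropy = sum((c / total) * log(total / c) for c in counts.values())
    return entropy / log(factorial(n_choices))

def stability(first_pass, second_pass):
    """Fraction of respondents whose ranking is identical on both occasions."""
    same = sum(tuple(a) == tuple(b) for a, b in zip(first_pass, second_pass))
    return same / len(first_pass)

# Hypothetical test data: ten respondents, two choices per question.
fish_dirt = [("fish", "dirt")] * 10
fish_meat = [("fish", "meat")] * 5 + [("meat", "fish")] * 5
retest = [("fish", "meat")] * 4 + [("meat", "fish")] * 6  # same people, 7 days later

print(normalized_entropy(fish_dirt, 2))  # 0.0
print(normalized_entropy(fish_meat, 2))  # 1.0
print(stability(fish_meat, retest))      # 0.9
```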


Views: 10974

Comment


Comment by Vincent Granville on December 16, 2013 at 10:24am

Also, to make sure answers are well distributed (with no concentration), you need to do some experimental design and testing first, to select the choices. For instance if you have two choices for food - lobster and dirt - everybody will choose lobster.

Comment by Vincent Granville on December 15, 2013 at 5:30pm

@Peter: SSN or credit card numbers are meaningless sequences of digits, hard to remember. I don't remember the number of any of my credit cards or driving licenses, or my SSNs from the UK and Belgium.

Comment by Peter Lane on December 15, 2013 at 5:23pm

Such an ID would not only change over time, but also be subject to occasion-to-occasion variation, unless you take the trouble to actually memorize how you answer the questions the first time. I find it hard to put these lists into order, and have to make arbitrary decisions: whether I prefer red meat to seafood will depend on whether I think of a fillet steak and whelks, or minced pork and lobster, for example, and it will also depend on what I have eaten most recently and what mood I'm in. In other words, to succeed as an ID, you have to memorize your answers. I guess such a set of answers would be easier to learn than a set of numbers as in traditional IDs, however.

Comment by Vincent Granville on December 15, 2013 at 5:15pm

@Talbot: If the duplicate ID rate is less than 1 in 1,000, you indeed have a system that is as reliable as SSN or credit card numbers regarding "uniqueness". Maybe SSN IDs are truly unique, but ask the people who got their ID or credit card number stolen, and you'll hear a different story about how "unique" these IDs are: both you and your ID thief (or thieves) share the same ID.

And pick 3 questions rather than two, and you've reduced the risk of collision to virtually zero.

Comment by Talbot Katz on December 15, 2013 at 4:33pm

Hi Vincent!  There's an old parlor game we used to play, did you ever try this?  If you have a room full of people, 25 or 30 or more, ask everyone their birthday (just month and day, not year).  Chances are pretty good you'll find two people in the group with the same birthday -- if you assume uniform distribution of birthdays then the chances are better than fifty percent once you reach 23 people, and non-uniformity only makes the odds better of achieving a match.  So let's assume that the distribution of answers to your questions is uniform, so there's a 1 / 131681894400 chance of choosing each possible answer.  Now let's play the birthday game.  If you have a room or a field full of 428,000 people, like the Woodstock festival, perhaps, then you're likely to find two people with the same combination.  Not enough choices to guarantee uniqueness in the US population, let alone the whole world.  You need to jack up the number of choices to make it work.  Then it gets to be too much for people to remember.  But I like the way you think, maybe you can tweak this proposal a bit and come back with something workable.  Good luck!
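The birthday-game figures above can be reproduced with the standard approximation p ≈ 1 - exp(-n(n-1)/(2N)) for n people and N equally likely IDs. The sketch below assumes a uniform distribution of answers, as the comment does; the function names are illustrative.

```python
from math import ceil, expm1, factorial, log, sqrt

def collision_probability(group_size, id_space):
    """Approximate chance that at least two people in the group share an ID (uniform IDs)."""
    pairs = group_size * (group_size - 1) / 2
    return -expm1(-pairs / id_space)

def group_size_for_even_odds(id_space):
    """Approximate group size at which a shared ID becomes more likely than not."""
    return ceil(sqrt(2 * id_space * log(2)))

ID_SPACE = factorial(9) ** 2  # 131681894400 possible answer combinations

print(collision_probability(23, 365))            # roughly 0.50 (birthdays)
print(collision_probability(428_000, ID_SPACE))  # roughly 0.50 (two ranking questions)
print(group_size_for_even_odds(ID_SPACE))        # a little over 427,000
```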

Comment by Vincent Granville on December 15, 2013 at 12:45pm

Here's my answer to a comment posted on LinkedIn:

How many people got their ID (including SSN) stolen? How many companies got their credit card database hacked, with tens of millions of credit card numbers stolen? Also, all you need to do to steal SSN or credit card numbers is set up a website, offer some bogus stuff, maybe fake job offers, and ask people to provide their SSN.

With three questions (rather than two), the odds of you having the same ID as someone else are about one in 7 million, lower than the risk of a data glitch causing two people to have the same SSN. And while family members might know what you like, they would be unable to reconstruct the exact order of your preferences, especially if using three questions.
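The one-in-7-million figure can be approximated as follows. This is a rough sketch under two assumptions that the comment does not state: answers are uniformly distributed, and the population is 7 billion. It estimates the chance that one given person shares an ID with at least one other person.

```python
from math import expm1, factorial, log1p

POPULATION = 7_000_000_000  # assumed world population

def odds_of_sharing_id(id_space, population=POPULATION):
    """Return k such that a given person has about a 1-in-k chance of sharing an ID."""
    # P(at least one of the other people has my ID) = 1 - (1 - 1/id_space)^(population - 1)
    p = -expm1((population - 1) * log1p(-1 / id_space))
    return 1 / p

three_questions = odds_of_sharing_id(factorial(9) ** 3)
two_questions = odds_of_sharing_id(factorial(9) ** 2)

print(round(three_questions))  # in the neighborhood of 7 million
print(round(two_questions))    # far smaller, which is why a third question helps
```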

Finally, databases containing SSN (e.g. health care or census data) might be sold to third parties (insurers, marketing agencies, enforcement agencies, IRS), making SSN less secure and more dangerous than most people think. Indeed, the reluctance of government agencies to use my new ID would make it safer than SSN, with much smaller risks of leaks and undesirable side effects.

Comment by Lynne Mysliwiec on December 14, 2013 at 6:02pm

An interesting proposal, certainly.  Ordering questions like that can be susceptible to test/retest error. It works provided that a person would always number the choices in the same order no matter how many times they were asked.  Not sure there are more than 7 billion permutations, however.

Comment by Vincent Granville on December 14, 2013 at 5:59pm

A few people (including myself in the original article) mentioned that such an ID would change over time. My comment: so do citizenship, name (for women), address and so on. Why should an ID be permanent? I change over time, so it makes sense for my ID to change. So far I've got an SSN (social security number) in Belgium, one in the UK and one in the US, not to mention driving licenses in multiple states and countries. All of them are incompatible. My food, color and environment preferences have been more stable than these government IDs. Also, small changes do not mean that your ID suddenly matches that of some other person (because the universe of potential IDs is much bigger than the human population).

Comment by Vincent Granville on December 14, 2013 at 12:45pm

For a specific dog (or any pet with a brain), answering question #1 could be done with simple experimental design. For each lunch, select two items from the list. See what the dog eats. Let's say that he picks red meat over poultry. Then it means he likes red meat better. Next day offer red meat and dairy. And so on, until you have enough pairwise comparisons to reconstruct your dog's full preference order for the 9 types of food. Answering question #2 might be more tricky, unless there's a way of measuring the level of satisfaction when the dog is placed in different environments. This is a purely statistical problem (for dogs, or for humans unable to talk, with Down syndrome etc.)
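The pairwise-comparison procedure can be sketched as a comparison sort in which each comparison is one lunch. This is an illustration under strong assumptions: the dog's choices are perfectly consistent (transitive and repeatable), and the callback prefers stands in for a real feeding trial. The simulated dog and its hidden order are made up. With noisy choices, each pair would need repeated trials.

```python
from functools import cmp_to_key

FOODS = ["fruit", "vegetable", "dairy", "carbohydrate", "red meat",
         "poultry", "fish", "seafood", "dessert"]

def rank_from_pairwise(items, prefers):
    """Rebuild a full preference order from pairwise choices.

    prefers(a, b) must return True when `a` is chosen over `b`.
    """
    def compare(a, b):
        return -1 if prefers(a, b) else 1
    return sorted(items, key=cmp_to_key(compare))

# Simulated dog with a hidden (hypothetical) preference order, favorite first.
hidden_order = ["red meat", "poultry", "fish", "dairy", "seafood",
                "dessert", "carbohydrate", "fruit", "vegetable"]
trials = []  # one entry per lunch offered

def simulated_dog_prefers(a, b):
    trials.append((a, b))
    return hidden_order.index(a) < hidden_order.index(b)

ranking = rank_from_pairwise(FOODS, simulated_dog_prefers)
print(ranking)
print(len(trials), "lunches were needed")
```

Sorting keeps the number of lunches well below what testing every possible pair would require, at the cost of trusting each single choice.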


© 2019   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC   Powered by
