A Data Science Central Community
I was checking the IRS forms you have to fill out in Massachusetts. Everybody has to fill out a 3-page tax form, a 3-page form to prove your health care compliance status, and one additional page to compute your health care tax penalty if needed.
For 85% of the state residents, it's probably not too complicated. Just painful homework you need to do each year. Yet for some self-employed, under-insured, uninsured, part-time workers, and people filing jointly, it gets much more complicated than the math homework you got in high school.
Assuming similar forms will have to be filed by ALL Americans in 2014, we are talking about mathematical headaches for tens of millions of people. These forms require complex computations, e.g. how many months in a row you received "officially approved" health insurance, how many multi-month gaps you had in coverage, how to measure a month (15 days count as a month, 14 do not), how long you were a Massachusetts resident, etc.
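To give an idea of what these computations look like once automated, here is a minimal Python sketch. It is not the official worksheet: the only rule taken from the form is the 15-day threshold mentioned above, while the function names (`covered_months`, `coverage_gaps`), the input format, and the way gaps are counted are my own assumptions for illustration.

```python
from calendar import monthrange
from datetime import date


def covered_months(periods, year, min_days=15):
    """Return 12 booleans, one per month of `year`.

    `periods` is a list of (start, end) dates, both inclusive, during which
    the filer had coverage. A month counts as covered when at least
    `min_days` of its days fall inside some period (assumed reading of the
    15-day rule).
    """
    flags = []
    for month in range(1, 13):
        days_in_month = monthrange(year, month)[1]
        month_start = date(year, month, 1)
        month_end = date(year, month, days_in_month)
        covered_days = set()
        for start, end in periods:
            first = max(start, month_start)
            last = min(end, month_end)
            # Store each covered day once, so overlapping periods
            # are not double counted. Empty when first > last.
            covered_days.update(range(first.toordinal(), last.toordinal() + 1))
        flags.append(len(covered_days) >= min_days)
    return flags


def coverage_gaps(flags):
    """Return the length of each run of consecutive uncovered months."""
    gaps, run = [], 0
    for covered in flags:
        if covered:
            if run:
                gaps.append(run)
            run = 0
        else:
            run += 1
    if run:
        gaps.append(run)
    return gaps


# Hypothetical filer: covered Jan 1 to Mar 15, then again from Jul 18 on.
periods = [(date(2010, 1, 1), date(2010, 3, 15)),
           (date(2010, 7, 18), date(2010, 12, 31))]
flags = covered_months(periods, 2010)
gaps = coverage_gaps(flags)
```

In this made-up example, March counts as covered (15 days) but July does not (14 days), so the filer ends up with a single four-month gap. A one-day difference in a start date changes the result, which is exactly the kind of detail people will get wrong by hand.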
The form actually has questions that have flaws, such as question 10: "Did your employer (or your spouse’s employer if married filing jointly) offer affordable health"? What if you and your spouse had three different employers in 2010, plus unemployment time periods?
You can avoid the penalty if you under-pay taxes year after year, as the only way the State agencies can recoup your penalty is by not sending you your due tax refunds. After all, some states (California) are taxing tax refunds, so it is a good strategy to under-pay. You can also avoid the taxes for religious reasons. In my case, my religion (www.mathematology.com) promotes prevention and good diet over cure, recommends saving money for your kids rather than burning it on expensive terminal cancer treatments, warns you about the dangers of data collection and storage in unsafe government databases, and encourages you to avoid the spread of drug-resistant germs. It's not an "official" religion though, so I'm not sure I could claim exemption based on my beliefs.
You can check these tax forms at:
My point here is that you are going to have many millions of tax returns that need to be automatically processed for accuracy and consistency, including matching insurance IDs provided by tax filers with insurance IDs provided by insurance companies. Fraud detection will be a big problem to be solved: people finding loopholes to avoid penalties, or artificially reducing their income, and fake health insurance companies targeting the poor and uneducated. The IRS can't even figure out that I paid my $8,000 tax balance two weeks after they cashed it (as if they work with two different computer systems that take more than two weeks to communicate with each other), so how can we expect them to properly handle the data flood (a data tsunami indeed) that is coming?
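The ID matching step alone hints at the work involved. Below is a rough Python sketch of it, under assumptions of my own: the return IDs, the ID format, the `normalize` clean-up rule, and the `reconcile_ids` function are all hypothetical, and a real system would need fuzzy matching and far more validation.

```python
from collections import defaultdict


def normalize(raw_id):
    """Strip spaces and hyphens and upper-case, so 'ab-123' equals 'AB123'."""
    return raw_id.replace("-", "").replace(" ", "").upper()


def reconcile_ids(filer_claims, insurer_ids):
    """Compare insurance IDs claimed on returns with IDs reported by insurers.

    filer_claims: dict mapping a (hypothetical) return ID to the insurance ID
        written on the form.
    insurer_ids: iterable of insurance IDs reported by insurance companies.
    Returns (matched, unmatched, shared), where `shared` maps an insurance ID
    to the returns claiming it when more than one return does.
    """
    known = {normalize(i) for i in insurer_ids}
    matched, unmatched = [], []
    returns_by_id = defaultdict(list)
    for return_id, raw_id in sorted(filer_claims.items()):
        insurance_id = normalize(raw_id)
        returns_by_id[insurance_id].append(return_id)
        if insurance_id in known:
            matched.append(return_id)
        else:
            unmatched.append(return_id)
    # An ID claimed on several returns may be legitimate (e.g. a family plan),
    # so this is only a list of cases worth a second look.
    shared = {i: r for i, r in returns_by_id.items() if len(r) > 1}
    return matched, unmatched, shared


# Made-up data for illustration only.
filer_claims = {"R1": "ab-123", "R2": "ZZ999", "R3": "AB 123"}
insurer_ids = ["AB123", "CD456"]
matched, unmatched, shared = reconcile_ids(filer_claims, insurer_ids)
```

Here returns R1 and R3 match an insurer record once the formatting differences are removed, R2 claims an ID no insurer reported, and R1 and R3 share the same ID. Even this toy version shows why typos and inconsistent formatting will generate lots of false alarms at scale.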
So who will provide solutions for this big data problem? The good news: good times are ahead for data scientists.