I was born in Michigan in 1961, went to the University of Michigan for my bachelor's, and then did post-graduate studies in pure mathematics at Wisconsin and Illinois. I finally got tired of working on topics that I couldn't talk about even with my office mates, so I got out with an MS in statistics. My first real job was fraud analysis for Medicare. I spent most of the '90s working for a bleeding-edge data mining firm called NeoVista and later Accrue. This century I've worked in health research and managed modeling groups for telecoms and financial services, working on marketing problems. I am currently a VP of direct marketing modeling at Washington Mutual.
Which analytical fields are likely to experience growth, and why?
Nearly everything. What I've seen happen is the acceptance of the idea that advanced analytics can solve problems. I no longer have to convince people that models can help.
Which methodologies might become obsolete, and which ones are likely to experience growth?
When I first started going to the KDD conferences, the paper topics were new algorithms that people were thinking up. Later, the focus became general-purpose approaches to help the entire analytic process. Lately, the papers have been all about individual problems and the intricacies of applying advanced methods to actual issues.
I don't think any method is going to fall by the wayside. After all, I use Student's t-test nearly every day, and that's about as hoary as it gets. What has fallen by the wayside is the idea that we can do much of anything by shoving a data set at an algorithm.
Methods have to match problems. For some problems such as credit scoring it is absolutely necessary to give reasons for a score as well as the score itself, which means that linear methods are the way to go. Other problems (character recognition is the classic here) need some pretty high-powered methods.
In order to make sure the method fits the problem we have to know not just the data structure but the underlying domain as well. Being able to make domain knowledge and methodological approaches work together – that's the challenge.
Recommendations for students starting an analytical career or choosing a University curriculum
That's hard. When I'm looking to hire people I usually ask for an MS in stats or econ or the like, but I know lots of people in the field without any post-grad degrees at all.
Your opinion about outsourcing
Outsourcing lets people from all over the world work on challenging problems and advance their careers. I'm not sure how well it works out for the US companies.
Do you remember what I said earlier about combining domain knowledge with analytic methods? Kind of by definition, outsourced analysts can't provide the knowledge particular to a situation. That means the onshore company needs to have people that have both deep domain knowledge and at least a reasonable understanding of analytic methods to manage the work. The onshore managers also have to have very good project skills. Typically a company that's receiving the outsourced work will also have onshore managers, and they also have to understand analytic work. It is difficult to find people like that.
The hard part of analytic work – putting data sets together – is very difficult to outsource. I have had poor experiences in the past with letting others put data sets together for my projects without very tight supervision. Predictive analytics demands a unique perspective on data that most people don't have.
The short answer is that companies can successfully outsource parts of the analytic process but not the whole thing. This actually creates a lot of opportunities for people that are versed in analytic methods, good at communicating, and have decent project management abilities.
What are the biggest successes of data mining and statistical sciences in the corporate world, or for humanity in general? What are its biggest failures?
I'm going to differentiate between statistical sciences and data mining here, because very simple statistical tests have brought us modern experimental medicine and statistical quality improvement. It's hard to beat that.
For data mining and predictive analytics, I think the biggest success and the biggest failures have been in credit predictions. FICO scores are amazing: the fact that the models work that well and are that steady has revolutionized financial services. On the other hand, we have the fiasco of mortgage lending that is making rather depressing headlines, because the credit ratings on mortgage instruments were completely out of whack.
Comments on data mining and privacy
There is too much pressure on getting data on people. For marketing, security, medicine – there are a million reasons why data creates opportunity. I think that no matter what brakes we put on data transfer, in the end people will find ways around the barriers. Depressing but there you have it.
Think in terms of the ends that people want to achieve with the data. Are they generally beneficial? For instance, I recently read an article about how we should prevent drug companies from using data mining to market products to doctors. The real issue isn't the data mining but how drug companies should be relating to doctors in the first place.
Best practices for analytic professionals: what are the most important items?
Think! Think about what you are doing. If it works, think about why it works. If it didn't work, think about why it didn't work. Think about the organization you are working for and all the problems it is trying to solve. Think about all the problems it isn't trying to solve but could.
Talk! Talk to people about your experiences and listen to their responses. Talk and listen to the people in your organization and profession. Even if a conversation isn't directly related to a problem you are working on, you never know where it can lead.
Edmund's profile on AnalyticBridge: www.analyticbridge.com/profile/EmundFreeman