Artist and scientist Stephen von Worley announced the launch of a new blog, Data Pointed, showcasing his data visualization research. "One part magazine and two parts blog", the site tells the story of von Worley's own data visualizations, and surveys choice picks from others.
The lead story covers the results of the XKCD color name survey, illustrating entertaining differences in how respondents of different genders name colors.
Becoming a data scientist
The community Q&A site Quora is rich with information about data science, analytics and computing. An especially illuminating answer was given this week to the question "How do I become a data scientist?", which asks how someone with a computer science background can get the math and statistics knowledge required for data science.
Providing an extensive reply, Alex Kamil gives eight points from his perspective as an undergraduate student. Many of these reference statistics and math, and Kamil provides an excellent list of papers, websites and technologies to tinker with.
Several of Kamil's suggested starting points struck me as common themes among those who define themselves as data scientists:
- Start learning statistics by coding with R: whatever the size of the data you're working with, many data analysts perform and prototype investigations using the R language. Some will later translate these into larger map-reduce jobs to be run on Hadoop, for instance. R provides a hands-on way for developers to teach themselves statistics in practice.
- Linear algebra: a grounding in linear algebra is common among data scientists, and important because matrix math underpins many data mining applications, such as the famous PageRank.
- Machine learning: allowing computers to alter behavior based on input data is fundamental to many innovative data-based products and services. Many developers start out ad hoc, but there is a large body of literature available. Kamil references Bradford Cross' extensive list of machine learning resources.
There are many more starting points referenced in the full answer.
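To make the linear algebra point concrete, here is a minimal sketch of the idea behind PageRank, written as a power iteration in plain Python. It is an illustration only, not Google's implementation and not something taken from Kamil's answer: the function name `pagerank`, the four-page `toy_web` graph, and the damping factor of 0.85 (a conventional choice) are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Approximate PageRank scores by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to a score; the scores sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes an equal share of its rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share

        # Stop once the scores have effectively stopped changing.
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical four-page web: A links to B and C, B to C, C to A, D to C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Each pass of the loop is equivalent to multiplying the current rank vector by the link matrix of the graph, which is why matrix math sits underneath the algorithm. In this toy graph, page C collects links from every other page and ends up with the highest score, while D, which nothing links to, ends up with the lowest.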
The field of data science is a place where book learning meets code and produces results. In the words of Kurt Lewin: "There's nothing so practical as a good theory."
Strata: The Business of Data
If you enjoyed any of the previous items, stay tuned — we're excited to announce the launch of Strata, an O'Reilly conference focusing on the business and practice of data. The conference will be held in Santa Clara, Calif. from Feb. 1-3, 2011.
At O'Reilly, we believe that the future belongs to those who understand how to collect and use their data successfully. There's a change in both the skills of data analysts and the technology they use that's sweeping through industry and science. Our aim with Strata is to be the defining event for that change: for practitioners, businesses and data vendors.
The call for participation is open until Sept. 28. We're looking for proposals from practitioners, business leaders, analysts, designers, and developers covering the spectrum of data business and practice. Suggested topics include:
- Distributed data processing, Hadoop ecosystem
- From research to product
- Streaming data processing
- Becoming a data-driven organization
- Data science best practices
- Data acquisition, cleaning, distribution and markets
- Machine learning
- Training and recruitment of data scientists
- Applications, case studies, and cautionary tales
- Visualization and design principles
- Augmented reality and immersive interfaces
- Data protection, privacy, and policy
- Changing role of business intelligence
Send us news
Email us news, tips and interesting tidbits at [email protected].