A Data Science Central Community
A very common approach to building and understanding customer segments is through the use of clustering and related techniques such as Principal Component Analysis (PCA). These techniques analyze your customer data to see whether customers tend to cluster by certain features, or combinations of features. Through such an approach, a marketer can use clusters to define specific segments. For example, running a cluster analysis could end up showing two clusters: one with customers who have…Continue
Text (word) analysis and tokenized text modeling can feel intimidating, especially when you are new to machine learning. Thankfully, Python and its extended libraries offer strong support for text analytics and machine learning. Scikit-learn is an excellent aid in text processing once you also understand concepts such as "bag of words", "clustering", and "vectorization". Vectorization is a must-know technique for all machine learning learners, text miner…Continue
Added by Manish Bhoge on September 25, 2013 at 12:30pm — No Comments
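To make the "bag of words" and "vectorization" ideas from the teaser above concrete, here is a minimal sketch in plain Python. It is not the post's own code: the helper names `tokenize` and `bag_of_words` and the two sample sentences are assumptions made for illustration. In practice, scikit-learn's `CountVectorizer` does this kind of counting for you.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, matrix), where matrix[i][j] counts word j in document i."""
    tokenized = [tokenize(doc) for doc in documents]
    # The vocabulary is the sorted set of all distinct words across documents.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Each document becomes one fixed-length vector of word counts.
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


# Hypothetical example documents, not taken from the post.
docs = ["the cat sat", "the cat saw the dog"]
vocab, vectors = bag_of_words(docs)
print(vocab)
print(vectors)
```

Each document ends up as a vector of the same length, which is what lets clustering or classification algorithms work with text at all.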
Among many inspiring talks at the Social Good Summit in NYC today, several will focus on the crucial theme of Data and Social Good. If you have a moment and are interested in the growth of data-driven social good, I recommend tuning into the livestream here at the following times:
Added by Jaime Fitzgerald on September 24, 2013 at 10:00am — No Comments
Do the maverick mavens need managers?
The McKinsey & Company report “Big data: The next frontier for innovation, competition and productivity” (May 2011) has been widely publicized and circulated on the internet.
The report projected that, based on the trends seen in 2011, the demand for deep analytical positions in a big data world in the United States could exceed the supply by 140,000 to 190,000 positions. While this was…Continue
Added by Somjit Amrit on September 23, 2013 at 1:53am — No Comments
In this series I reveal Natural Laws of Intelligence contained within grammar, that can be utilized to unleash intelligence through natural language in software. These laws are extremely simple, but still undiscovered by scientists.
Experts in knowledge technology should be familiar with the DIKW Hierarchy (Data, Information, Knowledge, Wisdom):…
Added by Menno Mafait on September 20, 2013 at 2:00am — No Comments
Google is the god of search, and most businesses are doing all they can to propitiate the search engine giant. Most of us turn to Google when we want an answer to almost any question, and the great god supplies it. Millions of businesses around the world want their website to show up when potential customers or visitors search for a specific set of keywords. And so we invest lots of time and effort in SEO.
Many businesses that had spent…Continue
Added by Rajveer Singh Rathore on September 19, 2013 at 10:39am — No Comments
Comparing crop acreage harvested per county in the US, 1948-1952 vs. 2008-2012. The article was posted in USA Today with the title Climate Change Changing Agriculture. It is an interesting visual presentation (in USA Today), as you can superimpose the two images for better comparison. Here, you…Continue
An executive from IBM recently highlighted the need for more rigorous preparation for Big Data analytics within and beyond the financial industry in an article in the Wall Street Journal. The article outlined the dire need for qualified data scientists, how qualified business and finance students are, and how even liberal arts majors can and should be trained to…Continue
Added by Vincent Granville on September 17, 2013 at 8:00pm — No Comments
Extracting meaningful insights from data to address business needs has benefited immensely from the availability of data visualization tools that have made data more approachable. Today the proliferation of off-the-shelf tools, which are easy to learn and web enabled, has democratized the way data is presented and consumed. Tools like Spotfire, Tableau, and QlikView have helped breathe life into data. They provide a professional look and feel and give an inherent feel of fidelity of the data…Continue
I have just returned from the first ever ICHI (= International Conference on Healthcare Informatics) at which I discovered a "new generation" of healthcare researchers and policymakers, mostly under the age of 35 ... BUT with an UNDERSTANDING and PASSION of what needed to take place to get out of being in about 33rd place in the world in the quality of health care delivery and replacing it with ACCURACY / PATIENT CENTERED EFFECTIVENESS / and EFFICIENT COST EFFECTIVE processes that match what…Continue
Added by Gary D. Miner, Ph.D. on September 16, 2013 at 2:00pm — No Comments
Bootstraps, Permutation Tests, and Sampling Orders of Magnitude Faster Using SAS, Computational Statistics-WIREs, Vol. 5, Issue 5, 391-405. Download @ http://www.datamineit.com/DMI_publications.htm
While permutation tests and bootstraps have very wide-ranging application, both share a common potential drawback: as data-intensive resampling methods, both can be runtime prohibitive when applied to large or even…
Added by J.D. Opdyke on September 16, 2013 at 8:27am — No Comments
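For readers unfamiliar with the two resampling methods named in the abstract above, here is a minimal sketch of each in plain Python. It is not the paper's SAS approach and makes no attempt at the speedups the paper describes. The function names, default parameters, and sample values are all assumptions chosen for illustration.

```python
import random
from statistics import mean


def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means (approximate p-value)."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled observations to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


def bootstrap_mean_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        mean(rng.choices(sample, k=len(sample))) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements for two groups.
group_a = [10.1, 10.3, 9.9, 10.2, 10.0, 10.4, 9.8, 10.1]
group_b = [12.0, 12.2, 11.9, 12.1, 12.3, 11.8, 12.0, 12.2]
p_value = permutation_test(group_a, group_b)
ci_low, ci_high = bootstrap_mean_ci(group_a)
print(p_value, ci_low, ci_high)
```

Both loops repeat the full computation thousands of times, which hints at why runtime becomes a concern for these methods on large datasets.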
Increasingly sophisticated analytics tools and methods are available to derive business insight from data. However, as a discipline which drives insight from data, the crucial ‘last step’ in the analytics process is about organizational decision making. A sophisticated, intensive analysis may all be…Continue
Statistics.com, a provider of online education in statistics and analytics, announces a partnership with CrowdANALYTIX, a predictive modeling “managed crowdsourcing” company, offering a new online course, “Applied Predictive Analytics in partnership with CrowdANALYTIX”, which will run from Oct. 11 to Nov. 8, 2013.
The goal of this course is to teach users (who have basic knowledge of R programming, predictive analytics…Continue
Added by Janet Dobbins on September 11, 2013 at 10:59am — No Comments
Ronald Coase died last week. Coase was an economist and a Nobel laureate, not someone you would typically associate with modern data analytics. Still, Coase is noted in the field for coining the phrase, “If you torture data long enough it will confess.”
Coase’s quote, and his career, are reminders that analysis can have repercussions that go beyond the screen and the analysis, and have impacts on the work that we do…Continue
Added by Radhika Subramanian on September 10, 2013 at 1:53pm — No Comments
Barnes & Noble is one of the 500 largest companies in the world. It operates 1350 bookstores (730 stores in cities and 630 on campuses) and the largest online bookstore, with roughly 10 million customers, sells 300 million books per year, and offers a catalog of 6 million references.
The company's goals are to do better than the competition (especially Amazon) and to dominate the eBook market. To this end, the company aims to thoroughly analyze and control its…
Added by Michel Bruley on September 9, 2013 at 2:18am — No Comments
The Comprehensive Analysis of Time Series (CATS) is an increasingly important use case in the field of Big Data analytics. Cat videos on the Internet notwithstanding, time series are perhaps even more ubiquitous in big data applications: customer purchase histories, web click logs, social events, human behaviors, speech patterns, weather reports, climate science, numerical simulation science, spread of infectious diseases, market…Continue
Added by Kirk Borne on September 6, 2013 at 9:30am — No Comments
Added by Vincent Granville on September 5, 2013 at 1:00pm — No Comments
No predictive model is going to be 100% accurate unless by chance. The nature of predictive modeling is to learn from the past and see into the future. Essentially, predictive modeling is just modeling. Think about why we use statistical models - so we can fit the data into a pattern of behavior and anticipate future results. It's all about how you use and interpret this model.
Crime analysts may use a tool similar to the following example on a robbery…Continue
Added by Nicole on September 5, 2013 at 12:00pm — No Comments
Conditionally formatting each row individually is an issue that I struggled with for some time and finally found an answer. I have a table that lists 28 different activities by day of the week. On the report I need to highlight the day with the highest count per activity.
The solution is to essentially conditionally format each row to highlight the highest number. But who wants to take time formatting 28 rows? Plus, there are several other cities to analyze. So it’s…Continue
Added by Nicole on September 5, 2013 at 5:29am — No Comments
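The per-row logic described in the teaser above (find the highest count in each activity's row and highlight that day) can be sketched in plain Python. The post does not name its reporting tool, so this is only an illustration of the idea. The activity names, counts, and the asterisk "highlight" are assumptions.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def highlight_row_max(table):
    """Return, for each activity, the day(s) holding that row's highest count."""
    highlights = {}
    for activity, counts in table.items():
        top = max(counts)
        # Ties are kept, so several days can be highlighted in one row.
        highlights[activity] = [day for day, c in zip(DAYS, counts) if c == top]
    return highlights


def render(table):
    """Render the table as text, marking each row's maximum with asterisks."""
    lines = []
    for activity, counts in table.items():
        top = max(counts)
        cells = [f"*{c}*" if c == top else str(c) for c in counts]
        lines.append(f"{activity}: " + " ".join(cells))
    return "\n".join(lines)


# Hypothetical counts for two of the 28 activities.
table = {
    "Activity A": [3, 5, 2, 7, 4, 7, 1],
    "Activity B": [6, 2, 2, 1, 0, 3, 4],
}
print(highlight_row_max(table))
print(render(table))
```

Because the rule is applied inside a loop over rows, the same code covers all 28 activities and any additional city without formatting each row by hand. If the data lives in pandas, `DataFrame.style.highlight_max(axis=1)` applies the same row-wise rule.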