Featured Blog Posts – April 2013 Archive (20)

Shooting stars

This is a follow up to our video series From chaos to clusters, made with data points moving over time to form clusters, and produced with open source and home-made data science algorithms.…


Added by Vincent Granville on April 29, 2013 at 7:00pm — 1 Comment

Predicting the reality show winner using big data

I wonder, using big data and predictive analytics, can we predict the winner of The X Factor or American Idol from their first audition performance? I think we might have a good chance of predicting the winner right away.

If we only had the information from their first performance, which variables should the predictive model use? Here’s what I could think of:

  • The voice: quantified timbre, energy and rhythm
  • Song…

Added by Eka Aulia on April 29, 2013 at 9:25am — No Comments

Hadoop Herd : When to use What...

Eight years ago, not even Doug Cutting would have thought that the tool he was naming after his kid's soft toy would so soon become a rage and change the way people and organizations look at their data. Today Hadoop and BigData have almost become synonyms. But Hadoop is not just Hadoop now. Over time it has evolved into one big…


Added by Mohammad Tariq Iqbal on April 25, 2013 at 6:55pm — No Comments

From chaos to clusters - statistical modeling without models

Here I provide the mathematics, explanations and source code to produce the data and moving clusters in the From chaos to clusters video series.…


Added by Vincent Granville on April 24, 2013 at 11:00pm — 8 Comments

Hadoop+Ubuntu : The Big Fat Wedding.

Now, here is a treat for all you Hadoop and Ubuntu lovers. Last month, Canonical, the organization behind the Ubuntu operating system, partnered with MapR, one of the Hadoop heavyweights, in an effort to make Hadoop available as an integrated part of Ubuntu through its repositories. The partnership announced that…


Added by Mohammad Tariq Iqbal on April 20, 2013 at 8:10pm — No Comments

Take-aways from IE's first Predictive Analytics Summit in the Asia-Pacific

Having asked for a budget for it with special approval, I, of course, would take whatever the event offered in the short but fruitful two days in Hong Kong, 18th to 19th April, 2013. My overall feedback is positive, and I would recommend that companies spend their training budget on sending Analytics people here instead of keeping them in the classroom to learn Stat 101. This event is meant to be for practitioners

The Analytics market in…


Added by Jeffrey Ng on April 20, 2013 at 7:00pm — No Comments

Weekly Digest - April 22

Sponsored Announcements


Added by Vincent Granville on April 20, 2013 at 6:00pm — No Comments

Should business and engineering schools develop joint programs?

Businesses are increasingly using data-driven methods to make business decisions. Hence, there is a need for people with both good business skills and programming/quant skills. Finance/Accounting PhDs and other business PhDs do have such skills, but they are few in number, are costly to hire, and the majority prefer academia anyway. This limits businesses mainly to hiring bachelor's- or master's-level candidates.

However, a majority of the…


Added by Srinivasan Krishnamurthy on April 19, 2013 at 10:30am — 2 Comments

The amateur data scientist and her projects

With so much data available for free everywhere, and so many open tools, I would expect to see the emergence of a new kind of analytic practitioner: the amateur data scientist.

Just like the amateur astronomer, the amateur data scientist will significantly contribute to the art and science, and will eventually solve…


Added by Vincent Granville on April 17, 2013 at 5:00pm — No Comments

How do you solve your "Pre-ETL" source-to-target mapping problems?

It's one of the integration problems that most of the big players in the industry have pretty much left untouched. Anyone working in the data integration / data warehousing industry understands that when you build a data warehouse, you have to create these complex pre-ETL source mappings before the ETL developers start work. The way most organizations do this is with spreadsheets. Every organization has an exorbitant number of spreadsheets that they use to document this stuff. Once…


Added by Mohammad Azad on April 15, 2013 at 2:30pm — 1 Comment

Unleashing Intelligence through natural language (Part 2 - Autonomously generated conclusions)

In this series I reveal rules of intelligence contained within grammar, and explain how they can be utilized to unleash intelligence in software. These rules are extremely simple, but still undiscovered by scientists.

Under certain conditions, three types of conclusions can be generated autonomously:

1) Specification substitution conclusion:

• Given "John is a father" and "A father is a man";

• Because of the common word…


Added by Menno Mafait on April 15, 2013 at 2:48am — No Comments

Is your data really Big(Data)??

The advent of so many notable tools and technologies for handling BigData problems has made the lives of a lot of people and organizations easier. A lot of these are open source, have good support and a good community, and are pretty active. But there is another aspect to it. When things become easy, free, well supported and abundant, we often start to over-utilize them. Having said that, I would like to share one incident.

We organize …


Added by Mohammad Tariq Iqbal on April 14, 2013 at 4:05pm — No Comments

Three classes of metrics: centrality, volatility, and bumpiness

All statistical textbooks focus on centrality (median, average or mean) and volatility (variance). None mention the third fundamental class of metrics: bumpiness.

Here we introduce the concept of bumpiness and…


Added by Vincent Granville on April 14, 2013 at 2:30pm — 7 Comments

How to better compete with other data scientists

In two weeks, you can greatly improve your resume by learning new stuff, at no cost, without attending any classes. All the links below contain actual material to get you started.

Check out the following resources:…


Added by Vincent Granville on April 12, 2013 at 10:30am — No Comments

BI reporting tools: To Pay or Not To Pay.

Free VS Non-Free

Our company recently implemented a BI Reporting solution intended for a retail company with both an offline retail chain and an online web-shop. This was a demo project, so during the planning phase our team decided to use two different platforms, including different back-end and front-end software.

In general, we planned to compare the free and non-free software available for similar BI Reporting implementation.

We chose the following platforms for…


Added by Alexey on April 8, 2013 at 4:58am — No Comments

Google search: three bugs to fix with better data science

These big data problems probably impact many search engines. They also show that there is still room for new start-ups to invent superior search engines. These problems can be fixed with improved analytics and data science.

Here are the problems, and the solutions:…


Added by Vincent Granville on April 7, 2013 at 11:00am — 6 Comments

Why the Quantified Will Inherit the Earth

I’m the co-founder of Koalify, a personal analytics startup. Check out to start improving your life with personal analytics today.

Someone I respect told me “it’s easier to sell painkillers than vitamins.”

He’s right, but it’s also an oversimplification.  Today’s skipped vitamins are tomorrow’s painkillers – just ask retail execs who didn’t take big data seriously during the rise of Walmart.  There…


Added by Dave Heller on April 5, 2013 at 9:01am — No Comments

Stat models to solve astronomical mysteries - application to business data

If you look at the picture below (Pleiades star cluster), you will see - with the naked eye - that many star systems appear to be binary: that is, involving two (or more) stars orbiting around each other.

Is this a coincidence, or can we prove that from a statistical point of view, based on the theory of stochastic point…


Added by Mirko Krivanek on April 4, 2013 at 9:30pm — 1 Comment

Predicting Lying and Predicting Dying

Who benefits by predicting your behavior? Organizations do—companies, governments, hospitals, and political campaigns. They employ predictive analytics, technology that learns from data to render per-person predictions, one individual at a time.


People have been struck by the final words in the title of my new book on this subject, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (…


Added by Eric Siegel on April 4, 2013 at 1:53pm — No Comments

100 Savvy Sites on Statistics and Quantitative Analysis

From OnlineMathDegrees. It's broken down into different categories:

  • Comprehensive Statistics Sites
  • Big Data & Machine Learning…

Added by Vincent Granville on April 2, 2013 at 9:00pm — No Comments
