A Data Science Central Community
In a recent post I mentioned the partnership between MapR and Canonical on an initiative to make Hadoop natively available on Ubuntu through the Ubuntu Partner Archive. Since the package has now been released, I thought I would show how to get it done. Trust me, it's really cool to…Continue
Added by Mohammad Tariq Iqbal on April 30, 2013 at 7:17pm — No Comments
This is a follow-up to our video series From chaos to clusters, made with data points moving over time to form clusters, and produced with open source and home-made data science algorithms.…Continue
from a practitioner's perspective, is that it is a measure of noise, a detector of outliers that may show up as unaccounted-for noise arising from the way a process produces the data (even a data-entry process), or from some other force, process, or system giving rise to that noise. Yes?
So the thing would be to try various tactics for reducing bumpiness, maybe by screening those outliers, etc., and even running a TSA on the residuals after factoring or…Continue
I wonder: using big data and predictive analytics, can we predict the winner of The X Factor or American Idol from the contestants' very first audition performance? I think we might have a good chance of predicting the winner right away.
If we only had the information from their first performance, which variables should the predictive model use? Here’s what I could think of:
Added by Eka Aulia on April 29, 2013 at 9:25am — No Comments
Eight years ago, not even Doug Cutting would have thought that the tool he was naming after his kid's soft toy would so soon become all the rage and change the way people and organizations look at their data. Today Hadoop and BigData have almost become synonymous. But Hadoop is not just Hadoop now. Over time it has evolved into one big…Continue
Added by Mohammad Tariq Iqbal on April 25, 2013 at 6:55pm — No Comments
Here I provide the mathematics, explanations and source code to produce the data and moving clusters in the From chaos to clusters video series.…
Now, here is a treat for all you Hadoop and Ubuntu lovers. Last month, Canonical, the organization behind the Ubuntu operating system, partnered with MapR, one of the Hadoop heavyweights, in an effort to make Hadoop available as an integrated part of Ubuntu through its repositories. The partnership announced that…Continue
Added by Mohammad Tariq Iqbal on April 20, 2013 at 8:10pm — No Comments
Having asked for a budget for it with special approval, I was, of course, going to take in whatever the event offered during two short but fruitful days in Hong Kong, 18 to 19 April 2013. My overall feedback is positive, and I would recommend that companies spend their training budget on sending analytics people here instead of keeping them in a classroom to learn Stat 101. This event is meant for practitioners.
The Analytics market in…Continue
Added by Jeffrey Ng on April 20, 2013 at 7:00pm — No Comments
Added by Vincent Granville on April 20, 2013 at 6:00pm — No Comments
Businesses are increasingly using data-driven methods to make business decisions. Hence, there is a need for people with both good business skills and programming/quant skills. Finance/Accounting PhDs and other business PhDs do have such skills, but they are few in number, are costly to hire, and most prefer academia anyway. This limits businesses mainly to hiring bachelor's- or master's-level candidates.
However, a majority of the…Continue
Added by Vincent Granville on April 17, 2013 at 10:30pm — No Comments
With so much data available for free everywhere, and so many open tools, I would expect to see the emergence of a new kind of analytics practitioner: the amateur data scientist.
Just like the amateur astronomer, the amateur data scientist will significantly contribute to the art and science, and will eventually solve…Continue
Added by Vincent Granville on April 17, 2013 at 5:00pm — No Comments
It's one of the integration problems that most of the big players in the industry have pretty much left untouched. Anyone working in the data integration / data warehousing industry understands that when you build a data warehouse, you have to create complex pre-ETL source mappings before the ETL developers start work. Most organizations do this with spreadsheets. Every organization has an exorbitant number of spreadsheets that it uses to document this stuff. Once…Continue
Attracting thousands of new customers is worthless if an equal number leave. Minimizing customer churn is surely a smart objective. But how can I predict when my customers will churn, and could Big Data help?
To address this topic I did some personal research and put together a synthesis, which helped me clarify some ideas. The attached presentation is not intended to be exhaustive on the subject, but it could perhaps bring you some useful…Continue
Added by Michel Bruley on April 15, 2013 at 6:36am — No Comments
Intellipaat provides Hadoop online Training.
Hi, we will start a new Hadoop Developer batch from 20th April ’13. Certification will be provided after successful completion of training.
Interested candidates, please drop an email for registration at [email protected] or give us a call.
Sales Intellipaat Team
Visit us at www.intellipaat.com.…Continue
Added by soniya on April 15, 2013 at 5:06am — No Comments
In this series I reveal rules of intelligence contained within grammar, and explain how they can be utilized to unleash intelligence in software. These rules are extremely simple, but still undiscovered by scientists.
Under certain conditions, three types of conclusions can be generated autonomously:
1) Specification substitution conclusion:
• Given "John is a father" and "A father is a man";
• Because of the common word…
Added by Menno Mafait on April 15, 2013 at 2:48am — No Comments
The advent of so many notable tools and technologies for handling BigData problems has made life easier for a lot of people and organizations. Many of these tools are open source, with good support and active communities. But there is another side to it. When things become easy, free, well supported and abundant, we often start to over-utilize them. Having said that, I would like to share one incident.
We organize …
Added by Mohammad Tariq Iqbal on April 14, 2013 at 4:05pm — No Comments
All statistics textbooks focus on centrality (median, average or mean) and volatility (variance). None mention the third fundamental class of metrics: bumpiness.
Here we introduce the concept of bumpiness and…Continue
In two weeks, you can greatly improve your resume by learning new stuff, at no cost, without attending any classes. All the links below contain actual material to get you started.
Check out the following resources:…Continue
Added by Vincent Granville on April 12, 2013 at 10:30am — No Comments
Our company recently implemented a BI Reporting solution intended for a retail company with both an offline retail chain and an online web shop. This was a demo project, so during the planning phase our team decided to use two different platforms, including different back-end and front-end software.
In general, we planned to compare the free and non-free software available for similar BI Reporting implementations.
We chose the following platforms for…Continue
Added by Alexey on April 8, 2013 at 4:58am — No Comments