A Data Science Central Community
Let's say that you have a large number n of elements a, b, c, etc. and you want to group them into clusters. Each cluster is supposed to contain few elements, say O(1).
You have one similarity metric d(a,b) to compare any two elements a, b. Also, you have a list of all pairs where d(a,b) > threshold, or in other words, all pairs (a,b) where a and b belong to the same cluster. The n x…Continue
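One way to read the setup above, as a minimal sketch rather than the method the post goes on to describe: if every pair on the list belongs to the same cluster, the clusters can be taken as the connected components of the graph whose edges are those pairs. The function name `cluster_from_pairs` and the sample data are assumptions made for illustration. A union-find structure avoids building the full n x n matrix.

```python
from collections import defaultdict


def cluster_from_pairs(elements, similar_pairs):
    """Group elements into clusters, given pairs assumed to share a cluster.

    Assumption: clusters are the connected components of the pair graph.
    """
    # Each element starts as the root of its own one-element cluster.
    parent = {e: e for e in elements}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the clusters of every pair above the similarity threshold.
    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the members of each cluster under its root.
    clusters = defaultdict(list)
    for e in elements:
        clusters[find(e)].append(e)
    return list(clusters.values())


# Hypothetical data: pairs (a,b) where d(a,b) > threshold.
elements = ["a", "b", "c", "d", "e"]
similar_pairs = [("a", "b"), ("b", "c"), ("d", "e")]
clusters = cluster_from_pairs(elements, similar_pairs)
print(clusters)
```

Memory grows with the number of elements plus the number of listed pairs, which matters when n is large and each cluster holds only a few elements.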
See attached document, including the theorem, its proof and applications to business analytics (e.g. to produce model-free, data-driven confidence intervals for predictive scores). More explanations coming soon, in particular about how to leverage this deep statistical result when computing metrics against very large data sets.
Here are potential issues:
If you have more than 100 friends on Facebook, you've probably noticed that Facebook always shows the same 20 friends on your profile page, day after day. FB actually shows up to 10 friends, but they rotate from a list of 20 friends that, according to FB data mining algorithms, are deemed to be your best friends.
What makes a connection become one of your FB best friends is how frequently she visits your profile. You can influence this list to some extent, by posting comments…
Added by Vincent Granville on May 28, 2011 at 6:30pm — No Comments
ARMONK, N.Y., May 20, 2011 /PRNewswire/ -- As companies seek to gain real-time insight from diverse types of data, IBM (NYSE: IBM) today unveiled new software and services to help clients more effectively gain competitive insight, optimize infrastructure and better manage resources to address Internet-scale data. For the first time, organizations can…Continue
Added by Vincent Granville on May 28, 2011 at 10:58am — No Comments
Added by Vincent Granville on May 24, 2011 at 6:15pm — No Comments
The American Statistical Association and CHANCE magazine have debuted The Statistics Forum, a blog to provide everyone the opportunity to participate in discussions about probability and statistics and their role in important and interesting topics. The blog, which is located on the CHANCE web site at chance.amstat.org, is edited by Andrew Gelman. Everyone is invited to read and comment on the…Continue
Added by Vincent Granville on May 19, 2011 at 5:43pm — No Comments
The American Statistical Association (ASA), the nation's preeminent statistical society, urges members of the House of Representatives to support the Statistics Teaching, Aptitude and Training Act of 2011 (STAT Act of 2011), which was introduced today by Congressman Dave Loebsack (D-Iowa). A copy of the bill may be viewed at…Continue
Added by Vincent Granville on May 19, 2011 at 5:41pm — No Comments
Math majors, rejoice. Businesses are going to need tens of thousands of you in the coming years as companies grapple with a growing mountain of data.
Data is a vital raw material of the information economy, much as coal and iron ore were in the Industrial Revolution. But the business world is just beginning to learn how to process it all.
The current data surge is coming from sophisticated computer tracking of shipments, sales, suppliers and customers, as well as e-mail, Web…Continue
Added by Vincent Granville on May 14, 2011 at 9:41am — No Comments
Analyzing large data sets, so-called big data, will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus as long as the right policies and enablers are in place.
Research by MGI and McKinsey's Business Technology Office examines the state of digital data and documents the significant value that can potentially be unlocked.…Continue