A Data Science Central Community
Let's say that you have a large number n of elements a, b, c, etc., and you want to group them into clusters. Each cluster is supposed to contain only a few elements, say O(1).
You have one similarity metric d(a,b) to compare any two elements a, b. Also, you have a list of all pairs where d(a,b) > threshold, or in other words, all pairs (a,b) where a and b belong to the same cluster. The n x…
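The excerpt is cut off, so the author's actual method is not shown here. One common way to turn a list of same-cluster pairs into clusters is to take the connected components of the pair graph with a union-find structure. The sketch below assumes that approach; the function name `cluster_from_pairs` and the sample data are hypothetical, and every element appearing in a pair is assumed to be listed in `elements`.

```python
from collections import defaultdict


def cluster_from_pairs(elements, pairs):
    """Group elements into clusters given pairs known to share a cluster.

    Assumption: clusters are the connected components of the graph whose
    edges are the listed pairs. This is not necessarily the post's method.
    """
    # Each element starts as the root of its own one-element cluster.
    parent = {e: e for e in elements}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the clusters of every pair flagged as similar.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect members under their final root.
    clusters = defaultdict(list)
    for e in elements:
        clusters[find(e)].append(e)
    return sorted(sorted(members) for members in clusters.values())


example = cluster_from_pairs(
    ["a", "b", "c", "d", "e"],
    [("a", "b"), ("b", "c"), ("d", "e")],
)
print(example)  # [['a', 'b', 'c'], ['d', 'e']]
```

With small clusters and a pair list that is short relative to n squared, this runs in time close to linear in the number of elements plus the number of pairs, and it never needs the full n x n similarity matrix.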
See attached document, including the theorem, its proof and applications to business analytics (e.g. to produce model-free, data-driven confidence intervals for predictive scores). More explanations coming soon, in particular about how to leverage this deep statistical result when computing metrics against very large data sets.
Here are potential issues:
If you have more than 100 friends on Facebook, you've probably noticed that Facebook always shows the same 20 friends on your profile page, day after day. FB actually shows up to 10 friends, but they rotate from a list of 20 friends that, according to FB's data mining algorithms, are deemed to be your best friends.
What makes a connection become one of your FB best friends is how frequently she visits your profile. You can influence this list to some extent, by posting comments…
Added by Vincent Granville on May 28, 2011 at 6:30pm — No Comments
ARMONK, N.Y., May 20, 2011 /PRNewswire/ -- As companies seek to gain real-time insight from diverse types of data, IBM (NYSE: IBM) today unveiled new software and services to help clients more effectively gain competitive insight, optimize infrastructure and better manage resources to address Internet-scale data. For the first time, organizations can…
Added by Vincent Granville on May 28, 2011 at 10:58am — No Comments
Added by Rakesh Ranjan on May 25, 2011 at 10:00am — No Comments
Added by Vincent Granville on May 24, 2011 at 6:15pm — No Comments
Added by Sandeep Raut on May 22, 2011 at 8:40pm — No Comments
Added by Manish Mohan on May 22, 2011 at 2:24pm — No Comments
Added by Amanda Shankle-Knowlton on May 20, 2011 at 7:30am — No Comments
The American Statistical Association and CHANCE magazine have debuted The Statistics Forum, a blog to provide everyone the opportunity to participate in discussions about probability and statistics and their role in important and interesting topics. The blog, which is located on the CHANCE web site at chance.amstat.org, is edited by Andrew Gelman. Everyone is invited to read and comment on the…
Added by Vincent Granville on May 19, 2011 at 5:43pm — No Comments
The American Statistical Association (ASA), the nation's preeminent statistical society, urges members of the House of Representatives to support the Statistics Teaching, Aptitude and Training Act of 2011 (STAT Act of 2011), which was introduced today by Congressman Dave Loebsack (D-Iowa). A copy of the bill may be viewed at…
Added by Vincent Granville on May 19, 2011 at 5:41pm — No Comments
"Evidence of plagiarism and complaints about the peer-review process have led a statistics journal to retract a federally funded study that condemned scientific support for global warming.
The study, which appeared in 2008 in the journal Computational Statistics and Data Analysis, was headed by statistician Edward Wegman of George Mason University in Fairfax, Va. Its analysis was an outgrowth of a controversial congressional report that Wegman headed in 2006. The 'Wegman Report'…
Added by Richard on May 16, 2011 at 7:20pm — No Comments
FIND Technologies Inc. is a Canadian company that owns novel sensor technology for measuring electromagnetic signatures of materials. The sensor is a robust, inexpensive instrument that detects passive electromagnetic emission from all matter. It has biomedical, homeland security, engineering, geological, and other applications.
In order to provide real-time, automatic identification of materials, it is…
Math majors, rejoice. Businesses are going to need tens of thousands of you in the coming years as companies grapple with a growing mountain of data.
Data is a vital raw material of the information economy, much as coal and iron ore were in the Industrial Revolution. But the business world is just beginning to learn how to process it all.
The current data surge is coming from sophisticated computer tracking of shipments, sales, suppliers and customers, as well as e-mail, Web…
Added by Vincent Granville on May 14, 2011 at 9:41am — No Comments
Analyzing large data sets, so-called big data, will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus as long as the right policies and enablers are in place.
Research by MGI and McKinsey's Business Technology Office examines the state of digital data and documents the significant value that can potentially be unlocked.…
Added by Domenico "Dominic" Tassone on May 11, 2011 at 12:30pm — No Comments
The president of the data mining company I work for recently published a new article on data mining. This article from Tim Graettinger addresses some of the top questions he's been asked during a webinar on data mining that he helps to present:
Added by Daniel Graettinger on May 10, 2011 at 6:59pm — No Comments