# May 2011 Blog Posts (26)

### O(n) clustering algorithm for very large, unstructured data

Let's say that you have a large number n of elements a, b, c, etc., and you want to group them into clusters. Each cluster is supposed to contain only a few elements, say O(1).

You have a similarity metric d(a, b) to compare any two elements a and b. You also have a list of all pairs (a, b) with d(a, b) > threshold, in other words, all pairs where a and b belong to the same cluster. The n x…
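The excerpt stops before the post's own algorithm, so the following is not the author's method. It is a minimal sketch of one standard technique that fits the setup: treat every pair in the list as a graph edge and collect connected components with a union-find (disjoint-set) structure. With path compression and union by size, the cost is nearly linear in n plus the number of pairs, and when each cluster has O(1) elements the pair list itself has only O(n) entries. The function name `cluster_from_pairs` is hypothetical, and the sketch assumes every element in the pair list also appears in `elements`.

```python
from collections import defaultdict

def cluster_from_pairs(elements, similar_pairs):
    """Group elements into clusters, given every pair (a, b) with d(a, b) > threshold.

    Union-find with path compression and union by size: total cost is close to
    linear in len(elements) + len(similar_pairs).
    """
    parent = {e: e for e in elements}
    size = {e: 1 for e in elements}

    def find(x):
        # Walk up to the root, then point every visited node directly at it.
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra  # attach the smaller tree under the larger one
        size[ra] += size[rb]

    for a, b in similar_pairs:
        union(a, b)

    # Elements sharing a root form one cluster.
    clusters = defaultdict(list)
    for e in elements:
        clusters[find(e)].append(e)
    return list(clusters.values())
```

For example, `cluster_from_pairs(list("abcdef"), [("a", "b"), ("b", "c"), ("e", "f")])` groups a, b and c together, leaves d on its own, and pairs e with f. Note that linking is transitive: a and c end up together even though (a, c) is not in the list.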

Added by Vincent Granville on May 30, 2011 at 11:00pm — 1 Comment

### SEO Strategies: The Secrets to Boost Your Online Visibility

For a business, a website serves as a medium to earn money and mark an online presence. But is it enough to create a website and leave it as it is? It would then resemble a stationary car whose engine is running but whose accelerator nobody bothers to press.

Many a time, you may come across a good-looking website that brings its owner no results. That can be heartbreaking! It usually happens…

Added by Manish Mohan on May 30, 2011 at 12:38pm — No Comments

### The Analyticbridge Theorem (AKA the Fundamental Business Analytics Theorem)

See the attached document, which includes the theorem, its proof and applications to business analytics (e.g. producing model-free, data-driven confidence intervals for predictive scores). More explanations are coming soon, in particular about how to leverage this deep statistical result when computing metrics on very large data sets.

The AnalyticBridge Theorem
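The theorem itself lives in the attached document and is not reproduced here. To illustrate what a model-free, data-driven confidence interval looks like, the sketch below uses a different, well-known technique, the percentile bootstrap, and it should not be read as the AnalyticBridge Theorem. The function name `bootstrap_ci`, the 2,000 resamples and the fixed seed are assumptions.

```python
import random
import statistics

def bootstrap_ci(scores, stat=statistics.mean, level=0.95, n_resamples=2000, seed=42):
    """Percentile-bootstrap confidence interval for stat(scores).

    No distributional model is assumed: the data are resampled with
    replacement and the interval is read off the empirical quantiles.
    """
    rng = random.Random(seed)
    n = len(scores)
    # Recompute the statistic on n_resamples resamples drawn with replacement.
    estimates = sorted(
        stat([scores[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles as the bounds.
    alpha = (1 - level) / 2
    lo_idx = int(alpha * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha) * n_resamples))
    return estimates[lo_idx], estimates[hi_idx]
```

Any function of a list can be passed as `stat`, for instance `statistics.median` for an interval around a median score. Resampling every row is costly on very large data sets, so in practice one would subsample first.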

Added by Vincent Granville on May 29, 2011 at 7:00pm — 1 Comment

### What causes predictive models to fail - and how to fix it?

Here are potential issues:

• Over-fitting. If you perform a regression with 200 predictors (with strong cross-correlations among predictors), use meta regression coefficients: that is, use coefficients of the form f[Corr(Var, Response), a, b, c], where a, b, c are three meta-parameters (e.g. priors in a Bayesian framework). This reduces the number of parameters from 200 to 3 and eliminates most of the over-fitting.
• Perform the right type of…
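
The post does not spell out f, so the following is only a minimal sketch under an assumed form. Each predictor's coefficient is taken to be a·r + b·r·|r|, where r = Corr(Var, Response), plus an intercept c. With this hypothetical f, the model is linear in (a, b, c), so ordinary least squares on two derived features fits all 200 coefficients through 3 parameters. This fits by least squares rather than with Bayesian priors. The functional form, the function names and the use of NumPy are assumptions, not the author's method.

```python
import numpy as np

def correlations(X, y):
    """Pearson correlation of each column of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())

def fit_meta_regression(X, y):
    """Fit meta-parameters (a, b, c) of the hypothetical model
    y ~ c + sum_j (a * r_j + b * r_j * |r_j|) * x_j, with r_j = Corr(x_j, y)."""
    r = correlations(X, y)
    # Two derived features, each a fixed correlation-weighted sum of all predictors.
    design = np.column_stack([np.ones(len(y)), X @ r, X @ (r * np.abs(r))])
    (c, a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    return r, a, b, c

def predict(X, r, a, b, c):
    """Score new rows; all per-predictor coefficients come from just a, b and c."""
    coef = a * r + b * r * np.abs(r)
    return c + X @ coef
```

To check out-of-sample behavior, compute `r`, `a`, `b`, `c` on a training split and pass the same `r` to `predict` on held-out rows.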

Added by Vincent Granville on May 28, 2011 at 8:00pm — 8 Comments

### Facebook Friendlist Mysteriously Changed

If you have more than 100 friends on Facebook, you've probably noticed that Facebook always shows the same 20 friends on your profile page, day after day. FB actually shows up to 10 friends at a time, but they rotate from a list of 20 friends that, according to FB data mining algorithms, are deemed to be your best friends.

What makes a connection one of your FB best friends is how frequently she visits your profile. You can influence this list to some extent by posting comments…
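Facebook's actual algorithm is not public. The following is a toy model of the mechanism the post describes, assuming ranking depends only on profile-visit counts: keep the 20 most frequent visitors as the best-friends pool, and show a random 10 of them on each page view. The 20 and 10 come from the post. The function names and the visit-log format are hypothetical.

```python
import random
from collections import Counter

def best_friend_pool(profile_visits, pool_size=20):
    """Return the pool_size friends who visited the profile most often.

    profile_visits is a hypothetical log: one entry per visit, naming the visitor.
    """
    counts = Counter(profile_visits)
    return [friend for friend, _ in counts.most_common(pool_size)]

def friends_to_display(pool, shown=10, rng=random):
    """Pick `shown` friends from the pool at random, so the display rotates."""
    return rng.sample(pool, min(shown, len(pool)))
```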

Added by Vincent Granville on May 28, 2011 at 6:30pm — No Comments

### IBM Commits \$100 Million to Massive Scale Analytics Research

ARMONK, N.Y., May 20, 2011 /PRNewswire/ -- As companies seek to gain real-time insight from diverse types of data, IBM (NYSE: IBM) today unveiled new software and services to help clients more effectively gain competitive insight, optimize infrastructure and better manage resources to address Internet-scale data. For the first time, organizations can…

Added by Vincent Granville on May 28, 2011 at 10:58am — No Comments

### Analytics Driving Customer Engagements

Marketing has traditionally been perceived as a cost centre, and defining an optimum marketing spend has never been easy. Big companies spend heavily on brand promotions, or ATL (above-the-line) activities. BTL (below-the-line) managers are usually under pressure to justify the ROI of every penny spent. The fact that BTL activities also promote the brand is very often ignored, and all you have to answer is the sales…

Added by Rakesh Ranjan on May 25, 2011 at 10:00am — No Comments

### RapidMiner voted most popular data mining / analytic software on KDNuggets

The poll had a record participation (over 1,100 voters). Among them, 43% used only commercial software, 32% only free software, and 25% both. The average number of tools per user was 2.2.
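The three usage groups are mutually exclusive (commercial only, free only, both) and add up to 100%. Folding the overlap back in gives the share of voters who used each kind of tool at all:

$$
\begin{aligned}
43\% + 32\% + 25\% &= 100\% \\
\text{used at least one commercial tool} &= 43\% + 25\% = 68\% \\
\text{used at least one free tool} &= 32\% + 25\% = 57\%
\end{aligned}
$$

With just over 1,100 voters, the 25% who used both kinds of tool corresponds to roughly 275 people.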

RapidMiner, R, and Excel were again the most popular tools, with SAS remaining the top commercial tool. A pie chart shows the breakdown of voters by region. We also note that W. European data miners had the highest % of free tool use (due to the popularity of tools like RapidMiner and KNIME…

Added by Vincent Granville on May 24, 2011 at 6:15pm — No Comments

### Analytics Lead at eBay

The position is located at eBay’s Whitman Campus (Campbell, CA).

Business Title – Sr. Manager – Vertical Analytics

Position Type – Full Time Employee

Description – The Site Analytics group is responsible for delivering business insights and high-impact analyses to the Global eBay Marketplace businesses (e.g. eBay.com, eBay.co.uk). Within this group, teams partner with business unit clients to address strategic and operational questions…

Added by Heena Tripathi on May 24, 2011 at 5:49pm — No Comments

### Credit Score Cards

Information technology as an industry has grown by leaps and bounds. You may not find a single organization on the planet that does not involve IT. This has given rise to a lot of jobs supporting IT functions, and IT salaries have increased tremendously compared to other business areas. The overall economy grew, which increased people's ability and tendency to buy more and more.

This has increased the use of credit in everyday life. “Buy now pay later”…

Added by Sandeep Raut on May 22, 2011 at 8:40pm — No Comments

### Google Introduces WebP: A New Image Format for the Web

We are already familiar with WebM, which was introduced last year and successfully implemented on YouTube last month. Now Google has announced a new image format called WebP. WebP lets you reduce file size by up to 40% without changing the image's original resolution, which makes it a compact alternative to formats such as JPEG or PNG.

Added by Manish Mohan on May 22, 2011 at 2:24pm — No Comments

### Google Goes Social by Introducing the +1 Button

A few days ago I was searching for something on Google. When I got my search results, I saw a blue +1 button at the right end of one of them. I clicked that button, then ignored it and went back to the results I had searched for. But a few days later, when I went through my Google profile, I noticed that a new +1’s tab had appeared in my profile. When I…

Added by Manish Mohan on May 22, 2011 at 11:37am — 1 Comment

### Ethics of Graph-Making: Originally posted at StatSoft.com

Over the past several days, there has been a kerfuffle on a few political and data-visualization blogs concerning this bar chart that the Wall Street Journal published. The gist of the chart is that the bulk of the taxable income in this country…

Added by Amanda Shankle-Knowlton on May 20, 2011 at 7:30am — No Comments

### ASA and CHANCE Magazine Sponsor Blog to Foster Discussions of Probability, Statistics

The American Statistical Association and CHANCE magazine have debuted The Statistics Forum, a blog to provide everyone the opportunity to participate in discussions about probability and statistics and their role in important and interesting topics. The blog, which is located on the CHANCE web site at chance.amstat.org, is edited by Andrew Gelman. Everyone is invited to read and comment on the…

Added by Vincent Granville on May 19, 2011 at 5:43pm — No Comments

### American Statistical Association Urges Support of Statistical Literacy Bill

The American Statistical Association (ASA), the nation's preeminent statistical society, urges members of the House of Representatives to support the Statistics Teaching, Aptitude and Training Act of 2011 (STAT Act of 2011), which was introduced today by Congressman Dave Loebsack (D-Iowa). A copy of the bill may be viewed at…

Added by Vincent Granville on May 19, 2011 at 5:41pm — No Comments

### humans.txt: A New Idea for Humans, Not Robots

This new initiative comes from humanstxt.org, and its purpose is to make you aware of all the individuals who brainstorm day and night to build a website. It’s a new era in which the creator doesn’t take all the credit but also acknowledges the other helping hands.

Added by Manish Mohan on May 17, 2011 at 5:56am — No Comments

### Statistics Academic Journal Pulls Climate Denialist Study After Charges of Plagiarism

"Evidence of plagiarism and complaints about the peer-review process have led a statistics journal to retract a federally funded study that condemned scientific support for global warming.

The study, which appeared in 2008 in the journal Computational Statistics and Data Analysis, was headed by statistician Edward Wegman of George Mason University in Fairfax, Va. Its analysis was an outgrowth of a controversial congressional report that Wegman headed in 2006. The 'Wegman Report'…

Added by Richard on May 16, 2011 at 7:20pm — No Comments

### Canadian hi-tech company offers \$45,000 for the best algorithm for identification of substances from electromagnetic signatures.

FIND Technologies Inc. is a Canadian company that owns novel sensor technology for measuring electromagnetic signatures of materials. The sensor is a robust, inexpensive instrument that detects passive electromagnetic emission from all matter. It has biomedical, homeland security, engineering, geological, and other applications.

In order to provide real-time, automatic identification of materials, it is…
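The excerpt stops before describing the challenge data. The following is a minimal baseline sketch, not FIND Technologies' method. It assumes each electromagnetic signature is a fixed-length numeric vector and each known material has a few labeled reference signatures. A nearest-centroid classifier then identifies an unknown sample by the closest average reference signature. The data layout, the function names and the choice of Euclidean distance are all hypothetical.

```python
import math
from collections import defaultdict

def train_centroids(signatures, labels):
    """Average the reference signatures of each material into one centroid vector."""
    sums = {}
    counts = defaultdict(int)
    for sig, label in zip(signatures, labels):
        total = sums.setdefault(label, [0.0] * len(sig))
        for i, v in enumerate(sig):
            total[i] += v
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def identify(signature, centroids):
    """Return the material whose centroid is closest (Euclidean) to the signature."""
    return min(centroids, key=lambda label: math.dist(signature, centroids[label]))
```

A real entry would need noise handling and a way to flag signatures that match no known material, which a plain nearest-centroid rule does not provide.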

Added by Bart Blaszczyk on May 16, 2011 at 2:04am — 2 Comments

### New Ways to Exploit Raw Data May Bring Surge of Innovation | New York Times

Math majors, rejoice. Businesses are going to need tens of thousands of you in the coming years as companies grapple with a growing mountain of data.

Data is a vital raw material of the information economy, much as coal and iron ore were in the Industrial Revolution. But the business world is just beginning to learn how to process it all.

The current data surge is coming from sophisticated computer tracking of shipments, sales, suppliers and customers, as well as e-mail, Web…

Added by Vincent Granville on May 14, 2011 at 9:41am — No Comments

### About 200,000 data miners needed according to McKinsey

Analyzing large data sets—so called big data—will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus as long as the right policies and enablers are in place.

Research by MGI and McKinsey's Business Technology Office examines the state of digital data and documents the significant value that can potentially be unlocked.…

Added by Vincent Granville on May 13, 2011 at 5:28pm — 2 Comments

