R first appeared in 1996, when the statistics professors Robert Gentleman, left, and Ross Ihaka released the code as a free software package.
By ASHLEE VANCE
Published: January 6, 2009
To some people R is just the 18th letter of the alphabet. To others, it’s the rating on racy movies, a measure of an attic’s insulation or what pirates in movies say.
R is also the name of a popular programming language used by a growing number of data analysts inside… Continue
Added by Vincent Granville on January 7, 2009 at 11:00am —
By Gary Angel
My post on “Numbers it’s better NOT to know” got me thinking more closely about the relationship between a theory of error and the types of web analytic process organizations should adopt. That led to a more considered post “Defending the Indefensible” where I laid out some of the most common causes of error and talked a little bit about how these errors should influence our thinking about organization and process. Jacques Warren, whose comments certainly triggered some… Continue
Added by Vincent Granville on December 26, 2008 at 6:30pm —
1. What is click fraud?
Click fraud is usually defined as the act of purposely clicking on ads on pay-per-click programs with no interest in the target web site. Two types of fraud are usually mentioned:
- An advertiser clicking on competitor ads to deplete their ad spend budgets, with fraud frequently taking place early in the morning and through multiple distribution partners: AOL, Ask.com, MSN, Google, Yahoo, etc.
- A malicious distribution partner…
Added by Vincent Granville on December 8, 2008 at 3:30am —
We have started writing a new book: Data Mining with the Naked Eye. We show that well-chosen graphs, combined with human interpretation, are a powerful aid to business decisions. We also show that simple but smart reporting and careful metric and data selection, when combined with appropriate visuals, provide higher efficiency than sophisticated statistical models. This is true even with the largest data sets, when data is seen through the eyes of a sharp data miner with a strong… Continue
Added by Vincent Granville on November 12, 2008 at 12:36am —
A couple of companies ago, one of my mentors -- Jack Olson, author of Data Quality -- taught us team leaders to follow a formula for sizing software development groups. Of course this is only a guideline, but it makes sense:
9:3:1 for dev/test/doc
In other words, a 3:1 ratio of developers to testers, and then a 9:1 ratio of developers to technical writers. Also figure in how a group that size (13) needs a manager/architect and some project management.
On the… Continue
Added by Vincent Granville on November 3, 2008 at 1:00pm —
October 22, 2008
Banks Mine Data and Pitch to Troubled Borrowers
By BRAD STONE
Brenda Jerez hardly seems like the kind of person lenders would fight over.
Three years ago, she became ill with cancer and ran up $50,000 on her credit cards after she was forced to leave her accounting job. She filed for bankruptcy protection last year.
For months after she emerged from insolvency last fall, 6 to 10 new credit card and auto loan offers arrived… Continue
Added by Vincent Granville on October 28, 2008 at 6:33pm —
Last week I attended the Microsoft BI Conference. I learned about project Gemini. This project will allow analytics power users in companies to use Excel to do powerful analytics, while simultaneously allowing collaboration among all stakeholders using PerformancePoint. It allows Excel to load over 100 million rows (and about 6 columns) in just a few seconds and then create interactive pivot tables. They are still working on calculations but the demonstration was powerful. If you see Ted… Continue
Added by Vincent Granville on October 14, 2008 at 2:54pm —
What prompted you to take up a career in science, and why have you stuck with it and been a success in it?
I was doing mathematics for fun at a very young age when my friends were interested in sports, cars and movies. When I finished my master's, I was approached by one of the professors to pursue a PhD program. It was in statistics (image analysis, Bayesian clustering), and I thought that choosing statistics rather than number theory or numerical analysis would increase… Continue
Added by Vincent Granville on September 3, 2008 at 11:37am —
For several years, Sun CEO Jonathan Schwartz has lobbied the SEC to allow disclosure of financial information through corporate blogs. In a landmark announcement, it seems that Mr. Schwartz may indeed get his wish, and with it, a historic decision that could break the age-old shackles that bound businesses to traditional media and distribution channels in order to satisfy full disclosure. Continue
The SEC has announced that it will recognize corporate Web sites and blogs as channels for…
Added by Vincent Granville on August 13, 2008 at 5:00pm —
The U.S. government is using social media technologies to reach out to the world, to start a dialogue, to influence foreign policy and to change the perception of the United States in the rest of the world.
With that, the Department of State has set up Project Dipnote and created a YouTube channel, a blog, a Flickr photo album, a Twitter account, an iTunes account for podcasts, RSS feeds and, just recently, a Facebook page.
Secretary of State Condi Rice has called… Continue
Added by Vincent Granville on August 12, 2008 at 12:14am —
I am trying to solve the regression Y = AX, where Y is the response, X the input, and A the regression coefficients. I came up with the following iterative algorithm:
A_{k+1} = cYU + A_k (I - cXU),
- c is an arbitrary constant
- U is an arbitrary matrix such that YU has the same dimensions as A. For instance U = transposed(X)…
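The iteration above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: it uses the suggested choice U = transposed(X), starts from A_0 = 0, and uses a small made-up data set. With that choice of U the update can be rewritten as A_{k+1} = A_k + c (Y - A_k X) transposed(X), which is a gradient-descent step on the least-squares error. The remark that c must be small enough (below 2 divided by the largest eigenvalue of XU) for the iteration to converge is a standard fact added here, not a claim taken from the post. The helper names are hypothetical.

```python
def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*P)]


def iterate_regression(Y, X, c, steps):
    """Run A_{k+1} = c*Y*U + A_k*(I - c*X*U) with U = transposed(X).

    Assumptions (not from the post): A_0 is the zero matrix and c is
    small enough for the iteration to converge.
    """
    U = transpose(X)
    YU = matmul(Y, U)          # same shape as A
    XU = matmul(X, U)          # square, p x p
    p = len(XU)
    # M = I - c * X U, computed once because it does not change.
    M = [[(1.0 if i == j else 0.0) - c * XU[i][j] for j in range(p)]
         for i in range(p)]
    A = [[0.0] * p for _ in range(len(Y))]
    for _ in range(steps):
        AM = matmul(A, M)
        A = [[c * YU[i][j] + AM[i][j] for j in range(p)]
             for i in range(len(Y))]
    return A


# Made-up example: Y was generated exactly as A_true * X with A_true = [[2, -1]].
X = [[1.0, 0.0, 1.0, 2.0],
     [0.0, 1.0, 1.0, 1.0]]
Y = [[2.0, -1.0, 1.0, 3.0]]
A_est = iterate_regression(Y, X, c=0.1, steps=500)
```

On this noise-free example the estimate approaches the coefficients used to generate Y. With noisy data the fixed point satisfies (Y - AX) transposed(X) = 0, that is, the usual least-squares normal equations.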
Added by Vincent Granville on July 30, 2008 at 6:00pm —
This survey of products is an update of the survey published in 2005. The biennial statistical software survey in this issue provides capsule information about 44 products selected from 31 vendors. The tools range from general tools that cover the standard techniques of inference and estimation to specialized tools for activities such as nonlinear regression, forecasting and design of experiments. The product information contained in the survey was obtained from product vendors and is summarized… Continue
Added by Vincent Granville on July 26, 2008 at 12:00am —
The interaction and cooperation between computers and the human brain is at a crossroads. There are some who believe that decision support systems should be completely automated. There are others who believe that there are many areas of business, technology, and science that have not been discovered yet, and, hence, only part of a decision support system can be automated. I subscribe to the latter proposition.
Computer science is, at its core, an attempt to replicate the processing,… Continue
Added by Vincent Granville on July 10, 2008 at 11:30pm —
In the context of credit scoring, one tries to develop a predictive model using a regression formula such as Y = Σ w_i R_i, where Y is the logarithm of the odds ratio (fraud vs. non-fraud). In a different but related framework, we are dealing with a logistic regression where Y is binary, e.g. Y = 1 means fraudulent transaction, Y = 0… Continue
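As a rough illustration of the scoring formula, the sketch below computes Y = Σ w_i R_i and converts the resulting log-odds into a probability with the logistic function. The excerpt does not say what the R_i are, so treating them as numeric inputs (for example, 0/1 rule indicators) is an assumption, and the weights and function names are made up for the example.

```python
import math


def log_odds(weights, inputs):
    """Y = sum of w_i * R_i; the R_i are assumed to be numeric inputs."""
    return sum(w * r for w, r in zip(weights, inputs))


def fraud_probability(y):
    """Convert log-odds y into a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-y))


# Hypothetical weights and 0/1 indicators for a single transaction.
weights = [0.5, -0.25, 1.0]
inputs = [1, 1, 0]
score = log_odds(weights, inputs)   # 0.5 - 0.25 + 0.0 = 0.25
prob = fraud_probability(score)
```

A log-odds of 0 corresponds to a probability of 0.5, and larger scores map to probabilities closer to 1.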
Added by Vincent Granville on July 8, 2008 at 5:00pm —
The Netflix Prize seeks to substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences. Improve it enough and you win one (or more) Prizes. Winning the Netflix Prize improves our ability to connect people to the movies they love.
Added by Vincent Granville on June 28, 2008 at 10:09am —
From the company that brought you the C programming language comes Hancock, a C variant developed by AT&T researchers to mine gigabytes of the company's telephone and internet records for surveillance purposes.
An AT&T research paper published in 2001 and unearthed today by Andrew Appel at Freedom to Tinker shows how the phone company uses Hancock-coded software to crunch through tens of millions of long distance phone records a night to draw up what AT&T calls… Continue
Added by Vincent Granville on June 26, 2008 at 4:07am —
Aims and Scope
Statistical Analysis and Data Mining addresses the broad area of data analysis, including data mining algorithms, statistical approaches, and practical applications. Topics include problems involving massive and complex datasets, solutions utilizing innovative data mining algorithms and/or novel statistical approaches, and the objective evaluation of analyses and solutions. Of special interest are articles that describe analytical techniques, and discuss their… Continue
Added by Vincent Granville on May 26, 2008 at 4:00pm —
By Lyndsey Layton and Ashley Surdin
Washington Post Staff Writers
Saturday, April 5, 2008; Page A02
Quick: Name the Western U.S. city most vulnerable to a terrorist attack. Is it Los Angeles, with its crowded roads that make quick escape impossible? San Francisco and its iconic bridge? Or Seattle with its Space Needle and busy port?
Try Boise, Idaho, with its, um, potatoes.
A new study funded largely by the Department of Homeland Security ranked 132… Continue
Added by Vincent Granville on May 26, 2008 at 3:30pm —
While social networking is a red-hot topic at Revenue and around the online marketing space, eMarketer has revised its U.S. social network ad spend projections downward. The market researcher estimates that advertisers will spend $1.4 billion to place ads on online social networks in 2008, down from the previous projection of $1.6 billion.
U.S. online social network ad spend is now projected to reach $2.6 billion in 2012. In its last projection, made in December 2007, eMarketer… Continue
Added by Vincent Granville on May 15, 2008 at 9:00am —
Added by Vincent Granville on May 5, 2008 at 11:30pm —