A Data Science Central Community
We are generating more data than ever before. Thanks to the data scientists who organize and analyze this information, this abundance of Big Data can be harnessed to serve the public interest in innovative ways.
Big Data for the Public Good is a four-part seminar series hosted by Code for America in San…
Added by Vincent Granville on February 29, 2012 at 10:00pm
StatSoft’s (www.statsoft.com) STATISTICA Decisioning Platform™ is the only enterprise predictive analytics and decision management software platform to…
Added by Vincent Granville on February 29, 2012 at 9:55pm
In the last two posts we discussed co-occurrence analysis as a way to extract features for classifying documents and to extract "meta concepts" from the corpus.
We also noticed that this approach does not perform better than the traditional bag of words.
I would now like to explore some variations of this approach that take advantage of graph theory.
The co-occurrence graph is huge and complex: how can we reduce its complexity without big information…
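One common way to picture a co-occurrence graph is to treat terms as nodes and weight each edge by the number of documents in which the two terms appear together. The minimal Python sketch below builds such a graph and then drops light edges with a weight threshold. That pruning step is only an assumed, illustrative way to reduce the graph's complexity. It is not the method the post goes on to describe. `cooccurrence_graph`, `prune`, and `min_weight` are hypothetical names.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(docs: list[list[str]]) -> Counter:
    """Count how many documents contain each unordered pair of distinct terms.

    Each key is a sorted (term_a, term_b) tuple, i.e. an undirected edge,
    and the value is that edge's weight.
    """
    edges: Counter = Counter()
    for tokens in docs:
        # Use the set of distinct terms so each pair counts once per document.
        for a, b in combinations(sorted(set(tokens)), 2):
            edges[(a, b)] += 1
    return edges


def prune(edges: Counter, min_weight: int) -> dict:
    """Keep only edges whose weight is at least min_weight (hypothetical threshold)."""
    return {pair: w for pair, w in edges.items() if w >= min_weight}


docs = [
    ["graph", "theory", "text"],
    ["graph", "text", "mining"],
    ["graph", "text"],
]
g = cooccurrence_graph(docs)
print(len(g), prune(g, min_weight=2))  # 5 {('graph', 'text'): 3}
```

A fixed threshold is the simplest choice. It trades recall of rare but meaningful term pairs for a much smaller graph.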
Added by Cristian Mesiano on February 29, 2012 at 1:59pm
Many customer behaviors have the flavor of a choice between two alternatives: Yes or no. Buy or sell. Renew or cancel. Suppose software called a “classifier” is available to predict customer choices in advance. Would you use it? Perhaps you’d like to test it to see how well it performs before you commit. In this installment of my series on the nuts and bolts of data mining, I discuss the use of classifiers and questions about their performance. Regarding performance, we specifically…
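To make "how well it performs" concrete, you can compare the classifier's predictions against customer choices that are already known. The minimal Python sketch below assumes choices are coded as True (yes/buy/renew) and False (no/sell/cancel). It tallies a confusion matrix and reports accuracy, precision, and recall. These are standard measures, not necessarily the ones this series goes on to discuss, and the function names are hypothetical.

```python
def confusion_counts(actual: list[bool], predicted: list[bool]) -> dict:
    """Tally true/false positives and negatives for a yes/no classifier."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["tp"] += 1  # predicted yes, actually yes
        elif p:
            counts["fp"] += 1  # predicted yes, actually no
        elif a:
            counts["fn"] += 1  # predicted no, actually yes
        else:
            counts["tn"] += 1  # predicted no, actually no
    return counts


def summarize(c: dict) -> dict:
    """Accuracy, precision and recall from confusion counts (assumes at least one case)."""
    total = sum(c.values())
    predicted_yes = c["tp"] + c["fp"]
    actual_yes = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        # Of the customers flagged "yes", how many really chose yes?
        "precision": c["tp"] / predicted_yes if predicted_yes else 0.0,
        # Of the customers who really chose yes, how many were flagged?
        "recall": c["tp"] / actual_yes if actual_yes else 0.0,
    }


actual = [True, True, False, False, True]
predicted = [True, False, False, True, True]
print(summarize(confusion_counts(actual, predicted)))
```

Reporting precision and recall alongside accuracy matters when one choice, such as "cancel", is rare. In that situation a classifier can score high accuracy simply by always predicting the common outcome.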
Added by Daniel Graettinger on February 27, 2012 at 10:29am
Quantifying Extreme Events
Vicky Fasen, Claudia Klüppelberg, Annette Menzel
September 28, 2011
Abstract
Understanding and managing risks due to extreme events is one of the most demanding topics in our society. We treat this as a statistical problem and present some of the probabilistic and statistical theory that was developed to model and quantify extreme events. By the very nature of an extreme event…
Added by John A Morrison on February 24, 2012 at 9:10am
I believe so. Here are some interesting thoughts on this:
You talk to a mortgage adviser at (say) Wells Fargo bank. You are interested in financing, own > 50%, have 2 salaries (your wife + yourself) that represent more than 50% of the amount you want to refinance, can make a 30% down payment and have an external income…
The text analytics market is set to exceed £635mln as businesses look to capture customer sentiment to gain competitive advantage.
Companies from industries as diverse as financial services, pharmaceuticals and online retail are today looking to harness the voice of the customer across social networks to improve their services.
The technology to capture customer sentiment is becoming increasingly sophisticated, responsive, and flexible to distinct business needs. Despite the…
Added by Vincent Granville on February 21, 2012 at 5:23pm
Detecting Economic Events Using a Semantics-Based Pipeline
Alexander Hogenboom, Frederik Hogenboom, Flavius Frasincar, Uzay Kaymak, Otto van der Meer, and Kim Schouten
Erasmus University Rotterdam
In today's information-driven global economy, breaking news on economic…
Added by John A Morrison on February 21, 2012 at 8:10am
From Semantic Search & Integration to Analytics
LSDIS lab, University of Georgia, 415 Graduate Studies Research Center,
Athens, GA 30602-7404
Semagix Inc., 297 Prince Avenue,
Athens, GA 30601
Semantics is seen as the key ingredient in the next phase of the Web infrastructure as well as the next generation of enterprise content management. Ontology is the centerpiece of the most prevalent semantic technologies…
Added by John A Morrison on February 21, 2012 at 6:30am
A few posts ago we saw an approach to extracting meta "concepts" from text based on the latent semantic paradigm.
In this post we apply this approach to classify documents, and we compare it with the canonical bag of words.
The comparison is done using the ensemble method already shown in the last post.
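For reference, the canonical bag-of-words baseline represents each document as a vector of term counts over a shared vocabulary and ignores word order. The minimal Python sketch below assumes plain whitespace tokenization and lowercasing. Real pipelines usually add stop-word removal, stemming, or TF-IDF weighting. The helper names are hypothetical, and this is not the author's code.

```python
from collections import Counter


def build_vocabulary(docs: list[str]) -> list[str]:
    """Sorted list of every distinct lowercase token in the corpus."""
    return sorted({tok for doc in docs for tok in doc.lower().split()})


def bag_of_words(doc: str, vocabulary: list[str]) -> list[int]:
    """Term-count vector for one document; word order is discarded."""
    counts = Counter(doc.lower().split())
    # Counter returns 0 for terms the document does not contain.
    return [counts[term] for term in vocabulary]


corpus = ["the cat sat", "the cat ate the fish"]
vocab = build_vocabulary(corpus)
print(vocab)  # ['ate', 'cat', 'fish', 'sat', 'the']
print([bag_of_words(d, vocab) for d in corpus])  # [[0, 1, 0, 1, 1], [1, 1, 1, 0, 2]]
```

Vectors like these, or the latent "concept" features that replace them, are what a downstream classifier such as the ensemble receives as input.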
Added by Cristian Mesiano on February 20, 2012 at 7:22am
This came into my mailbox as a sales pitch from Autobox, but I thought it was interesting:
Since we are always interested in learning about how others do time series and testing how our approaches work vis-à-vis other dated procedures, we pursued the data and would like to share our results.
Sometimes in an…
Added by Vincent Granville on February 16, 2012 at 10:00pm
This is the first article in a series about good, actionable KPIs for optimizing various kinds of ROI. Future articles will focus on metrics for fraud detection, user engagement, etc. This one focuses on newsletter optimization.
If you run an online newsletter, here are a number of metrics you need to track:…
By Steve Lohr. Good with numbers? Fascinated by data? The sound you hear is opportunity knocking.…
Added by Vincent Granville on February 12, 2012 at 10:29am
A very efficient approach to random sampling in SAS® achieves speeds orders of magnitude faster than the relevant "built-in" SAS® procedures. For sampling with replacement as applied to bootstraps, seven algorithms are compared, and the fastest ("OPDY"), based on the new approach, runs over 220x faster than Proc SurveySelect. OPDY also handles datasets many times larger than those on which two hashing algorithms crash. For sampling without replacement as applied…
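The OPDY algorithm is SAS®-specific, and its details are cut off here. To illustrate what "sampling with replacement as applied to bootstraps" means, the minimal Python sketch below uses the standard library's `random.choices`. It is not OPDY and makes no claim about speed. `bootstrap_means`, `n_resamples`, and `seed` are hypothetical names.

```python
import random
from statistics import mean


def bootstrap_means(data: list[float], n_resamples: int, seed: int = 0) -> list[float]:
    """Mean of each bootstrap resample drawn with replacement from data."""
    rng = random.Random(seed)  # seeded so repeated runs give identical resamples
    n = len(data)
    # random.choices samples WITH replacement; each resample has the original size n.
    return [mean(rng.choices(data, k=n)) for _ in range(n_resamples)]


data = [1.0, 2.0, 3.0, 4.0, 5.0]
means = sorted(bootstrap_means(data, n_resamples=1000, seed=42))
# Rough 95% percentile interval for the mean, read off the sorted bootstrap means.
print(means[25], means[974])
```

For sampling without replacement, the standard library's `random.sample` plays the analogous role.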
Added by J.D. Opdyke on February 12, 2012 at 9:30am
J.D. Opdyke and Alex Cavallo
In operational risk measurement, the estimation of severity distribution parameters is the main driver of capital estimates, yet this remains a non-trivial challenge for many reasons. Maximum likelihood estimation (MLE) does not adequately meet this challenge because of its well-documented non-robustness to modest violations of idealized textbook model assumptions, specifically that the data are independent and identically distributed (i.i.d.), which is…
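To see why MLE's sensitivity matters for severity estimates, consider a lognormal severity distribution. The lognormal is a common choice for operational losses, but using it here is an assumption. Its MLE has a closed form: the mean and the (divide-by-n) standard deviation of the log losses. The minimal Python sketch below, with a hypothetical `lognormal_mle` helper, shows how one extreme data point shifts both parameters. This illustrates a single contaminating point only. It is not the authors' robust estimator and does not model other departures from i.i.d.

```python
import math


def lognormal_mle(losses: list[float]) -> tuple[float, float]:
    """Closed-form MLE (mu, sigma) of a lognormal fitted to positive losses."""
    logs = [math.log(x) for x in losses]
    n = len(logs)
    mu = sum(logs) / n
    # The MLE of sigma divides by n, not n - 1.
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma


base = [1e3, 2e3, 5e3, 1e4, 2e4]        # hypothetical loss amounts
print(lognormal_mle(base))
print(lognormal_mle(base + [1e8]))      # same data plus one extreme loss
```

Because capital estimates depend heavily on the fitted tail, a jump in sigma from a single observation is exactly the kind of instability that motivates robust alternatives to MLE.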
Added by J.D. Opdyke on February 10, 2012 at 3:56pm
Monitoring Financial Stability in a Complex World
Mark D. Flood, Allan Mendelowitz
Office of Financial Research
Committee to Establish the Office of Financial Research
National Institute of Finance
Version 10 / January 19, 2012
Copyright 2012, M. Flood, A. Mendelowitz and W. Nichols
We offer a tour d’horizon of the data management issues facing…
Added by John A Morrison on February 9, 2012 at 10:54pm
Added by Vincent Granville on February 9, 2012 at 7:30pm
Pentaho’s Kettle data integration product cited for ‘richest functionality and most extensive integration with open source Apache Hadoop’
Added by Vincent Granville on February 9, 2012 at 6:56pm
Added by Vincent Granville on February 9, 2012 at 6:30pm
1. Short Bio
I started my career in the communications industry, where I spent 20 years with a Tier 1 carrier in probably 15 different jobs across the entire organization: Marketing, Advertising, Product Management, Operations, Sales, General Management, Strategy and Business Development. I basically…
Added by Vincent Granville on February 9, 2012 at 4:00pm