A Data Science Central Community
The methodology described here has broad applications, leading to new statistical tests, a new type of ANOVA (analysis of variance), improved design of experiments, interesting fractional factorial designs, a better understanding of irrational numbers leading to cryptography, gaming and Fintech applications, and high-quality random number generators (and when you really need them). It also features exact arithmetic / high performance computing and distributed algorithms to compute millions of…Continue
Added by Vincent Granville on February 29, 2020 at 11:00pm — No Comments
Summary: The Gartner Magic Quadrant for Data Science and Machine Learning Platforms is just out, and the big news is how much more capable all the platforms have become. Of course, there are also some interesting winner and loser stories.
The Gartner Magic Quadrant for Data Science and Machine Learning Platforms is just out for 2020. The really big news is how many excellent choices are now available. In a remarkable move, the whole field…Continue
Added by Vincent Granville on February 21, 2020 at 9:25am — No Comments
In this notebook, we try to predict whether the sentiment of a sentence is positive (label 1) or negative (label 0). We use the UCI Sentiment Labelled Sentences Data Set.
Sentiment analysis is useful in many areas. For example, it can be used to moderate internet conversations. It can also be used to predict the ratings that users might assign to a certain product (food, household appliances, hotels,…Continue
Added by Vincent Granville on February 19, 2020 at 8:42pm — No Comments
By 2022, Gartner predicts that 90% of corporate strategies will explicitly mention information as a critical enterprise asset and analytics as an essential competency.
“Increasingly, leading and thriving organizations in every segment are wielding data and analytics as a competitive weapon, operational accelerant, and innovation catalyst,” note analysts in …
Added by Tricia Morris on February 19, 2020 at 8:26am — No Comments
Probably the worst error is thinking there is a correlation when that correlation is purely artificial. Take a data set with 100,000 variables, say with 10 observations. Compute all the (99,999 * 100,000) / 2 cross-correlations. You are almost guaranteed to find one above 0.999. This is best illustrated in my article How to Lie with P-values (also discussing…Continue
Added by Vincent Granville on February 7, 2020 at 9:48am — No Comments
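The spurious-correlation effect described in the excerpt above can be tried at a much smaller scale. The following is a minimal sketch, not the author's code: it assumes independent Gaussian noise, uses only the Python standard library, and the function names (`count_pairs`, `standardize`, `max_abs_correlation`) as well as the sizes (300 variables, 5 observations) are illustrative choices made here so that it runs in well under a second.

```python
import itertools
import math
import random


def count_pairs(n_vars):
    """Number of distinct variable pairs: n * (n - 1) / 2."""
    return n_vars * (n_vars - 1) // 2


def standardize(values):
    """Center a sequence and scale it to unit length, so that the dot
    product of two standardized sequences is their Pearson correlation."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def max_abs_correlation(n_vars, n_obs, seed=0):
    """Largest absolute pairwise correlation among n_vars variables that
    are pure, mutually independent noise (assumption: Gaussian noise)."""
    rng = random.Random(seed)
    variables = [
        standardize([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
        for _ in range(n_vars)
    ]
    best = 0.0
    for x, y in itertools.combinations(variables, 2):
        r = sum(a * b for a, b in zip(x, y))
        best = max(best, abs(r))
    return best


if __name__ == "__main__":
    # Pair count for the 100,000-variable example in the text.
    print(count_pairs(100_000))
    # Scaled-down experiment: 300 noise variables, 5 observations each.
    print(max_abs_correlation(300, 5))
```

Even though every variable here is independent noise, the largest correlation found among the 44,850 pairs is very close to 1. With few observations and many variables, searching all pairs for the strongest correlation will turn up strong-looking relationships that mean nothing.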