A Data Science Central Community

Here is our new selection of articles and resources recently posted:

Added by Vincent Granville on December 15, 2017 at 3:30pm

***Summary:** Here are our 6 predictions for data science, machine learning, and AI for 2018. Some are fast track and potentially disruptive; some take the hype off overblown claims and set realistic expectations for the coming year.*

It’s that time of year again when we do a look back in order to offer a look forward. What trends will speed up, what things will actually happen, and what things won’t in the coming year for data science, machine…

Added by Vincent Granville on December 14, 2017 at 3:00pm

Here is our selection of featured articles recently posted:

- Enterprise AI: Learning from the evolution of Robotic Process Autom...
- Book: Introduction to Statistics …

Added by Vincent Granville on December 9, 2017 at 1:42pm

***Summary:** As a profession we do a pretty poor job of agreeing on good naming conventions for really important parts of our professional lives. “Machine Learning” is just the most recent case in point. It has had a perfectly good definition for a very long time, but now the deep learning folks are trying to hijack the term. Come on, folks. Let’s make up our minds.*

As a profession we do a pretty poor job of agreeing on good naming conventions for…

Added by Vincent Granville on December 7, 2017 at 2:41pm

Here is our selection of featured articles and resources posted recently:

- Free eBook: Applied Data Science (Columbia University)
- Some Deep Learning with Python, TensorFlow and Keras …

Added by Vincent Granville on December 2, 2017 at 1:47pm

Added by Vincent Granville on November 25, 2017 at 4:46pm

In some applications, the standard precision of your programming language of choice may not be enough, and can lead to disastrous errors. In some cases, you work with a library that is supposed to provide very high precision when in fact it does not work as advertised. Sometimes lack of precision results in obvious problems that are easy to spot; at other times everything seems to be working fine and you are not aware that your simulations are completely…

Added by Vincent Granville on November 19, 2017 at 9:30am

Here is our selection of featured articles and resources posted in the last few days:

- The Gaussian Correlation Inequality in One Picture
- Handbook of Statistical Analysis and Data Mining Applications - 2nd... …

Added by Vincent Granville on November 18, 2017 at 10:00am

Here we discuss an application of HPC (not high performance computing but high precision computing, which is a special case of it) to dynamical systems such as the logistic map in chaos theory, defined as X(k) = 4 X(k-1) (1 - X(k-1)).

For all these systems, the loss of precision propagates exponentially, to the point that after 50 iterations, all generated values are completely wrong. Tons of articles have been written on this subject - none of them acknowledging the…

Added by Vincent Granville on November 13, 2017 at 7:00pm
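
The loss of precision described in the logistic map entry above can be illustrated with a minimal sketch. It iterates X(k) = 4 X(k-1) (1 - X(k-1)) twice, once in ordinary double precision and once with Python's standard `decimal` module, and records how far the two trajectories drift apart. The function names, the seed 0.3, and the choice of 200 significant digits are assumptions made for illustration; they are not taken from the article.

```python
from decimal import Decimal, localcontext


def logistic_float(x0, n):
    """Iterate the logistic map n times in standard double precision."""
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        xs.append(4.0 * x * (1.0 - x))
    return xs


def logistic_decimal(x0, n, digits=200):
    """Iterate the same map with `digits` significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        x = Decimal(x0)  # exact conversion of the float seed, so both runs start equal
        xs = [x]
        for _ in range(n):
            x = 4 * x * (1 - x)
            xs.append(x)
    return xs


def divergence(x0=0.3, n=100):
    """Absolute gap between the double-precision and high-precision runs."""
    low = logistic_float(x0, n)
    high = logistic_decimal(x0, n)
    return [abs(a - float(b)) for a, b in zip(low, high)]


if __name__ == "__main__":
    gaps = divergence()
    for k in (0, 10, 20, 30, 40, 50, 60):
        print(k, gaps[k])
```

Running the script prints the gap at a few iterations. The gap starts at zero and grows by roughly a factor of two per step on average, which is why a rounding error near 1e-16 reaches the size of the values themselves after a few dozen iterations.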

Here is our selection of featured articles and resources posted in the last few days:

Added by Vincent Granville on November 11, 2017 at 4:55pm

***Summary:** This is the first in a series about chatbots. In this first installment we cover the basics, including their brief technological history, uses, basic design choices, and where deep learning comes into play. In subsequent articles we’ll describe in more detail how they are actually programmed, along with best-practice dos and don’ts.…*

Added by Vincent Granville on November 8, 2017 at 1:30pm

Data Science Central (DSC) is excited to announce our competition to solve a new, interesting problem in statistical science, pertaining to stochastic processes. DSC members are invited and encouraged to submit a theoretical solution or an application to real-life problems, including but not limited to fintech, operations research, statistical science, computer science, economics, engineering, and the social, actuarial, biological, or physical sciences.

The statistical process central to this…

Added by Vincent Granville on November 8, 2017 at 12:00pm

In this post, we learn how to build a basic search engine, or document retrieval system, using the vector space model. This approach is widely used in information retrieval systems. Given a set of documents and one or more search terms (the query), we need to retrieve the documents that are most similar to the query.

The problem statement above is illustrated in the image below. …

Added by suresh kumar Gorakala on November 7, 2017 at 6:30am
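
The vector space retrieval idea in the entry above can be sketched in a few lines. This is not the post's own implementation (which is not shown here); it is a minimal illustration that assumes whitespace tokenization, TF-IDF term weights, and cosine similarity for ranking. All function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf(tokens, idf):
    """Map a token list to a sparse TF-IDF vector, ignoring unknown terms."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def build_index(docs):
    """Compute IDF weights and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    # The +1 keeps terms that occur in every document from vanishing.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return idf, [tfidf(toks, idf) for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, docs, top_k=3):
    """Return (document index, score) pairs with nonzero score, best first."""
    idf, vectors = build_index(docs)
    q = tfidf(tokenize(query), idf)
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(i, score) for score, i in scored[:top_k] if score > 0.0]


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "dogs chase cats in the park",
        "stock markets fell sharply today",
    ]
    print(search("cat mat", corpus))
```

Because the tokenizer does no stemming, "cat" and "cats" are treated as different terms; a real system would usually normalize words before indexing.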

Here we describe well-known chaotic sequences, including new generalizations, with applications to random number generation, highly non-linear auto-regressive models for time series, simulation, random permutations, and the use of big numbers (libraries available in programming languages to work with numbers with hundreds of decimals), as standard computer precision almost always produces completely erroneous results after a few iterations -- a fact rarely if ever mentioned in the scientific…

Added by Vincent Granville on November 6, 2017 at 8:30pm

*Full title: Trend Analysis of Fragmented Time Series: Hypothesis Testing Based Adaptive Spline Filtering Method.*

Missing data present significant challenges to trend analysis of time series. Straightforward approaches consisting of supplementing missing data with constant or zero values or with linear trends can severely degrade the quality of the trend analysis, which significantly reduces the reliability of the…

Added by Vincent Granville on November 3, 2017 at 10:23am

The general linear model should not be confused with multiple linear regression, the generalized linear model, or general linear methods. The general linear model, or multivariate regression model, is a statistical linear model written as **Y = XB + U**.

Usually, a linear model includes a number of different statistical models such as ANOVA, ANCOVA, MANOVA, MANCOVA, ordinary linear regression, t-test and F-test. The GLM is a generalization of multiple…

Added by Chirag Shivalker on November 2, 2017 at 11:30pm
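
To make the notation Y = XB + U in the entry above concrete, the model can be written out with its dimensions. The symbols n, m, and p and the least-squares estimator below are standard textbook notation assumed here for illustration; they do not appear in the excerpt.

$$
\underbrace{Y}_{n \times m} \;=\; \underbrace{X}_{n \times p}\,\underbrace{B}_{p \times m} \;+\; \underbrace{U}_{n \times m},
\qquad
\hat{B} \;=\; (X^{\top} X)^{-1} X^{\top} Y
$$

Here n is the number of observations, m the number of response variables, and p the number of predictors (columns of the design matrix X). The estimator is defined when X^T X is invertible. With m = 1 the model reduces to ordinary multiple linear regression.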

The logistic map is the most basic recurrence formula exhibiting various levels of chaos depending on its parameter. It has been used in population demographics to model chaotic behavior. Here we explore this model in the context of randomness simulation, and revisit a bizarre non-periodic random number generator discovered 70 years ago, based on the logistic map equation. We then discuss flaws and strengths in widely used random number generators, as well as how to reverse-engineer…

Added by Vincent Granville on October 30, 2017 at 12:00pm

This famous statement -- the six degrees of separation -- claims that there are at most six degrees of separation between you and anyone else on Earth. Here we feature a simple algorithm that simulates how we are connected, and indeed confirms the claim. We also explain how it applies to web crawlers: any web page is connected to any other web page by a path of at most six links.

The algorithm below is rudimentary and can be used for simulation purposes by any programmer: It does not even…

Added by Vincent Granville on October 25, 2017 at 11:43am

Here is our selection of featured articles and resources posted recently:

Added by Vincent Granville on October 22, 2017 at 5:19pm

**Non-traditional strategies for mid-career switch to Data Science and AI**

In this post, I explore strategies for switching to Data Science mid-career. This switch is not easy, but based on the experience of many people whom I have taught, mentored, or recruited, it is possible. Most people consider a PhD, a MOOC, or similar routes for switching their career to Data Science. Here, I will explore some non-traditional, unorthodox ways of making the switch. …

Added by Vincent Granville on October 20, 2017 at 12:10pm

- 6 Predictions about Data Science, Machine Learning, and AI for 2018
- High Precision Computing in Python or R
- Linear Models Don’t have to Fit Exactly for P-Values To Be Accurate, Right, and Useful
- Information Retrieval Document Search Engine in R
- Fascinating Time Series with Cool Applications
- Interesting Problem: Self-correcting Random Walks
- Book on Computer Programming

- Data science jobs not requiring human interactions
- Data Science – the Foundation for Leading Banks
- 12 Statistical and Machine Learning Methods that Every Data Scientist Should Know
- Making data science accessible - Markov Chains
- The 8 worst predictive modeling techniques
- Type I and Type II Errors in One Picture
- Simple Analytics is Good for Business
