A Data Science Central Community

We investigate a large class of auto-correlated, stationary time series, proposing a new statistical test to measure departure from the base model, known as Brownian motion. We also discuss a methodology to deconstruct these time series, in order to identify the root mechanism that generates the observations. The time series studied here can be discrete or continuous in time; they can have various degrees of smoothness (typically measured using the Hurst exponent) as well as long-range or…

Added by Vincent Granville on April 1, 2019 at 1:00pm — No Comments

*The emergence of alternative data as a key enabler in expanding credit delivery and financial inclusion is unmistakable.*

The saying that the only constant is change is attributed to Heraclitus, the Greek philosopher. It is very relevant today in the way lenders use technology and scoring solutions to understand the creditworthiness of applicants. Credit risk management has come a long way from the days when banks used just one credit score cutoff to…

Added by Naagesh Padmanaban on March 25, 2019 at 11:15pm — No Comments

I present here some innovative results from my most recent research on stochastic processes, chaos modeling, and dynamical systems, with applications to Fintech, cryptography, number theory, and random number generators. While covering advanced topics, this article is accessible to professionals with limited knowledge in statistical or mathematical theory. It introduces new material not covered in my recent book (available …

Added by Vincent Granville on March 21, 2019 at 7:30am — No Comments

Determining the number of clusters when performing unsupervised clustering is a tricky problem. Many data sets don't exhibit well-separated clusters, and two people asked to tell the number of clusters by looking at a chart are likely to provide two different answers. Sometimes clusters overlap with each other, and large clusters contain sub-clusters, which makes the decision difficult.

For instance, how many clusters do you see in the picture below? What is the optimum number…

Added by Vincent Granville on March 13, 2019 at 6:00pm — No Comments
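To make the cluster-count question in the post above concrete, here is a minimal sketch of one common heuristic, the "elbow" rule: run k-means for increasing k and stop when an extra cluster no longer reduces the within-cluster sum of squares (WCSS) by much. This is not necessarily the method used in the article, which the excerpt does not show. The function names `wcss_1d` and `choose_k`, the 1-D setting, and the 50% threshold `min_gain` are all assumptions made for illustration.

```python
def wcss_1d(points, k, iters=50):
    """Run plain k-means (Lloyd's algorithm) on 1-D data and return the WCSS.

    Assumes k <= len(points). Initial centers are taken at evenly spaced
    positions of the sorted data so that the result is deterministic.
    """
    pts = sorted(points)
    n = len(pts)
    centers = [pts[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in pts:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # An empty group keeps its previous center.
        new_centers = [sum(g) / len(g) if g else centers[j]
                       for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min((x - c) ** 2 for c in centers) for x in pts)


def choose_k(points, k_max=6, min_gain=0.5):
    """Return the smallest k for which one more cluster cuts WCSS by less than min_gain.

    min_gain is an illustrative threshold (relative WCSS reduction), not a standard value.
    """
    prev = wcss_1d(points, 1)
    for k in range(1, k_max):
        nxt = wcss_1d(points, k + 1)
        if prev == 0 or (prev - nxt) / prev < min_gain:
            return k
        prev = nxt
    return k_max


# Hypothetical sample: three tight, well-separated groups.
sample = [0.8, 1.0, 1.1, 1.2, 4.9, 5.0, 5.1, 5.2, 8.9, 9.0, 9.1, 9.2]
print(choose_k(sample))
```

With overlapping clusters or nested sub-clusters, the answer becomes sensitive to the threshold, which is the ambiguity the post describes.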

Many times, complex models are not enough (or too heavy), or not necessary, to get great, robust, sustainable insights out of data. Deep analytical thinking may prove more useful, and can be done by people not necessarily trained in data science, even by people with limited coding experience. Here we explore what we mean by deep analytical thinking, using a case study, and how it works: combining craftsmanship, business acumen, the use and creation of tricks and rules of thumb, to provide…

Added by Vincent Granville on March 7, 2019 at 1:46pm — No Comments

Graph analytics frameworks consist of a set of tools and methods developed to extract knowledge…

Added by Elise Devaux on February 27, 2019 at 5:00am — No Comments

In this data science article, emphasis is placed on *science*, not just on data. State-of-the-art material is presented in simple English, from multiple perspectives: applications, theoretical research asking more questions than it answers, scientific computing, machine learning, and algorithms. I attempt here to lay the foundations of a new statistical technology, hoping that it will plant the seeds for further research on a topic with a broad range of potential…

Added by Vincent Granville on February 23, 2019 at 11:00am — No Comments

Many of the following statistical tests are rarely discussed in textbooks or in college classes, much less in data camps. Yet they help answer a lot of different and interesting questions. I used most of them without even computing the underlying distribution under the null hypothesis, instead using simulations to check whether my assumptions were plausible or not. In short, my approach to statistical testing is model-free and data-driven. Some are easy to implement even in Excel. Some…

Added by Vincent Granville on February 13, 2019 at 7:00pm — No Comments
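The simulation-based, model-free idea mentioned in the post above can be illustrated with a permutation test, which estimates a p-value by reshuffling the data rather than by deriving the distribution of the statistic under the null hypothesis. This is a generic sketch and not necessarily one of the tests covered in the article; the function name `permutation_p_value`, its parameters, and the sample values are hypothetical.

```python
import random


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Under the null hypothesis the group labels are exchangeable, so we
    reshuffle the pooled data and count how often the shuffled difference
    in means is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself and avoid p = 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical measurements from two groups.
group_a = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.4, 9.7]
group_b = [12.0, 11.8, 12.3, 12.1, 11.9, 12.2, 12.4, 11.7]
print(permutation_p_value(group_a, group_b))
```

The same loop could be reproduced in a spreadsheet with a column of random sort keys, which is in the spirit of the post's remark that some tests are easy to implement even in Excel.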

For background to this post, please see Learn Machine Learning Coding Basics in a weekend. Here, we present the glossary that we use for the coding and the mindmap attached to these classes and upcoming book. About 80 terms are included in the glossary, covering Ensembles, Regression, Classification,…

Added by Vincent Granville on February 12, 2019 at 12:31pm — No Comments

**Logistic regression (LR)** models estimate the probability of a binary response, based on one or more predictor variables. Unlike in linear regression models, the dependent variable is categorical. LR has become very popular, perhaps because of the wide availability of the procedure in software. Although LR is a good choice for many situations, it doesn't work well for *all* situations. For example:

- In propensity score…

Added by Vincent Granville on February 7, 2019 at 3:23pm — No Comments
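As a small illustration of how a logistic regression model turns predictor values into the probability of a binary response, the sketch below applies the logistic (sigmoid) function to a linear combination of the predictors. The weights and bias are made-up numbers, not fitted coefficients, and the function name `predict_probability` is hypothetical.

```python
import math


def predict_probability(weights, bias, features):
    """Return P(response = 1) for one observation under a logistic model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable form of 1 / (1 + exp(-z)) for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Made-up coefficients for two predictors.
print(predict_probability([0.8, -0.4], -0.5, [2.0, 1.0]))
```

The output always lies between 0 and 1, which is what makes the model suitable for a categorical (binary) dependent variable.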

This is another interesting problem, off-the-beaten-path. It ends up with a formula to compute the integral of a function, based on its derivatives solely.

For simplicity, I'll start with some notation used in the context of matrix theory, familiar to everyone: T(*f*) = *g*, where *f* and *g* are vectors, and T is a square matrix. The notation T(*f*) represents the product of the matrix T and the vector *f*. Now, imagine that the…

Added by Vincent Granville on February 3, 2019 at 5:30pm — 1 Comment

The first days after the New Year celebrations are a time when, looking back, we can analyze our actions and promises and draw conclusions about whether our predictions and expectations came true. As 2018 has come to an end, it is the perfect time to analyze it and to identify trends for the next year. The amount of data generated every minute is enormous. Therefore, new approaches, techniques, and solutions have been developed.…

Added by Vincent Granville on January 29, 2019 at 11:43am — No Comments

Extract from the upcoming Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, …

Added by Vincent Granville on January 27, 2019 at 3:20pm — No Comments

A passionate customer often provides feedback about a favorite product when it strikes an emotional chord.

Product reviews contain a wealth of information. Analyzing the review texts can unearth many hidden data points about the customer and the product. Such insights can help grow the business and increase revenue.

Let's look into a specific example. …

Added by Kaniska Mandal on January 24, 2019 at 3:30pm — No Comments

Organizations across industries are adopting graph analytics to reinforce their anti-fraud programs. In this post, we examine three types of fraud that graph analytics can help investigators combat: insurance fraud, credit card fraud, and VAT fraud.

In many areas, fraud investigators have at their disposal large datasets in which clues are hidden. These clues are left behind by…

Added by Elise Devaux on January 22, 2019 at 12:30am — No Comments

Extract from the upcoming Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, …

Added by Vincent Granville on January 20, 2019 at 12:15pm — No Comments

This article was written by Ajit Jaokar.

In this longish post, I have tried to explain Deep Learning starting from familiar ideas like machine learning. This approach forms a part of my forthcoming book. I have used this approach in my teaching. It is based on ‘learning by exception’, i.e. understanding one concept and its limitations and then understanding how the subsequent concept…

Added by Vincent Granville on January 16, 2019 at 9:48am — No Comments

Why is graph visualization so important? How can it help businesses sift through large amounts of complex data? We explore the answer in this post through 5 advantages of graph visualization and different use cases.

Also called a network, a graph is a collection of nodes (or vertices) and edges (or links). Each node represents a single data point (a person, a phone number, a transaction) and each edge represents how two nodes…

Added by Elise Devaux on January 11, 2019 at 9:25am — No Comments
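The node-and-edge definition in the post above maps directly onto a simple data structure. The sketch below stores an undirected graph as an adjacency list and collects everything reachable from one node, the kind of question an investigator might ask when tracing shared phone numbers or transactions. The entity names and the helper names `build_graph` and `connected_component` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency list from (node, node) pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def connected_component(graph, start):
    """Return the set of nodes reachable from start (iterative depth-first search)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


# Hypothetical data: people linked to phone numbers and a transaction.
edges = [
    ("alice", "555-0100"),
    ("bob", "555-0100"),
    ("bob", "txn-42"),
    ("carol", "555-0199"),
]
graph = build_graph(edges)
print(sorted(connected_component(graph, "alice")))
```

Here two people sharing one phone number end up in the same component, a link that is easy to miss in a flat table but obvious once the data is viewed as a graph.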

***Summary:** Here are our 5 predictions for data science, machine learning, and AI for 2019. We also take a look back at last year’s predictions to see how we did.*

It’s that time of year again when we do a look back in order to offer a look forward. What trends will speed up, what things will actually happen,…

Added by Vincent Granville on December 20, 2018 at 6:30pm — No Comments

We are in the process of writing and adding new material (compact eBooks), available exclusively to our members and written in simple English by world-leading experts in AI, data science, and machine learning. In the upcoming months, the following will be added:

- The Machine Learning Coding Book
- Off-the-beaten-path Statistics and Machine Learning Techniques
- Encyclopedia of Statistical Science
- Original Math, Stat and Probability Problems - with…

Added by Vincent Granville on December 1, 2018 at 6:26pm — No Comments




© 2020 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC

