A Data Science Central Community

Continued fractions are usually considered a beautiful, curious mathematical topic, with applications that are mostly theoretical and limited to math and number theory. Here we show how they can be used in applied business and economics contexts, leveraging the mathematical theory developed for continued fractions, to model and explain natural phenomena. …

Added by Vincent Granville on August 30, 2019 at 9:42am
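For readers who have not met the object named in the excerpt above, here is a minimal sketch of what a continued fraction expansion is. It is not taken from the article: the function names `continued_fraction` and `evaluate` are illustrative assumptions, and the example is restricted to rational inputs so that the arithmetic stays exact.

```python
from fractions import Fraction


def continued_fraction(x, max_terms=20):
    """Return the partial quotients [a0, a1, a2, ...] of a rational number x."""
    x = Fraction(x)
    terms = []
    for _ in range(max_terms):
        a = x.numerator // x.denominator  # integer part (floor) of x
        terms.append(a)
        remainder = x - a
        if remainder == 0:  # expansion of a rational number terminates
            break
        x = 1 / remainder  # invert the fractional part and repeat
    return terms


def evaluate(terms):
    """Rebuild the rational number from its partial quotients."""
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value


terms = continued_fraction(Fraction(415, 93))
print(terms)            # [4, 2, 6, 7]
print(evaluate(terms))  # 415/93
```

Reading the result, 415/93 = 4 + 1/(2 + 1/(6 + 1/7)).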

In data-driven enterprise systems, Spark has become a popular choice because it is easy to use and offers both speed and versatility. Data can be processed quickly, which allows faster decisions, and Big Data workloads benefit greatly from Spark's faster processing. Spark is an open-source framework for analyzing large datasets on clusters. It is written in Scala, which has made data processing easier and gives a certain boost to the…

Added by Divyesh Aegis on August 13, 2019 at 12:51am

In my previous posts, I compared model evaluation techniques using Statistical Tools & Tests and commonly used Classification and Clustering evaluation techniques.

In this post, I'll take a look at how you can compare regression models. Comparing regression models is perhaps one of the trickiest tasks in the "comparing models" arena. The reason is that there are literally dozens of statistics you can calculate to compare regression models, including:

**1.…**

Added by Vincent Granville on August 8, 2019 at 10:37am

Sometimes you see a diagram and it gives you an 'aha' moment. Here is one representing forward propagation and back propagation in a neural network:

A brief explanation is:

- Using the input variables x and y, the forward pass (left half of the figure) calculates the output z as a function of x and y, i.e. z = f(x, y)
- The right side…

Added by Vincent Granville on August 8, 2019 at 10:29am
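To make the forward pass in the excerpt above concrete, here is a minimal sketch for a single node. The excerpt does not say what f is, so the choice f(x, y) = x * y is purely an assumption for illustration, as are the function names and the upstream gradient `dL_dz`. The backward step shown is the standard chain rule, not a reconstruction of the truncated bullet.

```python
def forward(x, y):
    """Forward pass: compute z = f(x, y), here assumed to be a product."""
    return x * y


def backward(x, y, dL_dz):
    """Backward pass: push the upstream gradient dL/dz through the node.

    By the chain rule, dL/dx = dL/dz * dz/dx and dL/dy = dL/dz * dz/dy.
    For z = x * y the local gradients are dz/dx = y and dz/dy = x.
    """
    dL_dx = dL_dz * y
    dL_dy = dL_dz * x
    return dL_dx, dL_dy


z = forward(3.0, 4.0)
grads = backward(3.0, 4.0, dL_dz=2.0)
print(z)      # 12.0
print(grads)  # (8.0, 6.0)
```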

Decision Trees, Random Forests and Boosting are among the top 16 data science and machine learning tools used by data scientists. The three methods are similar, with a significant amount of overlap. In a nutshell:

- A decision tree is a simple decision-making diagram.
- Random forests are a large number of trees, combined (using averages or "majority rules") at the end of the process.
- Gradient boosting machines also combine decision trees, but start the combining…

Added by Vincent Granville on August 8, 2019 at 10:25am
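The combining step that the random forest bullet above describes (averages or "majority rules") can be sketched in a few lines. This is only the final aggregation, under the assumption that each tree has already produced a prediction; the per-tree outputs below are made-up placeholders, and the function names are illustrative.

```python
from collections import Counter
from statistics import mean


def majority_vote(class_predictions):
    """Classification: return the label predicted by the most trees."""
    return Counter(class_predictions).most_common(1)[0][0]


def average_prediction(numeric_predictions):
    """Regression: return the mean of the trees' numeric predictions."""
    return mean(numeric_predictions)


# Hypothetical outputs of five trees for a single observation.
tree_labels = ["spam", "ham", "spam", "spam", "ham"]
tree_values = [2.0, 3.0, 2.5, 3.5, 4.0]

print(majority_vote(tree_labels))       # spam
print(average_prediction(tree_values))  # 3.0
```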

Properly implemented Machine Learning (ML) models can have a positive effect on organizational efficiency. It is first necessary to understand how these models are created, how they function, and how they are put into production.

**The Definition of a Machine Learning Model**

When a computer is presented with questions within a particular domain, a machine learning model will run an algorithm that will enable it to resolve those questions. These algorithms are not…

Added by Arash Aghlara on August 7, 2019 at 3:30am

Python is an extremely popular programming language. It is not only suited to general-purpose work but also extremely easy to read and use. The main reason most people use Python these days is that it saves programmers time by requiring only a few lines of code. Unlike with other languages, developers do not have to spend a lot of time on coding in order to accomplish tasks. Rather, they can spend time on…

Added by Divyesh Aegis on July 25, 2019 at 12:53am

Python was introduced in 1991 by Guido van Rossum as a high-level, general-purpose language. Even today, it supports multiple programming paradigms, including procedural, object-oriented and functional. It soon became one of the most popular languages in the industry, and in fact it influenced Ruby and Swift. Even the TIOBE Index reports mention Python as the third most popular…

Added by Divyesh Aegis on July 16, 2019 at 12:55am

In financial markets, two of the most common trading strategies used by investors are momentum and mean reversion. If a stock exhibits momentum (or trending behavior, as shown in the figure below), its price in the current period is more likely to increase (decrease) if it already increased (decreased) in the previous period.

When the return of a stock at time t depends in some way on the return at the previous time t-1, the returns are said to be autocorrelated. In…

Added by Vincent Granville on July 8, 2019 at 10:25am
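The dependence between the return at time t and the return at time t-1 described in the excerpt above is commonly summarized by the lag-1 sample autocorrelation. The sketch below uses the usual textbook estimator; it is not code from the article, the function name is an assumption, and the two return series are toy data chosen to show the sign of the statistic. A positive value is consistent with momentum, a negative value with mean reversion.

```python
def lag1_autocorrelation(returns):
    """Sample autocorrelation of a return series at lag 1."""
    n = len(returns)
    avg = sum(returns) / n
    # Covariance between each return and the one before it.
    numerator = sum((returns[t] - avg) * (returns[t - 1] - avg) for t in range(1, n))
    # Total variation of the series around its mean.
    denominator = sum((r - avg) ** 2 for r in returns)
    return numerator / denominator


trending = [1, 1, 1, -1, -1, -1]      # runs of same-sign returns
alternating = [1, -1, 1, -1, 1, -1]   # sign flips every period

print(lag1_autocorrelation(trending))         # 0.5
print(lag1_autocorrelation(alternating) < 0)  # True
```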

***Summary:** The annual Burtch Works salary survey tells us a lot about which industries are using the most data scientists and the difference between higher and lower skilled data scientists. Salary increases show us whether demand is increasing, and finally we take a shot at determining which skills are most in demand.*

What a difference a few years can make. We used to say that everyone loves a data scientist – and wants to be one. …

Added by Vincent Granville on July 8, 2019 at 10:18am

By Ajit Jaokar. This post is a part of my forthcoming book on Mathematical foundations of Data Science. In this post, we use the Perceptron algorithm to bridge the gap between high school maths and deep learning.

**Background**

As part of my role as course director of the Artificial Intelligence: Cloud and Edge Computing course at the University of Oxford, I see more students who are familiar with programming than with mathematics.

They have last learnt maths…

Added by Vincent Granville on June 27, 2019 at 12:22pm
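As a pointer to what the Perceptron algorithm named in the excerpt above looks like in practice, here is a minimal sketch of the classic perceptron learning rule on the logical AND function. It is not taken from the book: the function names, the learning rate, the epoch count, and the choice of AND as the training set are all assumptions made for illustration.

```python
def predict(weights, bias, x):
    """Fire (output 1) when the weighted sum of the inputs exceeds zero."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Perceptron rule: nudge the weights by the error on each sample."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Logical AND is linearly separable, so the rule converges on it.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)

for x, target in and_gate:
    print(x, predict(weights, bias, x), target)
```

The decision boundary is the straight line where the weighted sum equals zero, which is the link to high school algebra that the post alludes to.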

Originally published in 2014 and viewed more than 200,000 times, this is the oldest data science cheat sheet - the mother of all the numerous cheat sheets that are so popular nowadays. I decided to update it in June 2019. While the first half, dealing with installing components on your laptop and learning UNIX, regular expressions, and file management hasn't changed much, the second half, dealing with machine learning, was rewritten entirely from scratch. It is amazing how things changed in…

Added by Vincent Granville on June 6, 2019 at 8:27pm

It would be unwise to expect a lot of sales just because you have a significant amount of web traffic. Traffic alone cannot help much. You will need to track your website metrics properly in order to take the necessary measures to convert the traffic into business prospects. You will need to analyze your website from time to time to ensure that it is not only accessible to users but also provides all the guidance needed to show them the right way to make a…

Added by Jenny Richards on June 6, 2019 at 1:30am

We propose simple solutions to important problems that all data scientists face almost every day. In short, a toolbox for the handyman, useful to busy professionals in any field.

**1. Eliminating sample size effects**. Many statistics, such as correlations or R-squared, depend on the sample size, making it difficult to…

Added by Vincent Granville on June 4, 2019 at 12:00pm

This simple introduction to matrix theory offers a refreshing perspective on the subject. Using a basic concept that leads to a simple formula for the power of a matrix, we see how it can solve time series, Markov chains, linear regression, data reduction, principal components analysis (PCA) and other machine learning problems. These problems are usually solved with more advanced matrix calculus, including eigenvalues, diagonalization, generalized inverse matrices, and other types of…

Added by Vincent Granville on May 28, 2019 at 9:00pm
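One of the applications listed in the excerpt above, Markov chains, shows why matrix powers matter: the n-step transition probabilities of a chain are the entries of the n-th power of its transition matrix. The sketch below computes that power by plain repeated squaring. It is not the "simple formula" the article refers to, which the excerpt does not state, and the transition matrix `P` and function names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def mat_pow(m, n):
    """Raise a square matrix to a non-negative integer power by repeated squaring."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    base = m
    while n > 0:
        if n & 1:  # this bit of the exponent is set: fold base into result
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result


# Hypothetical two-state chain: each row holds the probabilities of moving
# from that state to state 0 and to state 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]

two_step = mat_pow(P, 2)
print(two_step)  # approximately [[0.86, 0.14], [0.70, 0.30]]
```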

We have added a new free book in our selection exclusively for DSC members. See the first entry below, to get started with machine learning with Python.

**1. Book: Classification and Regression In a Weekend**

This tutorial began as a series of weekend workshops created by Ajit Jaokar and Dan Howarth. The idea was to work with a specific (longish) program such that we explore as much of it as possible in one weekend. This book is an attempt to take this idea online.…

Added by Vincent Granville on May 16, 2019 at 6:24pm

We propose a simple model-free solution to compute any confidence interval and to extrapolate these intervals beyond the observations available in your data set. In addition, we propose a mechanism to sharpen the confidence intervals, to reduce their width by an order of magnitude. The methodology works with any estimator (mean, median, variance, quantile, correlation and so on) even when the data set violates the classical requirements necessary to make traditional statistical techniques…

Added by Vincent Granville on May 9, 2019 at 11:30am
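The excerpt above does not spell out the article's own procedure, so the sketch below uses a different, well-known model-free technique for the same goal: the percentile bootstrap. It works with any estimator passed in as a function. The function name, the default parameters, and the sample data are assumptions for illustration only.

```python
import random
from statistics import mean, median


def bootstrap_ci(data, estimator=mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary estimator."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Re-estimate on many samples drawn with replacement from the data.
    estimates = sorted(
        estimator([rng.choice(data) for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up observations, used only to exercise the function.
sample = [2.1, 2.5, 1.9, 3.2, 2.8, 2.4, 2.0, 3.0, 2.6, 2.2]

print(bootstrap_ci(sample))                    # interval for the mean
print(bootstrap_ci(sample, estimator=median))  # interval for the median
```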

This crash course features a new fundamental statistics theorem -- even more important than the central limit theorem -- and a new set of statistical rules and recipes. We discuss concepts related to determining the optimum sample size, the optimum *k* in *k*-fold cross-validation, bootstrapping, new re-sampling techniques, simulations, tests of hypotheses, confidence intervals, and statistical inference using a unified, robust, simple…

Added by Vincent Granville on May 4, 2019 at 12:30pm

So many fascinating and deep results have been written about the number (1 + SQRT(5)) / 2 and its related sequence - the Fibonacci numbers - that it would take years to read all of them. This number has been studied both for its applications (population growth, architecture) and its mathematical properties, for over 2,000 years. It is still a topic of active research.…

Added by Vincent Granville on April 25, 2019 at 7:30am
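The link between the number (1 + SQRT(5)) / 2 and the Fibonacci numbers mentioned in the excerpt above can be checked numerically: the ratio of consecutive Fibonacci numbers converges to that value. The short sketch below illustrates this well-known fact; the function name is an assumption and the code is not from the article.

```python
import math


def fibonacci(count):
    """Return the first `count` Fibonacci numbers, starting 1, 1, 2, 3, ..."""
    numbers = []
    a, b = 1, 1
    for _ in range(count):
        numbers.append(a)
        a, b = b, a + b
    return numbers


golden_ratio = (1 + math.sqrt(5)) / 2
fib = fibonacci(30)

print(fib[:10])  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
# Successive ratios approach the golden ratio.
for k in (5, 10, 20, 29):
    print(k, fib[k] / fib[k - 1], abs(fib[k] / fib[k - 1] - golden_ratio))
```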

***Summary:** Finally there are tools that let us transcend 'correlation is not causation' and identify true causal factors and their relative strengths in our models. This is what prescriptive analytics was meant to be.*

Just when I thought we’d figured it all out,…

Added by Vincent Granville on April 24, 2019 at 7:30pm




