A Data Science Central Community
This article is by Jorge Castañón, Ph.D., Senior Data Scientist at the IBM Machine Learning Hub.
Data visualization plays two key roles:
1. Communicating results clearly to a general audience.
Added by Vincent Granville on November 12, 2019 at 10:00am — No Comments
Analyzing the quality of your software is crucial to any business. The process comes towards the end of the software development lifecycle, yet it decides the fate of the product. In other words, quality analysis is a process in which the actual output of the software is tested against its expected output. A variety of test inputs are used in quality analysis so that the product sheds light on the actual truth of where it…Continue
Added by Divyesh Aegis on November 7, 2019 at 11:00pm — No Comments
Some original and very interesting material is presented here, with possible applications in Fintech. No need for a PhD in math to understand this article: I tried to make the presentation as simple as possible, focusing on high-level results rather than technicalities. Yet, professional statisticians and mathematicians, even academic researchers, will find some deep and fascinating results worth further exploring.…Continue
Added by Vincent Granville on October 26, 2019 at 6:00pm — No Comments
By Bill Vorhies.
Summary: Here’s a proposal for real ‘zero touch’, ‘set-em-and-forget-em’ machine learning from the researchers at Amazon. If you have an environment as fast changing as e-retail and a huge number of models matching buyers and products, you could achieve real cost savings and revenue increases by making the refresh cycle faster and more accurate with automation. This capability will likely be coming soon to your favorite AML…Continue
Added by Vincent Granville on October 22, 2019 at 2:30pm — No Comments
This list of lists contains books, notebooks, presentations, cheat sheets, and tutorials covering all aspects of data science, machine learning, deep learning, statistics, math, and more, with most documents featuring Python or R code and numerous illustrations or case studies. All this material is available for free, and consists of content mostly created in 2019 and 2018, by various top experts in their respective fields. A few of these documents are available on LinkedIn: see last…Continue
Added by Vincent Granville on October 13, 2019 at 11:00am — No Comments
I have used synthetic data sets many times for simulation purposes, most recently in my articles Six degrees of Separations between any two Datasets and How to Lie with p-values. Many…Continue
Added by Vincent Granville on October 2, 2019 at 5:00pm — No Comments
This is an interesting data science conjecture, inspired by the well-known six degrees of separation problem, which states that there is a link involving no more than 6 connections between any two people on Earth, say between you and anyone living in North Korea.
Here the link is between any two univariate data sets…Continue
Added by Vincent Granville on September 9, 2019 at 10:30am — No Comments
The material discussed here is also of interest to machine learning, AI, big data, and data science practitioners, as much of the work is based on heavy data processing, algorithms, efficient coding, testing, and experimentation. Also, it's not just two new conjectures, but paths and suggestions to solve these problems. The last section contains a few new, original exercises, some with solutions, and may be useful to students, researchers, and instructors offering math and statistics classes…Continue
Added by Vincent Granville on September 8, 2019 at 4:09am — No Comments
As an extremely versatile, general-purpose, professional programming language, Python offers plenty of applications. The language is user-friendly and simple to grasp, which has made it popular throughout the world. Python also plays a critical role in helping data scientists find lucrative job opportunities.
Today, Python has become the most in-demand programming language in the data science world. Python offers an extensive range…Continue
Added by Divyesh Aegis on September 5, 2019 at 12:00am — No Comments
Machine learning is a hot topic in research and industry, with new methodologies developed all the time. The speed and complexity of the field make keeping up with new techniques difficult even for experts, and potentially overwhelming for beginners.
To demystify machine learning and to offer a learning path for those who are new to the core…Continue
Added by Vincent Granville on August 30, 2019 at 11:08am — No Comments
I introduce here a family of very peculiar statistical distributions governed by two parameters: p, a real number in [0, 1], and b, an integer > 1.
Potential applications are found in cryptography, Fintech (stock market modeling), Bitcoin, number theory, random number…Continue
Added by Vincent Granville on August 30, 2019 at 10:11am — No Comments
Continued fractions are usually considered a beautiful, curious mathematical topic, but with applications that are mostly theoretical and limited to math and number theory. Here we show how they can be used in applied business and economics contexts, leveraging the mathematical theory developed for continued fractions, to model and explain natural phenomena. …Continue
Added by Vincent Granville on August 30, 2019 at 9:42am — No Comments
In data-driven enterprise systems, Spark has become a popular name because it is easy to use and offers speed and versatility. Data can be understood quickly, allowing one to make faster decisions. Big Data benefits greatly from Spark's faster data processing. Spark is an open source framework that processes large datasets on clusters and helps analyze them. It is written in Scala, which has made data processing easier and gives a certain boost to the…Continue
Added by Divyesh Aegis on August 13, 2019 at 12:51am — No Comments
In my previous posts, I compared model evaluation techniques using Statistical Tools & Tests and commonly used Classification and Clustering evaluation techniques.
In this post, I'll take a look at how you can compare regression models. Comparing regression models is perhaps one of the trickiest tasks to complete in the "comparing models" arena. The reason is that there are literally dozens of statistics you can calculate to compare regression models, including:
Added by Vincent Granville on August 8, 2019 at 10:37am — No Comments
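The excerpt above on comparing regression models stops before its list of statistics, so the following is only a minimal sketch of three widely used ones (mean absolute error, root mean squared error, and R²). The choice of these three, and the function name `regression_metrics`, are assumptions made for illustration and are not taken from the original post.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for two equal-length sequences of numbers.

    Hypothetical helper for illustration; not the original author's code.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Residuals: observed value minus predicted value.
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmse = sqrt(ss_res / n)

    # R^2 compares the model's squared error with that of always
    # predicting the mean of the observed values.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae": mae, "rmse": rmse, "r2": r2}


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    model_a = [1.1, 1.9, 3.2, 3.8]  # made-up predictions from one model
    model_b = [2.0, 3.0, 4.0, 5.0]  # made-up predictions from another model
    print("model A:", regression_metrics(observed, model_a))
    print("model B:", regression_metrics(observed, model_b))
```

Lower MAE and RMSE and a higher R² point to a better fit on the same data. The three statistics can rank models differently, which is part of why comparing regression models is tricky.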
A brief explanation is:
Added by Vincent Granville on August 8, 2019 at 10:29am — No Comments
Decision Trees, Random Forests and Boosting are among the top 16 data science and machine learning tools used by data scientists. The three methods are similar, with a significant amount of overlap. In a nutshell:
Added by Vincent Granville on August 8, 2019 at 10:25am — No Comments
Properly implemented Machine Learning (ML) models can have a positive effect on organizational efficiency. It is first necessary to understand how these models are created, how they function, and how they are put into production.
The Definition of a Machine Learning Model
When a computer is presented with questions within a particular domain, a machine learning model will run an algorithm that will enable it to resolve those questions. These algorithms are not…Continue
Added by Arash Aghlara on August 7, 2019 at 3:30am — No Comments
Python is an extremely popular programming language. It is not only suited to general-purpose use but is also extremely easy to read and use. The main reason so many people use Python these days is that it saves programmers time by requiring relatively few lines of code. Unlike with other languages, developers do not have to spend a lot of time on coding to accomplish tasks. Rather, they can spend time on…Continue
Added by Divyesh Aegis on July 25, 2019 at 12:53am — No Comments
Python was introduced in 1991 by Guido van Rossum as a high-level, general-purpose language. Even today, it supports multiple programming paradigms, including procedural, object-oriented, and functional. It soon became one of the most popular languages in the industry, and in fact is the very language that influenced Ruby and Swift. Even the TIOBE Index report mentions Python as the third most popular…Continue
Added by Divyesh Aegis on July 16, 2019 at 12:55am — No Comments
In financial markets, two of the most common trading strategies used by investors are the momentum and mean reversion strategies. If a stock exhibits momentum (or trending behavior as shown in the figure below), its price on the current period is more likely to increase (decrease) if it has already increased (decreased) on the previous period.
When the return of a stock at time t depends in some way on the return at the previous time t-1, the returns are said to be autocorrelated. In…Continue
Added by Vincent Granville on July 8, 2019 at 10:25am — No Comments
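The autocorrelation of returns described in the momentum and mean reversion excerpt above can be illustrated with a short sketch. It estimates the lag-1 sample autocorrelation of simple returns: a positive value is consistent with momentum, a negative value with mean reversion. The function names and the toy return series are assumptions made for illustration and do not come from the original article.

```python
from statistics import mean


def simple_returns(prices):
    """Convert a price series into period-over-period simple returns."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]


def lag1_autocorrelation(returns):
    """Sample autocorrelation between the return at t and the return at t-1.

    Uses the standard estimator: the sum of products of consecutive
    mean-centred returns, divided by the sum of squared mean-centred returns.
    """
    if len(returns) < 3:
        raise ValueError("need at least three returns")
    mu = mean(returns)
    numerator = sum((a - mu) * (b - mu) for a, b in zip(returns, returns[1:]))
    denominator = sum((r - mu) ** 2 for r in returns)
    if denominator == 0:
        raise ValueError("returns are constant; autocorrelation is undefined")
    return numerator / denominator


if __name__ == "__main__":
    # Made-up return series: one that keeps its direction for several
    # periods, and one that flips sign every period.
    trending = [0.01, 0.01, 0.01, -0.01, -0.01, -0.01]
    reverting = [0.01, -0.01, 0.01, -0.01, 0.01, -0.01]
    print("trending :", lag1_autocorrelation(trending))
    print("reverting:", lag1_autocorrelation(reverting))
```

On real data the estimate is noisy, so a value near zero should not be read as evidence for either strategy without a significance test.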