All Blog Posts (2,252)

Advanced Analytic Platforms – Changes in the Leaderboard 2020

Summary: The Gartner Magic Quadrant for Data Science and Machine Learning Platforms is just out, and the big news is how much more capable all the platforms have become. Of course, there are also some interesting winner and loser stories.

The Gartner Magic Quadrant for Data Science and Machine Learning Platforms is just out for 2020.  The really big news is how many excellent choices are now available.  In a remarkable move, the whole field…


Added by Vincent Granville on February 21, 2020 at 9:25am — No Comments

Sentiment Analysis with Naive Bayes and LSTM

In this notebook, we try to predict the positive (label 1) or negative (label 0) sentiment of a sentence. We use the UCI Sentiment Labelled Sentences Data Set.

Sentiment analysis is very useful in many areas. For example, it can be used to moderate internet conversations. It can also be used to predict the ratings that users might assign to a certain product (food, household appliances, hotels,…
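As a rough illustration of the Naive Bayes half of that task, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not the notebook's code: the four training sentences are made up for the example (they are not taken from the UCI data set), and the function names `tokenize`, `train_naive_bayes` and `predict` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train_naive_bayes(sentences, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in zip(sentences, labels):
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothed log likelihood of the word.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for this sketch (1 = positive, 0 = negative).
train_sentences = [
    "great phone and great battery",
    "really good food",
    "terrible service",
    "bad battery and poor screen",
]
train_labels = [1, 1, 0, 0]

model = train_naive_bayes(train_sentences, train_labels)
print(predict(model, "great food"))
print(predict(model, "terrible screen"))
```

With this toy data, "great food" is scored as positive and "terrible screen" as negative. A real experiment would need a proper train/test split and a better tokenizer.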


Added by Vincent Granville on February 19, 2020 at 8:42pm — No Comments

Common Errors in Machine Learning due to Poor Statistics Knowledge

Probably the worst error is thinking there is a correlation when that correlation is purely artificial. Take a data set with 100,000 variables, say with 10 observations. Compute all the (99,999 * 100,000) / 2 cross-correlations. You are almost guaranteed to find one above 0.999. This is best illustrated in my article How to Lie with P-values (also discussing…
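The effect can be explored with a scaled-down simulation. The sketch below is an assumption-laden illustration, not the article's code: it draws 400 independent Gaussian variables (rather than 100,000) with 10 observations each, so that all pairwise correlations can be computed quickly in pure Python. The function names are hypothetical. The largest correlation found depends on the number of variables and observations, so this run does not reproduce the 0.999 figure quoted above.

```python
import math
import random


def standardize(values):
    """Center a variable and scale it to unit Euclidean norm."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def max_spurious_correlation(n_variables, n_observations, seed=0):
    """Largest absolute Pearson correlation among independent random variables."""
    rng = random.Random(seed)
    data = [
        standardize([rng.gauss(0.0, 1.0) for _ in range(n_observations)])
        for _ in range(n_variables)
    ]
    strongest = 0.0
    for i in range(n_variables):
        for j in range(i + 1, n_variables):
            # For standardized vectors, the dot product is the Pearson correlation.
            r = sum(a * b for a, b in zip(data[i], data[j]))
            strongest = max(strongest, abs(r))
    return strongest


# 400 variables give 400 * 399 / 2 = 79,800 pairs, none truly related.
strongest = max_spurious_correlation(n_variables=400, n_observations=10)
print(f"strongest correlation among unrelated variables: {strongest:.3f}")
```

Every variable here is independent noise, yet the strongest pairwise correlation is very high simply because so many pairs are compared on so few observations.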


Added by Vincent Granville on February 7, 2020 at 9:48am — No Comments

New Perspective on Fermat's Last Theorem

Fermat's last conjecture has puzzled mathematicians for 300 years, and was eventually proved only recently. In this note, I propose a generalization that could actually lead to a much simpler proof and a more powerful result with broader applications, including solving numerous similar equations. As usual, my research involves a significant amount of computations and experimental math, as an exploratory step before stating new conjectures, and eventually trying to prove them. The…


Added by Vincent Granville on January 30, 2020 at 1:09am — No Comments

Best Languages for Data Science and Statistics in One Picture

Hundreds of programming languages dominate the data science and statistics market: Python, R, SAS and SQL are standouts. If you're looking to branch out and add a new programming language to your skill set, which one should you learn? This one picture breaks down the differences between the four languages.…


Added by Vincent Granville on January 28, 2020 at 8:41pm — No Comments

Quick Primer On Graph Data Structure

While many programming libraries encapsulate the inner workings of graph and other algorithms, as a data scientist it helps a lot to have a reasonably good familiarity with such details. A solid understanding of the intuition behind such algorithms not only helps in appreciating the logic behind them but also helps in making conscious decisions about their applicability in real-life cases. There are several graph-based algorithms, and the most notable are the shortest path…
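To make the idea concrete, here is a minimal sketch of one common graph representation, an adjacency list stored in a dictionary, together with a breadth-first search that finds a shortest path in an unweighted graph. The example graph and the function name `shortest_path` are made up for illustration and are not taken from the article.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search for a shortest path in an unweighted graph.

    `graph` maps each node to a list of its neighbors.
    Returns the path as a list of nodes, or None if no path exists.
    """
    if start == goal:
        return [start]
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            if neighbor == goal:
                # Walk back through the parents to rebuild the path.
                path = []
                current = goal
                while current is not None:
                    path.append(current)
                    current = parents[current]
                return path[::-1]
            queue.append(neighbor)
    return None


# A small undirected graph, invented for this sketch.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

print(shortest_path(graph, "A", "E"))
```

Breadth-first search is only appropriate when all edges have the same cost. Weighted graphs call for an algorithm such as Dijkstra's.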


Added by Vincent Granville on January 21, 2020 at 10:12am — No Comments

TensorFlow 1.x vs 2.x. – summary of changes

In 2019, Google announced TensorFlow 2.0, a major leap from the existing TensorFlow 1.0. The key differences are as follows:

Ease of use: Many old libraries (for example, tf.contrib) were removed, and some were consolidated. For example, in TensorFlow 1.x a model could be built using Contrib, layers, Keras or Estimators; having so many options for the same task confused many new users. TensorFlow 2.0 promotes TensorFlow Keras for model experimentation and Estimators…


Added by Vincent Granville on January 9, 2020 at 9:49am — No Comments

The Next Big Thing in AI/ML is…

Summary:  AI/ML itself is the next big thing for many fields if you’re on the outside looking in.  But if you’re a data scientist it’s possible to see those advancements that will propel AI/ML to its next phase of utility.


“The Next Big Thing in AI/ML is…” as the lead to an article is probably the most…


Added by Vincent Granville on January 7, 2020 at 7:41am — No Comments

How exactly do you determine causation?

Another good article by Ajit Jaokar.

"Correlation does not equal causation" is a mantra drilled into data scientists from an early age.

That's fine. But very few talk about the follow-on question:

How exactly do you determine causation?

This problem is further compounded because most books and examples are based on standard datasets (e.g., Boston, Iris, etc.). These examples do not discuss…


Added by Vincent Granville on December 17, 2019 at 2:30pm — No Comments

Rule of thumb: Which AI / ML algorithms to apply

Written by Ajit Jaokar.

Firstly, there are three broad categories of algorithms:

  • Supervised learning: You know how to classify the input data and the type of behavior you want to predict, but you need the algorithm to calculate it for you on new data
  • Unsupervised learning: You do not know how to classify the data, and you want the algorithm to find patterns and classify the data for…

Added by Vincent Granville on December 17, 2019 at 9:00am — No Comments

Statistics for Data Science in One Picture

There's no doubt about it, probability and statistics is an enormous field, encompassing topics from the familiar (like the average) to the complex (regression analysis, correlation coefficients and hypothesis testing to name but a few). If you want to be a great data scientist, you have to know some basic statistics. The following picture shows which statistics topics you must know if you're going to excel in data science.…


Added by Vincent Granville on December 12, 2019 at 6:30pm — No Comments

On Being a 50 Year Old Data Scientist

At the time of writing, I'm a 52-year-old working in the fields of mathematics and data science. In mathematics, that makes me well-seasoned (and probably well-tenured, if I had chosen to continue in academia). In data science, some would consider me a dinosaur. In fact, many older people considering a career in data science might be put off by the thought that data science is tough to break into at a later age. But is that statement true? Should the over-50 crowd put down their textbooks…


Added by Vincent Granville on December 10, 2019 at 11:51am — No Comments

How are Deep Neural Networks Adding to Advantage in Climate Change Studies?

Imagine yourself relocating to a more industrial place to live because your beach house was washed away by the tide. In another scenario, imagine yourself wearing a mask throughout the year. What if I told you that everything you just imagined could become a reality very soon?


Whether you choose to believe it or not, climate change is happening for real. Even though you might not be able to spot a lot of its impact around you, the world…


Added by Divyesh Aegis on December 2, 2019 at 11:24pm — No Comments

Variance, Attractors and Behavior of Chaotic Statistical Systems

We study the properties of a typical chaotic system to derive general insights that apply to a large class of unusual statistical distributions. The purpose is to create a unified theory of these systems. These systems can be deterministic or random, yet due to their gentle chaotic nature, they exhibit the same behavior in both cases. They lead to new models with numerous applications in Fintech, cryptography, simulation and benchmarking tests of statistical hypotheses. They are also…


Added by Vincent Granville on November 29, 2019 at 2:30am — No Comments

A Lesson in Using NLP for Hidden Feature Extraction

Summary:  99% of our application of NLP has to do with chatbots or translation.  This is a very interesting story about expanding the bounds of NLP and feature creation to predict bestselling novels.  The authors created over 20,000 NLP features, about 2,700 of which proved to be predictive with a 90% accuracy rate in predicting NYT bestsellers.…


Added by Vincent Granville on November 28, 2019 at 10:00pm — No Comments

New Family of Generalized Gaussian Distributions

In this article, we explore a new type of generalized univariate normal distribution that satisfies useful statistical properties, with interesting applications. This new class of distributions is defined by its characteristic function, and applications are discussed in the last section. These distributions are semi-stable (we define what this means below). In short, it is a much wider class than the stable distributions (the only stable distribution with a finite variance…


Added by Vincent Granville on November 27, 2019 at 11:14pm — No Comments

10 Machine Learning Methods that Every Data Scientist Should Know

Machine learning is a hot topic in research and industry, with new methodologies developed all the time. The speed and complexity of the field make keeping up with new techniques difficult even for experts, and potentially overwhelming for beginners.

To demystify machine learning and to offer a learning path for those who are new to the core…


Added by Vincent Granville on November 27, 2019 at 10:58am — No Comments

10 Visualizations Every Data Scientist Should Know

This article is by Jorge Castañón, Ph.D., Senior Data Scientist at the IBM Machine Learning Hub.

Data visualization plays two key roles:

1. Communicating results clearly to a general audience.

2. …


Added by Vincent Granville on November 12, 2019 at 10:00am — No Comments

Python for Automating Your Quality Analysis

Analyzing the quality of your software is crucial to any business. The process comes towards the end of the software development lifecycle, yet it decides the product's fate. In other words, quality analysis is a process in which the actual output of the software is compared against its expected output. A variety of test inputs are used in the process of quality analysis so that the product sheds light on the actual truth of where it…
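The core idea, checking actual output against expected output over a range of test inputs, can be sketched with Python's standard `unittest` module. This is only an illustration under stated assumptions: `normalize_whitespace` is a hypothetical function under test, and the input/expected pairs are invented for the example.

```python
import unittest


def normalize_whitespace(text):
    """Hypothetical function under test: collapse runs of whitespace."""
    return " ".join(text.split())


class NormalizeWhitespaceTest(unittest.TestCase):
    # Each pair is (test input, expected output).
    CASES = [
        ("a  b", "a b"),
        ("  leading", "leading"),
        ("tab\tseparated", "tab separated"),
        ("", ""),
    ]

    def test_actual_matches_expected(self):
        for test_input, expected in self.CASES:
            # subTest reports each failing input separately.
            with self.subTest(test_input=test_input):
                self.assertEqual(normalize_whitespace(test_input), expected)


def run_suite():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeWhitespaceTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_suite()
    print("all checks passed" if result.wasSuccessful() else "some checks failed")
```

Keeping the inputs and expected outputs in a table makes it easy to add cases as new defects are found, without writing a new test method each time.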


Added by Divyesh Aegis on November 7, 2019 at 11:00pm — No Comments

More Weird Statistical Distributions

Some original and very interesting material is presented here, with possible applications in Fintech. No need for a PhD in math to understand this article: I tried to make the presentation as simple as possible, focusing on high-level results rather than technicalities. Yet, professional statisticians and mathematicians, even academic researchers, will find some deep and fascinating results worth further exploring.…


Added by Vincent Granville on October 26, 2019 at 6:00pm — No Comments
