AnalyticBridge

A Data Science Central Community

All Blog Posts (2,243)

Statistics for Data Science in One Picture

There's no doubt about it, probability and statistics is an enormous field, encompassing topics from the familiar (like the average) to the complex (regression analysis, correlation coefficients and hypothesis testing to name but a few). If you want to be a great data scientist, you have to know some basic statistics. The following picture shows which statistics topics you must know if you're going to excel in data science.…

Added by Vincent Granville on December 12, 2019 at 6:30pm — No Comments

On Being a 50 Year Old Data Scientist

At the time of writing, I'm a 52-year-old working in the fields of mathematics and data science. In mathematics, that makes me well-seasoned (and probably well-tenured, if I had chosen to continue in academia). In data science, some would consider me a dinosaur. In fact, many older people considering a career in data science might be put off by the thought that data science is tough to break into at a later age. But is that statement true? Should the over-50 crowd put down their textbooks…

Added by Vincent Granville on December 10, 2019 at 11:51am — No Comments

How are Deep Neural Networks Adding to Advantage in Climate Change Studies?

Imagine relocating to a more industrial area because your beach house was washed away by the tide. In another scenario, imagine wearing a mask all year round. What if I told you that everything you just imagined could become reality very soon?

Whether you choose to believe it or not, climate change is real. Even though you might not be able to spot much of its impact around you, the world…

Added by Divyesh Aegis on December 2, 2019 at 11:24pm — No Comments

Variance, Attractors and Behavior of Chaotic Statistical Systems

We study the properties of a typical chaotic system to derive general insights that apply to a large class of unusual statistical distributions. The purpose is to create a unified theory of these systems. These systems can be deterministic or random, yet due to their gentle chaotic nature, they exhibit the same behavior in both cases. They lead to new models with numerous applications in Fintech, cryptography, simulation and benchmarking tests of statistical hypotheses. They are also…

Added by Vincent Granville on November 29, 2019 at 2:30am — No Comments

A Lesson in Using NLP for Hidden Feature Extraction

Summary: 99% of our applications of NLP have to do with chatbots or translation. This is a very interesting story about expanding the bounds of NLP and feature creation to predict bestselling novels. The authors created over 20,000 NLP features, about 2,700 of which proved to be predictive with a 90% accuracy rate in predicting NYT bestsellers.…

Added by Vincent Granville on November 28, 2019 at 10:00pm — No Comments

New Family of Generalized Gaussian Distributions

In this article, we explore a new type of generalized univariate normal distribution that satisfies useful statistical properties, with interesting applications. This new class of distributions is defined by its characteristic function, and applications are discussed in the last section. These distributions are semi-stable (we define what this means below). In short, it is a much wider class than the stable distributions (the only stable distribution with a finite variance…

Added by Vincent Granville on November 27, 2019 at 11:14pm — No Comments

10 Machine Learning Methods that Every Data Scientist Should Know

Machine learning is a hot topic in research and industry, with new methodologies developed all the time. The speed and complexity of the field make keeping up with new techniques difficult even for experts, and potentially overwhelming for beginners.

To demystify machine learning and to offer a learning path for those who are new to the core…

Added by Vincent Granville on November 27, 2019 at 10:58am — No Comments

10 Visualizations Every Data Scientist Should Know

This article is by Jorge Castañón, Ph.D., Senior Data Scientist at the IBM Machine Learning Hub.

Data visualization plays two key roles:

1. Communicating results clearly to a general audience.

2. …

Added by Vincent Granville on November 12, 2019 at 10:00am — No Comments

Python for Automating Your Quality Analysis

Analyzing the quality of your software is crucial to any business. The process comes towards the end of the software development lifecycle, yet it decides the fate of the product. In other words, quality analysis is a process in which the actual output of the software is tested against its expected output. A variety of test inputs are used in the process of quality analysis so that the product sheds light on the actual truth of where it…

Added by Divyesh Aegis on November 7, 2019 at 11:00pm — No Comments

More Weird Statistical Distributions

Some original and very interesting material is presented here, with possible applications in Fintech. No need for a PhD in math to understand this article: I tried to make the presentation as simple as possible, focusing on high-level results rather than technicalities. Yet, professional statisticians and mathematicians, even academic researchers, will find some deep and fascinating results worth further exploring.…

Added by Vincent Granville on October 26, 2019 at 6:00pm — No Comments

Complete Hands-Off Automated Machine Learning

By Bill Vorhies.

Summary: Here’s a proposal for real ‘zero touch’, ‘set-em-and-forget-em’ machine learning from the researchers at Amazon. If you have an environment as fast changing as e-retail and a huge number of models matching buyers and products, you could achieve real cost savings and revenue increases by making the refresh cycle faster and more accurate with automation. This capability likely will be coming soon to your favorite AML…

Added by Vincent Granville on October 22, 2019 at 2:30pm — No Comments

40+ Modern Tutorials Covering All Aspects of Machine Learning

This list of lists contains books, notebooks, presentations, cheat sheets, and tutorials covering all aspects of data science, machine learning, deep learning, statistics, math, and more, with most documents featuring Python or R code and numerous illustrations or case studies. All this material is available for free, and consists of content mostly created in 2019 and 2018, by various top experts in their respective fields. A few of these documents are available on LinkedIn: see last…

Added by Vincent Granville on October 13, 2019 at 11:00am — No Comments

Surprising Uses of Synthetic Random Data Sets

I have used synthetic data sets many times for simulation purposes, most recently in my articles Six Degrees of Separation Between Any Two Data Sets and How to Lie with p-values. Many…

Added by Vincent Granville on October 2, 2019 at 5:00pm — No Comments

Six Degrees of Separation Between Any Two Data Sets

This is an interesting data science conjecture, inspired by the well-known six degrees of separation problem, which states that there is a link involving no more than 6 connections between any two people on Earth, say between you and anyone living in North Korea.

Here the link is between any two univariate data sets…

Added by Vincent Granville on September 9, 2019 at 10:30am — No Comments

Two New Deep Conjectures in Probabilistic Number Theory

The material discussed here is also of interest to machine learning, AI, big data, and data science practitioners, as much of the work is based on heavy data processing, algorithms, efficient coding, testing, and experimentation. Also, it's not just two new conjectures, but paths and suggestions to solve these problems. The last section contains a few new, original exercises, some with solutions, and may be useful to students, researchers, and instructors offering math and statistics classes…

Added by Vincent Granville on September 8, 2019 at 4:09am — No Comments

Python as a tool benefiting data scientists in many ways

As an extremely versatile, general-purpose, professional programming language, Python offers plenty of applications. The language is user-friendly and simple to grasp, which has made it popular throughout the world. Python plays a critical role in helping data scientists find lucrative job opportunities.

Today, Python has become the most in-demand programming language in the data science world. Python offers an extensive range…

Added by Divyesh Aegis on September 5, 2019 at 12:00am — No Comments

10 Machine Learning Methods that Every Data Scientist Should Know

Machine learning is a hot topic in research and industry, with new methodologies developed all the time. The speed and complexity of the field make keeping up with new techniques difficult even for experts, and potentially overwhelming for beginners.

To demystify machine learning and to offer a learning path for those who are new to the core…

Added by Vincent Granville on August 30, 2019 at 11:08am — No Comments

A Strange Family of Statistical Distributions

I introduce here a family of very peculiar statistical distributions governed by two parameters: p, a real number in [0, 1], and b, an integer > 1.

Potential applications are found in cryptography, Fintech (stock market modeling), Bitcoin, number theory, random number…

Added by Vincent Granville on August 30, 2019 at 10:11am — No Comments

Extreme Events Modeling Using Continued Fractions

Continued fractions are usually considered a beautiful, curious mathematical topic, but one with applications that are mostly theoretical and limited to math and number theory. Here we show how they can be used in applied business and economics contexts, leveraging the mathematical theory developed for continued fractions, to model and explain natural phenomena. …

Added by Vincent Granville on August 30, 2019 at 9:42am — No Comments

Different Ways to Incorporate Data in Business Strategy for Security

In data-driven enterprise systems, Spark has become a popular name: it is easy to use and offers speed and versatility. Data can be understood quickly, allowing one to make faster decisions. Big Data benefits greatly from Spark's faster data processing. This open-source framework clusters large datasets to help analyze them. The code is written in Scala, which has made data processing possible and easier and gives a certain boost to the…

Added by Divyesh Aegis on August 13, 2019 at 12:51am — No Comments

© 2019 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC