
Many of the following statistical tests are rarely discussed in textbooks or college classes, much less in data camps. Yet they help answer many different and interesting questions. I used most of them without even computing the underlying distribution under the null hypothesis, relying instead on simulations to check whether my assumptions were plausible. In short, my approach to statistical testing is model-free and data-driven. Some of these tests are easy to implement, even in Excel. Some are illustrated here with examples that require no statistical knowledge to understand or implement.

This material should appeal to managers, executives, industrial engineers, software engineers, operations research professionals, economists, and anyone dealing with data, such as biometricians, analytical chemists, astronomers, epidemiologists, journalists, or physicians. Statisticians with a different perspective are invited to discuss my methodology and the tests described here in the comment section at the bottom of this article. In my case, I used these tests mostly in the context of experimental mathematics, a branch of data science that few people talk about. In that context, the theoretical answer to a statistical test is sometimes known, making it a great benchmarking tool to assess the power of these tests and to determine the minimum sample size needed to make them valid.

I provide here a general overview, as well as my simple approach to statistical testing, accessible to professionals with little or no formal statistical training. Detailed applications of these tests are found in my recent book and in this article; precise references to these documents are provided as needed.

Examples of traditional tests

1. General Methodology

Despite my strong background in statistical science, over the years I moved away from relying too much on traditional statistical tests and statistical inference.
I am not the only one: these tests have been abused and misused; see for instance this article on p-hacking. Instead, I favored a methodology of my own, mostly empirical, based on simulations, data-driven rather than model-driven. It is essentially a non-parametric approach. It has the advantage of being far easier to use, implement, understand, and interpret, especially for the non-initiated. It was initially designed to be integrated into black-box, automated decision systems. Here I share some of these tests, many of which can be implemented easily in Excel. Read the full article here.
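As an illustration of this simulation-based style of testing (my own minimal sketch, not code from the article, using made-up data): a two-sample permutation test compares the observed difference in means against differences obtained by repeatedly shuffling the pooled data. No null distribution is computed in closed form and no distributional assumptions are needed.

```python
import random

def permutation_test(sample_a, sample_b, n_iter=10_000, seed=42):
    """Two-sided permutation test for a difference in means.

    Returns the fraction of random relabelings whose absolute
    difference in means is at least as large as the observed one:
    an empirical p-value, obtained purely by simulation.
    """
    rng = random.Random(seed)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    observed = abs(sum(sample_a) / n_a - sum(sample_b) / len(sample_b))
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # random relabeling of the pooled data
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(mean_a - mean_b) >= observed:
            count += 1
    return count / n_iter

# Two made-up samples with clearly different means.
a = [5.1, 4.9, 5.4, 5.0, 5.2, 5.3]
b = [5.9, 6.1, 5.8, 6.0, 6.2, 5.7]
p = permutation_test(a, b)
print(p)  # small empirical p-value: the group means clearly differ
```

The same shuffling idea extends to medians, correlations, or any other statistic, which is what makes the approach attractive for automated, black-box decision systems.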

For background to this post, please see Learn Machine Learning Coding Basics in a weekend. Here, we present the glossary that we use for the coding, and the mindmap attached to these classes and the upcoming book. About 80 terms are included in the glossary, covering Ensembles, Regression, Classification, Algorithms, Training, Validation, Model Evaluation, and more. For instance, the section about Classification contains the following entries:

Class
Hyperplane
Decision Boundary
False Negative (FN)
False Positive (FP)
True Negative (TN)
True Positive (TP)
Precision
Recall
F1 Score
Few-Shot Learning
Hinge Loss
Log Loss

To download the glossary, follow this link.

DSC Resources
Free Books | Forum Discussions | Cheat Sheets | Jobs | Search DSC | DSC on Twitter | DSC on Facebook

Logistic regression (LR) models estimate the probability of a binary response, based on one or more predictor variables. Unlike linear regression models, the dependent variable is categorical. LR has become very popular, perhaps because of the wide availability of the procedure in software. Although LR is a good choice for many situations, it doesn't work well for all of them. For example:

In propensity score analysis with many covariates, LR performs poorly.
For classification, LR usually requires more variables than Support Vector Machines (SVM) to achieve the same (or better) misclassification rate for multivariate and mixture distributions.
In addition, LR is prone to issues like overfitting and multicollinearity.

A wide range of alternatives is available, from statistics-based procedures (e.g. log binomial, ordinary or modified Poisson regression, and Cox regression) to those rooted more deeply in data science, such as machine learning and neural network theory. Which one you choose depends largely on what tools you have available, what theory (e.g. statistics vs. neural networks) you want to work with, and what you're trying to achieve with your data. For example, tree-based methods are a good alternative for assessing risk factors, while Neural Networks (NN) and Support Vector Machines (SVM) work well for propensity score estimation and categorization/classification. There are literally hundreds of viable alternatives to logistic regression, so it isn't possible to discuss them all within the confines of a single blog post. What follows is an outline of some of the more popular choices.

Read the full article, here.
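To make the starting point concrete, here is a minimal sketch of what LR actually estimates (my own illustration with a made-up dataset, not code from the article): the probability of a binary outcome as a logistic function of a predictor, fit here by plain gradient descent on the log loss.

```python
import math

def train_logistic(xs, ys, lr=0.1, epochs=5000):
    """Minimal one-feature logistic regression fit by gradient descent.

    Models P(y = 1 | x) = sigmoid(w*x + b); returns (w, b).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted probability
            grad_w += (p - y) * x
            grad_b += (p - y)
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hours studied vs. pass (1) / fail (0) -- an illustrative, made-up dataset.
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic(hours, passed)
prob_3h = 1.0 / (1.0 + math.exp(-(w * 3.0 + b)))
print(round(prob_3h, 2))  # estimated probability of passing after 3 hours
```

The alternatives discussed above (trees, SVM, NN) replace the single linear score w*x + b with a more flexible decision surface, which is where they gain their advantage on multivariate and mixture distributions.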

This is another interesting problem, off the beaten path. It ends with a formula to compute the integral of a function based solely on its derivatives. For simplicity, I'll start with some notation used in the context of matrix theory, familiar to everyone: T(f) = g, where f and g are vectors, and T a square matrix. The notation T(f) represents the product between the matrix T and the vector f. Now, imagine that the dimensions are infinite, with f being a vector whose entries represent all the real numbers in some peculiar order. In mathematical analysis, T is called an operator, mapping all real numbers (represented by the vector f) onto another infinite vector g. In other words, f and g can be viewed as real-valued functions, and T transforms the function f into a new function g. A simple case is when T is the derivative operator, transforming any function f into its derivative g = df/dx. We define the powers of T as T^0 = I (the identity operator, with I(f) = f), T^2(f) = T(T(f)), T^3(f) = T(T^2(f)), and so on, just like the powers of a square matrix. Now let the fun begin.

Exponential of the Derivative Operator

We assume here that T is the derivative operator. Using the same notation as above, we have the same formula as if T were a matrix:

exp(T) = I + T + T^2/2! + T^3/3! + ...

Applied to a function f, we have:

exp(T)(f)(x) = f(x) + f'(x) + f''(x)/2! + f'''(x)/3! + ... = f(x + 1).

This is a simple application of Taylor series. So the exponential of the derivative operator is a shift operator.

Inverse of the Derivative Operator

Likewise, as for matrices, we can define the inverse of T as

T^(-1) = (I - (I - T))^(-1) = I + (I - T) + (I - T)^2 + (I - T)^3 + ...

If T were a matrix, the condition for convergence would be that all the eigenvalues of T - I have absolute value smaller than 1. For the derivative operator T applied to a function f, and under some conditions that guarantee convergence, it is easy to show that

T^(-1)(f) = f + (f - f') + (f - 2f' + f'') + (f - 3f' + 3f'' - f''') + (f - 4f' + 6f'' - 4f''' + f'''') + ...

The coefficients (for instance 1, -4, 6, -4, 1 in the last term displayed above) are just the binomial coefficients, with alternating signs. We call the inverse of the derivative operator the pseudo-integral operator.
It is easy to prove that the pseudo-integral operator (as defined above), applied to the exponential function, yields the exponential function itself. So the exponential function is a fixed point (the only continuous one) of the pseudo-integral operator. More interestingly, in this case, the pseudo-integral operator is just the standard integral operator: they are one and the same. Is this always the case, regardless of the function f? It turns out that this is true for any function f that can be written as a linear combination (with real or complex coefficients) of exponential functions of the form exp(b x). This covers a large class of functions, especially since the coefficients can also be complex numbers. These functions usually have a Taylor series expansion too. However, it does not apply to functions such as polynomials, due to lack of convergence of the formula in that case.

In short, we have found a formula to compute the integral of a function based solely on the function itself and its successive derivatives. The same technique can be used to invert more complicated linear operators, such as Laplace transforms.

Exercise

Apply the derivative operator to the pseudo-integral of a function f, using the above formula for the pseudo-integral. The result should be equal to f. This is the case if f belongs to the family of functions described above. Can you identify functions not belonging to that family for which the theory is still valid? Hint: try f(x) = exp(b x^2) or f(x) = x exp(b x), where b is a parameter.

To not miss this type of content in the future, subscribe to our newsletter. For related articles from the same author, click here or visit www.VincentGranville.com.
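As a quick numerical sanity check of the pseudo-integral series (my own sketch, not code from the article): for f(x) = exp(b x), the j-th derivative is b^j exp(b x), so each inner alternating binomial sum collapses to (1 - b)^k, and the series sums to exp(b x) / b when |1 - b| < 1, matching the ordinary antiderivative.

```python
import math

def pseudo_integral_exp(b, x, n_terms=30):
    """Partial sum of the pseudo-integral series for f(x) = exp(b*x).

    The k-th term is sum_j (-1)^j C(k, j) * f^(j)(x), and since
    f^(j)(x) = b**j * exp(b*x), the inner sum equals (1 - b)**k * exp(b*x).
    We compute the inner sum explicitly anyway, to exercise the formula.
    n_terms is kept moderate: the alternating binomial sums cancel
    catastrophically in floating point for large k.
    """
    total = 0.0
    for k in range(n_terms):
        inner = sum((-1) ** j * math.comb(k, j) * b ** j for j in range(k + 1))
        total += inner
    return math.exp(b * x) * total

b, x = 0.7, 1.3                 # |1 - b| = 0.3 < 1, so the series converges
approx = pseudo_integral_exp(b, x)
exact = math.exp(b * x) / b     # ordinary antiderivative of exp(b*x)
print(approx, exact)
```

The two values agree to many decimal places, confirming that for this family of functions the pseudo-integral and the standard integral coincide.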
Follow me on LinkedIn, or visit my old web page here.

DSC Resources
Book and Resources for DSC Members | Comprehensive Repository of Data Science and ML Resources | Advanced Machine Learning with Basic Excel | Difference between ML, Data Science, AI, Deep Learning, and Statistics | Selected Business Analytics, Data Science and ML articles | Hire a Data Scientist | Search DSC | Find a Job | Post a Blog | Forum Questions
Follow us: Twitter | Facebook

The first days after the New Year celebrations are a good time to look back, analyze our actions and promises, and draw conclusions about whether our predictions and expectations came true. As 2018 came to its end, it is the perfect time to analyze it and to set trends for the next year. The amount of data generated every minute is enormous; therefore, new approaches, techniques, and solutions have been developed.

Looking back at our article Top 10 Technology Trends of 2018, we can say that we were preparing you for upcoming changes related to security, changes provoked by AI in business operations, extensive application of blockchains, further development of the Internet of Things (IoT), the growth of NLP, and more. Some of these predictions materialized in 2018, yet some will remain topical in 2019 as well. Only one factor remains stable: development. There is no doubt that technologies will continue to develop, improve, and upgrade to fit their purposes better.

Initially, smart data technologies were actively applied only by huge enterprises and corporations. Today, big data has become available to a wide range of small businesses and companies. Both big enterprises and small companies tend to rely on big data for intelligent business insights in their decision-making. The ever-growing stream of data may also present a challenge to business people, and predicting changes in the role of big data and its technologies is even more difficult. Thus, our top technology trends of 2019 are meant to serve as a comprehensible roadmap for you.

2019 Trends

1. Data security will reinforce its position
2. Internet of Things will deliver new opportunities
3. Automation continues to be game-changing
4. AR is expected to overcome VR

To read the 10 trends with detailed information for each trend, follow this link.

Extract from the upcoming Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, follow this link. To check the full digest and see the picture of the week, follow this link.

Featured Resources and Technical Contributions

23 Statistical Concepts Explained in Simple English - Part 7
New Home Sales Projection: Time Series Forecasting
Best dynamically-typed programming languages for data analysis
The 10 Statistical Techniques Data Scientists Need to Master
SIP text log analysis using Pandas

Featured Articles and Forum Questions

How to Flourish in Industry 4.0, the Fourth Industrial Revolution +
Advice to a fresh graduate for getting a job in AI / Data Science
Top 10 Technology Trends of 2019
Spatial Analytics for Micro-marketing
Why AI/ML is Moving so Slowly in Healthcare
Mining Customer Reviews to drive Business Growth
Graph Analytics to Reinforce Anti-fraud Programs
Bill Schmarzo's Retrospective: Data Science, ML, Big Data Analytics...

Follow us: Twitter | Facebook.

Extract from the upcoming Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, follow this link. To check the full digest and see the picture of the week, follow this link.

Featured Resources and Technical Contributions

Your Guide to Natural Language Processing (NLP)
Stocks, Significance Testing & p-Hacking: How volatile is volatile?
The Mathematics of Data Science - Understanding the foundations of Deep Learning through Linear Regression
Tableau in 10 Minutes: Step-by-Step Guide
Pancake: A Python package for model stacking
900 Most Popular DS & ML Articles in 2018

Featured Articles and Forum Questions

How Do You Win the Data Science Wars? You Cheat By Doing The Necessary Pre-work +
Supervised vs Unsupervised Learning... What's the Big Deal?
Exploit the Economics of AI with Design Thinking and Data Science
The AI/ML Opportunity Landscape in Healthcare
19 Controversial Articles about Data Science

Follow us: Twitter | Facebook.

This article was written by Ajit Joakar. In this longish post, I have tried to explain Deep Learning starting from familiar ideas like machine learning. This approach forms part of my forthcoming book, and I have used it in my teaching. It is based on 'learning by exception', i.e. understanding one concept and its limitations, and then understanding how the subsequent concept overcomes that limitation. The roadmap we follow is:

Linear Regression
Multiple Linear Regression
Polynomial Regression
General Linear Model
Perceptron Learning
Multi-Layer Perceptron

We thus develop a chain of thought that starts with linear regression and extends to the multilayer perceptron (Deep Learning). For simplification, I have excluded other forms of Deep Learning such as CNN and LSTM, i.e. we confine ourselves to the multilayer perceptron when it comes to Deep Learning. Why start with Linear Regression? Because it is an idea familiar to many, even at high school level.

To read the full article, follow this link. For more about deep learning, click here. For more about regression, click here.
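To make the first link in that chain concrete, here is a minimal sketch (my own illustration, not code from the article) fitting simple linear regression by gradient descent: the same update rule that, with a nonlinear activation and stacked layers, becomes the multilayer perceptron at the other end of the roadmap.

```python
def fit_line(xs, ys, lr=0.01, epochs=10_000):
    """Fit y = w*x + b by gradient descent on mean squared error.

    This single 'neuron' with an identity activation is exactly
    linear regression; replacing the identity with a nonlinearity
    and stacking layers yields a multilayer perceptron.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# A made-up dataset following roughly y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]
w, b = fit_line(xs, ys)
print(round(w, 1), round(b, 1))  # close to slope 2 and intercept 0
```

Starting the deep learning story here makes the later steps feel like small, motivated extensions rather than a leap.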

Summary: Here are our 5 predictions for data science, machine learning, and AI for 2019. We also take a look back at last year's predictions to see how we did.

It's that time of year again, when we look back in order to offer a look forward: what trends will speed up, what things will actually happen, and what things won't in the coming year for data science, machine learning, and AI. We've been watching and reporting on these trends all year, and we scoured the web and some of our professional contacts to find out what others are thinking.

Here's a Quick Look at Last Year's Predictions and How We Did

What we said: Both model production and data prep will become increasingly automated. Larger data science operations will converge on a single platform (of many available). Both of these trends are in response to the groundswell movement for efficiency and effectiveness, in a nutshell allowing fewer data scientists to do the work of many. Clearly a win: no-code data science is on the rise, as is end-to-end integration in advanced analytic platforms.

What we said: Data science continues to develop specialties, meaning the mythical 'full stack' data scientist will disappear.

To read all 2018 predictions, and compare with the updated 2019 version, click here.