All Blog Posts (2,090)

High Precision Computing in Python or R

Here we discuss an application of HPC (here meaning high precision computing, a special case of high performance computing) to dynamical systems such as the logistic map in chaos theory, defined as X(k) = 4 X(k-1) (1 - X(k-1)).

For all these systems, the loss of precision propagates exponentially, to the point that after 50 iterations, all generated values are completely wrong. Tons of articles have been written on this subject - none of them acknowledging the…
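The precision loss described above can be illustrated with a minimal sketch. It assumes Python's standard `decimal` module as the high-precision tool (the post itself may use a different library); the seed 0.3, the 200-digit setting, and the function names are illustrative choices, not taken from the article.

```python
from decimal import Decimal, getcontext


def logistic_float(x0, n):
    """Iterate X(k) = 4 X(k-1) (1 - X(k-1)) in ordinary double precision."""
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        xs.append(4.0 * x * (1.0 - x))
    return xs


def logistic_decimal(x0, n, digits=200):
    """Same recurrence, carried out with `digits` significant decimal digits."""
    getcontext().prec = digits
    xs = [Decimal(x0)]  # exact conversion, so both runs start from the same value
    for _ in range(n):
        x = xs[-1]
        xs.append(4 * x * (1 - x))
    return xs


seed = 0.3  # illustrative seed
n = 100
low = logistic_float(seed, n)
high = logistic_decimal(seed, n)

# Print both orbits side by side, with the absolute gap between them.
for k in (10, 30, 50, 70, 100):
    print(k, low[k], float(high[k]), abs(low[k] - float(high[k])))
```

The two runs start from exactly the same number, so any gap between the columns comes from rounding inside the double-precision iteration, not from the seed.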

Continue

Added by Vincent Granville on November 13, 2017 at 7:00pm — No Comments

Great Saturday Reading

Here is our selection of featured articles and resources posted in the last few days:

Continue

Added by Vincent Granville on November 11, 2017 at 4:55pm — No Comments

Beginners Guide to Chatbots

Summary:  This is the first in a series about Chatbots.  In this first installment we cover the basics, including their brief technological history, uses, basic design choices, and where deep learning comes into play.  In subsequent articles we’ll describe in more detail how they are actually programmed, along with best-practice dos and don’ts.…

Continue

Added by Vincent Granville on November 8, 2017 at 1:30pm — No Comments

DSC Competition for Data Science and Quant Pros

Data Science Central (DSC) is excited to announce our competition to solve a new, interesting problem in statistical science, pertaining to stochastic processes. DSC members are invited and encouraged to submit a theoretical solution or an application to real-life problems, including but not limited to fintech, operations research, statistical science, computer science, economics, engineering, social, actuarial, biological or physical sciences.

The statistical process central to this…

Continue

Added by Vincent Granville on November 8, 2017 at 12:00pm — No Comments

Information Retrieval Document Search Engine in R

Introduction:

In this post, we learn how to build a basic search engine, or document retrieval system, using the vector space model. This approach is widely used in information retrieval systems. Given a set of documents and a search query of one or more terms, we need to retrieve the documents most similar to the query.

Problem statement:

The problem statement above is illustrated in the image below. …
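As a rough illustration of the vector space model, here is a minimal sketch. The post works in R; this sketch uses plain Python instead, and the TF-IDF weighting, the whitespace tokenizer, the sample documents, and all function names are assumptions made for the example rather than the author's implementation.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unknown to the index are dropped."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def build_index(docs):
    """Return the IDF table and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(len(docs) / df) + 1.0 for t, df in doc_freq.items()}
    return idf, [tfidf(toks, idf) for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, idf, vectors):
    """Rank documents by similarity to the query, best match first."""
    q = tfidf(tokenize(query), idf)
    return sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)


# Hypothetical toy corpus, for illustration only.
docs = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock markets fell sharply today",
]
idf, vectors = build_index(docs)
ranked = search("stock markets", idf, vectors)
for score, i in ranked:
    print(round(score, 3), docs[i])
```

Documents and queries live in the same term space, so relevance reduces to the angle between two vectors; documents sharing no terms with the query score zero.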

Continue

Added by suresh kumar Gorakala on November 7, 2017 at 6:30am — No Comments

Fascinating Time Series with Cool Applications

Here we describe well-known chaotic sequences, including new generalizations, with application to random number generation, highly non-linear auto-regressive models for time series, simulation, random permutations, and the use of big numbers (libraries available in programming languages to work with numbers with hundreds of decimals) as standard computer precision almost always produces completely erroneous results after a few iterations  -- a fact rarely if ever mentioned in the scientific…

Continue

Added by Vincent Granville on November 6, 2017 at 8:30pm — No Comments

Trend Analysis of Fragmented Time Series

Full title: Trend Analysis of Fragmented Time Series: Hypothesis Testing Based Adaptive Spline Filtering Method.

Missing data present significant challenges to trend analysis of time series. Straightforward approaches consisting of supplementing missing data with constant or zero values or with linear trends can severely degrade the quality of the trend analysis, which significantly reduces the reliability of the…

Continue

Added by Vincent Granville on November 3, 2017 at 10:23am — No Comments

Linear Models Don’t have to Fit Exactly for P-Values To Be Accurate, Right, and Useful

There is no need to confuse multiple linear regression, the generalized linear model, and general linear methods. The general linear model, or multivariate regression model, is a statistical linear model written as Y = XB + U.

Usually, a linear model includes a number of different statistical models such as ANOVA, ANCOVA, MANOVA, MANCOVA, ordinary linear regression, t-test and F-test. The GLM is a generalization of multiple…

Continue

Added by Chirag Shivalker on November 2, 2017 at 11:30pm — 1 Comment

Logistic Map, Chaos, Randomness and Quantum Algorithms

The logistic map is the most basic recurrence formula exhibiting various levels of chaos depending on its parameter. It has been used in population demographics to model chaotic behavior. Here we explore this model in the context of randomness simulation, and revisit a bizarre non-periodic random number generator discovered 70 years ago, based on the logistic map equation. We then discuss flaws and strengths in widely used random number generators, as well as how to reverse-engineer…
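To make the dependence on the parameter concrete, here is a small sketch of the general logistic map x -> r x (1 - x). The parameter values 2.5, 3.2 and 4.0, the seed 0.2, and the function name are illustrative choices, not taken from the article.

```python
def logistic_orbit(r, x0, n):
    """Return [x0, x1, ..., xn] for the map x -> r * x * (1 - x)."""
    x = x0
    orbit = [x]
    for _ in range(n):
        x = r * x * (1.0 - x)
        orbit.append(x)
    return orbit


# Compare the long-run behavior for three parameter values.
for r in (2.5, 3.2, 4.0):
    tail = logistic_orbit(r, 0.2, 1000)[-4:]
    print(r, [round(x, 6) for x in tail])
```

With r = 2.5 the orbit settles on the fixed point 1 - 1/r = 0.6, with r = 3.2 it alternates between two values, and with r = 4 the tail shows no settled pattern.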

Continue

Added by Vincent Granville on October 30, 2017 at 12:00pm — No Comments

Graph Theory: Six Degrees of Separation Problem

This famous statement -- the six degrees of separation -- claims that there are at most six degrees of separation between you and anyone else on Earth. Here we feature a simple algorithm that simulates how we are connected, and indeed confirms the claim. We also explain how it applies to web crawlers: Any web page is connected to any other web page by a path of six links at most.

The algorithm below is rudimentary and can be used for simulation purposes by any programmer: It does not even…
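The author's algorithm is not shown in this excerpt. As a stand-in, here is a minimal sketch of one common way to simulate degrees of separation: build a random graph and measure path lengths with breadth-first search. The graph model, the sizes (2,000 people, 10 random contacts each), and all names are assumptions made for illustration.

```python
import random
from collections import deque


def random_graph(n, k, rng):
    """Undirected graph in which every node picks k random contacts."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in rng.sample(range(n), k):
            if v != u:  # skip self-links
                adj[u].add(v)
                adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    """Number of links on the shortest path from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


rng = random.Random(42)  # fixed seed so the run is repeatable
adj = random_graph(2000, 10, rng)
dist = bfs_distances(adj, 0)
print("people reached:", len(dist))
print("largest separation from person 0:", max(dist.values()))
```

Because each step outward multiplies the number of people reached, a handful of steps is enough to cover the whole graph, which is the intuition behind the six-degrees claim.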

Continue

Added by Vincent Granville on October 25, 2017 at 11:43am — No Comments

Great Sunday Reading

Here is our selection of featured articles and resources posted recently:

Continue

Added by Vincent Granville on October 22, 2017 at 5:19pm — No Comments

Three Articles Not To Miss

Non-traditional strategies for mid-career switch to Data Science and AI 

In this post, I explore strategies to switch to Data Science mid-career. This switch is not easy, but based on the experience of many whom I have taught, mentored, or recruited, it is possible. Most people consider a PhD, a MOOC, etc. for switching their career to Data Science. But here, I will explore some non-traditional, unorthodox ways of switching to Data Science. …

Continue

Added by Vincent Granville on October 20, 2017 at 12:10pm — No Comments

Book on Computer Programming

Data scientists use a range of tools in their work and some of these eventually require programming. This book, titled The Art and Craft of Computer Programming, is a guide to computer programming. It does not focus on a specific programming language, but instead contains the essential material from a first year Computer Science course. The book is available from Amazon.com.…

Continue

Added by Mark McIlroy on October 19, 2017 at 8:00pm — 1 Comment

100+ Commonly Asked Data Science Interview Questions

Here is a new set of easy questions recently published, covering

  • Statistics
  • Programming (General, Big Data, Python, R, SQL)
  • Modeling
  • Behavioral
  • Culture Fit
  • Problem-Solving

Click here to read these questions (some found on Quora), and some answers.

Many lists of questions have been…

Continue

Added by Vincent Granville on October 15, 2017 at 7:30pm — No Comments

Predictive Analytics Takes a Victory Lap

Summary:  Over the last eight years predictive analytics has become a fully mature technology with wide adoption among the largest and most successful companies.  The Advanced Analytic Platforms we have to make our work more effective and efficient also show substantial improvement.

Predictive analytics combines the core disciplines of data science to do the everyday heavy lifting of predicting consumer behavior and forecasting future values, plus a lot of other…

Continue

Added by Vincent Granville on October 15, 2017 at 2:14pm — No Comments

What Kind of OLAP Do We Really Need?

The narrow-sensed OLAP

OLAP is part and parcel of a BI application. As the name suggests, the word is an acronym for online analytical processing. Users, frontline employees, to be precise, are responsible for performing various types of data processing online.  

But the concept of OLAP tends to be used in a very narrow sense. It has almost become synonymous with multidimensional analysis. Based on a prebuilt data cube, the analysis performs summarization…

Continue

Added by JIANG Buxing on October 9, 2017 at 4:00am — No Comments

Some Thoughts on Mid-Career Switching Into Data Science

Summary:  If you are mid-career and thinking about switching into data science here are some things to think about in planning your journey.

We get lots of inquiries from readers asking for career advice and many of these identify as mid-career looking to switch into data science.  If you’re in this group you face some of the same challenges beginners do but also some that are unique to your circumstance.  Here are some thoughts and observations that may be…

Continue

Added by Vincent Granville on October 6, 2017 at 11:00am — No Comments

Interesting Problem: Self-correcting Random Walks

Section 3 was added on October 11. Section 4 was added on October 19.  A $2,000 award is offered for solving any of the open questions; click here for details.

This is another off-the-beaten-path problem, one that you won't find in textbooks. You can solve it using data science methods (my approach) but the mathematician with some…

Continue

Added by Vincent Granville on October 4, 2017 at 2:00pm — 5 Comments

Free Book: Probability and Statistics Cookbook

The format is very similar to a BIG cheat sheet. This cookbook integrates a variety of topics in probability theory and statistics. It is based on literature and in-class material from courses of the statistics department at the University of California, Berkeley, but is also influenced by other sources. …

Continue

Added by Vincent Granville on October 2, 2017 at 7:30pm — No Comments

© 2017 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC