A Data Science Central Community

I published a post on the current status of "Data Scientist" in Japan, as a periodic follow-up to an analysis I did two years ago. The trend I described then still holds, but it has gone further than I anticipated at the time.

Indeed, the growth of interest in "Artificial Intelligence" in Japan is steeper than for the English term, and "Data Scientist" is now getting to be…

Added by Takashi J. OZAKI on January 13, 2017 at 6:30am

Below is my personal list of statistical and machine learning methods that every data scientist should know in 2016.

- **Statistical Hypothesis Testing (t-test, chi-squared test & ANOVA)**
- **Multiple Regression (Linear Models)**
- **Generalized Linear Models (GLM: Logistic Regression, Poisson Regression)**
- **Random Forest**
- **XGBoost (eXtreme Gradient Boosting)**
- **Deep Learning**
- **Bayesian Modeling with…**

Added by Takashi J. OZAKI on January 8, 2017 at 6:30am

I had known about MXnet for weeks as one of the most popular libraries among Kagglers, but only recently did I hear that most of the bug fixes were done, and some friends said the latest version looked stable, so I finally installed it.

MXnet: https://github.com/dmlc/mxnet

I think that the most important feature…

Added by Takashi J. OZAKI on March 30, 2016 at 8:30am

I wrote a blog post inspired by Jamie Goode's book "Wine Science: The Application of Science in Winemaking".

In this book, Goode argues that a reductionist approach cannot explain the relationship between the chemical components of a wine and its taste. Indeed, not all high-alcohol wines are excellent, although such wines are generally believed to be good. The taste of a wine usually depends on a complicated balance of many components, such as sweetness, acidity, tannin,…

Added by Takashi J. OZAKI on November 26, 2015 at 8:43am

I wrote a series of blog posts on Bayesian modeling with R and Stan.

- Bayesian modeling with R and Stan (1): Overview
- Bayesian modeling with R and Stan (2): Installation and an easy example…

Added by Takashi J. OZAKI on August 17, 2015 at 11:14pm

A/B testing is widely used in online marketing, Internet ad management and many other everyday analytics tasks. In general, people use it to look for "golden features (metrics)" that serve as key levers for growth hacking. To evaluate an A/B test, statistical hypothesis tests such as the t-test are used, and people try to find any metric that differs significantly between conditions. If you successfully find a metric with a significant difference between designs A and B of a click button,…
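As a minimal sketch of the test step described above, the snippet below compares one metric between designs A and B with Welch's t-test (a t-test that does not assume equal variances in the two groups). It uses only the Python standard library; the function name `welch_t_test` is hypothetical, and the p-value comes from a normal approximation to the t distribution, which is only reasonable for large groups. A statistics package gives the exact t-based p-value, e.g. R's `t.test()` (Welch's test is its default) or SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(a, b):
    """Compare the mean of one metric between design A and design B.

    Returns (t, df, p_approx):
      t        -- Welch's t statistic
      df       -- Welch-Satterthwaite degrees of freedom
      p_approx -- two-sided p-value from a *normal* approximation to the
                  t distribution (only reasonable when both groups are large)
    """
    n_a, n_b = len(a), len(b)
    se2_a = variance(a) / n_a  # squared standard error of mean(a)
    se2_b = variance(b) / n_b  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    p_approx = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Toy usage with made-up numbers (not real A/B data), only to show the call
design_a = [1, 2, 3, 4, 5]
design_b = [2, 3, 4, 5, 6]
t, df, p = welch_t_test(design_a, design_b)
print(f"t = {t:.3f}, df = {df:.1f}, approx. p = {p:.3f}")
```

With groups as small as the toy data, the normal approximation understates the p-value noticeably, so the exact t distribution with `df` degrees of freedom should be used instead.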

Added by Takashi J. OZAKI on June 18, 2015 at 9:00am

On my own blog I wrote a series of articles about how major machine learning classifiers work, with visualizations of their decision boundaries on various datasets.

- Machine learning for package users with R (0): Prologue
- Machine learning for package users with R (1): Decision Tree…

Added by Takashi J. OZAKI on June 5, 2015 at 4:00am

As part of a series of posts on how machine learning classifiers work, I ran a decision tree to classify points on an XY-plane, trained on either XOR patterns or linearly separable patterns.

**1. Simple (non-overlapped) XOR pattern**

It worked well. Its decision boundary was drawn almost perfectly parallel to the assumed true…

Added by Takashi J. OZAKI on March 22, 2015 at 10:00pm

In the latest post on my own blog, I discussed how to understand visually how each machine learning classifier works. My idea is as follows: first, prepare training samples and show their assumed true boundary; then, use a dense grid covering the space as the test dataset, and compare the decision boundary the classifier estimates on that grid with the assumed true boundary.

In the case below, the assumed true boundary of the space is a set of three parallel lines; I think everybody would guess so…
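A minimal sketch of this procedure in plain Python, under stated assumptions: the training samples are labeled by a known rule (here, hypothetically, the XOR pattern, whose true boundary is the two axes, rather than the three parallel lines of the case above), a simple k-nearest-neighbor vote stands in for whichever classifier is being examined, and its predictions on a dense grid are compared with the true rule. All function names and parameter values (`k`, grid step, sample size, seed) are illustrative choices, not taken from the original post.

```python
import random


def true_label(x, y):
    # Assumed "true" rule: XOR pattern, whose boundary is the axes x = 0 and y = 0
    return 1 if x * y > 0 else 0


def knn_predict(train, x, y, k=5):
    # Stand-in classifier: majority vote among the k training points closest to (x, y)
    nearest = sorted(train, key=lambda s: (s[0] - x) ** 2 + (s[1] - y) ** 2)[:k]
    votes = sum(label for _, _, label in nearest)
    return 1 if 2 * votes > k else 0


def grid_centers(lo=-2.0, hi=2.0, step=0.1):
    # Cell centers of a dense grid; the half-step offset keeps points off the axes
    n = int(round((hi - lo) / step))
    return [lo + (i + 0.5) * step for i in range(n)]


def compare_on_grid(train, k=5):
    # Predict every grid point and return the fraction that agrees with the true rule
    coords = grid_centers()
    hits = 0
    for gx in coords:
        for gy in coords:
            if knn_predict(train, gx, gy, k) == true_label(gx, gy):
                hits += 1
    return hits / (len(coords) ** 2)


def ascii_map(train, k=5, step=0.25):
    # Coarse text picture of the estimated boundary: '#' = class 1, '.' = class 0
    coords = grid_centers(step=step)
    rows = []
    for gy in reversed(coords):  # top row = largest y
        rows.append("".join("#" if knn_predict(train, gx, gy, k) else "." for gx in coords))
    return "\n".join(rows)


# Training samples drawn uniformly from the square and labeled by the true rule
rng = random.Random(42)
train = []
for _ in range(400):
    x, y = rng.uniform(-2, 2), rng.uniform(-2, 2)
    train.append((x, y, true_label(x, y)))

print(f"agreement with the true boundary: {compare_on_grid(train):.3f}")
print(ascii_map(train))
```

Because the grid and the true rule stay fixed, swapping `knn_predict` for another classifier's prediction function makes any difference in the picture attributable to the classifier itself.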

Added by Takashi J. OZAKI on March 13, 2015 at 3:00am

To evaluate how the Deep Belief Net (Deep Learning) of {h2o} performs on actual datasets, I applied it to the MNIST dataset; since I got the dataset from a Kaggle competition on MNIST, I ended up joining the competition. :P

As is well known, classification tasks such as MNIST should be…

Added by Takashi J. OZAKI on February 24, 2015 at 10:30pm

Currently I'm concerned about the incredible overheating of the "Artificial Intelligence" boom in Japan, while "Data Scientist" has faded away.

Google Trends shows that Japanese people are becoming less interested in statistics, which is believed to be the core expertise of a Data Scientist, and are now enthusiastic about Artificial Intelligence instead. I find this situation very puzzling.

So, what's going on in 2015? I think quite a few data science experts in Japan would agree that "Artificial…

Added by Takashi J. OZAKI on February 18, 2015 at 8:00am

Below is the latest post on my blog (and my first post in 10 months).

*What kind of decision boundaries does Deep Learning (Deep Belief Net) draw? Practice with R and {h2o} package*

I once wrote a post about the relationship between the characteristics of machine learning classifiers and their decision boundaries on the same dataset. The results were very interesting, and many people seemed to enjoy them and even debated them.

Actually I've been looking for similar…

Added by Takashi J. OZAKI on February 15, 2015 at 3:30am

I find the "10 questions about big data and data science" raised by Dr. Vincent Granville very interesting.

I have to say I am by no means a leader of big data or data science in Japan, but I'm afraid nobody else will answer them. So, as a personal opinion, I…

Added by Takashi J. OZAKI on February 17, 2014 at 7:10am

I've seen a dichotomy between "analytics" and "data science" (or "statistics") in several web marketing teams, because people tend to feel that analytics is simple, fast and works well, while statistics is hard to learn, complicated and time-consuming.

In the latest post, I discussed this dichotomy from the viewpoint of analytic accuracy and pointed out a pitfall of simple and fast analytics.…

Added by Takashi J. OZAKI on February 6, 2014 at 4:00am

These days people seem to believe that "growth hacking" must be immediate and rapid. This is partly true, but I'm afraid it is partly not, because repeated short-term, intensive growth hacking can fall into the pitfall of "regression to the mean".

http://tjo-en.hatenablog.com/entry/2014/01/22/183243

This post is a translation from the Japanese version of my blog, but I believe this point of view…
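The "regression to the mean" pitfall can be illustrated with a small simulation. This is a sketch under hypothetical assumptions, not the setup of the original post: each of many variants (say, ad creatives) has a fixed true effect, every short measurement period adds independent noise, and we keep the top 10% from period 1. Their period-1 average is inflated by lucky noise, so it is expected to fall back in period 2 even though nothing about the variants changed. The function name and all parameter values are illustrative.

```python
import random
from statistics import mean


def regression_to_mean_demo(n_variants=1000, top_fraction=0.1, noise_sd=1.0, seed=0):
    """Select the apparent winners of period 1 and re-measure them in period 2.

    Hypothetical model: true effects ~ N(0, 1); each period's observation is
    the true effect plus independent N(0, noise_sd**2) noise.
    """
    rng = random.Random(seed)
    true_effect = [rng.gauss(0.0, 1.0) for _ in range(n_variants)]
    period1 = [t + rng.gauss(0.0, noise_sd) for t in true_effect]
    period2 = [t + rng.gauss(0.0, noise_sd) for t in true_effect]

    # "Short-term growth hacking": keep the variants that looked best in period 1
    k = max(1, int(n_variants * top_fraction))
    winners = sorted(range(n_variants), key=lambda i: period1[i], reverse=True)[:k]

    return (
        mean(period1[i] for i in winners),      # score that got them selected
        mean(period2[i] for i in winners),      # same variants, fresh noise
        mean(true_effect[i] for i in winners),  # what they are really worth
    )


p1, p2, truth = regression_to_mean_demo()
print(f"winners in period 1: {p1:.2f}, same winners in period 2: {p2:.2f}, "
      f"true effect: {truth:.2f}")
```

Setting `noise_sd=0.0` makes the two periods identical and the drop disappears, a quick check that the drop comes from measurement noise rather than from the variants themselves.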

Added by Takashi J. OZAKI on January 22, 2014 at 6:00am

One of my motivations for writing the English version of my blog is to report and share the latest topics from the Japanese market in data science and related fields. In the latest post, I wrote about the puzzling situation of "Data Scientist" in the Japanese market.

http://tjo-en.hatenablog.com/entry/2014/01/13/141432

Although Google Trends shows that the fever for "data scientist" in English is still rising, the same one…

Added by Takashi J. OZAKI on January 12, 2014 at 10:20pm

On the English version of my personal blog, I posted an article about how we should understand the nature of each machine learning classifier; my solution is "just looking at the hyperplane of each one".

http://tjo-en.hatenablog.com/entry/2014/01/06/234155

For package users, rather than serious experts in machine learning and its scientific basis, visualized features (i.e. hyperplanes in 2D space) would be much…

Added by Takashi J. OZAKI on January 6, 2014 at 7:59am

- In Japan, "Artificial Intelligence" comes to be a super star while "Data Scientist" is fading away
- 12 Statistical and Machine Learning Methods that Every Data Scientist Should Know
- Overview and simple trial of Convolutional Neural Network with MXnet
- Multivariate modeling vs. univariate modeling along human intuition: predicting taste of wine
- R and Stan: introduction to Bayesian modeling
- Even without any "golden feature", multivariate modeling can work
- Overfitting or generalized? Comparison of ML classifiers - a series of articles

© 2019 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC