Takashi J. OZAKI's Blog (17)

In Japan, "Artificial Intelligence" has become a superstar while "Data Scientist" is fading away

I published a post about the current status of "Data Scientist" in Japan, as a periodic follow-up to my analysis from two years ago. The trend I identified then still holds, but it has gone beyond what I anticipated at the time.

Indeed, the growth trend of "Artificial Intelligence" in Japan is steeper…

Added by Takashi J. OZAKI on January 13, 2017 at 6:30am — 1 Comment

12 Statistical and Machine Learning Methods that Every Data Scientist Should Know

Below is my personal list of statistical and machine learning methods that every data scientist should know in 2016.

  1. Statistical Hypothesis Testing (t-test, chi-squared test & ANOVA)
  2. Multiple Regression (Linear Models)
  3. Generalized Linear Models (GLM: Logistic Regression, Poisson Regression)
  4. Random Forest
  5. XGBoost (eXtreme Gradient Boosting)
  6. Deep Learning
  7. Bayesian Modeling with…
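As a quick, hedged illustration of two items from the list (in Python rather than the R the author usually works in; the data are synthetic and every name here is my own choice, not from the post), items 1 and 3 might look like this:

```python
# Minimal sketches of items 1 (hypothesis testing) and 3 (a GLM),
# on synthetic data for illustration only.
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# 1. Statistical hypothesis testing: a two-sample t-test
a = rng.normal(loc=0.0, scale=1.0, size=100)
b = rng.normal(loc=0.5, scale=1.0, size=100)
t_stat, p_value = stats.ttest_ind(a, b)

# 3. A generalized linear model: logistic regression
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)
clf = LogisticRegression().fit(X, y)
accuracy = clf.score(X, y)  # in-sample accuracy, just as a sanity check
```

The same two analyses map directly onto `t.test` and `glm(..., family = binomial)` in R.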
Added by Takashi J. OZAKI on January 8, 2017 at 6:30am — 1 Comment

Overview and simple trial of Convolutional Neural Network with MXnet

I've known about MXnet for weeks as one of the most popular libraries among Kagglers, but just recently I heard that bug fixing was almost done, and some friends say the latest version looks stable, so at last I installed it.

MXnet: https://github.com/dmlc/mxnet…

Added by Takashi J. OZAKI on March 30, 2016 at 8:30am — No Comments

Multivariate modeling vs. univariate modeling based on human intuition: predicting the taste of wine

I wrote a blog post inspired by Jamie Goode's book "Wine Science: The Application of Science in Winemaking".

In this book, Goode argues that a reductionist approach cannot explain the relationship between chemical ingredients and the taste of wine. Indeed, we know that not all high-alcohol wines are excellent, although in general high-alcohol wines are believed to be good. The taste of wine is usually shaped by a complicated balance of many components such as sweetness, acidity, tannin,…
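The contrast between univariate intuition ("more alcohol means better wine") and multivariate modeling can be sketched with synthetic data; the component names, scales, and coefficients below are my own assumptions for illustration, not measurements from the book:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 300
# Hypothetical chemical components (names and scales are assumptions)
alcohol = rng.normal(13.0, 1.0, n)
acidity = rng.normal(3.3, 0.3, n)
tannin = rng.normal(1.5, 0.4, n)
sweetness = rng.normal(2.0, 0.5, n)
# Taste as a balance of many components, not alcohol alone
taste = (0.3 * alcohol + 0.8 * tannin + 0.5 * sweetness
         - 1.0 * (acidity - 3.3) ** 2 + rng.normal(0.0, 0.5, n))

# Univariate model along human intuition: alcohol only
r2_alcohol = LinearRegression().fit(alcohol[:, None], taste).score(alcohol[:, None], taste)
# Multivariate model: all components together
X = np.column_stack([alcohol, acidity, tannin, sweetness])
r2_multi = LinearRegression().fit(X, taste).score(X, taste)
# The multivariate model explains more variance than alcohol alone
```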

Added by Takashi J. OZAKI on November 26, 2015 at 8:43am — No Comments

Even without any "golden feature", multivariate modeling can work

A/B testing is widely used in online marketing, Internet ad management, and other everyday analytics. In general, people use it to look for "golden features" (metrics) that are vital points for growth hacking. To evaluate an A/B test, statistical hypothesis tests such as the t-test are used, and people try to find any metric with a significant effect across conditions. If you successfully find a metric with a significant difference between designs A and B of a click button,…
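A minimal sketch of such a test in Python (the click-through rates, sample sizes, and variable names are assumptions for illustration, not figures from the post):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical per-user click outcomes under button designs A and B
clicks_a = rng.binomial(1, 0.10, size=20000)  # assumed 10% CTR
clicks_b = rng.binomial(1, 0.12, size=20000)  # assumed 12% CTR

# Two-sample t-test on the per-user metric
t_stat, p_value = stats.ttest_ind(clicks_a, clicks_b)
significant = p_value < 0.05  # a "golden metric" candidate if significant
```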

Added by Takashi J. OZAKI on June 18, 2015 at 9:00am — No Comments

Overfitting or generalizing? A comparison of ML classifiers - a series of articles

On my own blog I wrote a series of articles about how major machine learning classifiers work, with visualizations of their decision boundaries on various datasets.

Added by Takashi J. OZAKI on June 5, 2015 at 4:00am — No Comments

Decision tree vs. linearly separable or non-separable pattern

As part of a series of posts on how machine learning classifiers work, I ran a decision tree to classify an XY-plane, training it on XOR patterns or linearly separable patterns.

1. Simple (non-overlapped) XOR pattern

It worked well. Its decision…
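The simple (non-overlapped) XOR setup can be sketched as follows, using Python/scikit-learn as a stand-in for the author's R experiment (the sample size and tree depth are my own choices):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 400
X = rng.uniform(-1.0, 1.0, size=(n, 2))
# Non-overlapped XOR labels over the XY-plane
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)

# A shallow tree suffices: axis-aligned splits match the XOR quadrants
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
acc = tree.score(X, y)
```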

Added by Takashi J. OZAKI on March 22, 2015 at 10:00pm — No Comments

Learn how each ML classifier works: decision boundary vs. assumed true boundary

In the latest post on my own blog, I discussed how to learn visually how each machine learning classifier works. My idea is this: first, prepare training samples and show their assumed true boundary; then, have the classifier predict over a dense grid covering the space as a test dataset to estimate its decision boundary; finally, compare the estimated boundary with the assumed one.

In the case below, the assumed true boundary of the space is a set of three parallel lines; I think everybody would guess so…
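The dense-grid procedure can be sketched like this (a Python/scikit-learn stand-in; the band layout producing three parallel boundary lines is my own choice, not necessarily the post's dataset):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Training samples whose assumed true boundary is three parallel lines
# (horizontal lines at y = -1.5, 0, 1.5, giving alternating bands)
X = rng.uniform(-3.0, 3.0, size=(300, 2))
y = np.digitize(X[:, 1], [-1.5, 0.0, 1.5]) % 2

clf = SVC(kernel="rbf", gamma=2.0).fit(X, y)

# Dense grid over the space as the test dataset; the predicted labels
# trace the estimated decision boundary for comparison with the true one
xx, yy = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
```

Plotting `zz` as a filled contour over the scatter of training points gives exactly the kind of boundary-vs-boundary picture described above.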

Added by Takashi J. OZAKI on March 13, 2015 at 3:00am — No Comments

Deep Belief Net with {h2o} on MNIST and its Kaggle competition

To evaluate how the Deep Belief Net (Deep Learning) in {h2o} works on actual datasets, I applied it to the MNIST dataset; since I got the data from a Kaggle competition on MNIST, I consequently joined the competition. :P…

Added by Takashi J. OZAKI on February 24, 2015 at 10:30pm — No Comments

Overheating of the "Artificial Intelligence" boom in Japan, while "Data Scientist" fades out

Currently I'm concerned about the incredible overheating of the "Artificial Intelligence" boom in Japan - while "Data Scientist" has all but disappeared.

Google Trends shows that Japanese people are becoming less attracted to statistics, which is believed to be the core expertise of the Data Scientist, and are now enthusiastic about Artificial Intelligence. This situation looks quite puzzling to me.

So, what's going on in 2015? Yes, I think quite a few data science experts in Japan would agree that "Artificial…

Added by Takashi J. OZAKI on February 18, 2015 at 8:00am — No Comments

Experiments of Deep Learning with {h2o} package on R

Below is the latest post on my blog (and the first in the past 10 months...).

What kind of decision boundaries does Deep Learning (Deep Belief Net) draw? Practice with R and {h2o} package

I once wrote a post about the relationship between the characteristics of machine learning classifiers and their decision boundaries on the same dataset. The result was very interesting, and many people seemed to enjoy it and even debated it.

Actually I've been looking for similar…

Added by Takashi J. OZAKI on February 15, 2015 at 3:30am — No Comments

Answers to "10 questions about big data and data science" from Japan

10 questions about big data and data science

http://www.datasciencecentral.com/profiles/blogs/participate-in-our-big-data-survey-interview-questions

raised by Dr. Vincent Granville are, I feel, very interesting.

I have to say I'm by no means a leader in big data or data science in Japan -- but I'm afraid nobody else will answer. So, as a personal opinion, I…

Added by Takashi J. OZAKI on February 17, 2014 at 7:10am — No Comments

Simple analytics works fast, but cannot avoid third-party effects: why don't you try multivariate statistics?

I've seen a dichotomy between "analytics" and "data science" (or "statistics") in several web marketing teams, because people may feel that analytics is simple, fast, and works well, while statistics is hard to learn, complicated, and time-consuming.

In the latest post, I discussed this dichotomy from the viewpoint of analytic accuracy and pointed out a pitfall of simple and fast analytics.…

Added by Takashi J. OZAKI on February 6, 2014 at 4:00am — No Comments

Pitfall of "regression to the mean" in growth hacking

These days people seem to believe that "growth hacking" must be immediate and rapid. That is partly true; but I'm afraid it is partly not, because iterative, short-term, intensive growth hacking can fall into the pitfall of "regression to the mean".

http://tjo-en.hatenablog.com/entry/2014/01/22/183243
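A small simulation makes the pitfall concrete (entirely synthetic; the skill/noise decomposition and the top-10% selection rule are my assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10000
skill = rng.normal(0.0, 1.0, n)          # stable component of some metric
week1 = skill + rng.normal(0.0, 1.0, n)  # observed metric, period 1
week2 = skill + rng.normal(0.0, 1.0, n)  # same users, period 2

# Select the "top" segment by period-1 performance, as a naive
# short-term growth hack might, then re-measure them in period 2
top = week1 > np.quantile(week1, 0.9)
drop = week1[top].mean() - week2[top].mean()
# drop > 0: the selected group regresses toward the mean even though
# nothing about the users themselves has changed
```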

This post is a translation from the Japanese version of my blog, but I believe this point of view…

Added by Takashi J. OZAKI on January 22, 2014 at 6:00am — No Comments

A report from Japan: Puzzling situation of "Data Scientist" in Japanese market

One of my motivations for writing the English version of my blog is to report and share the latest topics in data science and related fields in the Japanese market. In the latest post, I wrote about the puzzling situation of the "Data Scientist" in the Japanese market.

http://tjo-en.hatenablog.com/entry/2014/01/13/141432

Although Google Trends shows that the "data scientist" fever in English is still rising, the same one…

Added by Takashi J. OZAKI on January 12, 2014 at 10:20pm — 2 Comments

Comparing machine learning classifiers based on their hyperplanes, for "package-users"

In the English version of my personal blog, I posted an article about how we should understand the nature of each machine learning classifier; my solution is to "just look at the hyperplane of each".

 

http://tjo-en.hatenablog.com/entry/2014/01/06/234155

 

For package users, who are not serious experts in machine learning and its scientific basis, visualized features (i.e., hyperplanes in 2D space) would be much…
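For a linear classifier, "just looking at the hyperplane" can be as simple as reading off the fitted weights; a sketch with synthetic data (the generating rule and parameters are my assumptions, Python/scikit-learn standing in for the author's R code):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Linearly separable labels generated from a known hyperplane
y = (X @ np.array([1.0, -2.0]) + 0.5 > 0).astype(int)

clf = LinearSVC().fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]  # learned hyperplane: w.x + b = 0
# w should point roughly along (1, -2), recovering the generating rule
```

Plotting the line w.x + b = 0 over the scatter of training points is exactly the kind of 2D visualization the post advocates.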

Added by Takashi J. OZAKI on January 6, 2014 at 7:59am — No Comments

© 2017   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC