A Data Science Central Community

Many topics float around in the data analysis space, such as statistical modeling and predictive modeling. A recurring question is which technique to choose and which approach to data analysis is preferred. Some articles and lectures favor machine learning or mathematical modeling over statistical modeling, citing the latter's limitations, and present mathematical modeling as the next step in accuracy and prediction. Articles of this kind raise more questions than they answer for a novice.

Finally, I would like to thank coursera.org for clearing up this confusion and giving a clear picture of what drives data analysis. It is now much clearer how to proceed with a data analysis, or rather, how to define the “DATA ANALYSIS DRIVERS”. In one line, the answer is simple: “**Define a question or problem**”. Everything depends on how you define the problem.

To start with the data analysis drivers, here are the steps in a data analysis:

- Define the question
- Define the ideal data set
- Determine what data you can access
- Obtain the data
- Clean the data
- Exploratory data analysis
- Statistical prediction/modeling
- Interpret results
- Challenge results
- Synthesize/write up results
- Create reproducible code

- Defining the question means understanding how the business problem is stated and how you tell the story of that problem. Telling the story of the problem leads you to the structure of the solution, so you should be good at storytelling around the problem statement.
- Defining the solution helps you prepare the data (data set) that the solution needs.
- Profile the source to identify what data you can access.
- The next step is cleansing the data.
- Once the data is cleansed, it is in one of the following standard formats: txt, csv, xml/html, json, or a database.
- Based on what the solution needs, we start building the model. Specifically, the solution will require descriptive, inferential, or predictive analysis.
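As a rough illustration of the obtain-and-cleanse steps in the list above, here is a minimal Python sketch using only the standard library. The function names `load_records` and `clean`, the sample columns `id` and `age`, and the rule "drop rows with a missing required field" are assumptions made for this example, not something prescribed by the course.

```python
import csv
import io
import json


def load_records(text, fmt):
    """Parse data given as CSV or JSON text into a list of dicts (hypothetical helper)."""
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "json":
        return json.loads(text)
    raise ValueError(f"unsupported format: {fmt}")


def clean(records, required):
    """Strip whitespace from string values and drop rows missing a required field."""
    cleaned = []
    for row in records:
        row = {k: (v.strip() if isinstance(v, str) else v) for k, v in row.items()}
        if all(row.get(k) not in (None, "") for k in required):
            cleaned.append(row)
    return cleaned


# Illustrative data only: the second row has no age and is dropped.
csv_text = "id,age\n1, 34 \n2,\n3,51\n"
csv_rows = clean(load_records(csv_text, "csv"), required=["id", "age"])

json_text = '[{"id": "1", "age": "34"}, {"id": "2", "age": null}]'
json_rows = clean(load_records(json_text, "json"), required=["id", "age"])
```

Real cleansing rules depend on the question being asked; the point is only that the same cleaning step can sit behind several input formats.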

Hence, the data set and model may depend on your goal:

- Descriptive – a whole population.
- Exploratory – a random sample with many variables measured.
- Inferential – the right population, randomly sampled.
- Predictive – a training and test data set from the same population.
- Causal – data from a randomized study.
- Mechanistic – data about all components of the system.
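For the predictive case in the list above, one simple way to get a training and a test set from the same population is to shuffle the data once and split it. The sketch below assumes a hypothetical helper `train_test_split`, a 25% test fraction, and a fixed seed for reproducibility; none of these choices come from the original article.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    shuffled = list(rows)          # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Illustrative population of 20 observations.
population = list(range(20))
train, test = train_test_split(population)
```

Because both parts are drawn from one shuffled population, every observation lands in exactly one of the two sets, which is what the predictive goal asks for.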

From here, knowledge of statistics, machine learning, and mathematical algorithms comes into play.

Re-post from my original article on WordPress: http://datumengineering.wordpress.com/2013/02/11/data-analysis-driver/
