A Data Science Central Community

Practicing data science is a long-term effort rather than a matter of learning a handful of skills. We ought to be academically strong enough to take up this challenge. However, if you have come a long way from your academic days but still have the zeal and passion to extract the oil from data and fill the data science skill gap, here are some **warm-up** tips. The points below should be **exercised** before jumping into any data science or data mining problem:

Not all datasets are in the form of a data matrix. For instance, more complex datasets can be in the form of sequences, text, time series, images, audio, video, and so on, which may need special techniques for analysis. However, in many cases, even if the raw data is not a data matrix, it can usually be transformed into that form via feature extraction. A practical example of feature extraction is explained in my last post on the scikit-learn library.
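To make the idea of feature extraction concrete, here is a minimal sketch that turns raw text into a data matrix of word counts. It is not the scikit-learn example from the earlier post; the function name `bag_of_words` and the two sample sentences are made up for illustration, and the tokenizer is deliberately naive (lowercase, split on whitespace).

```python
from collections import Counter


def bag_of_words(documents):
    """Turn raw text documents into an n x d data matrix of word counts.

    Hypothetical helper for illustration only: tokenization is a plain
    lowercase whitespace split, with no stemming or stop-word removal.
    """
    tokenized = [doc.lower().split() for doc in documents]
    # The sorted vocabulary fixes the order of the d attributes (columns).
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One row per document, one column per vocabulary word.
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


# Made-up sample documents.
docs = ["data mining finds patterns", "data science uses data"]
vocabulary, matrix = bag_of_words(docs)

print(vocabulary)
for row in matrix:
    print(row)
```

Each document becomes one row and each distinct word becomes one attribute, so two sentences with six distinct words give a 2 x 6 data matrix.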

- The number of attributes defines the dimensionality of the data matrix. Keep the dimensionality in mind when you think about any matrix operation.
- Each row may be considered as a d-dimensional column vector (all vectors are assumed to be column vectors by default). You must also understand the terms row space and column space.
- Treating data instances and attributes as vectors, and the entire dataset as a matrix, enables one to apply both geometric and algebraic methods to aid in data mining and analysis tasks. At the very least, you must be aware of concepts such as the unit vector and the identity matrix.
- Dust off what you learned in school about matrix manipulation, i.e. matrix addition, multiplication, transpose, inverse, etc. The same applies to some algebraic formulas, such as the distance between two points and the *Pythagorean theorem* (or *Pythagoras' theorem*).
- A thorough understanding of matrix manipulation will help you implement multiplication and summation of elements.
- Skipping probability is probably not a good idea. Work through some short probability problems and exercises before you go into the details of any supervised learning model.
- You may need to practice the topics that you might have skipped in school, such as *orthogonal projection of a vector* (projecting one vector onto another), the *probabilistic view of the data*, and the *probability density function*. (I admit I avoided these topics during my graduation :) )
- Get comfortable with all the formulas of descriptive statistical analysis: from mean, median, and mode to the normal distribution and skewness, and, most importantly, don't forget to cover variance and standard deviation. You should be ready for basic statistical analysis of univariate and multivariate numeric data. Believe me, the choice of distance measure changes with the distribution of the data (for example, using a Euclidean distance score when the data is normally distributed, and a Pearson coefficient score otherwise).
- Generalization, correlation, and regression concepts are widely used across statistics and mathematical modeling, so they must be broadly rehearsed before you go into modeling techniques.
- You must do some exercises on how to normalize a vector. Vector normalization is a must-know concept in prediction algorithms.
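Several of the points above fit into a few lines of code. The following is a minimal warm-up sketch in plain Python, assuming vectors are lists of numbers and matrices are lists of rows. All function names are my own choices for illustration, and in practice a library such as NumPy would do this work.

```python
import math


def dot(u, v):
    """Dot product: multiply matching elements and sum them."""
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    """Euclidean length of a vector (Pythagorean theorem in d dimensions)."""
    return math.sqrt(dot(v, v))


def normalize(v):
    """Scale a vector to unit length; the zero vector has no direction."""
    length = norm(v)
    if length == 0:
        raise ValueError("cannot normalize the zero vector")
    return [a / length for a in v]


def project(b, a):
    """Orthogonal projection of vector b onto vector a."""
    scale = dot(a, b) / dot(a, a)
    return [scale * x for x in a]


def euclidean_distance(u, v):
    """Distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pearson(u, v):
    """Pearson correlation: cosine of the angle between centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    return dot(du, dv) / (norm(du) * norm(dv))


def transpose(m):
    """Swap the rows and columns of a matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Matrix product: each entry is a row of a dotted with a column of b."""
    columns = transpose(b)
    return [[dot(row, col) for col in columns] for row in a]


# Made-up values for practice.
unit = normalize([3.0, 4.0])
shadow = project([2.0, 3.0], [1.0, 0.0])
distance = euclidean_distance([0.0, 0.0], [3.0, 4.0])
correlation = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
identity = [[1, 0], [0, 1]]
product = matmul([[1, 2], [3, 4]], identity)

print(unit, shadow, distance, correlation, product)
```

Note how `pearson` reuses `dot` and `norm` on centered vectors: the algebraic and geometric views of the same data meet in one formula. Multiplying by the identity matrix returns the original matrix, which is a quick sanity check for `matmul`.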

"In fact, data mining is part of a larger knowledge discovery process, which includes pre-processing tasks like data extraction, data cleaning, data fusion, data reduction and feature construction, as well as post-processing steps like pattern and model interpretation, hypothesis confirmation and generation, and so on. This knowledge discovery and data mining process tends to be highly iterative and interactive."

**CRUX**: The algebraic, geometric, and probabilistic viewpoints of data play a key role in data mining. You should exercise them beforehand, so that you can sail your boat through data science with ease!
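As a small exercise for the probabilistic viewpoint, here is a sketch that computes basic descriptive statistics with Python's standard `statistics` module and evaluates the normal probability density function. The sample values are made up, and `normal_pdf` is a helper written for this illustration from the textbook formula, not a library function.

```python
import math
import statistics


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at point x."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Made-up univariate sample.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)
variance = statistics.pvariance(sample)  # population variance
sigma = statistics.pstdev(sample)        # population standard deviation

print(mu, median, mode, variance, sigma)

# The density is highest at the mean and falls off symmetrically.
print(normal_pdf(mu, mu, sigma))
print(normal_pdf(mu - sigma, mu, sigma), normal_pdf(mu + sigma, mu, sigma))

# A density must integrate to 1; check with a simple Riemann sum
# over six standard deviations on each side of the mean.
step = 0.01
grid_points = int(12 * sigma / step)
area = sum(
    normal_pdf(mu - 6 * sigma + i * step, mu, sigma) * step
    for i in range(grid_points)
)
print(area)
```

The `pvariance` and `pstdev` functions treat the list as the whole population; use `statistics.variance` and `statistics.stdev` instead when the list is a sample from a larger population.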

Original post: http://datumengineering.wordpress.com/2013/10/18/warm-up-exercise-b...

© 2019 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC
