A Data Science Central Community

All of us, at some point while examining data, check for correlations among the variables in the data, especially pair-wise correlations.

Among many business analysts in industry, there is a notion that the **‘linear correlation coefficient’** is the only criterion for pair-wise correlation, so at most a **Proc Corr** is run in SAS to check for it. This is of course useful for finding correlations **between continuous variables**. However, more often than not it **fails when confronted with real-life data**, which frequently contains **all kinds of variables: continuous, binary, or multi-level categorical**.

- When you are trying to find **correlation between continuous variables**, **Proc Corr** is a good choice, because it simply gives you linear correlation coefficients.
- When you are looking at correlation **between a binary variable and a continuous variable**, your idea of correlation needs a change in perspective. A simple linear correlation coefficient is rendered meaningless here, because one is no longer dealing with meaningful numbers but with categories. In many datasets these categories have been assigned numbers, but don’t confuse them with real numeric variables; the numbers are just labels. They could just as well have been given other numbers, which would change the value of the linear correlation coefficient if one used it to assess correlation in this case. How do you work around this problem?
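
To make the relabeling point concrete, here is a minimal Python sketch (not SAS, and not the solution from the full article). The data, the variable names `spend`, `segment` and `flag`, and the numeric codings are all hypothetical, chosen only for illustration. It computes the linear (Pearson) correlation coefficient by hand and shows that the result depends on which numbers happen to be assigned to the categories: for a two-level variable, swapping the codes flips the sign, and for a three-level variable, reordering the codes changes the magnitude as well.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def encode(labels, mapping):
    """Replace each category label with the number assigned to it."""
    return [mapping[label] for label in labels]


# Hypothetical data: one continuous variable and two categorical ones.
spend = [12.0, 15.5, 11.0, 30.0, 28.5, 33.0, 20.0, 22.5, 19.0]
segment = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]  # three levels
flag = ["no", "no", "no", "yes", "yes", "yes", "no", "no", "no"]  # two levels

# Two-level variable: swapping the arbitrary codes flips the sign.
r_flag_01 = pearson(spend, encode(flag, {"no": 0, "yes": 1}))
r_flag_10 = pearson(spend, encode(flag, {"no": 1, "yes": 0}))

# Three-level variable: reordering the arbitrary codes changes the magnitude.
r_seg_123 = pearson(spend, encode(segment, {"A": 1, "B": 2, "C": 3}))
r_seg_132 = pearson(spend, encode(segment, {"A": 1, "B": 3, "C": 2}))

print("flag coded no=0, yes=1:", round(r_flag_01, 3))
print("flag coded no=1, yes=0:", round(r_flag_10, 3))
print("segment coded A=1, B=2, C=3:", round(r_seg_123, 3))
print("segment coded A=1, B=3, C=2:", round(r_seg_132, 3))
```

The same data give different coefficients under different labelings, which is why the number assigned to a category should not be read as a measurement.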

Read the full article and get the solution here: http://www.edvancer.in/sas-tutorial-proc-corr/

Let us know your thoughts.

Edvancer is an online analytics training institute and offers courses across the field of analytics. Learn more at www.edvancer.in

