A Data Science Central Community

In my previous blog I have explained about linear regression. In today’s post I will explain about logistic regression.

Consider a scenario where we need to predict a medical condition of a patient (HBP) ,HAVE HIGH BP or NO HIGH BP, based on some observed symptoms – Age, weight, Issmoking, Systolic value, Diastolic value, RACE, etc.. In this scenario we have to build a model which takes the above mentioned symptoms as input values and HBP as response variable. Note that the response variable (HBP) is a value among a fixed set of classes, HAVE HIGH BP or NO HIGH BP.

### Logistic regression – a classification problem, not a prediction problem:

In my previous blog I told that we use linear regression for scenarios which involves prediction. But there is a check; the regression analysis cannot be applied in scenarios where the response variable is not continuous. In our case the response variable is not a continuous variable but a value among a fixed set of classes. We call such scenarios as Classification problem rather than prediction problem. In such scenarios where the response variables are more of qualitative nature rather than continuous nature, we have to apply more suitable models namely logistic regression for classification.

Consider a scenario where we need to predict a medical condition of a patient (HBP) ,HAVE HIGH BP or NO HIGH BP, based on some observed symptoms – Age, weight, Issmoking, Systolic value, Diastolic value, RACE, etc.. In this scenario we have to build a model which takes the above mentioned symptoms as input values and HBP as response variable. Note that the response variable (HBP) is a value among a fixed set of classes, HAVE HIGH BP or NO HIGH BP.

Assume we have a binary category output variable Y and a vector of p input variables X. Rather than modeling this response Y directly, logistic regression models the conditional probability, **Pr(Y = 1|X = x)** as a function of x, that Y belongs to a particular category.

Mathematically, logistic regression is expressed as:

The unknown parameters, β0/ β1, in the function are estimated by maximum likelihood method using available input training data. The Maximum likelihood function expresses the probability of the observed data as a function of the unknown parameters. The maximum likelihood estimators of these parameters are chosen to be those values that maximize this function. Thus, the estimators are those which agree most closely with the observed data.

For now we assume that solving the above equation can be used to estimate the unknown parameters.

In R, we glm() which takes training data as input and gives us the fitted model with estimated parameters as output, which we will see in the later section.

Once the coefficients have been estimated, it is a simple matter to compute the probability of response variable for any given input values by putting values of β0/ β1/X in the below equation.

*Note: we have predict() in R which takes fitted model, input parameters as input values to predict the response variables.*

For code implementation see here

© 2020 TechTarget, Inc. Powered by

Badges | Report an Issue | Privacy Policy | Terms of Service

**Most Popular Content on DSC**

To not miss this type of content in the future, subscribe to our newsletter.

- Book: Applied Stochastic Processes
- Long-range Correlations in Time Series: Modeling, Testing, Case Study
- How to Automatically Determine the Number of Clusters in your Data
- New Machine Learning Cheat Sheet | Old one
- Confidence Intervals Without Pain - With Resampling
- Advanced Machine Learning with Basic Excel
- New Perspectives on Statistical Distributions and Deep Learning
- Fascinating New Results in the Theory of Randomness
- Fast Combinatorial Feature Selection

**Other popular resources**

- Comprehensive Repository of Data Science and ML Resources
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- 100 Data Science Interview Questions and Answers
- Cheat Sheets | Curated Articles | Search | Jobs | Courses
- Post a Blog | Forum Questions | Books | Salaries | News

**Archives:** 2008-2014 |
2015-2016 |
2017-2019 |
Book 1 |
Book 2 |
More

**Most popular articles**

- Free Book and Resources for DSC Members
- New Perspectives on Statistical Distributions and Deep Learning
- Time series, Growth Modeling and Data Science Wizardy
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- Comprehensive Repository of Data Science and ML Resources
- Advanced Machine Learning with Basic Excel
- Difference between ML, Data Science, AI, Deep Learning, and Statistics
- Selected Business Analytics, Data Science and ML articles
- How to Automatically Determine the Number of Clusters in your Data
- Fascinating New Results in the Theory of Randomness
- Hire a Data Scientist | Search DSC | Find a Job
- Post a Blog | Forum Questions

## You need to be a member of AnalyticBridge to add comments!

Join AnalyticBridge