A Data Science Central Community
Assume we have a binary categorical output variable Y and a vector X of p input variables. Rather than modeling the response Y directly, logistic regression models the conditional probability Pr(Y = 1 | X = x), that is, the probability that Y belongs to a particular category, as a function of x.
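Mathematically, for a single input variable x, logistic regression is expressed as:

$$\Pr(Y = 1 \mid X = x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$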
The unknown parameters β0 and β1 in this function are estimated by the method of maximum likelihood, using the available training data. The likelihood function expresses the probability of the observed data as a function of the unknown parameters. The maximum likelihood estimators are the parameter values that maximize this function, so they are the values that agree most closely with the observed data.
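For reference, a common way to write this likelihood for n training observations is sketched below. The notation is an assumption introduced here for illustration: (x_i, y_i) is the i-th training observation with y_i equal to 0 or 1, and p(x_i) stands for Pr(Y = 1 | X = x_i) under the logistic model with a single input variable.

```latex
\ell(\beta_0, \beta_1) = \prod_{i=1}^{n} p(x_i)^{y_i} \, \bigl(1 - p(x_i)\bigr)^{1 - y_i},
\qquad
p(x_i) = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}
```

Each observation with y_i = 1 contributes p(x_i) to the product and each observation with y_i = 0 contributes 1 - p(x_i), so the product is large when the model assigns high probability to the outcomes that were actually observed.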
For now, we simply assume that maximizing the likelihood function yields estimates of the unknown parameters.
In R, we use glm(), which takes the training data as input and returns the fitted model with the estimated parameters, as we will see in a later section.
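As a minimal sketch of such a fit, the example below simulates a small training set and passes it to glm(). The data, the variable names (x, y, train, fit) and the "true" coefficients are all assumptions made up for illustration; only glm(), coef() and the binomial family come from base R.

```r
# Simulated training data. The "true" coefficients are illustrative assumptions.
set.seed(42)
n <- 500
x <- rnorm(n)
true_beta0 <- -0.5
true_beta1 <- 1.2

# plogis() is base R's logistic function: exp(z) / (1 + exp(z))
p <- plogis(true_beta0 + true_beta1 * x)
y <- rbinom(n, size = 1, prob = p)
train <- data.frame(x = x, y = y)

# family = binomial selects logistic regression (logit link by default);
# glm() estimates the coefficients by maximum likelihood
fit <- glm(y ~ x, data = train, family = binomial)

# Estimated beta0 is labelled "(Intercept)" and estimated beta1 is labelled "x"
print(coef(fit))
```

Because the data are simulated from known coefficients, the printed estimates can be compared with true_beta0 and true_beta1 as a sanity check.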
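Once the coefficients have been estimated, it is a simple matter to compute the probability of the response for any given input value by substituting the estimates of β0 and β1 and the value of x into the equation below:

$$\hat{p}(x) = \frac{e^{\hat{\beta}_0 + \hat{\beta}_1 x}}{1 + e^{\hat{\beta}_0 + \hat{\beta}_1 x}}$$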
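The substitution can be sketched in a few lines of R. The coefficient values and the input value below are hypothetical, chosen only to make the arithmetic easy to follow; they do not come from any fitted model.

```r
# Hypothetical estimates and input value, for illustration only
beta0_hat <- -1.5
beta1_hat <- 0.8
x_new <- 2

# Linear predictor: beta0 + beta1 * x
eta <- beta0_hat + beta1_hat * x_new

# Logistic function written out explicitly
p_hat <- exp(eta) / (1 + exp(eta))

# Base R's plogis() computes the same logistic function
p_hat_builtin <- plogis(eta)

print(p_hat)
print(p_hat_builtin)
```

Writing the formula out once and comparing it with plogis() is a quick way to confirm that both routes give the same probability.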
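Note: R provides predict(), which takes a fitted model and new input values and returns the predicted response. For a glm() fit, pass type = "response" to obtain probabilities; by default predict() returns values on the scale of the linear predictor (the log-odds).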
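A short sketch of predict() on a logistic regression fit follows. The simulated data and the names train, fit and new_data are assumptions for illustration. The example also recomputes the probabilities by hand from the estimated coefficients, to show that predict() applies the same logistic formula.

```r
# Simulated training data (illustrative assumption, not real data)
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(0.3 + 1.0 * x))
train <- data.frame(x = x, y = y)

fit <- glm(y ~ x, data = train, family = binomial)

# New input values; the column name must match the predictor used in the model
new_data <- data.frame(x = c(-1, 0, 1))

# type = "response" returns probabilities rather than log-odds
p_pred <- predict(fit, newdata = new_data, type = "response")

# The same calculation by hand, from the estimated coefficients
b <- coef(fit)
p_manual <- plogis(b[["(Intercept)"]] + b[["x"]] * new_data$x)

print(p_pred)
print(all.equal(unname(p_pred), p_manual))
```

To turn the predicted probabilities into class labels, a threshold such as 0.5 is often applied, although the appropriate cutoff depends on the problem.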
For code implementation see here