A Data Science Central Community
A prediction problem is a business problem that involves predicting future events by extracting patterns from historical data. Prediction problems are solved using statistical techniques, mathematical models, or machine learning techniques.
For example: forecasting a stock's price for the next week, predicting which football team will win the World Cup, etc.
For any prediction problem, one of the simplest and most widely used, yet powerful, techniques is linear regression. Regression analysis models the relationship between a response variable and one or more input variables.
In simpler terms, regression analysis helps us find answers to:
If the model has one input variable, called the independent or predictor variable, and one output variable, called the dependent or response variable, it is called simple linear regression. Simple linear regression assumes that there is a linear relationship between the predictor and the response of the form

$$Y \approx \beta_0 + \beta_1 X$$
In the above equation, $\beta_0$ and $\beta_1$ are unknown constants that represent the intercept and slope of a straight line, as we learned in high school. These unknown constants are known as the model coefficients or parameters. X is the known input variable, so if we can estimate $\beta_0$ and $\beta_1$ by some method, Y can be predicted. To predict future outcomes, we use the training data to estimate the unknown model parameters, giving the estimates $\hat{\beta}_0$ and $\hat{\beta}_1$, and then predict using the equation
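$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x,$$

where $\hat{\beta}_0$ and $\hat{\beta}_1$ are the estimated coefficients and $\hat{y}$ is the predicted response. For an observed pair $(x, y)$, the difference $\hat{e} = y - \hat{y}$ is the residual.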
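The text leaves the estimation method open ("by some method"). The most common choice, and the one assumed in this sketch, is ordinary least squares, which picks the estimates that minimize the sum of squared residuals and has a closed form: $\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. The Python below is only an illustration, not the post's R code; `fit_simple_ols` and `predict` are hypothetical helper names and the toy data is made up.

```python
from typing import Sequence, Tuple


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (beta0_hat, beta1_hat) minimizing the sum of squared residuals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x has zero variance; the slope is undefined")
    beta1_hat = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means (x_bar, y_bar).
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


def predict(beta0_hat: float, beta1_hat: float, x_new: float) -> float:
    """Evaluate the fitted line y_hat = beta0_hat + beta1_hat * x_new."""
    return beta0_hat + beta1_hat * x_new


# Made-up toy training data lying roughly on y = 1 + 2x.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [1.1, 2.9, 5.2, 6.8, 9.0]
b0, b1 = fit_simple_ols(x_train, y_train)
# Residuals e_hat = y - y_hat for each training point.
residuals = [yi - predict(b0, b1, xi) for xi, yi in zip(x_train, y_train)]
print(f"beta0_hat = {b0:.2f}, beta1_hat = {b1:.2f}")  # → beta0_hat = 1.06, beta1_hat = 1.97
```

Because the model includes an intercept, the least-squares residuals sum to zero, which makes a quick sanity check on any implementation.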
If the problem has more than one input variable and a single response variable, it is called multiple linear regression, and the model takes the form $Y \approx \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$.
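With several inputs, ordinary least squares (again assumed here, since the text does not name a method) leads to the normal equations $(X^\top X)\hat{\beta} = X^\top y$, where $X$ is the design matrix with a leading column of ones for the intercept. The sketch below builds and solves that system in plain Python so it runs without extra libraries; `solve_linear_system` and `fit_multiple_ols` are hypothetical names, not part of any library.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the caller's data is untouched.
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_multiple_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [beta0_hat, beta1_hat, ..., betap_hat] via the normal equations."""
    # Prepend a column of ones so beta0_hat acts as the intercept.
    X = [[1.0, *map(float, row)] for row in rows]
    p = len(X[0])
    xtx = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)] for j in range(p)]
    xty = [sum(xi[j] * yi for xi, yi in zip(X, y)) for j in range(p)]
    return solve_linear_system(xtx, xty)
```

Forming $X^\top X$ explicitly squares the condition number of the problem, so for real data a QR-based solver is the safer choice; R's `lm()`, for instance, fits the model through a QR decomposition.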
Let us apply regression analysis to the power plant dataset available from here. The dataset contains 9568 data points collected from a Combined Cycle Power Plant over 6 years (2006-2011), while the plant was operating at full load. The features are hourly averages of the ambient variables Temperature (T), Ambient Pressure (AP), Relative Humidity (RH), and Exhaust Vacuum (V), which are used to predict the net hourly electrical energy output (EP) of the plant.
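A rough sketch of that workflow in Python, assuming the data has been saved as a CSV file whose header row uses the abbreviations from the text (T, AP, RH, V, EP). The copy you download may label its columns differently, so `FEATURES` and `TARGET` are assumptions to adjust. `load_xy` and `fit_ols` are hypothetical helpers, and the fit is plain ordinary least squares via the normal equations, not necessarily what the post's R code does.

```python
import csv
from typing import Iterable, List, Sequence, Tuple

# Assumed header names, taken from the abbreviations in the text; the file you
# download may label the columns differently, so adjust these as needed.
FEATURES: Tuple[str, ...] = ("T", "AP", "RH", "V")
TARGET = "EP"


def load_xy(
    lines: Iterable[str],
    features: Sequence[str] = FEATURES,
    target: str = TARGET,
) -> Tuple[List[List[float]], List[float]]:
    """Read CSV lines (with a header row) into feature rows and a response list."""
    rows: List[List[float]] = []
    y: List[float] = []
    for record in csv.DictReader(lines):
        rows.append([float(record[name]) for name in features])
        y.append(float(record[target]))
    return rows, y


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, coef_1, ..., coef_p] by solving the normal equations."""
    X = [[1.0, *row] for row in rows]  # leading 1.0 gives the intercept term
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    m = [
        [sum(xi[j] * xi[k] for xi in X) for k in range(p)]
        + [sum(xi[j] * yi for xi, yi in zip(X, y))]
        for j in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-12:
            raise ValueError("X^T X is singular; check for collinear features")
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(p):
            if r != c:
                factor = m[r][c] / m[c][c]
                m[r] = [a - factor * b for a, b in zip(m[r], m[c])]
    # The coefficient block is now diagonal, so each unknown is a single division.
    return [m[j][p] / m[j][j] for j in range(p)]
```

With the real file, usage would look like `with open(path, newline="") as f: rows, y = load_xy(f)` followed by `beta = fit_ols(rows, y)`, where `path` is wherever you saved the download. The result lists the intercept first, then one coefficient per feature in the order given by `FEATURES`.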
Please visit here for the full blog post with R code.