AnalyticBridge

A Data Science Central Community

How to normalise data

Dear all,
Good evening. I have a data set with four independent variables that are skewed. I tried to normalise them with a log transformation, but two of them could not be normalised. When I fit a regression model, I get an insignificant p-value for one of the variables, which implies that it is not contributing much to the model. However, I can't drop it, because logically that variable is as important as the others. Is there any way to normalise the data other than removing the outliers? I have also tried taking the square root and the inverse of the variables. I'm hoping to find a good solution to my problem.

Thanks and regards,
Sagar.

Replies to This Discussion

We can normalize the data simply by subtracting the mean from each value and dividing the result by the standard deviation.

(xi - mean(x1, x2, ..., xn)) / sd(x1, x2, ..., xn)
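
A minimal Python sketch of this formula, using only the standard library. The sample values and the function name `standardize` are made up for illustration, and the sample standard deviation (n - 1 denominator) is an assumption, since the post does not say which one is meant.

```python
import statistics

def standardize(values):
    """Return z-scores: (x - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator (assumption)
    return [(x - mean) / sd for x in values]

# Made-up, right-skewed sample used only for illustration.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 20.0]
z = standardize(data)
print([round(v, 3) for v in z])
```

The result has mean 0 and standard deviation 1, but the shape of the distribution is unchanged.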

I hope it will be useful to you.

Vivek

Hi Vivek,

Good evening. First, thank you for your reply. However, standardisation of the data will not reduce skewness. Let me give you a brief summary of my efforts: I have tried the natural log, log to base 10, square root, inverse, and the Box-Cox transformation, but I could not normalise one of the variables.

If you have any other method or suggestions, I would welcome them.
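
The point about standardisation can be checked with a short Python sketch. It is only an illustration: the data are made up, and `skewness` here is a hand-written population skewness (mean of cubed z-scores), not the exact statistic Excel reports.

```python
import math
import statistics

def skewness(values):
    """Population skewness: mean of cubed z-scores (n denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sum(((x - mean) / sd) ** 3 for x in values) / n

# Made-up, strictly positive, right-skewed sample for illustration.
data = [1.2, 1.5, 1.9, 2.3, 2.8, 3.6, 5.1, 8.4, 15.0, 42.0]
mean, sd = statistics.mean(data), statistics.stdev(data)

candidates = {
    "raw": data,
    "standardised": [(x - mean) / sd for x in data],
    "natural log": [math.log(x) for x in data],
    "square root": [math.sqrt(x) for x in data],
    "inverse": [1.0 / x for x in data],
}

for name, vals in candidates.items():
    print(f"{name:>13}: skewness = {skewness(vals):+.3f}")
```

Standardising leaves the skewness exactly as it was, because it only shifts and rescales the values. The other transformations change the shape, and comparing their skewness side by side shows which one comes closest to symmetry for a given variable.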

Have a pleasant evening.

Hi Sagar,

I understand your concern. Could you please share some sample data that needs normalising? I'll also try from my end and share the approach with you.

regards,

Vivek

Dear Vivek,

Good afternoon. I hope you are doing well. Thank you for your reply, but I'm sorry that I can't share the data with anyone due to confidentiality. If I could do that, it would have solved my problem.

Once again thank you for your support.

Warm regards,
Sagar.

Hi Sagar,

If I am correct, your main objective is to transform non-normal data to normal. You can use the Johnson transformation.

Shahid

Thank you for your reply. I'm not familiar with the Johnson transformation, but I'll try it. Once again, thank you so much.

Hi Sagar,

I'd like to help with your question but want to get some background information first.  I assume you are trying to satisfy the requirement of normality of errors in your regression model?

How are you determining that your independent variables are skewed?  What measure or method are you using?

Try plotting the residuals of the regression against each independent variable and see what the pattern or shape looks like. That can help you determine what transformation you need to use on your data. Normalizing your data won't hurt either; it will give you more robust coefficient estimates but will change the interpretation slightly.
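
A rough Python sketch of this residual check, under the assumption of a single predictor fitted by ordinary least squares. The helper `fit_line` and the data are hypothetical; the response is deliberately curved so that the residual pattern is easy to see.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up data: y grows quadratically, so a straight line fits poorly.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [xi ** 2 for xi in x]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Listing residuals in order of x stands in for a scatter plot.
for xi, r in zip(x, residuals):
    print(f"x = {xi}: residual = {r:+.2f}")
```

Residuals that are positive at both ends and negative in the middle, as here, suggest curvature that a transformation of the variable might remove. Residuals scattered around zero with no pattern suggest the linear form is adequate.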

Let us know how it works!

Kevin

Hi Kevin,

Good evening. Thank you for your reply. I'm using Excel, so I'm dependent on descriptive statistics. I could not do that analysis, since I lack statistical tools.

I'm trying to normalise the data in order to make a decision about the regression model, because checking the normality of errors in Excel is a rather difficult task.

Try the Box-Cox transformation: apply boxcox to find the transformation power that comes closest to making the relationship linear, if first differencing and log transformation did not work.
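
As a sketch of how such a power can be chosen, the Python below searches a grid of lambda values and keeps the one with the highest Box-Cox profile log-likelihood. This is a hand-written illustration rather than any particular package's `boxcox`; the data, the grid from -2 to 2, and the function names are assumptions. Box-Cox requires strictly positive values.

```python
import math

def boxcox(values, lam):
    """Box-Cox transform: log(x) when lam is 0, else (x**lam - 1) / lam."""
    if abs(lam) < 1e-12:
        return [math.log(x) for x in values]
    return [(x ** lam - 1.0) / lam for x in values]

def boxcox_loglik(values, lam):
    """Profile log-likelihood of lam under a normal model (constants dropped)."""
    n = len(values)
    y = boxcox(values, lam)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(x) for x in values)

# Made-up, strictly positive, right-skewed sample for illustration.
data = [1.2, 1.5, 1.9, 2.3, 2.8, 3.6, 5.1, 8.4, 15.0, 42.0]

grid = [i / 10 for i in range(-20, 21)]  # candidate powers -2.0 ... 2.0
best_lam = max(grid, key=lambda lam: boxcox_loglik(data, lam))
transformed = boxcox(data, best_lam)
print("best lambda on the grid:", best_lam)
```

A best lambda near 0 points to the log, near 0.5 to the square root, and near -1 to the inverse. If the best value sits at the edge of the grid, or the transformed data are still clearly skewed, Box-Cox is probably not enough for that variable.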

Hi Yo Mama,

Good evening. Thank you for your reply. I have already tried that, but it did not help, since it could not normalise the data.

Hello Sagar,

I would suggest bootstrap and/or jackknife methods (in general, resampling is useful for highly non-normal data).
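
A minimal sketch of the bootstrap idea in Python, assuming a single predictor and a percentile confidence interval for the regression slope. The data, the function names, and the choice of 2000 resamples are made up for illustration; resampling whole (x, y) pairs is one common variant, not the only one.

```python
import random

def slope(pairs):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx

def bootstrap_slope_ci(pairs, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the slope, resampling pairs."""
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < n_resamples:
        sample = rng.choices(pairs, k=len(pairs))  # draw with replacement
        if len({x for x, _ in sample}) < 2:
            continue  # slope is undefined when every x is identical
        estimates.append(slope(sample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Made-up data with uneven noise, for illustration only.
xs = list(range(1, 13))
ys = [2.1, 4.3, 5.8, 9.9, 10.2, 12.1, 18.5, 16.3, 18.0, 20.4, 27.9, 24.2]
pairs = list(zip(xs, ys))

lower, upper = bootstrap_slope_ci(pairs)
print(f"slope = {slope(pairs):.3f}, 95% interval = ({lower:.3f}, {upper:.3f})")
```

If the interval excludes zero, the variable appears to contribute to the model, and this conclusion does not rely on the errors being normally distributed.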

Hi Sagar

I would have taken the approach Mortal Kolle mentioned. My starting point would also have been Box-Cox, but as you mentioned, this was unsuccessful.

Regards

Daniel