Hi,

I am using SGD to solve logistic and linear regression problems in the big data tool Mahout. Although I get the model equation and decent model metrics (AUC, R-squared, etc.), I get no indication of a coefficient's statistical significance (the t-value for linear regression and the Wald chi-square value for logistic regression). In software like SAS and R, we get these without much fuss. How do we calculate these values?
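
A minimal sketch for the linear case, assuming the coefficients come from a converged, unregularized fit (which is what SGD approximates when it runs long enough with no penalty). The usual OLS t-value can then be recomputed outside Mahout with one extra pass over the data: t_j = beta_j / sqrt(sigma^2 * [(X'X)^-1]_jj), with sigma^2 = RSS / (n - p). The function names `invert` and `ols_t_values` are hypothetical illustrations, not part of Mahout or any library. Pure Python is used only to keep the example self-contained.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (collinear predictors?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot becomes 1, then clear the column elsewhere.
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right half of the augmented matrix is now the inverse.
    return [row[n:] for row in aug]


def ols_t_values(X, y, beta):
    """Hypothetical helper: t-statistic for each linear-regression coefficient.

    X    -- list of rows, each starting with 1.0 for the intercept
    y    -- list of responses
    beta -- coefficients from any fit (assumed converged and unregularized)
    """
    n, p = len(X), len(X[0])
    xtx = [[0.0] * p for _ in range(p)]
    rss = 0.0
    # Single pass over the data: accumulate X'X and the residual sum of squares.
    for row, target in zip(X, y):
        resid = target - sum(b * x for b, x in zip(beta, row))
        rss += resid * resid
        for i in range(p):
            for j in range(p):
                xtx[i][j] += row[i] * row[j]
    sigma2 = rss / (n - p)   # unbiased estimate of the error variance
    cov = invert(xtx)        # (X'X)^-1; sigma2 * cov is the coefficient covariance
    se = [math.sqrt(sigma2 * cov[j][j]) for j in range(p)]
    return [b / s for b, s in zip(beta, se)]


# Usage: five points with OLS coefficients intercept 1.4, slope 0.8.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
print(ols_t_values(X, y, [1.4, 0.8]))  # intercept t ≈ 1.65, slope t ≈ 2.31
```

Each t-value is then compared against a t distribution with n - p degrees of freedom (3 in the toy example). Because X'X is only p by p and is a plain sum over rows, the accumulation pass splits naturally across data partitions.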

Thanks,
Ratheen

Replies to This Discussion

I believe that Mahout is a bit weak on linear regression and does not use a logit model for logistic regression. I think it uses another optimized iterative learning algorithm rather than maximum likelihood.
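
If the fitted model is the ordinary logit model, the Wald chi-square reported by packages such as SAS can be reproduced from the coefficients alone. This is a sketch under the assumption that the coefficients are (close to) the unregularized maximum-likelihood estimate; if the iterative learner applied a penalty or stopped early, the numbers are only approximate. The information matrix X'WX, with W = diag(p_i * (1 - p_i)), needs no response values once beta is fixed, and the Wald statistic is W_j = beta_j^2 / [(X'WX)^-1]_jj. The p-value uses the fact that a chi-square variable with 1 degree of freedom is a squared standard normal, so P(chi-square_1 > W) = erfc(sqrt(W / 2)). All function names here are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (collinear predictors?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def logistic_wald_chi_square(X, beta):
    """Hypothetical helper: Wald chi-square for each logistic-regression coefficient.

    X    -- list of rows, each starting with 1.0 for the intercept
    beta -- coefficients assumed to be (close to) the unregularized MLE
    """
    p = len(X[0])
    info = [[0.0] * p for _ in range(p)]
    # Fisher information X'WX with W = diag(p_i * (1 - p_i)); no response needed.
    for row in X:
        prob = sigmoid(sum(b * x for b, x in zip(beta, row)))
        w = prob * (1.0 - prob)
        for i in range(p):
            for j in range(p):
                info[i][j] += w * row[i] * row[j]
    cov = invert(info)  # approximate covariance matrix of beta
    # Wald statistic: (beta_j / se_j)^2 with se_j^2 = cov[j][j].
    return [b * b / cov[j][j] for j, b in enumerate(beta)]


def wald_p_value(chi_square):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # chi-square(1) is a squared standard normal, so P(X > w) = erfc(sqrt(w / 2)).
    return math.erfc(math.sqrt(chi_square / 2.0))
```

A Wald chi-square above about 3.84 corresponds to p < 0.05 on 1 degree of freedom.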

A good person to pose this question to would be Sean Owen, who is very active on Quora.

Dear Ratheen,

Check out the rms package in R (http://cran.r-project.org/web/packages/rms/index.html). Its author, Frank Harrell, is a renowned biostatistician who has been programming statistical routines for many years.

Regards,

Kevin Gray

Homepage: www.cannongray.com

One assumption that Mahout may be making is that, with big data, virtually all variables will be significant, so statistical significance is not the best measure here.
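
A worked version of this point for the simplest case, linear regression with one predictor (a textbook result, not anything specific to Mahout): the standard error of the slope shrinks like 1/sqrt(n), so for any fixed nonzero effect the t-value grows like sqrt(n).

```latex
\operatorname{se}(\hat\beta_1)
  = \frac{\hat\sigma}{\sqrt{\sum_{i=1}^{n}(x_i-\bar x)^2}}
  = \frac{\hat\sigma}{\sqrt{n}\,s_x},
\qquad
s_x^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^2,
\qquad
t = \frac{\hat\beta_1}{\operatorname{se}(\hat\beta_1)}
  = \sqrt{n}\,\frac{\hat\beta_1\,s_x}{\hat\sigma}.
```

For example, with n = 10^8 rows and a standardized effect of 0.001, t = 10^4 × 0.001 = 10, far beyond the usual 1.96 cutoff, even though the predictor explains almost none of the variance.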

On Data Science Central

© 2021 TechTarget, Inc.
