R-squared for Decision Tree - AnalyticBridge<br />
https://www.analyticbridge.datasciencecentral.com/forum/topics/rsquared-for-decision-tree<br />
<br />
Jeff, 2010-01-11:<br />
Thanks John!<br />
<br />
Very insightful. I had a pretty large hold-out group for the validation partition, but I understand the dangers. It did seem too good to be true :)<br />
<br />
I'll check out KXEN for this type of data. In the past I have modeled the zeros via logistic regression and the >0 part as a gamma distribution with a log link (generalized linear model), then put them together as a "zero-inflated gamma" using SAS PROC NLMIXED.
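
To make the "put them together" step concrete, here is a minimal sketch of the two-part idea in plain Python. It is not the SAS PROC NLMIXED fit described above. It assumes a single categorical predictor, in which case the logistic part should reduce to the share of non-zero records per group and the gamma part with a log link to the mean of the positive records per group. The names `fit_two_part` and `predict_two_part` and the sales figures are hypothetical, made up for illustration.

```python
from collections import defaultdict


def fit_two_part(groups, sales):
    """Per group, estimate P(sale > 0) and the mean of the positive sales."""
    counts = defaultdict(int)
    positives = defaultdict(list)
    for group, amount in zip(groups, sales):
        counts[group] += 1
        if amount > 0:
            positives[group].append(amount)

    model = {}
    for group, n in counts.items():
        pos = positives[group]
        p_nonzero = len(pos) / n                             # "zero vs. non-zero" part
        mean_positive = sum(pos) / len(pos) if pos else 0.0  # "> 0" part
        model[group] = (p_nonzero, mean_positive)
    return model


def predict_two_part(model, group):
    """Combine both parts: E[y] = P(y > 0) * E[y | y > 0]."""
    p_nonzero, mean_positive = model[group]
    return p_nonzero * mean_positive


# Hypothetical data: group "a" is mostly zeros, group "b" mostly positive.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
sales = [0.0, 0.0, 0.0, 8.0, 0.0, 4.0, 6.0, 2.0]

model = fit_two_part(groups, sales)
expected_a = predict_two_part(model, "a")
expected_b = predict_two_part(model, "b")
```

With continuous predictors, the two per-group estimates would be replaced by a fitted logistic regression and a fitted gamma GLM, but the final multiplication stays the same.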
<br />
John Gins, 2010-01-11:<br />
I use the methodology you speak of all the time. I was the original programmer for Breiman and Stone's version of CART in the late 70's, which is where I believe I was first introduced to that method. However, we were very careful to use the term "variation explained", since there is little relationship to the theoretical Pearson "r". (Multiply by 100 to get percent variation explained.)<br />
Be aware that this value can go negative, which implies that parts of your model have a lot higher variation than the population variance.<br />
I would use this "statistic" only as a means to compare the outcomes of different models built on the same population base.<br />
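
As a small illustration of how this value can go negative, here is a sketch that assumes the quantity under discussion is the usual 1 - SSE/SST computed on hold-out data. The function name `variation_explained` and the numbers are hypothetical, chosen only to show one good and one poor set of predictions against the same actuals.

```python
def variation_explained(actual, predicted):
    """Return 1 - SSE/SST: the share of variation explained by the predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    mean_actual = sum(actual) / len(actual)
    # Squared error of the model's predictions.
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared error of always predicting the overall mean.
    sst = sum((a - mean_actual) ** 2 for a in actual)
    if sst == 0:
        raise ValueError("actual values are constant, so the ratio is undefined")
    return 1.0 - sse / sst


# Hypothetical hold-out values with many zeros.
actual = [0.0, 0.0, 0.0, 10.0, 20.0]
good_predictions = [0.0, 0.0, 0.0, 12.0, 18.0]
poor_predictions = [20.0, 20.0, 20.0, 0.0, 0.0]

good_score = variation_explained(actual, good_predictions)  # 0.975
poor_score = variation_explained(actual, poor_predictions)  # -4.3125
```

The poor predictions miss by more than simply predicting the overall mean would, so SSE exceeds SST and the value drops below zero.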
In my experience, a percent variation explained as high as you have usually implies the model is "too good to be true". You might want to randomly exclude a large subset of your zero-sales data and see what changes if you model what is left. You might also need to run two models: one to predict a zero or non-zero outcome, and a second that takes the records predicted to be non-zero and models those separately.<br />
Other modeling tools like KXEN K2R usually handle that type of underlying data structure pretty well.<br />
John Gins<br />
<br />
Jeff, 2010-01-06:<br />
I will reply to my own question. I did find the following discussion where it seems that there is disagreement on this practice. One professor advocates using the normal R-squared formula and another suggests other sources / methods.<br />
<br />
If anyone has an opinion, I would love to hear it.<br />
<br />
<a href="http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg86041.html" target="_blank">http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg86041.html</a>