A Data Science Central Community
Hi all,
Could anyone please explain the following?
We already get the value of the area under the curve (c) from logistic regression in SAS, so why is there a need to plot the ROC curve? Or is there no need to plot the ROC curve at all?
If we plot the ROC curve for a range of cutoff values, how do we select the cutoff value when we give this option in SAS:
/ ctable pprob=0.1 to 1
And how do we then proceed with that cutoff value in further analysis?
It would be a great help to me.
Comment
Thanks Mike,
Yes, I know the concept of the ROC curve. My doubt was only about how to find the appropriate cut-off when we have 10 different probability cut-offs (the output of ctable pprob=0.1 to 1 by 0.1), and Kevin's explanation has cleared that up.
Sandeep, the ROC curve is a plot whose points are calculated from the counts in the confusion matrix at a given model score cut-off. The output of ctable pprob=0.1 to 1 by 0.1 gives you the counts of TN, TP, FN and FP, which let you calculate the x coordinate (1 - specificity) and the y coordinate (sensitivity) on the ROC curve for 10 different probability cut-offs. What you then need to know is the cost matrix associated with TN, TP, FN and FP, so that you can decide where the optimal cut-off lies for your particular problem. From your question, it looks like you need to study a little more about what a ROC curve represents and how to use a risk score generated by a logistic regression, or in fact by any model (it does not have to be a statistical model). Google "Nuts and Bolts of Data Mining: Classifiers & ROC Curves" by Tim Graettinger, which is quite a good article for understanding these concepts.
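To make the arithmetic concrete, here is a rough sketch in Python (not SAS) of turning confusion-matrix counts into ROC points and then picking the cut-off with the lowest total misclassification cost. The counts, the two cost values and the names `Counts`, `roc_point` and `total_cost` are all made up for illustration; real counts would come from the ctable output, and real costs from the business problem.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts at one probability cut-off (illustrative)."""
    cutoff: float
    tp: int
    fp: int
    tn: int
    fn: int


def roc_point(c):
    """Return (x, y) = (1 - specificity, sensitivity) for one cut-off."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return 1 - specificity, sensitivity


def total_cost(c, cost_fp, cost_fn):
    """Total cost of the errors made at this cut-off."""
    return c.fp * cost_fp + c.fn * cost_fn


# Made-up counts for 100 events and 900 non-events.
table = [
    Counts(0.1, tp=95, fp=400, tn=500, fn=5),
    Counts(0.3, tp=80, fp=150, tn=750, fn=20),
    Counts(0.5, tp=60, fp=40, tn=860, fn=40),
    Counts(0.7, tp=30, fp=10, tn=890, fn=70),
]

# Assumed costs: a missed event (FN) hurts five times as much as a false alarm (FP).
COST_FP = 1.0
COST_FN = 5.0

for c in table:
    x, y = roc_point(c)
    print(c.cutoff, round(x, 3), round(y, 3), total_cost(c, COST_FP, COST_FN))

best = min(table, key=lambda c: total_cost(c, COST_FP, COST_FN))
print("lowest-cost cut-off:", best.cutoff)
```

With a different cost ratio the lowest-cost cut-off moves, which is why the cost matrix matters more than the curve itself.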
Thanks a lot....Kevin
Appreciated....these concepts are really very helpful to me..
This is what I have done in the past to get the Youden Index from proc logistic in SAS. After your model statement use this code:
(Your code may look slightly different. The outroc= option writes the sensitivity, _SENSIT_, and 1 - specificity, _1MSPEC_, for each cutoff to a data set.)
model dep_var(event='1') = in_var1 in_var2 in_var3 / outroc=rocstats;
Then I have a data step that calculates J:
data Youden;
  set rocstats;
  _SPECIF_ = 1 - _1MSPEC_;       /* specificity at this cutoff */
  J = _SENSIT_ + _SPECIF_ - 1;   /* Youden index at this cutoff */
run;
Then get max J value:
proc means data = Youden max;
var J;
run;
I then use a proc print statement to output the value. The cutoff is the predicted probability (_PROB_ in the outroc data set) on the row where J is at its maximum, and every score above this value can be classified as a "success". I have used this in the past, but I don't use the Youden Index anymore. I always end up targeting people in the top 1 - 3 deciles or, depending on what I am modeling, deciles 3 - 7. It really depends on what you are modeling.
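For anyone who wants to check the arithmetic outside SAS, the same calculation can be sketched in a few lines of Python. The rows below are made-up values that only mimic the three outroc columns used above (_PROB_, _SENSIT_, _1MSPEC_); `roc_rows` and `youden_j` are hypothetical names.

```python
# Each row: (cutoff probability, sensitivity, 1 - specificity).
# Made-up numbers standing in for _PROB_, _SENSIT_ and _1MSPEC_.
roc_rows = [
    (0.9, 0.20, 0.01),
    (0.7, 0.45, 0.05),
    (0.5, 0.70, 0.15),
    (0.3, 0.85, 0.40),
    (0.1, 0.97, 0.80),
]


def youden_j(sensitivity, one_minus_spec):
    """Youden index J = sensitivity + specificity - 1."""
    specificity = 1 - one_minus_spec
    return sensitivity + specificity - 1


# Keep the cutoff together with its J so the maximum J also tells us the cutoff.
best_cutoff, best_j = max(
    ((prob, youden_j(sens, fpr)) for prob, sens, fpr in roc_rows),
    key=lambda pair: pair[1],
)

print("cutoff with the largest J:", best_cutoff)
print("largest J:", round(best_j, 4))
```

The point of carrying the cutoff alongside J is that the maximum of J on its own does not say which probability produced it.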
First of all, thanks to Jozo and Kevin for responding. It really helped me a lot.
Now, could you please explain how to choose the right cut-off?
Is it the value where sensitivity is equal to specificity (Goods=Bads)?
I usually go straight to putting the scores from the regression into deciles and check whether I get an even distribution in each decile. I also look at how many "successes" I get in each decile and how they are distributed from the top decile to the bottom decile. From there I will run several models to see which gives me the highest percentage of "successes" in the top deciles, while keeping an even distribution of scores and not overfitting the model.
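A decile table of this kind can be sketched as follows in Python. This is one possible reading of the approach, not the poster's actual code: the scores and outcomes are simulated, and `decile_table` is a hypothetical helper that ranks records by score, cuts them into ten equal-sized groups, and counts the "successes" in each.

```python
import random


def decile_table(scores, outcomes, n_bins=10):
    """Rank by score (highest first), split into n_bins groups, count successes."""
    ranked = sorted(zip(scores, outcomes), key=lambda p: p[0], reverse=True)
    n = len(ranked)
    total_successes = sum(outcomes)
    rows = []
    cumulative = 0
    for b in range(n_bins):
        # Integer boundaries keep the groups as equal in size as possible.
        start = b * n // n_bins
        end = (b + 1) * n // n_bins
        group = ranked[start:end]
        successes = sum(y for _, y in group)
        cumulative += successes
        rows.append({
            "decile": b + 1,  # 1 = highest scores
            "n": len(group),
            "successes": successes,
            "cum_capture": cumulative / total_successes if total_successes else 0.0,
        })
    return rows


# Simulated data: the chance of a success rises with the score.
random.seed(0)
scores = [random.random() for _ in range(1000)]
outcomes = [1 if random.random() < s else 0 for s in scores]

deciles = decile_table(scores, outcomes)
for row in deciles:
    print(row)
```

The `cum_capture` column shows what share of all successes has been reached by the time you get down to a given decile, which is the number a "target the top 1 - 3 deciles" decision is usually based on.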
Like Jozo said, there is a statistical method of choosing a cutoff (the Youden Index) and there is a business cutoff.
Shape is important. You care about how "bads" are distributed. It helps you to:
- Verify if you have enough degrees of freedom
- Explain how well your model catches "bads" at both ends of the curve
- Choose the right cut-off
- Understand whether you have caught all the "bads" in the first 10% or 50%, or whether there are also some in the last decile. That's the difference!
There are two cut-offs:
- One of them is statistical: it separates goods and bads in the optimal way.
- The other is a business cut-off: a man-made decision about how to apply your model in further applications. Who is accepted and who is declined. Who is targeted by marketing and who is left in peace. It's an art to set this one right, or hard work beyond data mining.
Which one are you looking for?
Further reading: http://www.amazon.com/The-Credit-Scoring-Toolkit-Management/dp/0199...
I generally don't find a need to plot the ROC curve; I just care about the c-statistic, or area under the curve. Are you asking how to find the Youden Index (cutoff value) from logistic regression in SAS?
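As a rough illustration of why the c-statistic and the area under the ROC curve are the same number, c can be computed directly as the share of (event, non-event) pairs in which the event has the higher score, with ties counted as half. The Python sketch below uses six made-up scores; `c_statistic` is a hypothetical helper, and the brute-force pair loop is only sensible for small samples.

```python
def c_statistic(scores, outcomes):
    """Share of (event, non-event) pairs where the event scores higher; ties count 0.5."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0
    ties = 0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1
            elif e == n:
                ties += 1
    return (concordant + 0.5 * ties) / (len(events) * len(non_events))


# Made-up scores: 3 events and 3 non-events, so 9 pairs in total.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
outcomes = [1, 1, 0, 1, 0, 0]

c = c_statistic(scores, outcomes)
print(round(c, 4))
```

Because c summarises the ranking over every possible cutoff, it says how well the model separates the two groups but not where to cut, which is the part the rest of this thread is about.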
© 2021 TechTarget, Inc.