Modeling Rare Events - AnalyticBridge2020-10-26T14:14:52Zhttps://www.analyticbridge.datasciencecentral.com/forum/topics/modeling-rare-events-2?groupUrl=analyticaltechniques&xg_source=activity&feed=yes&xn_auth=noHi，Manish,
I'd like to recomm…tag:www.analyticbridge.datasciencecentral.com,2012-06-27:2004291:Comment:1990592012-06-27T03:46:09.877ZDaisy Dinghttps://www.analyticbridge.datasciencecentral.com/profile/DaisyDing
<p>Hi, Manish,</p>
<p>I'd like to recommend that you take a look at esProc, a scripting tool for complicated data processing. It is an unconventional technique that works without modeling, using a step-by-step computing mode. Furthermore, the data-analysis operations don't require a highly skilled technical background, and it makes it convenient for analysts to get instant answers and try out their ideas in real time without calling for data scientists' help.</p>
<p>I wonder whether the technique could help you. If so, you can download it at <a href="http://www.esproc.com" target="_blank">www.esproc.com</a></p>
<p>Looking forward to hearing from you.</p>
<p>Kindly<br/>Daisy</p> Hello,
I'd like to know if yo…tag:www.analyticbridge.datasciencecentral.com,2010-03-04:2004291:Comment:622152010-03-04T17:54:20.212ZAnna Frenihttps://www.analyticbridge.datasciencecentral.com/profile/AnnaFreni
Hello,<br />
I'd like to know if you can point me to<br />
publications or papers where I can learn more about the issue.<br />
<br />
Many thanks<br />
<br />
Anna A bit late to the table...One…tag:www.analyticbridge.datasciencecentral.com,2009-12-28:2004291:Comment:585062009-12-28T12:51:29.226ZKenneth Kennedyhttps://www.analyticbridge.datasciencecentral.com/profile/KennethKennedy
A bit late to the table... One-class classification, aka novelty detection, might be worth a look. There is a package called DD_Tools for Matlab; it contains a number of one-class classifiers, including SVDD. Two researchers active in this area are David J. Tax (<a href="http://ict.ewi.tudelft.nl/~davidt/oneclass.html" target="_blank">http://ict.ewi.tudelft.nl/~davidt/oneclass.html</a>) and Nathalie Japkowicz. Any thoughts on claims fraud…tag:www.analyticbridge.datasciencecentral.com,2009-11-12:2004291:Comment:570252009-11-12T17:56:00.834ZSouravhttps://www.analyticbridge.datasciencecentral.com/profile/Sourav
Any thoughts on claims fraud modeling ? "...a form of bootstrap sampl…tag:www.analyticbridge.datasciencecentral.com,2009-08-26:2004291:Comment:538232009-08-26T03:11:09.855ZMark Richardshttps://www.analyticbridge.datasciencecentral.com/profile/MarkRichards
"...a form of bootstrap sampling to boost your rare event cases higher for development purposes "<br />
<br />
Have you considered adaptive methods (e.g., gradient boosting, aka Salford Systems' "TreeNet")? Manish - something like your…tag:www.analyticbridge.datasciencecentral.com,2009-05-26:2004291:Comment:451182009-05-26T20:06:26.188ZAlan Forresthttps://www.analyticbridge.datasciencecentral.com/profile/AlanForrest
Manish - something like your problem happened in Basel II credit risk analysis. Around 2005, the financial regulators in Europe and the US realised their rules would require banks to produce a probability of default (PD) model for some loan portfolios that in fact had no, or few, historical defaults - this is pre-crunch 2005, mind! How to do it? (PD=0 is the wrong answer, by the way).<br />
<br />
Some good ideas came out of the FSA (UK regulator), the Bundesbank and other financial institutions and some of these were gathered in the following link.<br />
<br />
<a href="http://www.fsa.gov.uk/pubs/international/default_probabilities.pdf">http://www.fsa.gov.uk/pubs/international/default_probabilities.pdf</a><br />
<br />
I'm interested in this because I proposed one of the methods, based on well-known ideas of marginal likelihood (so nothing proprietary about this!). I still like it, as it gets to the heart of the underlying portfolio default model - a random effects model with time-series autocorrelations. See<br />
<br />
<a href="http://www.crc.man.ed.ac.uk/pdf/Forrest-2008.pdf">http://www.crc.man.ed.ac.uk/pdf/Forrest-2008.pdf</a><br />
<br />
I think if you're data-mining then you're probably more interested in a purely fixed effects model and will have a rather high dimensional state space over which to build your likelihood functions, so likelihood surfaces may become hard to visualise. But it's important to remember that likelihood is the fundamental quantity from which any exact model will derive its best fit and standard errors, so in low event situations you'll probably find yourself working with likelihoods directly whatever way you do it.<br />
<br />
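As a rough illustration of working with the likelihood directly when there are no observed events, here is a minimal Python sketch. It is not the method from either linked paper: it assumes independent obligors sharing a single default probability (no random effects, no autocorrelation), and the function names and the portfolio size of 500 are made up for the example. With zero defaults the likelihood is maximised at PD=0, so the sketch instead reports the largest PD that is still compatible with seeing no defaults at a chosen confidence level.

```python
import math


def binomial_log_likelihood(p, defaults, n):
    """Log-likelihood of default probability p given `defaults` out of n
    obligors, assuming independence (the binomial coefficient is dropped)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    survivors_term = (n - defaults) * math.log1p(-p)
    if defaults == 0:
        return survivors_term
    if p == 0.0:
        return float("-inf")  # any observed default rules out p = 0
    return defaults * math.log(p) + survivors_term


def zero_default_upper_bound(n, confidence=0.95):
    """Largest p for which observing zero defaults among n obligors still
    has probability 1 - confidence, i.e. the solution of (1 - p)^n = 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


# Hypothetical portfolio: 500 loans, no historical defaults.
n_obligors = 500
upper = zero_default_upper_bound(n_obligors, confidence=0.95)
print(f"95% upper bound on PD: {upper:.4%}")
print(f"log-likelihood at that bound: {binomial_log_likelihood(upper, 0, n_obligors):.4f}")
```

The bound is close to the familiar "rule of three" value 3/n, which is one quick sanity check on why PD=0 is not a defensible estimate.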
Hope this helps, or at least triggers a few ideas.<br />
<br />
Alan Thanks for the replies,
Bill…tag:www.analyticbridge.datasciencecentral.com,2009-05-25:2004291:Comment:450022009-05-25T17:06:01.474ZManishhttps://www.analyticbridge.datasciencecentral.com/profile/Manish
Thanks for the replies,<br />
<br />
Bill, can you share a little more about the spline-based modeling?<br />
<br />
Cheers<br />
M Hello,
I am not sure if this…tag:www.analyticbridge.datasciencecentral.com,2009-05-25:2004291:Comment:449222009-05-25T00:59:27.222ZSharethram Hariharan (Shareth)https://www.analyticbridge.datasciencecentral.com/profile/SharethramHariharan
Hello,<br />
<br />
I am not sure if this is the right place to ask this question, but I am currently working on supply chain disruption management. Specifically, I would like to model the impact of very low-frequency but high-impact events analytically, not in a data mining way but in terms of operations research/management science. Are there analytical methodologies other than rare event simulation to model extreme or very low-probability events?<br />
<br />
thank you in advance! You likely want to do Poisson…tag:www.analyticbridge.datasciencecentral.com,2009-05-25:2004291:Comment:449182009-05-25T00:25:55.086ZJoseph Hilbehttps://www.analyticbridge.datasciencecentral.com/profile/JosephHilbe
You likely want to do Poisson, or better, negative binomial regression. See my book:<br />
Hilbe, Joseph M. (2007), Negative Binomial Regression, Cambridge University Press<br />
<br />
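One quick way to see why negative binomial may be preferable to Poisson for low-incidence counts is to check for overdispersion before fitting anything. The sketch below is an illustration only, not taken from the book: the counts are invented, the function name is hypothetical, and the dispersion parameter is a crude method-of-moments estimate under the common NB2 parameterisation, where the variance is mean + alpha * mean^2.

```python
import statistics


def overdispersion_summary(counts):
    """Return (mean, variance, variance/mean ratio, method-of-moments alpha).

    A Poisson model implies variance == mean (ratio near 1). Under NB2 the
    variance is mean + alpha * mean**2, so alpha is estimated from the excess
    variance and floored at 0 when there is no overdispersion.
    """
    mean = statistics.fmean(counts)
    variance = statistics.variance(counts)  # sample variance (n - 1)
    ratio = variance / mean
    alpha = max(0.0, (variance - mean) / mean ** 2)
    return mean, variance, ratio, alpha


# Invented example: mostly zeros with an occasional burst of events.
counts = [0, 0, 0, 0, 0, 0, 0, 0, 1, 5]
mean, variance, ratio, alpha = overdispersion_summary(counts)
print(f"mean={mean:.2f} variance={variance:.2f} ratio={ratio:.2f} alpha={alpha:.2f}")
```

A ratio well above 1 suggests the Poisson variance assumption is too tight for the data, which is the situation negative binomial regression is designed for.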
I give a rather lengthy discussion of how dealing with low-incidence data differs from standard logistic regression in my new book:<br />
Hilbe, Joseph M (2009), Logistic Regression Models, Chapman & Hall/CRC Actually, you can model prett…tag:www.analyticbridge.datasciencecentral.com,2009-05-24:2004291:Comment:449052009-05-24T19:11:42.872ZBill Cassillhttps://www.analyticbridge.datasciencecentral.com/profile/BillCassill
Actually, you can model pretty much anything with a low incidence rate, even less than 1 in 500. Believe it or not, straight statistical methods like logistic regression, or any other algorithm of your choice, can handle this type of problem without too much trouble. My favorite, however, is using spline model techniques. The secret, in my experience, is in balancing your samples (i.e. development and validation). With any low-incidence data, there is a greater likelihood that a couple of outliers can throw your samples off, leading to big differences in predictive performance between the two. My suggestion is to create several such sample splits and build models on each set. Look for those splits that give you relatively good and equal performance on both samples.<br />
<br />
Hope this helps,<br />
<br />
Bill
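The sample-balancing idea above can be sketched in Python as follows. This covers only the splitting step, under assumptions that are not in the post: a binary 0/1 target, stratification by class as the balancing device, and made-up function names and data. Fitting a model on each split and comparing development versus validation performance would follow, using whatever algorithm you prefer.

```python
import random


def stratified_split(labels, dev_fraction=0.5, seed=0):
    """Split row indices into development and validation sets so that each
    class keeps (almost) the same share in both samples."""
    rng = random.Random(seed)
    dev, val = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * dev_fraction)
        dev.extend(idx[:cut])
        val.extend(idx[cut:])
    return sorted(dev), sorted(val)


def event_rate(labels, indices):
    """Share of rows in `indices` whose label is 1."""
    return sum(labels[i] for i in indices) / len(indices)


# Invented data: 20 events in 10,000 rows (1 in 500).
labels = [1] * 20 + [0] * 9980

# Several candidate splits, one per seed; each would get its own model.
splits = [stratified_split(labels, dev_fraction=0.5, seed=s) for s in range(5)]
rates = [(event_rate(labels, dev), event_rate(labels, val)) for dev, val in splits]
for seed, (dev_rate, val_rate) in enumerate(rates):
    print(f"seed={seed} dev rate={dev_rate:.4f} val rate={val_rate:.4f}")
```

Stratifying fixes the event rate in both samples, but which rare cases land where still varies by seed, so comparing model performance across several splits remains worthwhile.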