Comments - 10,000 observations and 100,000 parameters: what to do? - AnalyticBridge

Vincent Granville (2010-09-14):
Joseph: To answer your question (what do you consider to be the practical limit on the number of parameters to directly estimate?), it depends on the type of parameter. In the density estimation problem that I described, where the parameter is the kernel bandwidth, many people would not consider this a parameter at all; they would use a different word.

In short, parameters that explain little variance in your model can be very numerous; those that truly drive the model should be limited to a very small set, maybe fewer than 15 as a rule of thumb. And when you have a large number of parameters, you can usually set constraints on them, a bit like a hierarchical Bayesian model with 15 top-level parameters and thousands of lower-level parameters.
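[Editor's sketch] The constraint idea in this comment can be illustrated with partial pooling: many lower-level parameters (here, group means) are shrunk toward a single top-level parameter. All numbers and names below are made up for illustration; this is one possible reading of the idea, not the author's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 1,000 lower-level parameters (group means), each
# observed through only 10 noisy samples, tied together by one top-level mean.
n_groups, n_per_group = 1000, 10
true_top = 5.0
true_group = true_top + rng.normal(0, 1.0, n_groups)   # lower-level truth
data = true_group[:, None] + rng.normal(0, 3.0, (n_groups, n_per_group))

raw = data.mean(axis=1)      # unconstrained per-group estimates
top = raw.mean()             # estimate of the single top-level parameter

# Partial pooling: shrink each raw estimate toward the top-level mean,
# weighting by the ratio of between-group to within-group variance.
within_var = data.var(axis=1, ddof=1).mean() / n_per_group
between_var = max(raw.var(ddof=1) - within_var, 1e-12)
w = between_var / (between_var + within_var)
pooled = top + w * (raw - top)

# The constrained estimates are closer to the truth on average.
print(np.mean((raw - true_group) ** 2), np.mean((pooled - true_group) ** 2))
```

The 1,000 "parameters" are still there, but after the constraint they are effectively driven by a handful of top-level quantities, which is the point of the comment.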
Vincent Granville (2010-09-13):

Theo: Do you mean unsupervised clustering, as opposed to supervised clustering? If so, I believe there is no solution that applies to all contexts, because there is no precise definition of what a cluster is (and there never will be): just look at the stars in the night sky and try to group them into clusters with your naked eye, then ask a friend to do the same -- you will end up with different results and different justifications. Even the apparently simple problem of detecting the number of clusters is extremely challenging, because what one algorithm (or human being) considers a cluster, another considers a sub-cluster, and the other way around.

I believe the most promising technology will combine multiple types of classifiers and take advantage of the strengths of each: some are good at detecting linear or curvilinear structures, some are good with overlapping clusters that have density peaks, some are good at detecting clusters defined by nearest neighbors, some are good with clusters that have convex shapes, and so on.
Theodore Omtzigt (2010-09-13):

Vincent: is there a well-defined workflow to drive the clustering and to automatically build optimal models? If you have a pointer to a paper or presentation, I would love to try my hand at implementing this in a high-performance, scalable library. I presume there are R implementations available to benchmark against?
Vincent Granville (2010-09-13):

Tom: In some models (regression), you can simply ignore variables with little predictive power, or better, try to cluster the variables to reduce the number of variables (and thus parameters) to a meaningful number.

In the context of discriminant analysis based on kernel density estimators, you have one parameter (the kernel bandwidth) at each location in your 2- or 3-dimensional space, and for each group. The bandwidth is usually a function of the distance to the nearest neighbors, and thus you can end up with an efficient model that truly has an infinite number of parameters and an infinite number of parameter values.
Vincent Granville (2010-09-12):

Theodore: you are right about information, also called entropy. Suppose you have 10,000 observations, 10 core parameters explaining 80% of the entropy, 1,000 secondary parameters explaining 19%, 10,000 parameters explaining 0.9%, and a billion parameters explaining the remaining 0.1%. You could then say that your framework has 10,000 observations and one billion parameters and that statistical inference works very well, yet you would not be telling the whole truth -- namely, that statistical inference works well because of your top 10 core parameters alone (meaning that if you eliminated all but those 10 parameters from your model, you would get essentially the same results).
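[Editor's sketch] A scaled-down numerical illustration of this point, with 1,000 weak parameters standing in for the billion and made-up coefficient sizes: dropping every parameter outside the small core barely changes the variance explained.

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up version of the split above: 10 core parameters carry almost all of
# the signal, 1,000 additional parameters carry a trickle.
n = 10_000
beta_core = np.full(10, 3.0)      # strong, "core" coefficients
beta_weak = np.full(1000, 0.03)   # numerous but nearly irrelevant
X_core = rng.normal(size=(n, 10))
X_weak = rng.normal(size=(n, 1000))
y = X_core @ beta_core + X_weak @ beta_weak + rng.normal(0, 1, n)

# Variance explained by all 1,010 parameters vs. the 10 core ones only.
full_resid = y - X_core @ beta_core - X_weak @ beta_weak
core_resid = y - X_core @ beta_core          # weak parameters eliminated
ev_full = 1 - full_resid.var() / y.var()
ev_core = 1 - core_resid.var() / y.var()
print(round(ev_full, 3), round(ev_core, 3))  # nearly identical
```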
Theodore Omtzigt (2010-09-12):

Do you have an intuitive explanation of what the technique does? It appears that by applying a low-pass filter to the correlation between a parameter and the response variable, you filter out 'noise' that might have injected some correlation where there is none in causal terms. However, since this filter is non-discriminatory, it appears to affect only the numerics of the algorithm. A simple thought experiment constructs a plausible case where the technique goes mightily wrong: for example, if none of the observations captures any of the causal relationship between parameter and response, you would go badly wrong when a stream of observations arrives that does contain this information.

From an intuitive point of view, the technique assumes that the observations contain some information that guides the correlation to a reasonable estimate. To me that sounds like a big assumption, especially for the case we are talking about, where we might have 10 times more parameters than observations. I can see where this could work, but I can also see where it would blow up in your face. Any empirical data that you know of?
Joseph Foutz (2010-09-12):

I have a question on a slightly different tack of the same thing. What do you consider to be the practical limit on the number of parameters to (directly) estimate? Many of the mainstream packages have large limits on what they can theoretically handle. SAS has a 2 TB dataset size limitation. Stata has a limit of 11,000 parameters (I assume this has to do with assigning a 32-bit integer to every matrix location), with an 8 TB theoretical and 5 TB practical dataset size limitation. Big theoretical limits, but what are the practical limits of what you would estimate?
Vincent Granville (2010-09-12):

Let's say you have a regression model with 500,000 parameters, where parameter k is denoted a[k]. The idea is to model a[k] with a formula such as a[k] = f(correlation[Var[k], Response], x, y, z), thus reducing the 500,000 parameters a[1], a[2], ..., a[500,000] to only three parameters x, y, z.

For additional information (in the context of regression), check references about:

- Lasso regression
- Ridge regression
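[Editor's sketch] A minimal numpy example of the shrinkage theme the comment points to, using ridge regression rather than the a[k] = f(...) reparameterization itself (which the comment leaves unspecified). With far more candidate parameters than observations, ordinary least squares is not even defined (X'X is singular), but a single penalty parameter makes the fit well-posed. Dimensions and coefficients are made up.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical p >> n setup: 100 observations, 1,000 candidate parameters,
# only 5 of which truly drive the response.
n, p = 100, 1000
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = [3, -2, 1.5, 1, -1]
y = X @ beta_true + rng.normal(0, 0.5, n)

# Ridge regression: instead of estimating 1,000 free coefficients, constrain
# them all at once through a single penalty parameter lam.
lam = 10.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# The handful of real signals should stand out against the shrunken rest.
print(np.mean(np.abs(beta_ridge[:5])), np.mean(np.abs(beta_ridge[5:])))
```

Lasso replaces the squared penalty with an absolute-value one, which drives most of the weak coefficients exactly to zero instead of merely shrinking them.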