Linear regression on an usual domain, hyperplane, sphere or simplex - AnalyticBridge2021-06-18T02:35:44Zhttps://www.analyticbridge.datasciencecentral.com/forum/topics/linear-regression-on-an-usual-domain-hyperplane-sphere-or-simplex?feed=yes&xn_auth=noAs a first thought, I vaguely…tag:www.analyticbridge.datasciencecentral.com,2013-08-03:2004291:Comment:2601542013-08-03T17:19:06.954ZJavier Alonsohttps://www.analyticbridge.datasciencecentral.com/profile/JavierAlonso
<p>As a first thought, I vaguely remember from my navigation skills that the Mercator projection maps a sphere into the inside face of a cylinder:</p>
<p></p>
<p>For a given longitude lambda and latitude phi, </p>
<p><strong>x = lambda - lambda(0)</strong> and <strong>y = ln(tan(phi) + sec(phi))</strong>,</p>
<p>where lambda(0) is the longitude you take as origin (the Greenwich meridian, for example).</p>
<p></p>
<p>Then you cut the cylinder vertically, unfold it, and you've got a rectangle.</p>
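<p>A minimal Python sketch of the projection formula above, assuming a unit sphere and angles given in degrees. The function name <code>mercator</code> and its parameter names are illustrative, not taken from any library.</p>

```python
import math

def mercator(lon_deg, lat_deg, lon0_deg=0.0):
    """Map (longitude, latitude) in degrees to Mercator (x, y) on a unit sphere."""
    lam = math.radians(lon_deg)    # longitude of the point
    lam0 = math.radians(lon0_deg)  # longitude taken as origin
    phi = math.radians(lat_deg)    # latitude of the point
    if abs(phi) >= math.pi / 2:
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    x = lam - lam0
    y = math.log(math.tan(phi) + 1.0 / math.cos(phi))  # ln(tan(phi) + sec(phi))
    return x, y

# Equal steps in latitude give growing steps in y as you move away from the equator.
for lat in (0, 30, 60):
    print(lat, mercator(10.0, lat))
```

<p>The poles are excluded because y grows without bound as the latitude approaches 90 degrees.</p>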
<p></p>
<p>Problem: angles (directions) are preserved, but distances are not. The shortest path between two points is now a curve, the orthodrome (great-circle route)...</p> Update: My intent was more t…tag:www.analyticbridge.datasciencecentral.com,2013-08-03:2004291:Comment:2598172013-08-03T04:00:29.526ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><span>Update: My intent was more to create a competing product that tastes the same, call it something different from Coke, and sell it for far less. If the ingredients are different, even very different, even though the taste is identical, it is actually a significant benefit, because Coke manufacturers won't be able to successfully sue you. </span><br/><br/><span>I think Virgin almost managed to create a clone. And of course, Pepsi does not come close, the taste is so different, just like apples and oranges.</span></p> Philip Hanser suggested the f…tag:www.analyticbridge.datasciencecentral.com,2013-08-02:2004291:Comment:2595952013-08-02T19:15:07.166ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><a href="http://www.linkedin.com/groups?viewMemberFeed=&gid=4520336&memberID=12500745" target="_blank">Philip Hanser</a> suggested the following eBook: <a href="http://www.leg.ufpr.br/lib/exe/fetch.php/pessoais:abtmartins:a_concise_guide_to_compositional_data_analysis.pdf" target="_blank">A Concise Guide to Compositional Data Analysis</a> (by John Aitchison, estimated publication date is 1999 based on references) which deals with mixture / simplex domain. I did not find a section on "simplex regression", but reading this book is a good starting point.</p> Hi John-
Can you illustrate t…tag:www.analyticbridge.datasciencecentral.com,2013-08-02:2004291:Comment:2595932013-08-02T18:44:18.928ZAli ElKahkyhttps://www.analyticbridge.datasciencecentral.com/profile/AliElKahky
<p>Hi John-</p>
<p>Can you illustrate the idea in more detail for me? I think the issue in this problem is that we do not observe the ratios in the training data; we only observe a value (say from 0 to 1).</p>
<p>The constraints here are on the hidden variable, not on the output variable, if I understand correctly.</p> Seems like a good approach! I…tag:www.analyticbridge.datasciencecentral.com,2013-08-02:2004291:Comment:2596652013-08-02T17:49:02.023ZLisa Wellshttps://www.analyticbridge.datasciencecentral.com/xn/detail/u_3o8kui8su4bb5
<p>Seems like a good approach! I felt uneasy about the idea of running regression on the constrained optimization problem's variables. How to be certain that it made sense to layer one method over another? Either one would need to work it out as a proof (ugh) or try it with numeric data, then sanity check the results. The latter is not ideal, e.g. I wouldn't want to defend that as my rationale! </p>
<p>I like the idea of separating the problem into two parts by translating the constrained variables, solving that, then going back and doing the rest, so to speak ;o)</p> Vincent:
You are facing a "co…tag:www.analyticbridge.datasciencecentral.com,2013-08-02:2004291:Comment:2598722013-08-02T16:49:57.887ZJohn F. Elder IVhttps://www.analyticbridge.datasciencecentral.com/profile/JohnFElderIV
<p>Vincent:</p>
<p>You are facing a "composition" problem. You are right to be wary of doing a normal regression on the component percentages; it has flaws. John Aitchison solved this problem, which is big in the mining industry, where one takes core samples. He showed you must first translate the k constrained variables into a (k-1)-dimensional set of unconstrained variables by using log-ratios. That is, z_j = log(x_j/x_k) for j = 1, ..., k-1. (This assumes x_k is not zero; there are other ways to do it if that's a problem.)</p>
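<p>As a rough illustration, the log-ratio transform and its inverse can be sketched in Python as follows. The function names <code>alr</code> and <code>alr_inverse</code> are made up for this example, and the sketch assumes every component is strictly positive and the last component is used as the reference x_k.</p>

```python
import math

def alr(x):
    """Map a k-part composition to k-1 unconstrained log-ratios z_j = log(x_j / x_k)."""
    if any(part <= 0 for part in x):
        raise ValueError("all components must be strictly positive")
    reference = x[-1]  # x_k, the last component
    return [math.log(part / reference) for part in x[:-1]]

def alr_inverse(z):
    """Map k-1 log-ratios back to a k-part composition that sums to 1."""
    expanded = [math.exp(value) for value in z] + [1.0]  # the reference part maps to exp(0)
    total = sum(expanded)
    return [value / total for value in expanded]

composition = [0.2, 0.3, 0.5]
z = alr(composition)       # two unconstrained values; fit the regression on these
print(z)
print(alr_inverse(z))      # back on the simplex
```

<p>Any ordinary regression can then be run on the z values, since they are free to take any real value.</p>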
<p>Then, with the answer, you can translate back into the original variables.</p> Maybe a Bayesian approach wit…tag:www.analyticbridge.datasciencecentral.com,2013-08-02:2004291:Comment:2598682013-08-02T15:42:36.148ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p>Maybe a Bayesian approach with MCMC.</p> if you think of Bayesian regr…tag:www.analyticbridge.datasciencecentral.com,2013-08-01:2004291:Comment:2596342013-08-01T21:20:19.243ZAli ElKahkyhttps://www.analyticbridge.datasciencecentral.com/profile/AliElKahky
<p>If you think of a Bayesian regression model and place a Dirichlet prior over the weights' mean, then this may do what you need. But you will need to do approximate inference, since the prior is no longer conjugate.</p>
<p> </p> Some of them look like attrib…tag:www.analyticbridge.datasciencecentral.com,2013-08-01:2004291:Comment:2595202013-08-01T19:38:05.370ZJuan Carlos Borráshttps://www.analyticbridge.datasciencecentral.com/profile/JuanCarlosBorras
<p>Some of them look like attribution problems. For the "coca cola" problem you can try L² minimization and even relaxing the simplex constraint (assuming non-linearities do not exist, and your minimization variables don't try to find a probability distribution). If you have R in your bag of tricks you can use some code I wrote here: <a href="http://jcborras.net/carpet/voting-sympathies-in-double-round-elections-the-finnish-2012-presidential-election-case.html">http://jcborras.net/carpet/voting-sympathies-in-double-round-elections-the-finnish-2012-presidential-election-case.html</a> ("Candidate drop and its..." section, and if the simplex constraint is very very very important for you then the "Final vote mix..." section).</p>
<p></p>
<p>If your input data is large (as in the number of samples), your matrices may grow too big though.</p>
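<p>For readers without R, here is a small Python sketch of L² minimization with the simplex constraint enforced. It is not the code from the linked page: it uses projected gradient descent, a technique chosen only for illustration, and the names <code>project_to_simplex</code> and <code>fit_simplex_weights</code>, the step size, and the toy data are all assumptions.</p>

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    running_sum = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        running_sum += u_j
        candidate = (running_sum - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate  # keep the threshold from the last index that qualifies
    return [max(v_i - theta, 0.0) for v_i in v]

def fit_simplex_weights(X, y, steps=2000, lr=0.1):
    """Minimize the mean squared error of X w against y, with w kept on the simplex."""
    n, k = len(X), len(X[0])
    w = [1.0 / k] * k  # start at the centre of the simplex
    for _ in range(steps):
        residuals = [sum(a * b for a, b in zip(row, w)) - y_i
                     for row, y_i in zip(X, y)]
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(residuals, X))
                for j in range(k)]
        w = project_to_simplex([w_j - lr * g_j for w_j, g_j in zip(w, grad)])
    return w

# Toy data: six "recipes" mixing three ingredients, generated from known weights.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
true_w = [0.2, 0.3, 0.5]
y = [sum(a * b for a, b in zip(row, true_w)) for row in X]
w_hat = fit_simplex_weights(X, y)
print(w_hat)
```

<p>Each iteration touches every sample, so memory stays small, but the step size would need tuning on real data.</p>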
<p></p> I'd be curious how you comput…tag:www.analyticbridge.datasciencecentral.com,2013-08-01:2004291:Comment:2594232013-08-01T19:21:59.391ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p>I'd be curious how you compute confidence intervals for the coefficients, or test whether some are equal to 0. This might be a bit more tricky, though you can always use my <a href="http://www.analyticbridge.com/profiles/blogs/how-to-build-simple-accurate-data-driven-model-free-confidence-in" target="_blank">non-parametric, model-free approach</a>.</p>