AnalyticBridge2021-09-27T14:04:14ZMatt Thompsonhttps://www.analyticbridge.datasciencecentral.com/profile/MattThompsonhttps://storage.ning.com/topology/rest/1.0/file/get/2191569637?profile=RESIZE_48X48&width=48&height=48&crop=1%3A1https://www.analyticbridge.datasciencecentral.com/forum/topic/listForContributor?user=2nv3dg0kgw8ca&feed=yes&xn_auth=noCoronavirus effect on algorithm validitytag:www.analyticbridge.datasciencecentral.com,2020-04-23:2004291:Topic:3976002020-04-23T06:40:55.591ZMatt Thompsonhttps://www.analyticbridge.datasciencecentral.com/profile/MattThompson
<p><span class="css-901oao css-16my406 r-1qd0xha r-ad9z0x r-bcqeeo r-qvutc0">The coronavirus pandemic is turning society and the economy upside down, changing everything from customer behavior to travel preferences to economic expectations.</span></p>
<p><span class="css-901oao css-16my406 r-1qd0xha r-ad9z0x r-bcqeeo r-qvutc0">However, current algorithms that optimize, for example, recommendations in online shops are based on pre-corona data.</span></p>
<p><span class="css-901oao css-16my406 r-1qd0xha r-ad9z0x r-bcqeeo r-qvutc0">This raises the question of whether the algorithms commonly used in areas like marketing and sales are still reliable, or whether they have become useless or even dangerous. Are there already analyses of this?</span></p> Outliers in Logistic Regressiontag:www.analyticbridge.datasciencecentral.com,2019-04-04:2004291:Topic:3922522019-04-04T17:59:41.476ZMatt Thompsonhttps://www.analyticbridge.datasciencecentral.com/profile/MattThompson
<p><i>This discussion has been recovered from our archives. </i></p>
<p>I'm new to predictive modelling and I'm currently developing a student churn model for the educational institution where I work. I'm using logistic regression for this problem, so which technique should I use to detect outliers in my training set?</p>
<p><strong>Answers:</strong></p>
<ol>
<li><span>The way we take care of outliers in logistic regression is by creating dummy variables based on EDA (Exploratory Data Analysis).</span></li>
<li>Regression analysis, the available "DRS" Software</li>
<li><span>You raised a good question for discussion. We use a half-normal probability plot of the deviance residuals with a simulated envelope to detect outliers in binary logistic regression. The plot helps to identify observations with unusually large deviance residuals. A good reference is the book by R. D. Cook and S. Weisberg, </span><strong><em>Applied Regression Including Computing and Graphics (1999)</em></strong><span>. For an example of how to produce a half-normal plot with an envelope, see <a href="https://cran.r-project.org/web/packages/auditor/vignettes/model_fit_audit.html">https://cran.r-project.org/web/packages/auditor/vignettes/model_fit_audit.html</a></span></li>
<li><span>We normally screen out the most extreme 2 percent at each tail of any variable (4 percent in total). Records with an extreme value in any variable are removed. </span><span>You can reduce the cutoff to 1 percent if your sample size is small.</span></li>
</ol>
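<p>The percentile screening described in the last answer could be sketched in Python roughly as follows. This is only an illustration, not the poster's actual procedure: the function names, the nearest-rank cutoff rule, and the sample <code>age</code> column are assumptions made for the example.</p>

```python
def percentile_bounds(values, pct=2.0):
    """Return (low, high) cutoffs that leave pct percent of the values in each tail."""
    ordered = sorted(values)
    n = len(ordered)
    k = int(n * pct / 100.0)  # number of records to drop in each tail (nearest rank)
    return ordered[k], ordered[n - 1 - k]


def trim_extremes(records, columns, pct=2.0):
    """Keep only records whose value lies inside the cutoffs for every listed column."""
    bounds = {c: percentile_bounds([r[c] for r in records], pct) for c in columns}
    return [
        r for r in records
        if all(bounds[c][0] <= r[c] <= bounds[c][1] for c in columns)
    ]


# Hypothetical training set: 100 students with ages 1..100 (illustrative only).
students = [{"age": a} for a in range(1, 101)]
kept = trim_extremes(students, ["age"], pct=2.0)
print(len(kept), kept[0]["age"], kept[-1]["age"])
```

<p>With several columns, a record is dropped as soon as one of its values is extreme, so the share of removed records can exceed 4 percent.</p>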
<p></p>
<p></p> Can Python do the following?tag:www.analyticbridge.datasciencecentral.com,2018-11-13:2004291:Topic:3894022018-11-13T18:02:59.085ZMatt Thompsonhttps://www.analyticbridge.datasciencecentral.com/profile/MattThompson
<p>These were features that I liked in Perl. I am wondering whether there is a way to get them in Python.</p>
<ul>
<li>Automated memory allocation / de-allocation (for variables, arrays, hash tables etc.)</li>
<li>Turning your program into an executable (that is, pre-compiled.)</li>
<li>Automated variable initialization (variables, arrays don't even need to be declared, much less initialized)</li>
<li>Automated type casting (e.g. automatically treating the same variable as an integer or a string depending on the context: integer when performing a multiplication, or string for concatenation)</li>
</ul>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/134946461?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/134946461?profile=original" class="align-center"/></a></p>
<p>You are going to say that this makes for terrible programming, but in my case I use the code only for myself, and I'd rather focus on the algorithms than on the coding and debugging itself. I am also wondering if there are options for automated debugging.</p>
<p>I am also wondering how to produce sounds in Python, and which random number generator it uses. Finally, is high-precision computing (like 500 digits of accuracy) reliable in Python, using the default BigNum libraries?</p>
<p>Thanks.</p>
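<p>As a partial illustration of some of the points raised above, here is a short sketch using only the Python standard library. It does not cover executables, debugging, or sound, and the variable names are made up for the example. It assumes that <code>collections.defaultdict</code> is an acceptable stand-in for Perl-style automatic initialization and that the <code>decimal</code> module is what is meant by the default high-precision library.</p>

```python
import random
from collections import defaultdict
from decimal import Decimal, localcontext

# Memory is managed automatically, and a defaultdict initializes missing keys itself.
counts = defaultdict(int)  # missing keys start at 0, no declaration needed
for word in "a b a c a".split():
    counts[word] += 1

# Unlike Perl, Python does not cast by context; conversions are written explicitly.
value = "21"
product = int(value) * 2          # numeric multiplication
label = str(product) + " items"   # string concatenation

# The standard random module is based on the Mersenne Twister;
# seeding two generators identically gives identical streams.
rng_a, rng_b = random.Random(2018), random.Random(2018)
same_stream = rng_a.random() == rng_b.random()

# High-precision arithmetic: 500 significant digits with the decimal module.
with localcontext() as ctx:
    ctx.prec = 500
    root_two = Decimal(2).sqrt()
    error = abs(root_two * root_two - 2)

print(counts["a"], label, same_stream, error < Decimal(10) ** -498)
```

<p>Python integers are also arbitrary-precision by default, so exact integer arithmetic needs no extra library.</p>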
<p></p> Explaining SQL JOINS - Improving on the Classic Venn Diagramstag:www.analyticbridge.datasciencecentral.com,2018-10-16:2004291:Topic:3892312018-10-16T13:32:32.727ZMatt Thompsonhttps://www.analyticbridge.datasciencecentral.com/profile/MattThompson
<p class="yklcuq-10 hpxQMr">Back when I was learning SQL, I was often hung up on the JOIN concept. The Venn diagrams were a lifesaver, but as I learned and used SQL more and more, I found that they were not quite enough.</p>
<p class="yklcuq-10 hpxQMr"></p>
<p class="yklcuq-10 hpxQMr">I worked with some of my colleagues at the Data School to try to go a little bit further.</p>
<p class="yklcuq-10 hpxQMr"></p>
<p class="yklcuq-10 hpxQMr">We were trying to keep it VERY basic so some join types and antijoins are not included.</p>
<p class="yklcuq-10 hpxQMr"></p>
<p class="yklcuq-10 hpxQMr">I would love to know what you all think.</p>
<p class="yklcuq-10 hpxQMr"><a href="http://storage.ning.com/topology/rest/1.0/file/get/2059721635?profile=original" target="_self"><img src="http://storage.ning.com/topology/rest/1.0/file/get/2059721635?profile=original" width="470" class="align-center"/></a></p>
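<p class="yklcuq-10 hpxQMr">For readers who want to experiment alongside the diagram, here is a small sketch using Python's built-in <code>sqlite3</code> module. The <code>customers</code> and <code>orders</code> tables and their contents are hypothetical and are not taken from the write-up. The sketch shows an inner join, a left join, and an anti-join written as a left join filtered on NULL.</p>

```python
import sqlite3

# In-memory database with two tiny, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 'book'), (11, 1, 'pen'), (12, 2, 'lamp');
""")

# INNER JOIN: only customers with at least one matching order.
inner = conn.execute("""
    SELECT c.name, o.item
    FROM customers c JOIN orders o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# LEFT JOIN: every customer, with NULL (None) where no order matches.
left = conn.execute("""
    SELECT c.name, o.item
    FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# Anti-join: customers with no order at all.
anti = conn.execute("""
    SELECT c.name
    FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.id IS NULL
""").fetchall()

print(inner)
print(left)
print(anti)
conn.close()
```

<p class="yklcuq-10 hpxQMr">Note that Ann appears twice in the joined results because she has two orders. Row multiplication of this kind is something a Venn diagram does not convey.</p>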
<p class="yklcuq-10 hpxQMr"></p>
<p class="yklcuq-10 hpxQMr">For the full write up:<span> </span><a target="_blank" class="yklcuq-27 dlFmxw" href="https://dataschool.com/sql-join-types-explained-visualizing-sql-joins-and-building-on-the-classic-venn-diagrams/" rel="noopener">https://dataschool.com/sql-join-types-explained-visualizing-sql-joins-and-building-on-the-classic-venn-diagrams/</a></p> Anomaly Detectiontag:www.analyticbridge.datasciencecentral.com,2018-07-06:2004291:Topic:3866992018-07-06T13:40:18.124ZMatt Thompsonhttps://www.analyticbridge.datasciencecentral.com/profile/MattThompson
<p>How would you go about an Unsupervised Anomaly Detection problem?</p> Powerful, Hybrid Machine Learning Algorithm with Excel Implementationtag:www.analyticbridge.datasciencecentral.com,2018-06-12:2004291:Topic:3852472018-06-12T13:30:51.780ZMatt Thompsonhttps://www.analyticbridge.datasciencecentral.com/profile/MattThompson
<p><span>In this article, we discuss a general machine learning technique to make predictions or score transactional data, applicable to very big, streaming data. This hybrid technique combines different algorithms to boost accuracy, outperforming each algorithm taken separately, yet it is simple enough to be reliably automated. It is illustrated in the context of predicting the performance of articles published in media outlets or blogs, and has been used by the author to build an AI (artificial intelligence) system to detect articles worth curating, as well as to automatically schedule tweets and other postings in social networks for maximum impact, with a goal of eventually fully automating digital publishing. This application is broad enough that the methodology can be applied to most NLP (natural language processing) contexts with large amounts of unstructured data. The results obtained in our particular case study are also very interesting. </span></p>
<p><span><a href="http://storage.ning.com/topology/rest/1.0/file/get/2059721691?profile=original" target="_self"><img src="http://storage.ning.com/topology/rest/1.0/file/get/2059721691?profile=original" width="263" class="align-center"/></a></span></p>
<p><span>The algorithmic framework described here applies to any data set, text or not, with quantitative variables, non-quantitative variables (gender, race), or a mix of both. It consists of several components; we discuss in detail those that are new and original. The other, non-original components are briefly mentioned, with references provided for further reading. No deep technical expertise and no mathematical knowledge are required to understand the concepts and methodology described here. The methodology, though state-of-the-art, is simple enough that it can even be implemented in Excel for small data sets (one million observations).</span></p>
<p><span>The technique presented here blends non-standard, robust versions of decision trees and regression. It has been successfully used in black-box ML implementations.</span></p>
<p><span>Read full article <a href="https://www.datasciencecentral.com/profiles/blogs/state-of-the-art-machine-learning-automation-with-hdt" target="_blank" rel="noopener">here</a>. </span></p>
<p><em>For related articles from the same author, <a href="http://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles" target="_blank" rel="noopener">click here</a><span>.</span></em></p>
<p><span><b>DSC Resources</b></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes">Free Book: Applied Stochastic Processes</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/comprehensive-repository-of-data-science-and-ml-resources">Comprehensive Repository of Data Science and ML Resources</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/difference-between-machine-learning-data-science-ai-deep-learning">Difference between ML, Data Science, AI, Deep Learning, and Statistics</a></li>
</ul> Examples of Stochastic Processes in Machine Learningtag:www.analyticbridge.datasciencecentral.com,2018-06-11:2004291:Topic:3853392018-06-11T13:03:55.271ZMatt Thompsonhttps://www.analyticbridge.datasciencecentral.com/profile/MattThompson
<p>Hi All,</p>
<p>I am currently taking graduate-level coursework in Stochastic Processes. My hope is to apply Stochastic Processes in Machine Learning. I have just started to think about use cases, and one particular use case that stands out is having the machine learn which probability distribution to pick when given a data set, and then generate "X" random processes from it. </p>
<p>As an amateur, I was wondering if anyone else has tried to use stochastic processes in machine learning. What use cases did you apply them to? What best practices can you share?</p>
<p>Thanks,</p>
<p>Jacob</p> Question about Some Statistical Distributions (Updated)tag:www.analyticbridge.datasciencecentral.com,2018-02-12:2004291:Topic:3804142018-02-12T17:46:44.817ZMatt Thompsonhttps://www.analyticbridge.datasciencecentral.com/profile/MattThompson
<p>What are the potential distributions for a continuous variable <em>X</em> on [0, 1], if |2<em>X</em> - 1| is known to have a uniform distribution on [0, 1]? Will the distribution of INT(2<em>X</em>) always be uniform on {0, 1} ?</p>
<p>This question arises in a potential proof that the digits of the number Pi in base 2 (see exercise 7 <a href="https://www.datasciencecentral.com/profiles/blogs/are-the-digits-of-pi-truly-random" target="_blank" rel="noopener">in this article</a>), distributed as INT(2<em>X</em>) and obviously being equal to 0 or 1, are uniformly distributed (50% of 0's and 50% of 1's.) </p>
<p></p>
<p><a href="http://storage.ning.com/topology/rest/1.0/file/get/2059721720?profile=original" target="_self"><img src="http://storage.ning.com/topology/rest/1.0/file/get/2059721720?profile=original" width="271" class="align-center"/></a></p>
<p><strong>Update</strong></p>
<p>I spent more time on this problem, and it is not an easy one. There are actually infinitely many solutions, as many as there are real numbers on [0, 1]. The vast majority of these distributions are nowhere continuous -- they don't have a density. To understand this, do the following simulation:</p>
<ul>
<li>Simulate <em>n</em> random deviates <em>u</em>(<em>n</em>) uniformly distributed on [0, 1].</li>
<li>Generate <em>n</em> numbers <em>d</em>(<em>n</em>) distributed on {-1, +1}. They don't need to be uniformly distributed: they can all be -1 or +1 or any combination of both. For instance, <em>d</em>(<em>n</em>) can be -1 if the <em>n</em>-th digit of Pi in base 2 is zero, and +1 if it is one. You can use any other number instead of Pi, for instance 7/13, and then the final result will be different.</li>
<li>For each <em>n</em>, compute <em>v</em>(<em>n</em>) = <em>d</em>(<em>n</em>) * <em>u</em>(<em>n</em>).</li>
<li>For each <em>n</em>, compute <em>x</em>(<em>n</em>) = (1 + <em>v</em>(<em>n</em>)) / 2.</li>
</ul>
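<p>The four steps above can be sketched in Python as follows. This is a minimal illustration with assumed names (<code>binary_digits</code>, <code>simulate</code>, <code>share_of_ones</code>), a fixed seed, and 1,200 deviates. The signs come either from the binary digits of 7/13, mentioned above as an alternative to Pi, or are all set to +1.</p>

```python
import random
from fractions import Fraction


def binary_digits(x, n):
    """First n binary digits of a rational x in [0, 1), by repeated doubling."""
    digits = []
    for _ in range(n):
        x *= 2
        bit = int(x)  # the integer part is the next digit
        digits.append(bit)
        x -= bit
    return digits


def simulate(signs, seed=0):
    """Return the deviates u(n) and the values x(n) = (1 + d(n) * u(n)) / 2."""
    rng = random.Random(seed)
    us, xs = [], []
    for d in signs:
        u = rng.random()          # uniform on [0, 1)
        us.append(u)
        xs.append((1 + d * u) / 2)
    return us, xs


def share_of_ones(xs):
    """Proportion of x(n) with INT(2x) = 1 (x = 1 is a measure-zero edge case)."""
    return sum(min(int(2 * x), 1) for x in xs) / len(xs)


n = 1200
signs_713 = [1 if b else -1 for b in binary_digits(Fraction(7, 13), n)]
us, xs = simulate(signs_713)
_, xs_plus = simulate([1] * n)

# |2x - 1| recovers u(n), whatever the signs are.
max_gap = max(abs(abs(2 * x - 1) - u) for u, x in zip(us, xs))
print(max_gap, share_of_ones(xs), share_of_ones(xs_plus))
```

<p>In this sketch, the proportion of ones in INT(2<em>X</em>) simply tracks the proportion of +1 among the <em>d</em>(<em>n</em>)'s, while |2<em>x</em>(<em>n</em>) - 1| always reproduces <em>u</em>(<em>n</em>).</p>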
<p>The limiting random variable <em>X</em> attached to the <em>x</em>(<em>n</em>)'s, as <em>n</em> tends to infinity, is a solution to the problem. However, there are as many solutions as there are ways to generate the <em>d</em>(<em>n</em>)'s, and the distribution of INT(2<em>X</em>) will be discrete on {0, 1}, but usually not uniform: it will depend on the proportions of +1 and -1 in the <em>d</em>(<em>n</em>)'s. If you use the number Pi to compute the <em>d</em>(<em>n</em>)'s, it will be uniform.</p> Generalized Coefficient of Correlation for Non-Linear Relationshipstag:www.analyticbridge.datasciencecentral.com,2018-02-12:2004291:Topic:3804122018-02-12T17:24:16.144ZMatt Thompsonhttps://www.analyticbridge.datasciencecentral.com/profile/MattThompson
<p>What is the best correlation coefficient R(<em>X</em>, <em>Y</em>) to measure non-linear dependencies between two variables <em>X</em> and <em>Y</em>? Let's say that you want to assess whether there is a linear or quadratic relationship between <em>X</em> and <em>Y</em>. One way to do it is to perform a polynomial regression such as <em>Y</em> = <em>a</em> + <em>bX</em> + <em>cX</em>^2, and then measure the standard coefficient of correlation between the predicted and observed values. How good is this approach? </p>
<p><a href="http://storage.ning.com/topology/rest/1.0/file/get/2059721641?profile=original" target="_self"><img src="http://storage.ning.com/topology/rest/1.0/file/get/2059721641?profile=original" width="430" class="align-center"/></a></p>
<p>Note that the proposed correlation coefficient R(<em>X</em>, <em>Y</em>) is not symmetric. One way to get a symmetric version, is to use the maximum between | R(<em>X</em>, <em>Y</em>) | and | R(<em>Y</em>, <em>X</em>) |. It will be equal to 1 if and only if there is an exact polynomial or inverse polynomial relationship between <em>X</em> and <em>Y</em>. </p>
<p><strong>Note</strong>: If one checks the model <em>Y</em> = <em>a</em> + <em>bX</em> + <em>cX</em>^2, the "inverse polynomial" model would be <em>X</em> = <em>a'</em> + <em>b'Y</em> + <em>c'Y</em>^2. So, R(<em>X</em>, <em>Y</em>) is computed on the first regression, while R(<em>Y</em>, <em>X</em>) is computed on the second (reversed, also called dual) regression. </p>
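<p>A minimal pure-Python sketch of this coefficient is given below. The function names are assumptions, and the least-squares fit is done with plain normal equations and Gaussian elimination, which is adequate for small, well-scaled examples but is not a numerically robust choice in general. The sample data follow an exact quadratic, <em>Y</em> = <em>X</em>^2 on <em>X</em> = 0, ..., 6.</p>

```python
import math


def fit_polynomial(xs, ys, degree=2):
    """Least-squares polynomial coefficients (constant term first) via normal equations."""
    m = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    for col in range(m):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * m
    for i in reversed(range(m)):  # back substitution
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs


def pearson(a, b):
    """Standard coefficient of correlation between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def poly_corr(xs, ys, degree=2):
    """R(X, Y): correlation between observed ys and the polynomial fit of ys on xs."""
    coeffs = fit_polynomial(xs, ys, degree)
    predicted = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    return pearson(predicted, ys)


def symmetric_poly_corr(xs, ys, degree=2):
    """Symmetric version: maximum of |R(X, Y)| and |R(Y, X)|."""
    return max(abs(poly_corr(xs, ys, degree)), abs(poly_corr(ys, xs, degree)))


xs = list(range(7))
ys = [x * x for x in xs]
r_xy = poly_corr(xs, ys)   # exact quadratic relationship
r_yx = poly_corr(ys, xs)   # dual regression: X is a square root of Y, not a quadratic
print(r_xy, r_yx, symmetric_poly_corr(xs, ys))
```

<p>On this data R(<em>X</em>, <em>Y</em>) is 1 up to rounding, while R(<em>Y</em>, <em>X</em>) is smaller, which illustrates the lack of symmetry.</p>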
<p><strong>Discussion</strong></p>
<p>An issue with my approach is the risk of over-fitting. If you have <em>n</em> observations and <em>n</em> coefficients in the regression, my correlation will always be 1.</p>
<p>There are various ways to avoid this problem, for instance:</p>
<ul>
<li>Use a polynomial of degree 2 maximum, regardless of the number of observations.</li>
<li>Use much smoother functions than polynomials, for instance functions that have at most one extremum (maximum or minimum) and that grow no faster than a linear function. Even in that case, use a small number of coefficients in the regression, maybe log(log(<em>n</em>)), where <em>n</em> is the number of observations.</li>
</ul>
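<p>The over-fitting issue can be illustrated with a short sketch. Assuming pure noise for the observations, a polynomial with as many coefficients as observations (written here in Lagrange form, with hypothetical function names) reproduces every observed value, so the correlation between fitted and observed values is 1 even though there is no relationship at all.</p>

```python
import math
import random


def pearson(a, b):
    """Standard coefficient of correlation between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def interpolate(xs, ys, t):
    """Evaluate at t the polynomial of degree len(xs) - 1 through all points (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (t - xj) / (xi - xj)
        total += term
    return total


rng = random.Random(1)
xs = list(range(8))                 # n = 8 observations
ys = [rng.random() for _ in xs]     # pure noise, unrelated to xs
fitted = [interpolate(xs, ys, x) for x in xs]   # n coefficients: a perfect fit
r_overfit = pearson(fitted, ys)
print(r_overfit)
```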
<p>The correlation coefficient in question can also be used for model selection: The best model would provide the correlation closest to 1.</p> My paper on Differential Geometry on Graphs with applications to Foreign Exchange Option Symmetrytag:www.analyticbridge.datasciencecentral.com,2017-10-02:2004291:Topic:3721022017-10-02T20:55:19.864ZMatt Thompsonhttps://www.analyticbridge.datasciencecentral.com/profile/MattThompson
<p>Dear Colleagues,<br/> <br/> I would like to bring to your attention my paper of 2001 on differential geometry on graphs with applications to the foreign exchange option symmetry in a multiple currency foreign exchange market that might be of interest to you:</p>
<p> </p>
<p>[1] V.A. Kholodnyi and J.F. Price, <i>Foreign Exchange Option Symmetry and a Coordinate-Free Description of a Multiple Currency Market in Terms of Differential Geometry on Graphs,</i> Journal of Nonlinear Analysis, 47 (9) (2001) 5885-5896.</p>
<p> </p>
<p>I introduced the notion of the differential geometry on graphs in 1995 in the following preprint:</p>
<p> </p>
<p>[2] V.A. Kholodnyi, <i>Beliefs-Preferences Gauge Symmetry Group and Dynamic Replication of Contingent Claims in a General Market Environment</i>, IES Preprint, 1995.</p>
<p> </p>
<p>The foreign exchange symmetry was introduced in 1996 in the following preprints:</p>
<p> </p>
<p>[3] V.A. Kholodnyi and J.F. Price, <i>Foreign Exchange Option Symmetry in a General Market Environment</i>, IES Preprint, 1996.</p>
<p> </p>
<p>[4] V.A. Kholodnyi and J.F. Price, <i>Foreign Exchange Option Symmetry in a Multiple Currency General Market Environment</i>, IES Preprint, 1996.</p>
<p> </p>
<p>Please also find below the references to some of my related published books and papers that might be also of interest to you:</p>
<p> </p>
<p>[5] V.A. Kholodnyi, <i>Beliefs-Preferences Gauge Symmetry Group and Replication of Contingent Claims in a General Market Environment</i>, IES Press, Research Triangle Park, North Carolina, 1998.</p>
<p> </p>
<p>[6] V.A. Kholodnyi and J.F. Price, <i>Foreign Exchange Option Symmetry</i>, World Scientific, River Edge, New Jersey, 1998.</p>
<p> </p>
<p>[7] V.A. Kholodnyi and J.F. Price, <i>Foundations of Foreign Exchange Option Symmetry</i>, IES Press, Research Triangle Park, North Carolina, 1998.</p>
<p> </p>
<p>[8] V.A. Kholodnyi, <i>Valuation and Dynamic Replication of Contingent Claims in the Framework of the Beliefs-Preferences Gauge Symmetry,</i> European Physical Journal B, 27 (2) (2002) 229-238.</p>
<p> </p>
<p>[9] V.A. Kholodnyi and J.F. Price, <i>Foreign Exchange Option Symmetry Based on Domestic-Foreign Payoff Invariance</i>, Proceedings of the IEEE/IAFE Conference on Computational Intelligence for Financial Engineering (CIFEr), New York, (1997), 164-170.</p>
<p> </p>
<p>For further information about my related books and papers, please see my ResearchGate profile: <a href="http://www.researchgate.net/profile/Valery_Kholodnyi">http://www.researchgate.net/profile/Valery_Kholodnyi</a>.</p>
<p><br/> Please let me know if you have questions or would like further information.<br/> <br/> Sincerely,<br/> Valery Kholodnyi</p>