<p><strong>Data Mining Software - AnalyticBridge</strong> (forum category feed, retrieved 2019-03-20)</p>
<p><strong>Can Python do the following?</strong> (2018-11-13, by <a href="https://www.analyticbridge.datasciencecentral.com/profile/VincentGranville">Vincent Granville</a>)</p>
<p>These were features that I liked in Perl. Wondering if there is a way to make it work with Python?</p>
<ul>
<li>Automated memory allocation / de-allocation (for variables, arrays, hash tables etc.)</li>
<li>Turning your program into an executable (that is, pre-compiled.)</li>
<li>Automated variable initialization (variables, arrays don't even need to be declared, much less initialized)</li>
<li>Automated type casting (e.g. automatically treating a same variable as an integer or string depending on the context: integer when performing a multiplication, or string for concatenation)</li>
</ul>
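<p>For what it's worth, Python covers most of these natively: memory is allocated and freed automatically (reference counting plus a cycle-detecting garbage collector), and variables need no declaration before first assignment; third-party tools such as PyInstaller or cx_Freeze can bundle a script into a standalone executable. The main difference is the last point: Python will not silently coerce between strings and integers the way Perl does. A minimal sketch:</p>

```python
# Variables and containers spring into existence on first use,
# and are garbage-collected automatically when no longer referenced.
counts = {}                                      # like a Perl hash
for word in "a b a c a b".split():
    counts[word] = counts.get(word, 0) + 1       # no declaration or initialization needed

# Unlike Perl, Python is strongly typed: "3" * 2 is "33" (string repetition),
# not 6 -- numeric context requires an explicit cast.
n = "3"
total = int(n) * 2                               # total == 6
```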
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/134946461?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/134946461?profile=original" class="align-center"/></a></p>
<p>You are going to say that this makes for terrible programming, but in my case I use the code only for myself, and I'd rather focus on the algorithms than on the coding / debugging itself. Also, I am wondering if there are options for automated debugging.</p>
<p>Also wondering how to produce sounds in Python, and which random number generator it uses. Finally, is high-precision computing (like 500 digits of accuracy) reliable in Python, using the default BigNum libraries?</p>
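<p>On the last two questions: Python's standard <code>random</code> module uses the Mersenne Twister generator, and the standard <code>decimal</code> module supports arbitrary-precision arithmetic, so 500-digit computation is available out of the box. A quick sanity check (a sketch, not a rigorous reliability test):</p>

```python
from decimal import Decimal, getcontext

getcontext().prec = 500              # work with 500 significant digits

x = Decimal(2).sqrt()                # sqrt(2) to 500 digits
err = abs(x * x - 2)                 # squaring should nearly recover 2
# err comes out far below 10**-490, i.e. the last digits are trustworthy
```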
<p>Thanks.</p>
<p><strong>Explaining SQL JOINS - Improving on the Classic Venn Diagrams</strong> (2018-10-16, by <a href="https://www.analyticbridge.datasciencecentral.com/profile/TimMiller">Tim Miller</a>)</p>
<p class="yklcuq-10 hpxQMr">Back when I was learning SQL, I was often hung up on the JOIN concept. The venn diagrams were a life saver but as I learned more and used SQL more and more I found that they were not quite enough.</p>
<p class="yklcuq-10 hpxQMr"></p>
<p class="yklcuq-10 hpxQMr">I worked with some of my colleagues at the Data School to try to go a little bit further.</p>
<p class="yklcuq-10 hpxQMr"></p>
<p class="yklcuq-10 hpxQMr">We were trying to keep it VERY basic so some join types and antijoins are not included.</p>
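<p>One concrete way to see the difference between join types is to run them on two toy tables. The sketch below (hypothetical <code>facebook</code>/<code>linkedin</code> name lists, via Python's built-in sqlite3) contrasts INNER JOIN with LEFT JOIN:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE facebook (name TEXT)")
cur.execute("CREATE TABLE linkedin (name TEXT)")
cur.executemany("INSERT INTO facebook VALUES (?)", [("alice",), ("bob",)])
cur.executemany("INSERT INTO linkedin VALUES (?)", [("bob",), ("carol",)])

# INNER JOIN keeps only names present in BOTH tables
inner = cur.execute(
    "SELECT f.name FROM facebook f JOIN linkedin l ON f.name = l.name "
    "ORDER BY f.name"
).fetchall()          # [('bob',)]

# LEFT JOIN keeps every facebook row, with NULL where linkedin has no match
left = cur.execute(
    "SELECT f.name, l.name FROM facebook f LEFT JOIN linkedin l ON f.name = l.name "
    "ORDER BY f.name"
).fetchall()          # [('alice', None), ('bob', 'bob')]
```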
<p class="yklcuq-10 hpxQMr"></p>
<p class="yklcuq-10 hpxQMr">I would love to know what you all think.</p>
<p class="yklcuq-10 hpxQMr"><a href="http://api.ning.com:80/files/-rxv8dNJjcGNjLu0hA-2JlfruPHm1Xwi8ENnh7PW6WPK7QDpM0i6zFUJjgsDeRXEjZpIGvQBdA42s5OlVgjKsSIYqE9qlWKp/Capture.PNG" target="_self"><img src="http://api.ning.com:80/files/-rxv8dNJjcGNjLu0hA-2JlfruPHm1Xwi8ENnh7PW6WPK7QDpM0i6zFUJjgsDeRXEjZpIGvQBdA42s5OlVgjKsSIYqE9qlWKp/Capture.PNG" width="470" class="align-center"/></a></p>
<p class="yklcuq-10 hpxQMr"></p>
<p class="yklcuq-10 hpxQMr">For the full write-up:<span> </span><a target="_blank" class="yklcuq-27 dlFmxw" href="https://dataschool.com/sql-join-types-explained-visualizing-sql-joins-and-building-on-the-classic-venn-diagrams/" rel="noopener">https://dataschool.com/sql-join-types-explained-visualizing-sql-joins-and-building-on-the-classic-venn-diagrams/</a></p>
<p><strong>Powerful, Hybrid Machine Learning Algorithm with Excel Implementation</strong> (2018-06-12, by <a href="https://www.analyticbridge.datasciencecentral.com/profile/VincentGranville">Vincent Granville</a>)</p>
<p><span>In this article, we discuss a general machine learning technique to make predictions or score transactional data, applicable to very big, streaming data. This hybrid technique combines different algorithms to boost accuracy, outperforming each algorithm taken separately, yet it is simple enough to be reliably automated. It is illustrated in the context of predicting the performance of articles published in media outlets or blogs, and has been used by the author to build an AI (artificial intelligence) system to detect articles worth curating, as well as to automatically schedule tweets and other postings in social networks for maximum impact, with a goal of eventually fully automating digital publishing. This application is broad enough that the methodology can be applied to most NLP (natural language processing) contexts with large amounts of unstructured data. The results obtained in our particular case study are also very interesting. </span></p>
<p><span><a href="http://api.ning.com:80/files/NCD3riPer0*YRi2TfzhtvxVa2Hn*6i*MPzHgoIETj1JBO-uK6q5ML6N7dls*Omd-PWy9PpbZBnJKyMjg3QBMcbDlCbe5glCh/Capture.PNG" target="_self"><img src="http://api.ning.com:80/files/NCD3riPer0*YRi2TfzhtvxVa2Hn*6i*MPzHgoIETj1JBO-uK6q5ML6N7dls*Omd-PWy9PpbZBnJKyMjg3QBMcbDlCbe5glCh/Capture.PNG" width="263" class="align-center"/></a></span></p>
<p><span>The algorithmic framework described here applies to any data set, text or not, with quantitative variables, non-quantitative variables (gender, race), or a mix of both. It consists of several components; we discuss in detail those that are new and original. The other, non-original components are briefly mentioned, with references provided for further reading. No deep technical expertise and no mathematical knowledge are required to understand the concepts and methodology described here. The methodology, though state-of-the-art, is simple enough that it can even be implemented in Excel for small data sets (up to one million observations).</span></p>
<p><span>The technique presented here blends non-standard, robust versions of decision trees and regression. It has been successfully used in black-box ML implementations.</span></p>
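<p>As a purely illustrative sketch (not the author's actual algorithm; the function name, the binning scheme, and the 50/50 blend are all invented here), averaging a crude per-bin predictor with an ordinary least-squares line captures the flavor of blending tree-like and regression components:</p>

```python
def fit_blend(xs, ys, n_bins=4):
    """Blend a per-bin-mean predictor (a crude 'decision tree') with OLS regression."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins or 1.0
    # tree-like component: mean response within each of n_bins equal-width bins
    bins = {}
    for x, y in zip(xs, ys):
        b = min(int((x - lo) / width), n_bins - 1)
        bins.setdefault(b, []).append(y)
    bin_mean = {b: sum(v) / len(v) for b, v in bins.items()}
    # regression component: ordinary least squares y = intercept + slope * x
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx

    def predict(x):
        b = min(max(int((x - lo) / width), 0), n_bins - 1)
        tree = bin_mean.get(b, my)
        linear = intercept + slope * x
        return (tree + linear) / 2      # simple 50/50 blend of the two predictors

    return predict

predict = fit_blend(list(range(8)), [2 * x + 1 for x in range(8)])
```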
<p><span>Read full article <a href="https://www.datasciencecentral.com/profiles/blogs/state-of-the-art-machine-learning-automation-with-hdt" target="_blank" rel="noopener">here</a>. </span></p>
<p><em>For related articles from the same author, <a href="http://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles" target="_blank" rel="noopener">click here</a><span>.</span></em></p>
<p><span><b>DSC Resources</b></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes">Free Book: Applied Stochastic Processes</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/comprehensive-repository-of-data-science-and-ml-resources">Comprehensive Repository of Data Science and ML Resources</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/difference-between-machine-learning-data-science-ai-deep-learning">Difference between ML, Data Science, AI, Deep Learning, and Statistics</a></li>
</ul>
<p><strong>Question about Some Statistical Distributions (Updated)</strong> (2018-02-12, by <a href="https://www.analyticbridge.datasciencecentral.com/profile/VincentGranville">Vincent Granville</a>)</p>
<p>What are the potential distributions for a continuous variable <em>X</em> on [0, 1], if |2<em>X</em> - 1| is known to have a uniform distribution on [0, 1]? Will the distribution of INT(2<em>X</em>) always be uniform on {0, 1} ?</p>
<p>This question arises in a potential proof that the digits of the number Pi in base 2 (see exercise 7 <a href="https://www.datasciencecentral.com/profiles/blogs/are-the-digits-of-pi-truly-random" target="_blank" rel="noopener">in this article</a>), distributed as INT(2<em>X</em>) and obviously being equal to 0 or 1, are uniformly distributed (50% of 0's and 50% of 1's.) </p>
<p></p>
<p><a href="http://api.ning.com:80/files/0xzfOxJznOmpCboDGsU6eKqFur9GLP02f3b7cr*2HnqZG3AOukC0xHr3j5DNld8p6NeAnoFQF8U0VQBCamjwB82eyBbaJHHP/Capture.PNG" target="_self"><img src="http://api.ning.com:80/files/0xzfOxJznOmpCboDGsU6eKqFur9GLP02f3b7cr*2HnqZG3AOukC0xHr3j5DNld8p6NeAnoFQF8U0VQBCamjwB82eyBbaJHHP/Capture.PNG" width="271" class="align-center"/></a></p>
<p><strong>Update</strong></p>
<p>I spent more time on this problem, and it is not an easy one. There are actually infinitely many solutions, as many as there are real numbers on [0, 1]. The vast majority of these distributions are nowhere continuous -- they don't have a density. To understand this, do the following simulation:</p>
<ul>
<li>Simulate <em>n</em> random deviates <em>u</em>(<em>n</em>) uniformly distributed on [0, 1].</li>
<li>Generate <em>n</em> numbers <em>d</em>(<em>n</em>) distributed on {-1, +1}. They don't need to be uniformly distributed: they can all be -1, all +1, or any combination of both. For instance, <em>d</em>(<em>n</em>) can be -1 if the <em>n</em>-th digit of Pi in base 2 is zero, and +1 if it is one. You can use any other number instead of Pi, for instance 7/13, and then the final result will be different.</li>
<li>For each <em>n</em>, compute <em>v</em>(<em>n</em>) = <em>d</em>(<em>n</em>) * <em>u</em>(<em>n</em>).</li>
<li>For each <em>n</em>, compute <em>x</em>(<em>n</em>) = (1 + <em>v</em>(<em>n</em>)) / 2.</li>
</ul>
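<p>The four steps above translate directly into a short simulation (a sketch; here <code>d</code> is a caller-supplied function returning -1 or +1 for each index):</p>

```python
import random

def simulate(n, d):
    """x(i) = (1 + d(i) * u(i)) / 2, with u(i) ~ Uniform[0, 1] and d(i) in {-1, +1}.
    Note |2x - 1| = |d(i) * u(i)| = u(i), so |2X - 1| is uniform by construction."""
    xs = []
    for i in range(n):
        u = random.random()        # step 1: uniform deviate on [0, 1]
        v = d(i) * u               # step 3: v(i) = d(i) * u(i)
        xs.append((1 + v) / 2)     # step 4: x(i) = (1 + v(i)) / 2
    return xs

random.seed(42)
# extreme case d(i) = +1 for all i: all the mass lands on [1/2, 1]
xs = simulate(10_000, lambda i: 1)
```

<p>With a fair ±1 coin for <code>d</code>, <em>X</em> comes out uniform on [0, 1]; with all +1's it is uniform on [1/2, 1], illustrating how the choice of the <em>d</em>(<em>n</em>)'s selects among the infinitely many solutions.</p>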
<p>The limiting random variable <em>X</em> attached to the <em>x</em>(<em>n</em>)'s, as <em>n</em> tends to infinity, is a solution to the problem. However, there are as many solutions as there are ways to generate the <em>d</em>(<em>n</em>)'s, and the distribution of INT(2<em>X</em>) will be discrete on {0, 1}, but usually not uniform: it will depend on the proportions of +1 and -1 among the <em>d</em>(<em>n</em>)'s. If you use the number Pi to compute the <em>d</em>(<em>n</em>)'s, it will be uniform.</p>
<p><strong>Generalized Coefficient of Correlation for Non-Linear Relationships</strong> (2018-02-12, by <a href="https://www.analyticbridge.datasciencecentral.com/profile/VincentGranville">Vincent Granville</a>)</p>
<p>What is the best correlation coefficient R(<em>X</em>, <em>Y</em>) to measure non-linear dependencies between two variables <em>X</em> and <em>Y</em>? Let's say that you want to assess whether there is a linear or quadratic relationship between <em>X</em> and <em>Y</em>. One way to do it is to perform a polynomial regression such as <em>Y</em> = <em>a</em> + <em>bX</em> + <em>cX</em>^2, and then measure the standard coefficient of correlation between the predicted and observed values. How good is this approach? </p>
<p><a href="http://api.ning.com:80/files/xYth05jzfSephrvY6EUOUAflcMV3nnFq9JsBfb0n3629zDiOADK-uF0nHn9onvIZB9yG200WEbFKKsOoVQQKFeM2niEvweN*/Capture.PNG" target="_self"><img src="http://api.ning.com:80/files/xYth05jzfSephrvY6EUOUAflcMV3nnFq9JsBfb0n3629zDiOADK-uF0nHn9onvIZB9yG200WEbFKKsOoVQQKFeM2niEvweN*/Capture.PNG" width="430" class="align-center"/></a></p>
<p>Note that the proposed correlation coefficient R(<em>X</em>, <em>Y</em>) is not symmetric. One way to get a symmetric version, is to use the maximum between | R(<em>X</em>, <em>Y</em>) | and | R(<em>Y</em>, <em>X</em>) |. It will be equal to 1 if and only if there is an exact polynomial or inverse polynomial relationship between <em>X</em> and <em>Y</em>. </p>
<p><strong>Note</strong>: If one checks the model <em>Y</em> = <em>a</em> + <em>b</em>X + <em>c</em>X^2, the "inverse polynomial" model would be <em>X</em> = <em>a'</em> + <em>b'Y</em> + <em>c'Y</em>^2. So, R(<em>X</em>, <em>Y</em>) is computed on the first regression, while R(<em>Y</em>, <em>X</em>) is computed on the second (reversed, also called dual) regression. </p>
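<p>A sketch of this measure in plain Python (fit <em>Y</em> = <em>a</em> + <em>bX</em> + <em>cX</em>^2 by solving the 3x3 normal equations, then take the Pearson correlation between fitted and observed values; the function name is invented for illustration):</p>

```python
import math

def quad_r(x, y):
    """Nonlinear R(X, Y): correlation between observed y and a quadratic fit of y on x."""
    n = len(x)
    cols = [[1.0] * n, list(x), [xi * xi for xi in x]]   # design columns 1, x, x^2
    # normal equations A . coef = b
    A = [[sum(ci * cj for ci, cj in zip(cols[i], cols[j])) for j in range(3)] for i in range(3)]
    b = [sum(ci * yi for ci, yi in zip(cols[i], y)) for i in range(3)]
    # Gaussian elimination with partial pivoting
    for k in range(3):
        p = max(range(k, 3), key=lambda r: abs(A[r][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for r in range(k + 1, 3):
            f = A[r][k] / A[k][k]
            for c in range(k, 3):
                A[r][c] -= f * A[k][c]
            b[r] -= f * b[k]
    coef = [0.0, 0.0, 0.0]
    for k in (2, 1, 0):                                  # back-substitution
        coef[k] = (b[k] - sum(A[k][j] * coef[j] for j in range(k + 1, 3))) / A[k][k]
    fitted = [coef[0] + coef[1] * xi + coef[2] * xi * xi for xi in x]
    mf, my = sum(fitted) / n, sum(y) / n
    num = sum((f - mf) * (yi - my) for f, yi in zip(fitted, y))
    den = math.sqrt(sum((f - mf) ** 2 for f in fitted) * sum((yi - my) ** 2 for yi in y))
    return num / den

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
r = quad_r(xs, [1 + 2 * v + 3 * v * v for v in xs])   # exact quadratic relationship
```

<p>The symmetric version described above would then be max(| quad_r(x, y) |, | quad_r(y, x) |).</p>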
<p><strong>Discussion</strong></p>
<p>An issue with my approach is the risk of over-fitting. If you have <em>n</em> observations and <em>n</em> coefficients in the regression, my correlation will always be 1.</p>
<p>There are various ways to avoid this problem, for instance:</p>
<ul>
<li>Use a polynomial of degree 2 maximum, regardless of the number of observations.</li>
<li>Use much smoother functions than polynomials, for instance functions that have at most one extremum (maximum or minimum) and grow no faster than a linear function. Even in that case, use a small number of coefficients in the regression, maybe log(log(<em>n</em>)) where <em>n</em> is the number of observations.</li>
</ul>
<p>The correlation coefficient in question can also be used for model selection: the best model is the one whose correlation is closest to 1.</p>
<p><strong>Two Great Courses on Deep Learning and AI</strong> (2017-08-10, by <a href="https://www.analyticbridge.datasciencecentral.com/profile/VincentGranville">Vincent Granville</a>)</p>
<p><strong>Deep Learning, Neural Networks and AI</strong></p>
<p>The course is a new one by <a href="https://www.coursera.org/instructor/andrewng" target="_blank" rel="noopener noreferrer">Andrew Ng</a><span>, Co-founder, Coursera; Adjunct Professor, Stanford University; formerly head of Baidu AI Group/Google Brain. It will start Aug 15. </span></p>
<p><span>About this course: <span>If you want to break into cutting-edge AI, this course will help you do so. Deep learning engineers are highly sought after, and mastering deep learning will give you numerous new career opportunities. Deep learning is also a new "superpower" that will let you build AI systems that just weren't possible a few years ago.</span></span></p>
<p><span>In this course, you will learn the foundations of deep learning. When you finish this class, you will:</span></p>
<ul>
<li>Understand the major technology trends driving Deep Learning</li>
<li>Be able to build, train and apply fully connected deep neural networks</li>
<li>Know how to implement efficient (vectorized) neural networks</li>
<li>Understand the key parameters in a neural network's architecture</li>
</ul>
<p><strong>Deep Learning with TensorFlow</strong></p>
<p><span>To help make deep learning even more accessible to engineers and data scientists at large, Google has launched a free Deep Learning Course. This short, intensive course provides you with all the basic tools and vocabulary to get started with deep learning, and walks you through how to use it to address some of the most common machine learning problems. It is also accompanied by interactive TensorFlow notebooks that directly mirror and implement the concepts introduced in the lectures. </span></p>
<p><span>Links to these two courses <a href="http://www.datasciencecentral.com/profiles/blogs/two-great-courses-on-deep-learning-and-ai" target="_blank">are provided here</a>. </span></p>
<p><span><span><span class="font-size-4"><b>DSC Resources</b></span></span></span></p>
<ul>
<li>Services: <a href="http://careers.analytictalent.com/jobs/products">Hire a Data Scientist</a> | <a href="http://www.datasciencecentral.com/page/search?q=Python">Search DSC</a> | <a href="http://classifieds.datasciencecentral.com/">Classifieds</a> | <a href="http://www.analytictalent.com/">Find a Job</a></li>
<li>Contributors: <a href="http://www.datasciencecentral.com/profiles/blog/new">Post a Blog</a> | <a href="http://www.datasciencecentral.com/forum/topic/new">Ask a Question</a></li>
<li>Follow us: <a href="http://www.twitter.com/datasciencectrl">@DataScienceCtrl</a> | <a href="http://www.twitter.com/analyticbridge">@AnalyticBridge</a></li>
</ul>
<p><span>Popular Articles</span></p>
<ul>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/difference-between-machine-learning-data-science-ai-deep-learning">Difference between Machine Learning, Data Science, AI, Deep Learnin...</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/20-articles-about-core-data-science">What is Data Science? 24 Fundamental Articles Answering This Question</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/hitchhiker-s-guide-to-data-science-machine-learning-r-python">Hitchhiker's Guide to Data Science, Machine Learning, R, Python</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/advanced-machine-learning-with-basic-excel">Advanced Machine Learning with Basic Excel</a></li>
</ul> SPSS Statistics Conference (York, UK)tag:www.analyticbridge.datasciencecentral.com,2017-07-03:2004291:Topic:3664672017-07-03T14:35:44.157ZPeter Watsonhttps://www.analyticbridge.datasciencecentral.com/profile/PeterWatson
<p> <strong><em>ASSESS:IBM SPSS STATS USER GROUP</em></strong> <b><i>COURSES AND TALKS </i></b></p>
<p align="left"><b><i>UNIVERSITY OF YORK, UK </i></b></p>
<p align="left"><b><i>FRIDAY 10TH NOVEMBER 2017</i></b></p>
<p> </p>
<p>ASSESS is an independent, user-led group for IBM SPSS Statistics, a computer package for analysing and presenting data.</p>
<p> </p>
<p>The 31<sup>st</sup> annual IBM SPSS Statistics users group meeting is provisionally planned to be held at the Alcuin Research Resource Centre, University of York, on Friday 10<sup>th</sup> November 2017. Workshop topics are at <a href="http://spssusers.co.uk/Events/2017/index.html">http://spssusers.co.uk/Events/2017/index.html</a></p>
<p>The workshops will be taught in an interactive, hands-on workshop-style format, with frequent examples. A full set of notes and example files will be given to all workshop attendees on memory sticks. There will also be handouts at the users' talk sessions. The booking fee includes coffee breaks but not overnight accommodation. A buffet lunch is included for those attending <u>both</u> a morning and an afternoon event. Further details will be sent to delegates upon receipt of booking forms.</p>
<p></p>
<p align="center"><b>PC Lab Course Room 1</b></p>
<p align="center"><b>Programme</b></p>
<p><b>PARALLEL Sessions (ONE DAY COURSE 10am to 4-30pm)</b></p>
<p><b>An introduction to IBM SPSS Statistics for complete beginners by Keith Bentley, University of Salford</b></p>
<p><b>(Includes a buffet lunch and two coffee breaks mid-morning and mid-afternoon)</b></p>
<p align="center"><b>PC Lab Course Room 2</b></p>
<p align="center"><b>Programme</b></p>
<p><b>PARALLEL Sessions (ONE DAY COURSE 10am to 4-30pm)</b></p>
<p><b>Determining area types: by classification using k-means cluster analysis and by developing a deprivation index using SPSS by Paul Norman, University of Leeds</b></p>
<p><b>(Includes a buffet lunch and two coffee breaks mid-morning and mid-afternoon)</b></p>
<p align="center"><b>Auditorium talks**</b></p>
<p align="center"><b>Programme (in talk order)</b></p>
<p><b>Morning PARALLEL Session (10am to 12-50pm)</b></p>
<p> </p>
<p>• <b>An introduction to using Exploratory Factor Analysis in IBM SPSS Statistics</b> by Anne Laure Humbert, University of Cranfield</p>
<p> </p>
<p>• <b>Showcasing the use of Factor Analysis in data reduction: Research on learner support for In-service teachers</b> by Richard Ouma, University of York</p>
<p> </p>
<p>• <b>A Cluster Analysis of local authority data using IBM SPSS Statistics</b> by Kitty Lymperopoulou, Cathie Marsh Centre, University of Manchester</p>
<p><b> </b></p>
<p><b>Lunch</b></p>
<p><b>Afternoon PLENARY session (1-35pm – 2-05pm)</b></p>
<p><b> </b></p>
<p>• <b>A case study using item response theory methods in IBM SPSS</b> (recorded webinar) by Dusan Magula, IBM SPSS. This is a <u>plenary talk</u>.</p>
<p> </p>
<p><b>Afternoon PARALLEL session (2-10pm – 3-30pm)</b></p>
<p> </p>
<p>• <b>Oh, how the tables have turned – automating SPSS tables commands to create bulk tabulations, using the ELSA wave 7 dataset</b> by Migle Aleksejunaite and Tania Dimitrova, National Centre for Social Research</p>
<p> </p>
<p>• <b>An SPSS users clinic made up of experienced IBM SPSS users (make-up to be confirmed nearer the time). Attendees are invited to submit questions for the panel.</b></p>
<p><u> </u></p>
<p> </p>
<p>Note that for UK participants, NCRM bursaries of up to 500 pounds may be available to help fund your fees: Details at <a href="http://www.ncrm.ac.uk/TandE/bursary/">http://www.ncrm.ac.uk/TandE/bursary/</a></p>
<p> </p>
<p>____________________________</p>
<p>** The titles and order of events are subject to amendment</p>
<p></p>
<p> </p>
<p>BOOKING FORM</p>
<p><b><i>Important:</i></b></p>
<p>Bookings will not be treated as firm until a cheque or official (company) order, payable to ASSESS, is received. Note payment can also be made by BACS but not by credit card. Details on request. Please indicate if you require a receipt of payment. Deadline for bookings: Friday 27<sup>th</sup> October 2017. Please note we reserve the right to cancel the workshops and/or user sessions if there are insufficient numbers.</p>
<p>Name: ...................................................................... Tel: ...................................................</p>
<p>Email: ...................................................................... Fax: ...................................................</p>
<p> </p>
<p>Job Title: </p>
<p>Organization:............................................................................................................................ </p>
<p>Address: </p>
<p> </p>
<p> Postcode ...............................</p>
<p> </p>
<p><b>Only for those attending the users clinic in the afternoon</b>: If there is a question regarding IBM SPSS Statistics you would like to ask the panel please mention it here (alternatively questions may be submitted on the meeting day to the registration desk no later than 12-50pm on Friday 10th November 2017):</p>
<p> </p>
<p>………………………………………………………………………………………………..</p>
<p> </p>
<p>………………………………………………………………………………………………..</p>
<p> </p>
<p>If having lunch: Specify vegetarian or other dietary requirements, if any:</p>
<p>..................................................................................................................................................</p>
<p> </p>
<p> </p>
<table width="100%" cellspacing="0">
<tbody><tr><td> <div class="shape"> <p><b>The booking form continues on the next page</b></p>
</div>
</td>
</tr>
</tbody>
</table>
<p> </p>
<p></p>
<p> </p>
<p> </p>
<p></p>
<p><b>Booking form (continued)</b></p>
<p>Names of attendees and event(s) they wish to attend (Please tick as appropriate):</p>
<p> </p>
<table width="703" border="1" cellspacing="0">
<tbody><tr><td width="141" valign="top"><p> </p>
</td>
<td width="95" valign="top" colspan="3"><p align="center">Friday morning parallel sessions</p>
</td>
<td width="95" valign="top" colspan="3"><p align="center">Friday afternoon parallel sessions</p>
</td>
</tr>
<tr><td width="141" valign="top"><p><i>Names for badges</i> </p>
</td>
<td width="95" valign="top"><p>‘An introduction to IBM SPSS Statistics for complete beginners’ workshop</p>
<p>(am session)</p>
</td>
<td width="89" valign="top"><p align="center">‘Determining area types: by classification using k-means cluster analysis and by developing a deprivation index using SPSS’ workshop</p>
<p>(am session)</p>
</td>
<td width="86" valign="top"><p align="center">IBM SPSS Statistics Users Meeting Morning talks Session</p>
<p align="center"> </p>
<p align="center"> </p>
</td>
<td width="95" valign="top"><p>‘An introduction to IBM SPSS Statistics for complete beginners’ workshop</p>
<p>(pm session)</p>
</td>
<td width="86" valign="top"><p align="center">‘Determining area types: by classification using k-means cluster analysis and by developing a deprivation index using SPSS’ workshop</p>
<p align="center">(pm session)</p>
</td>
<td width="75" valign="top"><p align="center">IBM SPSS Statistics Users Meeting Afternoon talks Session & users clinic</p>
<p align="center"> </p>
</td>
</tr>
<tr><td width="141" valign="top"><p> </p>
</td>
<td width="95" valign="top"><p> </p>
</td>
<td width="89" valign="top"><p align="center"> </p>
</td>
<td width="86" valign="top"><p> </p>
</td>
<td width="95" valign="top"><p> </p>
</td>
<td width="86" valign="top"><p> </p>
</td>
<td width="75" valign="top"><p> </p>
</td>
</tr>
<tr><td width="141" valign="top"><p> </p>
</td>
<td width="95" valign="top"><p> </p>
</td>
<td width="89" valign="top"><p align="center"> </p>
</td>
<td width="86" valign="top"><p> </p>
</td>
<td width="95" valign="top"><p> </p>
</td>
<td width="86" valign="top"><p> </p>
</td>
<td width="75" valign="top"><p> </p>
</td>
</tr>
<tr><td width="141" valign="top"><p> </p>
</td>
<td width="95" valign="top"><p> </p>
</td>
<td width="89" valign="top"><p align="center"> </p>
</td>
<td width="86" valign="top"><p> </p>
</td>
<td width="95" valign="top"><p> </p>
</td>
<td width="86" valign="top"><p> </p>
</td>
<td width="75" valign="top"><p> </p>
</td>
</tr>
<tr><td width="141" valign="top"><p> </p>
</td>
<td width="95" valign="top"><p> </p>
</td>
<td width="89" valign="top"><p align="center"> </p>
</td>
<td width="86" valign="top"><p> </p>
</td>
<td width="95" valign="top"><p> </p>
</td>
<td width="86" valign="top"><p> </p>
</td>
<td width="75" valign="top"><p> </p>
</td>
</tr>
<tr><td width="141" valign="top"><p> </p>
</td>
<td width="95" valign="top"><p> </p>
</td>
<td width="89" valign="top"><p align="center"> </p>
</td>
<td width="86" valign="top"><p> </p>
</td>
<td width="95" valign="top"><p> </p>
</td>
<td width="86" valign="top"><p> </p>
</td>
<td width="75" valign="top"><p> </p>
</td>
</tr>
</tbody>
</table>
<p> </p>
<p>(Please enter the appropriate amounts on the registration form overleaf)</p>
<p>Cheque or official order enclosed for <b>____</b>GBP </p>
<p><b>For official orders please also give here the number and address for invoicing</b>:</p>
<p>.................................................................................................................................................</p>
<p><b>Return completed forms to Peter Watson, ASSESS, 15 Chaucer Road, Cambridge CB2 7EF by FRIDAY 27<sup>TH</sup> OCTOBER 2017.</b></p>
<p>Telephone enquiries about bookings: 01223 355294 or 01223 273712 (direct line) (has an answerphone). E-mail enquiries about bookings: peter.watson@mrc-cbu.cam.ac.uk (important: put ‘<b>ASSESS</b>’ in the Subject field)</p>
<p> </p>
<table width="100%" cellspacing="0">
<tbody><tr><td> <div class="shape"> <p><b>The booking form continues on the next page</b></p>
</div>
</td>
</tr>
</tbody>
</table>
<p> </p>
<p></p>
<p><b> </b></p>
<p></p>
<p><b>Booking form (continued)</b> REGISTRATION FORM</p>
<table width="644" border="1" cellspacing="0">
<tbody><tr><td width="274" nowrap="nowrap" valign="bottom"><p> </p>
</td>
<td width="68" valign="top"><p align="center"><b>First <br/> delegate</b></p>
</td>
<td width="28"><p align="center"><b>✓</b></p>
</td>
<td width="104" valign="top"><p align="center"><b>Subsequent delegates</b></p>
</td>
<td width="47"><p align="center"><b>No.</b></p>
</td>
<td width="76" valign="top"><p align="center"><b>Students or</b></p>
<p align="center"><b>Retired</b></p>
</td>
<td width="47"><p align="center"><b>No.</b></p>
</td>
</tr>
<tr><td width="274" nowrap="nowrap"><p>Either ‘An introduction to IBM SPSS for complete beginners’ morning session <u>or</u> ‘Determining area types: by classification using k-means cluster analysis and by developing a deprivation index using SPSS’ workshop morning session</p>
</td>
<td width="68" nowrap="nowrap"><p align="center">£90</p>
</td>
<td width="28"><p align="center"> </p>
</td>
<td width="104" nowrap="nowrap"><p align="center">£70</p>
</td>
<td width="47"><p align="center"> </p>
</td>
<td width="76" nowrap="nowrap"><p align="center">£40</p>
</td>
<td width="47"><p> </p>
</td>
</tr>
<tr><td width="274" nowrap="nowrap"><p>Either ‘An introduction to IBM SPSS for complete beginners’ afternoon session <u>or</u> ‘Determining area types: by classification using k-means cluster analysis and by developing a deprivation index using SPSS’ workshop afternoon session</p>
</td>
<td width="68" nowrap="nowrap"><p align="center">£90</p>
</td>
<td width="28"><p align="center"> </p>
</td>
<td width="104" nowrap="nowrap"><p align="center">£70</p>
</td>
<td width="47"><p align="center"> </p>
</td>
<td width="76" nowrap="nowrap"><p align="center">£40</p>
</td>
<td width="47"><p> </p>
</td>
</tr>
<tr><td width="274" nowrap="nowrap"><p>One day workshop (includes lunch)</p>
</td>
<td width="68" nowrap="nowrap"><p align="center">£160</p>
</td>
<td width="28"><p align="center"> </p>
</td>
<td width="104" nowrap="nowrap"><p align="center">£130</p>
</td>
<td width="47"><p align="center"> </p>
</td>
<td width="76" nowrap="nowrap"><p align="center">£75</p>
</td>
<td width="47"><p> </p>
</td>
</tr>
<tr><td width="274" nowrap="nowrap"><p>Either Users Meeting morning talks or afternoon talks & IBM SPSS Statistics afternoon Users clinic session</p>
</td>
<td width="68" nowrap="nowrap"><p align="center">£50</p>
</td>
<td width="28"><p align="center"> </p>
</td>
<td width="104" nowrap="nowrap"><p align="center">£45</p>
</td>
<td width="47"><p align="center"> </p>
</td>
<td width="76" nowrap="nowrap"><p align="center">£25</p>
</td>
<td width="47"><p> </p>
</td>
</tr>
<tr><td width="274" nowrap="nowrap"><p><u>Both</u> Users Meeting morning talks and afternoon talks & IBM SPSS Statistics users clinic (includes lunch)</p>
</td>
<td width="68" nowrap="nowrap"><p align="center">£80</p>
</td>
<td width="28"><p align="center"> </p>
</td>
<td width="104" nowrap="nowrap"><p align="center">£60</p>
</td>
<td width="47"><p align="center"> </p>
</td>
<td width="76" nowrap="nowrap"><p align="center">£40</p>
</td>
<td width="47"><p> </p>
</td>
</tr>
<tr><td width="274" nowrap="nowrap"><p>A single Users meeting session (morning talks or afternoon talks & IBM SPSS Statistics clinic) plus a (morning or afternoon) workshop session (includes lunch)</p>
</td>
<td width="68" nowrap="nowrap"><p align="center">£150</p>
</td>
<td width="28"><p align="center"> </p>
</td>
<td width="104" nowrap="nowrap"><p align="center">£120</p>
</td>
<td width="47"><p align="center"> </p>
</td>
<td width="76" nowrap="nowrap"><p align="center">£70</p>
</td>
<td width="47"><p> </p>
</td>
</tr>
<tr><td width="274" nowrap="nowrap"><p>Lunch before or after a single Users Meeting session (morning talks or afternoon talks & IBM SPSS Statistics clinic)</p>
</td>
<td width="68" nowrap="nowrap"><p align="center">£10</p>
</td>
<td width="28"><p align="center"> </p>
</td>
<td width="104" nowrap="nowrap"><p align="center">£10</p>
</td>
<td width="47"><p align="center"> </p>
</td>
<td width="76" nowrap="nowrap"><p align="center">£5</p>
</td>
<td width="47"><p> </p>
</td>
</tr>
<tr><td width="274" nowrap="nowrap"><p>Lunch before or after a single workshop</p>
</td>
<td width="68" nowrap="nowrap"><p align="center">£10</p>
</td>
<td width="28"><p align="center"> </p>
</td>
<td width="104" nowrap="nowrap"><p align="center">£10</p>
</td>
<td width="47"><p align="center"> </p>
</td>
<td width="76" nowrap="nowrap"><p align="center">£5</p>
</td>
<td width="47"><p> </p>
</td>
</tr>
<tr><td width="274" nowrap="nowrap"><p><b>TOTALS</b></p>
</td>
<td width="68" nowrap="nowrap" colspan="2"><p><b>£</b></p>
</td>
<td width="151" nowrap="nowrap" colspan="2"><p><b>£</b></p>
</td>
<td width="123" nowrap="nowrap" colspan="2"><p><b>£</b></p>
</td>
</tr>
<tr><td width="274" nowrap="nowrap"><p><b>GRAND TOTAL</b></p>
</td>
<td width="68" nowrap="nowrap" colspan="4"><p> </p>
</td>
<td width="123" nowrap="nowrap" colspan="2"><p><b>£</b></p>
</td>
</tr>
</tbody>
</table>
<p><b>We are on facebook</b>: <a href="https://www.facebook.com/groups/assess.spssusers/">https://www.facebook.com/groups/assess.spssusers/</a></p> Java versus C++ (funny)tag:www.analyticbridge.datasciencecentral.com,2017-06-12:2004291:Topic:3658522017-06-12T17:19:21.008ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p>I could not resist posting it. Note that I do not share the views expressed in this video.</p>
<p><iframe width="560" height="315" src="https://www.youtube.com/embed/pkdz5kFuLlo?wmode=opaque" frameborder="0" allowfullscreen=""></iframe>
</p>
<p>Enjoy!</p>
<p></p>
<p></p> Converting infinite series to finite series, a problem cited here in 2016, is 4000 years old.tag:www.analyticbridge.datasciencecentral.com,2017-06-11:2004291:Topic:3660432017-06-11T22:02:43.211ZMilo Gardnerhttps://www.analyticbridge.datasciencecentral.com/profile/MiloGardner
IEEE has long noted that the Egyptians were the first to convert base 10 to a form of binary arithmetic. Old Kingdom scribes seem to have rounded off base-10 numerals and rational numbers by throwing away units as large as 1/64. The nearby Babylonians rounded off much smaller units in their base-60 numeration system, which wrote 1/91 as 1/90.<br />
<br />
However, by 2050 BCE Egyptian scribes formalized an exact numeration system, as the Kahun Papyrus and Ahmes (RMP) 2/n tables report<br />
<br />
<a href="http://rmprectotable.blogspot.com/">http://rmprectotable.blogspot.com/</a><br />
<br />
The scribes scaled n/p by an LCM m/m to mn/mp, finding the best divisors of mp that summed to mn. Ahmes often recorded the divisors in red auxiliary numbers before five-term or shorter unit fraction series were recorded in a ciphered numeration system.<br />
<br />
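A rough computational sketch of this red-auxiliary idea (my own illustration, not a reconstruction of the scribes' actual procedure): scale 2/p by a chosen m/m, then search the divisors of mp for a subset summing to 2m. For 2/7 scaled by 4/4, the divisors 7 + 1 of 28 sum to 8, giving 2/7 = 1/4 + 1/28.

```python
from fractions import Fraction
from itertools import combinations

def two_over_p(p, m):
    """Decompose 2/p as unit fractions by scaling it to 2m/(m*p) and
    finding divisors of m*p that sum to 2*m (the red auxiliary numbers)."""
    target, denom = 2 * m, m * p
    divisors = [d for d in range(1, denom + 1) if denom % d == 0]
    for r in range(1, len(divisors) + 1):
        for combo in combinations(divisors, r):
            if sum(combo) == target:
                # each divisor d contributes d/(m*p) = 1/((m*p)//d)
                return [Fraction(d, denom) for d in sorted(combo, reverse=True)]
    return None

parts = two_over_p(7, 4)   # 2/7 scaled by 4/4 -> 8/28 = 7/28 + 1/28
print(parts)               # [Fraction(1, 4), Fraction(1, 28)]
```

With p = 13 and m = 8 the same search recovers 2/13 = 1/8 + 1/52 + 1/104, which matches the decomposition recorded in the RMP 2/n table.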
Let me stop here. Does anyone wish to comment on this, or on deeper historical threads showing that scribes also used the modern number theory property that division was the inverse of multiplication, and multiplication the inverse of division, literally?<br />
<br />
Best Regards to all. Yet Another Interesting Math Problem - The Collatz Conjecturetag:www.analyticbridge.datasciencecentral.com,2017-05-28:2004291:Topic:3645392017-05-28T22:39:18.343ZDecision Sciencehttps://www.analyticbridge.datasciencecentral.com/profile/DecisionSciences
<p>Take any positive integer <em>n</em>. If <em>n</em> is even, divide it by 2 to get <em>n</em> / 2. If <em>n</em> is odd, multiply it by 3 and add 1 to obtain 3<em>n</em> + 1. Repeat the process indefinitely. Does the sequence eventually reach 1, regardless of the initial value? For instance, if you start with the number <span>75,128,138,247, you eventually reach 1 after 1228 steps. If you start with the number 27, you climb as high as 9,232, but eventually reach 1 after 111 steps.</span></p>
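These step counts are easy to verify; here is a minimal Python sketch of the process:

```python
def collatz_stats(n):
    """Return (number of steps to reach 1, highest value visited)."""
    steps, peak = 0, n
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        peak = max(peak, n)
        steps += 1
    return steps, peak

print(collatz_stats(27))   # (111, 9232)
```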
<p>This is supposed to be a very difficult problem. Note that if a sequence reaches any power of 2 (say, 64), or any intermediate number found in the trillions of trillions of sequences already known to reach 1, then the sequence in question will obviously reach 1 too. For a sequence not to reach 1, its first element (as well as every subsequent element) would have to differ from every initial or intermediate number found in any sequence identified as reaching 1 so far. This makes a counterexample seem highly unlikely, yet the conjecture remains unproven.</p>
<p><a href="http://api.ning.com:80/files/JPX5gKnxX7Kg0n0NNbuthrIVH6vlbrRvZs*a*9Do6EYGUaERueO2lERXbzLXwmfMjxtu0xt8MhVi5BJTaFjCTAew1mZAM2Zw/Capture.PNG" target="_self"><img width="193" class="align-center" src="http://api.ning.com:80/files/JPX5gKnxX7Kg0n0NNbuthrIVH6vlbrRvZs*a*9Do6EYGUaERueO2lERXbzLXwmfMjxtu0xt8MhVi5BJTaFjCTAew1mZAM2Zw/Capture.PNG?width=193"/></a></p>
<p></p>
<p>For more on this problem, as well as the above picture linked to it, <a href="https://en.wikipedia.org/wiki/3x_%2B_1_problem" target="_blank">click here</a>. </p>
<p>It is interesting to note that if you replace the deterministic algorithm by a probabilistic one, for instance <em>n</em> becomes <em>n</em> / 2 with probability 0.5 and 3<em>n</em> + 1 with probability 0.5, then instead of reaching 1, you reach infinity. Also, with the deterministic algorithm, if you replace 3<em>n</em> + 1 by 2<em>n</em> + 1, you would think that you would reach 1 even faster, but this is not the case: you touch 1 only if the initial value is a power of 2, and since 1 then maps to 3, every sequence eventually grows to infinity. </p>
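A quick simulation supports the claim that the randomized variant diverges (a sketch only; I interpret <em>n</em> / 2 as floor division when <em>n</em> is odd, an assumption not spelled out above). The expected change in log <em>n</em> per step is 0.5 log(1/2) + 0.5 log 3 &gt; 0, so the walk drifts upward.

```python
import random

def random_collatz(n, steps, seed=42):
    """Apply n -> n // 2 or n -> 3n + 1, each with probability 0.5."""
    rng = random.Random(seed)
    for _ in range(steps):
        n = n // 2 if rng.random() < 0.5 else 3 * n + 1
    return n

# After 5,000 random steps starting from 27, the value is astronomically
# large (on the order of hundreds of decimal digits).
print(len(str(random_collatz(27, 5000))), "digits")
```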
<p><strong>Possible proof</strong></p>
<p>If you want to prove (or disprove) this conjecture, a possible methodology is as follows. Define the one-step map f(<em>x</em>) = <em>x</em> / 2 if <em>x</em> is even, and f(<em>x</em>) = 3<em>x</em> + 1 otherwise. Then recursively define f(<em>k</em>+1, <em>n</em>) = f(f(<em>k</em>, <em>n</em>)) for <em>k</em> = 0, 1, 2 and so on, with f(0, <em>n</em>) = <em>n</em>. The conjecture states that no matter the initial value <em>n</em>, there is always a number <em>k</em> (a function of <em>n</em>) such that f(<em>k</em>, <em>n</em>) = 1: in short, you reach 1 after <em>k</em> steps.</p>
<p>Consider the following four cases, each occurring with a probability 0.25 (Mod stands for the <a href="https://en.wikipedia.org/wiki/Modulo_operation" target="_blank">modulo operator</a>):</p>
<ol>
<li>Mod(<em>n</em>, 4) = 0. Then f(2, <em>n</em>) = <em>n</em> / 4.</li>
<li>Mod(<em>n</em>, 4) = 1. Then f(3, <em>n</em>) = (3<em>n</em> + 1) / 4.</li>
<li>Mod(<em>n</em>, 4) = 2. Then f(1, <em>n</em>) = <em>n</em> / 2.</li>
<li>Mod(<em>n</em>, 4) = 3. This case is broken down into two sub-cases, see below.</li>
</ol>
<p>The case Mod(<em>n</em>, 4) = 3 is broken down into the following two sub-cases, each occurring with probability 0.125:</p>
<ol>
<li>If Mod(<em>n</em>, 8) = 3 then f(2, <em>n</em>) = (3<em>n</em> + 1) / 2 and in this case we are back to case #2 above after 2 steps. </li>
<li>If Mod(<em>n</em>, 8) = 7 then f(2, <em>n</em>) = (3<em>n</em> + 1) / 2 and in this case we are back to case #4 above after 2 steps.</li>
</ol>
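These congruence identities, including the two sub-case claims about returning to cases #2 and #4, can be checked mechanically for many values of <em>n</em>; a small verification sketch:

```python
def f(x):
    """One Collatz step."""
    return x // 2 if x % 2 == 0 else 3 * x + 1

def iterate(n, k):
    """Apply f exactly k times."""
    for _ in range(k):
        n = f(n)
    return n

for n in range(2, 10000):
    if n % 4 == 0:
        assert iterate(n, 2) == n // 4
    elif n % 4 == 1:
        assert iterate(n, 3) == (3 * n + 1) // 4
    elif n % 4 == 2:
        assert iterate(n, 1) == n // 2
    elif n % 8 == 3:
        m = iterate(n, 2)
        assert m == (3 * n + 1) // 2 and m % 4 == 1   # back to case #2
    else:                                             # n % 8 == 7
        m = iterate(n, 2)
        assert m == (3 * n + 1) // 2 and m % 4 == 3   # back to case #4
print("all five cases verified up to 10,000")
```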
<p>In both sub-cases, the sequence has been increasing, though we know that if Mod(<em>n</em>, 8) = 3, it will go down a bit (but still stay a little above <em>n</em>, more specifically around 9<em>n</em> / 8) after 3 additional steps.</p>
<p>So it looks like, on average, we are decreasing over time (thus it seems likely that we eventually reach 1), but the challenging case is when Mod(<em>n</em>, 4) = 3, and even more so when Mod(<em>n</em>, 8) = 7. Can we get stuck in a sequence where, every two steps, the residue modulo 8 equals 7 (the worst case, which makes the sequence grow at its fastest pace)? And for how many cycles can we stay stuck in such a configuration? These are the difficult issues to address if you want to prove this conjecture. </p>
<p>The problem might also be approximately modeled as some kind of <a href="https://en.wikipedia.org/wiki/Markov_chain" target="_blank">Markov chain</a>, with 5 different states corresponding to the first 3 cases and the 2 sub-cases discussed earlier. One single iteration in the Markov chain corresponds respectively to 2, 3, 1, 5, and 2 steps of the above algorithm, to reach the next local dip in value (if any). For <em>n</em> large enough, one iteration of the Markov chain is thus approximately as follows:</p>
<ul>
<li>we reduce <em>n</em> by 75% with probability 0.25 </li>
<li>we reduce <em>n</em> by 25% with probability 0.25 </li>
<li>we reduce <em>n</em> by 50% with probability 0.25</li>
<li>we increase <em>n</em> by 12.5% with probability 0.125</li>
<li>we increase <em>n</em> by 50% with probability 0.125</li>
</ul>
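Under this approximation, the expected change of log <em>n</em> per Markov-chain iteration can be computed directly from the five multiplicative factors above, and it comes out negative, consistent with the heuristic that sequences shrink on average:

```python
import math

# (probability, multiplicative factor) for the five states listed above
moves = [(0.250, 0.250),   # n -> n/4   (reduce by 75%)
         (0.250, 0.750),   # n -> 3n/4  (reduce by 25%)
         (0.250, 0.500),   # n -> n/2   (reduce by 50%)
         (0.125, 1.125),   # n -> 9n/8  (increase by 12.5%)
         (0.125, 1.500)]   # n -> 3n/2  (increase by 50%)

drift = sum(p * math.log(f) for p, f in moves)
print(round(drift, 4))   # -0.5264: log n shrinks by about 0.53 per iteration
```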
<p>It is easy to compute the probability p(<em>N</em>) that the initial value <em>n</em> has not been reduced after <em>N</em> iterations of the Markov chain, for any positive <em>N</em>. However, even for very large <em>N</em>, this probability is still strictly positive, albeit very close to zero. Also, it is not clear whether the memory-less property of Markov chains is violated here, which would either invalidate this approach or make the problem harder to handle. Most likely, if this approach results in a proof, it would be a heuristic one. </p>
<p><strong>Related articles</strong></p>
<ul>
<li><a href="http://www.analyticbridge.com/profiles/blogs/10-interesting-reads-for-math-geeks" target="_blank">10 Interesting Reads for Math Geeks</a></li>
<li><a href="http://www.datasciencecentral.com/group/resources/forum/topics/best-kept-secret-about-data-science-competitions" target="_blank">Other math challenges</a></li>
</ul>