<p><em>Everyone's Blog Posts - AnalyticBridge</em></p>
<h1>Fascinating Developments in the Theory of Randomness</h1>
<p><em>By Vincent Granville, March 21, 2019</em></p>
<p>I present here some innovative results from my most recent research on stochastic processes, chaos modeling, and dynamical systems, with applications to Fintech, cryptography, number theory, and random number generators. While covering advanced topics, this article is accessible to professionals with limited knowledge of statistical or mathematical theory. It introduces new material not covered in my recent book (available <a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes" target="_blank" rel="noopener">here</a>) on applied stochastic processes. You don't need to read my book to understand this article, but the book is a nice complement and introduction to the concepts discussed here.</p>
<p>None of the material presented here is covered in standard textbooks on stochastic processes or dynamical systems. In particular, it has nothing to do with the classical logistic map or Brownian motions, though the systems investigated here exhibit very similar behaviors and are related to those classical models. This cross-disciplinary article is aimed at professionals with interests in statistics, probability, mathematics, machine learning, simulations, signal processing, operations research, computer science, pattern recognition, and physics. Because of its tutorial style, it should also appeal to beginners learning about Markov processes, time series, and data science techniques in general. It offers fresh, off-the-beaten-path content not found anywhere else, contrasting with the material covered again and again in countless, nearly identical books, websites, and classes catering to students and researchers alike.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/1529825331?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/1529825331?profile=RESIZE_710x" class="align-center"/></a></p>
<p>Some problems discussed here could be used by college professors in the classroom, or as original exam questions, while others are extremely challenging questions that could be the subject of a PhD thesis or even well beyond that level. This article constitutes (along with my book) a stepping stone in my endeavor to solve one of the biggest mysteries in the universe: are the digits of mathematical constants such as Pi evenly distributed? To this day, no one knows whether these digits even have a distribution to start with, let alone whether that distribution is uniform or not. Part of the discussion is about the statistical properties of numeration systems in a non-integer base (such as the golden ratio base) and their applications. All systems investigated here, whether deterministic or not, are treated as stochastic processes, including the digits in question. They all exhibit strong chaos, albeit chaos that is easily manageable thanks to their<span> </span><a href="https://en.wikipedia.org/wiki/Ergodicity" target="_blank" rel="noopener">ergodicity</a>.</p>
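To make the numeration-system idea concrete, here is a minimal Python sketch of digit extraction in a possibly non-integer base b (the golden ratio base uses b = (1 + √5)/2). The map x → bx mod 1 with digit ⌊bx⌋ is a standard construction for such expansions; it is offered here as an illustration only, not necessarily the exact process studied in the full article.

```python
import math

def digits_in_base(x, b, n):
    """Extract the first n digits of x (in [0, 1)) in base b,
    using the map x -> b*x mod 1; the digit at each step is floor(b*x)."""
    out = []
    for _ in range(n):
        x *= b
        d = int(x)       # digit in {0, ..., ceil(b) - 1}
        out.append(d)
        x -= d           # keep the fractional part for the next step
    return out

def from_digits(digits, b):
    """Reconstruct x (approximately) from its base-b digit expansion."""
    return sum(d / b**(i + 1) for i, d in enumerate(digits))

phi = (1 + math.sqrt(5)) / 2        # golden ratio base
digits = digits_in_base(0.3, phi, 50)
# Since phi < 2, every digit is 0 or 1, and from_digits(digits, phi)
# converges back to 0.3 as more digits are used.
```

The reconstruction identity x = Σ d<sub>i</sub> b<sup>−i</sup> holds exactly in real arithmetic, so the truncation error after n digits is at most b<sup>−n</sup>.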
<p>Interesting connections with the golden ratio, special polynomials, and other special mathematical constants are discussed in section 2. Finally, all the analyses performed during this work were done in Excel. I share my spreadsheets in this article, as well as many illustrations, and all the results are replicable.</p>
<p><a href="https://www.datasciencecentral.com/profiles/blogs/fascinating-new-results-in-the-theory-of-randomness" target="_blank" rel="noopener">Read the full article here</a>.</p>
<p><strong>Content of this article</strong></p>
<p>1. General framework, notations and terminology</p>
<ul>
<li>Finding the equilibrium distribution</li>
<li>Auto-correlation and spectral analysis</li>
<li>Ergodicity, convergence, and attractors</li>
<li>Space state, time state, and Markov chain approximations</li>
<li>Examples</li>
</ul>
<p>2. Case study</p>
<ul>
<li>First fundamental theorem</li>
<li>Second fundamental theorem</li>
<li>Convergence to equilibrium: illustration</li>
</ul>
<p>3. Applications</p>
<ul>
<li>Potential application domains</li>
<li>Example: the golden ratio process</li>
<li>Finding other useful b-processes</li>
</ul>
<p>4. Additional research topics</p>
<ul>
<li>Perfect stochastic processes</li>
<li>Characterization of equilibrium distributions (the attractors)</li>
<li>Probabilistic calculus and number theory, special integrals</li>
</ul>
<p>5. Appendix</p>
<ul>
<li>Computing the auto-correlation at equilibrium</li>
<li>Proof of the first fundamental theorem</li>
<li>How to find the exact equilibrium distribution</li>
</ul>
<p>6. Additional Resources</p>
<p><a href="https://www.datasciencecentral.com/profiles/blogs/fascinating-new-results-in-the-theory-of-randomness" target="_blank" rel="noopener">Read the full article here</a>.</p>
<h1>How to Automatically Determine the Number of Clusters in your Data - and more</h1>
<p><em>By Vincent Granville, March 14, 2019</em></p>
<p>Determining the number of clusters when performing unsupervised clustering is a tricky problem. Many data sets don't exhibit well-separated clusters, and two human beings asked to visually tell the number of clusters by looking at a chart are likely to provide two different answers. Sometimes clusters overlap with each other, and large clusters contain sub-clusters, making the decision far from easy.</p>
<p>For instance, how many clusters do you see in the picture below? What is the optimum number of clusters? No one can tell with certainty, not AI, not a human being, not an algorithm. </p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/1405294997?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/1405294997?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><em>How many clusters here? (source: see<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/data-science-wizardry" target="_blank" rel="noopener">here</a>)</em></p>
<p>In the above picture, the underlying data suggests that there are three main clusters. But an answer such as 6 or 7 seems equally valid. </p>
<p>A number of empirical approaches have been used to determine the number of clusters in a data set. They usually fit into two categories:</p>
<ul>
<li>Model fitting techniques: one example is fitting a<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/decomposition-of-statistical-distributions-using-mixture-models-a" target="_blank" rel="noopener">mixture model</a> to your data and determining the optimum number of components; another is using density estimation techniques and testing for the number of modes (see<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/a-plethora-of-original-underused-statistical-tests" target="_blank" rel="noopener">here</a>). Sometimes the fit is compared with that of a model where observations are uniformly distributed over the entire support domain, thus with no cluster; you may have to estimate the support domain in question, and assume that it is not made of disjoint sub-domains. In many cases, the convex hull of your data set is a good enough estimate of the support domain. </li>
<li>Visual techniques: for instance, the silhouette or elbow rule (both very popular).</li>
</ul>
<p>In both cases, you need a criterion to determine the optimum number of clusters. In the case of the elbow rule, one typically uses the percentage of unexplained variance. This number is 100% with zero clusters, and it decreases (initially sharply, then more modestly) as you increase the number of clusters in your model. When each point constitutes its own cluster, this number drops to 0%. Somewhere in between, the curve displaying your criterion exhibits an elbow (see picture below), and that elbow determines the number of clusters. For instance, in the chart below, the optimum number of clusters is 4.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/1405610723?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/1405610723?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><em>The elbow rule tells you that here, your data set has 4 clusters (elbow strength in red)</em></p>
<p>Good references on the topic are available, and so are some R functions, for instance fviz_nbclust. However, I could not find in the literature how the elbow point is explicitly computed. Most references mention that it is usually hand-picked by visual inspection, or based on some predetermined but arbitrary threshold. In the next section, we solve this problem.</p>
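One simple way to make the elbow explicit is to define an elbow strength at each candidate k as the second difference of the unexplained-variance curve, and pick the k that maximizes it. The sketch below is a generic heuristic, not necessarily the exact criterion derived in the full article, and the sample curve is hypothetical:

```python
def elbow_point(pct_unexplained):
    """Pick the number of clusters by maximizing the 'elbow strength',
    defined here as the second difference of the criterion curve.
    pct_unexplained[i] is the percentage of unexplained variance
    when using i+1 clusters."""
    p = pct_unexplained
    best_k, best_strength = None, float("-inf")
    for k in range(1, len(p) - 1):                 # interior points only
        strength = (p[k - 1] - p[k]) - (p[k] - p[k + 1])
        if strength > best_strength:
            best_k, best_strength = k + 1, strength  # k+1 = cluster count
    return best_k

# Hypothetical unexplained-variance percentages for 1..7 clusters:
curve = [100, 60, 30, 12, 10, 9, 8.5]
```

For this curve, `elbow_point(curve)` returns 4, matching the visual elbow-rule reading: the drop slows down sharply after 4 clusters.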
<p><a href="https://www.datasciencecentral.com/profiles/blogs/how-to-automatically-determine-the-number-of-clusters-in-your-dat" target="_blank" rel="noopener">Read full article here</a>.</p>
<h1>Deep Analytical Thinking and Data Science Wizardry</h1>
<p><em>By Vincent Granville, March 7, 2019</em></p>
<p>Often, complex models are not enough, too heavy, or simply not necessary to get great, robust, sustainable insights out of data. Deep analytical thinking may prove more useful, and it can be practiced by people not necessarily trained in data science, even by people with limited coding experience. Here we explore, through a case study, what we mean by deep analytical thinking and how it works: combining craftsmanship, business acumen, and the use and creation of tricks and rules of thumb, to provide sound answers to business problems. These skills are usually acquired by experience more than by training, and data science generalists (see<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/why-you-should-be-a-data-science-generalist" target="_blank" rel="noopener">here</a><span> </span>how to become one) usually possess them.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/1308372685?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/1308372685?profile=RESIZE_710x" class="align-center"/></a></p>
<p>This article is aimed at data science managers and decision makers, as well as at junior professionals who want to reach such a position at some point in their career. Deep thinking, unlike deep learning, is also more difficult to automate, so it provides better job security. Those automating deep learning are actually the new data science wizards, the ones who can think out of the box. Much of what is described in this article is also data science wizardry, taught neither in standard textbooks nor in the classroom. By reading this tutorial, you will learn these data science secrets, be able to use them, and possibly change your perspective on data science. Data science is like an iceberg: everyone knows and can see the tip of the iceberg (regression models, neural nets, cross-validation, clustering, Python, and so on, as presented in textbooks). Here I focus on the unseen bottom, using a statistical level almost accessible to the layman, avoiding jargon and complicated math formulas, yet discussing a few advanced concepts.</p>
<p><span><a href="https://www.datasciencecentral.com/profiles/blogs/data-science-wizardry" target="_blank" rel="noopener">Read full article here</a>. </span></p>
<p><strong>Content</strong></p>
<p>1. Case Study: The Problem</p>
<p>2. Deep Analytical Thinking</p>
<ul>
<li>Answering hidden questions</li>
<li>Business questions</li>
<li>Data questions</li>
<li>Metrics questions</li>
</ul>
<p>3. Data Science Wizardry</p>
<ul>
<li>Generic algorithm</li>
<li>Illustration with three different models</li>
<li>Results</li>
</ul>
<p>4. A few data science hacks</p>
<h1>The graph analytics landscape 2019</h1>
<p><em>By Elise Devaux, February 27, 2019</em></p>
<h1 style="text-align: left;"><span style="font-size: 12pt;">Read the part 1 - <a href="https://linkurio.us/blog/graphtech-ecosystem-2019-part-1-graph-databases/?utm_source=analyticsbridge&utm_medium=article&utm_content=07" target="_blank" rel="noopener">The graph database landscape</a></span></h1>
<h1 style="text-align: center;"><strong>The graph analytics landscape 2019</strong></h1>
<p><span>Graph analytics frameworks consist of a set of tools and methods developed to extract knowledge from data modeled as a graph. They are crucial for many applications because processing large datasets of complex connected data is computationally challenging. </span></p>
<p></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/1213658813?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/1213658813?profile=RESIZE_710x" class="align-full"/><br/></a></span></p>
<h2><span style="font-size: 18pt;"><strong>A need for analytics at scale</strong></span></h2>
<p><span>The field of graph theory has spawned multiple algorithms on which analysts can rely to find insights hidden in graph data. From Google’s famous </span><a href="https://en.wikipedia.org/wiki/PageRank"><span>PageRank algorithm</span></a><span> to traversal and path-finding algorithms or community detection algorithms, there are plenty of calculations available to extract insights from graphs.</span></p>
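As an illustration of the kind of computation these frameworks run at scale, here is a minimal pure-Python sketch of PageRank by power iteration on a toy graph. This is the textbook formulation of the algorithm, not the implementation of any particular engine, and the toy graph is invented for the example:

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Toy PageRank by power iteration.
    graph maps each node to the list of nodes it links to."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, targets in graph.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling node: spread its rank uniformly
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank

# 'a' and 'b' both link to 'c', so 'c' ends up with the highest rank.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

Real processing engines distribute exactly this kind of iterative message passing (Pregel's "think like a vertex" model) across a cluster; the mathematics stays the same.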
<p><span>The graph database storage systems we mentioned in </span><a href="https://linkurio.us/blog/graphtech-ecosystem-2019-part-1-graph-databases/?utm_source=analyticsbridge&utm_medium=article&utm_content=07" target="_self">the previous article</a><span> are good at storing data as graphs and at handling operations such as data retrieval, real-time queries, or local analysis. But they may fall short on graph analytics processing at scale. That’s where graph analytics frameworks step in. Shipping with common graph algorithms, processing engines and, sometimes, query languages, they handle online analytical processing and persist the results back into databases.<br/> <br/></span></p>
<h2><span style="font-size: 18pt;"><strong>Graph processing engines</strong></span></h2>
<p><span>The graph processing ecosystem offers various approaches to answer the challenges of graph analytics, and historical players occupy a large part of the market.</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/1213663551?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/1213663551?profile=RESIZE_710x" class="align-full"/></a></span></p>
<p><span>In 2010, Google led the way with the </span><a href="https://dl.acm.org/citation.cfm?id=1807184"><span>release of Pregel</span></a><span>, a “large-scale graph processing” framework. Several solutions followed, such as </span><a href="https://giraph.apache.org/"><span>Apache Giraph</span></a><span>, an open source graph processing system developed in 2012 by the Apache foundation. It leverages a MapReduce implementation to process graphs, and is the system Facebook uses to traverse its social graph. Other open source systems iterated on Google’s design, for example </span><a href="https://thegraphsblog.wordpress.com/the-graph-blog/mizan/"><span>Mizan</span></a><span> or </span><a href="http://infolab.stanford.edu/gps/"><span>GPS</span></a><span>.</span></p>
<p><span>Other systems, like </span><a href="https://github.com/GraphChi"><span>GraphChi</span></a><span> or </span><a href="http://www.powergraph.ru/en/soft/demo.asp"><span>PowerGraph Create</span></a><span>, were launched following GraphLab’s release in 2009. This system started as an open-source project at Carnegie Mellon University and is now known as </span><a href="https://turi.com/"><span>Turi</span></a><span>. </span></p>
<p><span>Oracle Lab developed </span><a href="https://www.oracle.com/technetwork/oracle-labs/parallel-graph-analytix/overview/index.html"><span>PGX</span></a><span> (Parallel Graph AnalytiX), a graph analysis framework including an analytics processing engine powering Oracle Big Data Spatial and Graph.</span></p>
<p><span>The distributed open source graph engine Trinity, presented in 2013 by Microsoft, is now known as </span><a href="https://www.graphengine.io/"><span>Microsoft Graph Engine</span></a><span>. </span><a href="https://spark.apache.org/graphx/"><span>GraphX</span></a><span>, introduced in 2014, is the embedded graph processing framework built on top of </span><a href="https://spark.apache.org/"><span>Apache Spark</span></a><span> for parallel computation. Other systems have since been introduced, for example </span><a href="https://github.com/uzh/signal-collect"><span>Signal/Collect</span></a><span>.<br/> <br/></span></p>
<h2><span style="font-size: 18pt;"><strong>Graph analytics libraries and toolkit</strong></span></h2>
<p><span>In the graph analytics landscape, there are also single-user systems dedicated to graph analytics. Graph analytics libraries and toolkits provide implementations of many algorithms from graph theory.<br/> <br/></span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/1213665836?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/1213665836?profile=RESIZE_710x" class="align-full"/></a></span></p>
<p></p>
<p><span>There are standalone libraries such as </span><a href="https://networkx.github.io/"><span>NetworkX</span></a><span> and </span><a href="https://networkit.github.io/"><span>NetworKit</span></a><span>, Python libraries for large-scale graph analysis, and </span><a href="https://igraph.org/redirect.html"><span>iGraph</span></a><span>, a graph library written in C and available as Python and R packages, as well as libraries provided by graph database vendors, such as Neo4j with its </span><a href="https://neo4j.com/graph-machine-learning-algorithms/"><span>Graph Algorithms Library</span></a><span>.</span></p>
<p><span>Other technology vendors offer libraries for high-performance graph analytics. This is the case for GPU technology provider NVIDIA with its </span><a href="https://developer.nvidia.com/nvgraph"><span>NVGraph library</span></a><span>. The geographic information software QGIS also built its own </span><a href="https://docs.qgis.org/testing/en/docs/pyqgis_developer_cookbook/network_analysis.html#graph-analysis"><span>library for network analysis</span></a><span>.</span></p>
<p><span>Some of these libraries also propose graph visualization tools to help users build graph data exploration interfaces, but this is a topic for the third post of this series.<br/> <br/></span></p>
<h2><span style="font-size: 18pt;"><strong>Graph query languages</strong></span></h2>
<p><span>Finally, one important piece of the analytics framework puzzle has not been mentioned yet: graph query languages.</span></p>
<p><span>As for any storage system, query languages are an essential element of graph databases. These languages make it possible to model data as a graph, and their logic stays very close to the graph data model. Beyond data modeling, graph query languages are used to query the data. Depending on their nature, they can be run against database systems or serve as domain-specific analytics languages. Most high-level computing engines let users write their analyses in these query languages.</span></p>
<p></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/1213668117?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/1213668117?profile=RESIZE_710x" class="align-full"/></a></span></p>
<p><a href="https://neo4j.com/developer/cypher-query-language/"><span>Cypher</span></a><span> was created in 2011 by Neo4j for use with its own database. It was </span><a href="https://neo4j.com/blog/open-cypher-sql-for-graphs/"><span>open-sourced in 2015</span></a><span> as a separate project named </span><a href="https://www.opencypher.org/"><span>OpenCypher</span></a><span>. Other notable graph query languages include </span><a href="https://tinkerpop.apache.org/gremlin.html"><span>Gremlin</span></a><span>, the graph traversal language of Apache TinkerPop, created in 2009, and </span><a href="https://jena.apache.org/tutorials/sparql.html"><span>SPARQL</span></a><span>, the SQL-like language created by the W3C in 2008 to query RDF graphs. More recently, TigerGraph developed its own graph query language named </span><a href="https://www.tigergraph.com/2018/05/22/crossing-the-chasm-eight-prerequisites-for-a-graph-query-language/"><span>GSQL</span></a><span>, and Oracle created </span><a href="http://pgql-lang.org/"><span>PGQL</span></a><span>, both SQL-like graph query languages. </span><a href="https://arxiv.org/abs/1712.01550"><span>G-Core</span></a><span> was proposed by the Linked Data Benchmark Council (LDBC) in 2018 as a language bridging the academic and industrial worlds. Other vendors, such as OrientDB, went for the </span><a href="https://orientdb.com/docs/2.0/orientdb.wiki/Tutorial-SQL.html"><span>relational query language SQL</span></a><span>.</span></p>
<p><span>Last year, Neo4j launched an initiative to unify Cypher, PGQL and G-Core under a single standard graph query language: </span><a href="https://gql.today/"><span>GQL (Graph Query Language)</span></a><span>. The initiative will be discussed during a </span><a href="https://www.w3.org/Data/events/data-ws-2019/"><span>W3C workshop in March 2019</span></a><span>. Some other query languages are especially dedicated to graph analysis such as </span><a href="https://github.com/socialite-lang/socialite"><span>SociaLite</span></a><span>.</span></p>
<p><span>While not originally a graph query language, Facebook’s </span><a href="https://graphql.org/"><span>GraphQL</span></a><span> is worth mentioning. This API language has been extended by graph database vendors for use as a graph query language. </span><a href="https://docs.dgraph.io/master/query-language/"><span>Dgraph uses it natively</span></a><span> as its query language, Prisma is planning to </span><a href="https://www.prisma.io/features/databases"><span>extend it to various graph databases</span></a><span>, and Neo4j has been pushing it into </span><a href="https://grandstack.io/"><span>GRANDstack</span></a><span> and its query execution layer </span><a href="https://github.com/neo4j-graphql/neo4j-graphql-js"><span>neo4j-graphql.js</span></a><span>.<br/> <br/></span></p>
<p>This article was originally posted on the <a href="https://linkurio.us/blog/graphtech-ecosystem-2019-part-2-graph-analytics/?utm_source=analyticsbridge&utm_medium=article&utm_content=07" target="_blank" rel="noopener">Linkurious blog</a>. It is part of a series of articles about the GraphTech ecosystem. This second part covers the graph analytics landscape. The first part introduced the <a href="https://linkurio.us/blog/graphtech-ecosystem-2019-part-1-graph-databases/?utm_source=analyticsbridge&utm_medium=article&utm_content=07" target="_blank" rel="noopener">graph database vendors</a>.</p>
<h1>New Perspectives on Statistical Distributions and Deep Learning</h1>
<p><em>By Vincent Granville, February 23, 2019</em></p>
<p>In this data science article, emphasis is placed on<span> </span><em>science</em>, not just on data. State-of-the-art material is presented in simple English, from multiple perspectives: applications, theoretical research asking more questions than it answers, scientific computing, machine learning, and algorithms. I attempt here to lay the foundations of a new statistical technology, hoping that it will plant the seeds for further research on a topic with a broad range of potential applications. It is based on mixture models. Mixtures have been studied and used in applications for a long time, and they are still a subject of active research. Yet you will find here plenty of new material.</p>
<p><span><strong>Introduction and Context</strong></span></p>
<p>In a previous article (see<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/new-perspective-on-central-limit-theorem-and-related-stats-topics" target="_blank" rel="noopener">here</a>) I attempted to approximate a random variable representing real data by a weighted sum of simple<span> </span><em>kernels</em><span> </span>such as uniform, independently and identically distributed random variables. The purpose was to build Taylor-like series approximations of more complex models (each term in the series being a random variable), in order to</p>
<ul>
<li>avoid over-fitting,</li>
<li>approximate any empirical distribution (the inverse of the percentiles function) attached to real data,</li>
<li>easily compute data-driven confidence intervals regardless of the underlying distribution,</li>
<li>derive simple tests of hypothesis,</li>
<li>perform model reduction, </li>
<li>optimize data binning to facilitate feature selection, and to improve visualizations of histograms</li>
<li>create perfect histograms,</li>
<li>build simple density estimators,</li>
<li>perform interpolations, extrapolations, or predictive analytics,</li>
<li>perform clustering and detect the number of clusters,</li>
<li>create deep learning Bayesian systems.</li>
</ul>
<p>While I've found very interesting properties of stable distributions during this research project, I could not come up with a solution solving all these problems. The fact is that these weighted sums usually converge (in distribution) to a normal distribution if the weights do not decay too fast -- a consequence of the central limit theorem. And even when using uniform kernels (as opposed to Gaussian ones) with fast-decaying weights, the sum converges to an almost symmetrical, Gaussian-like distribution. In short, very few real-life data sets can be approximated by this type of model.</p>
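The near-symmetry of such weighted sums is easy to check by simulation. The sketch below uses centered uniform kernels with fast-decaying weights 1/k² (an arbitrary illustrative choice, not the specific weighting scheme of the article) and measures the empirical skewness:

```python
import random

def weighted_uniform_sum(n_terms=20):
    """One draw of a weighted sum of centered uniform kernels,
    with fast-decaying weights 1/k^2 (arbitrary illustrative choice)."""
    return sum((random.random() - 0.5) / k**2 for k in range(1, n_terms + 1))

def skewness(xs):
    """Empirical (biased) skewness of a sample."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    s3 = sum((x - m) ** 3 for x in xs) / n
    return s3 / s2 ** 1.5

random.seed(42)
sample = [weighted_uniform_sum() for _ in range(20000)]
# The empirical skewness is close to 0: the sum is nearly symmetric,
# so heavily skewed real-life data cannot be modeled this way.
```

Each term is symmetric about 0, so the sum is exactly symmetric in distribution; the simulation simply makes visible why this family of models is too rigid for skewed data.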
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/1187940877?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/1187940877?profile=RESIZE_710x" class="align-center"/></a></p>
<p>Now, in this article, I offer a full solution, using mixtures rather than sums. The possibilities are endless. </p>
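Mixtures escape this limitation because sampling picks one component at random rather than summing all of them, so multimodal and skewed shapes come for free. Here is a minimal sketch of a generic two-component Gaussian mixture (a standard construction, not the specific model developed in the full article):

```python
import random

def sample_mixture(n, weight=0.5, means=(-3.0, 3.0), sd=1.0):
    """Draw n points from a two-component Gaussian mixture:
    with probability `weight` use the first component, else the second."""
    out = []
    for _ in range(n):
        mu = means[0] if random.random() < weight else means[1]
        out.append(random.gauss(mu, sd))
    return out

random.seed(1)
data = sample_mixture(20000)
# The overall variance (about 1 + 3^2 = 10) far exceeds the
# within-component variance (1): the extra spread comes from the
# two well-separated modes -- something no weighted sum of
# symmetric kernels can reproduce.
```

Fitting such a model to data (estimating the weights, means, and standard deviations) is what sections 2 and 3 of the article address.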
<p><span style="font-size: 14pt;"><strong>Content of this article</strong></span></p>
<p><strong>1. Introduction and Context</strong></p>
<p><strong>2. Approximations Using Mixture Models</strong></p>
<ul>
<li>The error term</li>
<li>Kernels and model parameters</li>
<li>Algorithms to find the optimum parameters</li>
<li>Convergence and uniqueness of solution</li>
<li>Find near-optimum with fast, black-box step-wise algorithm</li>
</ul>
<p><strong>3. Example</strong></p>
<ul>
<li>Data and source code</li>
<li>Results</li>
</ul>
<p><strong>4. Applications</strong></p>
<ul>
<li>Optimal binning</li>
<li>Predictive analytics</li>
<li>Test of hypothesis and confidence intervals</li>
<li>Deep learning: Bayesian decision trees</li>
<li>Clustering</li>
</ul>
<p><strong>5. Interesting problems</strong></p>
<ul>
<li>Gaussian mixtures uniquely characterize a broad class of distributions</li>
<li>Weighted sums fail to achieve what mixture models do</li>
<li>Stable mixtures</li>
<li>Nested mixtures and Hierarchical Bayesian Systems</li>
<li>Correlations</li>
</ul>
<p>Read full article <a href="https://www.datasciencecentral.com/profiles/blogs/decomposition-of-statistical-distributions-using-mixture-models-a" target="_blank" rel="noopener">here</a>.</p>
<h1>A Plethora of Original, Not Well-Known Statistical Tests</h1>
<p><em>By Vincent Granville, February 14, 2019</em></p>
<p>Many of the following statistical tests are rarely discussed in textbooks or in college classes, much less in data camps. Yet they help answer a lot of different and interesting questions. I used most of them without even computing the underlying distribution under the null hypothesis; instead, I used simulations to check whether my assumptions were plausible or not. In short, my approach to statistical testing is model-free and data-driven. Some of these tests are easy to implement, even in Excel. Some of them are illustrated here, with examples that do not require statistical knowledge for understanding or implementation.</p>
<p>This material should appeal to managers, executives, industrial engineers, software engineers, operations research professionals, economists, and to anyone dealing with data, such as biometricians, analytical chemists, astronomers, epidemiologists, journalists, or physicians. Statisticians with a different perspective are invited to discuss my methodology and the tests described here, in the comment section at the bottom of this article. In my case, I used these tests mostly in the context of experimental mathematics, which is a branch of data science that few people talk about. In that context, the theoretical answer to a statistical test is sometimes known, making it a great benchmarking tool to assess the power of these tests, and determine the minimum sample size to make them valid.</p>
<p>I provide here a general overview, as well as my simple approach to statistical testing, accessible to professionals with little or no formal statistical training. Detailed applications of these tests are found<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes" target="_blank" rel="noopener">in my recent book</a> and in<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/new-perspective-on-central-limit-theorem-and-related-stats-topics" target="_blank" rel="noopener">this article</a>. Precise references to these documents are provided as needed, in this article.</p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/1048765124?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/1048765124?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><span><em>Examples of traditional tests</em></span></p>
<p><span><strong>1. General Methodology</strong></span></p>
<p>Despite my strong background in statistical science, over the years, I moved away from relying too much on traditional statistical tests and statistical inference. I am not the only one: these tests have been abused and misused, see for instance<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/statistical-significance-and-p-values-take-another-blow" target="_blank" rel="noopener">this article</a><span> </span>on<span> </span><em>p</em>-hacking. Instead, I favored a methodology of my own, mostly empirical, based on simulations, data- rather than model-driven. It is essentially a non-parametric approach. It has the advantage of being far easier to use, implement, understand, and interpret, especially for the non-initiated. It was initially designed to be integrated in black-box, automated decision systems. Here I share some of these tests, and many can be implemented easily in Excel. </p>
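The simulation-based approach described above can be illustrated with a permutation test — a bare-bones, hypothetical sketch (not the author's actual implementation) of a model-free, data-driven test: the null distribution is obtained by resampling the data itself, rather than from a theoretical model.

```python
import random

def permutation_test(sample_a, sample_b, n_sim=10_000, seed=42):
    """Model-free two-sample test: compare the observed difference in
    means against its distribution under random relabeling of the data."""
    rng = random.Random(seed)
    observed = abs(sum(sample_a) / len(sample_a) - sum(sample_b) / len(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    count = 0
    for _ in range(n_sim):
        rng.shuffle(pooled)  # random relabeling: simulate the null hypothesis
        diff = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:
            count += 1
    return count / n_sim  # empirical p-value

p = permutation_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
```

No distributional assumption is needed, which is exactly why such tests lend themselves to black-box, automated decision systems.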
<p><em><a href="https://www.datasciencecentral.com/profiles/blogs/a-plethora-of-original-underused-statistical-tests" target="_blank" rel="noopener">Read the full article here</a>. </em></p>Machine Learning Glossarytag:www.analyticbridge.datasciencecentral.com,2019-02-12:2004291:BlogPost:3909852019-02-12T19:31:40.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><span>For background to this post, please see </span><a rel="nofollow noopener" href="https://www.datasciencecentral.com/profiles/blogs/learn-machinelearning-coding-basics-in-a-weekend-a-new-approach" target="_blank">Learn Machine Learning Coding Basics in a weekend</a><span>. Here, we present the glossary that we use for the coding, and the mindmap attached to these classes and upcoming book. About 80 terms are included in the glossary, covering Ensembles, Regression, Classification, Algorithms, Training, Validation, Model Evaluation and more. For instance, the section about Classification contains the following entries:</span></p>
<ul>
<li>Class </li>
<li>Hyperplane </li>
<li>Decision Boundary </li>
<li>False Negative (FN) </li>
<li>False Positive (FP) </li>
<li>True Negative (TN) </li>
<li>True Positive (TP) </li>
<li>Precision </li>
<li>Recall </li>
<li>F1 Score </li>
<li>Few-Shot Learning </li>
<li>Hinge Loss </li>
<li>Log Loss </li>
</ul>
<p>To download the glossary, <a href="https://www.datasciencecentral.com/profiles/blogs/learn-machinelearning-coding-basics-in-a-weekend-glossary-and" target="_blank" rel="noopener">follow this link</a>. </p>
<p><span style="font-size: 12pt;"><strong>DSC Resources</strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/new-books-and-resources-for-dsc-members">Free Books</a></li>
<li><a href="https://www.datasciencecentral.com/forum">Forum Discussions</a></li>
<li><a href="https://www.datasciencecentral.com/page/search?q=cheat+sheets">Cheat Sheets</a></li>
<li><a href="https://www.analytictalent.datasciencecentral.com/">Jobs</a></li>
<li><a href="https://www.datasciencecentral.com/page/search?q=one+picture" target="_blank" rel="noopener">Search DSC</a></li>
<li><a href="https://twitter.com/DataScienceCtrl" target="_self">DSC on Twitter</a></li>
<li><a href="https://www.facebook.com/DataScienceCentralCommunity/" target="_blank" rel="noopener">DSC on Facebook</a></li>
</ul>Alternatives to Logistic Regressiontag:www.analyticbridge.datasciencecentral.com,2019-02-07:2004291:BlogPost:3909802019-02-07T22:23:19.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><strong>Logistic regression (LR)</strong><span> </span>models estimate the probability of a binary response, based on one or more predictor variables. Unlike linear regression models, the dependent variables are categorical. LR has become very popular, perhaps because of the wide availability of the procedure in software. Although LR is a good choice for many situations, it doesn't work well for<span> </span><em>all</em><span> </span>situations. For example:</p>
<ul>
<li>In propensity score analysis where there are many covariates, LR performs poorly.</li>
<li>For classifications, LR usually requires more variables than Support Vector Machines (SVM) to achieve the same (or better) misclassification rate, for multivariate and mixture distributions.</li>
</ul>
<p><br/>In addition, LR is prone to issues like overfitting and multicollinearity.</p>
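As a refresher on what LR actually computes, here is a minimal, self-contained sketch (an illustration of the general technique, not tied to any particular software package mentioned above) that fits P(y = 1 | x) = sigmoid(w·x + b) by gradient descent on the log loss:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of log loss w.r.t. the logit
            gw += err * x
            gb += err
        w -= lr * gw / len(xs)
        b -= lr * gb / len(xs)
    return w, b

# Toy one-dimensional data: binary response flips around x = 1.75
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [0,   0,   0,   0,   1,   1,   1,   1]
w, b = fit_logistic(xs, ys)
```

The alternatives below differ mainly in how they model this probability (or sidestep modeling it altogether, as tree-based methods and SVM do).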
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/995605995?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/995605995?profile=RESIZE_710x" class="align-center"/></a></p>
<p></p>
<p>A<span> </span><strong>wide range of alternatives</strong><span> </span>are available, from statistics-based procedures (e.g. log binomial, ordinary or modified Poisson regression and Cox regression) to those rooted more deeply in data science such as machine learning and neural network theory. Which one you choose depends largely on what tools you have available to you, what theory (e.g. statistics vs. neural networks) you want to work with, and what you're trying to achieve with your data. For example, tree-based methods are a good alternative for assessing risk factors, while Neural Networks (NN) and Support Vector Machines (SVM) work well for propensity score estimation and Categorization/Classification.</p>
<p>There are literally hundreds of viable alternatives to logistic regression, so it isn't possible to discuss them all within the confines of a single blog post. What follows is an outline of some of the more popular choices.</p>
<p><em>Read the full article, <a href="https://www.datasciencecentral.com/profiles/blogs/alternatives-to-logistic-regression" target="_blank" rel="noopener">here</a>. </em></p>From Infinite Matrices to New Integration Formulatag:www.analyticbridge.datasciencecentral.com,2019-02-04:2004291:BlogPost:3910832019-02-04T00:30:00.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p>This is another interesting problem, off-the-beaten-path. It ends up with a formula to compute the integral of a function, based on its derivatives solely. </p>
<p>For simplicity, I'll start with some notations used in the context of matrix theory, familiar to everyone: T(<em>f</em>) = <em>g</em>, where <em>f</em> and <em>g</em> are vectors, and T a square matrix. The notation T(<em>f</em>) represents the product between the matrix T, and the vector <em>f</em>. Now, imagine that the dimensions are infinite, with <em>f</em> being a vector whose entries represent all the real numbers in some peculiar order. </p>
<p>In mathematical analysis, T is called an operator, mapping all real numbers (represented by the vector <em>f</em>) onto another infinite vector <em>g</em>. In other words, <em>f</em> and <em>g</em> can be viewed as real-valued functions, and T transforms the function <em>f</em> into a new function <em>g</em>. A simple case is when T is the derivative operator, transforming any function <em>f</em> into its derivative <em>g</em> = d<i>f/</i>d<i>x</i>. We define the powers of T as T^0 = I (the identity operator, with I(<em>f</em>) = <em>f</em>), T^2(<em>f</em>) = T(T(<em>f</em>)), T^3(<em>f</em>) = T(T^2(<em>f</em>)) and so on, just like the powers of a square matrix. Now let the fun begin.</p>
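To make the matrix analogy concrete, here is a small toy sketch (my own illustration, not from the article) where T is a finite forward-difference matrix acting on a function sampled on a grid; applying it twice gives T^2(f) = T(T(f)), an approximate second derivative:

```python
def mat_vec(T, f):
    """Product of a square matrix T and a vector f."""
    return [sum(T[i][j] * f[j] for j in range(len(f))) for i in range(len(T))]

# Forward-difference "derivative" matrix on a grid with spacing h:
# (T f)[i] = (f[i+1] - f[i]) / h, a finite-dimensional analogue of d/dx.
h = 0.1
n = 6
T = [[(-1 / h if j == i else 1 / h if j == i + 1 else 0.0) for j in range(n)]
     for i in range(n)]

grid = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
f = [x * x for x in grid]   # f(x) = x^2 sampled on the grid

g = mat_vec(T, f)           # T(f): approximates f'(x) = 2x
g2 = mat_vec(T, g)          # T^2(f) = T(T(f)): approximates f''(x) = 2
```

The last rows are polluted by boundary effects (the grid is finite), which is precisely the kind of artifact that disappears when the dimension becomes infinite, as in the operator setting above.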
<p><strong>Exponential of the Derivative Operator</strong></p>
<p>We assume here that T is the derivative operator. Using the same notation as above, we have the same formula as if T was a matrix:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/954724656?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/954724656?profile=RESIZE_710x" class="align-center"/></a></p>
<p>Applied to a function <em>f</em>, we have:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/957090133?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/957090133?profile=RESIZE_710x" class="align-center"/></a></p>
<p>This is a simple application of Taylor series. So the exponential of the derivative operator is a shift operator.</p>
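The shift property is easy to verify numerically. The sketch below (an illustrative check of my own, using f = sin so that all derivatives are known in closed form) sums the Taylor series for exp(h·T) applied to f at a point x, and recovers f(x + h):

```python
import math

def exp_T(f_derivs, x, h, terms=30):
    """Apply exp(h*T), T the derivative operator, to f at x:
    sum over k of h^k * f^(k)(x) / k!  (truncated Taylor series)."""
    return sum((h**k / math.factorial(k)) * f_derivs(k, x) for k in range(terms))

def sin_derivs(k, x):
    # Derivatives of sin cycle with period 4: sin, cos, -sin, -cos
    return [math.sin, math.cos,
            lambda t: -math.sin(t), lambda t: -math.cos(t)][k % 4](x)

val = exp_T(sin_derivs, x=0.3, h=0.5)   # should equal sin(0.3 + 0.5)
```

With 30 terms and h = 0.5 the truncation error is far below machine precision, so the computed value agrees with sin(0.8).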
<p><strong>Inverse of the Derivative Operator</strong></p>
<p>Likewise, as for matrices, we can define the inverse of T as</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/954798699?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/954798699?profile=RESIZE_710x" class="align-center"/></a></p>
<p>If T was a matrix, the condition for convergence is that <span>all of the eigenvalues of T - I have absolute value smaller than 1.</span> For the derivative operator T applied to a function <em>f</em>, and under some conditions that guarantee convergence, it is easy to show that</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/954846726?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/954846726?profile=RESIZE_710x" class="align-center"/></a></p>
<p>The coefficients (for instance 1, -4, 6, -4, 1 in the last term displayed above) are just the binomial coefficients, with alternating signs.</p>
<p>We call the inverse of the derivative operator the <em>pseudo-integral</em> operator. It is easy to prove that the pseudo-integral operator (as defined above), applied to the exponential function, yields the exponential function itself. So the exponential function is a fixed point (the only continuous one) of the pseudo-integral operator. More interestingly, in this case, the pseudo-integral operator is just the standard integral operator: they are both the same. Is this always the case regardless of the function <em>f</em>? It turns out that this is true for any function <em>f</em> that can be written as </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/954958880?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/954958880?profile=RESIZE_710x" class="align-center"/></a></p>
<p>This covers a large class of functions, especially since the coefficients can also be complex numbers. These functions usually have a Taylor series expansion too. However, it does not apply to functions such as polynomials, due to lack of convergence of the formula in that case.</p>
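As a quick sanity check of the convergence condition, take f(x) = exp(b·x): then T^j f = b^j f, so (I − T)^k f = (1 − b)^k f, and the series for the pseudo-integral reduces to a geometric sum converging to f/b — the true antiderivative — whenever |1 − b| &lt; 1. A minimal numerical sketch (my own illustration):

```python
import math

def pseudo_integral_exp(b, x, terms=200):
    """Apply T^{-1} = sum over k >= 0 of (I - T)^k to f(x) = exp(b*x),
    where T is the derivative operator. Since T^j f = b^j f, each term
    (I - T)^k f collapses to (1 - b)^k * f."""
    f = math.exp(b * x)
    return sum((1.0 - b)**k for k in range(terms)) * f

val = pseudo_integral_exp(b=0.7, x=1.2)   # true antiderivative: exp(0.7*x)/0.7
```

For b = 0.7 we have |1 − b| = 0.3 &lt; 1, mirroring the eigenvalue condition stated earlier, and the 200-term sum matches exp(0.7·x)/0.7 to machine precision; for b outside that range the geometric sum diverges, just as the formula fails for polynomials.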
<p>In short, we have found a formula to compute the integral of a function, based solely on the function itself and its successive derivatives. The same technique can be used to invert more complicated linear operators, such as Laplace transforms.</p>
<p><strong>Exercise</strong></p>
<p>Apply the derivative operator to the pseudo-integral of a function <em>f</em>, using the above formula for the pseudo-integral. The result should be equal to <em>f</em>. This is the case if <em>f</em> belongs to the same family of functions as described above. Can you identify functions not belonging to that family of functions, for which the theory is still valid? Hint: try <em>f</em>(<em>x</em>) = exp(<i>b</i> <em>x</em>^2) or <em>f</em>(<em>x</em>) = <em>x</em> exp(<em>b</em> <em>x</em>), where <i>b</i> is a parameter.</p>
<p><em>To not miss this type of content in the future,<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter">subscribe</a><span> </span>to our newsletter. For related articles from the same author, <a href="http://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles" target="_blank" rel="noopener">click here</a><span> </span>or visit<span> </span><a href="http://www.vincentgranville.com/" target="_blank" rel="noopener">www.VincentGranville.com</a>. Follow me on<span> </span><a href="https://www.linkedin.com/in/vincentg/" target="_blank" rel="noopener">LinkedIn</a>, or visit my old web page<span> </span><a href="http://www.datashaping.com">here</a>.</em></p>
<p><span style="font-size: 14pt;"><b>DSC Resources</b></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/new-books-and-resources-for-dsc-members">Book and Resources for DSC Members</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/comprehensive-repository-of-data-science-and-ml-resources">Comprehensive Repository of Data Science and ML Resources</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/advanced-machine-learning-with-basic-excel">Advanced Machine Learning with Basic Excel</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/difference-between-machine-learning-data-science-ai-deep-learning">Difference between ML, Data Science, AI, Deep Learning, and Statistics</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles">Selected Business Analytics, Data Science and ML articles</a></li>
<li><a href="http://careers.analytictalent.com/jobs/products">Hire a Data Scientist</a><span> </span>|<span> </span><a href="http://www.datasciencecentral.com/page/search?q=Python">Search DSC</a><span> </span>|<span> </span><a href="http://www.analytictalent.com">Find a Job</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blog/new">Post a Blog</a><span> </span>|<span> </span><a href="http://www.datasciencecentral.com/forum/topic/new">Forum Questions</a></li>
</ul>
<p><span>Follow us: </span><a href="https://twitter.com/DataScienceCtrl">Twitter</a><span> | </span><a href="https://www.facebook.com/DataScienceCentralCommunity/">Facebook</a></p>Top 10 Technology Trends of 2019tag:www.analyticbridge.datasciencecentral.com,2019-01-29:2004291:BlogPost:3911522019-01-29T18:43:29.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p class="justifyfull" dir="ltr"><span>The first days after the New Year celebrations are a time to look back, analyze our actions and promises, and draw conclusions about whether our predictions and expectations came true. With 2018 at its end, it is the perfect time to review the past year and to set trends for the next one. The amount of data generated every minute is enormous, and therefore new approaches, techniques, and solutions have been developed.</span></p>
<p class="justifyfull" dir="ltr"><span>Looking back to our article</span><span> </span><a rel="nofollow noopener" href="https://www.datasciencecentral.com/profiles/blogs/blog/the-top-10-technology-trends-of-2018/" target="_blank"><span>Top 10 Technology Trends of 2018</span></a><span>, we can say that we were preparing you for the upcoming changes related to aspects of security, changes provoked by AI in business operations, extensive application of blockchains, further development of the Internet of Things (IoT), the growth of NLP, etc. Some of these predictions materialized in 2018, yet some will remain topical in 2019 as well. Only one factor remains stable: development. There is no doubt that technologies will continue to develop, improve, and upgrade to fit their purposes better.</span></p>
<p class="justifyfull" dir="ltr"><span>At first, smart data technologies were actively applied only by huge enterprises and corporations. Today, big data has become available to a wide range of small businesses and companies. Both big enterprises and small companies now tend to rely on big data for intelligent business insights in their decision-making.</span></p>
<p class="justifyfull" dir="ltr"><span>The ever-growing stream of data may also present a challenge to business people, and predicting changes in the role of big data and its technologies is even more difficult. Thus, our top technology trends of 2019 are meant to serve as a comprehensible roadmap for you.</span></p>
<p class="justifyfull" dir="ltr"><strong>2019 Trends</strong></p>
<p>1. Data security will reinforce its positions</p>
<p>2. Internet of Things will deliver new opportunities</p>
<p>3. Automation continues to be game-changing</p>
<p>4. AR is expected to overcome VR</p>
<p>To read the 10 trends with detailed information for each trend, <a href="https://www.datasciencecentral.com/profiles/blogs/top-10-technology-trends-of-2019" target="_blank" rel="noopener">follow this link</a>. </p>
<p><span style="font-size: 14pt;"><b>DSC Resources</b></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/new-books-and-resources-for-dsc-members">Book and Resources for DSC Members</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/comprehensive-repository-of-data-science-and-ml-resources">Comprehensive Repository of Data Science and ML Resources</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/advanced-machine-learning-with-basic-excel">Advanced Machine Learning with Basic Excel</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/difference-between-machine-learning-data-science-ai-deep-learning">Difference between ML, Data Science, AI, Deep Learning, and Statistics</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles">Selected Business Analytics, Data Science and ML articles</a></li>
<li><a href="http://careers.analytictalent.com/jobs/products">Hire a Data Scientist</a><span> </span>|<span> </span><a href="http://www.datasciencecentral.com/page/search?q=Python">Search DSC</a><span> </span>|<span> </span><a href="http://www.analytictalent.com">Find a Job</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blog/new">Post a Blog</a><span> </span>|<span> </span><a href="http://www.datasciencecentral.com/forum/topic/new">Forum Questions</a></li>
</ul>
<p><span>Follow us: </span><a href="https://twitter.com/DataScienceCtrl">Twitter</a><span> | </span><a href="https://www.facebook.com/DataScienceCentralCommunity/">Facebook</a></p>
<p>Extract from the upcoming Monday newsletter published by Data Science Central. Previous editions can be found <a href="https://www.datasciencecentral.com/page/previous-digests" rel="noopener" target="_blank" title="This external link will open in a new window">here</a>. The contribution flagged with a + is our selection for the picture of the week. To subscribe,<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" rel="noopener" target="_blank" title="This external link will open in a new window">follow this link</a>. To check the full digest and see the picture of the week, follow<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/weekly-digest-january-28" target="_blank" rel="noopener">this link</a>. </p>
<p><strong>Featured Resources and Technical Contributions</strong> </p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/23-statistical-concepts-explained-in-simple-english-part-7">23 Statistical Concepts Explained in Simple English - Part 7</a><span> </span></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/new-home-sales-projection-a-time-series-forecasting">New Home Sales Projection: Time Series Forecasting</a><span> </span></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/best-dynamically-typed-programming-languages-for-data-analysis">Best dynamically-typed programming languages for data analysis</a><span> </span></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-10-statistical-techniques-data-scientists-need-to-master-10">The 10 Statistical Techniques Data Scientists Need to Master</a><span> </span></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/sip-text-log-analysis-using-pandas">SIP text log analysis using Pandas</a><span> </span></li>
</ul>
<p><strong>Featured Articles and Forum Questions</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-to-flourish-in-industry-4-0-the-fourth-industrial-revolution">How to Flourish in Industry 4.0, the Fourth Industrial Revolution</a> +</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/advice-to-a-fresh-graduate-for-getting-a-job-in-ai-data-science">Advice to a fresh graduate for getting a job in AI/ Data Science</a> </li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/top-10-technology-trends-of-2019-1" target="_blank" rel="noopener">Top 10 Technology Trends of 2019</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/data-driven-marketing-strategy-spatial-analytics-for-micro-2">Spatial Analytics for Micro-marketing</a> </li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/doctors-are-from-venus-data-scientists-from-mars-or-why-ai-ml-is-">Why AI/ML is Moving so Slowly in Healthcare</a> </li>
<li><a href="https://www.analyticbridge.datasciencecentral.com/profiles/blogs/turn-customer-reviews-into-business-growth">Mining Customer Reviews to drive Business Growth</a> </li>
<li><a href="https://www.analyticbridge.datasciencecentral.com/profiles/blogs/graph-analytics-to-reinforce-anti-fraud-programs">Graph Analytics to Reinforce Anti-fraud Programs</a> </li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/bill-schmarzo-s-retrospective-data-science-ml-big-data-analytics-">Bill Schmarzo's Retrospective: Data Science, ML, Big Data Analytics...</a> </li>
</ul>
<p>Follow us: <a href="https://twitter.com/DataScienceCtrl" target="_blank" title="This external link will open in a new window" rel="noopener">Twitter</a><span> | </span><a href="https://www.facebook.com/DataScienceCentralCommunity/" target="_blank" title="This external link will open in a new window" rel="noopener">Facebook</a>. </p>
<p></p>Mining Customer Reviews to drive Business Growthtag:www.analyticbridge.datasciencecentral.com,2019-01-24:2004291:BlogPost:3909362019-01-24T22:30:00.000ZKaniska Mandalhttps://www.analyticbridge.datasciencecentral.com/profile/KaniskaMandal
<p class="p1"><span class="s1">A passionate customer will always provide feedback about a favorite product if it strikes an emotional chord.</span></p>
<p class="p1"><span class="s1">Product reviews contain a wealth of information. Analyzing the review texts can unearth many hidden data points about the customer and the product. Such insights can help grow the business and gain revenue.</span></p>
<p class="p1"></p>
<p class="p1"><span class="s1">Let's look at a specific example. </span></p>
<p class="p1"></p>
<p class="p1"><span class="s1">Our customer Bob decides to buy a wedge pillow. </span></p>
<p class="p1"><span class="s1"><a href="https://storage.ning.com/topology/rest/1.0/file/get/873092412?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/873092412?profile=RESIZE_710x" class="align-left"/></a></span></p>
<p class="p1"><span class="s1">He provides an in-depth feedback after using the pillow.</span></p>
<p class="p1"><span class="s1"><i>I have suffered with Gerd, Gastritis and Esophagitis for 1yr now and have been to several doctors and taken numerous medicine. All doctors told me to sleep on an incline and add blocks under my bed but I did not want to elevate both me and my wife so I slept on 3 pillows for over a year. Now I have arthritis in my neck and sleeping on 3 pillows have not done much to keep the acid down out of my throat. This wedge pillow does a good job of not just elevating your head but it raises your entire upper abdomen to keep heartburn away from this area. I used to get up every night because of heartburn, bloating and stomach pain ……..</i></span></p>
<p class="p1"></p>
<p class="p1"><span class="s1">So what we learn when we read the whole text:</span></p>
<p class="p1"><span class="s1"><b>Our customer is not too happy</b></span><span class="s2">☹</span><span class="s1"><b> ... but his review comments provide interesting insights</b></span><span class="s2">☺</span></p>
<p class="p1"></p>
<p class="p1"><span class="s1">Let's now try to extract key signals and categorize them.</span></p>
<p class="p1"></p>
<p class="p1"><span class="s1">Health Concerns -> </span><strong><span class="s2">'now my neck has become very stiff and painful'</span></strong></p>
<p class="p2"><span class="s1">Product Reference -> </span><strong><span class="s2">Get Rolled-up Cheap Pillow</span></strong></p>
<p class="p1"><span class="s1">Positive Feedback -> </span><strong><span class="s2">This pillow keeps food down and acid down</span></strong></p>
<p class="p2"><span class="s1">Missing Feature -> </span><strong><span class="s2">does not have a steep incline</span></strong></p>
<p class="p2"></p>
<p class="p2">So it would be great if we could build a system that automatically extracts such signals and shares the insights through interactive visualization.</p>
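The sentiment-scoring step of such a system can be sketched with a toy lexicon scorer — a drastically simplified, hypothetical stand-in for a tool like VADER (the real tool handles negation, intensifiers, punctuation, and a large curated lexicon; the words and weights below are invented purely for illustration):

```python
# Toy valence lexicon: word -> sentiment weight. These entries and weights
# are made up for this sketch; a real lexicon (e.g. VADER's) is far larger.
LEXICON = {"good": 1.9, "great": 3.1, "pain": -2.0,
           "heartburn": -1.5, "stiff": -1.0}

def toy_sentiment(text):
    """Average valence of the lexicon words found in the text;
    0.0 when no lexicon word matches."""
    words = text.lower().split()
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

score = toy_sentiment("this pillow does a good job")  # positive signal
```

In the actual pipeline this scorer would sit downstream of the tokenize/clean/normalize steps, so the lexicon lookup sees lemmatized tokens rather than raw text.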
<p class="p2">A quick high-level view of the system components:</p>
<p class="p2"><a href="https://storage.ning.com/topology/rest/1.0/file/get/873197776?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/873197776?profile=RESIZE_710x" class="align-left" style="padding: 1px;"/></a></p>
<p class="p1"><span class="s1"><b>Technical Work Flow</b></span></p>
<ul>
<li class="p1"><span class="s1"><b>Ingest Review Streams (Real-time) [ Kafka -> Spark ]</b></span></li>
<li class="p1"><span class="s1"><b>Store raw text in document index store for free form text search</b></span></li>
<li class="p1"><span class="s1"><b>Analyze incoming data asynchronously</b></span><ul>
<li class="p1">Text analysis [ NLP using Spark-ML ]<ul>
<li class="p1">Tokenize (lowercase, split)</li>
<li class="p1">Clean (remove stop words)</li>
<li class="p1">Normalize (lemmatize, stem)</li>
</ul>
</li>
<li class="p1">vectorize attributes and look up historical vectorized data to run periodic NLP model training workflows</li>
<li class="p1">match significant product terms by referring to [ Product Taxonomy ]</li>
<li class="p1">match the buyer's preferences [ Buyer's Profile ]</li>
<li class="p1">match medical terms [ Medical Ontology and Vocabs ]</li>
<li class="p1">discover new products and topics using LDA</li>
<li class="p1">detect positive and negative features</li>
<li class="p1">sentiment analysis using VADER (Valence Aware Dictionary and sEntiment Reasoner)</li>
<li class="p1">enrich the results by combining them with product ratings, product attribute ratings, and review votes</li>
<li class="p1">extract and match user interests</li>
<li class="p1">detect plagiarism (this is very important)</li>
</ul>
</li>
<li class="p1">Store current insights into Redis / DynamoDB for quick lookup and also stream to websockets</li>
<li class="p2"><span class="s1"><strong>Visualize real-time insights</strong></span></li>
<li class="p2"><b>Historical analysis [ Elastic Search / Hadoop]</b><ul>
<li class="p2">periodically aggregate the above insights</li>
<li class="p2">refine product offering on historical insights</li>
<li class="p2">product popularity comparison by category</li>
<li class="p2">generate demand based on signals</li>
<li class="p2">recommend products based on attributes </li>
<li class="p2">find the hidden customers (channels / stores) and supply items to them need to buy in bulks</li>
<li class="p2">grow inventory and replenish items in local stores </li>
<li class="p2">customer retention through personalized offer based on what user liked and didn't like</li>
<li class="p2">sell in bulk to channels discovered from product texts and offer discounted price</li>
<li class="p2">extract the health concerns and accordingly correlate with medical conditions , drugs info , safety warnings and generate health recommendation and aggregated health score</li>
</ul>
</li>
<li class="p2"><span class="s1">Store aggregated and structured results in data warehouse cassandra or redshift </span></li>
<li class="p2">Visualize summary reports , insights and trends </li>
</ul>
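The text-analysis steps listed above (tokenize, clean, normalize) can be sketched in plain Python. The workflow itself runs on Spark-ML; the stop-word list and the suffix-stripping stemmer below are deliberately tiny, hypothetical stand-ins:

```python
import re

STOP_WORDS = {"the", "a", "an", "and", "is", "it", "this"}  # tiny illustrative list

def tokenize(text):
    # Lowercase and split on non-alphabetic characters
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]

def clean(tokens):
    # Remove stop words
    return [t for t in tokens if t not in STOP_WORDS]

def normalize(tokens):
    # Crude suffix stripping as a stand-in for lemmatization/stemming
    out = []
    for t in tokens:
        for suffix in ("ing", "es", "s"):
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[: -len(suffix)]
                break
        out.append(t)
    return out

review = "This pillow keeps food down and acid down"
tokens = normalize(clean(tokenize(review)))
print(tokens)  # ['pillow', 'keep', 'food', 'down', 'acid', 'down']
```

The same three stages map one-to-one onto Spark-ML's `Tokenizer`, `StopWordsRemover`, and a stemming/lemmatization transformer in the distributed pipeline.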
<p class="p3"></p>
<p class="p1"><b>In order to extract above hidden patterns with correlated signals, we should implement the best possible mechanisms and Recurrent Neural Network </b></p>
<p class="p1"><b><a href="https://storage.ning.com/topology/rest/1.0/file/get/873403488?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/873403488?profile=RESIZE_710x" class="align-left" style="padding: 1px;" width="238" height="243"/></a></b></p>
<p class="p1"></p>
<p class="p1"></p>
<p class="p1"><span class="s1">Word Embeddings [1]</span></p>
<p class="p1"><span class="s1">• Document vs. Word Representations</span></p>
<p class="p1"><span class="s1">• Word2Vec vs Med2Vec</span></p>
<p class="p1"><span class="s1">• GloVe</span></p>
<p class="p1"><span class="s1">• Embeddings in Deep Learning</span></p>
<p class="p1"><span class="s1">• Visualizing Word Vectors: tSNE</span></p>
<p class="p1"><b> </b></p>
<p class="p2"></p>
<p class="p2"></p>
<p class="p2"></p>
<p class="p2"><span class="s2">Valence Aware Dictionary and Sentiment Reasoner -- can help evaluate </span><span class="s1">Buyer Sentiment Variations, positive/negative feedback ratio, feature attribute weightage enrichment and factor into all different types of product metrics computation (as explained above)</span></p>
<p class="p2"><span class="s1">Finally we can generate incredibly useful visualizations and use them for product enhancement and improving overall buyer's experience.</span></p>
<p class="p2"><span class="s1">Lets get back to original feedback on wedge pillow and see the wonderful insights that we can gain.</span></p>
<p class="p2"><span class="s1">Its noteworthy, how one can easily find the opportunity to sell wedge pillows to the rehabilitation center who ned them for their patients.</span></p>
<p class="p2"><span class="s1">Many customers who actually buy the wedge pillows have undergone some sort of knee problems.</span></p>
<p class="p2"></p>
<p class="p2">Just to understand the power of the knowledge that can be extracted from the reviews, lets quickly look into the insights gained from a set of feedbacks provided on 'Cream of Wheat: Whole Grain Hot Cereal'</p>
<p class="p2"><a href="https://storage.ning.com/topology/rest/1.0/file/get/873576626?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/873576626?profile=RESIZE_710x" class="align-left" style="padding: 1px;"/></a></p>
<p class="p2"></p>
<p class="p2"><span class="s1">Its amazing to discover how this particular food item helps Alzheimer's patients and mostly old people or persons with throat problems prefer this food item. </span></p>
<p class="p2"></p>
<p class="p2"><span class="s1"><a href="https://storage.ning.com/topology/rest/1.0/file/get/873583340?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/873583340?profile=RESIZE_710x" class="align-left" style="padding: 1px;"/></a></span></p>
<p class="p2"></p>
<p class="p2"></p>
<p class="p2"></p>
<p class="p2"></p>
<p class="p2"></p>
<p class="p2"></p>
<p class="p2"></p>
<p class="p2"></p>
<p class="p2"></p>
<p class="p2"></p>
<p class="p2"></p>
<p class="p2"></p>
<p class="p2"></p>
<p class="p2"></p>
<p class="p2"></p>
<p class="p2"></p>
<p class="p2"></p>
<p class="p2"></p>
<p class="p2"><span class="s1">Mining product review data can be real fun and can turn customer feedback into a continuous source of revenue.</span></p>Graph Analytics to Reinforce Anti-fraud Programstag:www.analyticbridge.datasciencecentral.com,2019-01-22:2004291:BlogPost:3905152019-01-22T07:30:00.000ZElise Devauxhttps://www.analyticbridge.datasciencecentral.com/profile/EliseDevaux
<p></p>
<p></p>
<p><span>Organizations across industries are adopting graph analytics to reinforce their anti-fraud programs. In this post, we examine three types of fraud graph analytics can help investigators combat: insurance fraud, credit card fraud, VAT fraud.</span></p>
<h1><span>Detecting fraud is about connecting the dots</span></h1>
<p><span><br/></span> <span>In many areas, fraud investigators have at their disposal large datasets in which clues are hidden. These clues are left behind by criminals who, on their side, try to hide their activity behind layers of more or less intricate schemes. To unveil illegal activities, investigators have to connect the pieces of the puzzle to discover evidence of wrongdoing.</span></p>
<p><span>Most anti-fraud applications are able to connect simple data points together to detect suspicious behaviors: an IP address to a user, withdrawal activities to a place of residence, or a loan request history to a client.</span></p>
<p><span>But these applications fall short on more complex analysis that would imply several levels of relationships or data types. This is mostly due to the technology on which these applications often rely and the data silos it creates. The relational databases that emerged in the ’80s are efficient at storing and analyzing tabular data but their underlying data model makes it difficult to connect data scattered across multiple tables.</span></p>
<p><span>The graph databases we’ve seen emerge in the recent years are designed for this purpose. Their data model is particularly well-suited to store and to organize data where <a href="https://linkurio.us/blog/unlocking-value-connected-data/">connections are as important as individual data points</a>. Connections are stored and indexed as first-class citizens, making it an interesting model for investigations in which you need to connect the dots. In this post, we review three common fraud schemes and see how a graph approach can help investigators defeat them.</span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/837232028?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/837232028?profile=RESIZE_710x" class="align-center"/></a></p>
<h1><span>3 types of fraud graph analytics can combat<br/></span></h1>
<h2 id="insurancefraud">1) Insurance fraud</h2>
<p><span>Insurance fraud encompasses any act committed with the intent of defrauding an insurance process. It ranges from staged car accidents to faked deaths or exaggerated property damages. The FBI estimates that </span><a href="https://www.fbi.gov/stats-services/publications/insurance-fraud"><span>insurance fraud costs $40 billion</span></a><span> per year in the U.S.</span></p>
<p><span>As an example, people frequently team up and put together fake road traffic accident (RTA) claims, in which they report hard-to-disprove, light personal injuries. These fraud rings involve several criminals playing the various roles of drivers, passengers, witnesses, and even doctors who certify injuries or accomplice lawyers who file the claim.</span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/837233894?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/837233894?profile=RESIZE_710x" class="align-center"/></a></p>
<p><span>There are too many claims filed every day for insurance analysts to analyze manually. Fraud investigation units have to rely on simple business rules to identify suspicious claims. But if the fraudsters made sure to avoid red-flag case elements (unusual injury, recently purchased insurance policy, low velocity but significant injury, etc.), there is a chance they will go undetected and repeat the scheme.</span></p>
<p><span>This is where graph technology steps in. The graph approach brings data from various sources under a common model, so investigators can look at </span><i><span>all </span></i><span>the data at the same time, instead of isolated data silos. And this is exactly what they need because in these situations, what often gives away the fraudsters is abnormal connections to other elements.</span></p>
<p><span>These suspicious connections could be that the witness’s wife is connected to two similar cases, or that the doctor’s phone number is the same as that of a driver involved in another RTA claim, etc. Graph visualization and analysis platforms like Linkurious Enterprise allow investigators to pick up suspicious signs faster. They get a better understanding of the “big picture” and can identify abnormal connections to </span><a href="https://linkurio.us/blog/whiplash-for-cash-using-graphs-for-fraud-detection/"><span>detect insurance fraud</span></a><span>.</span></p>
<p></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/837238458?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/837238458?profile=RESIZE_710x" class="align-center"/></a></span></p>
<p><span><br/></span> <span>Above is an example graph visualization where we can identify one of those abnormal patterns indicating staged-accident insurance fraud: two customers (blue nodes) filed three claims (green nodes). We can identify a network of three customers connected through personal information such as phones (brown nodes) and emails (pink nodes), with the same lawyer (green node) involved every time. It is likely they are recycling stolen or fake identities to file fraudulent claims.</span></p>
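The abnormal connections described above can be surfaced even without a full graph platform: index every personal detail to the claims that mention it, then flag values shared by several claims. The claim records below are made up for illustration:

```python
from collections import defaultdict

# Hypothetical claim records: each links a claim to personal details
claims = [
    {"claim": "C1", "phone": "555-0101", "lawyer": "L. Smith"},
    {"claim": "C2", "phone": "555-0101", "lawyer": "L. Smith"},
    {"claim": "C3", "phone": "555-0202", "lawyer": "L. Smith"},
    {"claim": "C4", "phone": "555-0303", "lawyer": "M. Jones"},
]

def shared_attributes(claims, min_claims=2):
    # Index each (field, value) pair to the set of claims that mention it
    index = defaultdict(set)
    for record in claims:
        for field, value in record.items():
            if field != "claim":
                index[(field, value)].add(record["claim"])
    # Values connected to several claims are the abnormal connections
    return {k: v for k, v in index.items() if len(v) >= min_claims}

flags = shared_attributes(claims)
print(flags)
```

A graph database generalizes this to arbitrary relationship depth, but the flagged entities here ("L. Smith" tied to three claims) are exactly the nodes an investigator would zoom in on.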
<h2 id="creditcardfraud"><span>2) Payment card fraud</span></h2>
<p><span>Payment card fraud takes the form of criminals getting ahold of credit card information and proceeding to create unauthorized transactions. Card-present scenarios, in which criminals use a stolen or counterfeit credit card at an ATM or at the point-of-sale (POS) terminal of a physical store, affected </span><a href="https://geminiadvisory.io/card-fraud-on-the-rise/"><span>45.8 million cards in the U.S.</span></a><span> in 2018. Despite a massive migration to safer chip-based cards, stolen credit card fraud is still a major issue.</span></p>
<p><span>In a commonly encountered situation, a criminal proceeds as follows:</span></p>
<ul>
<li><span>sets up skimming devices at ATMs or gas pumps to steal the details stored in cards’ magnetic stripes;</span></li>
<li><span>replicates the stolen card information onto counterfeit cards;</span></li>
<li><span>uses the counterfeit cards to withdraw money at ATMs and buy goods or gift cards at shops;</span></li>
<li><span>cardholders notice unusual activity on their bank accounts and notify the authorities.</span></li>
</ul>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/837240783?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/837240783?profile=RESIZE_710x" class="align-center"/></a></p>
<p></p>
<p><span>These situations are a perfect case for graph technology. While traditional technologies will hardly allow you to create a ‘big picture’ of heterogeneous data, the graph approach lets you collect the data in a model linking together: cardholders, transactions, terminals, and locations.</span></p>
<p><span>This way, when authorities are confronted with a surge of card-present fraud cases in a given region, graph technology can help </span><a href="https://linkurio.us/blog/stolen-credit-cards-and-fraud-detection-with-neo4j/"><span>identify the common point of compromise</span></a><span> by highlighting the common links within the various reported cases, no matter how large the dataset is. Credit card fraud is thus another type of fraud graph analytics can help detect and fight.</span></p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/837247277?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/837247277?profile=RESIZE_710x" class="align-center"/></a></p>
<p></p>
<p><span>Above is an example of a graph visualization to identify a common point of compromise: clients (blue nodes) report fraudulent purchases (orange nodes). Through the connections we can identify the common ATM (purple) where they all made a withdrawal before their cards were compromised.</span></p>
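The point-of-compromise logic amounts to intersecting the terminal histories of the compromised cards; here is a minimal sketch with a hypothetical transaction log:

```python
# Hypothetical withdrawal history: card -> terminals used before fraud was reported
history = {
    "card_A": {"ATM_1", "ATM_2", "POS_7"},
    "card_B": {"ATM_2", "POS_3"},
    "card_C": {"ATM_2", "ATM_9"},
}

def point_of_compromise(history, compromised):
    # The common terminal(s) every compromised card passed through
    terminals = [history[card] for card in compromised]
    return set.intersection(*terminals)

print(point_of_compromise(history, ["card_A", "card_B", "card_C"]))  # {'ATM_2'}
```

On a real dataset the intersection is rarely this clean, so graph platforms rank terminals by how many compromised cards touch them instead of requiring an exact common point.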
<h2 id="vatfraud"><span>3) VAT fraud</span></h2>
<p><span>Carousel fraud, also known as the missing trader, or VAT fraud, is the theft of VAT collected on the sale of goods initially bought VAT-free in another jurisdiction. This scheme is difficult to identify in time and losses can be massive as recent cases have shown.</span></p>
<p><span>In 2018, a single </span><a href="https://www.europol.europa.eu/newsroom/news/eu-wide-vat-fraud-organised-crime-group-busted"><span>VAT fraud ring cost more than 60 million euros</span></a><span> to the European economy. The criminal organization was selling products online through a wide network of shell companies and producing false invoices to perform VAT fraud. Generally, this is how the carousel works:</span></p>
<ul>
<li><span>Company A sells the goods to company B VAT-free</span></li>
<li><span>Company B sells the goods to company C, charging the VAT</span></li>
<li><span>Company C sells the goods and claims a VAT refund to the tax agency of country A</span></li>
</ul>
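The three-step carousel above can be expressed as a pattern over transaction edges. The sketch below scans a hypothetical list of (seller, buyer, VAT-charged) tuples for that pattern; a real deployment would run an equivalent pattern query against a graph database:

```python
# Hypothetical cross-border transactions: (seller, buyer, vat_charged)
transactions = [
    ("A", "B", False),  # A sells to B VAT-free (cross-border)
    ("B", "C", True),   # B charges VAT, then goes missing
    ("C", "A", False),  # C re-exports and claims a VAT refund
]

def carousel_chains(transactions):
    # Look for A -> B (no VAT) followed by B -> C (VAT charged),
    # where C then sells back toward A: a potential carousel
    chains = []
    for s1, b1, vat1 in transactions:
        for s2, b2, vat2 in transactions:
            if b1 == s2 and not vat1 and vat2:
                for s3, b3, _ in transactions:
                    if b2 == s3 and b3 == s1:
                        chains.append((s1, b1, b2))
    return chains

print(carousel_chains(transactions))  # [('A', 'B', 'C')]
```

Real carousels hide behind longer chains and many shell companies, which is why investigators add constraints like transaction timing and company age to the pattern.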
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/837249246?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/837249246?profile=RESIZE_710x" class="align-center"/></a></p>
<p></p>
<p><span>Those schemes are intricate, and transactions quickly follow one another to avoid raising suspicion. To make sense of the layers behind which criminals hide, investigators need an overview of the situation. Once again, graph technology can help bring together various data types to provide a better understanding of the financial context.</span></p>
<p><span>Then, platforms like Linkurious Enterprise provide support for pattern finding activity, leveraging the flexible query semantic of graph databases. Investigators can search across vast data collections for patterns indicative of the carousel: for example multiple transactions occurring in a short amount of time between companies from two different countries with a newly created intermediary company. From there, investigators can monitor flagged patterns and </span><a href="https://linkurio.us/blog/vat-fraud-mysterious-case-missing-trader/"><span>assess the existence of potential carousel fraud</span></a><span>.</span></p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/837251824?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/837251824?profile=RESIZE_710x" class="align-center"/></a></p>
<p></p>
<p>Above is an example of a visualization identifying chains of transactions in VAT fraud: companies (blue nodes) and their parent organizations (flag nodes) sell goods VAT-free and collect back VAT through complex layers of sales between EU and non-EU countries.</p>
<p></p>
<p id="2627" class="graf graf--p graf-after--p">Today, organizations use graph technology to fight fraud across activity sectors: insurance, banking, law enforcement or financial administrations. It is a complementary approach to traditional statistical and relational technologies because it gives the opportunity to look for clues within data connections, which is where the value often lies when it comes to fraud.</p>
<p id="6f3f" class="graf graf--p graf-after--p graf--trailing">(Initially published on<span> </span><a href="https://linkurio.us/blog/3-fraud-graph-analytics-help-defeat/" class="markup--anchor markup--p-anchor" rel="nofollow noopener" target="_blank">linkurio.us blog</a>)</p>
<p></p>Great Sunday Readingtag:www.analyticbridge.datasciencecentral.com,2019-01-20:2004291:BlogPost:3908042019-01-20T19:15:49.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p>Extract from the upcoming Monday newsletter published by Data Science Central. Previous editions can be found <a href="https://www.datasciencecentral.com/page/previous-digests" rel="noopener" target="_blank" title="This external link will open in a new window">here</a>. The contribution flagged with a + is our selection for the picture of the week. To subscribe,<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" rel="noopener" target="_blank" title="This external link will open in a new window">follow this link</a>. To check the full digest and see the picture of the week, follow<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/weekly-digest-january-21" rel="noopener" target="_blank" title="This external link will open in a new window">this link</a>. </p>
<p><strong>Featured Resources and Technical Contributions</strong> </p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/your-guide-to-natural-language-processing-nlp" target="_blank" title="This external link will open in a new window" rel="noopener">Your Guide to Natural Language Processing (NLP)</a><span> </span></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/stocks-significance-testing-amp-p-hacking-how-volatile-is" target="_blank" title="This external link will open in a new window" rel="noopener">Stocks, Significance Testing & p-Hacking:<span> </span></a>How volatile is volatile?</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-mathematics-of-data-science-understanding-the-foundations-of" target="_blank" title="This external link will open in a new window" rel="noopener">The Mathematics of Data Science<span> </span></a>- Understanding the foundations of Deep Learning through Linear Regression</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/tableau-in-10-minutes-step-by-step-guide" target="_blank" title="This external link will open in a new window" rel="noopener">Tableau in 10 Minutes: Step-by-Step Guide</a><span> </span></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/pancake-a-python-package-for-model-stacking" target="_blank" title="This external link will open in a new window" rel="noopener">Pancake: A Python package for model stacking</a><span> </span></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/900-most-popular-ds-ml-articles-in-2018" target="_blank" title="This external link will open in a new window" rel="noopener">900 Most Popular DS & ML Articles in 2018</a><span> </span></li>
</ul>
<p><strong>Featured Articles and Forum Questions</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-do-you-win-the-data-science-wars-you-cheat-by-doing-the" target="_blank" title="This external link will open in a new window" rel="noopener">How Do You Win the Data Science Wars? </a>You Cheat By Doing The Necessary Pre-work +</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/supervised-vs-unsupervised-learning-whats-the-big-deal" target="_blank" title="This external link will open in a new window" rel="noopener">Supervised vs Unsupervised Learning...Whats the Big Deal?</a> </li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/exploit-the-economics-of-artificial-intelligence-with-design" target="_blank" title="This external link will open in a new window" rel="noopener">Exploit the Economics of AI with Design Thinking and Data Science</a> </li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-ai-ml-opportunity-landscape-in-healthcare-do-it-right-or-it-w" target="_blank" title="This external link will open in a new window" rel="noopener">The AI/ML Opportunity Landscape in Healthcare</a> </li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/19-controversial-articles-about-data-science" target="_blank" title="This external link will open in a new window" rel="noopener">19 Controversial Articles about Data Science</a> </li>
</ul>
<p></p>
<p>Follow us: <a href="https://twitter.com/DataScienceCtrl" target="_blank" title="This external link will open in a new window" rel="noopener">Twitter</a><span> | </span><a href="https://www.facebook.com/DataScienceCentralCommunity/" target="_blank" title="This external link will open in a new window" rel="noopener">Facebook</a>. </p>Understanding the foundations of Deep Learning through Linear Regressiontag:www.analyticbridge.datasciencecentral.com,2019-01-16:2004291:BlogPost:3904972019-01-16T16:48:52.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p>This article was written by <a href="https://www.datasciencecentral.com/profile/ajitjaokar" target="_blank" rel="noopener">Ajit Jaokar</a>. </p>
<p>In this longish post, I have tried to explain Deep Learning starting from familiar ideas like machine learning. This approach forms part of my forthcoming book. I have used this approach in my teaching. It is based on ‘learning by exception,’ i.e. understanding one concept and its limitations, and then understanding how the subsequent concept overcomes that limitation.</p>
<p>The roadmap we follow is:</p>
<ul>
<li>Linear Regression</li>
<li>Multiple Linear Regression</li>
<li>Polynomial Regression</li>
<li>General Linear Model</li>
<li>Perceptron Learning</li>
<li>Multi-Layer Perceptron</li>
</ul>
<p>We thus develop a chain of thought that starts with linear regression and extends to the multilayer perceptron (Deep Learning). Also, for simplification, I have excluded other forms of Deep Learning such as CNN and LSTM, i.e. we confine ourselves to the multilayer perceptron when it comes to Deep Learning. Why start with linear regression? Because it is an idea familiar to many, even at the high school level.</p>
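As a concrete taste of the roadmap's starting point, simple linear regression has a closed-form least-squares solution; this is a generic textbook sketch, not code from the book:

```python
def fit_line(xs, ys):
    # Closed-form ordinary least squares for y = a*x + b
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 2x + 1
a, b = fit_line(xs, ys)
print(a, b)  # 2.0 1.0
```

Each later step in the roadmap relaxes an assumption made here: multiple regression adds more inputs, the perceptron adds a non-linear activation, and the multilayer perceptron stacks such units.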
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/779088792?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/779088792?profile=RESIZE_710x" class="align-center"/></a></p>
<p>To read the full article, <a href="https://www.datasciencecentral.com/profiles/blogs/the-mathematics-of-data-science-understanding-the-foundations-of" target="_blank" rel="noopener">follow this link</a>. For more about deep learning, <a href="https://www.datasciencecentral.com/page/search?q=deep+learning" target="_blank" rel="noopener">click here</a>. For more about regression, <a href="https://www.datasciencecentral.com/page/search?q=regression" target="_blank" rel="noopener">click here</a>. </p>
<p><span style="font-size: 14pt;"><b>DSC Resources</b></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/new-books-and-resources-for-dsc-members">Book and Resources for DSC Members</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/comprehensive-repository-of-data-science-and-ml-resources">Comprehensive Repository of Data Science and ML Resources</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/advanced-machine-learning-with-basic-excel">Advanced Machine Learning with Basic Excel</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/difference-between-machine-learning-data-science-ai-deep-learning">Difference between ML, Data Science, AI, Deep Learning, and Statistics</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles">Selected Business Analytics, Data Science and ML articles</a></li>
<li><a href="http://careers.analytictalent.com/jobs/products">Hire a Data Scientist</a><span> </span>|<span> </span><a href="http://www.datasciencecentral.com/page/search?q=Python">Search DSC</a><span> </span>|<span> </span><a href="http://www.analytictalent.com">Find a Job</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blog/new">Post a Blog</a><span> </span>|<span> </span><a href="http://www.datasciencecentral.com/forum/topic/new">Forum Questions</a></li>
</ul>
<p><span>Follow us: </span><a href="https://twitter.com/DataScienceCtrl">Twitter</a><span> | </span><a href="https://www.facebook.com/DataScienceCentralCommunity/">Facebook</a></p>5 reasons why graph visualization matterstag:www.analyticbridge.datasciencecentral.com,2019-01-11:2004291:BlogPost:3904852019-01-11T16:25:33.000ZElise Devauxhttps://www.analyticbridge.datasciencecentral.com/profile/EliseDevaux
<p>Why is graph visualization so important? How can it help businesses sifting through large amounts of complex data? We explore the answer in this post through 5 advantages of graph visualization and different use cases.</p>
<h1><span>What is graph visualization</span></h1>
<p><span>Also called a network, a graph is a collection of nodes (or vertices) and edges (or links). Each node represents a single data point (a person, a phone number, a transaction) and each edge represents how two nodes are connected (a person </span><i><span>possesses </span></i><span>a phone number, for example). This way of representing data is well suited for scenarios involving connections (social networks, telecommunication networks, protein interactions, and a lot more).</span></p>
<p><span>Graph visualization is the visual representation of the nodes and edges of a graph. Dedicated algorithms, called layouts, calculate the node positions and display the data on two (sometimes three) dimensional spaces. Graph visualization tools provide user-friendly web interfaces to interact and explore graph data.</span></p>
<div id="attachment_5890" class="wp-caption aligncenter"><p class="wp-caption-text" style="text-align: center;"></p>
<p class="wp-caption-text" style="text-align: center;"><a href="https://storage.ning.com/topology/rest/1.0/file/get/726169892?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/726169892?profile=RESIZE_710x" class="align-center"/></a></p>
<p class="wp-caption-text" style="text-align: center;">A simple graph visualization made with Linkurious Enterprise – 9 nodes representing investors (blue), companies (green) and market (orange) and 8 edges indicating how they are connected.</p>
</div>
<p><span><br/>These graph visualizations are simply visualizations of data modeled as graphs. Any type of data asset that contains information about connections can be modeled and visualized as a graph, even data initially stored in a tabular way. For instance, the data from our example above could be extracted from a simple spreadsheet as depicted below.<br/><br/></span></p>
<table style="margin-left: auto; margin-right: auto;">
<tbody><tr><td><span>Company ID</span></td>
<td><span>Company name</span></td>
<td><span>Investors name</span></td>
<td><span>Market</span></td>
</tr>
<tr class="alt-table-row"><td><span>1</span></td>
<td><span>Systran</span></td>
<td><span>Softbank Ventures Korea</span></td>
<td><span>Software</span></td>
</tr>
<tr><td><span>2</span></td>
<td><span>Exakis</span></td>
<td><span>Naxicap Partners; IRDI-ICSO; IRDI Midi Pyrenees</span></td>
<td><span>Software</span></td>
</tr>
<tr class="alt-table-row"><td><span>3</span></td>
<td><span>Voluntis</span></td>
<td><span>Qualcomm</span></td>
<td><span>Software</span></td>
</tr>
</tbody>
</table>
<p style="text-align: center;"><em><span>A table-based model of our first example</span></em></p>
<p><span><br/>The data could also be stored in a relational database or in a graph database, a system </span><a href="https://neo4j.com/why-graph-databases/"><span>optimized for the storage and analysis of complex and connected data</span></a><span>.</span></p>
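Turning spreadsheet rows like those above into a graph is mostly a matter of emitting one node per distinct entity and one edge per relationship. Below is a minimal sketch; the row data is abbreviated from the table and the edge labels are made up for illustration:

```python
# Rows as they might come out of the spreadsheet
rows = [
    {"company": "Systran", "investors": "Softbank Ventures Korea", "market": "Software"},
    {"company": "Exakis",  "investors": "Naxicap Partners; IRDI-ICSO", "market": "Software"},
]

def to_graph(rows):
    nodes, edges = set(), set()
    for row in rows:
        company = row["company"]
        nodes.add(company)
        # One edge per investor (a cell may list several, ';'-separated)
        for investor in row["investors"].split(";"):
            investor = investor.strip()
            nodes.add(investor)
            edges.add((investor, "INVESTS_IN", company))
        nodes.add(row["market"])
        edges.add((company, "OPERATES_IN", row["market"]))
    return nodes, edges

nodes, edges = to_graph(rows)
print(len(nodes), len(edges))  # 6 5
```

Note how the shared "Software" market collapses into a single node, which is exactly what makes the connections visible in the graph view but invisible in the table.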
<p><span>In the end, graph visualization is a way to better understand and manipulate connected data. And it offers several advantages. </span></p>
<h2><span style="font-size: 18pt;">The benefits of graph visualization</span></h2>
<p></p>
<p><span>Interactive visualization tools are an essential layer to identify insights and generate value from connected data. There are a number of reasons why graph visualization is useful:</span></p>
<ol>
<li><span>You will</span><b><span> </span>spend less time assimilating information</b><span> because the human brain processes visual information much faster than written information. Visually displaying data ensures faster comprehension which, in the end, reduces the time to action.<br/></span></li>
<li><span>You have a</span><b><span> </span>higher chance to discover insights</b><span> by interacting with data. Graph visualization tools let you manipulate the data, which encourages you to take ownership of it and question it, and in the end increases the chances of discovering actionable insights. </span><a href="https://www.tableau.com/sites/default/files/media/8604-ra-business-intelligence-analytics.pdf"><span>A study showed</span></a><span> that managers who use visual data discovery tools are 28% more likely to find timely information than those who rely solely on managed reporting and dashboards.<br/><br/></span></li>
<li><span>You can achieve a</span><b><span> </span>better understanding of a problem</b><span> by visualizing patterns and context. Graph visualization tools are perfect for visualizing relationships, but also for understanding the context of the data. You get a complete overview of how everything is connected, which allows you to identify trends and correlations in your data.<br/><br/></span></li>
<li><b>It’s an effective form of communication</b><span>. Visual representations offer a more intuitive way to understand the data and are an impactful medium to share your findings with decision-makers.<br/><br/></span></li>
<li><b>Everybody can work with graph visualization</b><span>, not only technical users. More users can access the insights since specific programming skills are not required to interact with graph visualizations. This increases the value creation potential.<br/><br/></span></li>
</ol>
<p><span>Let’s illustrate some of these benefits with a very simple example. We have a data sample of eleven individuals with information about who works with whom. Below is the same data sample in two formats: a table and a graph visualization.<br/><br/></span></p>
<p style="text-align: center;"><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/726173371?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/726173371?profile=RESIZE_710x" class="align-center"/></a></span></p>
<div id="attachment_5898" class="wp-caption aligncenter"><p class="wp-caption-text" style="text-align: center;">Table of our data sample (click for full view)</p>
<p class="wp-caption-text" style="text-align: center;"></p>
<p class="wp-caption-text" style="text-align: center;"><a href="https://storage.ning.com/topology/rest/1.0/file/get/726175663?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/726175663?profile=RESIZE_710x" class="align-center"/></a></p>
</div>
<div id="attachment_5892" class="wp-caption aligncenter"><p class="wp-caption-text" style="text-align: center;">Graph visualization of our data sample (click for full view)</p>
</div>
<p><span><br/>In our second format, we’ve modeled the connections between people as edges to obtain a graph.<br/> While in the table it’s pretty hard to understand how these people work together, we get a much clearer view with the graph visualization. We are able to distinguish two groups and an individual who seems to be the link between them, a pattern that we did not notice at first in the table.</span></p>
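The "two groups plus a connector" pattern described above can also be detected algorithmically. Below is a rough sketch with networkx on an invented eleven-person network (the names and collaborations are made up for illustration and are not the pictured data sample): the connector stands out as the node with the highest betweenness centrality, and removing it splits the network in two.

```python
import networkx as nx
from itertools import combinations

team_a = ["Ana", "Ben", "Cy", "Dan", "Flo"]
team_b = ["Gil", "Hui", "Ida", "Jon", "Kim"]

G = nx.Graph()
G.add_edges_from(combinations(team_a, 2))           # team A all work together
G.add_edges_from(combinations(team_b, 2))           # and so does team B
G.add_edges_from([("Eve", "Ana"), ("Eve", "Gil")])  # Eve links the two teams

# The connector has the highest betweenness centrality...
scores = nx.betweenness_centrality(G)
connector = max(scores, key=scores.get)
print(connector)  # Eve

# ...and removing that node splits the network into the two groups.
G.remove_node(connector)
print(nx.number_connected_components(G))  # 2
```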
<p></p>
<h2><span style="font-size: 18pt;">How graph visualization is being used</span></h2>
<p></p>
<p><span><a href="https://linkurio.us/blog/category/use-case/">Many industries are using graph technology</a> to leverage their connected data and reach their goals. At Linkurious, we work with companies from a large variety of fields. Their common point, however, is the need to</span><b><span> </span>find connections or understand dependencies</b><span> within their data. Below are a few examples of typical use-cases of graph visualization and the organizations who use it.<br/></span></p>
<p></p>
<p><b>Anti-Financial crime</b></p>
<p><span>Banks, insurance companies, and financial institutions face a common urgency: fraud. From money laundering to insurance fraud to bank fraud, each of these organizations must detect fraud schemes that are sometimes complex. The data visualized often combines customer information, claims details, financial records, and watch-listed individuals or organizations. For them, graph visualization is a good way to detect suspicious connections or patterns. It’s also an intuitive way to investigate fraud rings and the ramifications of criminal networks.</span></p>
<p></p>
<p><b>Cybersecurity</b></p>
<p><span>Today you’ll find cyber, or IT, security teams in many large organizations, financial institutions, and security consultancies. Organizations need to protect themselves from threats such as zero-day vulnerabilities and DDoS or phishing attacks. They collect data from servers, routers, application logs, and network status in order to detect suspicious activity. Graph visualization is a great tool to digest this data and spot suspicious patterns at a glance. The visual exploration of connections makes it easier to find compromised elements.</span></p>
<p></p>
<p><b>Intelligence</b></p>
<p><span>Almost every government has its intelligence agency. To support law enforcement, national security or military objectives, these organizations collect and analyze data from various sources. The detection and identification of terrorist networks, for instance, has become a crucial objective in the past decades. Visualizing connections between people, emails, transactions or phone records is key to easing such investigations. </span></p>
<p></p>
<p><b>IT operations management</b></p>
<p><span>The field of IT operations management keeps growing with our increasing reliance on computer systems and networks and the growth of the Internet of Things. But because of the growing complexity of infrastructures, managing networks is often a challenge. Graph visualization allows IT managers to visualize dependencies between their assets (servers, switches, routers, applications, etc.). It’s an intuitive way to perform impact or root cause analysis.</span></p>
<p></p>
<p><b>Enterprise architecture</b></p>
<p><span>Numerous mature organizations implement enterprise architecture management, which consists of synchronizing business and IT data. The goal is to analyze, plan and transform the business processes, applications, data and infrastructure to maintain the organization’s ability to change and innovate. With graph visualization, enterprise architects can visualize the organization’s assets and their dependencies. It helps them conduct impact analysis, obtain insights on the current situation (as-is) and plan the right actions.</span></p>
<p></p>
<p><b>Life science</b></p>
<p><span>Protein interactions, drug compositions, disease networks: in life science data analysis, almost everything is about connections and dependencies. However, the large amount of data often makes it difficult for researchers to identify insights and look for dependencies. Graph visualization makes large amounts of data more accessible and easier to read. It has many different applications, from linking drugs with adverse events and diseases with phenotypes to visualizing networks or understanding how diseases spread.</span></p>
<p></p>
<p><span>This article was initially posted on the <a href="https://linkurio.us/blog/why-graph-visualization-matters/" target="_blank" rel="noopener">Linkurious blog</a>.</span></p>5 Predictions about Data Science, Machine Learning, and AI for 2019tag:www.analyticbridge.datasciencecentral.com,2018-12-21:2004291:BlogPost:3900312018-12-21T01:30:00.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><strong><em>Summary:</em></strong><em> Here are our 5 predictions for data science, machine learning, and AI for 2019. We also take a look back at last year’s predictions to see how we did.</em></p>
<p> </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/401132209?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/401132209?profile=original&width=250" width="250" class="align-right"/></a>It’s that time of year again when we do a look back in order to offer a look forward. What trends will speed up, what things will actually happen, and what things won’t in the coming year for data science, machine learning, and AI.</p>
<p>We’ve been watching and reporting on these trends all year and we scoured the web and some of our professional contacts to find out what others are thinking. </p>
<p> </p>
<p><span><strong>Here’s a Quick Look at Last Year’s Predictions and How We Did.</strong></span></p>
<ol>
<li><em>What we said: Both model production and data prep will become increasingly automated. Larger data science operations will converge on a single platform (of many available). Both of these trends are in response to the groundswell movement for efficiency and effectiveness. In a nutshell allowing fewer data scientists to do the work of many.</em> </li>
</ol>
<p>Clearly a win. <span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/practicing-no-code-data-science" target="_self">No code data science</a><span> </span>is on the rise as is end-to-end integration in advanced analytic platforms.</p>
<ol start="2">
<li><em>What we said: Data Science continues to develop specialties that mean the mythical ‘full stack’ data scientist will disappear.</em></li>
</ol>
<p>To read all 2018 predictions, and compare with the updated 2019 version, <a href="https://www.datasciencecentral.com/profiles/blogs/5-predictions-about-data-science-machine-learning-and-ai-for-2019" target="_blank" rel="noopener">click here</a>. </p>
<p><span style="font-size: 14pt;"><strong>Announcement</strong></span></p>
<ul>
<li><a href="https://dsc.news/2UZnoQ6">Leverage All Your Data With Cloud Analytics<span> </span></a>- On-demand Webinar<span> </span></li>
</ul>New Books in AI, Machine Learning, and Data Sciencetag:www.analyticbridge.datasciencecentral.com,2018-12-02:2004291:BlogPost:3896612018-12-02T01:26:14.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p>We are in the process of writing and adding new material (compact eBooks) exclusively available to our members, and written in simple English, by world leading experts in AI, data science, and machine learning. In the upcoming months, the following will be added:</p>
<ul>
<li>The Machine Learning Coding Book</li>
<li>Off-the-beaten-path Statistics and Machine Learning Techniques </li>
<li>Encyclopedia of Statistical Science</li>
<li>Original Math, Stat and Probability Problems - with Solutions</li>
<li>Computational Number Theory for Data Scientists</li>
<li>Randomness, Pattern Recognition, Simulations, Signal Processing - New developments</li>
</ul>
<p>We invite you to<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">sign up here</a><span> </span>to not miss these free books. Previous material (also for members only) can be found<span> </span><a href="https://www.datasciencecentral.com/page/member" target="_blank" rel="noopener">here</a>.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/135807237?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/135807237?profile=original" class="align-center"/></a></p>
<p></p>
<p>Currently, the following content is available:</p>
<p><strong>1. Book: Enterprise AI - An Application Perspective</strong> </p>
<p>Enterprise AI: An Applications Perspective takes a use-case-driven approach to understanding the deployment of AI in the enterprise. Designed for strategists and developers, the book provides a practical and straightforward roadmap, based on application use cases, for AI in enterprises. The authors (Ajit Jaokar and Cheuk Ting Ho) are data scientists and AI researchers who have deployed AI applications in enterprise domains. The book is used as a reference for Ajit and Cheuk's new course on Implementing Enterprise AI.</p>
<p>The table of contents is available<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/free-ebook-enterprise-ai-an-applications-perspective" target="_blank" rel="noopener">here</a>. The book can be accessed<span> </span><a href="https://www.datasciencecentral.com/page/free-books-1" target="_blank" rel="noopener">here</a><span> </span>(members only).</p>
<p><strong>2. Book: Applied Stochastic Processes</strong></p>
<p>Full title:<span> </span><em>Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems</em>. Published June 2, 2018. Author: Vincent Granville, PhD. (104 pages, 16 chapters.)</p>
<p>This book is intended for professionals in data science, computer science, operations research, statistics, machine learning, big data, and mathematics. In 100 pages, it covers many new topics, offering a fresh perspective on the subject. It is accessible to practitioners with a two-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications (Blockchain, quantum algorithms, HPC, random number generation, cryptography, Fintech, web crawling, statistical testing) with numerous illustrations, is aimed at practitioners, researchers and executives in various quantitative fields.</p>
<p>New ideas, advanced topics, and state-of-the-art research are discussed in simple English, without using jargon or arcane theory. It unifies topics that are usually part of different fields (data science, operations research, dynamical systems, computer science, number theory, probability) broadening the knowledge and interest of the reader in ways that are not found in any other book. This short book contains a large amount of condensed material that would typically be covered in 500 pages in traditional publications. Thanks to cross-references and redundancy, the chapters can be read independently, in random order.</p>
<p>The table of contents is available<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes" target="_blank" rel="noopener">here</a>. The book can be accessed<span> </span><a href="https://www.datasciencecentral.com/page/free-books-1" target="_blank" rel="noopener">here</a><span> </span>(members only).</p>
<p><span><b>DSC Resources</b></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/comprehensive-repository-of-data-science-and-ml-resources">Comprehensive Repository of Data Science and ML Resources</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/advanced-machine-learning-with-basic-excel">Advanced Machine Learning with Basic Excel</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/difference-between-machine-learning-data-science-ai-deep-learning">Difference between ML, Data Science, AI, Deep Learning, and Statistics</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles">Selected Business Analytics, Data Science and ML articles</a></li>
<li><a href="http://careers.analytictalent.com/jobs/products">Hire a Data Scientist</a><span> </span>|<span> </span><a href="https://www.datasciencecentral.com/page/search?q=Python">Search DSC</a><span> </span>|<span> </span><a href="http://www.analytictalent.com/">Find a Job</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blog/new">Post a Blog</a><span> </span>|<span> </span><a href="https://www.datasciencecentral.com/forum/topic/new">Forum Questions</a></li>
</ul>Things that Aren’t Working in Deep Learningtag:www.analyticbridge.datasciencecentral.com,2018-11-21:2004291:BlogPost:3894292018-11-21T17:00:42.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><strong><em>Summary:</em></strong><span> </span><em>This may be the golden age of deep learning, but a lot can be learned by looking at where deep neural nets aren’t working yet. This can be a guide to calming the hype. It can also be a roadmap to future opportunities once these barriers are behind us. The full article is accessible <a href="https://www.datasciencecentral.com/profiles/blogs/things-that-aren-t-working-in-deep-learning" target="_blank" rel="noopener">here</a>; below is a snapshot. </em></p>
<p>We are living in the golden age of deep learning. This is quite literally the technology that launched 10,000 startups (to paraphrase Kevin Kelly’s prophetic prediction from 2014: “The business plans of the next 10,000 startups are easy to forecast:<span> </span><em>Take X and add AI</em>.”) Well, that happened.</p>
<p>Kelly was speaking more broadly about AI, but over the last four years we’ve come to understand that it’s CNNs and RNN/LSTMs that are actually commercially ready and driving this adoption. </p>
<p>Although the last two years have been fairly quiet in terms of new techniques and technology breakthroughs for data science, it hasn’t been totally quiet. As with the emergence of Temporal Convolutional Nets<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/temporal-convolutional-nets-tcns-take-over-from-rnns-for-nlp-pred"><em><u>(TCNs) to replace RNNs</u></em></a><span> </span>in language translation, research goes on to see how deep learning, and specifically CNN architectures, can be pushed into new applications.</p>
<p> </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/135609852?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/135609852?profile=original&width=225" width="225" class="align-center"/></a></p>
<p> </p>
<p><span><strong>Roadblocks to Deep Learning</strong></span></p>
<p>Which brings us to our current topic: understanding the major roadblocks researchers face in trying to expand deep learning into new areas. </p>
<p>In calling our attention to ‘things that aren’t working in deep learning’, we aren’t suggesting that these things will never work, but rather that researchers are currently identifying major stumbling blocks to moving forward.</p>
<p>The value of this is two-fold. First, it can help steer us away from projects that may look on the surface like good fits for deep learning, but in fact may take a year or more to work out. Second, we should keep an eye on these particular issues, since once they are resolved they will represent opportunities that others may have decided weren’t possible.</p>
<p>Here are several that we spotted in the research.</p>
<p>Read full article <a href="https://www.datasciencecentral.com/profiles/blogs/things-that-aren-t-working-in-deep-learning" target="_blank" rel="noopener">here</a>. </p>
Finding insights with graph analyticstag:www.analyticbridge.datasciencecentral.com,2018-10-04:2004291:BlogPost:3889692018-10-04T15:30:00.000ZElise Devauxhttps://www.analyticbridge.datasciencecentral.com/profile/EliseDevaux
<p><span>From detecting anomalies to understanding what are the key elements in a network, or highlighting communities, graph analytics reveal information that would otherwise remain hidden in your data. We will see how to integrate your graph analytics with Linkurious Enterprise to detect and investigate insights in your connected data.</span></p>
<p><span id="more-6665"></span></p>
<h2><span>What is graph analytics</span></h2>
<h3><span>Definition and methods</span></h3>
<p></p>
<p><span>Graph analytics is a set of tools and methods for extracting knowledge from data modeled as a graph. The graph paradigm is ideal for making the most of connected data</span><span>, whose value resides for the most part in its relationships. But even with data modeled as a graph, extracting knowledge and producing insights can be challenging. Faced with multi-dimensional data and very large datasets, analysts need tools to accelerate the discovery of insights.</span></p>
<p></p>
<p><span>The field of graph theory has spawned multiple algorithms that analysts can rely on to find insights hidden in graph data. Below are some of the most popular graph algorithms and how they can help find insights for use-cases such as fraud detection, network management, anti-money laundering, intelligence analysis or cybersecurity:</span></p>
<p></p>
<ul>
<li><b>Pattern matching algorithms<span> </span></b><span>allow analysts to identify one or several subgraphs with a given structure within a graph. Example: a company node whose country property contains “Luxembourg”, connected to at least five officer nodes with a registered address in France.</span></li>
<li><b>Traversal and pathfinding algorithms<span> </span></b><span>determine paths between nodes within the graph, without knowing in advance what connections exist or how many of them separate two nodes. In money laundering investigations, path analysis can help determine how money flows through a network of individuals, and how it goes from company A to person B. Example: the <a href="https://en.wikipedia.org/wiki/Shortest_path_problem">shortest path algorithm</a>.</span></li>
<li><b>Connectivity algorithms<span> </span></b><span>find the minimum number of nodes or edges that need to be removed to disconnect the remaining nodes from each other. This is helpful to determine weaknesses in an IT network, for instance, and find out which infrastructure points are critical and could take it down. Example: the <a href="https://en.wikipedia.org/wiki/Strongly_connected_component">Strongly Connected Components algorithm</a>.</span></li>
<li><b>Community detection algorithms</b><span> identify clusters, or groups of nodes densely connected within the graph. This is particularly helpful for finding groups of people that might belong to a common criminal organization. Example: the <a href="https://en.wikipedia.org/wiki/Louvain_Modularity">Louvain method</a>, the label propagation algorithm.</span></li>
<li><b>Centrality algorithms</b><span> determine a node’s relative importance within a graph by looking at how connected it is to other nodes. They are used, for instance, to identify key people within organizations. Example: the <a href="https://en.wikipedia.org/wiki/PageRank">PageRank algorithm</a>, degree centrality, closeness centrality, betweenness centrality.</span></li>
</ul>
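To make two of these algorithm families concrete, here is a small, self-contained networkx sketch on an invented money-flow graph (all names are hypothetical):

```python
import networkx as nx

# A made-up directed money-flow graph: funds leave Company A and
# eventually reach Person B.
G = nx.DiGraph()
G.add_edges_from([
    ("Company A", "Shell Co 1"),
    ("Shell Co 1", "Shell Co 2"),
    ("Shell Co 2", "Bank X"),
    ("Company A", "Bank X"),
    ("Bank X", "Person B"),
])

# Pathfinding: how does the money get from company A to person B?
path = nx.shortest_path(G, "Company A", "Person B")
print(path)  # ['Company A', 'Bank X', 'Person B']

# Centrality: the bank is the busiest hub (highest degree centrality).
deg = nx.degree_centrality(G)
hub = max(deg, key=deg.get)
print(hub)  # Bank X
```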
<h3><span>Architecture blueprint for graph analytics</span></h3>
<p></p>
<p><span>Depending on your data, your use-case, and the questions you have to answer, technology and infrastructure can differ from one organization to another. But a generic graph analytics architecture usually consists of the following layers:</span></p>
<p></p>
<ul>
<li><span><strong>Linkurious Enterprise</strong>: the browser-based platform and its server are used by investigation teams to visualize and analyze graph data. It retrieves data in real-time from graph databases.</span></li>
<li><span><strong>Graph databases</strong>: transactional systems storing data as graphs and managing operations such as data retrieval or writing. They handle real-time queries well, making them great online transaction processing (OLTP) systems.</span></li>
<li><span><strong>Graph processing systems</strong>: a set of analytical engines shipping with common graph algorithms and handling large-scale online analytical processing (OLAP) on graphs.</span></li>
</ul>
<div id="attachment_6667" class="wp-caption aligncenter"><img class="lazy size-full wp-image-6667 lazy-loaded" src="https://linkurio.us/wp-content/uploads/2018/09/data_processing.jpg" alt="graph analytics Linkurious schema" width="738" height="508"/><p class="wp-caption-text">Architecture blueprint for graph analytics</p>
<p class="wp-caption-text"></p>
</div>
<p><span>Linkurious Enterprise acts as a front-end where analysts and investigators can easily retrieve information. The data accessed by Linkurious Enterprise is stored in a graph database. Graph databases are well suited for real-time querying and long-term persistence but are usually not designed for running complex graph algorithms at scale. As a result, our clients tend to push this sort of workload to dedicated graph processing frameworks such as <a href="http://spark.apache.org/">Spark</a>/<a href="https://spark.apache.org/graphx/">GraphX</a>. The results are then persisted back in the graph database as new properties (e.g. a PageRank score property) and thus become available to Linkurious Enterprise.</span></p>
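This "compute offline, persist the scores back as a property" round-trip can be sketched as follows, with networkx standing in for the Spark/GraphX processing layer, a node attribute standing in for the property written back to the graph database, and an invented four-node graph as data:

```python
import networkx as nx

# An invented directed graph; "D" only points outward, so it ends up
# with the lowest PageRank score.
G = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "A"), ("D", "C")])

# 1. Run the heavy analytical workload (here, PageRank).
scores = nx.pagerank(G)

# 2. Persist the result back onto the graph as a new node property,
#    mirroring the "pagerank_g" property used later in this article.
nx.set_node_attributes(G, scores, "pagerank_g")

print(G.nodes["C"]["pagerank_g"] > G.nodes["D"]["pagerank_g"])  # True
```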
<h2><span>Applying graph analytics to the Paradise Papers data</span></h2>
<p></p>
<p><span>In this section, we take a closer look at a real-life graph dataset, the </span><a href="https://offshoreleaks.icij.org/pages/database"><span>Paradise Papers dataset</span></a><span>, created by the ICIJ to <a href="https://linkurio.us/blog/big-data-technology-fraud-investigations/">investigate the world offshore finance industry</a>. We use Linkurious Enterprise to query, analyze and visualize the data using graph analytics tools and methods.</span></p>
<p></p>
<h3><span>The setup</span></h3>
<div id="attachment_6669" class="wp-caption aligncenter"><img class="lazy wp-image-6669 lazy-loaded" src="https://linkurio.us/wp-content/uploads/2018/09/data_processing_2.png" alt="Linkurious graph analytics" width="744" height="553"/><p class="wp-caption-text">The setup used in our example</p>
<p class="wp-caption-text"></p>
</div>
<p><span>For the purpose of this example, we relied on the architecture pictured above:</span></p>
<ul>
<li><span>A Linkurious Enterprise instance</span></li>
<li><span>A <a href="https://linkurio.us/solution/neo4j/">Neo4j graph database</a></span></li>
<li><span>The </span><a href="https://neo4j.com/developer/graph-algorithms/"><span>Neo4j graph algorithms</span></a><span> library, a plugin that provides parallel versions of common graph algorithms for Neo4j exposed as Cypher procedures.</span></li>
</ul>
<h3><span>The Paradise Papers dataset</span></h3>
<p></p>
<p><span>The dataset consists of 1,582,953 nodes and 2,398,680 edges. It aggregates data from four investigations by the ICIJ: the Offshore Leaks, the Panama Papers, the Bahamas Leaks and the Paradise Papers.</span></p>
<p></p>
<p><span>The graph data model has four types of nodes and three types of edges as depicted below.</span></p>
<p></p>
<div id="attachment_6672" class="wp-caption aligncenter"><img class="lazy wp-image-6672 lazy-loaded" src="https://linkurio.us/wp-content/uploads/2018/09/data_model.png" alt="Paradise papers linkurious" width="543" height="355"/><p class="wp-caption-text">Graph data model of the Paradise Papers dataset</p>
<p class="wp-caption-text"></p>
</div>
<p><span>In the following sections, we will see how to use different graph analytics approaches such as graph pattern matching, PageRank analysis, and the Louvain community detection method. While implementing graph analytics requires some technical knowledge, we will see how Linkurious Enterprise can make graph analytics results accessible to every analyst via simple tools. Among these tools are query templates, an alert dashboard, and a visualization interface.</span></p>
<p></p>
<h3><span>Graph pattern matching in Linkurious Enterprise</span></h3>
<p></p>
<p><span>A simple method for identifying patterns in a graph is to use a graph query language to describe the shape of the data you are looking for. As a developer, you can do this in the interface of your favorite graph database, but also within the Linkurious Enterprise interface.</span></p>
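As a stand-alone illustration of the idea, outside of any graph query language, here is a networkx sketch of a simple pattern: an address node connected to more than five entity or officer nodes. The node names and the "type" attribute below are invented for the example:

```python
import networkx as nx

G = nx.Graph()
# A hypothetical address shared by six entities.
G.add_node("Address 1", type="Address")
for i in range(6):
    G.add_node(f"Entity {i}", type="Entity")
    G.add_edge(f"Entity {i}", "Address 1")
# A second address tied to a single entity, which should not match.
G.add_node("Address 2", type="Address")
G.add_edge("Entity 0", "Address 2")

# The pattern: address nodes with more than five Entity/Officer neighbors.
matches = [
    n for n, data in G.nodes(data=True)
    if data["type"] == "Address"
    and sum(G.nodes[m]["type"] in ("Entity", "Officer") for m in G.neighbors(n)) > 5
]
print(matches)  # ['Address 1']
```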
<p></p>
<p><span>What if you want to be warned every time a certain graph pattern appears in your data? Via the Linkurious Enterprise alert system, you can set up alerts for graph patterns you want to monitor. Every time a new match is detected in the database, it’s recorded and made available for users to review. This is useful in a fraud monitoring context, for instance, where you’d want to be notified when instances of known fraud schemes occur.</span></p>
<p></p>
<p><span>In the video below, we set up a new alert in Linkurious Enterprise for a specific pattern. The alert contains a graph query looking for addresses tied to more than five entities or company officers.</span></p>
<p></p>
<p><iframe src="https://www.youtube.com/embed/A2-7xAg_3ug?wmode=opaque" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe>
</p>
<p></p>
<p><span>Once the alert is saved, users access a match list and can start investigating the results. Below, we review one of the findings from the alert investigation interface. </span></p>
<p></p>
<p><iframe src="https://www.youtube.com/embed/zmEd_J3iq-M?wmode=opaque" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe>
</p>
<p></p>
<p><span>When looking at a node representing a company, you may want to know which other companies share the same addresses. The answer can be retrieved manually, by expanding and filtering the data, or via a graph query, which requires technical skills. With Linkurious Enterprise’s query templates, you can apply pre-formatted graph queries with the click of a button and accelerate your data exploration. Users run query templates by right-clicking on a node in the visualization and choosing the desired template from the menu. </span></p>
<p></p>
<p><span>Below is an example of how to set up a query template. We configure it to retrieve, for a given company officer, all the other officers it is connected to via a shared address or a shared company.</span></p>
<p></p>
<p><iframe src="https://www.youtube.com/embed/-EfMaVCoAZU?wmode=opaque" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe>
</p>
<p></p>
<p><span>Once the query is configured, users can easily access and run it from the visualization interface to speed up their investigations.</span></p>
<p></p>
<p><iframe src="https://www.youtube.com/embed/ySkVS3FRHS8?wmode=opaque" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe>
</p>
<p></p>
<p><span>In addition to these features, users can rely on Linkurious Enterprise styling and filtering capabilities to analyze the data faster. Once the results of the query are displayed, styles and filters are essential to refine the results, reduce the noise and highlight the key elements.</span></p>
<p></p>
<p><span>In the next section, we see how to automate the identification of unusual companies within the French network using the PageRank algorithm and Linkurious Enterprise’s alert system.</span></p>
<p></p>
<h3><span>Identifying key nodes with the PageRank algorithm</span></h3>
<p></p>
<p><span>To use graph algorithms in Linkurious Enterprise, you will first need to run them on your backend and save their results as new properties in your graph database. In this example, we show how to identify key nodes in your network using the PageRank algorithm. This centrality algorithm will compute a score assessing the relative importance of various nodes within a network.</span></p>
<p><span>One line of code is enough to run the algorithm in Neo4j and create a new node property, "pagerank_g", holding the resulting PageRank score.</span></p>
<p></p>
<table>
<tbody><tr><td><span>// Computation of PageRank<br/></span> <span>CALL algo.pageRank(null,null,{write:true,writeProperty:'pagerank_g'})</span></td>
</tr>
</tbody>
</table>
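For readers without a Neo4j backend at hand, the score itself is easy to sketch: PageRank is a damped power iteration over the link structure. Below is a minimal Python sketch of the same quantity; the tiny graph and node names are illustrative, not taken from the Panama Papers data.

```python
# Toy PageRank via power iteration -- the quantity algo.pageRank
# computes inside Neo4j, sketched on a hypothetical mini-graph.
def pagerank(edges, damping=0.85, iters=50):
    """edges: (source, target) pairs; returns {node: score}."""
    nodes = {n for e in edges for n in e}
    out = {v: [] for v in nodes}
    for s, t in edges:
        out[s].append(t)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out[v] or list(nodes)  # dangling nodes spread evenly
            share = damping * rank[v] / len(targets)
            for t in targets:
                nxt[t] += share
        rank = nxt
    return rank

# Three entities pointing at one shared "hub" address-like node.
scores = pagerank([("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")])
print(max(scores, key=scores.get))  # the shared "hub" ranks highest
```

Sorting nodes by this score is exactly what lets the alert below surface the most important companies first.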
<p></p>
<p><span>Once this has been added to our graph, we can start exploiting the results in Linkurious Enterprise.</span></p>
<p><span>We created a new alert, leveraging the PageRank results. The query is simple: it searches for Entity nodes connected to other nodes (Countries, Officer, Intermediary) located in France. It also collects their PageRank scores and ranks them by order of importance. Every matching sub-graph is recorded by the alert system and can be investigated. By sorting results by their PageRank scores, we can focus our investigation on the most important companies within the French network.</span></p>
<p></p>
<table>
<tbody><tr class="alt-table-row"><td><span>// Detect French entities with a high PageRank</span><p></p>
<p><span>MATCH (a:Entity)-[r]-(b)<br/></span> <span>WHERE b.countries = "France"<br/></span> <span>WITH a.pagerank as score, a, COLLECT( distinct r) as r, COLLECT( distinct b) as b, count(b) as degree<br/></span> <span>RETURN a, score, a.name as name, r, b, degree<br/></span> <span>ORDER BY score DESC</span></p>
</td>
</tr>
</tbody>
</table>
<p></p>
<p><span>In the example below, we review one of the top matches recorded by the alert system. </span></p>
<p></p>
<p><iframe src="https://www.youtube.com/embed/J2ARFuykM_A?wmode=opaque" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe>
</p>
<p></p>
<p><span>In addition to these features, users can rely on Linkurious Enterprise styling and filtering capabilities to analyze the data faster. For instance, it’s possible to size and filter the nodes based on their PageRank score to get a faster understanding of the situations as depicted in the image below.</span></p>
<p></p>
<div id="attachment_6688" class="wp-caption aligncenter"><img class="lazy size-full wp-image-6688 lazy-loaded" src="https://linkurio.us/wp-content/uploads/2018/09/sizing.png" alt="style and analytics" width="941" height="452"/><p class="wp-caption-text"><br/> A size is applied to “location” nodes based on their PageRank score to highlight nodes of importance.</p>
</div>
<p><span>By enriching the data with additional information, the PageRank algorithm helped us focus on nodes of interest. The alert system in Linkurious Enterprise helps us classify the results and provides a user-friendly interface for investigation. In the next section, we see how to detect communities of interest with a single click using the Louvain algorithm and the query template system.</span></p>
<h3><span>Identifying interesting communities via the Louvain modularity</span></h3>
<p></p>
<p><span>In the example below, we implement the Louvain algorithm to identify communities within our network. We look specifically at communities of company officers based on their relationships. The snippet of code below identifies communities and adds a new "communityLouvain" property to each node, representing the community it belongs to.</span></p>
<p></p>
<table>
<tbody><tr><td><span>// Computation of Louvain modularity</span><p></p>
<p><span>CALL algo.louvain(<br/></span> <span> 'MATCH (p:Officer) RETURN id(p) as id',<br/></span> <span> 'MATCH (p1:Officer)-[:OFFICER_OF]->(:Entity)<-[:OFFICER_OF]-(p2:Officer)<br/></span> <span> RETURN id(p1) as source, id(p2) as target',<br/></span> <span> {graph:'cypher',write:true});</span></p>
</td>
</tr>
</tbody>
</table>
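For intuition about what the algorithm is doing: Louvain greedily maximizes the modularity score Q, which rewards partitions whose communities contain more internal edges than a random rewiring of the same degrees would predict. A minimal sketch of that score, on an illustrative toy graph (not the Panama Papers data):

```python
# Modularity Q = sum over communities of
#   (fraction of edges inside the community) - (fraction of degree in it)^2
# This is the objective the Louvain algorithm greedily maximizes.
def modularity(edges, community):
    """edges: undirected (u, v) pairs; community: {node: label}."""
    m = len(edges)
    deg_sum, intra = {}, {}            # per-community degree and edge counts
    for u, v in edges:
        deg_sum[community[u]] = deg_sum.get(community[u], 0) + 1
        deg_sum[community[v]] = deg_sum.get(community[v], 0) + 1
        if community[u] == community[v]:
            intra[community[u]] = intra.get(community[u], 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2.0 * m)) ** 2
               for c, d in deg_sum.items())

# Two triangles joined by one bridge edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
two_communities = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_community = {n: 0 for n in "abcdef"}
print(modularity(edges, two_communities))  # split matching the structure scores higher
print(modularity(edges, one_community))
```

The "communityLouvain" label written back by the procedure is simply the assignment that came out of this maximization.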
<p></p>
<p><span>Then, we leverage the data generated by the algorithm in a query template that retrieves, for a given "Officer" node, the other officers belonging to the same community in a single click. Instead of manually exploring each of the node's neighbors to identify a potential community, the query template instantly provides an answer the analysts can then refine. Below is the code used in the query template.</span></p>
<p></p>
<table>
<tbody><tr class="alt-table-row"><td><span>//Retrieve the officer nodes who belong to the same community</span><p></p>
<p><span>MATCH (a:Officer)<br/></span> <span>WHERE ID(a) = {{"My param":node}}<br/></span> <span>WITH a<br/></span> <span>MATCH p = (a:Officer)-[*..4]-(b:Officer)<br/></span> <span>WHERE a.communityLouvain = b.communityLouvain<br/></span> <span>RETURN p</span></p>
</td>
</tr>
</tbody>
</table>
<p></p>
<p><span>We can now retrieve, in a click, officers of the same community from any given officer in the visualization interface. In the example below, we apply this to Boris Rotemberg, a Russian oligarch, opening an investigation on his close connections. Once the results of the query are displayed, styles and filters are essential to refine the results, reduce the noise and highlight the key elements.</span></p>
<p></p>
<p><iframe src="https://www.youtube.com/embed/ZlO-4Kif1Bo?wmode=opaque" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe>
</p>
<p></p>
<p><span>Graph analytics and graph visualization are complementary. The existing graph analytics tools and methods make it possible to extract information from large amounts of connected data, generating valuable insights.</span></p>
<p></p>
<p>With platforms like Linkurious Enterprise, every user can take advantage of graph analytics from their browser via an intuitive interface. From detecting financial crimes, such as money laundering or tax evasion, to spotting fraud, or fighting organized crime, analysts find the insights they need.</p>Lots of Open Source Datasets to Make Your AI Bettertag:www.analyticbridge.datasciencecentral.com,2018-10-03:2004291:BlogPost:3887132018-10-03T16:49:20.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><strong>Summary</strong>: There are several approaches to reducing the cost of training data for AI, one of which is to get it for free. Here are some excellent sources.</p>
<p>Recently we wrote that training data (not just data in general) is the new oil. It’s the difficulty and expense of acquiring labeled training data that causes many deep learning projects to be abandoned.</p>
<p>It also matters a great deal just how good you want your new deep learning app to be. A 2016 study by Goodfellow, Bengio and Courville concluded you could get ‘acceptable’ performance with about 5,000 labeled examples per category BUT it would take 10 Million labeled examples per category to “match or exceed human performance”.</p>
<p>There are a number of technologies coming up through research now that promise more accurate auto labeling to make creating training data less costly and time consuming. Snorkel from the Stanford Dawn Project is one we covered recently. This area is getting a lot of research attention.</p>
<p><a href="http://api.ning.com:80/files/dn3W6VVS1GWcF*80Wl86JnEMHV8DCKGGWYr9if78ZEh4n99IIfw2xcZrbkaQt1PpKJj-BEuKaN7cAyv9mOdHDV29s3YGjvg2/Capture.PNG" target="_self"><img src="http://api.ning.com:80/files/dn3W6VVS1GWcF*80Wl86JnEMHV8DCKGGWYr9if78ZEh4n99IIfw2xcZrbkaQt1PpKJj-BEuKaN7cAyv9mOdHDV29s3YGjvg2/Capture.PNG" width="415" class="align-center"/></a></p>
<p></p>
<p>Another approach is to build on someone else’s work using publicly available datasets. You can begin by building your model on the borrowed set, you can blend your data with the borrowed data, or you can use the transfer learning approach to repurpose the front end of an existing model to train on your more limited data.</p>
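The transfer-learning option can be sketched in a few lines of numpy: freeze a "front end" (here just a fixed random projection, standing in for borrowed pretrained layers purely for illustration) and train only a small new head on the limited labeled data.

```python
import numpy as np

# Toy transfer learning: a frozen feature extractor plus a trainable head.
rng = np.random.default_rng(0)

# Limited labeled data: two classes in 20 dimensions.
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 20)),
               rng.normal(1.0, 1.0, size=(50, 20))])
y = np.array([0] * 50 + [1] * 50)

W_frozen = rng.normal(size=(20, 5))   # stand-in for pretrained layers
feats = np.tanh(X @ W_frozen)         # frozen "front end" features

# Train only the new head (logistic regression) by gradient descent.
w, b = np.zeros(5), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    w -= 0.5 * feats.T @ (p - y) / len(y)
    b -= 0.5 * float((p - y).mean())

acc = float((((feats @ w + b) > 0) == y).mean())
print(f"head-only accuracy: {acc:.2f}")
```

In a real project the frozen part would be the convolutional base of a network pretrained on a public dataset; only the head sees your scarce labels.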
<p>Whatever your strategy, the ability to build on publicly available datasets is always something you’ll want to consider, so your ability to find them becomes key.</p>
<p>Here are some notes on where you might start your search. These won’t all be labeled image and text but a lot of them are. And for those of you looking to use ML and statistical techniques, there’s plenty here for you too.</p>
<p><a href="https://www.datasciencecentral.com/profiles/blogs/lots-of-free-open-source-datasets-to-make-your-ai-better" target="_blank" rel="noopener">Read full article here</a>. </p>
<p><em>To make sure you keep getting these emails, please add mail@newsletter.datasciencecentral.com </em><span><em>to your address book or whitelist us. </em> </span></p>Run Deep learning models for free using google colaboratorytag:www.analyticbridge.datasciencecentral.com,2018-10-01:2004291:BlogPost:3887082018-10-01T15:07:17.000Zsuresh kumar Gorakalahttps://www.analyticbridge.datasciencecentral.com/profile/sureshkumarGorakala
<h3>What is Google Colab:</h3>
<p><br/><span>We all know that deep learning algorithms improve the accuracy of AI applications to a great extent. But this accuracy comes at the cost of heavy computational hardware, such as GPUs, for training deep learning models. Many machine learning developers cannot afford a GPU, as they are very costly, and find this a roadblock to learning and developing deep learning applications. To help them, Google has released Google Colaboratory: a free, cloud-based Jupyter notebook environment with GPU processing capabilities and no strings attached. It is a ready-to-use service that requires no setup at all. </span><br/><br/><span>Any AI developer can use this free service to develop deep learning applications using popular AI libraries like TensorFlow, PyTorch, Keras, etc.</span></p>
<p></p>
<h3>Setting up colab:</h3>
<p><br/><i>Go to google drive → new item → More → colaboratory<span> </span></i><br/><br/></p>
<div class="separator"><a href="https://1.bp.blogspot.com/-S6WyVM9ZEfQ/W5QLeuTPwaI/AAAAAAAAGUA/geEtAT20lNUpH5ruI4wUqL4KPlh36WBbgCLcBGAs/s1600/google%2Bcolab%2Bintro.png"><img border="0" height="188" src="https://1.bp.blogspot.com/-S6WyVM9ZEfQ/W5QLeuTPwaI/AAAAAAAAGUA/geEtAT20lNUpH5ruI4wUqL4KPlh36WBbgCLcBGAs/s320/google%2Bcolab%2Bintro.png" width="320" class="align-center"/></a></div>
<p><br/><span>This opens up a python Jupyter notebook in browser.</span><br/><br/></p>
<div class="separator"><a href="https://2.bp.blogspot.com/-x4vpgKvfLio/W5QLjzuZd8I/AAAAAAAAGUE/XedcX87NjQ0bH-j7afeIp3PX3XGxHnHEgCLcBGAs/s1600/colab%2Bpython%2Bnotebook.png"><img border="0" height="75" src="https://2.bp.blogspot.com/-x4vpgKvfLio/W5QLjzuZd8I/AAAAAAAAGUE/XedcX87NjQ0bH-j7afeIp3PX3XGxHnHEgCLcBGAs/s320/colab%2Bpython%2Bnotebook.png" width="320" class="align-center"/></a></div>
<p><br/><span>By default, the Jupyter notebook runs on Python 2.7 and a CPU. We can change the Python version to 3.6 and the hardware to GPU in the settings, as shown below:</span><br/><br/><i>Go to Runtime → Change runtime type<span> </span></i><br/><br/></p>
<div class="separator"><a href="https://3.bp.blogspot.com/-VJzJTIqn7vE/W5QLoz79ZFI/AAAAAAAAGUI/w_Ead9jgisgZtqZxzWlwCPcJ9taPMjWcgCLcBGAs/s1600/colab%2Bgpu%2Bruntime.png"><img border="0" height="192" src="https://3.bp.blogspot.com/-VJzJTIqn7vE/W5QLoz79ZFI/AAAAAAAAGUI/w_Ead9jgisgZtqZxzWlwCPcJ9taPMjWcgCLcBGAs/s320/colab%2Bgpu%2Bruntime.png" width="320" class="align-center"/></a></div>
<p><br/><span>This opens up a Notebook settings pop-up where we can change Runtime Type to Python 3.6 and processing Hardware to GPU.</span><br/><br/></p>
<div class="separator"><a href="https://1.bp.blogspot.com/-YfE2Rc19ouU/W5QLuDr0giI/AAAAAAAAGUM/k-pRP6V1BLEqOYDGm4N7nWdf-46aVicJACLcBGAs/s1600/python%2Bcolab%2Bgpu.png"><img border="0" height="128" src="https://1.bp.blogspot.com/-YfE2Rc19ouU/W5QLuDr0giI/AAAAAAAAGUM/k-pRP6V1BLEqOYDGm4N7nWdf-46aVicJACLcBGAs/s320/python%2Bcolab%2Bgpu.png" width="320" class="align-center"/></a></div>
<p><br/><span>Bingo, your Python environment with the processing power of a GPU is ready to use.</span><br/><br/><b>Important things to remember:</b><span> </span></p>
<ul>
<li>The supported browsers are Chrome and Firefox</li>
<li>Currently only Python is supported</li>
<li>We can use up to 12 hours of processing time in one go</li>
</ul>
<p><span>Let’s check if our newly created Jupyter notebook works properly. Run the commands below and see if we get the expected results. </span><br/><br/></p>
<div class="separator"><a href="https://3.bp.blogspot.com/-A7P6JXA656k/W5QL0FW427I/AAAAAAAAGUY/iHX8814IkbkC-L91S8kTPrwLjWkuh7HQQCLcBGAs/s1600/python%2Bcolab%2Bgoogle%2Bai.png"><img border="0" height="115" src="https://3.bp.blogspot.com/-A7P6JXA656k/W5QL0FW427I/AAAAAAAAGUY/iHX8814IkbkC-L91S8kTPrwLjWkuh7HQQCLcBGAs/s320/python%2Bcolab%2Bgoogle%2Bai.png" width="320" class="align-center"/></a></div>
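The commands in the screenshot above are hard to read at this size; here is a stand-in sanity check for a fresh notebook, deliberately limited to the standard library so it runs in any environment (on Colab you would additionally verify GPU visibility through your framework of choice, e.g. TensorFlow or PyTorch).

```python
import platform
from importlib.util import find_spec

# Sanity check for a fresh notebook: Python version and whether the
# commonly preinstalled libraries are importable in this environment.
print("Python", platform.python_version())
for lib in ("numpy", "pandas", "sklearn", "matplotlib"):
    print(f"{lib:>10}: {'found' if find_spec(lib) else 'MISSING'}")
```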
<p><br/><span>By default, the most frequently used Python libraries, such as NumPy, Pandas, SciPy, scikit-learn and Matplotlib, are pre-installed when we create a notebook. Below we can see plotting in action: </span></p>
<div class="separator"><a href="https://3.bp.blogspot.com/-IOBqLxS9lyQ/W5QL7FO9ODI/AAAAAAAAGUg/_BqQ-wfrgiw1J7Uc1PXrqpa8UG7dC6HVQCLcBGAs/s1600/google%2Bpython%2Bcolab%2Bai.png"><img border="0" height="137" src="https://3.bp.blogspot.com/-IOBqLxS9lyQ/W5QL7FO9ODI/AAAAAAAAGUg/_BqQ-wfrgiw1J7Uc1PXrqpa8UG7dC6HVQCLcBGAs/s320/google%2Bpython%2Bcolab%2Bai.png" width="320" class="align-center"/></a></div>
<div class="separator"></div>
<div class="separator"><h3>For a machine learning example, <a href="http://www.dataperspective.info/2018/09/getting-started-with-google-laboratory-deep-learning.html" target="_blank" rel="noopener">see here</a></h3>
</div>Who cares if unsupervised machine learning is supervised learning in disguise?tag:www.analyticbridge.datasciencecentral.com,2018-09-23:2004291:BlogPost:3886892018-09-23T19:34:28.000ZDanko Nikolichttps://www.analyticbridge.datasciencecentral.com/profile/DankoNikolic
<p><span>Previously, we saw how unsupervised learning actually <a href="https://www.analyticbridge.datasciencecentral.com/profiles/blogs/supervised-learning-in-disguise-the-truth-about-unsupervised" target="_blank" rel="noopener">has built-in supervision</a>, albeit hidden from the user.</span></p>
<p><span>In this post we will see how supervised and unsupervised learning algorithms share more in common than the textbooks would suggest. As a matter of fact, both classes can use identical equations for creating mathematical models of the data, and both can use identical learning algorithms to find optimal parameter values for those models.</span></p>
<p><span>The consequence of this relation is that one can easily transform a supervised learning method into an unsupervised one, and vice versa. The only change you need to make is to determine how Y will be computed; that is, you have to decide how your error for learning (training) will be defined.</span></p>
<p><span>You may not have noticed so far, but the general linear model (GLM), paired with a versatile set of learning methods, has been used to create various supervised and unsupervised learning methods.</span></p>
<p><span>When one thinks of GLM, probably the first methods that come to mind are regression and inferential statistics (e.g., ANOVA), both of which fall into the category of supervised learning. However, GLM has been used just as extensively in unsupervised setups. This relates to dimensionality reduction techniques in which the algorithm is not being told with which dimensions particular data points are being saturated. Rather, the algorithm is left to “discover” on its own those dimensions. <a href="https://en.wikipedia.org/wiki/Principal_component_analysis">Principal component analysis (PCA)</a> and various forms of <a href="https://en.wikipedia.org/wiki/Factor_analysis">factor analyses</a> are all examples of unsupervised applications of GLM.</span></p>
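To make the point concrete, here is a minimal PCA sketch in numpy: no target variable Y appears anywhere, and the dominant dimension is "discovered" from X alone. The data is synthetic, purely for illustration.

```python
import numpy as np

# PCA as an unsupervised use of a linear model: the algorithm is never
# told which dimension matters -- it finds it from X by itself.
rng = np.random.default_rng(1)

# 200 points that mostly vary along one hidden direction in 5-D.
direction = rng.normal(size=5)
X = np.outer(rng.normal(size=200), direction) + 0.1 * rng.normal(size=(200, 5))

Xc = X - X.mean(axis=0)                       # center the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / (s**2).sum()               # variance share per component

print("variance share of first component:", round(float(explained[0]), 3))
```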
<p><span>This easy jump from supervised to unsupervised is not just a property of simple models such as GLM. Exactly the same applies to computationally elaborate methods such as <a href="https://en.wikipedia.org/wiki/Deep_learning">deep learning neural networks</a>. A neural network can easily be set up to operate with or without supervision; the most commonly known applications are supervised ones, such as image recognition, in which humans initially provide labels for the categories to which each image belongs. The network then learns that assignment and, if everything is done right, is capable of correctly classifying new images from those trained categories (e.g., distinguishing human faces from houses, from tools, etc.).</span></p>
<p><span>Neural networks can be used just as efficiently in an unsupervised learning setup. Perhaps the most common examples are auto-encoders, which are capable of detecting anomalies in data. Here, the network is trained to produce an output that has exactly the same values as the inputs it receives. The difference between what it has generated and what it should have generated, i.e., the error, is used for adjusting its synaptic weights. The training continues until the network does the job satisfactorily on data that have not been used for training (i.e., a test data set).</span></p>
<p><span>What makes this learning non-trivial is that the topology of the neural network is made such that at least one of the hidden layers has a smaller number of units than the number of units in the input (and output) layer(s). This forces the network to find a representation of the data with reduced dimensionality, similar to that performed by PCA and factor analyses.</span></p>
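A minimal numpy sketch of this idea, assuming a linear autoencoder (no nonlinearities or frameworks, purely for illustration): an 8-dimensional input is squeezed through a 2-unit bottleneck, the reconstruction error drives the weight updates, and the same error then serves as an anomaly score.

```python
import numpy as np

# Tiny linear autoencoder: learn to reproduce X through a 2-unit bottleneck.
rng = np.random.default_rng(42)

# "Normal" data lives near a 2-D plane inside 8-D space.
Z = rng.normal(size=(300, 2))
A = rng.normal(size=(2, 8))
X = Z @ A + 0.05 * rng.normal(size=(300, 8))

lr = 0.01
W_enc = rng.normal(scale=0.1, size=(8, 2))    # encoder weights
W_dec = rng.normal(scale=0.1, size=(2, 8))    # decoder weights

loss_before = np.mean((X @ W_enc @ W_dec - X) ** 2)
for _ in range(2000):
    H = X @ W_enc                 # bottleneck representation
    E = H @ W_dec - X             # reconstruction error drives learning
    W_dec -= lr * H.T @ E / len(X)
    W_enc -= lr * X.T @ (E @ W_dec.T) / len(X)
loss_after = np.mean((X @ W_enc @ W_dec - X) ** 2)

def score(x):
    """Reconstruction error, usable as an anomaly score."""
    return float(np.linalg.norm(x @ W_enc @ W_dec - x))

anomaly_err = score(rng.normal(size=8) * 3.0)  # a point far off the plane
print("loss before/after:", round(float(loss_before), 3), round(float(loss_after), 3))
print("normal vs anomalous score:", round(score(X[0]), 3), round(anomaly_err, 3))
```

Points that reconstruct poorly are the anomaly candidates, which is exactly the fraud-detection and predictive-maintenance use mentioned below.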
<p><span>Such networks are useful for applications in which labels possibly do not exist, or would be impractically difficult to obtain. Also, they can be very useful for applications in which collection of labels may take years, such as for example, <a href="https://en.wikipedia.org/wiki/Fraud#Detection">fraud detection</a> and <a href="https://en.wikipedia.org/wiki/Predictive_maintenance">predictive maintenance</a>.</span></p>
<p><span>A piece of advice to data scientists: don’t be afraid to turn your supervised learning method into an unsupervised one or vice versa, if you see that this fits your problem. You will need some creative thinking and more coding than usual but as a result, you may end up with exactly the solution that the task you are solving requires.</span></p>
<p><span>Here is one general rule to keep in mind: supervised learning methods will always be capable of solving a wider range of real-life problems than unsupervised ones. This is because unsupervised ones are much more specialized: their error computation is already determined by the algorithm. In addition, that error computation is limited to whatever can be extracted from the input data. In contrast, supervised methods, being open to error data coming from the outside world, can basically take advantage of the errors “computed” by the entire external universe – including the physical events underlying the actual phenomenon that these methods are trying to model (e.g., the real physical event of a machine breaking down provides the training information for a predictive model of whether a machine will soon break).</span></p>
<p><span>All other things being equal, supervised methods will require less data and computational power to achieve a similar result. Unsupervised algorithms can learn to classify objects, for example <a href="https://arxiv.org/pdf/1112.6209v5.pdf">cats</a>. But this comes at the expense of far more resources than a supervised equivalent would need. In the case of Google’s algorithm that discovered cats in images, it took 10 million images, 1 billion connections, 16,000 computer cores, three days of computation and a team of eight scientists from Google and Stanford. That’s a lot of resources.</span></p>
<p><span>In conclusion, we now know the terms ‘supervised’ and ‘unsupervised’ may be misleading, as there is quite a bit of supervision in unsupervised learning. Maybe a better analogy would be if supervised learning was referred to as ‘micro-managed learning’, and instead of unsupervised learning we used the term ‘macro-managed learning’. These two would probably better describe what is actually happening in the background of the respective algorithms.</span></p>
<p><span>Knowing that supervised and unsupervised methods can be seen as two different applications of the same general set of tools can be quite useful for creative problem solving in data science. By assuming a bit of an inventive attitude, one can relatively effortlessly convert an existing method from one form to another, as circumstances require.</span></p>Introduction to Deep Learningtag:www.analyticbridge.datasciencecentral.com,2018-09-21:2004291:BlogPost:3889382018-09-21T18:00:00.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><em>Guest blog post by Zied HY. Zied is a Senior Data Scientist at Capgemini Consulting. He specializes in building predictive models utilizing both traditional statistical methods (Generalized Linear Models, Mixed Effects Models, Ridge, Lasso, etc.) and modern machine learning techniques (XGBoost, Random Forests, Kernel Methods, neural networks, etc.). Zied runs workshops for university students (ESSEC, HEC, Ecole polytechnique) interested in Data Science and its applications, and he is the co-founder of Global International Trading (GIT), a central purchasing office based in Paris.</em></p>
<p>I have been reading about Deep Learning for over a year now, through several articles and research papers that I came across mainly on LinkedIn, Medium and Arxiv.</p>
<p><a href="http://api.ning.com:80/files/GjwTpjp3Hv0V7uY7wafGPsSxom9UjC9HKBCSmByO6mIS0i5FZjdmJecEb6s5wbNbrGQ0G3Jg0lhYkefS08bjrw7iDZC-DlHo/Capture.PNG" target="_self"><img src="http://api.ning.com:80/files/GjwTpjp3Hv0V7uY7wafGPsSxom9UjC9HKBCSmByO6mIS0i5FZjdmJecEb6s5wbNbrGQ0G3Jg0lhYkefS08bjrw7iDZC-DlHo/Capture.PNG" width="666" class="align-center"/></a></p>
<p>When I virtually attended the MIT 6.S191 Deep Learning courses during the last few weeks, I decided to begin to put some structure in my understanding of Neural Networks through this series of articles.</p>
<p>I will go through the first four courses:</p>
<ol>
<li>Introduction to Deep Learning</li>
<li>Sequence Modeling with Neural Networks</li>
<li>Deep learning for computer vision - Convolutional Neural Networks</li>
<li>Deep generative modeling</li>
</ol>
<p>For each course, I will outline the main concepts and add more details and interpretations from my previous readings and my background in statistics and machine learning.</p>
<p>Starting from the second course, I will also add an application on an open-source dataset for each course.</p>
<p>That said, let’s go!</p>
<p>Read the first part, <a href="https://www.datasciencecentral.com/profiles/blogs/introduction-to-deep-learning" target="_blank" rel="noopener">here</a>. </p>Analytics Translator – The Most Important New Role in Analyticstag:www.analyticbridge.datasciencecentral.com,2018-09-12:2004291:BlogPost:3888422018-09-12T23:30:00.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><strong><em>Summary:</em></strong><em> The role of Analytics Translator was recently identified by McKinsey as the most important new role in analytics, and a key factor in the failure of analytic programs when the role is absent.</em></p>
<p> <a href="https://api.ning.com/files/Y5oclxjGfk79NjR6Sq1eV4i4s0nDgeaA3rF-LfiliryKp-kTctZenwzHMCAtAdwrJFJBwUql2Z6RS3rEoCFtIOt2fT9ZheNN/goodtranslator.png" target="_self"><img src="https://api.ning.com/files/Y5oclxjGfk79NjR6Sq1eV4i4s0nDgeaA3rF-LfiliryKp-kTctZenwzHMCAtAdwrJFJBwUql2Z6RS3rEoCFtIOt2fT9ZheNN/goodtranslator.png?width=500" width="500" class="align-center"/></a></p>
<p>The role of Analytics Translator was recently<span> identified by McKinsey </span>as the most important new role in analytics, and a key factor in the failure of analytic programs when the role is absent.</p>
<p>As our profession of data science has evolved, any number of authors, including myself, have offered different taxonomies to describe the differences among the ‘tribes’ of data scientists. We may disagree on the categories, but we agree that we’re not all alike.</p>
<p>Ten years ago, around the time that Hadoop and Big Data went open source, there was still a perception that data scientists should be capable of performing every task in the analytics lifecycle. </p>
<p>The obvious skills were model creation and deployment, and data blending and munging. Other important skills in this bucket would have included setting up data infrastructure (data lakes, streaming architectures, Big Data NoSQL DBs, etc.). And finally, there were the skills just assumed to come with seniority: storytelling (explaining it to executive sponsors) and great project management.</p>
<p>Frankly, when I entered the profession, this was true and for the most part, in those early projects, I did indeed do it all.</p>
<p><span><strong>Data Science – A Profession of Specialties</strong></span></p>
<p>It’s fair to say that today nobody expects this. Ours is rapidly becoming a field of specialists, defined by data types (NLP, image, streaming, classic static data), role (data engineer, junior data scientist, senior data scientist), or by use cases (predictive maintenance, inventory forecasting, personalized marketing, fraud detection, chatbot UIs, etc.). These aren’t rigid boundaries and a good data scientist may bridge several of these, but not all.</p>
<p><em>Read full article <a href="https://www.datasciencecentral.com/profiles/blogs/analytics-translator-the-most-important-new-role-in-analytics" target="_blank" rel="noopener">here</a>. (By Bill Vorhies)</em></p>New Perspective on the Central Limit Theorem and Statistical Testingtag:www.analyticbridge.datasciencecentral.com,2018-09-11:2004291:BlogPost:3887582018-09-11T03:07:16.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p>You won't learn this in textbooks, college classes, or data camps. Some of the material in this article is very advanced yet presented in simple English, with an Excel implementation for various statistical tests, and no arcane theory, jargon, or obscure theorems. It has a number of applications, in finance in particular. This article covers several topics under a unified approach, so it was not easy to find a title. In particular, we discuss:</p>
<ul>
<li>When the central limit theorem fails: what to do, and case study</li>
<li>Various original statistical tests, some unpublished, for instance to test if an empirical statistical distribution (based on observations) is symmetric or not, or whether two distributions are identical</li>
<li>The power and mysteries of stable (also called divisible) statistical distributions</li>
<li>Dealing with weighted sums of random variables, especially with decaying weights</li>
<li>Fun number theory problems and algorithms associated with these statistical problems</li>
<li>Decomposing a (theoretical or empirical / observed) statistical distribution into elementary components, just like decomposing a complex molecule into atoms</li>
</ul>
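<p>The bullet on weighted sums with decaying weights can be made concrete with a minimal Python sketch (my own illustration, not code from the article): attach fast-decaying weights 2^(-<em>k</em>) to random ±1 signs, and the central limit theorem no longer applies; the limiting distribution is uniform on [-1, 1] rather than Gaussian.</p>

```python
import random

def weighted_signed_sum(rng, depth=50):
    """Sum of +/-1 signs with geometrically decaying weights 2^-k.
    With such fast-decaying weights the CLT does not apply: the
    limit is uniform on [-1, 1], not Gaussian."""
    return sum(rng.choice((-1, 1)) * 2.0 ** -k for k in range(1, depth + 1))

rng = random.Random(123)
samples = [weighted_signed_sum(rng) for _ in range(20_000)]

# For a uniform law on [-1, 1], P(|S| < 0.5) = 0.5; a Gaussian with the
# same variance (1/3) would put about 0.61 of its mass in that interval.
inside = sum(1 for s in samples if abs(s) < 0.5) / len(samples)
```

Comparing <code>inside</code> to 0.5 (uniform) versus ~0.61 (Gaussian) is a crude version of the normality tests discussed in section 3 of the article.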
<p>The focus is on principles, methodology, and techniques applicable to, and useful in, many applications. For those willing to take a deeper dive into these topics, many references are provided. This article, written as a tutorial, is accessible to professionals with elementary statistical knowledge, such as a stats 101 course. It is also written in a compact style, so that you can grasp all the material in hours rather than days. It covers topics that you could otherwise learn in MIT, Stanford, Berkeley, Princeton, or Harvard classes aimed at PhD students. Some of it consists of state-of-the-art research results published here for the first time, made accessible to the data science or data engineering novice. I think mathematicians (being one myself) will also enjoy it. Yet the emphasis is on applications rather than theory. </p>
<p><a href="https://api.ning.com/files/UYRM2GNIhq-ru2JsTXAkZRR5ELOXn6MbeILUWc1DK-pgcpBeYwZIVIipUvyMnnkK*aiDh1EwxZ9PlWWlMct*15jpKxaHJYGK/Capture.PNG" target="_self"><img src="https://api.ning.com/files/UYRM2GNIhq-ru2JsTXAkZRR5ELOXn6MbeILUWc1DK-pgcpBeYwZIVIipUvyMnnkK*aiDh1EwxZ9PlWWlMct*15jpKxaHJYGK/Capture.PNG" class="align-center"/></a></p>
<p>Finally, we focus here on sums of random variables. The next article will focus on mixtures rather than sums, providing more flexibility for modeling purposes, or to decompose a complex distribution into elementary components. In both cases, my approach is mostly non-parametric and based on robust statistical techniques, capable of handling outliers without problems and not subject to over-fitting.</p>
<p><strong>Content</strong></p>
<p>1. Central Limit Theorem: New Approach</p>
<p>2. Stable and Attractor Distributions</p>
<ul>
<li>Using decaying weights</li>
<li>More about stable distributions and their applications</li>
</ul>
<p>3. Non CLT-compliant Weighted Sums, and their Attractors</p>
<ul>
<li>Testing for normality</li>
<li>Testing for symmetry and dependence on kernel</li>
<li>Testing for semi-stability</li>
<li>Conclusions</li>
</ul>
<p>Read full article <a href="https://www.datasciencecentral.com/profiles/blogs/new-perspective-on-central-limit-theorem-and-related-stats-topics" target="_blank" rel="noopener">here</a>. </p>Free Book: Applied Stochastic Processestag:www.analyticbridge.datasciencecentral.com,2018-09-08:2004291:BlogPost:3880372018-09-08T17:16:14.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><span>Full title: <em>Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems</em>. Published June 2, 2018. Author: Vincent Granville, PhD. (104 pages, 16 chapters.)</span></p>
<p><span>This book is intended for professionals in data science, computer science, operations research, statistics, machine learning, big data, and mathematics. In 100 pages, it covers many new topics, offering a fresh perspective on the subject. It is accessible to practitioners with a two-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications (Blockchain, quantum algorithms, HPC, random number generation, cryptography, Fintech, web crawling, statistical testing) with numerous illustrations, is aimed at practitioners, researchers and executives in various quantitative fields.</span></p>
<p><span><a href="https://api.ning.com/files/CDU-2PoeH6QL59VbutPY3gfauFigHvxp-W***7EvE8CSAAkPRxclPHjC0vp2k7x5xq9usL-RBvb5VpM0Fl1PI5v3z1ABZ2*g/Capture.PNG" target="_self"><img src="https://api.ning.com/files/CDU-2PoeH6QL59VbutPY3gfauFigHvxp-W***7EvE8CSAAkPRxclPHjC0vp2k7x5xq9usL-RBvb5VpM0Fl1PI5v3z1ABZ2*g/Capture.PNG" width="298" class="align-center"/></a></span></p>
<p><span>New ideas, advanced topics, and state-of-the-art research are discussed in simple English, without using jargon or arcane theory. It unifies topics that are usually part of different fields (data science, operations research, dynamical systems, computer science, number theory, probability) broadening the knowledge and interest of the reader in ways that are not found in any other book. This short book contains a large amount of condensed material that would typically be covered in 500 pages in traditional publications. Thanks to cross-references and redundancy, the chapters can be read independently, in random order.</span></p>
<p><span>This book is available for Data Science Central members exclusively. The text in blue consists of clickable links that provide the reader with additional references. Source code and Excel spreadsheets summarizing computations are also accessible as hyperlinks, for easy copy-and-paste or replication purposes. The most recent version of this book is available <a href="https://www.datasciencecentral.com/page/free-books-1">from this link</a>, accessible to DSC members only. </span></p>
<p><span><strong>About the author</strong></span></p>
<p><span>Vincent Granville is a start-up entrepreneur, patent owner, author, investor, pioneering data scientist with 30 years of corporate experience in companies small and large (eBay, Microsoft, NBC, Wells Fargo, Visa, CNET) and a former VC-funded executive, with a strong academic and research background including Cambridge University.</span></p>
<div><a href="https://api.ning.com/files/UIp6-tRQFlaQCp5MXb3Vc5xIwb-42rDt2lXnT*8T4104w2gLB3bVd0vmn8TRWTncn1aE52CRYnjMOBlSc76yYK8sl62kFyyq/redline.png" target="_self"><img src="https://api.ning.com/files/UIp6-tRQFlaQCp5MXb3Vc5xIwb-42rDt2lXnT*8T4104w2gLB3bVd0vmn8TRWTncn1aE52CRYnjMOBlSc76yYK8sl62kFyyq/redline.png" width="750" class="align-center"/></a></div>
<p><span><strong>Download the book (members only) </strong></span></p>
<p><span><a href="https://www.datasciencecentral.com/page/free-books-1" target="_blank" rel="noopener">Click here</a> to get the book. For Data Science Central members only. </span><span>If you have any issues accessing the book please contact us at info@datasciencecentral.com.</span></p>
<div><a href="https://api.ning.com/files/UIp6-tRQFlaQCp5MXb3Vc5xIwb-42rDt2lXnT*8T4104w2gLB3bVd0vmn8TRWTncn1aE52CRYnjMOBlSc76yYK8sl62kFyyq/redline.png" target="_self"><img src="https://api.ning.com/files/UIp6-tRQFlaQCp5MXb3Vc5xIwb-42rDt2lXnT*8T4104w2gLB3bVd0vmn8TRWTncn1aE52CRYnjMOBlSc76yYK8sl62kFyyq/redline.png" width="750" class="align-center"/></a></div>
<p><span><strong>Content</strong></span></p>
<p><span>The book covers the following topics:</span><span> </span></p>
<p><span><strong>1. Introduction to Stochastic Processes</strong></span></p>
<p><span>We introduce these processes, used routinely by Wall Street quants, with a simple approach consisting of re-scaling random walks to make them time-continuous, with a finite variance, based on the central limit theorem.</span></p>
<ul>
<li><span>Construction of Time-Continuous Stochastic Processes</span></li>
<li><span>From Random Walks to Brownian Motion</span></li>
<li><span>Stationarity, Ergodicity, Fractal Behavior</span></li>
<li><span>Memory-less or Markov Property</span></li>
<li><span>Non-Brownian Process</span></li>
</ul>
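<p>The construction described above can be illustrated in a few lines of Python (a sketch with my own naming, not code from the book): re-scale a ±1 random walk by the square root of the number of steps, so that the end point has variance close to 1, as the central limit theorem predicts.</p>

```python
import random

def brownian_path(n, seed=0):
    """Approximate standard Brownian motion on [0, 1] by re-scaling a
    +/-1 random walk: W(k/n) = S_k / sqrt(n). By the central limit
    theorem, W(1) is asymptotically normal with variance 1."""
    rng = random.Random(seed)
    path, s = [0.0], 0
    for _ in range(n):
        s += rng.choice((-1, 1))
        path.append(s / n ** 0.5)
    return path

# The end points of many simulated paths should have variance close to 1
ends = [brownian_path(1000, seed=k)[-1] for k in range(1000)]
mean = sum(ends) / len(ends)
var = sum((e - mean) ** 2 for e in ends) / len(ends)
```

The 1/sqrt(n) re-scaling is exactly what keeps the variance finite as the walk becomes time-continuous.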
<p><span><strong>2. Integration, Differentiation, Moving Averages</strong></span></p>
<p><span>We introduce more advanced concepts about stochastic processes, yet we make them easy to understand even for the non-expert. This is a follow-up to Chapter 1.</span></p>
<ul>
<li><span>Integrated, Moving Average and Differential Process</span></li>
<li><span>Proper Re-scaling and Variance Computation</span></li>
<li><span>Application to Number Theory Problem</span></li>
</ul>
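<p>As a hypothetical illustration of the moving-average process (my own sketch, not the book's code): a simple <em>k</em>-term moving average applied to white noise induces serial correlation, with lag-1 autocorrelation (<em>k</em> - 1)/<em>k</em>.</p>

```python
import random

def moving_average(xs, k):
    """k-term moving average; applied to white noise it produces a
    process with lag-1 autocorrelation (k - 1) / k."""
    return [sum(xs[i:i + k]) / k for i in range(len(xs) - k + 1)]

def lag1_autocorr(xs):
    """Empirical lag-1 autocorrelation of a sequence."""
    n, m = len(xs), sum(xs) / len(xs)
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(n - 1))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

rng = random.Random(1)
noise = [rng.gauss(0, 1) for _ in range(100_000)]
rho = lag1_autocorr(moving_average(noise, 5))  # theory: (5 - 1) / 5 = 0.8
```

This is the same mechanism, in miniature, that makes the integrated and moving-average processes of this chapter smoother than the underlying noise.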
<p><span><strong>3. Self-Correcting Random Walks</strong></span></p>
<p><span>We investigate here a breed of stochastic processes that are different from the Brownian motion, yet are better models in many contexts, including Fintech.</span><span> </span></p>
<ul>
<li><span>Controlled or Constrained Random Walks</span></li>
<li><span>Link to Mixture Distributions and Clustering</span></li>
<li><span>First Glimpse of Stochastic Integral Equations</span></li>
<li><span>Link to Wiener Processes, Application to Fintech</span></li>
<li><span>Potential Areas for Research</span></li>
<li><span>Non-stochastic Case</span></li>
</ul>
<p><span><strong>4. Stochastic Processes and Tests of Randomness</strong></span></p>
<p><span>In this transition chapter, we introduce a different type of stochastic process, with number theory and cryptography applications, analyzing statistical properties of numeration systems along the way -- a recurrent theme in the next chapters, offering many research opportunities and applications. While we are dealing with deterministic sequences here, they behave very much like stochastic processes, and are treated as such. Statistical testing is central to this chapter, introducing tests that will be also used in the last chapters.</span></p>
<ul>
<li><span>Gap Distribution in Pseudo-Random Digits</span></li>
<li><span>Statistical Testing and Geometric Distribution</span></li>
<li><span>Algorithm to Compute Gaps</span></li>
<li><span>Another Application to Number Theory Problem</span></li>
<li><span>Counter-Example: Failing the Gap Test</span></li>
</ul>
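<p>The gap test outlined above can be sketched in a few lines of Python (an illustration with simulated digits, not the book's own code): for i.i.d. uniform digits in base 10, the gap between successive occurrences of any fixed digit follows a geometric distribution with mean 10.</p>

```python
import random

def digit_gaps(digits, d):
    """Gaps between successive occurrences of digit d. For i.i.d.
    uniform base-b digits the gap is geometric with mean b."""
    positions = [i for i, x in enumerate(digits) if x == d]
    return [b - a for a, b in zip(positions, positions[1:])]

rng = random.Random(42)
digits = [rng.randrange(10) for _ in range(200_000)]
gaps = digit_gaps(digits, 7)
mean_gap = sum(gaps) / len(gaps)  # should be close to 10
```

A sequence of digits whose empirical gap distribution departs markedly from the geometric law fails the gap test, like the counter-example mentioned in this chapter.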
<p><span><strong>5. Hierarchical Processes</strong></span></p>
<p><span>We start discussing random number generation, and numerical and computational issues in simulations, applied to an original type of stochastic process. This will become a recurring theme in the next chapters, as it applies to many other processes.</span></p>
<ul>
<li><span>Graph Theory and Network Processes</span></li>
<li><span>The Six Degrees of Separation Problem</span></li>
<li><span>Programming Languages Failing to Produce Randomness in Simulations</span></li>
<li><span>How to Identify and Fix the Previous Issue</span></li>
<li><span>Application to Web Crawling</span></li>
</ul>
<p><span><strong>6. Introduction to Chaotic Systems</strong></span></p>
<p><span>While typically studied in the context of dynamical systems, the logistic map can be viewed as a stochastic process, with an equilibrium distribution and probabilistic properties, just like numeration systems (next chapters) and processes introduced in the first four chapters.</span></p>
<ul>
<li><span>Logistic Map and Fractals</span></li>
<li><span>Simulation: Flaws in Popular Random Number Generators</span></li>
<li><span>Quantum Algorithms</span></li>
</ul>
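<p>A minimal Python sketch (my own illustration, not taken from the book) of the logistic map viewed as a stochastic process: iterating <em>x</em> → 4<em>x</em>(1 - <em>x</em>) produces an orbit whose empirical distribution approaches the arcsine equilibrium density 1/(π √(<em>x</em>(1 - <em>x</em>))), with CDF (2/π) arcsin(√<em>x</em>).</p>

```python
def logistic_orbit(x0, n, burn_in=1000):
    """Iterate the chaotic logistic map x -> 4 x (1 - x); after a
    burn-in the orbit samples the equilibrium (arcsine) distribution
    with density 1 / (pi * sqrt(x * (1 - x)))."""
    x = x0
    for _ in range(burn_in):
        x = min(4.0 * x * (1.0 - x), 1.0)  # guard against rounding above 1
    orbit = []
    for _ in range(n):
        x = min(4.0 * x * (1.0 - x), 1.0)
        orbit.append(x)
    return orbit

orbit = logistic_orbit(0.123456789, 100_000)
# Equilibrium CDF is F(x) = (2/pi) * asin(sqrt(x)), so F(0.25) = 1/3
frac_below = sum(1 for x in orbit if x < 0.25) / len(orbit)
```

The same empirical-CDF check, applied to poor random number generators, exposes the simulation flaws this chapter discusses.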
<p><span><strong>7. Chaos, Logistic Map and Related Processes</strong></span></p>
<p><span>We study processes related to the logistic map, including a special logistic map discussed here for the first time, with a simple equilibrium distribution. This chapter offers a transition between chapter 6 and the next chapters on numeration systems (the logistic map being one of them).</span></p>
<ul>
<li><span>General Framework</span></li>
<li><span>Equilibrium Distribution and Stochastic Integral Equation</span></li>
<li><span>Examples of Chaotic Sequences</span></li>
<li><span>Discrete, Continuous Sequences and Generalizations</span></li>
<li><span>Special Logistic Map</span></li>
<li><span>Auto-regressive Time Series</span></li>
<li><span>Literature</span></li>
<li><span>Source Code with Big Number Library</span></li>
<li><span>Solving the Stochastic Integral Equation: Example</span></li>
</ul>
<p><span><strong>8. Numerical and Computational Issues</strong></span></p>
<p><span>These issues have been mentioned in chapter 7, and also appear in chapters 9, 10 and 11. Here we take a deeper dive and offer solutions, using high precision computing with BigNumber libraries. </span></p>
<ul>
<li><span>Precision Issues when Simulating, Modeling, and Analyzing Chaotic Processes</span></li>
<li><span>When Precision Matters, and when it does not</span></li>
<li><span>High Precision Computing (HPC)</span></li>
<li><span>Benchmarking HPC Solutions</span></li>
<li><span>How to Assess the Accuracy of your Simulation Tool</span></li>
</ul>
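<p>The precision issue can be demonstrated in a few lines (a sketch using Python's standard <code>decimal</code> module as a stand-in for a BigNumber library): because the logistic map roughly doubles any rounding error at each step, the ~16 significant digits of double precision are exhausted after about 50 iterations, while a 50-digit computation remains accurate much longer.</p>

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # 50 significant digits, a stand-in for a BigNumber library

def iterate_float(x0, n):
    """Iterate x -> 4 x (1 - x) in ordinary double precision."""
    x = float(x0)
    for _ in range(n):
        x = 4.0 * x * (1.0 - x)
    return x

def iterate_decimal(x0, n):
    """Same iteration carried out with 50 significant digits."""
    x = Decimal(x0)
    for _ in range(n):
        x = 4 * x * (1 - x)
    return x

seed = "0.3141592653589793"
# Each step roughly doubles the rounding error (Lyapunov exponent log 2):
# after 20 iterations the two computations still agree closely, but by
# 120 iterations double precision has lost all accuracy.
close = abs(iterate_float(seed, 20) - float(iterate_decimal(seed, 20)))
far = abs(iterate_float(seed, 120) - float(iterate_decimal(seed, 120)))
```

Benchmarking the same orbit at several precisions, as done here with 50 digits, is one simple way to assess the accuracy of a simulation tool.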
<p><span><strong>9. Digits of Pi, Randomness, and Stochastic Processes</strong></span></p>
<p><span>Deep mathematical and data science research (including a result about the randomness of Pi) is presented here without arcane terminology or complicated equations. The numeration systems discussed here are deterministic sequences that behave just like the stochastic processes investigated earlier; the logistic map is a particular case.</span></p>
<ul>
<li><span>Application: Random Number Generation</span></li>
<li><span>Chaotic Sequences Representing Numbers</span></li>
<li><span>Data Science and Mathematical Engineering</span></li>
<li><span>Numbers in Base 2, 10, 3/2 or Pi</span></li>
<li><span>Nested Square Roots and Logistic Map</span></li>
<li><span>About the Randomness of the Digits of Pi</span></li>
<li><span>The Digits of Pi are Randomly Distributed in the Logistic Map System</span></li>
<li><span>Paths to Proving Randomness in the Decimal System</span></li>
<li><span>Connection with Brownian Motions</span></li>
<li><span>Randomness and the Bad Seeds Paradox</span></li>
<li><span>Application to Cryptography, Financial Markets, Blockchain, and HPC</span></li>
<li><span>Digits of Pi in Base Pi</span></li>
</ul>
<p><span><strong>10. Numeration Systems in One Picture</strong></span></p>
<p><span>Here you will find a summary of much of the material previously covered on chaotic systems (in particular, in chapters 7 and 9), in the context of numeration systems.</span></p>
<ul>
<li><span>Summary Table: Equilibrium Distribution, Properties</span></li>
<li><span>Reverse-engineering Number Representation Systems</span></li>
<li><span>Application to Cryptography</span></li>
</ul>
<p><span><strong>11. Numeration Systems: More Statistical Tests and Applications</strong></span></p>
<p><span>In addition to featuring new research results and building on the previous chapters, the topics discussed here offer a great sandbox for data scientists and mathematicians.</span><span> </span></p>
<ul>
<li><span>Components of Number Representation Systems</span></li>
<li><span>General Properties of these Systems</span></li>
<li><span>Examples of Number Representation Systems</span></li>
<li><span>Examples of Patterns in Digits Distribution</span></li>
<li><span>Defects found in the Logistic Map System</span></li>
<li><span>Test of Uniformity</span></li>
<li><span>New Numeration System with no Bad Seed</span></li>
<li><span>Holes, Autocorrelations, and Entropy (Information Theory)</span></li>
<li><span>Towards a more General, Better, Hybrid System</span></li>
<li><span>Faulty Digits, Ergodicity, and High Precision Computing</span></li>
<li><span>Finding the Equilibrium Distribution with the Percentile Test</span></li>
<li><span>Central Limit Theorem, Random Walks, Brownian Motions, Stock Market Modeling</span></li>
<li><span>Data Set and Excel Computations</span></li>
</ul>
<p><span><strong>12. The Central Limit Theorem Revisited</strong></span></p>
<p><span>The central limit theorem explains the convergence of discrete stochastic processes to Brownian motions, and has been cited a few times in this book. Here we also explore a version that applies to deterministic sequences; such sequences are treated as stochastic processes in this book.</span></p>
<ul>
<li><span>A Special Case of the Central Limit Theorem</span></li>
<li><span>Simulations, Testing, and Conclusions</span></li>
<li><span>Generalizations</span></li>
<li><span>Source Code</span></li>
</ul>
<p><span><strong>13. How to Detect if Numbers are Random or Not</strong></span></p>
<p><span>We explore here some deterministic sequences of numbers, behaving like stochastic processes or chaotic systems, together with another interesting application of the central limit theorem.</span></p>
<ul>
<li><span>Central Limit Theorem for Non-Random Variables</span></li>
<li><span>Testing Randomness: Max Gap, Auto-Correlations and More</span></li>
<li><span>Potential Research Areas</span></li>
<li><span>Generalization to Higher Dimensions</span></li>
</ul>
<p><span><strong>14. Arrival Time of Extreme Events in Time Series</strong></span></p>
<p><span>Time series, as discussed in the first chapters, are also stochastic processes. Here we discuss a topic rarely investigated in the literature: the arrival times, as opposed to the extreme values (a classic topic), associated with extreme events in time series.</span></p>
<ul>
<li><span>Simulations</span></li>
<li><span>Theoretical Distribution of Records over Time</span></li>
</ul>
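<p>A small simulation (my own sketch, not from the book) illustrates the arrival times of records: for i.i.d. observations, the probability of a record at time <em>n</em> is 1/<em>n</em>, so the expected number of records among <em>n</em> observations is the harmonic number <em>H</em>(<em>n</em>).</p>

```python
import random

def record_times(series):
    """Return the 1-based times at which a new record (running maximum,
    i.e. an extreme event) occurs. For i.i.d. observations,
    P(record at time n) = 1/n, so the expected number of records
    among n observations is the harmonic number H(n)."""
    best = float("-inf")
    times = []
    for i, x in enumerate(series, start=1):
        if x > best:
            best = x
            times.append(i)
    return times

rng = random.Random(7)
n, trials = 100, 2000
counts = [len(record_times([rng.random() for _ in range(n)]))
          for _ in range(trials)]
mean_records = sum(counts) / trials
harmonic = sum(1.0 / k for k in range(1, n + 1))  # H(100) ~ 5.19
```

Note that records become rarer over time: the arrival-time distribution, not just the record values, carries the interesting structure.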
<p><span><strong>15. Miscellaneous Topics</strong></span></p>
<p><span>We investigate topics related to time series as well as other popular stochastic processes such as spatial processes.</span></p>
<ul>
<li><span>How and Why: Decorrelate Time Series</span></li>
<li><span>A Weird Stochastic-Like, Chaotic Sequence</span></li>
<li><span>Stochastic Geometry, Spatial Processes, Random Circles: Coverage Problem</span></li>
<li><span>Additional Reading (Including Twin Points in Point Processes)</span></li>
</ul>
<p><span><strong>16. Exercises</strong></span></p>Invitation to Join Data Science Centraltag:www.analyticbridge.datasciencecentral.com,2018-09-08:2004291:BlogPost:3880342018-09-08T17:14:58.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><span>Join the largest community of machine learning (ML), deep learning, AI, data science, business analytics, BI, operations research, mathematical and statistical professionals: <a href="https://www.datasciencecentral.com/main/authorization/signUp?" target="_self">Sign up here</a>. If instead, you are only interested in receiving our newsletter, you can subscribe <a href="https://www.datasciencecentral.com/page/newsletter" target="_blank" rel="noopener">here</a>. There is no cost.</span></p>
<p><span><a href="https://api.ning.com/files/nNamryvi-7CILtzx7HJ3oWcaMpVJLQK6QTJAc1OSDxnNxA2eFk7aGrwJrlrhivnA9IneIbcISUeeg4CMrl8yk4MJthUPOxiZ/x00.jpg" target="_self"><img src="https://api.ning.com/files/nNamryvi-7CILtzx7HJ3oWcaMpVJLQK6QTJAc1OSDxnNxA2eFk7aGrwJrlrhivnA9IneIbcISUeeg4CMrl8yk4MJthUPOxiZ/x00.jpg" width="400" class="align-center"/></a></span></p>
<p><span class="font-size-3">The full membership includes, in addition to the newsletter subscription:</span></p>
<ul>
<li><span>Access to <a href="https://www.datasciencecentral.com/page/member" target="_blank" rel="noopener">member-only pages</a>, our free data science eBooks, data sets, code snippets, and solutions to data science / machine learning / mathematical challenges.</span></li>
<li><span class="font-size-3">Support to all your questions regarding our community.</span></li>
<li><span class="font-size-3">Data sets, projects, cheat sheets, tutorials, programming tips, summarized information easy to digest, DSC webinars, data science events (conferences, workshops), new books, and news. </span></li>
<li><span class="font-size-3">Ability to post <a href="https://www.datasciencecentral.com/profiles/blog/list?promoted=1" target="_blank" rel="noopener">blogs</a> and <a href="https://www.datasciencecentral.com/forum/topic/featured" target="_blank" rel="noopener">forum questions</a>, as well as comments, and get answers from experts in their field. </span></li>
</ul>
<p><span class="font-size-3">You can easily unsubscribe at any time. Our weekly digest features selected discussions, articles written by experts, forum questions and announcements aimed at machine learning, AI, IoT, analytics, data science, BI, operations research and big data practitioners.</span></p>
<p><span class="font-size-3">It covers topics such as deep learning, AI, blockchain, visualization, automated machine learning, Hadoop, data integration and engineering, statistical science, computational statistics, analytics, pure data science, data security, and even computer-intensive methods in number theory. It includes</span></p>
<ul>
<li><span class="font-size-3">Exclusive content for subscribers only: our upcoming book on automated data science (coming soon), detailed research reports about the data science community (for instance, best cities for data scientists, with growth trends), APIs (top Twitter accounts, various forecasting apps) and more</span></li>
<li><span class="font-size-3">New book and new journal announcements</span></li>
<li><span class="font-size-3">Salary surveys - how much a Facebook data scientist makes</span></li>
<li><span class="font-size-3">Workshops, webinars and conference announcements </span></li>
<li><span class="font-size-3">Programs and certifications for data scientists</span></li>
<li><span class="font-size-3">Case studies, success stories, benchmarks</span></li>
<li><span class="font-size-3">New analytic companies/products announcements</span></li>
<li><span class="font-size-3">Sample source code, questions about coding and algorithms</span></li>
</ul>
<p><span class="font-size-3"><a href="https://api.ning.com/files/DGjgPH8*8vnOpgpRfvHNENdZAfV3aKDxhTnl50JQPNhQIZRLPEpLE7hzYhU0id7*HAx*Hwxbp176P4l9AUxx47p2K9MnKt9f/bor99.PNG" target="_self"><img src="https://api.ning.com/files/DGjgPH8*8vnOpgpRfvHNENdZAfV3aKDxhTnl50JQPNhQIZRLPEpLE7hzYhU0id7*HAx*Hwxbp176P4l9AUxx47p2K9MnKt9f/bor99.PNG" width="713" class="align-center"/></a></span></p>
<p><span class="font-size-3"><strong><a href="https://www.datasciencecentral.com/main/authorization/signUp?" target="_self">Click here to sign up</a></strong> and start receiving our newsletter. We respect your privacy: member information (email address etc.) is kept confidential and never shared.</span></p>Curious Mathematical Problemtag:www.analyticbridge.datasciencecentral.com,2018-08-31:2004291:BlogPost:3877642018-08-31T05:00:00.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p>Let us consider the following equation:</p>
<p><a href="http://api.ning.com:80/files/i1Rtw3QpThCT*hg12mfyTenJel5zU3S0VSB50Pg47HmpgohY56zD*cuqD6PZGrgj0TztTupRjzlkY8Wb5Clgv65STo7nmwty/Capture.PNG" target="_self"><img src="http://api.ning.com:80/files/i1Rtw3QpThCT*hg12mfyTenJel5zU3S0VSB50Pg47HmpgohY56zD*cuqD6PZGrgj0TztTupRjzlkY8Wb5Clgv65STo7nmwty/Capture.PNG" width="359" class="align-center"/></a></p>
<p>Prove that</p>
<ul>
<li><em>x</em> = log(Pi) = 1.14472988584... is a very good approximation of a solution, up to 10 digits.</li>
<li>Using <a href="https://www.datasciencecentral.com/page/search?q=high+performance+computing" target="_blank" rel="noopener">high performance computing</a> or other means, prove that it is correct up to 1,000 digits.</li>
<li>Is <em>x</em> = log(Pi) an exact solution?</li>
</ul>
<p>If the answer to the last question is positive, this would mean that log(Pi) is NOT a transcendental number, but rather, an algebraic number. A remarkable result in itself!</p>
<p><a href="http://api.ning.com:80/files/i1Rtw3QpThDa0xwQFjeLDwPnngWVNDqfgCjVoG-*uiWaxad4dEb2806ZbrredIuHeI2RhpubMePcpIpkvgHQcPSRyeh34mlK/Capture.PNG" target="_self"><img src="http://api.ning.com:80/files/i1Rtw3QpThDa0xwQFjeLDwPnngWVNDqfgCjVoG-*uiWaxad4dEb2806ZbrredIuHeI2RhpubMePcpIpkvgHQcPSRyeh34mlK/Capture.PNG" width="473" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source for picture: <a href="https://en.wikipedia.org/wiki/Algebraic_number" target="_blank" rel="noopener">algebraic numbers</a></em></p>
<p><strong>Solution and related problem</strong></p>
<p>Any real number greater than or equal to 1 is a solution, so there is nothing special about log(Pi). A more subtle version of this problem is to ask the student to solve the following equation:</p>
<p><a href="http://api.ning.com:80/files/0jbYLowImptu1FJeLENprdxtHN9hfUDstIW0wzYx*JB9Tzl6lE-J4BvB1v*MWPILiUd-q2qdT5AE8zyZQhSKL02mxuZ-t1k4/Capture.PNG" target="_self"><img src="http://api.ning.com:80/files/0jbYLowImptu1FJeLENprdxtHN9hfUDstIW0wzYx*JB9Tzl6lE-J4BvB1v*MWPILiUd-q2qdT5AE8zyZQhSKL02mxuZ-t1k4/Capture.PNG" width="395" class="align-center"/></a></p>
<p>We know from the previous problem that the equality holds whenever <em>x</em>^5 - <em>x</em>^2 - 1 = <em>x</em>^2 - 1. Thus, to find a solution, we just need to solve this equation, which reduces to <em>x</em>^3 = 2 (for non-zero <em>x</em>); the cube root of 2 is a solution.</p>
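<p>A quick numerical sanity check (an illustration, not part of the original solution) confirms that the cube root of 2 makes both sides agree:</p>

```python
# Check that x = 2^(1/3) solves x^5 - x^2 - 1 = x^2 - 1,
# which reduces to x^3 = 2 for non-zero x.
x = 2 ** (1 / 3)
lhs = x ** 5 - x ** 2 - 1
rhs = x ** 2 - 1
```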
<p>More generally, let's define</p>
<p><a href="http://api.ning.com:80/files/EPN0*n2RQk-UaArUOZc6-7gW5XTp*O*74dQt3wJWQGPCTpbtu7jyEbN-YsJ8pyODVGbD-8JTJt8TK1DXpTVQhGy6eitTl9A9/Capture.PNG" target="_self"><img src="http://api.ning.com:80/files/EPN0*n2RQk-UaArUOZc6-7gW5XTp*O*74dQt3wJWQGPCTpbtu7jyEbN-YsJ8pyODVGbD-8JTJt8TK1DXpTVQhGy6eitTl9A9/Capture.PNG" width="617" class="align-center"/></a></p>
<p>Then the (unique) real-valued solution to the equation <em>f</em>(<em>x</em>) = 0 is given by</p>
<p><a href="http://api.ning.com:80/files/EPN0*n2RQk-UcRLN85xZDwf*kg3I5oM5pPNyRExCFEsVFK*EKM9*wv7b-cXJmNlLBO0D3wFURQGc9Xfn-DIBSluPjLW2kjcj/Capture.PNG" target="_self"><img src="http://api.ning.com:80/files/EPN0*n2RQk-UcRLN85xZDwf*kg3I5oM5pPNyRExCFEsVFK*EKM9*wv7b-cXJmNlLBO0D3wFURQGc9Xfn-DIBSluPjLW2kjcj/Capture.PNG" width="95" class="align-center"/></a></p>
<p>In particular, if <em>p</em> = 3, then <em>x</em> = 2. If <em>p</em> = 2 + log(2) / log(3), then <em>x</em> = 3. Note that the function <em>f</em> is monotonic, and thus invertible. What is the inverse of <em>f</em>?</p>
<p><em>For related articles from the same author, <a href="http://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles" target="_blank" rel="noopener">click here</a> or visit <a href="http://www.vincentgranville.com/" target="_blank" rel="noopener">www.VincentGranville.com</a>. Follow me <a href="https://www.linkedin.com/in/vincentg/" target="_blank" rel="noopener">on LinkedIn</a>, or visit my old web page <a href="http://www.datashaping.com">here</a>.</em></p>
<p><span style="font-size: 14pt;"><b>DSC Resources</b></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/invitation-to-join-data-science-central">Invitation to Join Data Science Central</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes">Free Book: Applied Stochastic Processes</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/comprehensive-repository-of-data-science-and-ml-resources">Comprehensive Repository of Data Science and ML Resources</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/advanced-machine-learning-with-basic-excel">Advanced Machine Learning with Basic Excel</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/difference-between-machine-learning-data-science-ai-deep-learning">Difference between ML, Data Science, AI, Deep Learning, and Statistics</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles">Selected Business Analytics, Data Science and ML articles</a></li>
<li><a href="http://careers.analytictalent.com/jobs/products">Hire a Data Scientist</a><span> </span>|<span> </span><a href="http://www.datasciencecentral.com/page/search?q=Python">Search DSC</a><span> </span>|<span> </span><a href="http://classifieds.datasciencecentral.com">Classifieds</a><span> </span>|<span> </span><a href="http://www.analytictalent.com">Find a Job</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blog/new">Post a Blog</a><span> </span>|<span> </span><a href="http://www.datasciencecentral.com/forum/topic/new">Forum Questions</a></li>
</ul>Career Transition Towards Data Analytics & Sciencetag:www.analyticbridge.datasciencecentral.com,2018-08-30:2004291:BlogPost:3878542018-08-30T23:00:00.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><em>Here is<a class="nolink"> </a><a href="https://www.datasciencecentral.com/profile/RafaelKnuth">Rafael Knuth</a>'s story.</em></p>
<p>In 1992, I entered the job market and landed a job as an advertising copywriter for McDonald’s. I was tasked with ideating radio, TV and print advertisements to boost burger, fries and soft drink sales. The internet did not exist in the public domain back then, and my first laptop was actually a mechanical typewriter. Around 2000, I became a freelance marketing manager, working for small and mid-sized businesses. At that time, my English was not good enough to work for companies outside of my home country Germany (it’s still far from perfect).</p>
<p>Fast forward 10 years: I was still working as a marketing guy, yet after years of self-study my English had become thoroughly workable. I managed to acquire some of the largest US-based IT and software companies as my clients, and in 2013 I started teaching myself to code. Back then, I was increasingly worried that, as a technology illiterate, I might be flushed out of the job market in the foreseeable future.</p>
<p>At the moment of writing this post, I am bootstrapping a data literacy consultancy catering to large enterprises around the globe. I teach business users how to work with Excel in ways they haven’t seen before. Plus, I teach them how to code and work with data in a utility-scale environment. My learning journey was tough, but it can be smooth for any business leveraging my experience.</p>
<p>My biggest fear of becoming jobless turned into the business opportunity of my lifetime.</p>
<p><a href="https://api.ning.com/files/2J31BhRGuNGKyQUaHYJE5GAjD7QMlJFwq6YmcayVyNdaSojTSOIjWkNb6jWqnuj66sNyBVw7SBNGFqTTlVHhje50z8GD-Fmh/136236172186172185_rc.jpg" target="_self"><img src="https://api.ning.com/files/2J31BhRGuNGKyQUaHYJE5GAjD7QMlJFwq6YmcayVyNdaSojTSOIjWkNb6jWqnuj66sNyBVw7SBNGFqTTlVHhje50z8GD-Fmh/136236172186172185_rc.jpg?width=750" width="750" class="align-full"/></a></p>
<p><em>T-Systems employees protesting against their employer’s decision to release 10,000 workers who don’t possess any coding skills. Source: Verdi | Markus Fring</em></p>
<p><strong>10 observations I made during my own transition, which might propel yours</strong></p>
<p>You might be tempted to say: “Nah, that’s not me. An ad guy turned consultant!” And you know what? You’re right! I’m not you. Just take my observations and use them to craft your own, unique career transition story. Use my lessons to avoid unpleasant surprises and expensive mistakes. In 5 to 10 years, so I hope, we will have thousands of such stories, and distinct career transition patterns will emerge.</p>
<p>The observations below are not sorted by importance; rather, I arranged them for the sake of an easily digestible narrative. Let’s get started.</p>
<ul>
<li>1<sup>st</sup> observation: Just learning to code will get you nowhere</li>
<li>2<sup>nd</sup> observation: Excel is dead, long live Excel</li>
<li>3<sup>rd</sup> observation: The more you share, the more you learn</li>
<li>4<sup>th</sup> observation: Citizen data scientists are coming, yet their scope is limited</li>
</ul>
<p>Read the full article, with detailed explanations of each of the 10 observations, as well as what Rafael found to be the most useful things to learn (languages, statistical techniques, etc.), <a href="https://www.datasciencecentral.com/profiles/blogs/career-transition-towards-data-analytics-amp-science-here-s-my" target="_blank" rel="noopener">here</a>.</p>
<p><span style="font-size: 14pt;"><b>DSC Resources</b></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/invitation-to-join-data-science-central">Invitation to Join Data Science Central</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes">Free Book: Applied Stochastic Processes</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/comprehensive-repository-of-data-science-and-ml-resources">Comprehensive Repository of Data Science and ML Resources</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/advanced-machine-learning-with-basic-excel">Advanced Machine Learning with Basic Excel</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/difference-between-machine-learning-data-science-ai-deep-learning">Difference between ML, Data Science, AI, Deep Learning, and Statistics</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles">Selected Business Analytics, Data Science and ML articles</a></li>
<li><a href="http://careers.analytictalent.com/jobs/products">Hire a Data Scientist</a><span> </span>|<span> </span><a href="http://www.datasciencecentral.com/page/search?q=Python">Search DSC</a><span> </span>|<span> </span><a href="http://classifieds.datasciencecentral.com">Classifieds</a><span> </span>|<span> </span><a href="http://www.analytictalent.com">Find a Job</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blog/new">Post a Blog</a><span> </span>|<span> </span><a href="http://www.datasciencecentral.com/forum/topic/new">Forum Questions</a></li>
</ul>