
We are in the process of writing and adding new material (compact eBooks) exclusively available to our members, written in simple English by world-leading experts in AI, data science, and machine learning. In the upcoming months, the following will be added:

- The Machine Learning Coding Book
- Off-the-beaten-path Statistics and Machine Learning Techniques
- Encyclopedia of Statistical Science
- Original Math, Stat and Probability Problems - with Solutions
- Computational Number Theory for Data Scientists
- Randomness, Pattern Recognition, Simulations, Signal Processing - New Developments

We invite you to sign up here so you don't miss these free books. Previous material (also for members only) can be found here. Currently, the following content is available:

1. Book: Enterprise AI - An Application Perspective

Enterprise AI: An Applications Perspective takes a use-case-driven approach to understanding the deployment of AI in the enterprise. Designed for strategists and developers, the book provides a practical and straightforward roadmap, based on application use cases, for AI in enterprises. The authors (Ajit Jaokar and Cheuk Ting Ho) are data scientists and AI researchers who have deployed AI applications for enterprise domains. The book is used as a reference for Ajit and Cheuk's new course on Implementing Enterprise AI. The table of contents is available here. The book can be accessed here (members only).

2. Book: Applied Stochastic Processes

Full title: Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems. Published June 2, 2018. Author: Vincent Granville, PhD. (104 pages, 16 chapters.) This book is intended for professionals in data science, computer science, operations research, statistics, machine learning, big data, and mathematics. In 100 pages, it covers many new topics, offering a fresh perspective on the subject. It is accessible to practitioners with a two-year college-level exposure to statistics and probability.
The compact and tutorial style, featuring many applications (blockchain, quantum algorithms, HPC, random number generation, cryptography, Fintech, web crawling, statistical testing) with numerous illustrations, is aimed at practitioners, researchers, and executives in various quantitative fields. New ideas, advanced topics, and state-of-the-art research are discussed in simple English, without jargon or arcane theory. It unifies topics that are usually part of different fields (data science, operations research, dynamical systems, computer science, number theory, probability), broadening the knowledge and interest of the reader in ways not found in any other book. This short book contains a large amount of condensed material that would typically be covered in 500 pages in traditional publications. Thanks to cross-references and redundancy, the chapters can be read independently, in random order. The table of contents is available here. The book can be accessed here (members only).

DSC Resources

- Comprehensive Repository of Data Science and ML Resources
- Advanced Machine Learning with Basic Excel
- Difference between ML, Data Science, AI, Deep Learning, and Statistics
- Selected Business Analytics, Data Science and ML articles
- Hire a Data Scientist | Search DSC | Find a Job
- Post a Blog | Forum Questions

Summary: This may be the golden age of deep learning, but a lot can be learned by looking at where deep neural nets aren't working yet. This can be a guide to calming the hype. It can also be a roadmap to future opportunities once these barriers are behind us. The full article is accessible here; below is a snapshot.

We are living in the golden age of deep learning. This is quite literally the technology that launched 10,000 startups (to paraphrase Kevin Kelly's prophetic prediction from 2014: "The business plans of the next 10,000 startups are easy to forecast: Take X and add AI."). Well, that happened. Kelly was speaking more broadly about AI, but over the last four years we've come to understand that it's CNNs and RNN/LSTMs that are actually commercially ready and driving this. Although the last two years have been fairly quiet in terms of new technique and technology breakthroughs for data science, it hasn't been totally quiet. Like the emergence of Temporal Convolutional Nets (TCNs) to replace RNNs in language translation, research goes on to see how deep learning, and specifically CNN architecture, can be pushed into new applications.

Roadblocks to Deep Learning

Which brings us to our current topic: understanding some of the major roadblocks researchers face in trying to expand deep learning into new areas. In calling attention to 'things that aren't working in deep learning', we aren't suggesting that these things will never work, but rather that researchers are currently identifying major stumbling blocks to moving forward. The value of this is two-fold. First, it can help steer us away from projects where deep learning might look promising on the surface, but may in fact take a year or more to work out. Second, we should keep an eye on these particular issues, since once they are resolved they will represent opportunities that others may have decided weren't possible. Here are several that we spotted in the research.

Read full article here.
To make sure you keep getting these emails, please add mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, follow this link.


These were features that I liked in Perl. Wondering if there is a way to make them work with Python?

- Automated memory allocation / de-allocation (for variables, arrays, hash tables, etc.)
- Turning your program into an executable (that is, pre-compiled)
- Automated variable initialization (variables and arrays don't even need to be declared, much less initialized)
- Automated type casting (e.g., automatically treating the same variable as an integer or a string depending on the context: integer when performing a multiplication, string for concatenation)

You are going to say that this makes for terrible programming, but in my case I use the code only for myself, and I'd rather focus on the algorithms than on the coding / debugging itself. I am also wondering whether there are options for automated debugging, how to produce sounds in Python, and which random number generator Python uses. Finally, is high precision computing (say, 500 digits of accuracy) reliable in Python, using the default BigNum libraries? Thanks.
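For what it's worth, a quick sketch of how some of these Perl behaviors translate to Python; the precision setting and seed below are purely illustrative choices, not recommendations:

```python
import random
from decimal import Decimal, getcontext

# No declarations needed: names are bound on first assignment, and memory
# is reclaimed automatically (reference counting plus a cycle collector).
x = 42        # x holds an integer
x = "42"      # the same name can later hold a string

# Unlike Perl, Python does not silently cast between strings and numbers:
# "2" * 3 repeats the string, and "2" + 3 raises a TypeError.
repeated = "2" * 3        # the string "222", not the number 6
product = int("2") * 3    # explicit conversion gives 6

# High-precision arithmetic: the standard library's decimal module lets
# you set the working precision, e.g. 500 significant digits.
getcontext().prec = 500
one_third = Decimal(1) / Decimal(3)   # 0.333... carried to 500 digits

# Python's default random module is based on the Mersenne Twister.
random.seed(2018)
r = random.random()       # reproducible given the seed, in [0, 1)
```

On the remaining points: tools such as PyInstaller can bundle a script into a stand-alone executable, and the `pdb` module in the standard library provides interactive (though not automated) debugging.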

Full title: Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems. Published June 2, 2018. Author: Vincent Granville, PhD. (104 pages, 16 chapters.)

This book is intended for professionals in data science, computer science, operations research, statistics, machine learning, big data, and mathematics. In 100 pages, it covers many new topics, offering a fresh perspective on the subject. It is accessible to practitioners with a two-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications (blockchain, quantum algorithms, HPC, random number generation, cryptography, Fintech, web crawling, statistical testing) with numerous illustrations, is aimed at practitioners, researchers, and executives in various quantitative fields.

New ideas, advanced topics, and state-of-the-art research are discussed in simple English, without jargon or arcane theory. It unifies topics that are usually part of different fields (data science, operations research, dynamical systems, computer science, number theory, probability), broadening the knowledge and interest of the reader in ways not found in any other book. This short book contains a large amount of condensed material that would typically be covered in 500 pages in traditional publications. Thanks to cross-references and redundancy, the chapters can be read independently, in random order.

This book is available exclusively to Data Science Central members. The text in blue consists of clickable links that provide the reader with additional references. Source code and Excel spreadsheets summarizing computations are also accessible as hyperlinks, for easy copy-and-paste or replication. The most recent version of this book is available from this link, accessible to DSC members only.
About the author

Vincent Granville is a start-up entrepreneur, patent owner, author, investor, and pioneering data scientist with 30 years of corporate experience in companies small and large (eBay, Microsoft, NBC, Wells Fargo, Visa, CNET), and a former VC-funded executive, with a strong academic and research background including Cambridge University.

Download the book (members only)

Click here to get the book. For Data Science Central members only. If you have any issues accessing the book, please contact us at info@datasciencecentral.com.

Content

The book covers the following topics:

1. Introduction to Stochastic Processes

We introduce these processes, used routinely by Wall Street quants, with a simple approach consisting of re-scaling random walks to make them time-continuous, with a finite variance, based on the central limit theorem.

- Construction of Time-Continuous Stochastic Processes
- From Random Walks to Brownian Motion
- Stationarity, Ergodicity, Fractal Behavior
- Memory-less or Markov Property
- Non-Brownian Process

2. Integration, Differentiation, Moving Averages

We introduce more advanced concepts about stochastic processes, yet make them easy to understand even for the non-expert. This is a follow-up to Chapter 1.

- Integrated, Moving Average and Differential Process
- Proper Re-scaling and Variance Computation
- Application to Number Theory Problem

3. Self-Correcting Random Walks

We investigate here a breed of stochastic processes that are different from the Brownian motion, yet are better models in many contexts, including Fintech.

- Controlled or Constrained Random Walks
- Link to Mixture Distributions and Clustering
- First Glimpse of Stochastic Integral Equations
- Link to Wiener Processes, Application to Fintech
- Potential Areas for Research
- Non-stochastic Case

4. Stochastic Processes and Tests of Randomness

In this transition chapter, we introduce a different type of stochastic process, with number theory and cryptography applications, analyzing statistical properties of numeration systems along the way -- a recurrent theme in the next chapters, offering many research opportunities and applications. While we are dealing with deterministic sequences here, they behave very much like stochastic processes, and are treated as such. Statistical testing is central to this chapter, introducing tests that will also be used in the last chapters.

- Gap Distribution in Pseudo-Random Digits
- Statistical Testing and Geometric Distribution
- Algorithm to Compute Gaps
- Another Application to Number Theory Problem
- Counter-Example: Failing the Gap Test

5. Hierarchical Processes

We start discussing random number generation, and numerical and computational issues in simulations, applied to an original type of stochastic process. This will become a recurring theme in the next chapters, as it applies to many other processes.

- Graph Theory and Network Processes
- The Six Degrees of Separation Problem
- Programming Languages Failing to Produce Randomness in Simulations
- How to Identify and Fix the Previous Issue
- Application to Web Crawling

6. Introduction to Chaotic Systems

While typically studied in the context of dynamical systems, the logistic map can be viewed as a stochastic process, with an equilibrium distribution and probabilistic properties, just like the numeration systems (next chapters) and the processes introduced in the first four chapters.

- Logistic Map and Fractals
- Simulation: Flaws in Popular Random Number Generators
- Quantum Algorithms

7. Chaos, Logistic Map and Related Processes

We study processes related to the logistic map, including a special logistic map discussed here for the first time, with a simple equilibrium distribution. This chapter offers a transition between chapter 6 and the next chapters on numeration systems (the logistic map being one of them.)

- General Framework
- Equilibrium Distribution and Stochastic Integral Equation
- Examples of Chaotic Sequences
- Discrete, Continuous Sequences and Generalizations
- Special Logistic Map
- Auto-regressive Time Series
- Literature
- Source Code with Big Number Library
- Solving the Stochastic Integral Equation: Example

8. Numerical and Computational Issues

These issues have been mentioned in chapter 7, and also appear in chapters 9, 10 and 11. Here we take a deeper dive and offer solutions, using high precision computing with BigNumber libraries.

- Precision Issues when Simulating, Modeling, and Analyzing Chaotic Processes
- When Precision Matters, and When it Does Not
- High Precision Computing (HPC)
- Benchmarking HPC Solutions
- How to Assess the Accuracy of your Simulation Tool

9. Digits of Pi, Randomness, and Stochastic Processes

Deep mathematical and data science research (including a result about the randomness of Pi, which is just a particular case) is presented here, without arcane terminology or complicated equations. The numeration systems discussed here are a particular case of deterministic sequences behaving just like the stochastic processes investigated earlier, in particular the logistic map.

- Application: Random Number Generation
- Chaotic Sequences Representing Numbers
- Data Science and Mathematical Engineering
- Numbers in Base 2, 10, 3/2 or Pi
- Nested Square Roots and Logistic Map
- About the Randomness of the Digits of Pi
- The Digits of Pi are Randomly Distributed in the Logistic Map System
- Paths to Proving Randomness in the Decimal System
- Connection with Brownian Motions
- Randomness and the Bad Seeds Paradox
- Application to Cryptography, Financial Markets, Blockchain, and HPC
- Digits of Pi in Base Pi

10. Numeration Systems in One Picture

Here you will find a summary of much of the material previously covered on chaotic systems, in the context of numeration systems (in particular, chapters 7 and 9.)

- Summary Table: Equilibrium Distribution, Properties
- Reverse-engineering Number Representation Systems
- Application to Cryptography

11. Numeration Systems: More Statistical Tests and Applications

In addition to featuring new research results and building on the previous chapters, the topics discussed here offer a great sandbox for data scientists and mathematicians.

- Components of Number Representation Systems
- General Properties of these Systems
- Examples of Number Representation Systems
- Examples of Patterns in Digits Distribution
- Defects Found in the Logistic Map System
- Test of Uniformity
- New Numeration System with no Bad Seed
- Holes, Autocorrelations, and Entropy (Information Theory)
- Towards a more General, Better, Hybrid System
- Faulty Digits, Ergodicity, and High Precision Computing
- Finding the Equilibrium Distribution with the Percentile Test
- Central Limit Theorem, Random Walks, Brownian Motions, Stock Market Modeling
- Data Set and Excel Computations

12. The Central Limit Theorem Revisited

The central limit theorem explains the convergence of discrete stochastic processes to Brownian motions, and has been cited a few times in this book. Here we also explore a version that applies to deterministic sequences. Such sequences are treated as stochastic processes in this book.

- A Special Case of the Central Limit Theorem
- Simulations, Testing, and Conclusions
- Generalizations
- Source Code

13. How to Detect if Numbers are Random or Not

We explore here some deterministic sequences of numbers behaving like stochastic processes or chaotic systems, together with another interesting application of the central limit theorem.

- Central Limit Theorem for Non-Random Variables
- Testing Randomness: Max Gap, Auto-Correlations and More
- Potential Research Areas
- Generalization to Higher Dimensions

14. Arrival Time of Extreme Events in Time Series

Time series, as discussed in the first chapters, are also stochastic processes. Here we discuss a topic rarely investigated in the literature: the arrival times, as opposed to the extreme values (a classic topic), associated with extreme events in time series.

- Simulations
- Theoretical Distribution of Records over Time

15. Miscellaneous Topics

We investigate topics related to time series, as well as other popular stochastic processes such as spatial processes.

- How and Why: Decorrelate Time Series
- A Weird Stochastic-Like, Chaotic Sequence
- Stochastic Geometry, Spatial Processes, Random Circles: Coverage Problem
- Additional Reading (Including Twin Points in Point Processes)

16. Exercises
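As an illustration of the construction described under Chapter 1 (re-scaling a random walk so its variance stays finite, per the central limit theorem), here is a minimal simulation sketch; the step counts, seed, and function name are illustrative choices, not taken from the book:

```python
import random

def rescaled_walk_endpoint(t, n):
    """Value at time t of a +/-1 random walk taking n steps per unit of
    time, rescaled by 1/sqrt(n) so its variance stays finite as n grows."""
    steps = int(t * n)
    total = sum(random.choice((-1, 1)) for _ in range(steps))
    return total / n ** 0.5

random.seed(1)
n, t, trials = 400, 1.0, 2000
endpoints = [rescaled_walk_endpoint(t, n) for _ in range(trials)]
mean = sum(endpoints) / trials
var = sum((x - mean) ** 2 for x in endpoints) / trials
# By the central limit theorem, the rescaled endpoint is approximately
# N(0, t): the sample mean should be near 0 and the variance near t = 1,
# matching the behavior of a Brownian motion at time t.
```

Increasing n refines the time grid without blowing up the variance, which is exactly why the 1/sqrt(n) scaling makes the walk converge to a time-continuous Brownian motion.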

Summary: There are several approaches to reducing the cost of training data for AI, one of which is to get it for free. Here are some excellent sources.

Recently we wrote that training data (not just data in general) is the new oil. It's the difficulty and expense of acquiring labeled training data that causes many deep learning projects to be abandoned. It also matters a great deal just how good you want your new deep learning app to be. A 2016 study by Goodfellow, Bengio and Courville concluded you could get 'acceptable' performance with about 5,000 labeled examples per category, but that it would take 10 million labeled examples per category to "match or exceed human performance".

There are a number of technologies coming up through research now that promise more accurate auto-labeling, making the creation of training data less costly and time consuming. Snorkel, from the Stanford Dawn Project, is one we covered recently. This area is getting a lot of research attention.

Another approach is to build on someone else's work using publicly available datasets. You can begin by building your model on the borrowed set, you can blend your data with the borrowed data, or you can use the transfer learning approach to repurpose the front end of an existing model to train on your more limited data. Whatever your strategy, the ability to build on publicly available datasets is always something you'll want to consider, so your ability to find them becomes key.

Here are some notes on where you might start your search. These won't all be labeled image and text, but a lot of them are. And for those of you looking to use ML and statistical techniques, there's plenty here for you too.

Read full article here.

Guest blog post by Zied HY. Zied is a Senior Data Scientist at Capgemini Consulting. He specializes in building predictive models utilizing both traditional statistical methods (Generalized Linear Models, Mixed Effects Models, Ridge, Lasso, etc.) and modern machine learning techniques (XGBoost, Random Forests, Kernel Methods, neural networks, etc.). Zied runs workshops for university students (ESSEC, HEC, Ecole Polytechnique) interested in Data Science and its applications, and he is the co-founder of Global International Trading (GIT), a central purchasing office based in Paris.

I started reading about Deep Learning over a year ago, through several articles and research papers that I came across mainly on LinkedIn, Medium, and Arxiv. When I virtually attended the MIT 6.S191 Deep Learning courses over the last few weeks, I decided to start putting some structure into my understanding of Neural Networks through this series of articles. I will go through the first four courses:

- Introduction to Deep Learning
- Sequence Modeling with Neural Networks
- Deep Learning for Computer Vision - Convolutional Neural Networks
- Deep Generative Modeling

For each course, I will outline the main concepts and add more details and interpretations from my previous readings and my background in statistics and machine learning. Starting from the second course, I will also add an application on an open-source dataset. That said, let's go!

Read the first part here.