Summary: The Gartner Magic Quadrant for Data Science and Machine Learning Platforms is out for 2020. The really big news is how many excellent choices are now available, along with some interesting winner and loser stories. In a remarkable move, the whole field of competitors has shifted strongly up and to the right, offering more Leaders and near-leader Visionaries than ever before.

It is a mark of maturity in our industry that so many platforms offer fully capable model development, operationalization, and management features. The list of requirements defined by Gartner grows longer every year, and earning a better rating requires increasing capability and increasing customer satisfaction.

What Are the Major Changes?

As in previous years, we have charted the major changes in position using green arrows for improvement and red arrows to indicate a reduced rating. The blue dots are current ratings and the gray dots are from a year ago. Read the full article here, with the 2020 version of the above chart and comments.

In this notebook, we try to predict the positive (label 1) or negative (label 0) sentiment of a sentence, using the UCI Sentiment Labelled Sentences Data Set. Sentiment analysis is very useful in many areas. For example, it can be used to moderate internet conversations, or to predict the ratings that users would assign to a certain product (food, household appliances, hotels, films, etc.) based on their reviews. In this notebook we use two families of machine learning algorithms: Naive Bayes (NB) and long short-term memory (LSTM) neural networks.

References:
- AYLIEN
- Deeplearning4j
- Understanding LSTM Networks
- Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
- The Unreasonable Effectiveness of Recurrent Neural Networks

We will use pandas and numpy for data manipulation, nltk for natural language processing, matplotlib, seaborn and plotly for data visualization, and sklearn and keras for learning the models. Read the full article, with source code and illustrations, here.
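To give a flavor of the Naive Bayes branch, here is a minimal sketch using scikit-learn: a bag-of-words vectorizer feeding a multinomial NB classifier. The tiny corpus below is illustrative only, not taken from the UCI dataset, and the full notebook's preprocessing (nltk tokenization, etc.) is omitted.

```python
# Minimal Naive Bayes sentiment sketch: bag-of-words features fed to a
# multinomial NB classifier via a scikit-learn pipeline.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus (1 = positive, 0 = negative); the real
# notebook trains on the UCI Sentiment Labelled Sentences files.
sentences = ["great food and service", "terrible, would not recommend",
             "loved this movie", "the hotel was awful"]
labels = [1, 0, 1, 0]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(sentences, labels)
print(model.predict(["the service was great"]))
```

The same pipeline interface makes it easy to swap in TF-IDF features or a different classifier for comparison against the LSTM.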


How a Physics-Driven Analytics Platform Detects Reliability Threats

A physics-driven analytics platform aids in improving the reliability and efficiency of connected mechanical systems. The solution analyzes large quantities of time series data from IoT sensors to help identify issues affecting system performance in real time, and provides accurate data for predictive maintenance. Our presenter chose a time series database for its high ingest and storage of time series data, as well as its ability to easily send this data into their systems for predictive analytics. Register today.

Job Spotlight
- Senior Consultant: Market Research - RSG
- Manager-Optimization Consulting - River Logic Inc.
- Sr. Research Data Scientist - UC Davis

Featured Jobs
- Data Scientist, Analytics - Facebook
- Data Science Manager - Aviation - Uber
- Sr. Data Scientist, Business Intelligence - Vimeo
- Data Scientist, Content Platform - Spotify
- Cyber Threat Analytics Engineer - Nike
- Analytics Data Engineer III - John Deere
- Data Scientist - Inference, Core Host - Airbnb
- Data Analytics Engineer - FVV - Volvo Group
- Data Engineer - Red Hat
- Senior Director of Data Science and Machine Learning - Adobe
- Data Scientist / Sr. Data Scientist - Blizzard Entertainment
- Data Scientist - The Coca-Cola Company
- Senior Data Scientist - TripAdvisor
- AI Scientist - Electronic Arts (EA)
- Senior Software Engineer, Data Science Platform - Netflix

Check out the most recent jobs on AnalyticTalent.com

Upcoming DSC Webinars and Resources
- Ensuring Quality Data Labels from Outsourced Teams - eBook
- Forecasting: Prophet & Time Series Database - Feb 25
- How a Physics-Driven Analytics Platform Detects Reliability Threats - Feb 26
- Mathematical Optimization Modeling: Learn the Basics - March 10
- Developing and Testing Shiny Apps - March 12

Probably the worst error is thinking there is a correlation when that correlation is purely artificial. Take a data set with 100,000 variables and, say, 10 observations. Compute all (99,999 * 100,000) / 2 cross-correlations. You are almost guaranteed to find one above 0.999. This is best illustrated in my article How to Lie with P-values (which also discusses how to handle and fix the problem).

This is being done on such a large scale that I think it is probably the main cause of fake news, and the impact is disastrous on people who take for granted what they read in the news or hear from the government. Some people are sent to jail based on evidence tainted by major statistical flaws. Government money is spent, propaganda is generated, wars are started, and laws are created based on false evidence. Sometimes the data scientist has no choice but to knowingly cook the numbers to keep her job. Usually, these "bad stats" end up featured in beautiful but faulty visualizations: axes are truncated, charts are distorted, and observations and variables are carefully chosen just to make a (wrong) point.

Read the full article here.
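The effect is easy to reproduce. The sketch below (a scaled-down illustration, using 2,000 variables rather than the 100,000 mentioned above) generates purely random data and shows that some pair of variables still correlates strongly, by chance alone.

```python
import numpy as np

# Spurious-correlation demo: with many random variables and only a
# handful of observations, some pair will correlate strongly by chance.
rng = np.random.default_rng(42)
n_obs, n_vars = 10, 2000   # scaled down from the article's 100,000

data = rng.standard_normal((n_obs, n_vars))
corr = np.corrcoef(data, rowvar=False)   # n_vars x n_vars matrix
np.fill_diagonal(corr, 0.0)              # ignore self-correlations

print(f"max |correlation| among random pairs: {np.abs(corr).max():.3f}")
```

With ~2 million variable pairs and only 10 observations each, the maximum spurious correlation routinely exceeds 0.9, even though every variable is pure noise.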
Related articles:
- How to Lie with P-values
- Four Types of Data Scientist
- Debunking Forbes Article about the Death of the Data Scientist
- Why You Should be a Data Science Generalist - and How to Become One
- Becoming a Billionaire Data Scientist vs Struggling to Get a $100k Job
- Is a PhD helpful for a data science career?
- If data science is in demand, why is it so hard to get a job?
- Why do people with no experience want to become data scientists?
- Why is Becoming a Data Scientist so Difficult?
- Full Stack Data Scientist: The Elusive Unicorn and Data Hacker
- Statistical Significance and p-Values Take Another Blow
- Are data science or stats curricula in US too specialized?
- How do you identify an actual data scientist?
- Is it still possible today to become a self-taught data scientist?
- Will the job outlook for data scientists severely decline after 2020?
- Why Logistic Regression should be the last thing you learn

Source for picture: here


Fermat's last conjecture puzzled mathematicians for 300 years and was only proved recently. In this note, I propose a generalization that could lead to a much simpler proof and a more powerful result with broader applications, including the solution of numerous similar equations. As usual, my research involves a significant amount of computation and experimental math as an exploratory step before stating new conjectures and eventually trying to prove them. The methodology is very similar to that used in data science, involving the following steps:

- Identify and process the data. Here the data set consists of all real numbers; it is infinite, which brings its own challenges. On the plus side, the data is public and accessible to everyone, though very powerful computation techniques are required, usually involving a distributed architecture.
- Data cleaning: in this case, inaccuracies are caused by not using enough precision. The solution consists of finding better or faster algorithms for your computations, and sometimes working with exact arithmetic using Bignum libraries.
- Sample data and perform exploratory analysis to identify patterns. Formulate hypotheses. Perform statistical tests to validate (or reject) these hypotheses. Then formulate conjectures based on this analysis.
- Build models (about how your numbers seem to behave) and focus on the models offering the best fit. Perform simulations based on your model and check whether your numbers agree with the simulations by testing on a much larger set of numbers. Discard conjectures that fail these tests.
- Formally prove or disprove the retained conjectures, when possible. Then write a conclusion if possible: in this case, a new, major mathematical theorem, showing potential applications.

This last step is similar to data scientists presenting the main insights of their analysis to a layman audience. See the full article for explanations about the table there (representing the number of solutions).

The motivation of this article is two-fold:
- Presenting a new path that can lead to new, interesting results and theoretical research in mathematics (yet my writing style and content are accessible to the layman).
- Offering data scientists and machine learning / AI practitioners (including newbies) an interesting framework to test their programming, discovery and analysis skills, using a huge (infinite) data set that has been available to everyone since the beginning of time, applied to a fascinating problem.

Read the full article here. For more math-oriented articles, visit this page (check the math section), or download my books, available here.
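The exploratory step above can be sketched in a few lines. The toy search below mirrors the methodology, not the article's actual computation: it looks for integer solutions of x^n + y^n = z^n using exact arithmetic (Python integers are arbitrary-precision, so no rounding errors creep in, addressing the "data cleaning" step).

```python
# Toy illustration of the exploratory step: brute-force search for
# integer solutions of x**n + y**n == z**n with exact arithmetic.
# This mirrors the methodology, not the article's actual computation.

def count_solutions(n, bound):
    """Count triples 1 <= x <= y, z <= bound with x**n + y**n == z**n."""
    nth_powers = {z ** n for z in range(1, bound + 1)}
    count = 0
    for x in range(1, bound + 1):
        for y in range(x, bound + 1):
            if x ** n + y ** n in nth_powers:
                count += 1
    return count

print(count_solutions(2, 50))  # Pythagorean triples exist for n = 2
print(count_solutions(3, 50))  # none for n = 3, as Fermat predicts
```

Running such searches over much larger bounds, then tabulating how solution counts vary with the exponent and the equation's form, is exactly the "sample, explore, conjecture" loop described above.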

Hundreds of programming languages are used in data science and statistics, but a few dominate the market: Python, R, SAS and SQL are the standouts. If you're looking to branch out and add a new programming language to your skill set, which one should you learn? This one picture breaks down the differences between the four languages. View the full picture (with pluses and minuses), as well as related articles, here. Below are more resources for specific languages, including comparisons between languages and the same algorithms illustrated in different languages.

- Python
- Python vs R
- R
- SQL
- SAS
- Julia
- Scala
- Java
- C
- Matlab

To quickly learn these languages or refresh your skills, check out our cheat sheets.


While many programming libraries encapsulate the inner working details of graph and other algorithms, as a data scientist it helps a lot to have reasonably good familiarity with those details. A solid understanding of the intuition behind such algorithms not only helps in appreciating the logic behind them, but also supports conscious decisions about their applicability to real-life cases. There are several graph-based algorithms, and the most notable are the shortest-path algorithms: Dijkstra's, Bellman-Ford, A*, Floyd-Warshall and Johnson's algorithms are commonly encountered. While these algorithms are discussed in many textbooks and informative resources online, I felt that not many provided visual examples illustrating the processing steps at sufficient granularity to enable easy understanding of the working details. As such, I had to use simple enough graphs to visualize the algorithmic flow for my own understanding, and I wanted to share my examples, along with the explanations, in this article. Since there are many algorithms to illustrate, I decided to divide the article into several parts. In part 1, I illustrate Dijkstra's and Bellman-Ford algorithms. Before diving into the algorithms, I also highlight salient points about the graph data structure.

Content of this article:
- Quick Primer On Graph Data Structure
- Dijkstra's Algorithm
- Bellman-Ford Algorithm
- More Algorithms To Cover

Read the full article here. Written by Murali Kashaboina, Tech Executive, PhD Researcher in AI/ML/DS, Data Scientist, Industry Speaker, Entrepreneur.
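To make the inner workings concrete, here is a minimal sketch of Dijkstra's algorithm on a small weighted graph. The adjacency-dict representation and the four-node graph below are illustrative choices, not taken from the article; note that Dijkstra's algorithm requires non-negative edge weights (Bellman-Ford is the one that tolerates negative weights).

```python
import heapq

# Minimal sketch of Dijkstra's algorithm on a small weighted graph
# (adjacency-dict representation; all edge weights must be >= 0).
def dijkstra(graph, source):
    """Return a dict of shortest distances from source to every node."""
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    heap = [(0, source)]                 # (distance, node) min-heap
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:                  # stale heap entry, skip it
            continue
        for v, w in graph[u].items():
            if d + w < dist[v]:          # relax edge u -> v
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

# Illustrative graph: A->B direct costs 4, but A->C->B costs only 3.
graph = {"A": {"B": 4, "C": 1}, "B": {"D": 1},
         "C": {"B": 2, "D": 5}, "D": {}}
print(dijkstra(graph, "A"))
```

Tracing the heap pops on a graph this small (A, then C, then B, then D) is exactly the kind of step-by-step visualization the article walks through.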