A Data Science Central Community
The Waring conjecture - actually a problem associated with a number of conjectures, many now being solved - is one of the most fascinating mathematical problems. This article covers new aspects of this problem, with a generalization and new conjectures, some with a tentative solution, and a new framework to tackle the problem. Yet it is written in simple English and accessible to the layman.
I also review a number of famous related mathematical conjectures, including one with a $1…Continue
Added by Vincent Granville on January 10, 2018 at 6:00pm — No Comments
Summary: Here are our 6 predictions for data science, machine learning, and AI for 2018. Some are fast-track and potentially disruptive; some take the hype out of overblown claims and set realistic expectations for the coming year.
It’s that time of year again when we do a look back in order to offer a look forward. What trends will speed up, what things will actually happen, and what things won’t in the coming year for data science, machine…Continue
Added by Vincent Granville on December 14, 2017 at 3:00pm — No Comments
Here we discuss an application of HPC (here meaning high precision computing, a special case of high performance computing) to dynamical systems such as the logistic map in chaos theory, defined as X(k) = 4 X(k-1) (1 - X(k-1)).
For all these systems, the loss of precision propagates exponentially, to the point that after 50 iterations, all generated values are completely wrong. Tons of articles have been written on this subject - none of them acknowledging the…Continue
Added by Vincent Granville on November 13, 2017 at 7:00pm — No Comments
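The loss of precision described in the excerpt above can be observed with a small experiment. The sketch below is not taken from the article: it iterates the standard logistic map X(k) = 4 X(k-1) (1 - X(k-1)) once in ordinary double precision and once with Python's `decimal` module, and prints how far apart the two trajectories drift. The seed 0.3, the 200-digit working precision, and the function names are illustrative assumptions.

```python
from decimal import Decimal, getcontext


def logistic_float(x0, n):
    """Iterate X(k) = 4 X(k-1) (1 - X(k-1)) in standard double precision."""
    x = x0
    values = [x]
    for _ in range(n):
        x = 4.0 * x * (1.0 - x)
        values.append(x)
    return values


def logistic_decimal(x0, n, digits=200):
    """Same iteration with `digits` significant decimal digits (assumed setting)."""
    getcontext().prec = digits  # note: changes the precision of the current context
    x = Decimal(x0)             # exact conversion of the float seed, no extra rounding
    values = [x]
    for _ in range(n):
        x = Decimal(4) * x * (Decimal(1) - x)
        values.append(x)
    return values


seed = 0.3  # illustrative seed, not from the article
low = logistic_float(seed, 100)
high = logistic_decimal(seed, 100)
for k in (10, 30, 50, 70, 100):
    gap = abs(low[k] - float(high[k]))
    print(f"k={k:3d}  double={low[k]:.6f}  high-precision={float(high[k]):.6f}  gap={gap:.2e}")
```

Both runs start from exactly the same seed, so any gap comes only from rounding during the iterations. The high-precision run is itself rounded at 200 digits, which is why the number of digits has to grow with the number of iterations one wants to trust.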
The general linear model should not be confused with multiple linear regression, the generalized linear model, or general linear methods. The general linear model, or multivariate regression model, is a statistical linear model written as Y = XB + U.
The general linear model encompasses a number of different statistical models, such as ANOVA, ANCOVA, MANOVA, MANCOVA, ordinary linear regression, the t-test and the F-test. The GLM is a generalization of multiple…
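To make the notation Y = XB + U concrete, here is a minimal sketch that estimates B by ordinary least squares through the normal equations (X'X) B = X'Y. It assumes the usual reading of the symbols (Y the response matrix, X the design matrix, B the coefficient matrix, U the errors); the toy data and helper names are hypothetical, and the pure-Python linear algebra is for illustration only.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ z = b for z by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def fit_general_linear_model(X, Y):
    """Least-squares estimate of B in Y = XB + U via the normal equations."""
    Xt = transpose(X)
    return solve(matmul(Xt, X), matmul(Xt, Y))


# Toy data (hypothetical): 5 observations, an intercept column plus one
# predictor, and two response columns, generated with U = 0.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
B_true = [[1.0, -2.0], [0.5, 3.0]]
Y = matmul(X, B_true)

B_hat = fit_general_linear_model(X, Y)
print(B_hat)
```

Because Y has more than one column, a single fit estimates the coefficients for all responses at once. With one response column the same code reduces to ordinary multiple linear regression.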
In this post, we learn about building a basic search engine or document retrieval system using the vector space model. This approach is widely used in information retrieval systems. Given a set of documents and a search query (one or more terms), we need to retrieve the relevant documents that are similar to the query.
The problem statement explained above is illustrated in the image below. …Continue
Added by suresh kumar Gorakala on November 7, 2017 at 6:30am — No Comments
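The retrieval task described in the excerpt above can be sketched in a few lines. This is not the code from the post: it is a minimal vector space model assuming whitespace tokenization, a TF-IDF weighting of the form tf * (log(N/df) + 1), and cosine similarity for ranking. The sample documents and function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def tfidf(counts, idf):
    """Weight raw term counts by inverse document frequency; unknown terms are dropped."""
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def build_index(docs):
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [tfidf(Counter(toks), idf) for toks in tokenized]
    return idf, vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query, idf, vectors):
    """Return (score, document index) pairs with a nonzero score, best match first."""
    q = tfidf(Counter(tokenize(query)), idf)
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return sorted((pair for pair in scored if pair[0] > 0), reverse=True)


docs = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock markets fell sharply today",
]
idf, vectors = build_index(docs)
for score, i in search("cat mat", idf, vectors):
    print(f"{score:.3f}  {docs[i]}")
```

Documents and queries live in the same term space, so ranking reduces to comparing angles between vectors. A real system would add stemming, stop-word removal, and an inverted index rather than scoring every document.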
Here we describe well-known chaotic sequences, including new generalizations, with application to random number generation, highly non-linear auto-regressive models for time series, simulation, random permutations, and the use of big numbers (libraries available in programming languages to work with numbers with hundreds of decimals), as standard computer precision almost always produces completely erroneous results after a few iterations -- a fact rarely if ever mentioned in the scientific…Continue
Added by Vincent Granville on November 6, 2017 at 8:30pm — No Comments
Section 3 was added on October 11. Section 4 was added on October 19. A $2,000 award is offered for solving any of the open questions; click here for details.
This is another off-the-beaten-path problem, one that you won't find in textbooks. You can solve it using data science methods (my approach) but the mathematician with some…Continue
Data scientists use a range of tools in their work and some of these eventually require programming. This book, titled The Art and Craft of Computer Programming, is a guide to computer programming. It does not focus on a specific programming language, but instead contains the essential material from a first year Computer Science course. The book is available from Amazon.com.…Continue
The narrow-sensed OLAP
OLAP is part and parcel of a BI application. As the name suggests, the word is an acronym for online analytical processing. Users, frontline employees, to be precise, are responsible for performing various types of data processing online.
But the concept of OLAP tends to be used in a very narrow sense. It has almost become equivalent to multidimensional analysis. Based on a prebuilt data cube, the analysis performs summarization…Continue
Added by JIANG Buxing on October 9, 2017 at 4:00am — No Comments
You will find here nine interesting topics that you won't learn in college classes. Most have interesting applications in business and elsewhere. They are not especially difficult, and I explain them in simple English. Yet they are not part of the traditional statistical curriculum, and even many data scientists with a PhD degree have not heard about some of these concepts.…Continue
Added by Vincent Granville on October 2, 2017 at 1:43pm — No Comments
Companies and enterprises face a daily grind while also having to make sure that their customers are happy and satisfied, operations are efficient, and employees are satisfied; all of this makes running the business a real challenge. “Audience is the new business model”, and if any organization is struggling to communicate with customers or their audience, there is certainly a negative impact across the business plan,…Continue
Added by Chirag Shivalker on September 26, 2017 at 12:30pm — No Comments
Being in a highly technical, complex field, it is easy to sometimes lose the ‘human aspect’ of the solutions we are developing. We focus on applying edge computing concepts, or on whether a seasonality model works better for our predictive accuracy than some other approach. Don't get me wrong, these are all important activities. However, in working with many firms in developing, deploying and supporting advanced analytics solutions, particularly in the domain of the Industrial IoT space, it’s often…Continue
Added by Ed Crowley on September 26, 2017 at 3:00pm — No Comments
I recently posted an article featuring a non-traditional approach to finding large prime numbers. The research section of this article offers interesting challenges, both for data scientists interested in mathematics, and for mathematicians interested in data science and big data. My approach is data, pattern recognition, and machine learning heavy. Here is the introduction:
Large prime numbers have been a topic of considerable research, for their own mathematical beauty, as well as to…Continue
Added by Vincent Granville on September 21, 2017 at 9:00pm — No Comments
In the past year I have also worked with Deep Learning techniques, and I would like to share with you how to make and train a Convolutional Neural Network from scratch, using TensorFlow. Later on we can use this knowledge as a building block to make interesting Deep Learning applications.
The pictures here are from the full article. Source code is also provided.…Continue
Added by ahmet taspinar on September 7, 2017 at 7:30am — No Comments
A typical day in the life of an Analyst
An Analyst works on varied projects with multiple deliverables and varied duties depending on the business objectives.
However, there are some tasks that can be easily classified as “common everyday duties” in a “typical work day of a business analyst”.
Added by Ivy Pro School on September 3, 2017 at 2:00pm — No Comments
From the OLAP concept in earlier years to the agile BI of the last few years, BI vendors never stop advertising self-service capability, claiming that business users will be able to perform analytics by themselves. Since there are strong self-service needs among users, the two really hit it off and it is very likely that a quick deal is made. The question is: does a BI product’s self-service functionality enable truly flexible data analytics by business users?
There isn’t a…Continue
Posted on DSC today and yesterday
Added by Vincent Granville on August 22, 2017 at 12:39pm — No Comments
Logarithms turn a product of numbers into a sum of numbers: log(xy) = log(x) + log(y). Hyperlogarithms generalize the concept as follows: Hlog(XY) = Hlog(X) + Hlog(Y), where X and Y are any kind of objects, and the product and sum are replaced by operators in some arbitrary space. …Continue
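The defining property above can be checked numerically. The sketch below only illustrates the identity Hlog(XY) = Hlog(X) + Hlog(Y); it is not the author's construction. It tests the ordinary logarithm on numbers, and then, as a hypothetical example of "objects" with a different product, strings under concatenation mapped to their length. The helper name `satisfies_hlog` is an assumption.

```python
import math


def satisfies_hlog(hlog, product, add, x, y, tol=1e-12):
    """Check hlog(product(x, y)) == add(hlog(x), hlog(y)) up to a tolerance."""
    lhs = hlog(product(x, y))
    rhs = add(hlog(x), hlog(y))
    return abs(lhs - rhs) <= tol


# Ordinary logarithm: the product is multiplication, the sum is addition.
print(satisfies_hlog(math.log, lambda a, b: a * b, lambda a, b: a + b, 2.0, 3.0))

# Hypothetical generalization: objects are strings, the "product" is
# concatenation, and the map is the string length.
print(satisfies_hlog(len, lambda a, b: a + b, lambda a, b: a + b, "data", "science"))

# A map that does not have the property: the square root under multiplication.
print(satisfies_hlog(math.sqrt, lambda a, b: a * b, lambda a, b: a + b, 2.0, 3.0))
```

The third call fails the check, which shows that the property is a real constraint on the map and not something every function satisfies.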
ECommerce fraud is growing quickly, creating new challenges in terms of prevention and detection. As merchants gather more and more information about customers and their behaviors, the key element in the fight against fraud is now to draw on the connections within the data collected to uncover fraudulent behaviors. In this post we explain why and how graph technologies are crucial in the detection of eCommerce fraud.…
Added by Elise Devaux on August 9, 2017 at 9:30am — No Comments
This picture speaks more than words. It explains the concept of false positive and false negative, that is, what is referred to by statisticians as Type I and Type II errors.
Other great pictures summarizing data science and statistical concepts can be found…Continue
Added by Vincent Granville on August 10, 2017 at 5:17pm — No Comments
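Since the picture is not reproduced in this listing, a small sketch may help fix the terminology: a false positive (Type I error) is a positive call on a case that is actually negative, and a false negative (Type II error) is a negative call on a case that is actually positive. The labels below are made-up illustrative data and the function name is hypothetical.

```python
def error_counts(actual, predicted):
    """Count the four outcomes of a binary decision from paired boolean labels."""
    pairs = list(zip(actual, predicted))
    return {
        "true_positive": sum(1 for a, p in pairs if a and p),
        "true_negative": sum(1 for a, p in pairs if not a and not p),
        "false_positive_type_I": sum(1 for a, p in pairs if not a and p),
        "false_negative_type_II": sum(1 for a, p in pairs if a and not p),
    }


# Made-up example: True means the condition is present (actual) or detected (predicted).
actual = [True, True, False, False, True, False]
predicted = [True, False, True, False, True, False]

for name, count in error_counts(actual, predicted).items():
    print(f"{name}: {count}")
```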