A Data Science Central Community

*This article discusses some of the data challenges that the healthcare industry faces. It also revisits Statice's collaboration with the leading health organization Roche to test the use of synthetic medical data for clinical research, and the opportunities we see in it.*

Maybe more than for other industries, research and innovation…

Added by Elise Devaux on September 14, 2020 at 4:12am — No Comments

Given *n* observations $x_1, \dots, x_n$, the generalized mean (also called *power mean*) is defined as

$$M_p(x_1, \dots, x_n) = \left( \frac{1}{n} \sum_{i=1}^{n} x_i^{p} \right)^{1/p}.$$

The case *p* = 1 corresponds to the traditional arithmetic mean,…
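As a minimal sketch of the definition, the power mean with exponent *p*, that is, the *p*-th root of the average of the *p*-th powers, can be computed as follows. The function name `power_mean` and the restriction to strictly positive observations are assumptions made here for illustration; the case *p* = 0 is handled as the limiting geometric mean.

```python
import math


def power_mean(xs, p):
    """Generalized (power) mean of strictly positive numbers xs.

    p = 1 gives the arithmetic mean, p = -1 the harmonic mean,
    and p = 0 is treated as the limit p -> 0 (geometric mean).
    """
    n = len(xs)
    if n == 0:
        raise ValueError("xs must contain at least one observation")
    if any(x <= 0 for x in xs):
        # Assumption of this sketch: avoid complex or undefined powers.
        raise ValueError("this sketch only supports strictly positive values")
    if p == 0:
        # Limit of the power mean as p -> 0: exp of the mean log.
        return math.exp(sum(math.log(x) for x in xs) / n)
    return (sum(x ** p for x in xs) / n) ** (1.0 / p)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    for p in (-1, 0, 1, 2):
        print(p, power_mean(data, p))
```

For a fixed data set, the value is non-decreasing in *p*, so the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean.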

Added by Vincent Granville on August 30, 2020 at 9:30am — No Comments

**10 use-cases for privacy-preserving synthetic data**

Fast-evolving data protection laws are constantly reshaping the data landscape. The organizational ability to **overcome sensitive data usage restrictions** while safeguarding customer privacy will be a **key driver of tomorrow’s successful businesses**. This blog presents ten concrete **applications for privacy-preserving synthetic…**

Added by Elise Devaux on August 5, 2020 at 6:59am — No Comments

This blog takes a closer look at the concept of privacy-preserving synthetic data. It answers the question “what is synthetic data” and looks at the origin of synthetic data in the context of data privacy. It also presents one way of generating privacy-preserving synthetic data and its benefits for organizations.…

Added by Elise Devaux on July 2, 2020 at 11:30am — No Comments

As I learn about data privacy, I’m starting to realize how large the ecosystem is. I focused here on a category that spans across the data privacy landscape, **Privacy Enhancing Technologies (PETs)**. In the post, I cover:…

Added by Elise Devaux on June 12, 2020 at 2:14am — No Comments

Bernoulli lattice processes may be one of the simplest examples of point processes, and can be used as an introduction to learn about more complex spatial processes that rely on advanced measure theory for their definition. In this article, we show the differences and analogies between Bernoulli lattice processes on the standard rectangular or hexagonal grid, and the Poisson process, including convergence of discrete lattice processes to the continuous Poisson process, mainly in two…

Added by Vincent Granville on June 5, 2020 at 1:11pm — No Comments

**Summary:** *Explaining data science to a non-data scientist isn’t as easy as it sounds. You may know a lot about math, tools, techniques, data, and computer architecture, but the question is how to explain this briefly without getting buried in the detail. You might try this approach.*

Added by Vincent Granville on June 4, 2020 at 5:05pm — No Comments

Products of two large primes are at the core of many encryption algorithms, as factoring the product is very hard for numbers with a few hundred digits. The two prime factors are associated with the encryption keys (public and private keys). Here we describe a new approach to factoring a big number that is the product of two primes of roughly the same size. It is designed especially to handle this problem and identify flaws in encryption algorithms. …
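For context, a classical baseline for this setting is Fermat's factorization method, which is fast precisely when the two factors are close in size. It is not the new approach this article describes. The sketch below is illustrative only, and the function name `fermat_factor` is chosen here, not taken from the article.

```python
import math


def fermat_factor(n):
    """Split an integer n > 1 into two factors using Fermat's method.

    The method searches for a and b with n = a*a - b*b = (a - b)(a + b),
    starting from a = ceil(sqrt(n)). It finishes quickly when the two
    factors are close to sqrt(n), and slowly otherwise.
    """
    if n <= 1:
        raise ValueError("n must be greater than 1")
    if n % 2 == 0:
        # Fermat's method assumes an odd n; peel off the factor 2.
        return 2, n // 2
    a = math.isqrt(n)
    if a * a < n:
        a += 1
    while True:
        b_squared = a * a - n
        b = math.isqrt(b_squared)
        if b * b == b_squared:
            return a - b, a + b
        a += 1


if __name__ == "__main__":
    # Two nearby primes, so the loop stops after very few steps.
    print(fermat_factor(10007 * 10009))
```

The number of loop iterations grows with the gap between the two factors, which is one reason well-chosen encryption keys avoid primes that are too close together.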

Added by Vincent Granville on May 27, 2020 at 12:20pm — No Comments

This post discusses what actually makes data anonymous, addresses the misconceptions we have about it, and describes the problems it raises.

Added by Elise Devaux on May 23, 2020 at 1:00pm — No Comments

We discuss a simple trick to significantly accelerate the convergence of an algorithm when the error term decreases in absolute value over successive iterations, with the error term oscillating (not necessarily periodically) between positive and negative values.

We first illustrate the technique on a well-known and simple case: the computation of log 2 using its well-known, slow-converging series. We then discuss a very interesting and more complex case, before finally focusing on a…
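One standard trick of this kind, which may or may not be the one the article uses, is to average consecutive iterates: when the error alternates in sign and shrinks in absolute value, the errors of two neighbouring iterates largely cancel. The sketch below applies this to the alternating series log 2 = 1 - 1/2 + 1/3 - 1/4 + ... and is only an illustration; the function names are chosen here.

```python
import math


def partial_sums_log2(n_terms):
    """Partial sums of the alternating harmonic series, which converges to log 2."""
    sums = []
    total = 0.0
    for k in range(1, n_terms + 1):
        total += (-1) ** (k + 1) / k
        sums.append(total)
    return sums


def average_consecutive(seq):
    """Replace a sequence by the averages of its neighbouring terms."""
    return [(a + b) / 2 for a, b in zip(seq, seq[1:])]


if __name__ == "__main__":
    target = math.log(2)
    raw = partial_sums_log2(50)
    once = average_consecutive(raw)
    twice = average_consecutive(once)
    # Compare the absolute error of the last term at each stage.
    for name, seq in (("raw", raw), ("averaged once", once), ("averaged twice", twice)):
        print(name, abs(seq[-1] - target))
```

Because the averaged sequence still has an oscillating, shrinking error, the same averaging step can be applied again, and each pass improves the accuracy obtained from the same 50 terms.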

Added by Vincent Granville on May 5, 2020 at 5:37pm — No Comments

One of the main challenges in data science projects is managing stakeholder expectations. Often those in the business will have little idea of the complexity and timescales of seemingly simple tasks.

**Sourcing Data**

Consider sourcing data. In some organisations, with a non-collaborative culture, something as simple as getting a file of data from IT can take weeks. Add on time to check the data, spend time with someone to explain it, handle revisions and…

Added by Andrew Watson on May 1, 2020 at 7:00am — No Comments

The methodology described here has broad applications, leading to new statistical tests, new types of ANOVA (analysis of variance), improved design of experiments, interesting fractional factorial designs, a better understanding of irrational numbers leading to cryptography, gaming and Fintech applications, and high-quality random number generators (and when you really need them). It also features exact arithmetic / high performance computing and distributed algorithms to compute millions of…

Added by Vincent Granville on February 29, 2020 at 11:00pm — No Comments

**Summary:** *The Gartner Magic Quadrant for Data Science and Machine Learning Platforms is just out, and the big news is how much more capable all the platforms have become. Of course, there are also some interesting winner and loser stories.*

The Gartner Magic Quadrant for Data Science and Machine Learning Platforms is just out for 2020. The really big news is how many excellent choices are now available. In a remarkable move, the whole field…

Added by Vincent Granville on February 21, 2020 at 9:25am — No Comments

In this notebook, we try to predict the positive (label 1) or negative (label 0) sentiment of a sentence. We use the UCI Sentiment Labelled Sentences Data Set.

Sentiment analysis is very useful in many areas. For example, it can be used to moderate internet conversations. It can also be used to predict the ratings that users would assign to a certain product (food, household appliances, hotels,…

Added by Vincent Granville on February 19, 2020 at 8:42pm — No Comments

Probably the worst error is thinking there is a correlation when that correlation is purely artificial. Take a data set with 100,000 variables, say with 10 observations. Compute all the (99,999 * 100,000) / 2 cross-correlations. You are almost guaranteed to find one above 0.999. This is best illustrated in my article How to Lie with P-values (also discussing…
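The effect is easy to reproduce at a much smaller scale. The sketch below is an illustration under assumed settings (300 independent Gaussian variables, 10 observations each, a fixed seed), not the experiment from the article: it computes every pairwise correlation and reports the largest one in absolute value, even though no variable depends on any other.

```python
import math
import random


def standardize(values):
    """Center a list of numbers and scale it to unit Euclidean length."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def max_abs_correlation(n_vars, n_obs, seed=0):
    """Largest absolute Pearson correlation among independent random variables."""
    rng = random.Random(seed)
    data = [
        standardize([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
        for _ in range(n_vars)
    ]
    best = 0.0
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            # For standardized vectors, the dot product is the correlation.
            r = sum(a * b for a, b in zip(data[i], data[j]))
            best = max(best, abs(r))
    return best


if __name__ == "__main__":
    # 300 variables give 300 * 299 / 2 = 44,850 pairs to compare.
    print(max_abs_correlation(n_vars=300, n_obs=10))
```

With so few observations per variable, the maximum over tens of thousands of pairs is typically very high, and it climbs further as the number of variables grows.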

Added by Vincent Granville on February 7, 2020 at 9:48am — No Comments

Fermat's last conjecture puzzled mathematicians for 300 years, and was proved only recently. In this note, I propose a generalization that could actually lead to a much simpler proof and a more powerful result with broader applications, including solving numerous similar equations. As usual, my research involves a significant amount of computations and experimental math, as an exploratory step before stating new conjectures, and eventually trying to prove them. The…

Added by Vincent Granville on January 30, 2020 at 1:09am — No Comments

Hundreds of programming languages dominate the data science and statistics market: Python, R, SAS and SQL are standouts. If you're looking to branch out and add a new programming language to your skill set, which one should you learn? This one picture breaks down the differences between the four languages.…

Added by Vincent Granville on January 28, 2020 at 8:41pm — No Comments

While many programming libraries encapsulate the inner workings of graph and other algorithms, as a data scientist it helps a lot to have reasonably good familiarity with such details. A solid understanding of the intuition behind such algorithms not only helps in appreciating the logic behind them but also helps in making conscious decisions about their applicability in real-life cases. There are several graph-based algorithms, and the most notable are the shortest path…

Added by Vincent Granville on January 21, 2020 at 10:12am — No Comments

In 2019, Google announced TensorFlow 2.0, a major leap from the existing TensorFlow 1.0. The key differences are as follows:

**Ease of use:** Many old libraries (for example, `tf.contrib`) were removed, and some were consolidated. For example, in TensorFlow 1.x a model could be built using Contrib, layers, Keras, or Estimators; having so many options for the same task confused many new users. TensorFlow 2.0 promotes TensorFlow Keras for model experimentation and Estimators…

Added by Vincent Granville on January 9, 2020 at 9:49am — No Comments

**Summary:** *AI/ML itself is the next big thing for many fields if you’re on the outside looking in. But if you’re a data scientist it’s possible to see those advancements that will propel AI/ML to its next phase of utility.*

“The Next Big Thing in AI/ML is…” as the lead to an article is probably the most…

Added by Vincent Granville on January 7, 2020 at 7:41am — No Comments


