June 2013 Blog Posts (18)

Data science defeats intuition: twin data points is the norm, not the exception

This is an example where data science and statistical analysis are superior to intuition. Here, intuition misleads you into the wrong conclusions.

By twin data points, I mean observations that are almost identical. In any 2- or 3-dimensional data set with 300+ rows, if the data is quantitative and evenly distributed in a…


Added by Vincent Granville on June 26, 2013 at 7:30pm — 4 Comments
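
The excerpt above is cut off, but its claim about near-identical observations can be checked with a small simulation. The sketch below is only an illustration, assuming "evenly distributed" means uniform on the unit square; the seed and the helper name `min_pairwise_distance` are arbitrary choices, not taken from the post.

```python
import math
import random


def min_pairwise_distance(points):
    """Return the smallest Euclidean distance between any two points."""
    best = math.inf
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            best = min(best, math.dist(points[i], points[j]))
    return best


rng = random.Random(0)  # fixed seed so the run is repeatable (arbitrary choice)
n = 300                 # row count mentioned in the excerpt
# Assumption: "evenly distributed" is modeled as uniform on the unit square.
points = [(rng.random(), rng.random()) for _ in range(n)]

closest = min_pairwise_distance(points)
# Spacing the points would have if they were laid out on a perfect grid.
grid_spacing = 1 / math.sqrt(n)

print(f"closest pair distance: {closest:.4f}")
print(f"perfect-grid spacing:  {grid_spacing:.4f}")
```

With 300 points there are about 45,000 pairs, so the closest pair tends to sit far nearer than the grid spacing would suggest, which is the intuition-defeating part.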

SQL CREATE TABLE in Pandas (Python) -- should be straightforward, but...

My new blog post on how to do the equivalent of SQL's "CREATE TABLE" in the Pandas Python Data Analysis Library. Sounds simple, but I wasn't able to find such an example anywhere on the web.

Added by Michael Malak on June 26, 2013 at 1:29pm — No Comments
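
The linked post is not reproduced here, so the following is not the author's method. It is a minimal sketch of one common way to get a CREATE TABLE-like result in pandas: an empty DataFrame whose columns already carry declared types. The table name `products` and its columns are hypothetical.

```python
import pandas as pd

# Hypothetical schema, roughly analogous to:
#   CREATE TABLE products (id INTEGER, name TEXT, price REAL);
schema = {"id": "int64", "name": "object", "price": "float64"}

# Build an empty DataFrame; each column is an empty Series with its declared dtype.
products = pd.DataFrame(
    {column: pd.Series(dtype=dtype) for column, dtype in schema.items()}
)

print(products.dtypes)
print(len(products))
```

Declaring the dtypes up front matters because an empty DataFrame built only from column names defaults every column to `object`.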

Intellipaat Hadoop Online Training

Hi all,

Intellipaat will start a new Hadoop Developer batch on 29 June 2013. Interested candidates can email sales(@)intellipaat(dot)com.


Sales Team, Intellipaat


Added by raja singh on June 24, 2013 at 4:30am — 1 Comment

Which are the data products one can curate for the AIRLINE industry?

Here is a crisp infographic presenting the top 5 data products that can make a dollar dent in the airline industry:

1. Property recommender

2. Word of mouth modeler

3. Funnel friction spotter

4. Traveller churn scorer

5. Sentiment Analyzer

Added by derick on June 21, 2013 at 12:43pm — 1 Comment

Unleashing Intelligence through natural language (Part 3 - Autonomously generated assumptions - with self-adjusting level of uncertainty)

In this series I reveal and explain rules of intelligence contained within grammar, that can be utilized to unleash intelligence in software. These rules are extremely simple, but still undiscovered by scientists.

To be able to explain making assumptions, we need to understand (the difference with) drawing conclusions first:

• Conclusions are drawn straight ahead - top-down - like in: Given "John is a father." and "…


Added by Menno Mafait on June 19, 2013 at 10:39pm — No Comments

Business Analytics Model Risk Working Group

This is a review committee working group for practitioners and academics to establish a formal definition and set of classification criteria regarding business analytics model risk.

This group has been established based upon interest and feedback concerning a recent set of posts regarding business analytics model risk: …


Added by Scott Mongeau on June 19, 2013 at 6:53am — No Comments

Adopting Analytics Culture: What can be learned from social network analysis (SNA)?




Added by Scott Mongeau on June 17, 2013 at 10:00am — No Comments

Seven Questions on Adopting Analytics Culture

Seven questions are posed and addressed in series. The theme: ‘how can organizations adopt an analytics-based decision-making culture?’

In particular, the questions address the use of change management to adopt evidence-based decision making, associated organizational challenges, and how analytics can be used to manage organizational change itself:…


Added by Scott Mongeau on June 13, 2013 at 3:59pm — 2 Comments

Business analytics model risk: framing model risk

Business analytics model risk (part 0 of 5): framing model risk - the complexity genie and the challenge of deciding on decision models

Introduction to a series of five articles on model risk

Here we introduce a series of five articles seeking to frame, define, and categorize business analytics model risk.  The intention is to propose processes and practices for strengthening organizational decision model risk mitigation. The series of five…


Added by Scott Mongeau on June 13, 2013 at 3:54pm — No Comments

Query Hive from iPython Notebook

My new blog post on querying Hive from iPython Notebook with pandas, the Python alternative to R:

Added by Michael Malak on June 13, 2013 at 9:44am — No Comments

Interesting Computational Complexity Question

In my recent article on a new, robust coefficient of correlation and R Squared, I mentioned an algorithm to generate random permutations:



Added by Vincent Granville on June 10, 2013 at 9:30pm — No Comments
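
The algorithm the excerpt refers to is not included on this page. For context only, the textbook way to generate a uniformly random permutation is the Fisher-Yates shuffle, sketched below; it is a stand-in, not necessarily the algorithm from the article, and the function name `random_permutation` is made up for this example.

```python
import random


def random_permutation(n, rng=random):
    """Return a uniformly random permutation of 0..n-1 (Fisher-Yates shuffle)."""
    perm = list(range(n))
    # Walk from the last position down, swapping each slot with a
    # randomly chosen slot at or before it.
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        perm[i], perm[j] = perm[j], perm[i]
    return perm


perm = random_permutation(10, random.Random(42))
print(perm)
```

This version uses n - 1 swaps and one random draw per swap, so it runs in linear time, a useful baseline when comparing the complexity of other permutation generators.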

DropQuery - A project from AngelHack Sydney, love to hear some feedback.

I attended AngelHack Sydney recently during the month of May.

AngelHack is a hackathon where developers and entrepreneurs come together to prototype a viable business idea within 24 hours.

The project that I worked on was called "DropQuery". The basic concept is this.

* You have some data files - CSV, XLS, XML

* You want to quickly query them.

I talked to a few people at the…


Added by Eric Bae on June 6, 2013 at 6:28pm — No Comments

from multiple hourly wind energy forecasts to firm hourly DA forecasts.. using optimization analytics


The next day's wind speed is forecasted using different weather parameters. The forecasted wind speed is converted into electricity supply forecasts using wind turbine power curves. All forecasts have errors. Hence most wind energy generators supply their electricity directly into the real-time markets.

This causes them to leave a lot of upside on the table.

This is because electricity cannot be stored. All that is produced is either consumed or grounded. Shortfall…


Added by Parag Patil on June 6, 2013 at 5:52pm — No Comments
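
To make the power-curve step in the excerpt above concrete, here is a rough sketch of turning hourly wind speed forecasts into supply forecasts. It assumes a simplified curve (zero below cut-in, a cubic ramp up to rated speed, flat at rated power, zero at or above cut-out); every number and name in it is hypothetical, and real turbines use manufacturer-supplied curves.

```python
def turbine_power_kw(wind_speed, cut_in=3.0, rated_speed=12.0,
                     cut_out=25.0, rated_power_kw=2000.0):
    """Simplified power curve; all default parameters are hypothetical."""
    # No output below cut-in speed or once the turbine shuts down at cut-out.
    if wind_speed < cut_in or wind_speed >= cut_out:
        return 0.0
    # Flat at rated power between rated speed and cut-out.
    if wind_speed >= rated_speed:
        return rated_power_kw
    # Cubic ramp between cut-in and rated speed.
    fraction = (wind_speed**3 - cut_in**3) / (rated_speed**3 - cut_in**3)
    return rated_power_kw * fraction


# Hypothetical hourly wind speed forecasts in m/s.
hourly_wind_forecast = [2.0, 5.5, 9.0, 12.0, 14.0, 26.0]
hourly_supply_kw = [turbine_power_kw(v) for v in hourly_wind_forecast]

for speed, power in zip(hourly_wind_forecast, hourly_supply_kw):
    print(f"{speed:5.1f} m/s -> {power:7.1f} kW")
```

Because the ramp is cubic, a small error in the wind speed forecast becomes a much larger error in the supply forecast, which is one reason the forecast errors mentioned above matter so much.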

Big Data & Marketing Attribution

Marketers are seeing the number of communication channels available to them increasing, while their budgets remain the same. The answer is not necessarily to do more marketing but to integrate the use of these channels effectively in order to find the right set of actions which bring the highest conversion rates, maximum profits and the most satisfied customers. For that, brands can rely on the analysis of consumer behavior, highlighting the customer lifecycle before purchasing and…


Added by Michel Bruley on June 5, 2013 at 1:22am — No Comments

Correlation and R-Squared for Big Data

With big data, one sometimes has to compute correlations involving thousands of buckets of paired observations or time series. For instance a data bucket corresponds to a node in a decision tree, a customer segment, or a subset of observations having the same multivariate feature. Specific contexts of interest include multivariate feature selection (a combinatorial problem) or identification of best predictive set…


Added by Vincent Granville on June 5, 2013 at 12:00am — 3 Comments
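
As a small illustration of the per-bucket setup described in the excerpt above, the sketch below computes one correlation per bucket of paired observations. It uses the ordinary Pearson coefficient rather than the article's own measure, and the bucket names and values are made up.

```python
import math


def pearson(xs, ys):
    """Ordinary Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical buckets, e.g. customer segments, each holding paired observations.
buckets = {
    "segment_a": ([1, 2, 3, 4], [2, 4, 6, 8]),
    "segment_b": ([1, 2, 3, 4], [8, 6, 4, 2]),
}

# One correlation per bucket; with thousands of buckets this loop is the hot path.
correlations = {name: pearson(xs, ys) for name, (xs, ys) in buckets.items()}

for name, r in correlations.items():
    print(f"{name}: r = {r:+.3f}")
```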

7 Habits of Highly Successful Big Data Pioneers

The Big Data age is dawning. Just like every major emerging opportunity that presents an unprecedented competitive advantage across sectors, we might not know the ultimate outcome with this journey at this stage. But everyone wants to know: who will race to the finish line and come out on top? Whether you're the turtle or the hare, it's important to observe and manage the signs presented to you on the road to victory. Here's a collection of differentiators that will lead to…


Added by Radhika Subramanian on June 4, 2013 at 11:02am — No Comments

3 Tips for Escaping Upper Hell (aka Slow Connections with Large Data)

Ever find yourself waiting for your data to appear and you start wondering if you are paying for sins from a past life?  Dante must have been thinking of this situation as he created his Circles of Hell.  There is no way the agony of waiting for data to appear with a looming deadline did not make his list!

Often it seems like the closer the deadline, the slower the connection.  When working with data locally, the data appears…


Added by Tricia Aanderud on June 4, 2013 at 5:23am — No Comments
