
The next revolution in analytics: it's not about software, it's about data

It is about integrating external data sources into your data warehouse, and leveraging this data to answer questions such as "why did we lose so many users last month?", "why do we have so few new users recently?", or "what new product / feature should I build?". The answer (and the cure) might not come from within your internal data, but from the outside:

- what are my competitors up to?
- what do my clients / employees write on Facebook, Twitter or elsewhere?

The external data in question can be gathered and analyzed using web crawling and text mining techniques, or surveys - to automatically find out and summarize what is being said about your company... and about your competitors. Combined with internal data, it could answer critical business questions.
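As a rough illustration of the text-mining step, here is a minimal Python sketch that summarizes sentiment in posts mentioning a company. The company name, posts, and word lists are all invented for illustration; a real system would crawl actual sources and use a curated lexicon or trained classifier rather than toy word sets:

```python
import re

# Toy sentiment lexicons -- a production system would use a curated
# lexicon or a trained sentiment model instead.
POSITIVE = {"great", "love", "fast", "reliable", "helpful"}
NEGATIVE = {"slow", "hate", "broken", "expensive", "outage"}

def summarize_mentions(texts, company):
    """Count sentiment words in texts that mention a company name."""
    pos = neg = mentions = 0
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        if company.lower() not in words:
            continue  # skip texts that do not mention the company
        mentions += 1
        pos += sum(w in POSITIVE for w in words)
        neg += sum(w in NEGATIVE for w in words)
    return {"mentions": mentions, "positive": pos, "negative": neg}

# Hypothetical crawled posts
posts = [
    "AcmeCorp support was great and fast",
    "AcmeCorp billing is broken again",
    "Unrelated post about the weather",
]
summary = summarize_mentions(posts, "AcmeCorp")
```

Running the same summary for competitor names over the same crawl is what turns this from monitoring into competitive intelligence.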

In my opinion, the potential in properly exploiting external data far exceeds the results you could get from improved software (cloud, analytics as a service, hidden decision trees, etc.). I believe the future of analytics is more about finding relevant data (and identifying the right metrics - this is absolutely critical, and I will discuss it later) than about software improvements.




Comment by John Gins on September 16, 2011 at 9:59am

When I worked at Hyperparallel, before Yahoo bought it, one of my fellow researchers worked on a project studying real-time traffic patterns through a store. He was able to use GIS-related software to analyze time of day against travel patterns against trip mission (based on what was bought).


In the northeast, Stop and Shop was using Smart Baskets and the buyers' loyalty card information to inform shoppers of deals targeted toward them. There was a lot of propensity-modeling statistics going on in the background to allow that to happen.

Comment by Deepak Babu P R on September 15, 2011 at 3:06am

I see a lot of comments noting that use of GIS information is increasingly gaining acceptance, and I have a point in support of the same. With foursquare (location-based services), we know when customers are entering the store, so there are opportunities to integrate this with internal customer behavior data to identify customer value and do marketing in REAL TIME with offers relevant to each shopper's needs. I presented this idea in the form of a deck at an analytics expo, where my team won first place. I felt it was a cool idea.




Comment by Themos Kalafatis on August 25, 2011 at 12:11am

Competitive intelligence works surprisingly well. For example, by collecting tweets discussing all Telcos, we are then able to understand:


- which Telco is associated with signal problems

- which Telco products have the highest positive / negative sentiment

- which phrases suggest that someone will churn, why, and which Telco they refer to



and all this in near real time.
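A minimal sketch of what such tweet tagging could look like. The telco names, churn phrases, and complaint keywords below are all made up for illustration; a real pipeline would use the Twitter API, a trained sentiment model, and a much richer phrase list:

```python
from collections import defaultdict

TELCOS = ["TelcoA", "TelcoB"]  # hypothetical operator names
CHURN_PHRASES = ["switching to", "cancel my contract", "leaving"]  # toy list

def tag_tweets(tweets):
    """Group tweets by telco, counting signal complaints and churn signals."""
    report = defaultdict(lambda: {"signal_complaints": 0, "churn": []})
    for tweet in tweets:
        low = tweet.lower()
        for telco in TELCOS:
            if telco.lower() not in low:
                continue  # tweet does not mention this telco
            if "no signal" in low or "dropped call" in low:
                report[telco]["signal_complaints"] += 1
            if any(p in low for p in CHURN_PHRASES):
                report[telco]["churn"].append(tweet)
    return dict(report)

tweets = [
    "No signal with TelcoA all morning",
    "Thinking of switching to TelcoB, TelcoA is too expensive",
]
result = tag_tweets(tweets)
```

Note that a tweet mentioning two telcos is attributed to both; deciding which operator a churn phrase actually refers to is the harder mining problem Themos alludes to.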

Comment by Rana Singh on February 14, 2011 at 5:17pm

Vincent, I couldn't agree more. There is so much valuable data sitting outside of companies that needs to be analyzed alongside the internal data to get even better insights.



Comment by John Gins on November 24, 2010 at 12:41pm
There are several interesting problems with GIS data. Data for a point source, data for a line source, and data for an area/volume source require different treatments to associate one with another. Ken Reed and Joe Berry used an overlay system to grid a map (e.g., into 100m by 100m cells), then treated the grid key as a join key to allow point data, line data, and area data to be joined (the area and line data would be scaled).
An example might be a new natural gas pipeline being laid in a neighborhood. A point source might be a residence, a line source the pipeline, and an area could be the census block. Join the data to estimate the propensity of residences willing to allow a gas hookup to their residence.
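The grid-key join described above can be sketched in a few lines of Python. The coordinates, cell size, and example geometries are hypothetical; the idea is just that points and line vertices falling in the same cell share a join key:

```python
def grid_key(x, y, cell=100.0):
    """Map projected coordinates (meters) to a 100m x 100m grid cell id."""
    return (int(x // cell), int(y // cell))

# Point sources: residences (hypothetical projected coordinates).
residences = {"house_1": (1234.0, 5678.0), "house_2": (150.0, 40.0)}

# Line source: the pipeline, represented by the cells its vertices fall in.
# Area data (e.g., census blocks) would be rasterized to the same cells.
pipeline_cells = {grid_key(x, y) for x, y in [(1200.0, 5650.0), (1300.0, 5700.0)]}

# Join on the grid key: which residences share a cell with the pipeline?
near_pipeline = [name for name, (x, y) in residences.items()
                 if grid_key(x, y) in pipeline_cells]
```

A real implementation would densify the line (so every cell it crosses gets a key, not just its vertices) and scale area values by the fraction of each cell they cover, as John notes.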
Comment by John Gins on November 24, 2010 at 8:33am
I realized about 20 years ago that geographically based (GIS) data, and the data types it brings (point, line/curve, area, volume), can actually dwarf the amount of data we use today. My current company just barely uses GIS-related information (distance, census). The retail industry does look at product placement in geometrical terms on a shelf and has started paying attention to placement in the store. (How often these days will you find the magazine rack near the pharmacy department in a large drug store?)
With GPS information becoming more commonly used, what tools should statisticians and data scientists be using to integrate this information with GIS data and more typical data sources?
Comment by Richard Boire on November 24, 2010 at 5:10am
Vincent, you are absolutely right. We have always stressed the data with the organizations we work with. But it's not just external data that we focus on. We also focus on the company's own internal data and arrive at unique approaches to manipulating this information into new variables. Software is secondary and data is primary.
Comment by Ralph Winters on November 22, 2010 at 11:30am
Vincent: Yes, I think you can get good results from this, provided you have enough agreement between the identifiers and have assigned them optimal prior weights. If you are dealing with identifiers like SSN, name/address, gender, or phone number, you are on solid ground. If you have a lot of missing values, or you are dealing with a lot of unstructured data, that will further complicate matters.
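One common way to implement the weighted-agreement idea Ralph describes is a Fellegi-Sunter-style score: each identifier carries a prior weight reflecting its discriminating power, agreement adds the weight, disagreement subtracts it, and missing values contribute nothing. The weights and records below are invented for illustration:

```python
# Assumed prior weights: higher = more discriminating identifier.
WEIGHTS = {"ssn": 5.0, "name": 2.0, "phone": 3.0, "gender": 0.5}

def match_score(rec_a, rec_b):
    """Sum weights of agreeing identifiers; subtract for disagreements;
    skip fields that are missing in either record."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # missing value: no evidence either way
        score += weight if a == b else -weight
    return score

a = {"ssn": "123-45-6789", "name": "j smith", "phone": None, "gender": "m"}
b = {"ssn": "123-45-6789", "name": "john smith", "phone": "555-0100", "gender": "m"}
score = match_score(a, b)  # ssn and gender agree, name disagrees, phone missing
```

Records scoring above a chosen threshold are treated as the same entity; in practice the name field would be standardized (as tools like QualityStage do) before comparison, so "j smith" and "john smith" could still agree.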

-Ralph Winters
Comment by John Gins on November 22, 2010 at 10:36am
I believe it is more of a balance than you imply. If you do not have reasonable internal systems, software, hardware, and analytical best practices, then simply seeking outside data will just overwhelm the organization. My company has been integrating outside data sources for over 30 years and providing that data to our customers. It takes a great deal of effort and fast, efficient automation to make sure the information is timely and accurate.

I would be cautious about jumping on a data source until it can be vetted. I had an experience several years ago where I assumed that a usually reliable source was keeping current with all client demographic data. I was surprised to discover that their household vehicle data was several years out of date. (This was caused by privacy laws.) However, the data provider never mentioned it in their documentation.

Vincent, there are tools like Quality Stage from IBM that help with joining. It requires setting up standards, patterns and rules, then allowing a knowledge worker to evaluate exceptions. Similar technology has been used for address matching.
Comment by Jason Price on November 22, 2010 at 9:06am
I would tend to agree with your comment. Much of our work involves integrating content from third-party data vendors in order to provide valuable insight from the purchased data. On the flip side, we also use custom software and purchased software to help distribute the insight, whether it be in a web format, a visualization, or some compiled report. Working with external data also gives you more domain expertise, since you are constantly learning about the data, how it can be used as an asset, and where it can be applied for better decision making. Great post.
