Getting Data From the Web: Code, Tools, & Data Sets

“Gartner believes that enterprise data will grow 650 percent in the next five years, while IDC argues that the world’s information now doubles about every year and a half. IDC says that in 2011 we created 1.8 zettabytes (or 1.8 trillion GBs) of information, which is enough data to fill 57.5 billion 32GB Apple iPads, enough iPads to build a Great iPad Wall of China twice as tall as the original.” (Source)

Data is everywhere.

For the first time in history, a large portion of the world’s data is in one place: the World Wide Web. Never before have we been as connected to each other, to our possessions, and to technology as we are today.

Wondering how big “Big Data” really is? Consider the following summary statements from The GovLab Index:

  • How much data exists in the digital universe as of 2012: 2.7 zettabytes (1 zettabyte = 1 billion terabytes)
  • Increase in the quantity of Internet data from 2005 to 2012: +1,696%
  • Percent of the world’s data created in the last two years: 90%
  • Number of exabytes (=1 billion gigabytes) created every day in 2012: 2.5; that number doubles every month
  • Share of information in the digital universe created and consumed by consumers (video, social media, photos, etc.) in 2012: 68%
  • The world’s annual effective capacity to exchange information through telecommunication networks in 1986, 2007, and (predicted) 2013: 281 petabytes, 65 exabytes, 667 exabytes
  • Increase in data collection volume year-over-year in 2012: 400%
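The figures above switch between gigabytes, terabytes, exabytes, and zettabytes. The sketch below makes those conversions explicit. It assumes the figures use decimal (SI) prefixes, where each step is a factor of 1,000, rather than binary prefixes, where each step is 1,024. The `UNITS` table and `convert` helper are illustrative names only.

```python
# Decimal (SI) byte units: each prefix step is a factor of 1,000.
# Assumption: the quoted figures use SI prefixes, not binary (1,024-based) ones.
UNITS = {
    "B": 1,
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}


def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert a quantity of data from one decimal byte unit to another."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


# 1 exabyte expressed in gigabytes
print(convert(1, "EB", "GB"))  # 1000000000.0
# 2.7 zettabytes (the 2012 digital universe) expressed in terabytes
print(convert(2.7, "ZB", "TB"))
# Number of 32 GB iPads needed to hold 1.8 zettabytes
print(convert(1.8, "ZB", "GB") / 32)
```

Under these assumptions, 1.8 zettabytes works out to about 56 billion 32 GB iPads. That is in the same range as the 57.5 billion figure in the quote at the top of this article.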

From these numbers, two things are clear: (1) data is not going anywhere; and (2) the internet is essentially a giant, living data set that grows with new uploads every second of every day.

The huge amount of data uploaded to and shared on the web creates a massive opportunity for businesses looking to learn more about their competitors, products, processes, markets, and customers.

There are three ways to extract data from the web:

  • Code
  • Tools
  • Data sets

To read about the pros and cons of each approach, and the benefits of Data as a Service (DaaS), click here.

