Is anybody interested in writing a web crawler to:
- Use the top 50 Twitter profiles for top 10 keywords (analytics, business intelligence, data mining etc.) as seed URLs, using TweepSearch
- Browse connected profiles and automatically select the most relevant ones: create a list of 5000 profiles, plus a sub-list of 500 profiles that focus on analytics only, to add to our real-time analytic news
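The two steps above could be sketched roughly as follows. This is only a minimal illustration, not a working Twitter crawler: `fetch_profile` is a hypothetical callable standing in for whatever data source is used (TweepSearch, an API, scraped pages), the keyword-count `relevance` score is an assumed stand-in for a real relevance measure, and the toy graph exists only to make the example runnable.

```python
from collections import deque

# Sample of the "top 10 keywords"; the full list is an assumption.
KEYWORDS = {"analytics", "business intelligence", "data mining"}


def relevance(bio, keywords=KEYWORDS):
    """Count how many keywords appear in a profile bio (case-insensitive)."""
    text = bio.lower()
    return sum(1 for kw in keywords if kw in text)


def crawl(seeds, fetch_profile, max_profiles=5000, top_n=500):
    """Breadth-first walk over connected profiles, starting from the seeds.

    fetch_profile(handle) is hypothetical: it must return a pair
    (bio_text, list_of_connected_handles).
    Returns (all relevant profiles ranked by score, the top_n sub-list).
    """
    queue = deque(dict.fromkeys(seeds))  # de-duplicate seeds, keep order
    seen = set(queue)
    scored = {}
    while queue and len(scored) < max_profiles:
        handle = queue.popleft()
        bio, connections = fetch_profile(handle)
        score = relevance(bio)
        if score == 0:
            continue  # off-topic profile: neither kept nor expanded
        scored[handle] = score
        for other in connections:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scored, key=lambda h: (-scored[h], h))
    return ranked, ranked[:top_n]


# Toy graph in place of real Twitter data.
TOY = {
    "a": ("Analytics and data mining news", ["b", "c"]),
    "b": ("Cooking recipes", ["d"]),
    "c": ("Business intelligence consultant", ["a", "d"]),
    "d": ("Data mining researcher", []),
}

all_profiles, analytics_only = crawl(
    ["a"], lambda h: TOY[h], max_profiles=10, top_n=2
)
print(all_profiles)    # ['a', 'c', 'd']
print(analytics_only)  # ['a', 'c']
```

Not expanding off-topic profiles keeps the crawl from drifting away from analytics, at the cost of missing relevant profiles that are only reachable through irrelevant ones.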
For now, our AnalyticBridge real-time analytic news (see http://twitter.com/#/list/analyticbridge/analytics and the scrollable RT feed box at http://www.analyticbridge.com/group/miningterabytesofdata ) is made of
- Feeds from 60 Twitter blogs
- Feeds from my own Twitter blog
Some of these blogs (at least mine, but I believe another one as well) have two feed sources
- content produced internally
- content produced externally; in my case, featured AB posts, featured AB jobs, featured AB conferences, and Google news feeds for a number of pre-selected keywords (all these sources, external and internal, from everyone, are blended together into one single feed, which is then displayed on our Data Mining group; some other AB groups have customized, partial feeds only)
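As an illustration of the blending described above, here is a small sketch that merges several feeds into a single newest-first stream and derives a partial feed from it. The `Item` fields and the sample entries are assumptions made for the example; the actual feed was assembled with existing tools, not with this code.

```python
import heapq
from typing import NamedTuple


class Item(NamedTuple):
    timestamp: int  # e.g. seconds since epoch (assumed field)
    source: str     # which feed the item came from (assumed field)
    title: str


def blend(*feeds):
    """Merge feeds that are each sorted newest-first into one newest-first list."""
    return list(heapq.merge(*feeds, key=lambda item: item.timestamp, reverse=True))


# Made-up sample entries, newest first within each feed.
internal = [Item(30, "internal", "My post"), Item(10, "internal", "Older post")]
external = [Item(20, "google-news", "News item")]

blended = blend(internal, external)

# A customized, partial feed: keep only items from selected sources.
partial = [item for item in blended if item.source == "internal"]
```

Because `heapq.merge` is lazy and only assumes each input is already sorted, the same approach should scale to dozens of feeds without sorting everything from scratch.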
While this sounds like a great level of automation, achieved without sacrificing quality, there is one time-consuming manual step: identifying the Twitter profiles that I would like to add to my Twitter list ("follow", in Twitter terminology) so that their tweets end up in my large Real Time Analytic News feed.
Interestingly, everything so far was done without writing a single line of code.