A Data Science Central Community
What exactly does big data mean to an asset manager? It means more than sucking up social media data or spying on satellite data. These are good data collection methods, but realistically, when you have a limited budget and want to start something from ground zero, what should you do first? And once you start, what elements do you have to consider? I have some thoughts on this, and I hope we can share more ideas here.
Consider the supply chain: If you google "Big Data" and "hedge fund", you will get lots of cutting-edge ideas. What you do not see is how to manage the data after you have acquired it. One key consideration is how to link the insights you crawled from different industries into one universe. Not all of the stocks in the S&P 500 are consumer related; more than half are not. This means that all the hype about crawling "social media" and news is too distant from the needs of a typical institutional investor. Let's say we are covering a steel producer. The first thing we need to find out is the supply chain data of this steel company. If company-level information is not available, you can try to find industry-level information from an input-output table. On average, such a table describes how every dollar the industry pays out is split among its suppliers and employee payroll, and likewise, for every dollar earned, where it comes from: whether the output is exported, sold to another industry such as car manufacturing, or consumed (in the extreme case, as raw material for art objects).
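To make the input-output idea concrete, here is a minimal Python sketch. It assumes the table has already been reduced to dollar flows keyed by (seller, buyer); the industry names and figures in `FLOWS` are hypothetical, chosen only so the shares are easy to check, and "payroll", "exports" and "art" are stand-ins for employee compensation and final-demand buyers. A real analysis would load a published national input-output table instead.

```python
# Hypothetical annual flows in $ millions (illustrative only, not real data).
# Key: (seller, buyer). "payroll" stands in for employee compensation;
# "exports" and "art" stand in for final-demand buyers.
FLOWS = {
    ("iron_ore", "steel"): 40.0,
    ("energy", "steel"): 25.0,
    ("payroll", "steel"): 35.0,
    ("steel", "autos"): 30.0,
    ("steel", "construction"): 45.0,
    ("steel", "exports"): 20.0,
    ("steel", "art"): 5.0,
}


def input_shares(flows, buyer):
    """For every dollar `buyer` pays out, the fraction going to each seller."""
    paid = {s: v for (s, b), v in flows.items() if b == buyer}
    total = sum(paid.values())
    return {s: v / total for s, v in paid.items()}


def output_shares(flows, seller):
    """For every dollar `seller` earns, the fraction coming from each buyer."""
    earned = {b: v for (s, b), v in flows.items() if s == seller}
    total = sum(earned.values())
    return {b: v / total for b, v in earned.items()}


if __name__ == "__main__":
    print(input_shares(FLOWS, "steel"))   # where steel's spending goes
    print(output_shares(FLOWS, "steel"))  # where steel's revenue comes from
```

Normalizing the same flow table by buyer and by seller gives both views the paragraph describes, so one data structure serves both questions.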
Existing insight: As a buy-side firm, have you considered managing the existing insight produced by your fundamental analysts? They do a lot of research on the ground; can those company-level parameters be generalized? Their assumptions on industry growth and industry dynamics can also be built into a larger economy model, which includes models of the different industries and how they are linked. Indeed, fundamental analysts are producers of Big Data if you collect their output properly.
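One way to start generalizing analysts' company-level parameters is to store them in a structured form and aggregate them by industry. The sketch below assumes a hypothetical record schema (`AnalystAssumption`) and uses a revenue-weighted average of growth assumptions as the industry-level parameter; the tickers and numbers are made up for illustration, and other aggregation rules could be just as reasonable.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AnalystAssumption:
    """One analyst's structured view on one company (hypothetical schema)."""
    ticker: str
    industry: str
    revenue: float  # latest annual revenue, used as the weight
    growth: float   # expected revenue growth, e.g. 0.05 means 5%


def industry_growth(assumptions):
    """Revenue-weighted average of analysts' growth assumptions per industry."""
    weighted = defaultdict(float)
    totals = defaultdict(float)
    for a in assumptions:
        weighted[a.industry] += a.revenue * a.growth
        totals[a.industry] += a.revenue
    return {ind: weighted[ind] / totals[ind] for ind in totals}


# Hypothetical analyst notes; tickers and figures are illustrative only.
NOTES = [
    AnalystAssumption("STL_A", "steel", 300.0, 0.02),
    AnalystAssumption("STL_B", "steel", 100.0, 0.06),
    AnalystAssumption("CAR_A", "autos", 500.0, 0.04),
]

if __name__ == "__main__":
    print(industry_growth(NOTES))
```

The resulting per-industry parameters could then feed the industry links of a larger economy model rather than staying buried in individual research notes.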
Crawling your customers' customers' customers' customers: As I mentioned in a previous blog post, the key to crawling or generating insight for your industry economically is to find its drivers. Of course, you can become very good at monitoring the steel industry itself by tracking shipments or the work hours of steel industry participants, but by the time you have built the crawlers and identified a proper data provider, this may no longer give you a head start. Not all industries are created equal, and not all industries have the same crawlable data available to you. Pragmatically, you can only look for the best available data source with good and reliable quality. An early indicator could be the projected number of cars. If you want a head start, you can monitor income levels to anticipate how many cars will be bought: wealth has to be accumulated before households have enough capital to buy a car. If you are good, you can also quickly find proxies for car sales in the key countries and markets. Aggregating this information gives you an idea of how much steel is required. Then you can look at the next industry that uses steel, such as construction, and so on. Gradually, you will build a network of industries with interlinked supplier-customer relationships. If you can monitor your customers' customers' demand, such as their sales or existing inventory, you will have an idea of how much is required. Combined with the length of the sales cycle (e.g. 3 months), you then know how long it takes for your customers' customers to pay your customers. With an idea of your customers' inventory and demand, you also know more about your target companies.
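The supplier-customer network described above can be sketched as a small graph that pushes downstream signals upstream, multiplying by an input intensity and shifting by a lag at each hop. Everything in `LINKS` is a hypothetical assumption: the industry names, the steel-per-unit intensities, and the lags (read here as months between a change in a customer's activity and the matching change in the supplier's reported figures). The sketch also assumes the link graph has no cycles.

```python
from collections import defaultdict

# Hypothetical supplier links: customer -> [(supplier, units of supplier output
# needed per unit of customer output, lag in months)]. Illustrative numbers only.
LINKS = {
    "car_retail": [("autos", 1.0, 1)],
    "autos": [("steel", 0.9, 2)],
    "construction": [("steel", 0.05, 1)],
}


def propagate(signals, links, target):
    """Project demand for `target` from downstream activity signals.

    signals: {industry: {month: units}} observed downstream activity.
    Returns {month: units of target output}, where each month is shifted
    forward by the cumulative lag along the customer -> supplier path.
    Assumes `links` is acyclic; a cycle would recurse forever.
    """
    result = defaultdict(float)

    def walk(industry, month, units):
        if industry == target:
            result[month] += units
            return
        for supplier, intensity, lag in links.get(industry, []):
            walk(supplier, month + lag, units * intensity)

    for industry, series in signals.items():
        for month, units in series.items():
            walk(industry, month, units)
    return dict(result)


if __name__ == "__main__":
    # Hypothetical downstream signals: car retail sales and construction activity.
    signals = {"car_retail": {0: 1000.0}, "construction": {0: 2000.0, 1: 4000.0}}
    print(sorted(propagate(signals, LINKS, "steel").items()))
```

Adding another layer (for example an income indicator feeding `car_retail`) only means adding an entry to `LINKS`, which is how the network grows one industry at a time.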
Building the capability to crawl, not buying crawlers: Data come in many different types and are displayed in many different ways. Broadly speaking, data on web pages appear either as text or as pictures. You need to build the fundamental ability to recognize both types yourself; otherwise, as web technology evolves, you will have to pay separately for each new type of crawler.
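As a first step toward that in-house capability, a page can be split into its text and its pictures using only Python's standard `html.parser` module. This is a minimal sketch: the class name and the sample HTML are hypothetical, it skips `script` and `style` content, and it only collects image URLs. Recognizing what is inside the pictures (for example with OCR) would be a separate step not shown here.

```python
from html.parser import HTMLParser


class TextAndImageExtractor(HTMLParser):
    """Split a web page into visible text snippets and image URLs."""

    SKIP = {"script", "style"}  # tags whose content is not visible text

    def __init__(self):
        super().__init__()
        self.text = []
        self.images = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside <script> or <style>.
        if self._skip_depth == 0 and data.strip():
            self.text.append(data.strip())


def extract(html):
    """Return (text snippets, image URLs) found in an HTML string."""
    parser = TextAndImageExtractor()
    parser.feed(html)
    parser.close()
    return parser.text, parser.images


if __name__ == "__main__":
    sample = ('<html><body><h1>Sample headline</h1><p>Sample paragraph</p>'
              '<img src="/chart.png"><script>var x = 1;</script></body></html>')
    print(extract(sample))
```

Because the parser is generic, the same code works on any page; what changes per site is only where the relevant text and pictures sit, which is the kind of knowledge worth owning rather than renting.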
These are the practical considerations an investor should weigh when building the mythical "Big Data" team.