
AnalyticBridge competition: investigate the spectacular stock market collapse of May 6, 2010

This is the largest intraday point drop in the Dow's history so far, and it is known as the flash crash. In 6 minutes the Dow lost 600 points, then recovered. The difference between the high and the low on May 6 was about 1,000 points, and the Dow went below 10,000. Was it a technical glitch, foul play or a natural cause?

To investigate this massive collapse, you are asked to:

- download publicly available daily stock prices (volume, open, low, high, close) for all stocks that are listed on the NYSE or Nasdaq (you can get the data from Yahoo Finance and download it using a web robot -- there are a few thousand stocks)
- is the collapse evenly spread among all stocks? Are there any abnormalities? What would the worst drop (for an individual stock) be if the average drop is 10%, given the number of stocks on the NYSE or Nasdaq?
- see also this posting.
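
One rough way to frame the "worst drop" question is sketched below in Python. It assumes, purely for illustration, that individual stock drops are independent and normally distributed around the 10% average; the 5% standard deviation and the stock counts are made-up inputs, not figures from the data. Real stocks are correlated and fat-tailed, so treat this as a baseline to argue against, not a result.

```python
from statistics import NormalDist

def median_worst_drop(n_stocks, mean_drop, sd_drop):
    """Median of the largest drop among n_stocks i.i.d. normal drops.

    Assumption (not from the competition text): drops are independent
    and normally distributed with the given mean and standard deviation.
    """
    # P(max <= x) = F(x) ** n, so the median of the maximum solves
    # F(x) = 0.5 ** (1 / n).
    p = 0.5 ** (1.0 / n_stocks)
    return NormalDist(mean_drop, sd_drop).inv_cdf(p)

# Hypothetical inputs: 10% average drop, 5% standard deviation.
for n in (100, 1000, 5000):
    print(n, round(median_worst_drop(n, 0.10, 0.05), 3))
```

The typical worst drop grows with the number of stocks even when nothing unusual is going on, which is why the number of listed stocks matters in the question above.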


- The winner will be featured as our Member of the Month ($250 award)
- An additional $750 prize is offered
- The winner will be invited to become a member of AnalyticBridge's Executive Club and will receive our certification (none can be purchased)
- This represents a terrific career boost as we will make public announcements, press releases, etc.


Email your detailed analysis to Dr. Vincent Granville at [email protected] by December 15, 2010. Messages larger than 2MB will not be accepted. The subject line should be AnalyticBridge Competition 2010.


Replies to This Discussion

I am not sure I understand the requirements in this competition - what is the problem and goal statement?
What are the criteria on which the reports/research results will be judged?
Are you looking for the design of a 'flagging system' that flags such abnormalities in real time?

sorry for the many Qs.
Wasn't this blamed on a 'fat finger'?! Apparently, somebody meant to type 'm' for million and accidentally pressed 'b' for billion causing automated trading systems to spin out of control.
Hmm... perhaps the cascade, if it is one, can be traced...
Many people think that the "fat finger" theory is a legend.
Additional details:

- we are not looking for the perfect solution: nobody may ever know what really caused the collapse / recovery
- we want to reward people for leveraging analytic skills to produce a sound analysis, with statistical robustness, solid cross-validation, minimum amount of bias, and the ability to tackle a new "extreme value theory" problem
- we will feature not just the winner, but everybody producing a high quality analysis: your analysis should have comments about data quality, confidence intervals regarding your key statistics, and solid argumentation regarding your conclusions (your conclusion might very well be: we don't know the cause -- this is fine)
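
For the confidence intervals mentioned above, a percentile bootstrap is one simple option. The sketch below uses only the Python standard library; the function name `bootstrap_ci` and the sample drops are hypothetical and only illustrate the mechanics.

```python
import random
from statistics import mean

def bootstrap_ci(values, stat=mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for stat(values)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    stats = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return stats[lo_idx], stats[hi_idx]

# Made-up per-stock drops (as fractions), for illustration only.
drops = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.30]
print(bootstrap_ci(drops))
```

The same function works for other key statistics (median, a high quantile) by passing a different `stat`, although the percentile bootstrap is known to be unreliable for extremes such as the maximum.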
Congratulations on the contest!

It would help, though, if the input file were made available here ... in the absence of a web robot

I believe that getting data (and doing your data quality checks) to support your claims is part of the exercise, as well as mentioning your sources. There is public, free data on Yahoo Finance for thousands of stocks and indexes (e.g. QQQQ), with price at open, close, high, low, and volume, at daily granularity, going back years.
What is surprising is not so much the drop, but the recovery that occurred within minutes of the crash. It is clear that the stock market can support the Dow at 10,000 but not at 11,000.
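
As an illustration of the kind of data quality check meant here, the sketch below validates one daily bar. The dictionary keys (`open`, `high`, `low`, `close`, `volume`) are an assumed layout for this example, not the format of any particular download.

```python
def check_bar(bar):
    """Return a list of problems found in one daily price bar.

    `bar` is assumed to be a dict with numeric open, high, low,
    close and volume fields (a hypothetical layout).
    """
    o, h, l, c, v = (bar[k] for k in ("open", "high", "low", "close", "volume"))
    problems = []
    if min(o, h, l, c) <= 0:
        problems.append("non-positive price")
    if h < max(o, l, c):
        problems.append("high below open, low or close")
    if l > min(o, h, c):
        problems.append("low above open, high or close")
    if v < 0:
        problems.append("negative volume")
    return problems

# A consistent bar passes; an inconsistent one is flagged.
print(check_bar({"open": 10.0, "high": 11.0, "low": 9.5, "close": 10.5, "volume": 1000}))
print(check_bar({"open": 10.0, "high": 9.0, "low": 9.5, "close": 10.5, "volume": 1000}))
```

Bars that fail such checks are worth reporting in the analysis, not silently dropping, since bad prints on May 6 may themselves be part of the story.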

Surprisingly, the recovery didn't have hiccups. This is very unusual, as this type of market collapse is followed by extremely volatile trading that lasts for days, with several large ups and downs. Here the recovery was as fast as the collapse.

In short, this looks like a very massive earthquake (the biggest one ever) with no aftershocks. It is possible that the recovery was artificial, maybe resulting from sell orders being reversed by the stock exchanges or some other authority.
The fast and smooth recovery could indeed result from massive automatic orders. Where do all the forced position-closing orders go? I assume they go right back into the same pool, in the form of an increased money supply looking to materialize as buying. The small players get wiped out, and the large ones get bigger and hungrier, with more purchasing power. This logic is built into the decision software, which amplifies the response.
This feedback effect ended only when all the secured positions were liquidated and the feedback mechanism, in effect, ran out of gas. Seen from this angle, it looks quite elementary in game-theory terms.
Yet, since the above can be anticipated, it should not be ruled out that a clever big player would instigate such a process, force out the majority of stop-short secured positions, and harvest their value. In such a case, that player is not going to stop here.
Nanex, a datafeed provider, published their analysis at:

I don't know if they're right or not, but it's a short and interesting read.
Thanks for pointing to this excellent site.
An example of a potential answer could be:

* all stocks were impacted by the drop
* the worst drops were 30%, but based on the number of stocks involved and extreme value theory, the fact that one stock (Procter & Gamble) fell by 30% is not surprising
* the recovery was extremely swift and did not have chaotic behavior, compared with how stocks typically recover from a massive crash
* thus we believe the crash was "natural" and the recovery "artificial"

I'm not saying this is the right answer - nobody will ever know the right answer. But this is an example of an answer that could win, provided it comes with a sound analysis of stock and index prices before and after the collapse.
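
To put a number on "the recovery did not have chaotic behavior", one could compare the volatility of daily returns after the crash with the volatility before it. The sketch below is one possible measure, not a prescribed method; the function names, the 10-day window and the sample prices are all illustrative assumptions.

```python
import math
from statistics import stdev

def log_returns(closes):
    """Daily log returns from a list of closing prices."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]

def volatility_ratio(closes, event_index, window=10):
    """Std of returns after the event day divided by std before it.

    rets[i] is the return from day i to day i + 1. The return into
    the event day itself is left out of both windows.
    """
    rets = log_returns(closes)
    before = rets[max(0, event_index - 1 - window):event_index - 1]
    after = rets[event_index:event_index + window]
    return stdev(after) / stdev(before)

# Made-up closes: calm before day 6, choppy afterwards.
closes = [100, 101, 100, 101, 100, 101, 90, 99, 90, 99, 90]
print(volatility_ratio(closes, event_index=6))
```

A ratio well above 1 would match the usual pattern of aftershocks; a ratio near 1 after May 6 would support the claim that this recovery was unusually smooth. Each window needs at least two returns, otherwise `stdev` raises an error.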
What you are asking the participants to do is to build a model and then explain what may have happened in a time interval of about one hour. Participants could gather knowledge by taking individual samples of some parameters (you suggested volume, open, low, high, close).

According to the Nyquist–Shannon sampling theorem, with one sample per day (again as you suggested) the events of interest here will be completely masked, because the highest-frequency component that can be resolved has a period of at least 2 days.
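
A toy illustration of this point, in Python: the sketch builds a synthetic minute-by-minute series with a brief V-shaped dip and then reduces it to one daily bar. All numbers (base level, dip size, timing) are invented for the example and are not actual May 6 data.

```python
def synthetic_day(n_minutes=390, base=10800.0, dip=600.0, start=280, half_width=6):
    """Minute prices for one trading day with a short V-shaped dip."""
    prices = []
    for t in range(n_minutes):
        dist = abs(t - (start + half_width))      # minutes from the trough
        depth = dip * max(0.0, 1 - dist / half_width)
        prices.append(base - depth)
    return prices

def daily_bar(prices):
    """Collapse intraday prices into a daily open/high/low/close bar."""
    return {"open": prices[0], "high": max(prices),
            "low": min(prices), "close": prices[-1]}

minute_prices = synthetic_day()
print(daily_bar(minute_prices))
print("minutes below base:", sum(p < 10800.0 for p in minute_prices))
```

In this toy series the daily open and close are identical, so close-to-close returns show nothing at all. The daily low does record how deep the dip went, but not when it happened or how fast the market fell and recovered.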

Since I didn't know how fast the events took place, I took a look at the web page indicated by Joseph Foutz in his post. Judging from Chart 1-b in FlashCrashAnalysis_Part1-1.html, to really be able to analyze the events one would need access to data sampled at intervals of seconds or less.

Some participants may either have access to appropriate recordings or be able to download the appropriate data from some specialized site. However, in that case the competition becomes one of corporate resources and not of personal knowledge and ingenuity. My suggestion is to provide the same data to all participants and eliminate data acquisition as a differentiating factor.

