A Data Science Central Community
“What can Silicon Valley learn from Wall Street?” could be another wording of this question. Indeed, recent years have seen tremendous developments in two statistics-related and highly mathematized areas: data science (DS) and quantitative risk management (QRM). However, though they often tackle similar problems, the two have evolved largely in parallel. The different cultures of their main areas of application (“Silicon Valley” vs. “Wall Street”) could be one reason for this separate development; the difficulty of communicating concepts across highly complex frameworks could be another.
Of course, some topics already exist where the two areas have been successfully merged. News-based algorithmic trading is one well-known example. Others include web-based reputation risk management, big-data-based rogue trading detection, and news-based early warning systems for credit ratings.
But this is only the tip of the iceberg. Both areas can also contribute to each other on a more fundamental conceptual level. The aim of this article is to sketch some examples where both areas already use similar concepts, and to provide some suggestions of concepts that are well established in QRM and could also be useful for DS applications.
The basic skillsets of quantitative risk managers and data scientists overlap heavily in mathematical and statistical know-how. The problems of both areas are also often similar (e.g. credit rating development, which is essentially data science). It is thus not really surprising that many concepts used in DS are also well established in QRM, though the nomenclature and details often differ. Some concepts used in both “worlds” include:
Regression: The main aim of QRM is to provide stable statements about possible future developments. Regression techniques are frequently used for this task.
The most common one is also the simplest: linear regression, which is used e.g. to simulate future movements of stock price yields. (In this case, the logarithm of the stock price is generally assumed to be linear in time, and thus the yield constant.) Another use case is the replacement of missing data with reference values.
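As a minimal sketch of this idea, the following fits an ordinary least squares line to log prices of a hypothetical price path with a constant 1% yield per step; the slope of the fitted line recovers the yield. The price series is made up for illustration.

```python
import math

def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x (closed-form solution)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Hypothetical price path growing at a constant 1% (log-)yield per step:
prices = [100 * math.exp(0.01 * t) for t in range(10)]
a, b = fit_linear(list(range(10)), [math.log(p) for p in prices])
# The slope of the log-price regression is the constant yield:
print(round(b, 4))  # 0.01
```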
Logistic regression is commonly used for calculating credit ratings: to discriminate between default and non-default, but also to “benchmark” internal ratings against external ones from the rating agencies.
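A toy version of the default/non-default discrimination can be sketched as a one-feature logistic regression fitted by stochastic gradient descent. The risk scores and default flags below are invented for illustration, not real rating data.

```python
import math

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """One-feature logistic regression fitted by stochastic gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted default probability
            w += lr * (y - p) * x
            b += lr * (y - p)
    return w, b

# Hypothetical risk drivers (e.g. leverage ratios) and default flags (1 = default):
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
pd_high = 1.0 / (1.0 + math.exp(-(w * 0.9 + b)))  # estimated PD, risky obligor
pd_low = 1.0 / (1.0 + math.exp(-(w * 0.1 + b)))   # estimated PD, safe obligor
```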
Time series: The same applies here as for regressions. Especially for interest rate models, concepts like ARIMA or GARCH are commonly used, at least in academia. Since many parameters, like inflation curves, show seasonality patterns, seasonality adjustments are common practice.
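As a small illustration of the autoregressive building block behind such models, the following estimates the AR(1) coefficient of a simulated series via its lag-1 autocorrelation (a simple Yule-Walker estimate); the simulated path and its true coefficient of 0.8 are chosen purely for demonstration.

```python
import random

def ar1_coefficient(series):
    """Estimate phi in x_t = phi * x_{t-1} + noise via lag-1 autocorrelation."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((x - mean) ** 2 for x in series)
    return num / den

random.seed(0)
x, xs = 0.0, []
for _ in range(5000):
    x = 0.8 * x + random.gauss(0, 1)  # simulate an AR(1) path with phi = 0.8
    xs.append(x)
phi_hat = ar1_coefficient(xs)  # should be close to 0.8
```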
ROC: To validate e.g. rating models, concepts similar to ROC or lift curves are used in QRM (though generally termed discriminatory power).
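The area under the ROC curve, the usual summary of discriminatory power, can be computed directly from scores and labels via its rank (Mann-Whitney) formulation: the probability that a randomly chosen positive outranks a randomly chosen negative. The labels and scores below are illustrative.

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney formulation: the fraction of
    positive/negative pairs where the positive scores higher (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical defaults (1) and model scores:
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.45, 0.4, 0.35, 0.8, 0.9]
score = auc(labels, scores)  # 8 of 9 pairs correctly ordered
```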
Feature selection: Many areas of QRM, like interest curve or credit portfolio modeling, deal with a great number of different parameters. To reduce them, diagonalization techniques (like PCA in the simplest case) are used. Regressions also often require further simplification; here, forward selection and backward elimination are the methods of choice.
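The PCA step can be sketched without any linear algebra library by running power iteration on the sample covariance matrix to extract the leading principal component. The two-rate data set below is invented: two interest rates moving almost in lockstep, so the dominant component is the familiar “level” factor.

```python
import random

def leading_principal_component(data, iters=200):
    """First PCA direction via power iteration on the covariance matrix."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]  # renormalize each iteration
    return v

random.seed(1)
# Hypothetical data: two rates moving almost in lockstep (a "level" factor):
base = [random.gauss(0, 1) for _ in range(500)]
data = [[x, x + random.gauss(0, 0.1)] for x in base]
v = leading_principal_component(data)  # roughly (0.707, 0.707)
```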
k-NN: Classification methods like k-nearest neighbors have not yet entered QRM. However, using data sets as a “blueprint” for simulations is already established there and is called “historical simulation”.
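Historical simulation in its simplest bootstrap form just resamples observed returns to build future scenarios. The historical return vector below is made up; in practice it would be an actual observation window.

```python
import random

# Hypothetical historical daily returns serving as the scenario set:
historical_returns = [-0.021, 0.004, 0.013, -0.008, 0.010, -0.035, 0.007, 0.002]

def simulate_path(start, horizon, rng):
    """Bootstrap one future path by resampling historical returns."""
    value = start
    for _ in range(horizon):
        value *= 1 + rng.choice(historical_returns)
    return value

rng = random.Random(42)
paths = [simulate_path(100.0, 10, rng) for _ in range(1000)]
```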
After this short introduction of some concepts common to QRM and DS, here follow some (preliminary and incomplete) suggestions of QRM methods currently not used in DS.
Value at risk (VaR), expected shortfall (ES), and other distribution-related figures: If there is one basic risk management lesson, it is that predicting just the expected value of a statistical quantity is dangerous at best. To address this, several techniques have been developed that consider distributions and focus on certain bad cases. Here, despite its various shortcomings, the quantile-based VaR is the one most commonly used. Other measures, like the ES, consider the complete distribution of bad cases and behave better under some statistical criteria like sub-additivity. Distribution considerations may play an important role in many DS solutions (like predicting revenues), and QRM could provide the required toolset. (This issue should become more important in the near future for business reasons, since risk considerations are also beginning to play a role in the controlling procedures of corporates.)
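Both figures can be computed from any empirical loss distribution in a few lines: the historical VaR is an upper quantile of the losses, and the ES is the mean loss beyond that quantile. The toy loss distribution below (losses 1 to 100) is chosen so the results are easy to check by hand.

```python
def var_es(losses, alpha=0.95):
    """Historical VaR (the alpha-quantile of losses) and expected
    shortfall (the mean loss at or beyond the VaR)."""
    ordered = sorted(losses)
    idx = int(alpha * len(ordered))
    var = ordered[idx]
    tail = ordered[idx:]
    es = sum(tail) / len(tail)
    return var, es

losses = list(range(1, 101))  # toy loss distribution: 1, 2, ..., 100
var, es = var_es(losses)
print(var, es)  # 96 98.0  (ES = mean of the 96..100 tail)
```

Note that the ES is always at least as large as the VaR, which is one reason it gives a more conservative picture of tail risk.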
Monte Carlo methods: Are Monte Carlo simulations data science? Though many may disagree, it should be noted that Monte Carlo simulations are fundamentally also a means of extracting information from data sets. They are used extensively in QRM, and several methodologies (e.g. Cholesky decomposition for generating correlated random numbers) are implemented there and could be of use in basically every use case that involves prediction.
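The Cholesky trick mentioned above can be shown in its smallest form: for a 2x2 correlation matrix the factor can be written down by hand, and the sample correlation of the generated pairs recovers the target. The target correlation of 0.6 is arbitrary.

```python
import math
import random

def cholesky_2x2(rho):
    """Cholesky factor L of the correlation matrix [[1, rho], [rho, 1]]."""
    return [[1.0, 0.0], [rho, math.sqrt(1 - rho ** 2)]]

def correlated_pair(rho, rng):
    """Turn two independent standard normals into a correlated pair via L."""
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    L = cholesky_2x2(rho)
    return z1, L[1][0] * z1 + L[1][1] * z2

rng = random.Random(7)
pairs = [correlated_pair(0.6, rng) for _ in range(20000)]

# Sample correlation should be close to the target of 0.6:
n = len(pairs)
mx = sum(a for a, _ in pairs) / n
my = sum(b for _, b in pairs) / n
corr = (sum((a - mx) * (b - my) for a, b in pairs)
        / math.sqrt(sum((a - mx) ** 2 for a, _ in pairs)
                    * sum((b - my) ** 2 for _, b in pairs)))
```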
Correlation modeling and diversification: As mentioned, measuring default rates is an important part of QRM. But QRM goes much further. Correlations between counterparty defaults are also considered, as well as higher-level correlations between industries and regions. The respective portfolio models are possibly the most complex ones QRM has developed yet and are often proprietary (e.g. CreditMetrics or CreditRisk+), but free and simple alternatives also exist. Why should such methods not also be applied in fraud detection or supply chain management solutions?
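The core mechanism of many such portfolio models can be sketched as a one-factor Gaussian model: each obligor defaults when a latent variable, driven partly by a common systematic factor, falls below a threshold set by its default probability. The portfolio size, default probability, and asset correlation below are illustrative, not calibrated values.

```python
import math
import random
from statistics import NormalDist

def simulate_defaults(n_obligors, pd, rho, n_scenarios, rng):
    """One-factor Gaussian default model: defaults cluster because all
    obligors share the common factor m with weight sqrt(rho)."""
    threshold = NormalDist().inv_cdf(pd)  # default if latent variable < threshold
    counts = []
    for _ in range(n_scenarios):
        m = rng.gauss(0, 1)  # common (systematic) factor for this scenario
        defaults = sum(
            math.sqrt(rho) * m + math.sqrt(1 - rho) * rng.gauss(0, 1) < threshold
            for _ in range(n_obligors))
        counts.append(defaults)
    return counts

rng = random.Random(3)
counts = simulate_defaults(100, 0.05, 0.3, 2000, rng)
mean_defaults = sum(counts) / len(counts)  # close to 100 * 5% = 5
```

The interesting output is not the mean but the spread: with positive correlation, scenarios with many simultaneous defaults become far more likely than independence would suggest.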
The future influences the present: Perhaps the most striking feature of financial mathematics is that the (expected) future influences the present. The valuation of most financial products must also include the expectation of future revenues (cash flows). But this is not just a feature of financial products. Humans always tend to plan for the future, and thus every parameter that has to do with human behavior is influenced by some future expectation.
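The standard formalization of this idea is discounted cash flow valuation: today's value is the sum of expected future cash flows discounted back to the present. The bond parameters below (5% coupon, three years, flat 3% rate) are a made-up example.

```python
def present_value(cash_flows, rate):
    """Discount expected future cash flows back to today: sum of cf_t / (1+r)^t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

# Hypothetical bond: 5% annual coupon, 3 years, face value 100, flat 3% rate:
pv = present_value([5, 5, 105], 0.03)  # roughly 105.66
```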
The model influences the reality: Another peculiar feature of financial math is that market participants act differently once they learn more about how the market works. This is a big difference to, e.g., physics and seems paradoxical. However, the market values of stock options, for example, clearly changed after the introduction of the Black-Scholes model. Could there also be a “Black-Scholes moment” for DS?
A simple plot – the heatmap: Finally, a suggestion that simply has to do with visualizing results. In QRM, in order to show “bad” cases more clearly, the frequency of damages is generally plotted over the severity as a scatterplot. Each area of the plot has a discrete color that indicates the necessity to act. Such heatmaps are simple to set up and easy to communicate. They can also be easily adapted to other kinds of variables (e.g. severity over time horizon).
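The coloring logic behind such a heatmap is just a discretization of the two axes into buckets and a traffic-light lookup. The bucket boundaries below are purely illustrative; in practice they would come from the risk appetite of the organization.

```python
def risk_color(frequency, severity, freq_cuts=(1, 10), sev_cuts=(1e4, 1e5)):
    """Map a (frequency, severity) pair onto a traffic-light heatmap cell.
    Bucket boundaries are illustrative placeholders."""
    def bucket(x, cuts):
        return sum(x >= c for c in cuts)  # 0 = low, 1 = medium, 2 = high
    score = bucket(frequency, freq_cuts) + bucket(severity, sev_cuts)
    return ["green", "green", "yellow", "red", "red"][score]

print(risk_color(0.5, 5e3))  # green  (rare and cheap)
print(risk_color(5, 5e4))    # yellow (medium on both axes)
print(risk_color(20, 5e5))   # red    (frequent and expensive)
```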
Summarizing, data science and quantitative risk management have a lot of similarities, but also often quite different approaches. Both areas can only profit from the knowledge gained by the other.
Dr Dimitrios Geromichalos holds a doctorate in physics and a certification as Financial Risk Manager. He has over 10 years' experience in the finance industry and has worked as a banker, auditor, and consultant. Currently he is employed at Capco – The Capital Markets Company GmbH. The opinions expressed in this article are not necessarily those of Capco.