A Data Science Central Community
What are Markov Chains?
A Markov chain is a random process with the property that the next state depends only on the current state. For example: If you have the choice of red or blue twice the process would be Markovian if each time you chose the decision had nothing to do with your choice previously (see diagram below). How can Markov Chains help us?
Detail and Terminology
To start with we need to define some basic terminology. The changes of state within the system are called transitions, and the probabilities associated with various state-changes are called transition probabilities. A famous example of Markov chains is called the ‘Drunkard’s Walk’. In this example you are trying to walk in a straight line along the number line staying on 0. With each step you take you randomly add or minus 1 from your score. Below is an example taken from the first two steps on this path to show the various probabilities:
In this example each step is a transition with a transition probability of 0.5 that you add 1 and 0.5 you minus 1 from your score.
A More Complex Example
Imagine the scene: it is 5pm, you’re at work and you have 4 choices of what to do next:
Stay at work, Go to the gym, Go to the pub or Go home. Every hour irrespective of what you are doing you have to make the choice again. The transition matrix below shows your probabilities from moving from one state to the next and would be generated from observing your data. For example if you are currently at work then your probability of staying at work for the next hour is 0.2 whereas your probability of going home is 0.4.
There are a few aspects of this transition matrix that are worth pointing out:
- The transition matrix needs to be reflective of the full state space. By this I mean that the only possible states are all contained in the matrix (you couldn’t go to the cinema for example).
- A lot of the potential paths are accessible (for example from the Work state you can move to Gym, Pub and Home) but some paths do not exist. If you are currently at the Pub there is zero probability of your next move being to Work (unprofessional) or the Gym (potentially dangerous).
- Home is what is known as an Absorbing state as once you are in this state you have zero probability of moving somewhere else (presumably because it’s cold outside).
Once you have this transition matrix you can answer all kinds of questions about where you are likely to be at a certain time.
- What is the probability I will still be at work at 8pm?
To get from 5pm to 8pm we go through 3 transitions
Probability I will be at work at 6pm = 0.2
Probability I will be at work at 7pm = 0.2 * 0.2 = 0.04
Probability I will be at work at 8pm = 0.2 * 0.2 * 0.2 = 0.008
- What is the probability I will be at the pub at 7pm?
To get from 5pm to 7pm we go through 2 transitions but there are multiple paths
Probability I will go work -> work -> pub = 0.2 * 0.2 = 0.04
Probability I will go work -> pub -> pub = 0.2 * 0.5 = 0.1
Probability I will go work -> gym -> pub = 0.2 * 0.2 = 0.04
Overall probability = 0.04 + 0.04 + 0.1 = 0.18
There are a few reasons why this example is over-simplistic and perhaps would not be too accurate. Mostly these reasons relate to the fact that these states are not truly Markovian or memoryless. For example, at some point external forces will change the probabilities: the pub will close, you will become tired of being at the gym for so long, you need to leave the house to go to work the next day. In real-life there are also many other things you could do after work so the states covered in the matrix are not inclusive but hopefully this example demonstrates where Markov Chains may be of use.
Examples where we’ve used Markov Chains at Capital One:
Customer Behavior Modelling
As part of analyzing our customer base we classify according to account behaviors, for example customers who have used their credit card for a large transaction and are paying down the balance versus customers using their card for their weekly spend. We are able to look at the transitions between these different states to better understand the flow of the population and predict the future make-up of our portfolio.
One use of Markov Chains at Capital One is to help predict credit losses through default. The process involves a customer going through a series of steps until the credit agreement is defaulted. Each step in this process can be thought of as a state with probabilities created for each transition. By flowing in the current volumes within each state we can predict future volumes flowing through to default. This prediction helps ensure external loss predictions are accurate and we are able to adequately provision to cover these losses.
When would I use Markov Chains?
As with any technique related to data science Markov Chains are one of many approaches you could take to solve a business problem using large amounts of data. The key is being able pick and choose when to take Markov Chains off the shelf. At a high level: Markov Chains may help you with a prediction problem where you have clearly defined states and the transition between these states is memoryless (i.e. does not depend on the previous states beyond the current one).
Credits: Kevin Chisholm, David Robinson