Good evening, could someone please explain the formula in Sampling an Imbalance – what does each term mean? Many thanks
Hello Alex,
We're in the process of updating the contents of the notebook. In the meantime, here is the explanation:
We're talking specifically about tick imbalance bars here, which means we want to sample when there is an imbalance between upward and downward ticks. Such an imbalance suggests that informed traders are pushing the price in a given direction.
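The simple formula to find the direction at each timestep is:
$$b_t = \begin{cases} b_{t-1} & \text{if } \Delta p_t = 0 \\ \dfrac{|\Delta p_t|}{\Delta p_t} & \text{if } \Delta p_t \neq 0 \end{cases}$$
where $\Delta p_t = p_t - p_{t-1}$ is the price change at tick $t$, and each value of b in the array tells you the direction of tick movement (1 for up, -1 for down; an unchanged price keeps the previous direction).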
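As a rough illustration, the direction array could be built from a list of trade prices like this. It's only a sketch: the function name `tick_directions` and the `initial_direction` seed (used if the very first price change is zero) are my own assumptions, not something taken from the notebook.

```python
def tick_directions(prices, initial_direction=1):
    """Return b_t (+1 or -1) for every price after the first one."""
    directions = []
    previous = initial_direction  # assumed seed for the very first tick
    for last_price, price in zip(prices, prices[1:]):
        change = price - last_price
        if change > 0:
            previous = 1
        elif change < 0:
            previous = -1
        # change == 0: keep the previous direction unchanged
        directions.append(previous)
    return directions


prices = [100.0, 100.5, 100.5, 100.2, 100.2, 100.7]
print(tick_directions(prices))  # [1, 1, -1, -1, 1]
```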
Once we have this array, which consists of 1s and -1s, we simply add them up to get the total imbalance over the T ticks seen since the last sample. It is given by the formula:
$$\theta_T = \sum_{t=1}^{T} b_t$$
This is a simple sum.
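To make that concrete, here is a tiny sketch of the running total imbalance for a made-up direction array (the values in `b` are invented for illustration):

```python
from itertools import accumulate

# Hypothetical tick directions since the last sample
b = [1, 1, -1, 1, 1, 1, -1, 1]

# Running total imbalance theta after each tick
theta = list(accumulate(b))
print(theta)  # [1, 2, 1, 2, 3, 4, 3, 4]
```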
Now the question arises: when is the imbalance large enough to sample and make a new bar out of a list of ticks?
This can be decided using the expected value of theta (the total imbalance), estimated from historical data. If the absolute value of the current imbalance, theta, exceeds the absolute value of its expected value, we cut off: all ticks since the last sample (cut-off) are taken together to make one tick imbalance bar.
So how can we find the expected value of theta? It is given by the following formula:
$$E_0[\theta_T] = E_0[T] \left( P[b_t = 1] - P[b_t = -1] \right)$$
The above formula multiplies the expected number of ticks in each imbalance bar, given by:
$$E_0[T]$$
and the difference of the unconditional probabilities of the two directions (up or down) a tick can take:
$$P[b_t = 1] - P[b_t = -1]$$
Since the two probabilities add to 1:
$$P[b_t = 1] + P[b_t = -1] = 1$$
we can rewrite the formula as:
$$E_0[\theta_T] = E_0[T] \left( 2 P[b_t = 1] - 1 \right)$$
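A quick numeric check may help here. The expected imbalance is the expected number of ticks per bar times (2 × probability of an up-tick − 1), which is the same as multiplying by the difference of the two probabilities. The numbers below (100 expected ticks, a 0.75 probability of an up-tick) are made up purely for illustration, and the function name is hypothetical.

```python
def expected_imbalance(expected_ticks, prob_up):
    """E_0[theta_T] = E_0[T] * (2 * P[b_t = 1] - 1)."""
    return expected_ticks * (2 * prob_up - 1)


expected_ticks = 100  # assumed E_0[T]
prob_up = 0.75        # assumed P[b_t = 1]
prob_down = 1 - prob_up

# Both forms of the formula give the same number
print(expected_ticks * (prob_up - prob_down))        # 50.0
print(expected_imbalance(expected_ticks, prob_up))   # 50.0
```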
For the first part of the formula, we don't know the expected number of ticks in an imbalance bar, so we start with an assumed value that is later corrected as data comes in. How is it updated? In practice, we take an exponentially weighted moving average (or a simple moving average) of the number of ticks in the previously sampled tick imbalance bars. We can also keep this value static, based on an educated guess as to how many ticks a tick imbalance bar might contain.
The second part of the formula, the difference of the probabilities, can be implemented as an exponentially weighted moving average of b (the direction value) over all the tick data we have so far. This works because the mean of b is exactly the probability of an up-tick minus the probability of a down-tick. It is like calculating the imbalance per tick: "On average, an individual tick is imbalanced by a certain amount."
Multiplying these two values gives the expected imbalance of a bar: the expected number of ticks times the average imbalance per tick.
These two rolling values are multiplied to get the expected imbalance at each timestep, and the total imbalance is checked against it. When the absolute value of the total imbalance exceeds the absolute value of this expected value, we sample.
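Here is a minimal sketch of such a moving average, written by hand so the update rule is visible. The smoothing factor `alpha` and the choice to seed the average with the first value are assumptions on my part; a library routine such as pandas' `ewm` would do a similar job.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average, seeded with the first value."""
    average = values[0]
    out = [average]
    for value in values[1:]:
        # New estimate = alpha * latest value + (1 - alpha) * old estimate
        average = alpha * value + (1 - alpha) * average
        out.append(average)
    return out


b = [1, -1, 1, 1]  # hypothetical tick directions
print(ewma(b, alpha=0.5))  # [1, 0.0, 0.5, 0.75]
```

The last value of the result is the current estimate of the average imbalance per tick; it always stays between -1 and 1.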
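Putting the pieces together, the sampling loop might look roughly like the sketch below. Please treat it as an illustration under stated assumptions rather than the notebook's implementation: the function name, the starting values `expected_ticks` and `expected_b`, the single shared `alpha`, and the use of `>=` for the comparison are all my own choices.

```python
def tick_imbalance_bars(b, expected_ticks, expected_b, alpha):
    """Return the index of the last tick of each sampled bar."""
    bar_ends = []
    theta = 0          # total imbalance since the last sample
    ticks_in_bar = 0   # number of ticks since the last sample
    for i, direction in enumerate(b):
        theta += direction
        ticks_in_bar += 1
        # Rolling estimate of the average imbalance per tick (EWMA of b)
        expected_b = alpha * direction + (1 - alpha) * expected_b
        # Expected imbalance = expected ticks per bar * imbalance per tick
        threshold = expected_ticks * abs(expected_b)
        if abs(theta) >= threshold:
            bar_ends.append(i)
            # Update the expected bar length with the bar just sampled
            expected_ticks = alpha * ticks_in_bar + (1 - alpha) * expected_ticks
            theta = 0
            ticks_in_bar = 0
    return bar_ends


# Ten up-ticks in a row, starting from an assumed 3 ticks per bar
ends = tick_imbalance_bars([1] * 10, expected_ticks=3.0, expected_b=1.0, alpha=0.5)
print(ends)  # [2, 5, 8]
```

One thing to watch in a sketch like this: if the average imbalance per tick drifts close to zero, the threshold becomes very small and a bar gets cut after almost every tick, so a real implementation would probably need some guard around the expected number of ticks.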
Hope this answers your question. Do get back if further assistance is needed.