Alpha 57 as implemented does not match the definition in the paper

Course Name: Trading Alphas: Mining, Optimisation, and System Design, Section No: 15, Unit No: 6, Unit type: Notebook

The paper
https://arxiv.org/pdf/1601.00991
states alpha 57 to be:
Alpha#57: (0 - (1 * ((close - vwap) / decay_linear(rank(ts_argmax(close, 30)), 2))))

I note the leading 0 is superfluous, so we can simplify it slightly to:
Alpha#57: -1 * ((close - vwap) / decay_linear(rank(ts_argmax(close, 30)), 2))

The paper defines the functions in the denominator as follows:

rank(x) = cross-sectional rank

ts_argmax(x, d) = which day ts_max(x, d) occurred on

decay_linear(x, d) = weighted moving average over the past d days with linearly decaying weights d, d – 1, …, 1 (rescaled to sum up to 1)

Now, the denominator is the difficult bit to code up.
The implementation in the Jupyter notebook suggests:

denom = data.Close.rolling(30).apply(np.argmax).rank(axis=1).rolling(
2).apply(lambda x: np.average(x, weights=np.linspace(0, 1, 2)))

But
.rank(axis=1)
returns ranks in 1…N.

In the 101 Alphas, rank(x) is, I believe, interpreted as a cross-sectional rank scaled to 0…1 (percentile style). In pandas that is:
.rank(axis=1, pct=True, method='average')

If we do not use pct=True, the denominator scale changes with universe size, which changes the alpha magnitude mechanically.
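To make the scale issue concrete, here is a small sketch comparing raw and percentile ranks on a single cross-section (the tickers and prices are made up):

```python
import numpy as np
import pandas as pd

# One cross-section of three names vs. one of six names.
small = pd.DataFrame([[3.0, 1.0, 2.0]], columns=list("ABC"))
large = pd.DataFrame([[3.0, 1.0, 2.0, 5.0, 4.0, 6.0]], columns=list("ABCDEF"))

# Raw ranks grow with universe size N...
print(small.rank(axis=1).iloc[0].max())  # 3.0
print(large.rank(axis=1).iloc[0].max())  # 6.0

# ...while percentile ranks stay in (0, 1] regardless of N.
print(small.rank(axis=1, pct=True, method="average").iloc[0].max())  # 1.0
print(large.rank(axis=1, pct=True, method="average").iloc[0].max())  # 1.0
```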

Also, the paper definition for decay_linear(x, 2) is weights 2, 1 (today gets 2, yesterday gets 1), rescaled to sum to 1.
np.linspace(0, 1, 2) # [0, 1]

That means:
• yesterday weight 0
• today weight 1
So the code, as it stands, applies no 2-day decay at all and simply takes today's value, which is wrong.
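A quick numerical check of the two weight vectors (the data values are illustrative):

```python
import numpy as np

# np.linspace(0, 1, 2) gives [0., 1.]: yesterday weighted 0, today 1.
wrong = np.linspace(0, 1, 2)
print(wrong)  # [0. 1.]

# The paper's decay_linear(x, 2) weights are [1, 2] (oldest first),
# rescaled to sum to 1, i.e. [1/3, 2/3].
right = np.arange(1, 3, dtype=float)
right /= right.sum()

x = np.array([9.0, 3.0])  # [yesterday, today]
print(np.average(x, weights=wrong))  # 3.0 — just today's value
print(np.average(x, weights=right))  # 5.0 — an actual 2-day decay
```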

Moreover, ts_argmax as coded returns positions 0…window-1 instead of the paper's 1…window. Since rank is invariant to adding a constant, this offset is probably harmless here, but I still wish to flag it.
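For concreteness, here is my own attempt at a faithful reading of the denominator, assuming data.Close is a dates × tickers DataFrame (the helper names decay_linear, ts_argmax and cs_rank are mine, not from the notebook):

```python
import numpy as np
import pandas as pd

def decay_linear(df: pd.DataFrame, d: int) -> pd.DataFrame:
    """Weighted moving average with weights d, d-1, ..., 1 (oldest bar
    gets weight 1), rescaled to sum to 1, per the paper's definition."""
    w = np.arange(1, d + 1, dtype=float)  # oldest -> newest
    w /= w.sum()
    return df.rolling(d).apply(lambda x: np.dot(x, w), raw=True)

def ts_argmax(df: pd.DataFrame, d: int) -> pd.DataFrame:
    """Day on which the rolling max occurred; np.argmax gives 0..d-1,
    so add 1 for the paper's 1..d convention."""
    return df.rolling(d).apply(np.argmax, raw=True) + 1

def cs_rank(df: pd.DataFrame) -> pd.DataFrame:
    """Percentile-style cross-sectional rank in (0, 1]."""
    return df.rank(axis=1, pct=True, method="average")

# Denominator of Alpha#57:
# denom = decay_linear(cs_rank(ts_argmax(data.Close, 30)), 2)
```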

Could someone please verify these observations and implement the correct interpretation of the paper in NumPy and, ideally, pandas?

Hi,

We are looking into this and will get back to you.

Thanks.

Hi,

We have received the following reply from the author,

First, it is important to understand that the alphas presented in the paper are created with a random generator, so they do not necessarily “make sense”. It is, in fact, exceedingly difficult to consistently produce sensible random alphas. This is why we see issues such as "decay(x, 2)".
With regard to ranking, the paper does not explicitly define the ranking scheme, as far as I am aware. Therefore, it is up to the reader to decide which scheme works best for their purpose. The notebook is merely an example which should be adjusted to each specific use case. As there is a wide range of possible ranking schemes, we leave it to the student to find the most appropriate one for their asset class, risk profile and trading style.

Hope this helps.

Hello,

Thank you for the previous reply. However, there are still two implementation points that remain unclear.

  1. decay_linear(x, 2)
    In the paper, this is defined as a linearly weighted moving average with weights d, d-1, …, 1, rescaled to sum to 1.
    For d = 2, that implies weights [2, 1] normalized to [2/3, 1/3], so:
    decay_linear(x, 2)_t = (2*x_t + x_{t-1}) / 3

However, using
np.average(x, weights=np.linspace(0, 1, 2))
produces weights [0, 1], which assigns 0% to x_{t-1} and 100% to x_t. That is not smoothing and does not match the paper’s definition.

Am I correct that the implementation should instead explicitly use weights proportional to [1, 2] (or [2, 1] depending on ordering)?
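A one-line numerical check of the d = 2 case (the data values are illustrative):

```python
import numpy as np

# With x ordered [x_{t-1}, x_t], weights proportional to [1, 2]
# reproduce (2*x_t + x_{t-1}) / 3 exactly.
x = np.array([6.0, 3.0])  # [x_{t-1}, x_t]
lhs = np.average(x, weights=[1, 2])
rhs = (2 * x[1] + x[0]) / 3
print(lhs, rhs)  # 4.0 4.0
```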

  2. rank(x)

The paper defines rank(x) as a cross-sectional rank, but does not specify the tie-handling convention.

Pandas offers several methods:
method = 'average', 'min', 'max', 'first', 'dense'

Should rank(x) be interpreted as percentile rank with average tie handling (i.e. pct=True, method='average')?
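To illustrate how the tie conventions differ on one cross-section (the tickers and values are made up):

```python
import pandas as pd

# One cross-section with a tie between B and C.
row = pd.DataFrame([[3.0, 2.0, 2.0, 1.0]], columns=list("ABCD"))

# With method='average' and pct=True, the tied names share a rank.
print(row.rank(axis=1, pct=True, method="average").iloc[0].tolist())
# [1.0, 0.625, 0.625, 0.25]

# Other conventions break the tie differently.
for method in ["min", "max", "first", "dense"]:
    print(method, row.rank(axis=1, method=method).iloc[0].tolist())
```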

My concern is purely about semantic consistency with the operator definitions in the paper. The choice of ranking scheme or decay weights should not depend on asset class or trading style, but on faithfully reproducing the defined signal.

I would appreciate clarification on the intended operator conventions.

Given that there are 101 alphas and several operator definitions that are not fully specified in the paper, could you clarify the appropriate channel for implementation questions of this type going forward?

Hi,

We will be working on this query and get back to you.

Thanks.

We have received the following reply from the author:

As I don’t have direct contact with the authors of the paper, I am not in a position to infer their implementation details directly. The paper states that the formulation should serve as inspiration for further work rather than an exact guideline for replication. Likewise, the course implementation can only be a guideline rather than an exact replication. In any case, the paper mentions neither the assets traded with this nor the time frames. Furthermore, it states that only SOME of the formulations have actually been used. I recommend the student conduct independent research to find the best configuration for themselves.