Representing precise numeric quantities in Python 3

Hello community,



What is your choice of data type for representing numeric quantities that must be precise and stay precise after mathematical operations, and why?



My goal is to handle numeric values such as money amounts, order quantities, and prices as precisely as possible. "Precise" here means "has the smallest possible deviation from the actual mathematical result".



Example of an issue when adding position sizes:



Input: 0.7 + 0.6

Python output: 1.2999999999999998

Mathematical result: 1.3
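
For comparison, the Decimal class (more on it below) keeps the same addition exact, as long as the values are constructed from strings rather than from floats:

>>> from decimal import Decimal
>>> Decimal("0.7") + Decimal("0.6")
Decimal('1.3')
>>> Decimal(0.7)   # built from a float, the binary representation error is preserved
Decimal('0.6999999999999999555910790149937383830547332763671875')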



If this small error keeps propagating across thousands of operations, it leads to real differences. Example: a 5-year backtest with 5000 trades, each with a small PnL error, could add up to a wrong backtest PnL. And not only the PnL: quantities, limits, and entry/exit prices could all be off by some amount.
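
A toy illustration of how that accumulation plays out over, say, 5000 identical trades with a PnL of 1.30 each:

from decimal import Decimal

# 5000 identical trades, each booking a PnL of 1.30
exact_total = sum(Decimal("0.7") + Decimal("0.6") for _ in range(5000))
float_total = sum(0.7 + 0.6 for _ in range(5000))

print(exact_total)   # 6500.0 exactly
print(float_total)   # close to, but in general not exactly, 6500.0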



For a backtesting environment I am considering the Decimal class (https://docs.python.org/3/library/decimal.html), since plain floats have the precision issues shown above (https://docs.python.org/3/tutorial/floatingpoint.html).







For a test run with a broker and live market data I am unsure whether Decimal is sufficient.



One possibility for a live run could be a custom "Money" class for monetary quantities, but that seems like overkill for a backtest:



https://pypi.org/project/py-moneyed/ (last release: 2018)

https://code.google.com/archive/p/python-money/ (old/not maintained; last release: 2011)



Plus, using such a class for non-money quantities such as order quantity would be inadequate design (an order quantity is not a money entity).
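
For reference, a hand-rolled version would probably be a thin wrapper around Decimal, something like this sketch (names made up):

from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str

    def __add__(self, other: "Money") -> "Money":
        # Refuse to mix currencies instead of silently adding the numbers.
        if self.currency != other.currency:
            raise ValueError("currency mismatch")
        return Money(self.amount + other.amount, self.currency)

# Money(Decimal("0.70"), "USD") + Money(Decimal("0.60"), "USD") == Money(Decimal("1.30"), "USD")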





Further questions:

  1. Should I stick with Decimal for backtesting for simplicity?
  2. What about live trading?
  3. Should I round (or truncate?) the result of each operation (+, -, *, /, ...) to N decimal digits? (See the sketch after this list.)
  4. Is this a problem at all or am I being too pedantic?
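
To make #3 concrete, per-operation rounding with Decimal could look roughly like this (the two-decimal target and the helper name are just placeholders):

from decimal import Decimal, ROUND_HALF_EVEN

CENT = Decimal("0.01")  # assumed target resolution: 2 decimal places

def round_money(value: Decimal) -> Decimal:
    # Quantize the result of an operation to the target resolution (banker's rounding).
    return value.quantize(CENT, rounding=ROUND_HALF_EVEN)

price = round_money(Decimal("101.2350") * Decimal("0.5"))   # Decimal('50.62')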

Any insight is welcome, ideally from experience with live trading applications and real money.

Thanks and kind regards,
~s

 

#4 is a good point. Python 'float' is double precision; the error you quoted is of the order of 1E-16, I think. Even if you have an error creep of that order every second of a 10-year backtest, it still does not reach the significant digits. A live trading lifetime will be much shorter than that.

In real-life accounts it will never be more precise than cents - the broker will round it off (up or down; the next update will reflect the rounded value). So if you are really picky, you can model your accounts as integers (in cents, not in dollars) and do away with the float precision problem, or simply round at each update to reflect the real-life scenario. On Blueshift, we do the latter. Using Decimal is overkill in my opinion.

Also, if you want fast floating-point calculations, at some point you will use numpy or a similar library, and those are significantly slower with 'object' dtypes. There again you might store the data as integers, or even plain floats will be faster.

The case where floating-point precision becomes significant is if you use an equality test in the algo logic. This should be avoided: all equality tests should be done with integer data types, and floating point should only be used for inequality tests.
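
A rough sketch of the integer-cents idea (the helper names here are just for illustration):

def to_cents(amount: float) -> int:
    # Convert a dollar amount to whole cents, rounding to the nearest cent.
    return round(amount * 100)

def to_dollars(cents: int) -> float:
    # Only convert back to float for display/reporting.
    return cents / 100

cash = to_cents(100_000.00)              # account tracked as integer cents
cash += to_cents(0.70) + to_cents(0.60)  # integer addition is exact
assert cash == 10_000_130                # i.e. 100,001.30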

Hello Prodipta and thank you for your insights.



For a first version, which is focused on backtesting, I'll go with floats.



A deviation of, say, 10 USD in the final PnL of a 100,000 USD strategy won't be an issue right now.



For future reference, here are some sources on comparing floats:
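
In the meantime, a minimal example with the standard library's math.isclose (numbers reused from the example above):

import math

entry = 0.7 + 0.6   # 1.2999999999999998
target = 1.3

print(entry == target)               # False: exact equality trips over the representation error
print(math.isclose(entry, target))   # True: default relative tolerance is 1e-09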


Thanks
~s