$ 2^{\text{bit\_precision}} = 10^{\text{decimal\_digits}} $

$ \text{bit\_precision} = \log_{2}(10) \cdot \text{decimal\_digits} $

$ \text{decimal\_digits} = \log_{10}(2) \cdot \text{bit\_precision} $

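
As a quick check of the conversion formulas above, here is a minimal Python sketch. The function names `bits_to_digits` and `digits_to_bits` are illustrative only and do not come from any library.

```python
import math

def bits_to_digits(bit_precision: float) -> float:
    """decimal_digits = log10(2) * bit_precision"""
    return math.log10(2) * bit_precision

def digits_to_bits(decimal_digits: float) -> float:
    """bit_precision = log2(10) * decimal_digits"""
    return math.log2(10) * decimal_digits

# Significand widths (including the implicit bit) of a few binary formats.
for name, bits in [("f16", 11), ("f32", 24), ("f64", 53), ("f128", 113)]:
    print(f"{name}: {bits} bits is about {bits_to_digits(bits):.2f} decimal digits")
```

The two functions are inverses of each other, so converting bits to digits and back returns the original width up to floating-point rounding.
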
Type | Sign | Exponent | Significand stored (with implicit bit) | Total | Exponent Bias | Precision (bits) | Decimal Digits | Wiki |
---|---|---|---|---|---|---|---|---|
f16 Half (IEEE 754-2008) | 1 | 5 | 10 (11) | 16 | 15 | 11 | ~3.3 | https://en.wikipedia.org/wiki/Half-precision_floating-point_format |
f32 Single | 1 | 8 | 23 (24) | 32 | 127 | 24 | ~7.2 | https://en.wikipedia.org/wiki/Single-precision_floating-point_format |
f64 Double | 1 | 11 | 52 (53) | 64 | 1023 | 53 | ~16.0 | https://en.wikipedia.org/wiki/Double-precision_floating-point_format |
x86 extended precision | 1 | 15 | 64 | 80 | 16383 | 64 | ~19.3 | Legacy format; not covered here. |
f128 Quad | 1 | 15 | 112 (113) | 128 | 16383 | 113 | ~34.0 | https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format |
f256 Octuple | 1 | 19 | 236 (237) | 256 | 262143 | 237 | ~71.3 | https://en.wikipedia.org/wiki/Octuple-precision_floating-point_format |
bfloat16 | 1 | 8 | 7 (8) | 16 | 127 | 8 | ~2.4 | https://en.wikipedia.org/wiki/Bfloat16_floating-point_format |
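
To make the field layout concrete, the following is a small sketch that splits an f32 into the sign, exponent, and significand fields from the table and rebuilds the value using the bias of 127. The helper names `decode_f32` and `f32_value` are hypothetical, and the reconstruction assumes a normal number (it does not handle zeros, subnormals, infinities, or NaN).

```python
import struct

def decode_f32(x: float):
    """Return (sign, biased exponent, stored significand) of x rounded to f32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF        # 23 stored significand bits
    return sign, exponent, fraction

def f32_value(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild a normal f32 value from its fields (normal numbers only)."""
    bias = 127
    significand = 1 + fraction / 2**23   # implicit leading 1 gives 24 bits
    return (-1) ** sign * significand * 2.0 ** (exponent - bias)

sign, exponent, fraction = decode_f32(-2.5)
print(sign, exponent, hex(fraction), f32_value(sign, exponent, fraction))
```

The same approach should carry over to the other binary formats by changing the field widths and the bias to the values listed in the table.
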

Type | Sign | Combination | Significand continuation | Total | Exponent Bias | Precision (bits) | Decimal Digits | Wiki |
---|---|---|---|---|---|---|---|---|
decimal64 | 1 | 13 | 50 | 64 | 398 | ~53.2 | 16 | https://en.wikipedia.org/wiki/Decimal64_floating-point_format |
decimal128 | 1 | 17 | 110 | 128 | 6176 | ~112.9 | 34 | https://en.wikipedia.org/wiki/Decimal128_floating-point_format |
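
The decimal precisions in the table can be approximated with Python's standard `decimal` module. This is only a sketch: the context names `DECIMAL64` and `DECIMAL128` are made up here, and the contexts model the digit count alone, not the exponent range or the bit-level encoding of the real formats.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Assumption: only the precision is modelled (16 and 34 significant digits).
DECIMAL64 = Context(prec=16, rounding=ROUND_HALF_EVEN)
DECIMAL128 = Context(prec=34, rounding=ROUND_HALF_EVEN)

third64 = DECIMAL64.divide(Decimal(1), Decimal(3))
third128 = DECIMAL128.divide(Decimal(1), Decimal(3))
print(third64)    # 16 significant digits
print(third128)   # 34 significant digits

# Decimal fractions such as 0.1 are represented exactly, unlike in binary formats.
print(DECIMAL64.add(Decimal("0.1"), Decimal("0.2")) == Decimal("0.3"))
```
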