Gist: misyltoad/04f666b8a0a0a15f6ab133937f6e0db8
```cpp
template <class _Ty>
_NODISCARD /* constexpr */ _Ty _Common_lerp(const _Ty _ArgA, const _Ty _ArgB, const _Ty _ArgT) noexcept {
    // on a line intersecting {(0.0, _ArgA), (1.0, _ArgB)}, return the Y value for X == _ArgT
    const int _Finite_mask = (int{isfinite(_ArgA)} << 2) | (int{isfinite(_ArgB)} << 1) | int{isfinite(_ArgT)};
    if (_Finite_mask == 0b111) {
        // 99% case, put it first; this block comes from P0811R3
        if ((_ArgA <= 0 && _ArgB >= 0) || (_ArgA >= 0 && _ArgB <= 0)) {
            // exact, monotonic, bounded, determinate, and (for _ArgA == _ArgB == 0) consistent:
            return _ArgT * _ArgB + (1 - _ArgT) * _ArgA;
        }
        if (_ArgT == 1) {
            // exact
            return _ArgB;
        }
        // exact at _ArgT == 0, monotonic except near _ArgT == 1, bounded, determinate, and consistent:
        const auto _Candidate = _ArgA + _ArgT * (_ArgB - _ArgA);
        // monotonic near _ArgT == 1:
        if ((_ArgT > 1) == (_ArgB > _ArgA)) {
            if (_ArgB > _Candidate) {
                return _ArgB;
            }
        } else {
            if (_Candidate > _ArgB) {
                return _ArgB;
            }
        }
        return _Candidate;
    }
    if (isnan(_ArgA)) {
        return _ArgA;
    }
    if (isnan(_ArgB)) {
        return _ArgB;
    }
    if (isnan(_ArgT)) {
        return _ArgT;
    }
    switch (_Finite_mask) {
    case 0b000:
        // All values are infinities
        if (_ArgT >= 1) {
            return _ArgB;
        }
        return _ArgA;
    case 0b010:
    case 0b100:
    case 0b110:
        // _ArgT is an infinity; return infinity in the "direction" of _ArgA and _ArgB
        return _ArgT * (_ArgB - _ArgA);
    case 0b001:
        // Here _ArgA and _ArgB are infinities
        if (_ArgA == _ArgB) {
            // same sign, so T doesn't matter
            return _ArgA;
        }
        // Opposite signs, choose the "infinity direction" according to T if it makes sense.
        if (_ArgT <= 0) {
            return _ArgA;
        }
        if (_ArgT >= 1) {
            return _ArgB;
        }
        // Interpolating between infinities of opposite signs doesn't make sense, NaN
        if constexpr (sizeof(_Ty) == sizeof(float)) {
            return __builtin_nanf("0");
        } else {
            return __builtin_nan("0");
        }
    case 0b011:
        // _ArgA is an infinity but _ArgB is not
        if (_ArgT == 1) {
            return _ArgB;
        }
        if (_ArgT < 1) {
            // towards the infinity, return it
            return _ArgA;
        }
        // away from the infinity
        return -_ArgA;
    case 0b101:
        // _ArgA is finite and _ArgB is an infinity
        if (_ArgT == 0) {
            return _ArgA;
        }
        if (_ArgT > 0) {
            // toward the infinity
            return _ArgB;
        }
        return -_ArgB;
    case 0b111: // impossible; handled in fast path
    default:
        _CSTD abort();
    }
}
```
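To make the "99% case" easier to read on its own, here is a minimal standalone sketch of just the all-finite branch above, under the hypothetical name `lerp_finite`. It assumes all three inputs are finite and drops the NaN/infinity handling and the MSVC-internal macros, so it is not a drop-in replacement for `std::lerp`.

```cpp
#include <cassert>

// Hypothetical standalone sketch of the all-finite ("99% case") branch of
// _Common_lerp above. Assumes a, b, and t are all finite; NaN and infinity
// handling is deliberately left out.
template <class T>
T lerp_finite(T a, T b, T t) {
    // a and b on opposite sides of zero (or either is zero): the gist's
    // comment calls the two-product form exact, monotonic, and bounded here.
    if ((a <= 0 && b >= 0) || (a >= 0 && b <= 0)) {
        return t * b + (1 - t) * a;
    }
    // Return b exactly at t == 1.
    if (t == 1) {
        return b;
    }
    // Exact at t == 0, but may land on the wrong side of b near t == 1 ...
    const T candidate = a + t * (b - a);
    // ... so clamp against b to keep the result monotonic there.
    if ((t > 1) == (b > a)) {
        return b > candidate ? b : candidate;
    }
    return candidate > b ? b : candidate;
}
```

For example, `lerp_finite(-2.0, 2.0, 0.5)` takes the first branch and returns `0.0`, while `lerp_finite(1.0, 3.0, 0.5)` falls through to the clamped candidate and returns `2.0`.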
Even without AVX2, SSE2 can still process 4 floats in a few instructions. I don't buy the "it's just a few instructions" argument, since it clearly calls several other functions (though inlining may avoid some of that).
I do agree, though, that MSVC can't do anything about it, since the standard defines this behavior; it's not up to the implementation.
Branches are still slower than a few branch-free SSE or AVX instructions. Yes, we have good branch prediction, but with Spectre mitigations branches are slower.
My point is that "safety over speed" matters to only a small percentage of people, and people who see that std has lerp will use it because they assume it is fast.
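The branch-free alternative the comment has in mind is the textbook `a + t * (b - a)` applied to four floats at once. Below is a minimal sketch with SSE intrinsics, falling back to a scalar loop on non-x86 targets; the helper name `lerp4_naive` is hypothetical. It does not carry `std::lerp`'s guarantees, such as exactness at `t == 1`.

```cpp
#include <cassert>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#endif

// Hypothetical helper: naive lerp of four float lanes with no branches.
// out[i] = a[i] + t * (b[i] - a[i]) for i in 0..3.
void lerp4_naive(const float* a, const float* b, float t, float* out) {
#if defined(__SSE2__) || defined(_M_X64)
    const __m128 va = _mm_loadu_ps(a);  // unaligned load of a[0..3]
    const __m128 vb = _mm_loadu_ps(b);  // unaligned load of b[0..3]
    const __m128 vt = _mm_set1_ps(t);   // broadcast t to all four lanes
    // sub, mul, add: three arithmetic instructions for four results
    _mm_storeu_ps(out, _mm_add_ps(va, _mm_mul_ps(vt, _mm_sub_ps(vb, va))));
#else
    // Scalar fallback for targets without SSE2.
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + t * (b[i] - a[i]);
    }
#endif
}
```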
My main point was to bring up the "99% case". It's done in a few instructions; the branches before it are negligible for performance.
The compiler will generally do a better job here than handwritten asm. Looks fine to me.
Also, it should be "fast enough", but do your own benchmarks if you think it's slow. If someone truly cared about performance they wouldn't be using the STL all over the place. But in a modern application, it's still fast enough.
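A minimal timing sketch for doing your own benchmark. It assumes a C++20 standard library for the `std::lerp` comparison, which is skipped via the `__cpp_lib_interpolate` feature-test macro otherwise; `lerp_naive`, `time_lerp`, and `run_benchmarks` are hypothetical names. Numbers from a loop like this depend heavily on inlining, vectorization, and the input data (the `double` checksum also serializes the loop), so treat them as a rough comparison, not a verdict.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical baseline: the textbook formula, no special-case handling.
inline float lerp_naive(float a, float b, float t) { return a + t * (b - a); }

// Times one lerp implementation over n generated inputs and returns a checksum.
// Printing the checksum keeps the optimizer from discarding the loop.
template <class F>
double time_lerp(const char* label, F lerp_fn, std::size_t n) {
    std::vector<float> a(n), b(n), t(n);
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<float>(i);
        b[i] = static_cast<float>(2 * i + 1);
        t[i] = static_cast<float>(i % 100) / 100.0f;
    }
    double checksum = 0.0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        checksum += lerp_fn(a[i], b[i], t[i]);
    }
    const auto stop = std::chrono::steady_clock::now();
    const long long ns = static_cast<long long>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count());
    std::printf("%s: %lld ns (checksum %g)\n", label, ns, checksum);
    return checksum;
}

void run_benchmarks(std::size_t n) {
    time_lerp("naive", lerp_naive, n);
#if defined(__cpp_lib_interpolate)
    // Only compiled when the standard library provides C++20 std::lerp.
    time_lerp("std::lerp", [](float a, float b, float t) { return std::lerp(a, b, t); }, n);
#endif
}
```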
Compare with the original STL, which does exactly what you would think:
https://github.com/justinmeiners/sgi-stl/blob/master/stl_numeric.h
What does that do? I don't even see a lerp function in there. Compare this gist to what, exactly?
You're right. lerp is new in the latest C++ revision (C++20), so it is not included here.
However, if you look at the functions you would expect to be simple (inner_product, etc.), they are implemented in a straightforward manner, without fixating on generality at all costs or on every strange edge case.
That file is just one example; other parts of the repo are worth looking at.
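For comparison, an `inner_product` in the straightforward style described above looks roughly like this. It is a sketch written for illustration (`simple_inner_product` is a hypothetical name), not a verbatim copy of `stl_numeric.h`.

```cpp
#include <cassert>

// Hypothetical sketch in the SGI STL style: walk both ranges in lockstep
// and accumulate init + x * y, with no special-case handling.
template <class InputIt1, class InputIt2, class T>
T simple_inner_product(InputIt1 first1, InputIt1 last1, InputIt2 first2, T init) {
    for (; first1 != last1; ++first1, ++first2) {
        init = init + *first1 * *first2;
    }
    return init;
}
```

With `x = {1, 2, 3}` and `y = {4, 5, 6}`, `simple_inner_product(x, x + 3, y, 0)` returns `32`.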
functions that you would expect to be simple (inner_product, etc)
inner_product is not a function where the "simple" implementation produces incorrect answers.
You don't think we could find inputs for which it gives incorrect results? We would have to seek them out, but that's what has been done for lerp.
The end result of anticipating every possible misuse is that it won't be useful to anyone. Any implementation is a compromise relative to its theoretical design. As other posters have asked, who is this lerp for?
Let's assume you're right, though. inner_product was just one example. I shared the link to show that the STL used to be extremely simple. I doubt ANY of the functions or data structures in the original STL are anywhere near as long or complex as the current standard library has made them. Can you find any that has become simpler in any major implementation?
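Inputs of the sort the first question has in mind are easy to construct once you go looking. In this sketch (values chosen for illustration; `dot_with_cancellation` is a hypothetical name), the exact dot product is 1, but `std::inner_product` accumulates left to right in `double`, so `1e16 + 1.0` rounds back to `1e16` before the `-1e16` term cancels it. That gives `0.0` on a typical IEEE-754 double build (not x87 extended precision). Whether this is an incorrect answer from `inner_product` or just ordinary floating-point rounding of the input order is exactly what the thread disputes.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Exact answer: 1e16*1 + 1*1 + (-1e16)*1 == 1.
// Left-to-right double accumulation: 0 + 1e16 = 1e16; 1e16 + 1 rounds to 1e16
// (doubles near 1e16 are 2 apart, tie goes to even); 1e16 - 1e16 = 0.
double dot_with_cancellation() {
    const std::vector<double> x = {1e16, 1.0, -1e16};
    const std::vector<double> y = {1.0, 1.0, 1.0};
    return std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
}
```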
The priorities are (1) correctness, (2) performance, (3) simplicity. As a result, simplicity goes down over time in favor of correctness or performance.
lerp got standardized precisely because a correct implementation is not simple. The normal floating-point rules already do the correct thing for inner_product (e.g. always take the leftmost NaN, raise the right floating-point exceptions, don't accumulate rounding errors within the algorithm, etc.).
If you want the "simple" one, go for it. The library function doesn't exist for you.
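A concrete case of the "simple" formula going wrong: with `a = 1e16`, `b = 1.0`, `t = 1.0`, the subtraction `b - a` rounds to `-1e16` (doubles near `1e16` are spaced 2 apart), so `a + t * (b - a)` returns `0.0` instead of `b`. C++20's `std::lerp` is specified to return exactly `b` at `t == 1` when `a` and `b` are finite. `lerp_textbook` is a hypothetical name, and the result assumes IEEE-754 double arithmetic without x87 extended precision.

```cpp
#include <cassert>

// Hypothetical "simple" lerp: b - a is rounded before it is scaled by t,
// so the result at t == 1 need not equal b.
double lerp_textbook(double a, double b, double t) {
    return a + t * (b - a);
}
```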
Correctness is relative to a domain. You can often get all three (correctness, performance, and simplicity) by constraining your domain to reasonable inputs that avoid pathological results. For example, some would say code that is not thread-safe is not correct. You can resolve this by putting locks on everything, or you can simply declare that accessing it from multiple threads is incorrect. Could we accomplish the same with lerp by specifying a reasonable domain instead?
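One way to read the "reasonable domain" suggestion, as a sketch: require finite `a` and `b` and `t` in [0, 1], document that contract with asserts, and use the two-product form. The name `lerp_in_domain` is hypothetical. Inside that domain it returns `a` at `t == 0` and `b` at `t == 1` (up to the sign of a zero result), but it gives up guarantees `std::lerp` makes, such as monotonicity and `lerp(a, a, t) == a`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical domain-restricted lerp: the caller promises finite a, b and
// t in [0, 1]. The asserts document the contract rather than handle misuse.
double lerp_in_domain(double a, double b, double t) {
    assert(std::isfinite(a) && std::isfinite(b));
    assert(t >= 0.0 && t <= 1.0);
    // At t == 0 this is a + 0*b; at t == 1 it is 0*a + b.
    return (1.0 - t) * a + t * b;
}
```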
If someone truly cared about performance they wouldn't be using the STL all over the place
Ah yes, great: if I cared about performance, I'd be reinventing my own wheels instead of standing on the shoulders of giants. Of course I would do that, such an amazing idea, surely one that works out just great for many people...
This is more common than you think in high-performance applications in professional settings. Look at RAD Game Tools, for example.
But I digress… I too enjoy using the STL as much as I can.
You can take your trolling elsewhere, though, along with all the sarcasm.
beautiful
First off, it depends on the compiler. You keep saying it's 2 instructions with AVX2, but that depends entirely on the compiler and target CPU.
MSVC by default compiles to SSE2 for x86-64 and usually won't use newer instructions unless you specify /arch:AVX or /arch:AVX2. Basically: compatibility over speed. There's nothing wrong with the snippet of asm you posted; look at the label that has the "99% case" in it, and it's literally ~5 SSE/SSE2 instructions for a simple linear interpolation.
Also, take your ideas to the people who specify the standard library, not the people who implement it.
And as a final note, it's not slow. A branch isn't slow on modern CPUs (and since you argue the function should be using AVX instructions, you're already assuming a modern CPU), because we have good branch predictors these days.
The function is going for safety over speed. Want a "faster" function? Write it yourself.
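Whether the compiler may emit AVX2 at all is a build setting, which can be checked from predefined macros. A small sketch, with `simd_target` as a hypothetical helper: GCC and Clang define `__AVX2__` with `-mavx2` (or a `-march` that implies it) and `__SSE2__` on x86-64; MSVC defines `__AVX2__` under `/arch:AVX2`, `_M_X64` for x64 builds, and `_M_IX86_FP` for 32-bit x86.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: reports the SIMD level the compiler was told to
// target (a build-time property, not what the running CPU supports).
const char* simd_target() {
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "SSE2";
#else
    return "none/non-x86";
#endif
}
```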