-
-
Save misyltoad/04f666b8a0a0a15f6ab133937f6e0db8 to your computer and use it in GitHub Desktop.
template <class _Ty> | |
_NODISCARD /* constexpr */ _Ty _Common_lerp(const _Ty _ArgA, const _Ty _ArgB, const _Ty _ArgT) noexcept { | |
// on a line intersecting {(0.0, _ArgA), (1.0, _ArgB)}, return the Y value for X == _ArgT | |
const int _Finite_mask = (int{isfinite(_ArgA)} << 2) | (int{isfinite(_ArgB)} << 1) | int{isfinite(_ArgT)}; | |
if (_Finite_mask == 0b111) { | |
// 99% case, put it first; this block comes from P0811R3 | |
if ((_ArgA <= 0 && _ArgB >= 0) || (_ArgA >= 0 && _ArgB <= 0)) { | |
// exact, monotonic, bounded, determinate, and (for _ArgA == _ArgB == 0) consistent: | |
return _ArgT * _ArgB + (1 - _ArgT) * _ArgA; | |
} | |
if (_ArgT == 1) { | |
// exact | |
return _ArgB; | |
} | |
// exact at _ArgT == 0, monotonic except near _ArgT == 1, bounded, determinate, and consistent: | |
const auto _Candidate = _ArgA + _ArgT * (_ArgB - _ArgA); | |
// monotonic near _ArgT == 1: | |
if ((_ArgT > 1) == (_ArgB > _ArgA)) { | |
if (_ArgB > _Candidate) { | |
return _ArgB; | |
} | |
} else { | |
if (_Candidate > _ArgB) { | |
return _ArgB; | |
} | |
} | |
return _Candidate; | |
} | |
if (isnan(_ArgA)) { | |
return _ArgA; | |
} | |
if (isnan(_ArgB)) { | |
return _ArgB; | |
} | |
if (isnan(_ArgT)) { | |
return _ArgT; | |
} | |
switch (_Finite_mask) { | |
case 0b000: | |
// All values are infinities | |
if (_ArgT >= 1) { | |
return _ArgB; | |
} | |
return _ArgA; | |
case 0b010: | |
case 0b100: | |
case 0b110: | |
// _ArgT is an infinity; return infinity in the "direction" of _ArgA and _ArgB | |
return _ArgT * (_ArgB - _ArgA); | |
case 0b001: | |
// Here _ArgA and _ArgB are infinities | |
if (_ArgA == _ArgB) { | |
// same sign, so T doesn't matter | |
return _ArgA; | |
} | |
// Opposite signs, choose the "infinity direction" according to T if it makes sense. | |
if (_ArgT <= 0) { | |
return _ArgA; | |
} | |
if (_ArgT >= 1) { | |
return _ArgB; | |
} | |
// Interpolating between infinities of opposite signs doesn't make sense, NaN | |
if constexpr (sizeof(_Ty) == sizeof(float)) { | |
return __builtin_nanf("0"); | |
} else { | |
return __builtin_nan("0"); | |
} | |
case 0b011: | |
// _ArgA is an infinity but _ArgB is not | |
if (_ArgT == 1) { | |
return _ArgB; | |
} | |
if (_ArgT < 1) { | |
// towards the infinity, return it | |
return _ArgA; | |
} | |
// away from the infinity | |
return -_ArgA; | |
case 0b101: | |
// _ArgA is finite and _ArgB is an infinity | |
if (_ArgT == 0) { | |
return _ArgA; | |
} | |
if (_ArgT > 0) { | |
// toward the infinity | |
return _ArgB; | |
} | |
return -_ArgB; | |
case 0b111: // impossible; handled in fast path | |
default: | |
_CSTD abort(); | |
} | |
} |
Honestly, may as well just toss this entire gist up as a failed trolling attempt. Good try though.
Please view the asm this generates compared to a simpler lerp (or even the one from libc++) It's pretty insane given how common lerp can be [and how common in fast paths] in some codebases.
The C++ std::lerp is defined in a way such that it can be implemented without finite checks like this if you order things right.
You're incorrect.
Read "26.8.4 [c.math.lerp]":
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/n4849.pdf
Constexpr is obviously still missing, but that's why its a work in progress.
Please view the asm this generates compared to a simpler lerp (or even the one from libc++) It's pretty insane given how common lerp can be [and how common in fast paths] in some codebases.
What does this even mean? The first few branches to std::isfinite is a sanity check which the standard mandates and the first if statement produces identical asm across compilers.
The other cases (like the switch cases) are for handling edge-cases and are also required according to the latest standard.
Once again, unless you have some valid issues in the snippit of code you provided then don't bother replying. Even then, if you do have an issue then go create an issue on the MSVC STL github repo -> https://github.com/microsoft/STL
Or fix it yourself.
The problem, as pointed out by the title of this gist, is that the C++ standard tends to push implementers into creating those monstrosities. Thus it’s probably not even fully fixable, the only fix is not avoid using those functions (or fork C++ back into sanity).
Complaining about speed and safety? Seems odd to me. I can't defend your point or the creator of the gists. Their implementation is early but is objectively fast and safe.
The point is it's only "speed and safety" given the context of the C++ standards committee's "standards".
What does this even mean? The first few branches to std::isfinite is a sanity check which the standard mandates
The C++ standard dictates what the return values should be for given inputs and types of inputs, not the implementation.
Implementing this genuinely with isnan and isfinite is incredibly naïve, as I assure you it can be done by just understanding the quirks of NaN/INF and how comparisons with these types work. (NaN != NaN, comparison ordering, etc)
This also brings up the point that we still don't have a singular standardized STL implementation like other languages std libs which leads this to happen.
You're making yourself look dumber with each post. "NaN != NaN" is compiler-specific and setting-specific (fast floating point math). Your point means nothing.
This is the MSVC implementation -- therefore it can adhere to the floating point standard it implements, which is IEEE 754 (not like anyone implements anything else anyway.)
As for /fp:fast, nobody using that will give a shit about these checks if they're already saying they don't give a shit about them by using that flag.
Alternatively though, they could ifdef this for that case, but that would suck and go against what /fp:fast is for.
I would very much like to have a simpler implementation that handles all of the corner cases described in microsoft/STL#65 .
Should the Standard provide a separate function for linear interpolation that explicitly doesn't attempt to handle corner cases and can be directly implemented with the obvious expression? Maybe, but that's a question for the Library Evolution Working Group - our job as STL implementers is to Do What The Standard Says even when the Standard says (indirectly) to do something very complicated.
@StephanTLavavej
Yep, so it's like I thought. It's written about as good as you are going to get given the confines of the standards, so the problem is not with the implementation, but the standard.
problem is not with the implementation, but the standard
I don't even see this as a problem with the standard. Yes, this implementation is long. But most of the time you fall into the 'inputs are finite' case.
NaN != NaN
Unfortunately, not in some modes for some of the compilers we support.
As for /fp:fast, nobody using that will give a shit about these checks if they're already saying they don't give a shit about them by using that flag.
I disagree. There is value in allowing the compiler to make inferences like x==x
is true without removing explicit isnan
tests. /fp:fast
means 'I don't care about every last ULP' which allows floating point loops to be vectorized or operations to be fused; not making NaNs infectious is a substantially bigger change than that.
I assure you it can be done by just understanding the quirks of NaN/INF and how comparisons with these types work
I don't believe passing our current battery of tests is possible without the explicit tests, though mostly due to infinites rather than NaNs, because almost everything turns into infinity*0 => NaN. However, since we shipped that implementation the authors of P0811 indicated that they desire different behavior for infinites which may allow removal of the explicit tests (see issue linked below).
If anyone is interested in actual product change I recommend continuing over in microsoft/STL#65. If you do, please be mindful of our Code of Conduct and refrain from personal attacks. Saying 'XYZ is broken' is not helpful. Saying 'XYZ should work like this' is.
The only reasonable change is to remove it, it shouldn't be in stl, 90% of people don't need such a complex lerp function and if they do they should implement it themselves.
The only reasonable change is to remove it, it shouldn't be in stl, 90% of people don't need such a complex lerp function and if they do they should implement it themselves.
It’s not complex and does it’s job. You fall into the 99% case usually and it’s just a basic lerp function. If you think it’s complicated then this function is not for you.
You should go complain to the people who write the standard, not the people who implement it.
It is complex for a lerp function; one that's literally just a * perc + b * (1 - perc); which is literally 2 instructions for AVX2 (and that's literally 8 floats instead of just 1). This shouldn't be in stl. The only real reason I could think of is if your input isn't defined, but in 99% of the cases, they are; and that 1% shouldn't be catered to. And then you'll just have unnecessary branches and checks in a very simple function. Putting this into the standard makes it so everyone will use it (since they will assume that std is fast); lerp is commonly used and this WILL be a bottleneck if called very often.
https://gist.github.com/Joshua-Ashton/1f25704b6ccb989c03c1dd60757a6044 It shouldn't generate this, it's literally 2 asm instructions, that's it.
It is complex for a lerp function; one that's literally just a * perc + b * (1 - perc); which is literally 2 instructions for AVX2 (and that's literally 8 floats instead of just 1). This shouldn't be in stl. The only real reason I could think of is if your input isn't defined, but in 99% of the cases, they are; and that 1% shouldn't be catered to. And then you'll just have unnecessary branches and checks in a very simple function. Putting this into the standard makes it so everyone will use it (since they will assume that std is fast); lerp is commonly used and this WILL be a bottleneck if called very often.
https://gist.github.com/Joshua-Ashton/1f25704b6ccb989c03c1dd60757a6044 It shouldn't generate this, it's literally 2 asm instructions, that's it.
First off, It depends on the compiler. You keep saying it's 2 instructions for AVX2 when that only depends on the compiler and target CPU.
MSVC by default compiles to SSE2 for x86-64 and usually won't use special instructions unless you specify AVX/AVX2. Basically; Compatibility over speed. There's nothing wrong in the snippit of asm you posted, go look at the label that has the "99% case" in it, it's literally ~5 SSE/2 instructions for a simple linear interpolation.
Also, go take your ideas to the people who write the STL, not the people who implement it.
And as a final note, it's not slow. A branch isn't slow on modern CPUs (since you argue the func should be using AVX instructions) since we have good branch predictors these days.
The function is going for safety over speed. Want a "faster" func? Write it yourself.
Even without AVX2, SSE2 is still able to do 4 floats within a few instructions. I don't see the "it's just a few instructions" as clearly it calls a few other functions (though with inlining this might be more preventable).
I do agree tho that msvc can't do anything about it since the standard defines this, so it's not up to the implementation.
Branches are still slower than not requiring branches for a few SSE or AVX instructions. Yes we do have good branch prediction but with spectre mitigations, they are slower.
My point is that 'safety over speed' is only applicable for a small percentage of people and people that see that std has lerp, will use that since they assume it is fast.
My main point was bringing up the "99% case". It's done in a few instructions, the branches before that are negligible for performance.
The compiler will generally know what it's doing, more so than handwritten asm. Looks fine to me.
Also, It should be "fast enough". But do your own benchmarks if you think its slow. If someone truly cared about performance they wouldn't be using the STL all over the place. But in a modern application, it's still fast enough.
Compare with original STL which does exactly what you would think:
https://github.com/justinmeiners/sgi-stl/blob/master/stl_numeric.h
Compare with original STL which does exactly what you would think:
https://github.com/justinmeiners/sgi-stl/blob/master/stl_numeric.h
What does that do? I don't even see a lerp function in there? Compare this gist to what, exactly?
Compare with original STL which does exactly what you would think:
https://github.com/justinmeiners/sgi-stl/blob/master/stl_numeric.hWhat does that do? I don't even see a lerp function in there? Compare this gist to what, exactly?
You're right. Lerp is new in the latest C++ revision, and so is not included here.
However, if you look at the functions that you would expect to be simple (inner_product
, etc) they are implemented
in a straightforward manner without a fixation on generality at all costs, in every strange case.
The particular file is just one example, other parts of the repo are worth looking at.
functions that you would expect to be simple (inner_product, etc)
inner_product is not a function where the "simple" implementation produces incorrect answers.
functions that you would expect to be simple (inner_product, etc)
inner_product is not a function where the "simple" implementation produces incorrect answers.
You don't think we could find inputs that would be incorrect? We would have to seek them out, but that's what has been done for lerp.
The end result of anticipating every possible misuse is it just won't be useful for anyone. Any implementation is a compromise compared to its theoretical design. As other posters have mentioned, who is this lerp
for?
Let's assume you're right though. inner_product
was just one example. I shared the link to show that the STL used to be extremely simple. It's doubtful ANY of the functions or data structures you find in the original STL are even close to as long or complex as the current standard library has made them. Can you find any of them which has become more simple in any major implementation?
The priorities are (1) correctness, (2) performance, (3) simplicity. The result of that is that simplicity goes down over time in favor of correctness or perf.
lerp
got standardized precisely because a correct implementation is not simple. All the normal floating point rules already do the correct thing for inner_product (e.g. always take the leftmost NaN, raise the right floating point exceptions, don't accumulate rounding errors within the algorithm, etc.).
If you want the 'simple' one then go for it. The library function doesn't exist for you.
Correctness is relative to a domain. You can often get all three (correctness, performance, and simplicity) by constraining your domain to more reasonable inputs to avoid pathological results. For example, some may say code which is not threadsafe is not correct. You can resolve this by throwing locks on everything, or you can just declare that accessing it from multiple threads is incorrect. Could we accomplish the same with Lerp by specifying a reasonable domain instead?
If someone truly cared about performance they wouldn't be using the STL all over the place
Ah yes, great, if I cared about performance, I'd instead be reinventing my own wheels instead of standing on shoulders of giants, of course I would do that, such an amazing idea, surely one that works out just great for many people...
If someone truly cared about performance they wouldn't be using the STL all over the place
Ah yes, great, if I cared about performance, I'd instead be reinventing my own wheels instead of standing on shoulders of giants, of course I would do that, such an amazing idea, surely one that works out just great for many people...
This is more common than you think in high performance applications in the professional workspace. Look at Rad Game Tools for example.
But, I digress… I too enjoy using the STL as much as I can.
You can take your trolling elsewhere though, with all the sarcasm and such.
beautiful
The C++ std::lerp is defined in a way such that it can be implemented without finite checks like this if you order things right.