So, I was reading "Why You shouldn’t use lodash anymore and use pure JavaScript instead", because once upon a time I shifted from Underscore to Lodash, and I'm always on the lookout for the bestest JavaScript stdlib. At the same time, there was recently an interesting conversation on Twitter about how some of React's functionality can be easily implemented in modern vanilla JS. The code that came out of that was elegant and impressive, so I took it as a prompt to ask whether we really need the framework at all.
Unfortunately, it didn't start out well. After copy-pasting the ~100 lines of code that Lodash executes to perform a `find`, there was this shocking claim: that Lodash took roughly 140ms.
To give you some perspective on these numbers, let's assume we're kicking around on your laptop, lazily executing at 2.0 GHz. This puts you at 2,000,000,000 (2 billion) cycles per second, which is also known as 2,000,000 (2 million) cycles per millisecond. Therefore, to say that Lodash takes 140ms means that it is requiring 280,000,000 (280 million) cycles. That means that each of the ~100 lines of code that Lodash is executing takes roughly 2.8 million cycles (on average).
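If you want to sanity-check that arithmetic, it's only a few lines of math; the 2.0 GHz clock and the 140ms figure are the only inputs:

```javascript
// Back-of-the-envelope check of the numbers above (assumes a 2.0 GHz clock).
const cyclesPerMs = 2.0e9 / 1000;        // 2,000,000 cycles per millisecond
const totalCycles = 140 * cyclesPerMs;   // 280,000,000 cycles for a 140ms run
const cyclesPerLine = totalCycles / 100; // ~2.8 million cycles per line, on average
console.log({ cyclesPerMs, totalCycles, cyclesPerLine });
```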
Even given the inefficiency of JavaScript in the browser, that's crazy. Something else is going on.
The code they executed was a naive attempt at benchmarking, and it's a good object lesson in how NOT to do benchmarking, so I'm taking the educational opportunity to lay out the critiques. The improved file is below.
So, it starts with the instantiation of the data. This code looks entirely innocuous:
const users = [
{ 'user': 'barney', 'age': 36, 'active': true },
{ 'user': 'fred', 'age': 40, 'active': false },
{ 'user': 'pebbles', 'age': 1, 'active': true }
];
However, when it comes to benchmarking, you've got a problem. It's entirely possible that the JIT will encounter this
declaration, see the opening brace, and then skip to the closing brace without ever parsing or lexing the contents.
All the lexer really needs to know is that `users` is not `null`/`undefined`, is an array, and does not use lexically scoped variables, which can be determined in most cases without even parsing the contents of the array. Everything else is a runtime detail.
Even if it is parsed, the reification of the parsed content into actual user-space objects may well be delayed until later: the source code itself for those objects may be held by the JIT, and only turned into bytecode executions on demand. This is exactly what JIT means: Just In Time.
So, when is that code in the middle going to be processed? When it's first called -- which, in this case, is while we are timing Lodash. So Lodash's time also (potentially, depending on your runtime) includes the time to parse, lex, and reify the object. Once we get to the native version, that's already done, so the native doesn't pay that cost.
To fix this, we need to ensure that we've exercised the test data, touching all the pieces that our timing tests will touch.
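A minimal sketch of that warm-up, assuming the `users` array from above (the loop and property reads here are mine, not code from the original article):

```javascript
// Touch every piece of the test data that the timed code will touch, so any
// lazy parsing/reification cost is paid before the clock starts.
for (const user of users) {
  void user.user;
  void user.age;
  void user.active;
}
```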
In both of the original tests, they start the timer. Once the timer starts, they time the following things:
- Time to return from `new Date()` after capturing the result of the `gettime` system call (or equivalent).
- Time to parse/lex/reify the code backing the test.
- Time to execute the test (what we want).
- Time to perform `console.log` to display the result of the test.
- Time to instantiate another `new Date()`, including another `gettime` system call (or equivalent).
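To make the critique concrete, here is roughly the shape of the pattern being described. This is a sketch, not the original author's exact code; it assumes Lodash is installed and uses the `users` array from above:

```javascript
const _ = require('lodash');

const start = new Date();                               // gettime call #1
const result = _.find(users, { age: 1, active: true }); // first-ever run: parse/reify/optimize costs land here
console.log(result);                                    // console I/O is timed too
const end = new Date();                                 // gettime call #2
console.log(`lodash find took ${end - start}ms`);
```

Everything between the two `new Date()` calls sits inside the timed window, and only one of those lines is the thing we actually care about.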
The issue is that what we want to test is really, really fast. Everything surrounding the test itself takes a lot of time, which means there is a lot of noise clouding up a very slight signal. If the time for this noise was consistent between the two executions, that'd arguably be okay -- at least it would be an apples-to-apples comparison -- but given the variance introduced by JIT, GC, and a system call, I don't have any confidence that the noise is consistent, and the noise could easily overwhelm the signal.
So what we need to do is boost the signal. The simplest way to do this is to simply increase the number of times that we perform the operation. The amount that we have to boost the signal depends on the amount of noise in your particular environment and the amount of memory you're willing to commit to it (keeping in mind that memory allocation is yet another source of noise), so we'll make that a configuration `const` that we can play with.
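Here is a minimal sketch of what that looks like; `ITERATIONS` is a made-up name for the configuration const, and its value is just a starting point to tune for your own machine:

```javascript
// Repeat the operation enough times that the work itself dominates the noise.
// Uses the `users` array declared earlier.
const ITERATIONS = 1_000_000;

const start = Date.now();
let found;
for (let i = 0; i < ITERATIONS; i++) {
  found = users.find(u => u.age === 1 && u.active);
}
const elapsed = Date.now() - start;
console.log(`${ITERATIONS} iterations took ${elapsed}ms (~${elapsed / ITERATIONS}ms each)`);
```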
The last major concern about this test is hotspot-style optimizations. It's entirely possible that a runtime will spend additional effort optimizing bytecode that is executed often. This means that running a test once might only show you the entirely unoptimized time. To address this, along with ensuring that you aren't measuring the time to parse/lex the implementation, we need to run the test a few times before we start the clock.
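A sketch of that warm-up, with `WARMUP_RUNS` as another made-up configuration const: run the code under test a few times, untimed, so the engine has had a chance to parse and optimize it before you start the clock.

```javascript
// Untimed warm-up passes before the measured loop, using the `users` array
// declared earlier. WARMUP_RUNS is a tunable constant.
const WARMUP_RUNS = 1000;
for (let i = 0; i < WARMUP_RUNS; i++) {
  users.find(u => u.age === 1 && u.active);
}
// ...then start the timer and run the measured loop as above.
```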
Good read. It's always nice when venturing into the comments on an article actually leads to something valuable.