Specifically, a customer has a server that suffers from unnecessary internal congestion under load. By enforcing concurrency limits, this middleware sidesteps thread-based problems (starvation, excessive context switching) and improves overall performance.
- greatly increases performance in certain scenarios
- low or no configuration
- works despite hardware differences
- works despite workload differences
- queueing helps only specific scenarios, but helps massively
- customers should test throughput under load, ideally on a real server; then make an informed decision
- AVOID: customers refreshing on localhost and saying "yeah it doesn't feel faster"
- AVOID: customers adding this to every project because "hey it seems useful"
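The core idea above (cap concurrent work, queue the rest) can be sketched as follows. The actual middleware is .NET; this is an illustrative Python sketch, and all names (`ConcurrencyLimiter`, `try_enter`, `exit`) are assumptions, not the real API:

```python
import threading
from collections import deque

class ConcurrencyLimiter:
    """Illustrative sketch: admit at most max_concurrent requests at once;
    queue the rest instead of letting more threads contend for the server."""

    def __init__(self, max_concurrent):
        self._lock = threading.Lock()
        self._active = 0
        self._max = max_concurrent
        self._waiting = deque()  # events for queued requests (FIFO here)

    def try_enter(self):
        # Returns once the request may run; queues and blocks if at capacity.
        with self._lock:
            if self._active < self._max:
                self._active += 1
                return True
            event = threading.Event()
            self._waiting.append(event)
        event.wait()  # block until a running request exits and hands over its slot
        return True

    def exit(self):
        with self._lock:
            if self._waiting:
                # Hand the slot directly to a queued request (active count unchanged).
                self._waiting.popleft().set()
            else:
                self._active -= 1
```

A real implementation would be async rather than thread-blocking, but the slot-handoff in `exit` shows why queueing avoids waking extra threads just to have them contend.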
- queueing a request allocates ~1 KB
- under load this adds up, resulting in GC pressure and slow collections
- on the plaintext benchmark, queueing results in
  - 4% throughput overhead
  - 22% memory overhead
- this fix eliminates most of the memory overhead
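One common way to eliminate per-request allocation is to pool and reuse queue entries. A minimal sketch, assuming a hypothetical `QueueEntry` holding the per-request queue state (this is illustrative, not the actual fix in the middleware):

```python
from collections import deque

class QueueEntry:
    """Hypothetical per-request queue state (the ~1 KB allocation)."""
    def __init__(self):
        self.reset()

    def reset(self):
        self.request = None

class EntryPool:
    # Reuse entry objects so each queued request doesn't allocate
    # fresh state, removing most of the GC pressure.
    def __init__(self):
        self._free = deque()

    def rent(self):
        return self._free.pop() if self._free else QueueEntry()

    def return_entry(self, entry):
        entry.reset()
        self._free.append(entry)
```

After warm-up, steady-state queueing then allocates nothing: entries cycle between the queue and the pool.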
- the LIFO strategy is a clear improvement (under load: same throughput, 20x better latency)
- however, LIFO can result in degenerate cases where one request gets "bobbled" (repeatedly pushed to the back and never served)
- the fix (that Facebook uses) is running a FIFO "sliplane" to guarantee timely entry when load is low
- reduces request variance
- not sure of the numbers, this would require research and scenario development
- make sure to look at full latency distributions (and uncompleted requests), not just averages
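The LIFO-with-sliplane idea can be sketched as a queue that serves the newest request by default but promotes the oldest one once it has waited too long. Class name, `max_wait` threshold, and structure are all illustrative assumptions, not the actual design:

```python
import time
from collections import deque

class LifoWithSliplane:
    """Illustrative sketch: LIFO for best latency under load, plus a FIFO
    "sliplane" that promotes the oldest waiter once it has waited longer
    than max_wait, so no request is bobbled indefinitely."""

    def __init__(self, max_wait=0.5):
        self._items = deque()  # (enqueue_time, request), newest at the right
        self._max_wait = max_wait

    def enqueue(self, request):
        self._items.append((time.monotonic(), request))

    def dequeue(self):
        if not self._items:
            return None
        oldest_time, _ = self._items[0]
        if time.monotonic() - oldest_time > self._max_wait:
            return self._items.popleft()[1]  # sliplane: oldest request goes first
        return self._items.pop()[1]          # normal path: newest request (LIFO)
```

When load is low the sliplane rarely fires (requests don't wait long), so behavior converges to plain LIFO with bounded worst-case wait.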
- rps varies wildly with MaxConcurrentRequests; setting a proper level is crucial - ideally the middleware adjusts automatically, with little to no user input
- otherwise, customers must be educated and encouraged to tweak it themselves
- within reasonable MaxConcurrentRequests values, throughput can vary up to 2.5x on the same scenario - this makes it easier for customers to test whether queueing actually helps their website
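Automatic adjustment could take the form of a simple hill-climber: nudge the limit, keep the direction if measured throughput improved, reverse otherwise. This is a sketch of one possible approach, not the middleware's actual tuning algorithm; every name and parameter here is an assumption:

```python
class AdaptiveLimit:
    """Illustrative hill-climbing sketch for tuning a concurrency limit
    from periodic throughput (rps) measurements."""

    def __init__(self, initial=8, lo=1, hi=256):
        self.limit = initial
        self._lo, self._hi = lo, hi
        self._last_rps = 0.0
        self._step = 1

    def observe(self, rps):
        # If the last adjustment hurt throughput, reverse direction.
        if rps < self._last_rps:
            self._step = -self._step
        self._last_rps = rps
        self.limit = max(self._lo, min(self._hi, self.limit + self._step))
```

A production version would need smoothing (rps is noisy) and should weigh latency, not just throughput, per the note above about distributions.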
- can this middleware improve real servers, not just test scenarios?
- what ease of use problems do customers actually hit?
- may be difficult to find a willing customer with relevant problems
- still useful to prove negatives (i.e. that it won't help in some specific case)
- can be worked on in parallel with other projects