Datacenter applications demand microsecond-scale service times and tightly bounded tail latency, and future workloads are expected to be even more demanding. To address this challenge, state-of-the-art runtimes employ theoretically optimal scheduling policies, namely a single request queue and strict preemption.

We present Concord, a runtime that demonstrates how forgoing this design, while still closely approximating it, enables a significant improvement in application throughput while maintaining tight tail-latency SLOs. We evaluate Concord on microbenchmarks and on Google's LevelDB key-value store; compared to the state of the art, Concord improves application throughput by up to 52% on microbenchmarks and by up to 83% on LevelDB, while meeting the same tail-latency SLOs. Unlike the state of the art, Concord is application-agnostic and does not rely on non-standard use of hardware, which makes it immediately deployable in the public cloud.
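To make "approximating strict preemption" more concrete, below is a minimal, hypothetical sketch in C (not Concord's actual code or API): instead of interrupting workers, a dispatcher sets a per-worker flag, and lightweight checks inserted into application code poll that flag and yield cooperatively. All identifiers here (preempt_requested, yield_to_dispatcher, preemption_check, handle_request) are illustrative assumptions.

```c
/* Hypothetical illustration of cooperative "approximate preemption":
 * the dispatcher sets a flag; periodic checks in application code poll
 * it and hand control back to the scheduler. Not Concord's actual code. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static _Atomic bool preempt_requested = false;  /* written by the dispatcher */

/* Stub for the runtime hook that returns control to the scheduler. */
static void yield_to_dispatcher(void) {
    atomic_store(&preempt_requested, false);
    puts("worker yielded; dispatcher can schedule the next request");
}

/* Cheap check that instrumentation would insert at regular intervals. */
static inline void preemption_check(void) {
    if (atomic_load_explicit(&preempt_requested, memory_order_relaxed))
        yield_to_dispatcher();
}

static void handle_request(int id) {
    for (int step = 0; step < 1000000; step++) {
        /* ... application work on the request ... */
        if (step % 4096 == 0)        /* bounded interval between checks */
            preemption_check();
    }
    printf("request %d done\n", id);
}

int main(void) {
    atomic_store(&preempt_requested, true);  /* simulate a dispatcher signal */
    handle_request(1);
    return 0;
}
```

The intuition behind such a design is that a relaxed load of a flag costs little in the common case, while the bounded interval between checks keeps the worst-case delay before yielding small; see the paper for Concord's actual mechanism and analysis.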

Concord is publicly available at https://github.com/dslab-epfl/concord.

Paper: Achieving Microsecond-Scale Tail Latency Efficiently with Approximate Optimal Scheduling. Rishabh Iyer, Musa Unal, Marios Kogias, George Candea. ACM Symposium on Operating Systems Principles (SOSP), Koblenz, Germany, October 2023.