Modern stream processing engines (SPEs) process large volumes of events propagated at high velocity through multiple queries. To improve performance, existing SPEs generally aim to minimize query output latency by minimizing, in turn, the propagation delay of events in query pipelines. However, for queries containing commonly used blocking operators such as windows, this scheduling approach can be inefficient. Watermarks are events popularly utilized by SPEs to correctly process window operators. Watermarks are injected into the stream to signify that no events preceding their timestamp should be further expected. Through the design and development of Klink, we leverage these watermarks to robustly infer stream progress based on window deadlines and network delay, and to schedule query pipeline execution that reflects stream progress. Klink aims to unblock window operators and to rapidly propagate events to output operators while performing judicious memory management. We integrate Klink into the popular open source SPE Apache Flink and demonstrate that Klink delivers significant performance gains over existing scheduling policies on benchmark workloads for both scale-up and scale-out deployments.
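As an illustration of the watermark mechanism described in the abstract, the following is a minimal sketch of how a watermark unblocks a window operator. It is not Klink's scheduler or Flink's API. The class `TumblingWindowSum`, its method names, and the choice of a sum aggregate are hypothetical, and the sketch assumes a watermark with timestamp `t` means no event earlier than `t` is still expected.

```python
from collections import defaultdict


class TumblingWindowSum:
    """Hypothetical blocking operator: sums values per fixed-size window.

    A window [start, start + size) is emitted only once a watermark
    shows that no earlier events can still arrive for it.
    """

    def __init__(self, size):
        self.size = size
        self.buffers = defaultdict(list)  # window start -> buffered values

    def on_event(self, timestamp, value):
        # Assign the event to the window that contains its timestamp.
        start = timestamp - (timestamp % self.size)
        self.buffers[start].append(value)

    def on_watermark(self, watermark):
        # Assumption: the watermark promises no further events with a
        # timestamp earlier than `watermark`, so every window whose end
        # is <= watermark is complete and can be emitted, oldest first.
        ready = sorted(s for s in self.buffers if s + self.size <= watermark)
        return [(s, sum(self.buffers.pop(s))) for s in ready]


op = TumblingWindowSum(size=10)
op.on_event(1, 5)
op.on_event(7, 3)
op.on_event(12, 4)
early = op.on_watermark(9)   # window [0, 10) is still open
fired = op.on_watermark(10)  # window [0, 10) is now complete
```

Until the second watermark arrives, the buffered events of window [0, 10) produce no output. This is the blocking behaviour that, according to the abstract, makes scheduling purely by event propagation delay inefficient.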
Publication details
2021, In Proceedings of the 2021 International Conference on Management of Data (SIGMOD/PODS '21), Pages 485-498
Klink: Progress-Aware Scheduling for Streaming Data Systems (04b Conference paper in proceedings volume)
Farhat Omar, Daudjee Khuzaima, Querzoni Leonardo
ISBN: 9781450383431
Research group: Distributed Systems