Stream processing frameworks provide programming abstractions that allow their users to express the desired time-dependent data analysis. The frameworks organize the computation as a directed graph of interconnected operators that perform event-wise transformations. We tackle the correctness question for programs expressed in this way. To this end, we model (possibly stateful) operators and define their composition, model data streams with time-stamps and watermarks, define reusable, modular operators, and prove their correctness in the Isabelle/HOL proof assistant, taking advantage of its advanced coinductive methods infrastructure. We demonstrate the usefulness of our model by verifying stream processing algorithms computing incremental histograms and relational joins.
Paper draft