important
Since the writing of this blog post, recursive CTEs have been stabilized and are generally available to all Materialize users. Ready to dive in? Sign up for a free trial.
Materialize excels at incrementally maintaining up-to-date results of SQL queries as source data changes over time. Today, we introduce support for recursive SQL queries, allowing you to express and run iterative computations that are maintained incrementally.
SQL’99 introduced the very useful common table expressions (CTEs).
A common table expression allows you to use the WITH
clause to name some expressions and then use them multiple times in your query, without resorting to copy/paste.
Recursive CTEs go a step further and allow you to write queries with CTEs that reference themselves, allowing you to formulate queries that perform the same computation until convergence.
Many important problems require iterative computation and can be handled entirely in the database instead of the application layer with the help of recursive CTEs. Let’s look at an example of what we are releasing today and discuss when it might be useful to you.
Recursive CTEs in Materialize
The WITH MUTUALLY RECURSIVE
flavor of recursive CTEs that is now publicly available in Materialize remains unchanged from our original proposal.
The blog post explains why and how we decided to depart from the SQL’99 standard.
To illustrate how recursive CTEs work in Materialize, consider the following high-level definition of the transitive closure T(x int, y int)
of a binary relation R(x int, y int)
:
- For every combination of values
(x,y)
: if(x,y)
is inR
, then(x,y)
must be also inT
(base case). - For every combination of values
(x,y,z)
: if(x,y)
and(y,z)
are both inT
, then(x,z)
must be also inT
(recursive case).
Translating this definition into a recursive SQL query in Materialize is a straight-forward mapping where each case corresponds to exactly one UNION
branch in the recursive definition of T
:
WITH MUTUALLY RECURSIVE
T(x int, y int) AS (
SELECT x, y FROM R
UNION
SELECT x, z FROM T as t1(x, y) JOIN T as t2(y, z) USING(y)
)
SELECT * FROM T
Recursive CTEs in Materialize are evaluated as if the CTE contents are fully replaced by the result of the CTE definition in every iteration until they stop changing. As a developer you can just assume this intuitive “bulk update” mental model when you formulate your queries. As usual, under the hood Materialize will maintain the computation incrementally — even across iterative updates of your recursive CTE definitions: we will do the heavy-lifting for you as we compile and execute your SQL queries.
Unlike SQL’99 WITH RECURSIVE
queries, WITH MUTUALLY RECURSIVE
accepts any valid SELECT
query as a recursive CTE definition.
For example, the above query is not accepted by Postgres because the recursive CTE T
is referenced twice in the recursive case.
A compatible formulation is possible, but might take longer to converge.
Expressing a recursive CTE that is compliant with the SQL’99 standard is often a challenge.
Because WITH MUTUALLY RECURSIVE
does not impose syntactic constraints on your recursive CTEs, this process is much easier in Materialize.
All details about the syntax and semantics of recursive WITH MUTUALLY RECURSIVE
blocks as well as some practical examples can be found in the reference docs.
Adoption path
Now that you’ve learned about the distinct flavor of recursive CTEs that Materialize offers, you might have some practical questions before you give them a shot.
How to identify use cases for recursive CTEs?
Use cases for recursive CTEs usually arise naturally in situations where the following conditions are met:
- Your data can be viewed or organized as a network (a graph) or as a hierarchy (a tree).
- You want to compute information that explains your data within the context of the above structure.
- Your business can benefit from having always-fresh, accurate, and consistent results of the above computation.
How to write recursive CTEs?
WITH MUTUALLY RECURSIVE
definitions can be freely composed:
you can chain them sequentially or even nest them in each other — much like the loop constructs of any other programming language!
As a consequence, algorithms defined in terms of one or more (possibly nested) loops that update collections of data points until convergence can be mapped almost directly to WITH MUTUALLY RECURSIVE
blocks and recursive CTEs.
The original blog post and the examples in our reference docs provides further examples and guidance on writing and debugging recursive CTEs in Materialize.
How are results maintained incrementally?
What is the secret sauce that makes recursive CTEs tick incrementally in Materialize? The answer is — it’s the very same secret sauce that we use elsewhere! The Materialize runtime is built on top of Differential Dataflow — a programming framework that allows users to incrementally maintain computations changing both over time and over multiple iterations. Differential Dataflow has always been capable of efficient incremental maintenance of iterative dataflows — recursive CTEs are the mechanism by which we expose this power to Materialize users!
Summary
Recursive CTEs allow you to express iterative computations using the WITH MUTUALLY RECURSIVE
CTE syntax.
Materialize maintains the results of these computations incrementally, keeping your results up to date as data comes in from your source systems.
Check out our reference docs on recursive CTEs if you are interested in getting started and trying them out yourself.