Materialize provides a SQL interface to work with continually changing data. You type SQL queries, and we maintain the queries incrementally, offering fast access to results. If you’re used to stream processors, Materialize offers temporal filters to present a time-based window into the data.
Support for temporal filters isn’t new, but we recently addressed some concerns around their resource utilization. In this blog, I’ll explain what makes maintaining temporal filters expensive and how we mitigate some of the cost.
What are temporal filters?
I’ll refer to Frank’s excellent introduction to temporal filters, so here’s just a quick recap. Assuming you have data that encodes event time, you can use temporal filters to define a time-based window on top of it. Let’s define a table that encodes timestamped ticket sales.
CREATE TABLE ticket_sales (
    content text,
    sale_ts mz_timestamp
);
The content column describes a sales event, and sale_ts plays the role of a timestamp, indicating the moment of the sale. This is just data, and Materialize will happily return all of it when selecting from the table. Note that I’m using the type mz_timestamp here, but we could use any other type that casts to it, too.
To make things more interesting, we ask Materialize to only return the sales that happened in the last 24 hours, using its logical time (ticket_sales is still empty, so the query won’t return any data yet):
SELECT content
FROM ticket_sales
WHERE mz_now() >= sale_ts
AND mz_now() < (sale_ts::timestamp + '24 hours'::interval)::mz_timestamp;
This query will change its result over time, not just for changes to the ticket_sales table, but also as Materialize’s system clock advances.
The mz_now() function returns the logical time at which the statement was executed, and can be used in WHERE conditions to limit the working dataset of ad-hoc SELECT queries, (materialized) views, and indexes. This is how you define a temporal filter in Materialize!
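Conceptually, that WHERE clause makes each row visible during the half-open interval from sale_ts to sale_ts plus 24 hours. Here’s a small Python sketch of the visibility check (illustrative only, not Materialize code; the function name is made up):

```python
HOUR_MS = 60 * 60 * 1000  # mz_timestamp values are milliseconds since the epoch

def visible(sale_ts_ms: int, now_ms: int) -> bool:
    # Mimics: mz_now() >= sale_ts AND mz_now() < sale_ts + '24 hours'
    return sale_ts_ms <= now_ms < sale_ts_ms + 24 * HOUR_MS

sale = 1736755846704  # a sale_ts, as in the example
print(visible(sale, sale + 1 * HOUR_MS))   # → True: inside the 24-hour window
print(visible(sale, sale + 25 * HOUR_MS))  # → False: the window has passed
```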
Let’s take a closer look at how this all works under the hood.
What makes temporal filters tick?
I’ll make this more concrete by continuing with the above example. We’ll now insert some data into the ticket_sales table:
INSERT INTO ticket_sales VALUES ('hello', now()::mz_timestamp);
This inserts a ticket sale with a content value of hello, along with its sale timestamp (sale_ts). A select reveals the data:
SELECT content, sale_ts FROM ticket_sales;
content | sale_ts
---|---
hello | 1736755846704
You might not find this particularly surprising as we’ve just inserted said data, but now we’ll define a temporal filter on this data:
SELECT content, sale_ts
FROM ticket_sales
WHERE mz_now() >= sale_ts
AND mz_now() < (sale_ts::timestamp + '24 hours'::interval)::mz_timestamp;
content | sale_ts
---|---
hello | 1736755846704
Unless you moved away from your computer for more than a day between the two queries, you should get the same result, because the current time is still within the sale_ts plus 24 hours bounds.
Next, let’s insert a ticket sale in the future. It should not appear in the select with the temporal filter because it’s too far out:
INSERT INTO ticket_sales VALUES ('future', (now() + '20 days')::mz_timestamp);
And repeating our select with a temporal filter still only produces the ticket sale with a content value of hello.
Materialize expresses data as updates in the form of (data, time, diff), representing an update at a time with a change in multiplicity. Think of the diff representing an insert (>0) or a delete (<0). The time represents the moment when Materialize learned about a change.
In the above example, the temporal filter turns data into a pair of updates, similar to the following table:
mz_timestamp | mz_diff | content | sale_ts
---|---|---|---
1736756302896 | 1 | hello | 2025-01-13 00:00:00+00
1736812800000 | -1 | hello | 2025-01-13 00:00:00+00
1738454400000 | 1 | future | 2025-02-02 00:00:00+00
1738540800000 | -1 | future | 2025-02-02 00:00:00+00
Some of the updates occur at the current time (the insertion of the hello sale event), but others occur in the future, such as the retraction of the hello sale event and the addition and retraction of the future sale event.
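To make the compilation step concrete, here’s a Python sketch of how a temporal filter could turn one row into such an insert/retract pair (a conceptual model with made-up names, not Materialize’s actual implementation): the insertion lands at the later of the learn time and sale_ts, and the retraction at sale_ts plus 24 hours.

```python
HOUR_MS = 60 * 60 * 1000  # mz_timestamp values are milliseconds since the epoch

def temporal_filter_updates(data, sale_ts_ms, learn_ts_ms):
    """Turn one row into a (data, time, diff) insert/retract pair for the
    window [sale_ts, sale_ts + 24 hours)."""
    insert_at = max(learn_ts_ms, sale_ts_ms)  # the row becomes visible
    retract_at = sale_ts_ms + 24 * HOUR_MS    # the window closes
    return [(data, insert_at, +1), (data, retract_at, -1)]

learned = 1736756302896  # the moment Materialize learned about both rows
print(temporal_filter_updates("hello", 1736726400000, learned))
# → [('hello', 1736756302896, 1), ('hello', 1736812800000, -1)]
print(temporal_filter_updates("future", 1738454400000, learned))
# → [('future', 1738454400000, 1), ('future', 1738540800000, -1)]
```

The printed pairs match the rows in the table above: hello is already inside its window, so it appears at the learn time; future only appears once its sale_ts arrives.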
Each relation has a time that marks what data is definite, and when you query a relation, we will only reveal data up to this time. Data in the future can still change, which is why Materialize cannot surface it, i.e., a pending future addition could cancel an equivalent future retraction.
Expiring future updates
Materialize supports a concept of arrangements, which you can think of as an in-memory index of data spanning a time range, ending in the current time that’s known to be definite. Specifically, it only represents data that is definite. While we might know about future updates, we cannot insert them into the arrangement yet because they might not be definite.
Instead, the arrangement stages them in a separate area that we scan whenever we learn that the time for definite data has changed. At that point, Materialize scans the pending data and extracts the newly definite updates while leaving indefinite future updates in place. This process is what causes resource utilization proportional to the number of outstanding updates: we need to retain the updates in memory, and spend CPU cycles on periodically scanning them.
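As a rough mental model, the staging area behaves like the toy Python class below (the names are made up, and the real arrangement is a far more sophisticated data structure). Note how every frontier advance rescans all pending updates:

```python
class ToyArrangement:
    """Toy model: definite data plus a staging area for future updates."""

    def __init__(self):
        self.definite = []  # updates at or before the frontier
        self.pending = []   # future updates, rescanned on every advance
        self.frontier = 0

    def insert(self, data, time, diff):
        target = self.definite if time <= self.frontier else self.pending
        target.append((data, time, diff))

    def advance_to(self, frontier):
        # The cost described above: `pending` is held in memory, and every
        # advance scans it in full to extract the newly definite updates.
        self.frontier = frontier
        newly_definite = [u for u in self.pending if u[1] <= frontier]
        self.pending = [u for u in self.pending if u[1] > frontier]
        self.definite.extend(newly_definite)

arr = ToyArrangement()
arr.insert("hello", 5, +1)
arr.insert("hello", 100, -1)  # a far-future retraction stays pending...
arr.advance_to(10)
print(arr.definite)  # → [('hello', 5, 1)]
print(arr.pending)   # → [('hello', 100, -1)] ...and is rescanned every time
```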
This is where an optimization we recently implemented comes into play: for some objects, we know that their definite time roughly correlates with the wall clock advancing. This implies that updates at a specific moment in the future can only become definite at that time. We plan restarts of Materialize environments ahead of time, which allows us to reason about which updates might be revealed in an index, and which updates will definitely not be revealed until the next restart.
We determine a timestamp for each replica that is some time after the next scheduled restart, and simply discard updates in the future of this timestamp. We ensure correctness by preventing times in the future of the expiration timestamp from becoming definite.
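A Python sketch of that cutoff (hypothetical names, and it simplifies the real mechanism): updates staged beyond the expiration timestamp can simply be dropped, because the replica is guaranteed to restart, and rebuild its state from scratch, before any time past that point becomes definite.

```python
def expire_updates(pending, expiration_ts):
    """Drop staged future updates at or beyond the expiration timestamp.

    Only safe because the replica restarts (and rehydrates its state)
    before `expiration_ts` can become definite."""
    return [(data, time, diff)
            for (data, time, diff) in pending
            if time < expiration_ts]

pending = [
    ("hello", 1736812800000, -1),   # retraction due tomorrow: keep
    ("future", 1738540800000, -1),  # retraction due in weeks: drop
]
print(expire_updates(pending, 1737000000000))
# → [('hello', 1736812800000, -1)]
```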
What objects support expiration?
The expiration feature assumes that the time of updates correlates with wall-clock time, which is true for a subset of the objects Materialize supports. If you create tables and sources, the feature applies transitively through downstream dependencies. We currently do not support load generators because their mapping of wall-clock time to definite times is implementation-specific.
Some objects are valid for all times and thus can be queried at all times (subscribes support AS OF/UP TO to carve out a user-supplied time range). We don’t enable expiration for such objects.
Experiencing it
We enabled the feature across all Materialize environments. We observed that it dropped overall memory utilization by between 20% and 50%, with no impact on availability or correctness!
But this doesn’t mean we’ve fully solved the underlying problem. We still need to frequently revisit indefinite updates, although likely a much smaller number of them. We have some ideas for how to solve the issue, but it is a fundamentally difficult problem because times are only partially ordered. Also, this feature is not yet supported for self-managed deployments due to its dependency on scheduled restarts.
Appendix: Showing updates for constant collections
Constant collections have the nice property that they’re defined for all times. Subscribing to a select over a constant collection allows us to see the updates Materialize applies over time (enable show diffs in the console):
SUBSCRIBE TO (
SELECT content, sale_ts::timestamptz
FROM (
VALUES
(
'hello',
'3024-12-17'::mz_timestamp
),
(
'future',
('3024-12-17'::timestamp + '20 days')::mz_timestamp
)
) AS ticket_sales(content, sale_ts)
WHERE mz_now() >= sale_ts AND mz_now() < (sale_ts::timestamp + '24 hours'::interval)::mz_timestamp
);
This query results in similar updates to what I’m presenting above.