We’ll be in attendance at dbt’s upcoming Coalesce 2024 conference next week, and we look forward to seeing you there! Our very own Steffen Hausmann — Field Engineer at Materialize — will speak with Wolf Rendall — Director of Data Products at Vontive — about how to build real-time loan underwriting systems with dbt.
In the Vontive use case, the team needed to port over data models from PostgreSQL into Materialize in order to power real-time loan underwriting. With the spotlight on this use case, we wanted to highlight how to perform a migration from dbt-postgres to dbt-materialize in the following blog.
If you’re a PostgreSQL user, you’ve likely experienced the power of using dbt when managing your data models for analytics. However, as the complexities of your data models grow, running analytical queries directly on your transactional database often degrades the performance of your data pipelines. Materialize enables the efficient execution of complex queries on fresh, up-to-date data, all while reducing the load on your primary database.
In this guide, we’ll show you how to migrate your existing PostgreSQL dbt project to Materialize with minimal SQL tweaks. By offloading complex models to an operational data store (ODS) like Materialize, you can run high-performance analytics on real-time data without sacrificing consistency or speed.
Materialize’s dbt Adapter: Standard dbt + New Streaming Functionality
You can use dbt Core with Materialize by installing our dbt-materialize
plugin. dbt-materialize
brings you the best functionality of dbt, along with new functionality enabled by streaming. Here’s a step-by-step walkthrough on how to set up dbt-materialize
.
Step-by-Step Walkthrough: How to Install dbt-materialize
- Install the Materialize adapter: As a first step, you must install
dbt-materialize
anddbt-core
on your local environment. We recommend using a virtual environment for managing dependencies. Perform the following commands in your terminal:
python3 -m venv dbt-venv
source dbt-venv/bin/activate
pip install dbt-core dbt-materialize
You can only use the dbt-materialize
adapter with dbt Core. If you need dbt Cloud support, reach out to dbt Labs about prioritizing this functionality.
- Configure connection details to Materialize: Update your
profiles.yml
to connect to Materialize. This configuration will replace any existing database connection. Copy and paste the following:
default:
outputs:
dev:
type: materialize
host: <host>
user: <user>
password: <password>
port: 6875
database: materialize
schema: public
ssl: required
target: dev
- Test the connection: Verify that your connection to Materialize works by running the following command:
dbt debug
If the command is executed successfully, you should see a confirmation that dbt is connected to Materialize.
- Deploy a sample data model: to help you get started, the
dbt-materialize
adapter includes sample models to run the Materialize quickstart. You can create a sample project using:
dbt init mz_get_started
And then deploy it to your Materialize target environment using:
dbt run
Migrating Model Types: What You Need to Know
When migrating your dbt models from dbt-postgres
(PostgreSQL) to dbt-materialize
(Materialize), most model types stay the same. Only table and incremental models require adjustments. Read the entries below to understand the changes.
Change #1 - Tables Become Views with Indexes
In PostgreSQL, table materializations are often used to precompute complex logic, providing faster query performance. However, this comes at a cost: the results become increasingly stale over time. Each time the underlying data changes, your queries continue to rely on outdated results until you manually refresh the table.
In contrast, Materialize solves this problem by replacing tables with views that are indexed for real-time performance. These indexed views are incrementally updated as new data comes in, ensuring always up-to-date query outputs without needing to manually refresh them.
-- models/customer_revenue.sql
{{ config(materialized='view', indexes=[{'columns': ['customer_id']}]) }}
SELECT
customer_id,
SUM(order_total) AS total_revenue
FROM orders
GROUP BY customer_id;
By adding indexes to the views, you instruct Materialize to precompute and maintain these results in memory for fast query serving. This ensures that queries always run on the freshest data without the staleness issues typical of PostgreSQL.
Change #2 - Incremental Models Become Views with Indexes
One of the biggest advantages of switching to Materialize is the simplification of incremental models. In PostgreSQL, incremental models require complex logic to track new and updated records. In Materialize, you can eliminate that complexity entirely.
Imagine a model that tracks total sales for customers, where the underlying order data can be updated at any time. Below is an example of a typical complex incremental model in dbt-postgres
, where we manually handle timestamps and merge data to account for these updates:
-- models/customer_revenue_incremental.sql
{{ config(
materialized='incremental',
unique_key='customer_id'
) }}
WITH latest_orders AS (
SELECT *
FROM {{ source('public', 'orders') }}
WHERE updated_at > (SELECT COALESCE(MAX(updated_at), '1900-01-01'::timestamp) FROM {{ this }})
),
updated_customers AS (
SELECT
customer_id,
SUM(order_total) AS total_revenue
FROM latest_orders
GROUP BY customer_id
),
existing_customers AS (
SELECT
customer_id,
total_revenue
FROM {{ this }}
WHERE customer_id NOT IN (SELECT customer_id FROM updated_customers)
)
SELECT * FROM updated_customers
UNION ALL
SELECT * FROM existing_customers
In Materialize, there’s no need for complex incremental logic. You can simply define the model as a view with an index, and Materialize takes care of real-time updates behind the scenes, automatically handling inserts, updates, and deletes.
-- models/customer_revenue.sql
{{ config(materialized='view', indexes=[{'columns': ['customer_id']}]) }}
SELECT
customer_id,
SUM(order_total) AS total_revenue
FROM orders
GROUP BY customer_id;
This approach ensures that you always have real-time data, without needing to manually handle incremental logic or refresh schedules.
Change #3 - Materialized Views: Data Sharing, Complex Logic
In Materialize, materialized views support arbitrarily complex logic that goes beyond what PostgreSQL can handle. These views are similar to indexed views in Materialize, but they have a key difference. That is, materialized view results can be shared across clusters and the results can be exported to external systems, such as Kafka topics.
A cluster in Materialize is an isolated pool of compute resources dedicated to handling specific workloads, ensuring scalability and high performance. Think of clusters as separate environments within the same system, where each operates independently to process queries without interference. Sharing results between clusters is like sharing results across databases. Materialized views provide flexibility by allowing multiple clusters to access the same precomputed data, enabling efficient distribution and scaling of workloads.
Indexed views are ideal for fast access to precomputed results for frequent queries. Materialized views are best for scenarios where you need to share results across multiple clusters. With materialized views, you can make data available across clusters, ensuring that even large, distributed workloads can access the same up-to-date results.
For example, if you maintain a view of high-value customers and need this data for different workloads running on separate clusters:
-- models/high_value_customers.sql
{{ config(materialized='materialized_view') }}
SELECT
customer_id,
SUM(order_value) AS total_value
FROM orders
GROUP BY customer_id
HAVING SUM(order_value) > 1000;
This ensures that each cluster can access the same precomputed results, enhancing scalability and efficiency across your infrastructure.
Change #4 - Temporal Filters
In PostgreSQL, implementing time-based filtering often involves using the NOW()
function or similar expressions. Time-based filtering introduces a unique challenge because query results can change as time progresses, even if the underlying data remains static. Inexpertly using NOW()
in an indexed or materialized view could cause the system to reprocess every record at every moment, negating the benefits of incremental view maintenance.
Materialize addresses this with a feature called temporal filters, which helps optimize queries by filtering data based on timestamps. Temporal filters ensure that your query always returns the most relevant and up-to-date information, automatically excluding old or outdated data.
For most users migrating from PostgreSQL, this change is straightforward: simply replace NOW()
with mz_now()
in your models. While mz_now()
behaves similarly to NOW()
, it allows Materialize to efficiently evaluate the query, only recalculating data when necessary, ensuring efficient incremental updates without overloading the system.
For example, PostgreSQL, you might use the NOW()
function to filter orders from the last 24 hours like this.
{{ config(materialized='table') }}
SELECT
order_id,
customer_id,
order_total,
order_date
FROM orders
WHERE order_date + INTERVAL ‘24 hours’ >= NOW();
In Materialize, you can achieve the same logic with temporal filters using mz_now()
.
{{ config(materialized='view', indexes=[{'columns': ['order_id']}]) }}
SELECT
order_id,
customer_id,
order_total,
order_date
FROM orders
WHERE order_date + INTERVAL ‘24 hours’ >= mz_now();
Port Models from dbt-postgres
and dbt-materialize
with Ease
You can port your models from dbt-postgres
to dbt-materialize
easily, requiring minimal changes. This is often one of the first steps a customer takes when they start working with Materialize. We’ve seen countless successful ports performed, allowing our customers to productionize their real-time use cases rapidly.
With your dbt models now live in Materialize, you can elevate your workflow by integrating advanced development practices:
Unit Testing: Ensure the integrity of your data models by adding tests that validate SQL logic before deployment, minimizing the risk of errors.
Blue/Green Deployments: Achieve zero-downtime updates by alternating between environments, allowing you to deploy new models without disrupting services.
CI/CD Integration: Streamline your deployment pipeline with automated CI/CD workflows, combining them with blue/green strategies for continuous, reliable rollouts.
To start your dbt-postgres
migration, sign up for a free trial of Materialize now and leverage our dbt adapter to easily port your SQL logic over.
And come visit us at the Coalesce 2024 conference next week to watch Steffen Hausmann, Field Engineer at Materialize, talk with Wolf Rendall of Vontive about creating real-time loan origination systems.