
PostgreSQL Materialized Views: When Caching Your Query Results Makes Sense (And When It Doesn’t)

The Pain and the Real Constraint

Your dashboard queries are timing out at 30 seconds. Your BI tool is showing spinners. Your users are refreshing the page, wondering if something’s broken.

You’ve indexed everything. You’ve tuned shared_buffers. You’ve rewritten the query three times. The problem isn’t bad SQL – it’s that you’re forcing PostgreSQL to aggregate, join, and scan millions of rows every single time someone opens that report.

Here’s a clear stance: repeated heavy computations are a design choice, not a badge of honour. If you’re running the same expensive calculation dozens of times a day, you’re choosing to do more work than necessary.

This post shows you how to turn one expensive query shape into a fast, indexed object with explicit freshness and operational control. Materialized views give you predictable reads when you’re willing to accept a refresh contract.

What a Materialized View Actually Is (and What It Is Not)

Definition in Plain Words

A materialized view is a physical relation that stores the result set of a query.

When you create one, PostgreSQL runs your query, writes the output to disk, and keeps it there until you tell it to refresh. That’s it. No magic. Just a snapshot you control.

Compare the Three Common Patterns

Let’s be precise about what you’re choosing:

View: Computed at read time, always current. PostgreSQL rewrites your query against the underlying tables every time. Zero staleness, full computation cost on every read.

Materialized view: Computed at refresh time, fast reads. You decide when to refresh. Reads are fast and predictable because they’re hitting stored data. Staleness is explicit and bounded by your refresh schedule.

Summary table: You own the update pipeline. Whether it’s ETL jobs, application code, or triggers—you’re writing the insert/update logic and managing incremental changes yourself.

Why the “Physical” Part Matters

Because it’s a physical relation, you can:

  • Index it like any table
  • Let the planner treat it like a table (predictable execution plans)
  • Pay for storage and refresh work in exchange for making reads fast and deterministic

You’re trading computation-on-read for computation-on-schedule. That’s the contract.

When Materialized Views Are a Strong Fit

The Best-Fit Workload Shapes

Materialized views work best when:

  1. Repeated reporting queries with stable patterns hit the same aggregations (BI dashboards, executive summaries, weekly rollups)
  2. Heavy joins and aggregations run across large tables that don’t change second-by-second
  3. Precomputed metrics are “fresh enough” on a schedule your business can accept

If your query shape is stable and your freshness requirement is measured in minutes or hours (not milliseconds), materialized views are worth evaluating.

A Concrete Motivating Example

Let’s use an e-commerce order revenue summary. You’re joining:

  • orders (10M rows)
  • order_items (40M rows)
  • products (500K rows)
  • customers (2M rows)

Your query aggregates revenue by product category, customer region, and week. It’s grouped, filtered by tenant, and sorted by revenue descending.

Baseline symptoms:

  • Execution time: 28 seconds
  • The query runs 40+ times per day (dashboard loads, exports, API calls)
  • Users complain, BI tool times out, someone opens a ticket

You know the query is expensive. The question is whether you want to keep paying that cost every single time.

Build It Step-by-Step (Copy/Paste SQL)

Start with the Baseline Query

Here’s the query you want to stop recomputing:

SELECT 
   p.category,
   c.region,
   DATE_TRUNC('week', o.order_date) AS week,
   COUNT(DISTINCT o.order_id) AS order_count,
   SUM(oi.quantity * oi.unit_price) AS total_revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.tenant_id = 'acme_corp'
 AND o.order_date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY p.category, c.region, DATE_TRUNC('week', o.order_date)
ORDER BY total_revenue DESC;

Execution: 28 seconds. Buffers: scanning 80GB+ across four tables.

Create the Materialized View Safely

CREATE MATERIALIZED VIEW mv_order_revenue_summary AS
SELECT
   o.tenant_id,
   p.category,
   c.region,
   DATE_TRUNC('week', o.order_date) AS week,
   COUNT(DISTINCT o.order_id) AS order_count,
   SUM(oi.quantity * oi.unit_price) AS total_revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY o.tenant_id, p.category, c.region, DATE_TRUNC('week', o.order_date)
WITH NO DATA;

Why WITH NO DATA?

It creates the structure without populating it immediately. This gives you control—you can add indexes first, then populate in a maintenance window. It’s a cleaner rollout. Keep in mind that the first population must be a plain (non-concurrent) REFRESH, because CONCURRENTLY requires an already-populated view.
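
Until that first refresh, any query against the view errors out rather than returning empty results, so the state is obvious. You should see something along these lines:

SELECT count(*) FROM mv_order_revenue_summary;
-- ERROR:  materialized view "mv_order_revenue_summary" has not been populated
-- HINT:  Use the REFRESH MATERIALIZED VIEW command.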

Populate It

REFRESH MATERIALIZED VIEW mv_order_revenue_summary;

First refresh: 4.2 seconds. That’s the cost you’ll pay each time you refresh.
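
A quick sanity check against the built-in catalogs confirms the view is populated and shows how much disk the snapshot occupies:

-- Population status and indexing flags for the MV
SELECT matviewname, ispopulated, hasindexes
FROM pg_matviews
WHERE matviewname = 'mv_order_revenue_summary';

-- Total on-disk size of the snapshot, including its indexes
SELECT pg_size_pretty(pg_total_relation_size('mv_order_revenue_summary')) AS mv_size;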

Index It Like a Production Object

Here’s where most performance wins happen. Your materialized view is a table—treat it like one.

-- Filter columns (tenant, date bucket)
CREATE INDEX idx_mv_revenue_tenant_week
ON mv_order_revenue_summary(tenant_id, week);

-- Common grouping dimensions
CREATE INDEX idx_mv_revenue_category
ON mv_order_revenue_summary(category);

CREATE INDEX idx_mv_revenue_region
ON mv_order_revenue_summary(region);

Critical guidance: Index design should match the read patterns of the MV consumers, not the base tables. Ask yourself: how will people query this snapshot?
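
Once the view has been serving traffic for a while, you can also check which indexes its consumers actually use and drop the dead weight – a quick look at the statistics views:

-- idx_scan of 0 after weeks of real traffic marks a candidate for removal
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'mv_order_revenue_summary'
ORDER BY idx_scan DESC;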

Now your dashboard query becomes:

SELECT category, region, week, order_count, total_revenue
FROM mv_order_revenue_summary
WHERE tenant_id = 'acme_corp'
 AND week >= CURRENT_DATE - INTERVAL '90 days'
ORDER BY total_revenue DESC;

Execution: 180 milliseconds. Index scan, no joins, no aggregation.

You’ve turned a 28-second computation into a 180ms index lookup.

Refresh Strategies (and How to Choose)

Full Refresh (Simple, Predictable)

REFRESH MATERIALIZED VIEW mv_order_revenue_summary;

This rewrites the entire materialized view. While it’s running, reads are blocked. The view is locked until the refresh completes.

Operational notes:

  • Schedule off-peak if your refresh takes more than a few seconds
  • Treat it as a batch job with a runtime budget (see the sketch below)
  • Monitor duration as data grows
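
One way to enforce that runtime budget is a session-level statement_timeout around the refresh. A minimal sketch – the 10-minute cap is an arbitrary assumption, not a recommendation:

-- Abort the refresh if it blows past the budget (value is illustrative)
SET statement_timeout = '10min';

REFRESH MATERIALIZED VIEW mv_order_revenue_summary;

-- Restore the session default afterwards
RESET statement_timeout;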

Concurrent Refresh (Keeps Reads Available)

REFRESH MATERIALIZED VIEW CONCURRENTLY mv_order_revenue_summary;

This builds the new result set in a temporary table, then applies the differences to the existing view. Reads stay available throughout.

Requirements:

  • A unique index (with no WHERE clause) that reflects the logical uniqueness of MV rows
  • A view that is already populated – the first refresh after WITH NO DATA has to be a plain REFRESH

For our example:

CREATE UNIQUE INDEX idx_mv_revenue_unique 
ON mv_order_revenue_summary(tenant_id, category, region, week);

Trade-offs (framed constructively):

  • Higher refresh overhead (PostgreSQL does more work to build and merge)
  • You need to design for uniqueness and accept slightly longer refresh times

I believe concurrent refresh is worth it when your materialized view serves user-facing queries and a 4-second lock would be visible.

Freshness as a Contract (What Teams That Do Well Always Define)

Define “fresh enough” in business terms:

  • Every 5 minutes for near-real-time dashboards
  • Hourly for internal reporting
  • Daily for executive summaries

Define operational SLOs:

  • Max refresh runtime: 10 seconds
  • Acceptable staleness window: up to 1 hour

Make staleness visible. Users should know when the data was last refreshed. We’ll cover how in the observability section.

Scheduling Refreshes (Cron, pg_cron, K8s, Managed Cloud)

One Scheduler Owns Refresh

Avoid multiple sources triggering refresh. Choose one mechanism and stick with it.

Scheduling Options

OS cron + psql:

# /etc/cron.d/refresh-mv
0 * * * * postgres psql -d production -c "REFRESH MATERIALIZED VIEW CONCURRENTLY mv_order_revenue_summary;"

pg_cron (when available):

SELECT cron.schedule('refresh-revenue-mv', '0 * * * *', 
  $$REFRESH MATERIALIZED VIEW CONCURRENTLY mv_order_revenue_summary$$);
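
pg_cron also keeps a run history, which you can lean on later for observability – assuming a pg_cron version recent enough to ship the cron.job_run_details table:

-- Recent runs of scheduled jobs: status, timing, and any error message
SELECT jobid, status, return_message, start_time, end_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 10;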

Kubernetes CronJob:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: refresh-mv-revenue
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: refresh
            image: postgres:16
            command:
            - psql
            - -c
            - "REFRESH MATERIALIZED VIEW CONCURRENTLY mv_order_revenue_summary;"
          restartPolicy: OnFailure

App-driven refresh (only when you control concurrency and backoff):

Use this sparingly. Application-triggered refreshes can lead to refresh storms if you’re not careful.

Managed Cloud Realities (Practical Notes)

RDS, Azure Database for PostgreSQL, and Cloud SQL have different extension policies. pg_cron might not be available, or you might not have permissions to install it.

In that case, scheduling often moves to:

  • Cloud Scheduler (GCP)
  • EventBridge (AWS)
  • Azure Automation
  • Kubernetes CronJobs in the same environment

These work fine. The important part is having one source of truth for your refresh schedule.

Prevent Overlapping Refresh Runs

If your refresh takes 6 minutes and you schedule it every 5 minutes, you’ll have multiple refresh jobs competing.

Advisory lock pattern (conceptual):

DO $$
BEGIN
  IF pg_try_advisory_lock(12345) THEN
    -- Plain refresh here: CONCURRENTLY refuses to run inside a DO block
    -- (it cannot run within a transaction block).
    REFRESH MATERIALIZED VIEW mv_order_revenue_summary;
    PERFORM pg_advisory_unlock(12345);
  ELSE
    RAISE NOTICE 'Refresh already running, skipping';
  END IF;
END $$;

Refresh starts only if the lock is acquired, so delayed jobs skip instead of piling up. Note that REFRESH MATERIALIZED VIEW CONCURRENTLY refuses to run inside a DO block, which is why the block above uses a plain refresh – a psql-level variant that keeps the concurrent refresh is sketched below.
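
If you want to keep the concurrent refresh, move the guard out of the DO block and into the psql script your scheduler runs. A sketch assuming psql 10 or newer (for \if) and the same arbitrary lock key:

-- refresh_mv.sql, invoked by the scheduler as: psql -d production -f refresh_mv.sql
-- Take the advisory lock; skip the refresh if another run already holds it.
SELECT pg_try_advisory_lock(12345) AS got_lock \gset
\if :got_lock
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_order_revenue_summary;
SELECT pg_advisory_unlock(12345);
\else
\echo Refresh already running, skipping
\endif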

Observability: Measure Refresh Cost and Staleness

Before/After Proof (What to Show in the Post)

Let’s put real numbers on this:

Baseline query time: 28 seconds
MV query time: 180 milliseconds
Refresh overhead: 4.2 seconds
Refresh cadence: Every hour

You’re doing 4.2 seconds of work every hour – roughly 100 seconds of refresh time per day – to save 28 seconds on each of 40+ queries per day, which is over 1,100 seconds of query time. The math works.

EXPLAIN (ANALYZE, BUFFERS) for baseline:

GroupAggregate  (cost=2847392.18..2847395.32 rows=1 width=89) (actual time=27823.445..27823.451 rows=156 loops=1)
 Buffers: shared hit=9234 read=1847234
  ->  Sort  (cost=2847392.18..2847392.68 rows=200 width=57) (actual time=27811.234..27812.891 rows=8923456 loops=1)

After (querying the MV):

Index Scan using idx_mv_revenue_tenant_week on mv_order_revenue_summary  (cost=0.42..23.18 rows=156 width=89) (actual time=0.034..0.178 rows=156 loops=1)
  Buffers: shared hit=12

That’s the difference. You’ve moved the heavy lifting to a scheduled job.

Tracking “Last Refresh Time” (Correct Approach)

Important: pg_matviews does not store last_refresh. That column doesn’t exist.

Practical patterns:

  1. A small mv_refresh_log table updated by your refresh job:
CREATE TABLE mv_refresh_log (
   mv_name TEXT PRIMARY KEY,
   last_refresh_at TIMESTAMPTZ,
   refresh_duration_ms INTEGER
);

DO $$
DECLARE
  start_time TIMESTAMPTZ := clock_timestamp();
  end_time TIMESTAMPTZ;
  duration_ms INTEGER;
BEGIN
  -- Plain refresh: CONCURRENTLY cannot run inside a DO block. If you refresh
  -- concurrently, run the REFRESH from the scheduler script and log from there.
  REFRESH MATERIALIZED VIEW mv_order_revenue_summary;
  end_time := clock_timestamp();
  duration_ms := EXTRACT(EPOCH FROM (end_time - start_time)) * 1000;

  INSERT INTO mv_refresh_log (mv_name, last_refresh_at, refresh_duration_ms)
  VALUES ('mv_order_revenue_summary', end_time, duration_ms)
  ON CONFLICT (mv_name)
  DO UPDATE SET last_refresh_at = EXCLUDED.last_refresh_at,
                refresh_duration_ms = EXCLUDED.refresh_duration_ms;
END $$;

Now you can expose freshness to users:

SELECT 
   mv_name,
   last_refresh_at,
   NOW() - last_refresh_at AS staleness,
   refresh_duration_ms
FROM mv_refresh_log
WHERE mv_name = 'mv_order_revenue_summary';
  2. Scheduler job history (pg_cron job run details, or platform logs)
  3. Logging refresh duration in PostgreSQL logs and aggregating in your observability stack

Watch the Impact on the Rest of the System

Monitor:

  • Refresh CPU and IO (does it spike? does it compete with writes?)
  • Temp file usage (large sorts/hashes during refresh can spill to disk)
  • Replica lag sensitivity (if refresh competes with write workload on the primary, replicas might fall behind)

Materialized view refreshes are queries. They use resources. Plan accordingly.
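
If pg_stat_statements is installed (with track_utility left at its default of on), the refresh shows up like any other statement, so its cost is easy to watch. Column names below assume PostgreSQL 13 or newer:

-- How often the refresh runs, how long it takes on average, and how much it spills
SELECT calls,
       round(mean_exec_time) AS mean_ms,
       temp_blks_written
FROM pg_stat_statements
WHERE query ILIKE 'REFRESH MATERIALIZED VIEW%';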

Performance Tuning That Consistently Pays Off

Indexes on the MV aligned to read patterns: We covered this. It’s the single biggest lever.

Ensure base tables stay healthy:

  • ANALYZE keeps statistics fresh
  • VACUUM and autovacuum prevent bloat
  • Bloated base tables make refresh slower

Reduce refresh work:

  • Simplify the MV query (do you really need all those joins?)
  • Pre-filter with partitions where relevant (if your base tables are partitioned by date, your refresh can scan fewer partitions)

Resource guardrails:

  • Schedule refresh during low-traffic windows
  • Understand sort and hash behavior in the refresh query (use EXPLAIN to spot large temp writes)

Trade-Offs (Framed as Design Decisions)

Every materialized view comes with trade-offs. Frame them as conscious decisions:

Freshness window: Staleness is a contract. You’re accepting data that’s up to X minutes or hours old. If you need second-by-second accuracy, you will get better outcomes with a different pattern—streaming aggregates, app-level cache with invalidation, or summary tables maintained incrementally.

Refresh cost and operational ownership: Someone has to own the refresh schedule, monitor it, and tune it as data grows.

Storage and backup footprint: You’re duplicating data. Factor this into disk capacity and backup windows.

Maintenance surface area: Another object to index, refresh, monitor, and document.

I believe the trade-offs are worth it when your workload matches the pattern. If it doesn’t, don’t force it.

Common Gotchas and Troubleshooting (Symptom → Cause → Fix)

MV Is Fast at First, Then Slows Down

Symptom: Queries against the MV start fast, then degrade over weeks.

Cause: Data growth + indexes not aligned to access patterns. As the MV grows, full scans become expensive.

Fix: Add indexes for filter columns and common joins. Run EXPLAIN ANALYZE on MV queries to confirm index usage.

Concurrent Refresh Fails

Symptom: REFRESH MATERIALIZED VIEW CONCURRENTLY errors out.

ERROR:  cannot refresh materialized view "public.mv_order_revenue_summary" concurrently

HINT:  Create a unique index with no WHERE clause on one or more columns of the materialized view.

Cause: Missing or incorrect unique index for logical uniqueness.

Fix: Identify the columns that make each row unique and create a unique index:

CREATE UNIQUE INDEX idx_mv_revenue_unique
ON mv_order_revenue_summary(tenant_id, category, region, week);

Refresh Jobs Overlap

Symptom: Multiple refresh processes running at the same time, competing for resources.

Cause: Scheduler runs without a guardrail. Refresh duration exceeds the schedule interval.

Fix: Implement an advisory lock pattern (shown earlier) or increase the schedule interval.

Refresh Workload Disrupts Primary Workload

Symptom: Writes slow down during refresh. Replica lag spikes.

Cause: Refresh scheduled during peak hours or competing for IO/CPU.

Fix:

  • Move refresh to off-peak windows
  • Simplify the refresh query
  • Consider resource limits (statement timeout, work_mem tuning)

MV Results Surprise People

Symptom: Users report “wrong” data. Confusion about why numbers don’t match real-time queries.

Cause: Freshness contract not explicit and not visible.

Fix: Expose last_refresh_at in your application. Add a timestamp to the dashboard showing when data was last updated. Communicate the refresh schedule clearly.

Alternatives Worth Considering (and When They Win)

Normal views: Use when you have few reads and an always-current requirement. No storage overhead, no refresh jobs, but you pay computation cost on every read.

Summary tables + ETL: Use when incremental updates are feasible and you want full control over the update pipeline. More work to build, but you own the logic and can optimize for incremental changes.
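
To make that concrete, here’s a minimal sketch of an incrementally maintained rollup. The table name (daily_revenue_summary) and the daily granularity are assumptions for illustration; your ETL job would run the upsert for each new slice of data:

-- Summary table you maintain yourself instead of refreshing a snapshot
CREATE TABLE IF NOT EXISTS daily_revenue_summary (
    tenant_id     TEXT NOT NULL,
    category      TEXT NOT NULL,
    day           DATE NOT NULL,
    total_revenue NUMERIC NOT NULL DEFAULT 0,
    PRIMARY KEY (tenant_id, category, day)
);

-- Recompute only the recent slice and upsert it (run from your ETL job)
INSERT INTO daily_revenue_summary (tenant_id, category, day, total_revenue)
SELECT o.tenant_id,
       p.category,
       o.order_date::date AS day,
       SUM(oi.quantity * oi.unit_price)
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '1 day'
GROUP BY o.tenant_id, p.category, o.order_date::date
ON CONFLICT (tenant_id, category, day)
DO UPDATE SET total_revenue = EXCLUDED.total_revenue;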

Partitioning + indexing: Use when the real constraint is data layout and pruning. If your query scans too many partitions, materialized views won’t help. Fix the partitioning strategy first.

Cache layer (Redis, Memcached): Use when app-level latency goals dominate and you need sub-millisecond response times. Caches are great for key-value lookups, less so for complex aggregations.

Timescale continuous aggregates: Use when time-series rollups are the core pattern. Continuous aggregates in TimescaleDB handle incremental refresh automatically for time-bucketed data.

Pick the tool that fits the constraint. Materialized views are one option, not the only option.

Conclusion

Materialized views give you predictable reads, controlled computation, and clear freshness. You’re trading real-time accuracy for speed and resource efficiency – on purpose, with a contract.

When the pattern fits, the results are immediate: dashboard queries drop from 28 seconds to 180 milliseconds, users stop complaining, and your database does less work.

The operational commitment is real – you own the refresh schedule, the indexes, and the monitoring. But the alternative is running the same expensive query over and over, hoping the database can keep up.
