When HASH partitioning works better than RANGE
I have always been a fan of RANGE partitioning using a date/time value in PostgreSQL. This isn't always possible, however, and I recently came across a scenario where a table had grown large enough that it had to be partitioned, and the only reasonable key to use was a UUID styled identifier.The goal of this post is to highlight when and why hashing your data across partitions in PostgreSQL might be a better approach.Range vs. Hash Partitioning in PostgreSQLRange Partitioning (A Quick Recap)Range partitioning in PostgreSQL uses boundary values that define slices of the data, often by date or numeric ranges. If you have a transactions table, you might create monthly partitions based on a transaction_date column. This is intuitive for time-series data because each partition holds rows from a specific date range.Advantages of Range Partitioning:Easy pruning for date-based queries.
Straightforward approach to archiving old data: drop an entire partition for a past month, rather than issuing a massive DELETE.
Pairs nicely with time-based ingestion pipelines, where every day or month gets its own partition.
But as convenient as that is, there are cases where range partitioning runs into problems.Why Range Partitioning Can Fall ShortData Skew: If a huge portion of data lands in a single time interval—say, because of a traffic spike in the current month—that monthly partition might end up significantly larger than the others.
Complex Backfills: Not everyone ingests data in an orderly, daily manner. Sometimes you need to re-ingest or correct data that spans multiple periods. Merging or splitting range partitions can get cumbersome.
Non-Date Dimensions: Some tables aren’t naturally tied to a sequential numeric or date dimension. If your queries center on user IDs or device IDs, dividing by date might not solve your performance issues.