Big Data and PostgreSQL – Scaling out by partitioning tables

Big Data is a buzz word doing the rounds nowadays, and the tool people generally associate with Big Data is Hadoop. While it is true that the sheer volume of data Hadoop can efficiently and cost effectively manage is unparalleled, not all businesses actually need Hadoop. What if you don’t want to adopt yet another fairly complex technology (Hadoop) that you would need to maintain and have in-house expertise in? What if you could do it all in PostgreSQL?

Due to many rich and diverse features, I have found PostgreSQL to be able to do a good enough job for most business cases. I have already talked about handling unstructured data with JSON, JSONB, and HSTORE. Another feature that you can use for scaling is table partitioning.

Let me walk you through a very simple tutorial on how to partition tables in PostgreSQL to enable massive scale-out.

Before starting off, let’s create a new database and then a table that tracks events in it. The table simply stores the type of event, its time, and comments associated with it.

postgres=# CREATE DATABASE partition_sample;

CREATE DATABASE

partition_sample =# CREATE TABLE events (id SERIAL, type INTEGER, time TIMESTAMP WITH TIME ZONE, comments TEXT);

CREATE TABLE

1. Tables with inheritance and CHECK CONSTRAINT

First step is to create the partitions, which will essentially be tables that inherit from ‘events’. Each of these tables will have a CHECK CONSTRAINT specifying exactly what range of values are allowed in it. Below, I am setting up 9 such tables.

Table to contain events of type ‘1’ occurring in January 2015:

partition_sample =# create table events_1_2015_01 (CONSTRAINT events_type_time_check CHECK (time >= ‘2015-01-01’ AND time < ‘2015-02-01’ AND type = 1)) inherits(events);

CREATE TABLE

Table to contain events of type ‘2’ occurring in January 2015:

partition_sample=# create table events_2_2015_01 (CONSTRAINT events_type_time_check CHECK (time >= ‘2015-01-01’ AND time < ‘2015-02-01’ AND type = 2)) inherits(events);

CREATE TABLE

And so on forth …

partition_sample=# create table events_3_2015_01 (CONSTRAINT events_type_time_check CHECK (time >= ‘2015-01-01’ AND time < ‘2015-02-01’ AND type = 3)) inherits(events);

CREATE TABLE

partition_sample=# create table events_1_2015_02 (CONSTRAINT events_type_time_check CHECK (time >= ‘2015-02-01’ AND time < ‘2015-03-01’ AND type = 1)) inherits(events);

CREATE TABLE

partition_sample=# create table events_2_2015_02 (CONSTRAINT events_type_time_check CHECK (time >= ‘2015-02-01’ AND time < ‘2015-03-01’ AND type = 2)) inherits(events);

CREATE TABLE

partition_sample=# create table events_3_2015_02 (CONSTRAINT events_type_time_check CHECK (time >= ‘2015-02-01’ AND time < ‘2015-03-01’ AND type = 3)) inherits(events);

CREATE TABLE

partition_sample=# create table events_1_2015_03 (CONSTRAINT events_type_time_check CHECK (time >= ‘2015-03-01’ AND time < ‘2015-04-01’ AND type = 1)) inherits(events);

CREATE TABLE

partition_sample=# create table events_2_2015_03 (CONSTRAINT events_type_time_check CHECK (time >= ‘2015-03-01’ AND time < ‘2015-04-01’ AND type = 2)) inherits(events);

CREATE TABLE

partition_sample=# create table events_3_2015_03 (CONSTRAINT events_type_time_check CHECK (time >= ‘2015-03-01’ AND time < ‘2015-04-01’ AND type = 3)) inherits(events);

CREATE TABLE

A description of the ‘event’s table shows how these child tables are now associated with it:

partition_sample=# \d+ events

Table “public.events”

Column  |           Type           |                      Modifiers                      | Storage  | Stats target | Description

———-+————————–+—————————————————–+———-+————–+————-

id       | integer                  | not null default nextval(‘events_id_seq’::regclass) | plain    |              |

type     | integer                  |                                                     | plain    |              |

time     | timestamp with time zone |                                                     | plain    |              |

comments | text                     |                                                     | extended |              |

Child tables: events_1_2015_01,

events_1_2015_02,

events_1_2015_03,

events_2_2015_01,

events_2_2015_02,

events_2_2015_03,

events_3_2015_01,

events_3_2015_02,

events_3_2015_03

2. Trigger function to route data appropriately

Next step is to create a trigger function that will fire before an insert and will route data to the appropriate partition. Below is the PL/pgSQL code for this. Notice how, before insertion, table name is determined on the fly by appending ‘type’ and ‘time’ of new incoming data.

CREATE FUNCTION events_partition_function() RETURNS TRIGGER

LANGUAGE plpgsql

AS $_$

DECLARE

v_Table    VARCHAR;

sql        VARCHAR;

BEGIN

v_Table = ‘events’ || ‘_’ || NEW.type || ‘_’ || to_char(NEW.time, ‘YYYY_MM’);

sql := ‘INSERT INTO ‘ || v_Table || ‘ VALUES ( ($1).*)’;

EXECUTE sql USING NEW;

return NULL;

END $_$;

Next step is to create a trigger on the ‘events’ table.

CREATE TRIGGER trg_events_insert

BEFORE INSERT ON events

FOR EACH ROW

EXECUTE PROCEDURE events_partition_function();

3. Inserting data

Now let’s insert data that will demonstrate how partitioning works. The data below is designed to populate 1 row in each of the partitions. The ‘comments’ column contains text that mentions where to expect this data.

partition_sample=# insert into events (type, time, comments) values (1, ‘2015-01-15’, ‘This should be in events_1_2015_01’),     

 (2, ‘2015-01-15’, ‘This should be in events_2_2015_01’),   

(3, ‘2015-01-15’, ‘This should be in events_3_2015_01’), 

(1, ‘2015-02-15’, ‘This should be in events_1_2015_02’), 

(2, ‘2015-02-15’, ‘This should be in events_2_2015_02’),   

(3, ‘2015-02-15’, ‘This should be in events_3_2015_02’),       

(1, ‘2015-03-15’, ‘This should be in events_1_2015_03’), 

(2, ‘2015-03-15’, ‘This should be in events_2_2015_03’),       

(3, ‘2015-03-15’, ‘This should be in events_3_2015_03’);

INSERT 0 0

4. Verifying the data

Even though the data got inserted into respective partitions, because these partitions inherit from the master table ‘events’, querying ‘events’ will retrieve all data.

partition_sample=# SELECT * FROM events;

id | type |          time          |              comments

—-+——+————————+————————————

4 |    1 | 2015-01-15 00:00:00+05 | This should be in events_1_2015_01

5 |    2 | 2015-01-15 00:00:00+05 | This should be in events_2_2015_01

6 |    3 | 2015-01-15 00:00:00+05 | This should be in events_3_2015_01

7 |    1 | 2015-02-15 00:00:00+05 | This should be in events_1_2015_02

8 |    2 | 2015-02-15 00:00:00+05 | This should be in events_2_2015_02

9 |    3 | 2015-02-15 00:00:00+05 | This should be in events_3_2015_02

10 |    1 | 2015-03-15 00:00:00+05 | This should be in events_1_2015_03

11 |    2 | 2015-03-15 00:00:00+05 | This should be in events_2_2015_03

12 |    3 | 2015-03-15 00:00:00+05 | This should be in events_3_2015_03

(9 rows)

Partitions need to be designed according to the expected workload. In this case, it is expected that extract criteria from queries will generally containing the ‘type’ and ‘time’ components. An example is:

partition_sample=# select * from events where type = 3 and time > ‘2015-02-01’ and time < ‘2015-03-01’;

id | type |          time          |              comments

—-+——+————————+————————————

9 |    3 | 2015-02-15 00:00:00+05 | This should be in events_3_2015_02

(1 row)

5. The scale-out advantage

To illustrate how this enables you to scale-out PostgreSQL, simply run the EXPLAIN command with the last query run above.

partition_sample=# explain select * from events where type = 3 and time > ‘2015-02-01’ and time < ‘2015-03-01’;

QUERY PLAN

——————————————————————————————————————————————————————

Append  (cost=0.00..28.20 rows=2 width=48)

->  Seq Scan on events  (cost=0.00..0.00 rows=1 width=48)

Filter: ((“time” > ‘2015-02-01 00:00:00+05’::timestamp with time zone) AND (“time” < ‘2015-03-01 00:00:00+05’::timestamp with time zone) AND (type = 3))

->  Seq Scan on events_3_2015_02  (cost=0.00..28.20 rows=1 width=48)

Filter: ((“time” > ‘2015-02-01 00:00:00+05’::timestamp with time zone) AND (“time” < ‘2015-03-01 00:00:00+05’::timestamp with time zone) AND (type = 3))

(5 rows)

As you can see, this query runs a sequential scan on the related partition only, leaving the other partitions alone. This is what allows you to scale out PostgreSQL … you can create partitions on tables with huge quantities of rows and still be able to very quickly retrieve the data. You can actually create indexes on these partitions as well to speed things up further.

You gotta love PostgreSQL! J

Leave A Comment