Managing big enough data in postgres

Postgres can work quite well for large databases without having to look at NoSQL solutions. But if you find your tables getting larger than the physical memory, you might want to look for ways to optimize your peformance. In my current consulting assignment, I have to deal with a table that is 104 gigabytes on Amazon RDS running Postgres. This table is at the heart of the system and any query that as much as glances at the table sideways gets incredibly slow. To make matters worse, we had to run a migration on that table that changed the data type of one of the columns from bigint to varchar.

That would just not work. The one way to solve such situations is to partition the table. PostgreSQL has a feature called as table inheritance that allows child tables to share the same structure of the parent table and the child tables can participate in queries to the parent table. What is cool about this is that postgres also supports the concept of table spaces that lets multiple tables be in separate directories or mount points. So technically, you can mount more often accessed data on fast SSDs or an EBS volume with higher provisioned IOPS and move older data to slower low cost storage devices. This also has the additional benefit of reducing the size of working set and being able to perform seq scans in memory rather than index scans on disk. The data I was dealing with was largely time series data and time seemed like the obvious way to partition it. The first order of business was to export existing data to CSV files using the \copy command using limit and offset. The table I was dealing with contained close to about a 100 million rows with multiple JSON fields so I exported them a million at a time.

The next step was to create a clone table that had the same structure as the large table.

create table large_table_clone as 
   select * from large table where 1=2;

The you can create child tables by specifying the check condition and the table you want to inherit from.

create table large_table_clone_201504 (
  check(start_time >= date '2015-04-01' 
        start_time < date '2015-05-01')
) inherits(large_table_clone)

We can then create a trigger on the parent table to send records to the child table based on the inserted record’s start_time. The trigger would look something like this.

CREATE OR REPLACE FUNCTION partition_function()
  IF ( NEW.start_time <= DATE '2015-04-01' 
       NEW.start_time > DATE '2015-05-01' ) THEN
    INSERT INTO large_table_clone_201504 VALUES (NEW.*);
  ELSIF ( NEW.start_time <= DATE '2015-03-01' 
          NEW.start_time > DATE '2015-04-01' ) THEN
    INSERT INTO large_table_clone_201503 VALUES (NEW.*);
    RAISE EXCEPTION 'Date out of range';
LANGUAGE plpgsql;

CREATE TRIGGER large_table_clone_partition
BEFORE INSERT ON large_table_clone
FOR EACH ROW EXECUTE PROCEDURE partition_function();

The only trouble was I was not sure of the valid date range in my dataset and there was no hope of getting to it from the existing table. So I had to create tables on the fly as data came in.

create or replace 
function partition_function() 
returns trigger as
  table_master varchar := 'large_table_clone';
  table_part varchar := '';
  start_date date := null;
  end_date date := null;
  table_count int;
  table_part := table_master || 
                '_' || 
  start_date := date_trunc('month',NEW.start_time)::date;
  end_date := (date_trunc('month',NEW.start_time) 
               + '1 month')::date;

  execute format('select tablename from pg_tables 
                  where tablename=%L limit 1',table_part);

  get diagnostics table_count = ROW_COUNT;

  if table_count < 1 then
    execute format(
     'create table if not exists %I (
        check( start_time <@ [%L,%L]::tsrange)
      ) inherits (%I)', 
     table_part, start_date, end_date, table_master);
  end if;

  execute format(
   'insert into %I values (($1).*)', 
   table_part) using NEW;

  return null;
language plpgsql volatile
cost 100;

This dynamically creates tables as fills in the data as they are inserted into large_table_clone. In case you were wondering the COPY command respects the insert trigger so you can bulk load the data exported via CSV files to the new tables.

Since we have table inheritance setup, we can now run queries against the large_table_clone table and it will transparently run the queries across all the child tables and get us the result. But this puts us back where we started. If you really want to get the benefits of partitioning, you will have to use constraint exclusion.

set constraint_exclusion=on;
select * from large_table_clone 
where start_time between '2015-01-01' 
                         and '2015-02-15';

This will ensure that it only performs the queries on two of the partitions rather than all the partitions. The next obvious step was to create an index on large_table_clone_yyyy_mm partitions on the start_time and cluster the tables using that index. This worked in our cases as the records are usually immutable and older partitions can hence be safely clustered. This made the large table a lot more manageable and we can feel safe archiving old data without impacting the performance of the production system.


6 thoughts on “Managing big enough data in postgres”

  1. In your first trigger function, it seems like the comparison operators are reversed. I don’t think the conditional statements will ever be true.


  2. Thanks for this writeup. I found it to be very useful. I had faced a similar issue in the past with where we could have used this solution(again with massive time series data). At that time I resorted to manually archiving the data, I wish I had known about this. Thanks again :)

    Liked by 1 person

  3. Thank you for your solution!
    How many insert queries to the table? Is it possible to get race condition when 2 requests will start relation of new partition table? How to deal with it? As I get all concurrent queries will be failed?

    Do you use Master-Slave replication? How to replicate such tables? Is it enough to replicate parent table?


    1. The insert volume was fairly low about 200 inserts per second. However it tends to burst when the queue workers get weird. The outer transaction is set to serializable in our setup. I guess this takes care of race conditions on insert.

      We use Amazon RDS for our database and that has a Multi AZ option. My understanding is that you do not have to do anything to setup replication and Amazon would manage it transparently. In any case, the PostgreSQL replication docs state that it mirrors all changes to the database including DDLs and even creation of new databases.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s