Creating pivot tables with Postgres

I have been using PostgreSQL for a while now and I have been growing fond of it over time. As a part of the Rails app that I was working with, we have been logging API requests to a table. The target systems which call the API do so in bursts and we wanted to find the frequency of these calls based on the hour of the day. All the API calls were logged in a table with the following structure.

+----+---------------+------------+------------+
| id | xml_content   | created_at | updated_at |
|----+---------------+------------+------------|
| 1  | <somecontent> | timestamp  | timestamp  |
| 2  | <somecontent> | timestamp  | timestamp  |
| 3  | <somecontent> | timestamp  | timestamp  |
| 4  | <somecontent> | timestamp  | timestamp  |
+----+---------------+------------+------------+

To have a better sense of when we receive these requests, I wanted to segregate them by the hour for a given month. This is fairly simple and we can write a query as below.

select (date_part('year',created_at)*100 + 
        date_part('month',created_at)) yearmonth,
       date_part('hour',created_at) hour_of_day,
       count(1) no_of_requests
from call_logs
group by date_part('year',created_at)*100 + 
          date_part('month',created_at),
          date_part('hour',created_at) 
order by yearmonth desc, hour_of_day

The results of the query look a bit like this.

+-----------+-------------+----------------+
| yearmonth | hour_of_day | no_of_requests |
|-----------+-------------+----------------|
| 201504    | 00          | 2500           |
| 201504    | 01          | 2200           |
| 201504    | 02          | 2123           |
| ...       | ...         | ...            |
| 201503    | 11          | 1282           |
| ...       | ...         | ...            |
+-----------+-------------+----------------+

Although this gives the necessary data that we need, it would be easier if the hours appeared as columns rather than rows. Our postgres query will have to change the shape of the result set based on the number of hours returned in the query. Postgres has an extension that provides crosstab functionality to create pivot table like structures called tablefunc. The syntax is a bit verbose but essentially the crosstab function has the following signature. There are other crosstab signatures but I will leave it as an exercise to the reader to explore the other signatures.

crosstab(dataset_query::text, category_query::text) 
-- returns setof records

We have already written our query to return the dataset. The dataset query should always produce rows with 3 columns. The first column takes the row key. The values in the second column become actual columns on which that table is pivoted and the values in the third column are associated with the values in the second column and appear as values under the column bearing the value same value in the cross tabe record set. As this returns a set of records, we will have to typecast the returned records set as a table. If you have a lot of categories returned by the second query, it becomes a bit of a pain to describe it. In our case, since the second column is hour numbers, it can take values from 0 to 23. So we use the generate_series function to generate the values.

Since our query has single quotes, we need to escape them properly. Postgres has dollar quoted strings that let us express queries within $tag$ tags and it auto escapes the quotes for us. This makes the query quite readable. So the final resulting query looks a bit like this.

 
select * from crosstab(
  -- dataset query
  $q$select (date_part('year',created_at)*100 +
    date_part('month',created_at)) yearmonth, date_part('hour',created_at)
  hour_of_day, count(1) no_of_requests 
  from call_logs
  group by date_part('year',created_at)*100 + date_part('month',created_at),
  date_part('hour',created_at) order by yearmonth desc, hour_of_day$q$,
  -- category query
  $q$ select * from generate_series(0,23)$q$) 
as (datepart float, h00 bigint, h01 bigint, h02 bigint, 
                    h03 bigint, h04 bigint, h05 bigint, h06 bigint, 
                    h07 bigint, h08 bigint, h09 bigint, h10 bigint, 
                    h11 bigint, h12 bigint, h13 bigint, h14 bigint,
                    h15 bigint, h16 bigint, h17 bigint, h18 bigint, 
                    h19 bigint, h20 bigint, h21 bigint, h22 bigint, 
                    h23 bigint);

This produces result that looks like this.

+-----------+------+------+------+------+-----+------+------+
| yearmonth |  h00 | h01  | h02  | h03  | ... | h22  | h23  |
|-----------+------+------+------+------+-----+------+------|
| 201504    | 2312 | 2323 | 2323 | 2311 | ... | 2223 | 1132 |
| 201503    | 2322 | 2323 | 2263 | 2311 | ... | 2223 | 1132 |
+-----------+------+------+------+------+-----+------+------+

This extension makes it very easy to visualize tabular data with PostgreSQL using pivot tables.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s