Sometimes it is very useful to store your data not in one big table, but in several so called partitions. A partition is just an inherited table storing the data that fit the given rule. Thereby you can avoid scanning a huge table if you are just looking for a small part of data, or you can put older data to a slower/cheaper storage (put the partitions in tablespaces on that storage).
You don’t have to adapt your code, the CRUD actions are offered to the application in the same way it is for one big table. The whole logic is in Postgresql itself, a trigger function puts the data in the desired partition and the parameter constraint_exclusion=on informs the planner to scan only the affected partitions.
A basic request is to split data on a given timestamp, e.g. create a partition on a monthly basis. In such a scenario you can dive into historical data with no performance impact to the current data. Additionally you can delete no longer needed, old data just by dropping the partition.
It will be best to explain the stuff by an example…..
create a test db
psql -U postgres postgres
postgres=> CREATE DATABASE pgtestdb encoding='UTF-8';
postgres=>\q
psql -U postgres pgtestdb
postgres=> CREATE SCHEMA part_test;
create master table
CREATE TABLE tab_test
(
datetime timestamp without time zone NOT NULL DEFAULT now(),
number integer NOT NULL,
price real NOT NULL DEFAULT 0.000000000000000,
currency_id character varying(3) DEFAULT 'eur'::character varying
);
create four partition tables (inherited from master table)
create table tab_test201011 ( CHECK (datetime >= DATE '2010-11-01' AND datetime < DATE '2010-12-01') ) INHERITS (tab_test);
create table tab_test201012 ( CHECK (datetime >= DATE '2010-12-01' AND datetime < DATE '2011-01-01') ) INHERITS (tab_test);
create table tab_test201101 ( CHECK (datetime >= DATE '2011-01-01' AND datetime < DATE '2011-02-01') ) INHERITS (tab_test);
create table tab_test201102 ( CHECK (datetime >= DATE '2011-02-01' AND datetime < DATE '2011-03-01') ) INHERITS (tab_test);
create trigger function
CREATE OR REPLACE FUNCTION test_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
IF ( NEW.datetime >= DATE '2010-11-01' AND NEW.datetime < DATE '2010-12-01' ) THEN INSERT INTO tab_test201011 VALUES (NEW.*);
ELSIF ( NEW.datetime >= DATE '2010-12-01' AND NEW.datetime < DATE '2011-01-01' ) THEN INSERT INTO tab_test201012 VALUES (NEW.*);
ELSIF ( NEW.datetime >= DATE '2011-01-01' AND NEW.datetime < DATE '2011-02-01' ) THEN INSERT INTO tab_test201101 VALUES (NEW.*);
ELSIF ( NEW.datetime >= DATE '2011-02-01' AND NEW.datetime < DATE '2011-03-01' ) THEN INSERT INTO tab_test201102 VALUES (NEW.*);
ELSE
RAISE EXCEPTION 'column datetime out of range. Fix the tour_insert_trigger() function!';
END IF;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;
== UPDATE ==
to simplify the trigger function you can replace the code above with the following dynamic creation of the insert statement, moving the row in the appropriate target table. As a further enhancement you don’t have to extend the trigger function every time you’ve added new partitions:
CREATE OR REPLACE FUNCTION test_insert_trigger()
RETURNS trigger AS
$BODY$
DECLARE
date_part varchar(20);
tablename varchar(100);
BEGIN
date_part='';
date_part = to_char(new.datetime,'yyyy') || to_char(new.datetime,'mm');
tablename = 'tab_test' || date_part;
--raise notice 'target table: %', tablename;
execute 'insert into ' || tablename::regclass || '(datetime,number,price,currency_id) select $1,$2,$3,$4' using NEW.datetime,NEW.number,NEW.price,NEW.currency_id;
RETURN NULL;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION test_insert_trigger() OWNER TO postgres;
create trigger on master table
CREATE TRIGGER insert_test_trigger BEFORE INSERT ON tab_test FOR EACH ROW EXECUTE PROCEDURE test_insert_trigger();
create some indices (not for functionality, just for select performance)
CREATE INDEX idx_test_dt_201011 ON tab_test201011 (datetime);
CREATE INDEX idx_test_dt_201012 ON tab_test201012 (datetime);
CREATE INDEX idx_test_dt_201101 ON tab_test201101 (datetime);
CREATE INDEX idx_test_dt_201102 ON tab_test201102 (datetime);
insert some test data (100000 rows in each partition)
insert into tab_test (datetime, number, price) (select DATE('2010-11-01') + (i%30) * INTERVAL '1 days', i,i::real from (select generate_series(1,100000) as i) as q);
insert into tab_test (datetime, number, price) (select DATE('2010-12-01') + (i%30) * INTERVAL '1 days', i,i::real from (select generate_series(1,100000) as i) as q);
insert into tab_test (datetime, number, price) (select DATE('2011-01-01') + (i%30) * INTERVAL '1 days', i,i::real from (select generate_series(1,100000) as i) as q);
insert into tab_test (datetime, number, price) (select DATE('2011-02-01') + (i%28) * INTERVAL '1 days', i,i::real from (select generate_series(1,100000) as i) as q);
test the stuff
pgtestdb=# set constraint_exclusion=off;
SET
pgtestdb=# explain select count(number) from tab_test where datetime between '2010-10-05 00:00:00' and '2010-10-20 00:00:00';
QUERY PLAN
——————————————————————————————————————————————————————-
Aggregate (cost=43.90..43.91 rows=1 width=4)
-> Append (cost=0.00..43.89 rows=5 width=4)
-> Seq Scan on tab_test (cost=0.00..10.75 rows=1 width=4)
Filter: ((datetime >= ’2010-10-05 00:00:00′::timestamp without time zone) AND (datetime <= ’2010-10-20 00:00:00′::timestamp without time zone)) -> Index Scan using idx_tour_dt_201011 on tab_test201011 tab_test (cost=0.00..8.28 rows=1 width=4)
Index Cond: ((datetime >= ’2010-10-05 00:00:00′::timestamp without time zone) AND (datetime <= ’2010-10-20 00:00:00′::timestamp without time zone)) -> Index Scan using idx_tour_dt_201012 on tab_test201012 tab_test (cost=0.00..8.28 rows=1 width=4)
Index Cond: ((datetime >= ’2010-10-05 00:00:00′::timestamp without time zone) AND (datetime <= ’2010-10-20 00:00:00′::timestamp without time zone)) -> Index Scan using idx_tour_dt_201101 on tab_test201101 tab_test (cost=0.00..8.28 rows=1 width=4)
Index Cond: ((datetime >= ’2010-10-05 00:00:00′::timestamp without time zone) AND (datetime <= ’2010-10-20 00:00:00′::timestamp without time zone)) -> Index Scan using idx_tour_dt_201102 on tab_test201102 tab_test (cost=0.00..8.28 rows=1 width=4)
Index Cond: ((datetime >= ’2010-10-05 00:00:00′::timestamp without time zone) AND (datetime <= ’2010-10-20 00:00:00′::timestamp without time zone)) (12 rows)
pgtestdb=# set constraint_exclusion=on;
SET
pgtestdb=# explain select count(number) from tab_test where datetime between '2010-10-05 00:00:00' and '2010-10-20 00:00:00';
QUERY PLAN ————————————————————————————————————————————————————— Aggregate (cost=10.76..10.77 rows=1 width=4) -> Append (cost=0.00..10.75 rows=1 width=4)
-> Seq Scan on tab_test (cost=0.00..10.75 rows=1 width=4)
Filter: ((datetime >= ’2010-10-05 00:00:00′::timestamp without time zone) AND (datetime <= ’2010-10-20 00:00:00′::timestamp without time zone))
(4 rows)
That's it, with enabled constraint_exclusion just one partition will be scanned !
I tried the example in PostgreSQL version 8.3 and newer (up to latest 9.0.2), I don't know which older versions of Postgres supports partitioning....
Advertisement
Like this:
Be the first to like this post.
You also need an update trigger if a record has to move from one partition to another partion when the datetime changes.
Hi Frank,
thanks for your comment, of course you’re right
I’ll extend the post….
You can do some neat things with this partitioning. I recently setup a national sized database in Postgres 8.1, largest table was 350 million records. By using partitioning I was able to make things like index builds etc. work adequately.
The one thing I did not try was the triggers. I just loaded each sub table directly.
Thats a good idea and one I’ll experiment with.