Gkoenig's Blog

content is almost exclusively related to IT stuff (Linux, databases, …)

Tag Archives: PostgreSQL

pgTAP ~ Unit Tests for PostgreSQL ~ Part II ~

As promised in my previous post regarding pgTAP (aka Part I) here, I’ll continue this topic by checking some data in the database. Setup and basic handling of pgTAP were also described in Part I, so I’ll concentrate on the test functions used in the following script frame (just replace “##put_your_tests_here##” with the test scenarios described later on):

-- Format the output for nice TAP.
\pset format unaligned
\pset tuples_only true
\pset pager
-- Revert all changes on failure.
\set ON_ERROR_ROLLBACK 1
\set ON_ERROR_STOP true
\set QUIET 1
BEGIN;
\i ./pgtap.sql
-- create two tables with referential constraint
create table tab1 (id integer not null, a_text varchar(200), dt timestamp default now(), CONSTRAINT tab1_pkey PRIMARY KEY (id));
create table tab2 (id integer not null, a_text varchar(200), id_ref integer, CONSTRAINT id_ref FOREIGN KEY (id_ref) REFERENCES tab1 (id));
-- insert some test data
insert into tab1 (id,a_text) values (1,'test entry one');
insert into tab1 (id,a_text) values (2,'test entry two');
insert into tab1 (id,a_text) values (3,'test entry three');
insert into tab2 (id,a_text,id_ref) values (1,'ref test entry one',1);
insert into tab2 (id,a_text,id_ref) values (2,'ref test entry two',2);
insert into tab2 (id,a_text,id_ref) values (3,'ref test entry three',3);
SELECT plan(10); -- declare that exactly ten tests will be run
##put_your_tests_here##
-- get the results
SELECT * FROM finish();
ROLLBACK;

What do we want to test ?

  • Are the inserted values really there ?
If you just want to test plain data, the preferable way to check it is via prepared statements compared against your expected result / array of results.
    A very convenient way is the function results_eq(), easy to show with an example: 

    -- check inserted data
    -- ...with two prepared statements. 2nd example will be more flexible with cursors
    PREPARE ids_fetched AS select id from tab1 where id in (1,2,3) order by id asc;
    PREPARE ids_expected AS VALUES (1),(2),(3);
    SELECT results_eq( 'ids_fetched', 'ids_expected', 'fetched the expected ids from tab1');
    PREPARE ids_fetched1 AS select id from tab1 where id in (1,2,3) order by id asc;
    PREPARE ids_fetched2 AS select id from tab2 where id in (1,2,3) order by id asc;
    SELECT results_eq( 'ids_fetched1', 'ids_fetched2');

You don’t have to put your expected values in a prepared statement (as in the 1st example); it is also possible to pass these values as a function parameter to set_eq(). Note that set_eq(), unlike results_eq(), ignores the order of the rows:

    -- ...with a mixture of prepared statements and predefined values
    PREPARE text_fetched AS select a_text::text from tab1 where id in (1,2) order by id asc;
    SELECT set_eq( 'text_fetched', ARRAY[ 'test entry one','test entry two' ], 'fetched the expected text from tab1');

  • Does a function return the expected result ?
In addition to testing that the expected values are present, you can check the return values of your own functions. Let’s assume we created a function that returns all ids from tab1 in ascending order; now let’s test whether it works as expected. In this case there are two ways to perform the comparison record set by record set: on the one hand a simple function/statement comparison, on the other hand cursors. I’ll show you both:


    -- create a (simple) function
    create or replace function get_ids_tab1() returns setof integer as 'select id from tab1 order by id asc' language 'sql';
    -- check output of function
    -- ...with simple calls
    SELECT results_eq('select * from get_ids_tab1()', 'select id from tab1 order by id asc','get_ids_tab1() should return ids from relation tab1 in ascending order');
    -- ...with cursors
    DECLARE ids_function CURSOR FOR SELECT * FROM get_ids_tab1();
    DECLARE ids_relation CURSOR FOR SELECT id FROM tab1 order by id asc;
    SELECT results_eq('ids_function'::refcursor,'ids_relation'::refcursor,'Gotcha ! function returned the expected values.');

  • Raising errors intentionally and catching them

    -- simulate some errors
    PREPARE throw_error AS insert into tab1 (id,a_text) values (1,'this insert throws a duplicate key error');
    SELECT throws_ok('throw_error','23505',NULL,'Insert failed because of duplicate key violation (id)');

But how does one know which error code Postgres will throw ?
throws_ok() will tell you 🙂 . Have a look at the next example: it tries to insert a row into tab2 with a reference to id “4” in tab1, which obviously doesn’t exist, so we expect an error. Since we don’t know the error code, I put the value “-1” as the error code, and the function call will fail with a verbose message showing what was thrown and what we expected:


    PREPARE ref_fail AS insert into tab2 (id, a_text, id_ref) values (4,'ref test entry four',4);
    SELECT throws_ok('ref_fail','-1',NULL,'Insert failed because of (ref_id)');

Running the test script will produce the following output (in my case this was the 6th test executed):


    # Failed test 6: "Insert failed because of (ref_id)"
    # caught: 23503: insert or update on table "tab2" violates foreign key constraint "id_ref"
    # wanted: -1

Aaaahhhh….the violated foreign key constraint has the error code 23503 (the codes are the SQLSTATE values listed in Appendix A of the PostgreSQL documentation), so we can replace the “-1” with “23503” and the test will succeed.

These are just some of the test functions pgTAP provides. A complete overview is available here, and it is worth a look.

Hopefully you got a first, short overview of what pgTAP is for; please let me know if something is missing or wrong.

pgTAP ~ Unit Tests for PostgreSQL

I’ll give you a short introduction to pgTAP, a testing framework for Postgres, including basic theory and short examples. I’ll only show the “simple test script” way, not the xUnit way of testing. My test db is Postgres version 9.0.2.

The framework around your tests is built on putting your tests in one transaction and rolling it back at the end. Additionally, statements for starting the tests and showing the results are mandatory:
BEGIN;
SELECT plan(n); -- n = the number of tests you plan to run
…your tests are here…
SELECT * FROM finish();
ROLLBACK;

download the source package from the http://pgtap.org download section, or go directly to http://pgfoundry.org/frs/?group_id=1000389
install it the common Linux way from the tarball (or just run pgtap.sql from the extracted sourcedir/sql in your target database):
tar -xzf pgtap-<version>.tar.gz
cd pgtap-<version>
make
make install
make installcheck

This will work if the binary pg_config is in your PATH and you are a db superuser. Most likely you’ll need additional settings like:
env PG_CONFIG=/path/to/pg_config
make installcheck PGUSER=postgres

Detailed instructions are here: http://pgtap.org/documentation.html#Installation; you’ll probably need additional Perl modules.

Now let’s test some stuff….
If you need the pgTAP functions in a different database, just run the SQL script called pgtap.sql (you’ll find it in the subdirectory sql under the source directory pgtap-<version>) or include the call to pgtap.sql in your script with \i /path/to/pgtap.sql

HelloWorld:
-- Format the output for nice TAP.
\pset format unaligned
\pset tuples_only true
\pset pager
-- Revert all changes on failure.
\set ON_ERROR_ROLLBACK 1
\set ON_ERROR_STOP true
\set QUIET 1
BEGIN;
SELECT plan(1);
SELECT pass( 'Hello World !' );
SELECT * FROM finish();
ROLLBACK;

save this as helloworld.txt and call it: psql -U postgres -f helloworld.txt

The snipped output will look like:
1..1
ok 1 - Hello World !

What does this mean ?
1..1 => you ran tests from #1 to #1 (exactly one 😉 )
ok 1 – Hello World ! => your test returned “ok”, it was test number “1” and the string “Hello World!” is the description (==output) of the pass() function.
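Since TAP is plain text, the output can also be processed mechanically; a harness such as pg_prove compares the planned number of tests from the “1..N” line with the number actually run. A minimal shell sketch of that check (the TAP output is hardcoded here for illustration; normally it comes from psql):

```shell
# Sample TAP output, hardcoded for illustration (normally piped in from psql)
tap='1..3
ok 1 - table tab_test exists
ok 2 - Column tab_test.number should exist'

# The plan line "1..N" declares how many tests were announced
planned=$(printf '%s\n' "$tap" | sed -n 's/^1\.\.\([0-9][0-9]*\)$/\1/p')
# Count the result lines ("ok ..." / "not ok ...")
ran=$(printf '%s\n' "$tap" | grep -cE '^(not )?ok ')
echo "planned: $planned, ran: $ran"   # a harness would flag planned != ran
```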

Yes, this test doesn’t make any sense at all, but it shows the basic format.
Now you’re free to check any detail of your database structure. Let’s start checking some tables and columns. Create a table first with
CREATE TABLE tab_test
(
datetime timestamp without time zone NOT NULL DEFAULT now(),
"number" integer NOT NULL,
price real NOT NULL DEFAULT 0.000000000000000,
currency_id character varying(3) DEFAULT 'eur'::character varying
);
CREATE INDEX idx_number ON tab_test USING btree (number);
CREATE INDEX idx_price ON tab_test USING btree (price);

tables_and_columns.txt:
-- Format the output for nice TAP.
\pset format unaligned
\pset tuples_only true
\pset pager
-- Revert all changes on failure.
\set ON_ERROR_ROLLBACK 1
\set ON_ERROR_STOP true
\set QUIET 1
BEGIN;
SELECT plan(3);
SELECT has_table( 'tab_test', 'table tab_test exists' );
SELECT has_column( 'tab_test', 'number' );
SELECT col_type_is( 'tab_test', 'number', 'integer' );
SELECT * FROM finish();
ROLLBACK;

Run the tests: psql -U postgres -f tables_and_columns.txt
The output will be:
1..3
ok 1 - table tab_test exists
ok 2 - Column tab_test.number should exist
ok 3 - Column tab_test.number should be type integer

The call to has_table() got the additional parameter “description”, which is printed instead of the default messages you see for tests no. 2 and 3. If your table is not in the default search schema, you can pass a schema name as the first parameter to the function call.
To receive a “not ok” result, just modify e.g. the call of has_column to SELECT has_column('tab_test','id');
Now the output tells you that something failed:
1..3
ok 1 - table tab_test exists
not ok 2 - Column tab_test.id should exist
# Failed test 2: "Column tab_test.id should exist"
ok 3 - Column tab_test.number should be type integer
# Looks like you failed 1 test of 3

extend tables_and_columns.txt by:
...
SELECT plan(5);
...
SELECT has_index('tab_test','idx_number', 'Index idx_number on tab_test');
SELECT has_index('tab_test','idx_price','currency_id', 'Index idx_price on tab_test');
...

Now you get a helpful hint in the output, telling you that the table does have an index called idx_price, but on a different column:
1..5
ok 1 - table tab_test exists
not ok 2 - Column tab_test.id should exist
# Failed test 2: "Column tab_test.id should exist"
ok 3 - Column tab_test.number should be type integer
ok 4 - Index idx_number on tab_test
not ok 5 - Index idx_price on tab_test
# Failed test 5: "Index idx_price on tab_test"
# have: idx_price ON tab_test(price)
# want: idx_price ON tab_test(currency_id)
# Looks like you failed 2 tests of 5

There are pgTAP functions for almost everything in your postgres db, like Triggers, Functions, Schemas, Tablespaces, …. just check the documentation.
It is also possible to create relationships between tests, or better, conditional tests. This means calling a test function only if certain conditions are met.
Again an example will illustrate this best:
The following code fragment checks whether the table tab_test in schema1 has a primary key. If yes, the next test checks whether the primary key consists of the two columns an_id and another_id; if no primary key is found, the function fail() is called. You can call several test functions within collect_tap(…) to build something like a subfunction:
...
SELECT CASE WHEN has_pk('schema1','tab_test',NULL) ~* '^ok'
THEN
collect_tap(
col_is_pk ('schema1','tab_test', ARRAY['an_id','another_id'],'table "schema1.tab_test" should have a combined primary key over two columns'),
pass('passed')
)
ELSE
fail('did not find any primary key in table "schema1.tab_test".')
END;
...

Now you should have a basic understanding of how to create a test scenario for testing a db structure. There are a lot more functions available than described in this post, so feel free to play with them.
In my next post I’ll show how to check data….

Any hints are highly welcome.

pgfouine => dive into postgres log

No matter if you are new to Postgres or have been using this excellent database for days/months/years, I’m pretty sure everyone has come to a point where something unexpected happened.
The logfile of Postgres is very helpful and contains a lot of information and hints, down to a highly detailed level.
If you are in such a situation, give pgfouine a try. It is a very nice log analyzer for Postgres, and it also parses the log of a vacuum run (details at the bottom of this post, after enabling logs in general).
Which information you’ll receive in the logfile is configured in the all-in-one config file postgresql.conf in your $PGDATA directory (the directory parameter of the initdb command after installing Postgres).
The section ERROR REPORTING AND LOGGING contains all the parameters for configuring the log output.
I prefer the following settings in testing/development environments to create very detailed output (on a production system, the log_min_XY parameters should be raised to e.g. error) and to get one logfile per day in the directory $PGDATA/pg_log. The parameter log_line_prefix is adapted to the needs of pgfouine, THE log file analyzer for Postgres logs.

base settings:

log_destination = 'stderr'
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d
log_min_messages = info
log_min_error_statement = notice
log_min_duration_statement = 100
log_line_prefix = '%t [%p]: [%l-1] '
log_error_verbosity = verbose

additional settings:
log_checkpoints = on
log_duration = on
log_statement = 'all'
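With log_min_duration_statement = 100 and the log_line_prefix above, a slow statement ends up in the log roughly like the sample line below (timestamp, pid, and statement are made up). Extracting the duration, one of the fields pgfouine aggregates, is straightforward:

```shell
# Hypothetical log line as produced with log_line_prefix = '%t [%p]: [%l-1] '
# (%t = timestamp, %p = process id, %l = per-session log line number)
line='2009-02-10 14:33:07 CET [4711]: [3-1] LOG:  duration: 153.002 ms  statement: select * from tab1;'

# Pull the duration (in milliseconds) out of the line
duration=$(printf '%s\n' "$line" | sed -n 's/.*duration: \([0-9.]*\) ms.*/\1/p')
echo "duration: $duration ms"
```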

As mentioned earlier, these settings create a lot of log output, which is needed for development, analysis, or troubleshooting, but not in a production environment.
If you have connections from many different hosts, you should enable the log_hostname parameter. I don’t use it, since only a few application servers connect to the database via the pgpool “middleware”.
Once you’ve set the parameters and produced a lot of output, analyzing the log is the next step. There are various reasons for checking the logfile: lack of performance, long-running statements, errors, the history of client connections, …

If you’re looking for a certain error or a certain log entry just open the log file and search for it 😉
If you want to create a nice, detailed statistics page, e.g. for analyzing SQL statements, pgfouine (here) will do the job for you. This tool, written in PHP, parses the logfile and creates e.g. an HTML page containing overall statistics, queries by type, and several tables ordered by duration, count, and runtime.
It’s as easy as:
pgfouine.php -onlyselect -logtype stderr -file "path_to_postgresql_logfile" > 2009-02-10_select_only.html
or, if you want to have all statements included, just remove the option “-onlyselect”.
For the vacuum log:
./pgfouine_vacuum.php -file "path_to_log_from_vacuum" > 2009-02-10_vacuum.html