Recently one of our clients wanted to aggregate text data in PostgreSQL using Mondrian. Without going too deep into Mondrian details, this requires an aggregate function that’s called sum() and accepts a single text argument. While stock PostgreSQL fulfills the first criteria, the function is defined for many data types but not text.
Creating some test data with generate_series:
create table agg_test as ( with s as ( select generate_series(1, 5) as intcol ) select intcol, 'key' || intcol as textcol from s );
sumagg=# select * from agg_test ; intcol | textcol --------+--------- 1 | key1 2 | key2 3 | key3 4 | key4 5 | key5 (5 rows)
Intended behaviour on an int colum:
sumagg=# select sum(intcol) from agg_test; sum ----- 15
But text has no concept of summation:
sumagg=# select sum(textcol) from agg_test; ERROR: function sum(text) does not exist LINE 1: select sum(textcol) from agg_test; ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts.
Luckily in PostgreSQL it’s very easy to define custom aggregations. To make a sum() of some strings which is the same strings glued together with commas, we only need a helper functions that takes the both the already aggreagated strings (argument a) and the next string to append (argument b). We use concat_ws since it while handle NULLs the way we want:
CREATE OR REPLACE FUNCTION textsum_helper(a text, b text) RETURNS text AS $$ BEGIN RETURN CASE WHEN a = '' THEN b ELSE concat_ws(', ', a, b) END; END; $$ LANGUAGE plpgsql;
Using this function, the aggregate can now simply be easily defined:
CREATE AGGREGATE sum (text) ( sfunc = textsum_helper, stype = text, initcond = '' );
This finally allows us to sum up text:
sumagg=# select sum(textcol) from agg_test; sum ------------------------------ key1, key2, key3, key4, key5 (1 row)