PostgreSQL Training,
Consulting & Support
General Bits
By A. Elein Mustain

12-Jan-2004 Issue: 57


General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see

Survey Time
PostgreSQL Survey 05-Jan-2004

The survey from Issue #56 is still open. I encourage you to answer the survey so we can find ways to better support the PostgreSQL community.

Editor: elein at
Regular Expressions in Queries
[GENERAL] Select Non Alpha Contents of a Column 29-Dec-2003

You can use POSIX regular expression pattern matching in PostgreSQL queries. Regular expressions give you a much wider range of pattern matching than either LIKE or SIMILAR TO.

Regular expressions are the familiar ones you use with grep and other unix tools. There is a nice explanation of using regular expressions in the 7.4 PostgreSQL manual in section 9.6.3, POSIX Regular Expressions. And a quick web surf can give you more explanations and tutorials.

To match regular expressions in a PostgreSQL query, use the appropriate operator:

	~	Matches regular expression, case sensitive
	~*	Matches regular expression, case insensitive
	!~	Does not match regular expression, case sensitive
	!~*	Does not match regular expression, case insensitive

This is very handy when you want to do something like selecting the rows whose column value begins with a non-alphabetic character. These queries give identical results. The first does a case-insensitive search for values which do not start with the characters a through z. The second does the same thing in a more traditional case-sensitive way.

	SELECT * FROM table_name WHERE col !~* '^[a-z]';
	SELECT * FROM table_name WHERE col !~ '^[a-zA-Z]';
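The difference between the case-sensitive and case-insensitive operators can also be seen by evaluating them directly on a constant:

```sql
-- 'Apple' starts with an upper-case letter, so it fails the
-- case-sensitive test but passes the case-insensitive one.
SELECT 'Apple' ~  '^[a-z]' AS sensitive,    -- f
       'Apple' ~* '^[a-z]' AS insensitive;  -- t
```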

Contributors: Hunter Hillegas lists at, Jeff Eckermann jeff_eckermann at, Martin Marques martin at
Converting a comma separated list into rows
[GENERAL] Passing a comma delimited list to a function 03-Jan-2004

In 7.4 it is easy to turn a list into a set of rows. There are two parts to this problem. The first is handling the comma separated list and the second is writing a function which returns a set.

Two suggestions were made for splitting the text apart: using split_part(string text, delimiter text, field integer), which extracts one field from a delimited string, and using the string_to_array(string text, delimiter text) function. For plpgsql functions I prefer the array functions; however, if I were writing this in plperl, I would use the native perl functions. This example is about plpgsql.
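Both functions can be tried out directly; split_part() pulls out a single field, while string_to_array() converts the whole string at once:

```sql
SELECT split_part('1,2,3', ',', 2);    -- returns '2'
SELECT string_to_array('1,2,3', ',');  -- returns {1,2,3}
```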

In this function, the comma-delimited string is passed to the function unravel(). The delimiter is hard-coded as a comma. A holding array is declared, then the string is converted to an array. The FOR loop uses the array_lower() and array_upper() functions to determine the bounds of the loop over the array. The second parameter of these functions is the dimension of the array--in this case, it is a one-dimensional array.

	create or replace function unravel(text) returns SETOF int as '
	declare
	 v_list alias for $1;
	 v_delim text := '','';
	 v_arr text[];
	begin
	 v_arr := string_to_array(v_list, v_delim);
	 for i in array_lower(v_arr, 1)..array_upper(v_arr, 1) loop
	   return next v_arr[i]::int;
	 end loop;
	 return;
	end;
	' language plpgsql;

	regression=# select * from unravel('1,2,3,4,5');
	 unravel
	---------
	       1
	       2
	       3
	       4
	       5
	(5 rows)
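A set-returning function like this can then be used anywhere a row source is expected, for example to expand a list for an IN clause (the orders table here is hypothetical):

```sql
-- Fetch the orders whose ids appear in a comma-separated list.
SELECT * FROM orders
WHERE order_id IN (SELECT * FROM unravel('1,2,3'));
```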

Contributors: A E cooljoint at, Joe Conway mail at, Ezra Epstein news-reader at
Auto-Update of Timestamp
[GENERAL] simple auto-updating timestamp ? 28-Dec-2003

MySQL has a feature where the first column of type timestamp in a table is automatically updated whenever the row is inserted or updated. You have to use datetime fields if you do not want the column to update automatically.

This auto-updating of timestamps is helpful for cases where you want the system to track the last modified timestamp of a row for, say, auditing purposes.

In order to do this in the smallest, most regular way, I add a column called modified_timestamp to any table whose updates I want to track. This allows me to write one function for all of them.

	-- (the function name here is illustrative)
	CREATE OR REPLACE FUNCTION set_modified_timestamp() RETURNS TRIGGER AS '
	BEGIN
	   NEW.modified_timestamp = now();
	   RETURN NEW;
	END;
	' LANGUAGE 'plpgsql';

I usually have two variations on this function that operate on two other standard columns: modified_user and create_timestamp. These track the user who changed the row and when the row was created. To track the modified user, add NEW.modified_user = current_user to the function. To set create_timestamp you will need another function, or a condition in the function which enables it to be set depending on the type of trigger.
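As a sketch of that conditional approach (the function and column names are assumptions, not from the thread), a single trigger function can test TG_OP so that create_timestamp is set only when the row is first inserted:

```sql
CREATE OR REPLACE FUNCTION stamp_row() RETURNS TRIGGER AS '
BEGIN
   IF TG_OP = ''INSERT'' THEN
      NEW.create_timestamp := now();   -- set only when the row is created
   END IF;
   NEW.modified_timestamp := now();    -- refreshed on every insert or update
   NEW.modified_user := current_user;  -- who made the change
   RETURN NEW;
END;
' LANGUAGE 'plpgsql';
```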

Having created the function, for each table and column that I want updated with the modification time, I add the triggers:
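A trigger of this shape attaches the function to a table (the table and function names are illustrative; one trigger is needed per tracked table):

```sql
CREATE TRIGGER table_name_modtime BEFORE INSERT OR UPDATE ON table_name
   FOR EACH ROW EXECUTE PROCEDURE set_modified_timestamp();
```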


Be careful with these columns: if you try to change modified_timestamp yourself, the trigger will overwrite your value with the system time from now(). Also remember that the special constant 'now' is being deprecated in favor of the function now().

Contributors: Andreas maps.on at, D. Dante Lorenso dante at, elein at
Massive Inserts
[GENERAL] MVCC for massively parallel inserts 05-Jan-2004

The question was raised about PostgreSQL's performance with massive simultaneous inserts into a heavily indexed table. It was conjectured that the CPU could not handle this case.

In fact such a load is unlikely to be CPU-bound; it is much more likely to be I/O-bound. The money to be spent on multiple CPUs would probably be much better spent on good disks. The discussion from there veered over to which disk configurations might be best under the circumstances.

Tom Lane suggested that, "If your insert load exceeds the available disk bandwidth, I think you have little choice but to figure out a way to divide the workload among multiple independent databases."

Alex Satrapa writes, "Properly set up in a RAID-1/0 array, you'll get much better "bandwidth" out of those drives. Whether you RAID in software or hardware is up to you and your budget - but if you choose hardware, make sure you actually get a hardware RAID controller, not one of the cheapies which are just multiple-IDE-controllers-with-special-drivers."

Greg Stark wrote: "I would agree, and if you really need the I/O bandwidth you can go to much larger stripe sets than even this. The documentation I've seen before suggested there were benefits up to stripe sets as large as twelve disks across. That would be 24 drives if you're also doing mirroring. Ideally separating WAL, index, and heap files is good, but you would have to experiment to see which works out fastest for a given number of drives."

It was also noted that indexes are not built at the end of the transaction, but rather as the inserts proceed. This is one of the main features of PostgreSQL's MVCC.

Vivek Khera pointed out that he did a bunch of testing with a 14-disk SCSI array and found that RAID5 worked best, over RAID10 and RAID50. He added, "I found that putting the WAL on its own array (in my case a mirror on the other RAID controller channel) helped quite a bit. I don't think it is easy to split off index files to alternate locations with Postgres."

"Increasing the number of checkpoint segments was one of the biggest improvements I observed for mass-insert performance (as tested while doing a restore on a multi-million row database.)"
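The checkpoint-segment setting he mentions lives in postgresql.conf; a sketch of the 7.4-era knobs (the values are illustrative, not a recommendation from the thread):

```
# postgresql.conf -- 7.4-era WAL tuning (values illustrative)
checkpoint_segments = 32    # default was 3; each segment is 16 MB of WAL
checkpoint_timeout = 300    # seconds between forced checkpoints
```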

"The combination of having the WAL on a separate disk, and letting that grow to be quite large has been very good for my performance and also for reducing disk bandwidth requirements."

Contributors: Steven D.Arnold stevena at, Tom Lane tgl at, Joshua D. Drake jd at, Alex Satrapa alex at, Greg Stark gsstark at, Vivek Khera khera at

Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!
Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009
