Varlena, LLC | PostgreSQL General Bits Newsletter

PostgreSQL Training,
Consulting & Support

By A. Elein Mustain

09-Feb-2004 Issue: 61

Archives |

General Tidbits |

Google General Bits |

Prev |

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

R-tree Indexes
Overview	07-Feb-2004

Region Trees, commonly known as R-Trees, are indexing structures for spatial data. Built in spatial data types in PostgreSQL are the geometric types such as Point, Line, Polygon and so forth.

R-Trees, like B-Trees, are dynamic and height balanced. But while the logical operators for B-Trees are usually greater than, less than and equals, the logical operators for R-Trees are within, contains, overlaps and equal. The nodes of an R-Tree contain bounding boxes of shape.

Given a set of shapes like these. In a plane (a), there are two circles (c) and (d). The left circle (c) contains a triangle (e) and overlaps the circle (d) and another triangle (f). The circle (d) overlaps the circle (c) and contains the triangle (f).

A bounding box is created for each shape or group of shapes. From the inside out, the triangles are bounded by (E) and (F). The circles are bounded by (C) and (D). (B) bounds the left circle with its overlapping triangle. (A) bounds (B) along with the right circle (D).

Using the bounding boxes as nodes, the R-Tree is constructed like this picture. (A) contains (B) and (D). (B) contains (C) and (F). (C) contains (E). (D) contains (F).

Using this structure enables fast access to spatial queries using the spatial logical functions.

We can store these shapes in the database as circles and polygons. Here we are creating two tables for circles and polygons, populating them with these shapes and creating R-Tree indexes on the shape values.

	-- drop table circles ;
	create table circles (
	   cirid char(1),
	   circ  Circle
	);
	create index circletree on circles USING RTREE (circ);
	insert into circles values ( 'c', '((2.5,4),2)');
	insert into circles values ( 'd', '((5.5,4.5),2)');
	
	-- drop table polys ;
	create table polys (
	   polyid   char(1),
	   poly     Polygon
	);
	create index polytree on polys USING RTREE (poly);
	insert into polys values ( 'a', '(0,0),(8,8)'::Box);
	insert into polys values ( 'e', '(2,3),(3,3),(2,5)');
	insert into polys values ( 'f', '(4,4),(5,3.5),(5,4.5)');

The R-Tree indexes will be used (if there were enough rows to make it faster than a sequential scan) if the R-Tree boolean operators are used in a query. The operators are these that follow. You will notice several combinations are missing. For our example of polygons (triangles and a square) and circles, in order to use some of these operators we will have to cast appropriately. (There is no reason that all permutations of the operators do not exist except that all permutations can be created by appropriate casting. (An exercise for the reader is to create R-Tree operands for the missing permutations :-))

Boolean Geometric Operators

Operator	Description	Example
&&	Overlaps?	box '((0,0),(1,1))' && box '((0,0),(2,2))'
&<	Overlaps or is left of?	box '((0,0),(1,1))' &< box '((0,0),(2,2))'
&>	Overlaps or is right of?	box '((0,0),(3,3))' &> box '((0,0),(2,2))'
<<	Is left of?	circle '((0,0),1)' << circle '((5,0),1)'
>>	Is right of?	circle '((5,0),1)' >> circle '((0,0),1)'
<^	Is below?	circle '((0,0),1)' <^ circle '((0,5),1)'
>^	Is above?	circle '((0,5),1)' >^ circle '((0,0),1)'
`?#`	Intersects?	lseg '((-1,0),(1,0))' ?# box '((-2,-2),(2,2))'
?-	Is horizontal?	?- lseg '((-1,0),(1,0))'
?-	Are horizontally aligned?	point '(1,0)' ?- point '(0,0)'
?\|	Is vertical?	?\| lseg '((-1,0),(1,0))'
?\|	Are vertically aligned?	point '(0,1)' ?\| point '(0,0)'
?-\|	Is perpendicular?	lseg '((0,0),(0,1))' ?-\| lseg '((0,0),(1,0))'
?\|\|	Are parallel?	lseg '((-1,0),(1,0))' ?\|\| lseg '((-1,2),(1,2))'
~	Contains?	circle '((0,0),2)' ~ point '(1,1)'
@	Contained in or on?	point '(1,1)' @ circle '((0,0),2)'
~=	Same as?	polygon '((0,0),(1,1))' ~= polygon '((1,1),(0,0))'

EXPLAIN on one query shows that the R-Tree index is being used. This EXPLAIN was run after the table creation but prior to vacuuming and so does not optimize for the number of rows in the table. An EXPLAIN after vacuuming on this particular data will show sequential scans because there are only a couple of rows involved.

	elein=# explain
	elein-# select polyid, 'contains', cirid from circles, polys
	elein-# where poly ~ circ::Polygon;
	                                 QUERY PLAN                                  
	-----------------------------------------------------------------------------
	 Nested Loop  (cost=0.00..4860.52 rows=1001 width=10)
	   ->  Seq Scan on circles  (cost=0.00..20.00 rows=1000 width=29)
	   ->  Index Scan using polytree on polys  (cost=0.00..4.83 rows=1 width=37)
	         Index Cond: (polys.poly ~ polygon(12, "outer".circ))
	(4 rows)

A couple of queries against the data confirm the written description of the relationships between the shapes as they were initially described.

	elein=# select polyid, 'contains', cirid from circles, polys
	elein-# where poly ~ circ::Polygon;
	 polyid | ?column? | cirid 
	--------+----------+-------
	 a      | contains | c
	 a      | contains | d
	(2 rows)
	
	elein=# select cirid, 'contains', polyid from circles, polys
	elein-# where circ::Polygon ~ poly;
	 cirid | ?column? | polyid 
	-------+----------+--------
	 c     | contains | e
	 d     | contains | f
	(2 rows)
	
	elein=# select polyid, 'intersects', cirid from circles, polys
	elein-# where poly::Box ?# circ::Box;
	 polyid |  ?column?  | cirid 
	--------+------------+-------
	 e      | intersects | c
	 f      | intersects | d
	 a      | intersects | c
	 a      | intersects | d
	(4 rows)
	
	elein=# select cirid, 'intersects', polyid from circles, polys
	elein-# where circ::Box ?# poly::Box;
	 cirid |  ?column?  | polyid 
	-------+------------+--------
	 c     | intersects | e
	 d     | intersects | f
	 c     | intersects | a
	 d     | intersects | a
	(4 rows)

Real world spatial examples are very common. PostGIS with mapping is a very active area. Bus routes and GPS data make up a real world application. Insider knowledge knows that Middle Earth was mapped with Illustra's geometric data types (based on Postgres). Diagrams of all types are classic examples. Time intervals can be constructed to use R-Tree indexes so that one could quickly identify schedule overlaps and gaps.

References:
Guttman, A., "R-Trees: A Dynamic Index Structure for Spatial Searching", Proceedings of the SIGMOD Conference, Boston, June 1984, p. 47-57.
Paul Brown, "Object-Relational Database Development: A Plumber's Guide", Informix Press
PostgreSQL 7.4 Documentation, 9.9 Geometric Operators
Contributors: elein at varlena.com

People on pg mailing lists
[GENERAL] size of mailing lists? 05-Feb-2004

The PostgreSQL mailing lists have always been helpful and high quality. Here are a few testimonials:

To tell you the truth, I've always got good responses from this list. Apparently most other question posts do too. There seems to be always someone knowledgeable to answer questions, which is very very nice :-)

Better than most paid-for support, cheaper, friendlier... The PG community is probably prettier too and makes a better cup of tea ;-)
[Or coffee, if you want to come over one day or another :-)]

According to Marc Fournier, there are 11918 unique addresses subscribed across these lists and spread out like this:

pgadmin-devteam 5

pgadmin-hackers 114

pgadmin-support 303

pgsql-admin 1470

pgsql-advocacy 245

pgsql-announce 2707

pgsql-benchmarks 295

pgsql-bugs 710

pgsql-chat 149

pgsql-committers 158

pgsql-cygwin 373

pgsql-de-allgemein 37

pgsql-docs 449

pgsql-fr-generale 30

pgsql-general 1927

pgsql-hackers 1093

pgsql-hackers-win32 102

pgsql-interfaces 849

pgsql-jdbc 913

pgsql-jobs 34

pgsql-novice 1032

pgsql-odbc 647

pgsql-patches 332

pgsql-performance 749

pgsql-php 825

pgsql-ports 271

pgsql-sql 1361

pgsql-techdocs 130

pgsql-tr-genel 44

pgsql-www 33

pgsql-www-advocacy 6

pgsql-www-jobs 11

pgsql-www-main 7

press 6

sfpug 42

translators 21

webmaster 6

Contributors: David Garamond lists at zara.6.isreserved.com, Richard Huxton dev at archonet.com, Karsten Hilbert Karsten.Hilbert at gmx.net, Marc G. Fournier scrappy at postgresql.org, Tom Lane tgl at sss.pgh.pa.us

Timing out Queries
[GENERAL] How to kick out automatically stuck in queries 02-Feb-2004

In 7.4 you can set a client side parameter to automatically timeout a query after a specified amount of time. This feature can be used to limit rogue queries or resource hogs.

The amount of time specified is in milliseconds so do your math right. The default value of statement_timeout is 0, meaning that there is no timeout set.

	=# set statement_timeout = 1;
	SET
	=# show statement_timeout;
	 statement_timeout 
	-------------------
	 1
	(1 row)

If a query runs into the timeout limit, it will return an error saying the query was cancelled. (Hint: don't set the timeout to 1 millisecond.)

	=# select count(*) from transaction_log 
	-# UNION select count(*) from transaction_log 
	-# UNION select count(*) from transaction_log;
	ERROR:  canceling query due to user request

Contributors: Együd Csaba csegyud at vnet.hu, Kris Jurka books at ejurka.com

Splitting values
[SQL] How to retrieve N lines of a text field. 29-Jan-2004

A large text field with embedded newlines could be interpreted as a delimited list where the delimiter is the newline. It is hard to write the function fetch to fetch N lines of text one using SQL99 regular expressions,

This almost works except for the final line.

	select substring(
	   test from '#"' || repeat('%\n', $1) || '#"%' for '#') 
	from multiline_test;

However, using the POSIX-style regex this could be possible using "[^\n]*\n" to match exactly one line. The O'Reilly book "Mastering Regular Expressions" is highly recommended for this kind of expression wrangling.

A completely different approach using a function which returned SETOF was also proposed. This is a nice use of a setof function which returns the first/last/top N of anything. This function also shows how to have an iterative counter within a set returning function. Remember that control returns to the body of the function for each return value and that the variables used in the function maintain their values between calls.

	CREATE OR REPLACE FUNCTION first_n_lines(text, int)
	RETURNS setof text AS '
	DECLARE
	  i int := 0;
	  oneline text;
	BEGIN
	  LOOP
	    i := i + 1;
	    IF i > $2 THEN
	      EXIT;
	    END IF;
	    SELECT INTO oneline split_part($1, ''\n'', i);
	    IF oneline = '''' THEN
	      EXIT;
	    END IF;
	    RETURN NEXT oneline;
	  END LOOP;
	  RETURN;
	END
	' LANGUAGE 'plpgsql';
	
	regression=# select * from first_n_lines('abc\ndef\nghi', 2);
	 first_n_lines
	---------------
	 abc
	 def
	(2 rows)

Contributors: Chris Travers chris at travelamericas.com, Tom Lane tgl at sss.pgh.pa.us, Joe Conway mail at joeconway.com

Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!

Top