Region Trees, commonly known as R-Trees, are indexing structures for spatial data. Built in spatial data types in PostgreSQL are the geometric types such as Point, Line, Polygon and so forth.
R-Trees, like B-Trees, are dynamic and height balanced. But while the logical operators for B-Trees are usually greater than, less than and equals, the logical operators for R-Trees are within, contains, overlaps and equal. The nodes of an R-Tree contain bounding boxes of shape.
Given a set of shapes like these. In a plane (a), there are two circles (c) and (d). The left circle (c) contains a triangle (e) and overlaps the circle (d) and another triangle (f). The circle (d) overlaps the circle (c) and contains the triangle (f).
A bounding box is created for each shape or group of shapes. From the inside out, the triangles are bounded by (E) and (F). The circles are bounded by (C) and (D). (B) bounds the left circle with its overlapping triangle. (A) bounds (B) along with the right circle (D).
Using the bounding boxes as nodes, the R-Tree is constructed like this picture. (A) contains (B) and (D). (B) contains (C) and (F). (C) contains (E). (D) contains (F).
Using this structure enables fast access to spatial queries using the spatial logical functions.
We can store these shapes in the database as circles and polygons. Here we are creating two tables for circles and polygons, populating them with these shapes and creating R-Tree indexes on the shape values.
-- drop table circles ; create table circles ( cirid char(1), circ Circle ); create index circletree on circles USING RTREE (circ); insert into circles values ( 'c', '((2.5,4),2)'); insert into circles values ( 'd', '((5.5,4.5),2)'); -- drop table polys ; create table polys ( polyid char(1), poly Polygon ); create index polytree on polys USING RTREE (poly); insert into polys values ( 'a', '(0,0),(8,8)'::Box); insert into polys values ( 'e', '(2,3),(3,3),(2,5)'); insert into polys values ( 'f', '(4,4),(5,3.5),(5,4.5)');
The R-Tree indexes will be used (if there were enough rows to make it faster than a sequential scan) if the R-Tree boolean operators are used in a query. The operators are these that follow. You will notice several combinations are missing. For our example of polygons (triangles and a square) and circles, in order to use some of these operators we will have to cast appropriately. (There is no reason that all permutations of the operators do not exist except that all permutations can be created by appropriate casting. (An exercise for the reader is to create R-Tree operands for the missing permutations :-))
Boolean Geometric Operators
EXPLAIN on one query shows that the R-Tree index is being used. This EXPLAIN was run after the table creation but prior to vacuuming and so does not optimize for the number of rows in the table. An EXPLAIN after vacuuming on this particular data will show sequential scans because there are only a couple of rows involved.
elein=# explain elein-# select polyid, 'contains', cirid from circles, polys elein-# where poly ~ circ::Polygon; QUERY PLAN ----------------------------------------------------------------------------- Nested Loop (cost=0.00..4860.52 rows=1001 width=10) -> Seq Scan on circles (cost=0.00..20.00 rows=1000 width=29) -> Index Scan using polytree on polys (cost=0.00..4.83 rows=1 width=37) Index Cond: (polys.poly ~ polygon(12, "outer".circ)) (4 rows)
A couple of queries against the data confirm the written description of the relationships between the shapes as they were initially described.
elein=# select polyid, 'contains', cirid from circles, polys elein-# where poly ~ circ::Polygon; polyid | ?column? | cirid --------+----------+------- a | contains | c a | contains | d (2 rows) elein=# select cirid, 'contains', polyid from circles, polys elein-# where circ::Polygon ~ poly; cirid | ?column? | polyid -------+----------+-------- c | contains | e d | contains | f (2 rows) elein=# select polyid, 'intersects', cirid from circles, polys elein-# where poly::Box ?# circ::Box; polyid | ?column? | cirid --------+------------+------- e | intersects | c f | intersects | d a | intersects | c a | intersects | d (4 rows) elein=# select cirid, 'intersects', polyid from circles, polys elein-# where circ::Box ?# poly::Box; cirid | ?column? | polyid -------+------------+-------- c | intersects | e d | intersects | f c | intersects | a d | intersects | a (4 rows)
Real world spatial examples are very common. PostGIS with mapping is a very active area. Bus routes and GPS data make up a real world application. Insider knowledge knows that Middle Earth was mapped with Illustra's geometric data types (based on Postgres). Diagrams of all types are classic examples. Time intervals can be constructed to use R-Tree indexes so that one could quickly identify schedule overlaps and gaps.
The PostgreSQL mailing lists have always been helpful and high quality. Here are a few testimonials:
To tell you the truth, I've always got good responses from this list. Apparently most other question posts do too. There seems to be always someone knowledgeable to answer questions, which is very very nice :-)
Better than most paid-for support, cheaper, friendlier... The PG community is probably prettier too and makes a better cup of tea ;-)
According to Marc Fournier, there are 11918 unique addresses subscribed across these lists and spread out like this:
In 7.4 you can set a client side parameter to automatically timeout a query after a specified amount of time. This feature can be used to limit rogue queries or resource hogs.
The amount of time specified is in milliseconds so do your math right. The default value of statement_timeout is 0, meaning that there is no timeout set.
=# set statement_timeout = 1; SET =# show statement_timeout; statement_timeout ------------------- 1 (1 row)
If a query runs into the timeout limit, it will return an error saying the query was cancelled. (Hint: don't set the timeout to 1 millisecond.)
=# select count(*) from transaction_log -# UNION select count(*) from transaction_log -# UNION select count(*) from transaction_log; ERROR: canceling query due to user request
A large text field with embedded newlines could be interpreted as a delimited list where the delimiter is the newline. It is hard to write the function fetch to fetch N lines of text one using SQL99 regular expressions,
This almost works except for the final line.
select substring( test from '#"' || repeat('%\n', $1) || '#"%' for '#') from multiline_test;However, using the POSIX-style regex this could be possible using "[^\n]*\n" to match exactly one line. The O'Reilly book "Mastering Regular Expressions" is highly recommended for this kind of expression wrangling.
A completely different approach using a function which returned SETOF was also proposed. This is a nice use of a setof function which returns the first/last/top N of anything. This function also shows how to have an iterative counter within a set returning function. Remember that control returns to the body of the function for each return value and that the variables used in the function maintain their values between calls.
CREATE OR REPLACE FUNCTION first_n_lines(text, int) RETURNS setof text AS ' DECLARE i int := 0; oneline text; BEGIN LOOP i := i + 1; IF i > $2 THEN EXIT; END IF; SELECT INTO oneline split_part($1, ''\n'', i); IF oneline = '''' THEN EXIT; END IF; RETURN NEXT oneline; END LOOP; RETURN; END ' LANGUAGE 'plpgsql'; regression=# select * from first_n_lines('abc\ndef\nghi', 2); first_n_lines --------------- abc def (2 rows)
Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!
Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009