Yet another question about why indexes are not being used. The table in question has 138459 rows:

   Column  |     Type     | Modifiers
-----------+--------------+-----------
 state     | character(2) | not null
 county    | text         | not null
 township  | character(5) |
 range     | character(5) |
 section   | integer      |
 meridian  | integer      |
 boundary  | polygon      |
Indexes: lgrng, lgsec, lgst, lgtwn

The queries are:

   select * from lg where state='NM';
   select * from lg where section=14;

EXPLAIN showed that both queries performed sequential scans. Several people pointed out two items that affect this scenario. First, use text or varchar() for the columns. Constants like 'NM' are seen as text, which is a different type than the character columns, and it was suggested that character types are treated differently by the optimizer. Second, from the EXPLAIN output it was surmised that the data was not very well clustered; the rows were in fairly random physical order. When retrieving about 10% of the data it is actually faster to do the sequential scan than to use the index, so the optimizer chose the right path. These suggestions were tested by running EXPLAIN ANALYZE with enable_seqscan off and then on, which showed that the sequential scan was in fact faster.

gds2=# explain analyze select boundary from lg where section=14;
NOTICE:  QUERY PLAN:

Index Scan using lgsec on lg  (cost=0.00..12167.45 rows=3836 width=32) (actual time=44.98..18325.87 rows=3759 loops=1)
Total runtime: 18344.06 msec

EXPLAIN
gds2=# set enable_seqscan=on;
SET VARIABLE
gds2=# explain analyze select boundary from lg where section=14;
NOTICE:  QUERY PLAN:

Seq Scan on lg  (cost=0.00..5885.77 rows=3836 width=32) (actual time=0.24..2974.65 rows=3759 loops=1)
Total runtime: 2987.61 msec

Note that the first number following cost= or actual time= shows the startup cost or time needed before the first row can be fetched.
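For illustration, here is a minimal sketch of the comparison described above, assuming the lg table and the lgsec index from the question:

   -- discourage sequential scans so the index plan is chosen
   set enable_seqscan = off;
   explain analyze select boundary from lg where section = 14;

   -- restore the default and compare the two timings
   set enable_seqscan = on;
   explain analyze select boundary from lg where section = 14;

If the index path really should win, physically reordering the rows with CLUSTER (and re-running ANALYZE) can make the index scan competitive, but check the CLUSTER caveats for your particular release before relying on it.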
To see the queries currently running on your installation, use the pg_stat_activity view:

   select * from pg_stat_activity;

You must be the postgres superuser to actually see the query text, and query monitoring must be turned on in your postgresql.conf file. The postgresql.conf line that enables query monitoring is:

   stats_command_string = true
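As a small usage sketch (assuming the 7.2-era column names datname, procpid and current_query, and that idle backends report '<IDLE>'), you can filter the view down to the backends that are actually doing work:

   select datname, procpid, current_query
   from pg_stat_activity
   where current_query <> '<IDLE>';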
The -o option to pg_ctl enables you to pass options directly to the postmaster. Unfortunately, pg_ctl does not actually "see" those options, so it may give some misleading messages. In this example a non-standard port number is passed to the postmaster:

   pg_ctl -w -o "-p 12345" start

However, pg_ctl tries to confirm that the postmaster started by connecting with psql using the default parameters in its environment. It then reports that the postmaster did not start when in fact it did. In this case, setting the PGPORT environment variable enables pg_ctl to do the right thing.
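A minimal sketch of that workaround, assuming a Bourne-style shell and reusing the example port 12345 from above:

   # tell pg_ctl (and its psql check) which port the postmaster will listen on
   PGPORT=12345
   export PGPORT
   pg_ctl -w -o "-p 12345" start
   # pg_ctl's confirmation connection now uses port 12345 as well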
In releases 6.5 up to but not including 7.2, in a select target list you could use the following expression and the result would appear to be true or false; internally the '= NULL' was silently substituted with IS NULL.

   SELECT col = NULL

This behavior is wrong and has been changed as of 7.2. Older versions of PostgreSQL allowed this comparison to NULL for compatibility with older versions of Access. Access has since been changed, too. Tom Lane said, "This should never have been made the default behavior IMHO, ... because it blatantly broke SQL compatibility, not to mention violating any usable concept of what NULL means. ... [Releases] 6.5, 7.0, and 7.1 lasted long enough to allow a lot of people to become confused :-( "

When testing for a NULL value you must always use IS NULL.

   SELECT col IS NULL from mytable where othercol IS NOT NULL;

Assignment of a variable to NULL in PL/pgSQL is done with the assignment operator ':=', which should never be confused with the boolean equality operator '='. It is a common misconception that NULL is a value which can be compared against another value. In fact NULL is not a value; it is the absence of a value. While values may be absent in two variables, the variables still cannot be compared for equality.
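For illustration, a small session sketch (assuming a 7.2 or later server with the default behavior, where '= NULL' is no longer rewritten) showing why IS NULL is required:

   select 1 = NULL;        -- yields NULL (unknown), not false
   select NULL = NULL;     -- also yields NULL, not true
   select 1 IS NULL;       -- false
   select NULL IS NULL;    -- true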
Returning multiple columns as well as multiple rows sometimes causes the two ideas to be confused. For multiple columns you must use a composite type. For multiple rows, you return a SETOF some type, which may or may not be composite. As of 7.3, returning a SETOF some type is possible in SQL, plpgsql and C. (Ed. What about plpython??)

In order to return multiple columns from a function, you must return a composite type. This is true whether you are returning one row or many rows. In this example, first a composite type named doubletext is defined and then a single row value is returned.

   CREATE TYPE doubletext AS (a text, b text);

   CREATE OR REPLACE FUNCTION test_multiple()
   RETURNS doubletext
   AS 'select ''a''::text, ''b''::text;'
   LANGUAGE 'sql';

   select * from test_multiple();

If you want to return many rows of doubletext, return a SETOF doubletext.

   CREATE OR REPLACE FUNCTION test_multiple2()
   RETURNS SETOF doubletext
   AS 'select ''a''::text, ''b''::text
       union
       select ''c''::text, ''d''::text;'
   LANGUAGE 'sql';

   select * from test_multiple2();
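For comparison, here is a minimal PL/pgSQL sketch of the same set-returning idea. The function name test_multiple3 is made up for this example, and it assumes the 7.3-era RETURN NEXT mechanism with a RECORD loop variable:

   CREATE OR REPLACE FUNCTION test_multiple3()
   RETURNS SETOF doubletext
   AS '
   DECLARE
       r RECORD;
   BEGIN
       -- build each row and hand it back with RETURN NEXT
       FOR r IN SELECT ''a''::text AS a, ''b''::text AS b
                UNION
                SELECT ''c''::text, ''d''::text LOOP
           RETURN NEXT r;
       END LOOP;
       RETURN;
   END;
   ' LANGUAGE 'plpgsql';

   select * from test_multiple3();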
As of 7.3, an empty string is no longer converted to 0 on assignment to an integer. The change was made in order to conform more closely with the SQL standard. In the past,

   insert into test (integercolumn) values ('');

resulted in 0 being assigned to the column value. Now this assignment raises an error. This behavior applies to integer columns as well as integer arrays. It was noted that float4, float8 and OID still accept '' as 0, while int8 and numeric have never accepted '' as 0. By the time you read this, float4, float8 and OID will probably be fixed so as not to accept '' as 0.

Another side observation concerned the proper treatment of spaces in quoted numbers. Currently leading spaces are accepted but trailing spaces are not. The SQL99 rules state that both leading and trailing spaces should be removed. This will probably also be fixed by the time you read this :-)

The primary discussion of this thread, however, concerned the fact that Request Tracker broke with this change. It was said that, "The author has gone so far as to notify all current users that Postgres 7.3 is not usable with RT, and to stick to 7.2, or to use MySQL." The general consensus seemed to be that this was very unfortunate, but that it was a bug in Request Tracker. Some people criticized the fact that this change was not widely advertised. It is generally accepted that behavior changes that affect many programs need to be implemented slowly so that their impact is minimized. Others pointed out that participation in the beta program is important. The beta program is the time to single out particular problems like this and to help the team choose which things to change and which to deprecate over a release. In the 7.3 beta program, this issue seems to have been raised once but was not deemed widespread enough to warrant a deprecation path. Lively discussion ensued. Bruce Momjian created a patch which allows the old behavior but also raises a warning. This patch is now available.
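A small sketch of the difference, assuming a hypothetical table named test with an integer column as in the statement quoted above:

   create table test (integercolumn integer);

   insert into test (integercolumn) values ('');    -- 7.2 and earlier: stores 0; 7.3: raises an error
   insert into test (integercolumn) values (0);     -- works in all versions: store zero explicitly
   insert into test (integercolumn) values (NULL);  -- use NULL if "no value" is what is meant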
Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in! Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009