|
|||||||||||||||
|
|||||||||||||||
|
|||||||||||||||
|
|||||||||||||||
|
|||||||||||||||
The debug_print options in postgresql.conf file can be set so that you can see many details in the processing of a query. However, even if you set these variables, it is required that server_min_messages also be set to DEBUG1 in order for the debug_print messages to get to the log file at all. These are the debug_print messages available. Also note that to get the "pretty" version of the output, set debug_pretty_print.
The output you get with these debug flags can be very interesting or very obtuse. I ran the following query on my 8.0.1 data base. It is followed here by its explain analyze output. webstat=# select grandtot, to_char(current_date,'DD-Mon-YY') as when, grandtot/(date((current_date-1)) - date((current_date-7)) +1) as perrange, (date((current_date-1)) - date((current_date-7)) +1) as numdays, to_char( date((current_date-7)),'DD-Mon-YY') as startdate, to_char( date((current_date-1)),'DD-Mon-YY') as enddate from ( select count(*) as grandtot from rawlog where (code=200 or code=301) and ldate >= (current_date-7) and ldate <= (current_date-1) and doc like '%General%' ) foo; grandtot | when | perrange | numdays | startdate | enddate ----------+-----------+----------+---------+-----------+----------- 8443 | 07-Apr-05 | 1206 | 7 | 31-Mar-05 | 06-Apr-05 (1 row) QUERY PLAN --------------------------------------------------------------------------------------- Subquery Scan foo (cost=20266.77..20266.84 rows=1 width=8) (actual time=72.860..72.863 rows=1 loops=1) -> Aggregate (cost=20266.77..20266.77 rows=1 width=0) (actual time=72.707..72.708 rows=1 loops=1) -> Index Scan using logdate on rawlog (cost=0.01..20266.77 rows=2 width=0) (actual time=0.111..63.919 rows=8443 loops=1) Index Cond: ((ldate >= (('now'::text)::date - 7)) AND (ldate <= (('now'::text)::date - 1))) Filter: (((code = 200) OR (code = 301)) AND (doc ~~ '%General%'::text)) Total runtime: 73.199 ms (6 rows)The parsetree, the rewritten plan and the plan can be viewed here. As you can see, it gets rather long, but you can follow how the query was parsed originally and as rewritten and the final scan for the query. You might want to try it first on a simple query so you can begin to recognize the common elements of the parsed plans.
On my machine I run multiple installations of postgreSQL in non-standard locations.
I have 7.4 installed in /local/pgsql74 and 8.0 installed in /local/pgsql80.
In order to run anything from an installation I source a file which sets up
the appropriate PG variables for whichever installation I indicate on the
command line. This script also adds the appropriate bin to my path.
Here is an abbreviated version of my pgenv script which I invoke
by typing # pgenv if [ $1 = 4 ] then # # 7.4 PRODUCTION # PG_INST=/local/pgsql74 PGPORT=5434 MYPG=PG74 elif [ $1 = 8 ] then # # 8.0 BETA # PG_INST=/local/pgsql80 PGPORT=5430 MYPG=PG80 fi PGDATA=$PG_INST/data PGLIB=$PG_INST/lib PGHOST=localhost PATH=.:$PG_INST/bin:$PATH export PGDATA PGHOST PGPORT PG_INST PATH MYPG PGLIB While this serves me very well, there is another, almost simpler way to do this that includes more detail than my little script. This is the pg_service.conf file. This file allows you to name a service which consists of PG variable settings. Then you can just use the service name, PG_SERVICE, to indicate what group of settings you need. This is how it works. If you have multiple, isolated installations and one of them in $PG_INST, then in $PG_INST/share is a file called pg_service.conf.sample. You will need to edit it (we'll get to that) and put it in $PG_INST/etc/pg_service.conf. If you have a packaged install the file should be put into /usr/etc/postgresql/pg_service.conf or some variation on that depending on the package installations. If you had to put $PG_INST on your PATH before, you still need to do that. Then, you will only need to export PG_SERVICE=servicename to pick up all of the PG variables defined in your pg_service.conf file. This is what my pg_service.conf file looks like. I have a 74 installation and an 80 installation on different ports. I have an elein and a gb database on each of them. I created four service names, elein74, elein80, gb74 and gb80. [elein74] dbname=elein user=elein port=5434 [gb74] dbname=gb user=elein port=5434 [elein80] dbname=elein user=elein port=5430 [gb80] dbname=gb user=elein port=5430 Now at the shell, I ensure my PATH contains the $PG_INST/bin directory and export PGSERVICE as gb74. $ printenv | grep PG PGSERVICE=gb74 $ echo $PATH /local/pgsql80/bin:.:/home/elein/bin:/usr/X11R6/bin: /usr/local/bin:/usr/bin:/bin:/usr/bin/X11:Now when I type psql I get the gb database in the 7.4 installation while running the 8.0 version of psql. Which executables you use is strictly up to where those executables lie on your PATH. I no longer need to use my installation setting script. I can use pg_service.conf to point to specific databases by setting db name, to different installations by setting the port and different host and user names. Note that it is not necessary to set all of the variables. For example, if I ommitted dbname for a service called [80] then I would still need to reference the database name when I called up a program. Client applications written using libpq can pass a service=servicename variable in the conninfo string when opening the connection. Use service just as you would use the database or port number.
It is important to fool around with arrays and their functions in order
to get a good sense of the arrangement of data in multi-dimensional arrays.
In this article we will look at flattening arrays into column. Flattening
arrays into columns is not necessary to do conditional testing against
array elements An Aside: There is a running religious argument regarding whether or not arrays belong in a "pure" relational database. I assert that if the information describing an object is best described in an array, then it belongs in an array. The best example is that of a polygon. The definition of a polygon is an array of specific points. This article is not about the arguments for or against arrays, but rather how they can be used when you have them. Having said that, the examples in this article, taken from the documentation, are not good examples of when and why to use an array. Let us start with creating a table that contains two columns each of which is an array. The first is a one dimensional array and the second is a three dimensional array. We use the ARRAY constructor to insert data into the table. CREATE TABLE sal_emp ( name text, pay_by_quarter integer[], schedule text[][][] ); INSERT INTO sal_emp VALUES ('Bill', ARRAY[10000, 10000, 10000, 10000], ARRAY[['meeting', 'lunch', 'noon'], ['training', 'presentation','2pm']] ); INSERT INTO sal_emp VALUES ('Carol', ARRAY[20000, 25000, 25000, 25000], ARRAY[['breakfast', 'consulting','10am'], ['meeting', 'lunch','noon']]); select * from sal_emp; -- name | pay_by_quarter | schedule ---------+---------------------------+---------------------------------------------------- -- Bill | {10000,10000,10000,10000} | {{meeting,lunch,noon},{training,presentation,2pm}} -- Carol | {20000,25000,25000,25000} | {{breakfast,consulting,10am},{meeting,lunch,noon}} --(2 rows)pay_by_quarter is a one level array and schedule is a two level array. The array level corresponds to the nesting count of the square brackets which do not match exactly to syntax used to create the table, but rather to the syntax required to do the inserts. For example, the array level of pay_by_quarter is 1. The array level of schedule is 2. The upper bound of pay_by_quarter at level 1 is 4. The upper bound of schedule at level 1 is 2, the upper bound of the array at level 2 is 3. We care about the upper bound of arrays in order to iterate over them. select array_upper(pay_by_quarter,1) from sal_emp; -- array_upper -- ------------- -- 4 -- 4 (3 rows) select array_upper(schedule,1) from sal_emp;; -- array_upper -- ------------- -- 2 -- 2 -- select array_upper(schedule,2) from sal_emp;; -- array_upper -- ------------- -- 3 -- 3 The basic technique for iterating over array elements includes using the function generate_series(). This function takes a starting number, an ending number and an option step count. select * from generate_series(1,10,2); -- generate_series -- ----------------- -- 1 -- 3 -- 5 -- 7 -- 9 -- (5 rows)To use the generate_series() function to iterate over an array, you can do something like this. We have chosen an higher than needed end number. And we aliased the function to idx(n) so that we can use it's return value as n. select name, idx.n, pay_by_quarter[idx.n] from sal_emp, generate_series(1,7) idx(n) order by name; -- name | n | pay_by_quarter -- -------+----+---------------- -- Bill | 1 | 10000 -- Bill | 2 | 10000 -- Bill | 3 | 10000 -- Bill | 4 | 10000 -- Bill | 5 | -- Bill | 6 | -- Bill | 7 | -- Carol | 1 | 20000 -- Carol | 2 | 25000 -- Carol | 3 | 25000 -- Carol | 4 | 25000 -- Carol | 5 | -- Carol | 6 | -- Carol | 7 |Notice that the values are NULL for the extra rows generated by generate_series(). To eliminate the NULL values you can use: select name, idx.n, pay_by_quarter[idx.n] from sal_emp, generate_series(1,7) idx(n) where pay_by_quarter[idx.n] is not null order by name;schedule has two levels of array to then we can use two instances of generate_series() with different alias to flatten out all of the elements of the schedule column. select name, idx.i, idx2.j, schedule[idx.i][idx2.j] from sal_emp, generate_series(1,10) idx(i), generate_series(1,10) idx2(j) where schedule[idx.i][idx2.j] is not null order by name; -- name | i | j | schedule -- -------+---+---+-------------- -- Bill | 1 | 1 | meeting -- Bill | 1 | 2 | lunch -- Bill | 1 | 3 | noon -- Bill | 2 | 1 | training -- Bill | 2 | 2 | presentation -- Bill | 2 | 3 | 2pm -- Carol | 1 | 1 | breakfast -- Carol | 1 | 2 | consulting -- Carol | 1 | 3 | 10am -- Carol | 2 | 1 | meeting -- Carol | 2 | 2 | lunch -- Carol | 2 | 3 | noon -- (12 rows)
But we are still using an arbitrary value of 10 as the maximum array index.
What if we really did not know off hand what the upper bound of the array is?
We can know the upper bound of the array programmatically, but we cannot
insert it in the appropriate place in the call to generate_series().
The first of these two functions
flattens the pay_by_quarter column. The second flattens the schedule
column.
Each selects the max(array_upper()) and then uses it in a LOOP
query returning each row. array_upper() returns a value for each
row so we want to use
|
|||||||||||||||
Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in! Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009 |