Varlena, LLC | PostgreSQL General Bits Newsletter

PostgreSQL Training,
Consulting & Support

By A. Elein Mustain

10-Mar-2003 Issue: 16

Archives |

General Tidbits |

Google General Bits |

Prev |

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Large dumps
[GENERAL] URGENT: pg_dump error	11-Feb-2003

As an installation grows, so does the file from pg_dump. For a file systems which limit file sizes to 2 gigs, this can be a problem. The first thing that can help is to use the -t option and dump table by table. This breaks up the dump into logical units. A nice dump script can be created to dump each pertinent table in your installation. However, for very large tables, there still might be problems.

Using the -t option, Dmitry Tkack offers these alternatives to plain dumps. They utilize standard unix utilities to split and or compress the result. The first simply zips the output of pg_dump as it is running. The second splits the output of the dump at 2000 megabytes, storing the result in sample01, sample02, etc. as needed. The third example combines the two by first zipping and then splitting the zipped result.

	pg_dump mydatabase -t mytable | gzip -f > sample.gz
	pg_dump mydatabase -t mytable | split -C 2000m - sample.
	pg_dump mydatabase -t mytable | gzip -f | split -b 2000m - sample.gz.

Contributors: jerome jerome at gmanmi.tv, Dmitry Tkach dmitry at openratings.com

Using Schemas
[GENERAL] What is the benefit of schemas?	31-Jan-2003

Namespaces are used to divide (or aggregate) parts of databases. They can be thought of as pseudo databases or databases within databases. One person suggests the idea that the database is a "file system" while the schemas are the "directories". This analogy is particularly good when considering that the SQL variable search_path lists the schema in which you want to search.

Tables are usually what people think of when creating objects in a schema, however, functions, views and other objects can also be schema specific.

Developers may want to create objects in their own schemas to override corresponding objects in the public or a common schema while working on changes to those objects. The developer's search_path can be prefixed to include the development schema first. This allows changes and testing without interfering with the public or common schema versions of objects. This is like, in Unix, including ~/bin before /usr/bin in your PATH so that you pick up your customized version of some utility.

Schemas allow individual users to create tables in their own user-accessible schemas. This is particularly helpful when running an ISP or administering a site with many users. Instead of creating a new database for each user, only a schema needs to be created. This scheme can also create a group of users which may have their own common pool as well as their individual schemas.

In an application context, a schema can hold application specific data, modularizing the data to the application. If you are running several applications against the same database, common lookup tables and functions can be put into the public schema for use by all. Generic lookup tables might be city, state/province, country, zip codes, titles, etc. Common functions may be company specific formatting functions.

Contributors: Berend Tober btober at computer.org, Bruce Momjian pgman at candle.pha.pa.us, Medi Montaseri medi.montaseri at intransa.com, Arjen van der Meijden acm at tweakers.net elein at varlena.com

Speaking of schemas...
[GENERAL] pg_temp schemas 6-Feb-2003

Speaking of schemas, those temporary schemas that you see when you list yours schemas are there to hold temporary tables.

   elein=# select nspname from pg_namespace;
   nspname
   ------------
   pg_catalog
   pg_toast
   public
   pg_temp_1
   sinbin
   pg_temp_3
   rr
   ere

The tables are removed when the session is over, but the schemas remain. They are reused. Each process has a number, the pgproc slot number, known to be unique to itself. This is the "n" used for naming pg_temp_n. When the session is closed, the pgproc slot is freed up for the next session.

Contributors: Fernando Papa fpapa at claxson.com, Tom Lane tgl at sss.pgh.pa.us

The fastest query
[GENERAL] What is the quickest query in the database? 20-Feb-2003

This an academic search for the fastest query and also to find the best way to test for a "live" database. Usually you can just connect and disconnect to ensure that the database is alive and well. To test a middleware connection like JDBC, you might really want to execute a fast, short query. And people could not resist thinking about the fastest possible query.

Jan Weick suggested that the semicolon is the fastest as it does not start a transaction or parse anything but will fail if the connection is lost.

Others suggested queries with minimal parsing using constants, keywords or built-in variables.

	select 1;
	select '';
	select NULL;
	select current_user;

Contributors: P G pg_dba at yahoo.com, Tino Wildenhain tino at wildenhain.de, Jan Wieck JanWieck at Yahoo.com, Jonathan Bartlett johnnyb at eskimo.com, Medi Montaseri medi.montaseri at intransa.com, Steve Crawford scrawford at pinpointresearch.com, Ericson Smith eric at did-it.com

Transaction time and real time
[GENERAL] now() more precise than the transaction 4-Feb-2003

The function now() and its alias, current_timestamp, show a consistent time within the scope of a transaction. This scope could be a single statement, a statement and subselects or a block transaction with BEGIN and COMMIT.

Occasionally, you may want the precise time within a transaction. In that case, the timeofday() function is appropriate. This function returns text rather than a timestamp, however, if a timestamp is wanted, it can be cast as a timestamp.

	e=# BEGIN;
	e=# select now();
	              now
	-------------------------------
	 2003-03-07 13:18:47.864102-08
	(1 row)
	
	e=# select current_timestamp;
	          timestamptz
	-------------------------------
	 2003-03-07 13:18:47.864102-08
	(1 row)

	e=# select timeofday();
	              timeofday
	-------------------------------------
	 Fri Mar 07 13:18:47.865244 2003 PST
	(1 row)

	e=# select now();
	              now
	-------------------------------
	 2003-03-07 13:18:47.864102-08
	(1 row)

	e=# select current_timestamp;
	          timestamptz
	-------------------------------
	 2003-03-07 13:18:47.864102-08
	(1 row)

	e=# select timeofday();
	              timeofday
	-------------------------------------
	 Fri Mar 07 13:18:47.866053 2003 PST
	(1 row)
	
	COMMIT;

Contributors: RobertD.Stewart at mail.state.ky.us, Bruno Wolff III bruno at wolff.to, Greg Stark gsstark at mit.edu, Andrew Sullivan andrew at libertyrms.info

Partial field sorts
[GENERAL] natural sort order 14-Feb-03

An alphanumeric field sorts in alphabetic order. If your fields have embedded numbers, the sort might be non-intuitive. For example,

	item1, item10, item2, item3

To sort these fields in numeric order you can sort by the substring of the number:

 SELECT * FROM item_table ORDER BY substring(item_col,5)::int

Now suppose there are a number of prefixes. The previous solution and does not take into account the prefix in the sort order. If the prefix length were known, and you wanted to sort by prefix and suffix, divide the value up for sorting. In this case we, assuming the prefix is alway some four characters:

	SELECT * FROM items_table 
	ORDER by substring(item_col from 1 for 4), substring(item_col from 5);

But it seems more likely that the prefix will have varying lengths. Both previous solutions above relied on knowing the length of the prefix. Let us suppose that the form of the data were an alpha string followed by a numeric string, for example bob450, bob16, bob72, gretchen12 and harold63. It is still necessary to divide the field into the prefix and the suffix, but in this case the substring will use a regular expression.

	SELECT * FROM IDS
	ORDER BY substring(id  from '[A-Za-z]*'), substring(id from '[0-9]*$')::int;

This example will also work with 123abc456, sorting by '123abc' then by '456'. This example will not work if the trailing number is not there.

There are many other solutions to this problem. They range from creating an operator class or datatype to just using a perl or python function to extract the components. PostgreSQL is very accommodating. That said, however, considering the construction of your data table is important. There should be a good reason to create a column composed of several parts. Otherwise, the parts of the composite keys should be stored in separate fields, avoiding this sorting problem (and other problems) in the first place. If the only problem that the column raises is sorting, then your reasons are probably good. If you find that you must dissect the column for a variety of reasons, they probably should be stored separately.

Contributors: Ken Guest kguest at stockbyte.com, Arguile arguile at lucentstudios.com, elein at varlena.com

Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!

Top