|
|||||||||||||||||||||||||
|
|||||||||||||||||||||||||
|
|||||||||||||||||||||||||
|
|||||||||||||||||||||||||
|
|||||||||||||||||||||||||
In order to break up large datasets, it is desireable to select a subset of them based alphabetically and sort them correctly. For numbers, you would simply use greater than or less than. select ... FROM mytable WHERE id > 10000 order by id;For names, this problem is just as easily solved if you remember how names are formed and characteristics of alphabetical sorting.
SELECT last, first FROM mytable WHERE name > 'Smith' ORDER BY last;This is fine if your capitalization is perfect in both your data and selection. However, correct name sorting must also include names which begin with lower case, "von Tropp" for example. In order to sort and select the lower cased names along with their upper cased equivalents, you'll need to set the case to lower() or upper(). SELECT last, first FROM mytable WHERE lower(name) > lower('Smith') ORDER by lower(name); If there are many queries which sort or select by name in this manner, you can add a functional index on your table. create index mytable_last_lower_dx on mytable (lower(last));
In order to find out information about how your installation was initialized as well as its current running status, you can invoke the shell command pg_controldata as the user superuser (usually postgres). What we see below is pg_controldata run on my installation of 7.3.1. It shows my installation is in production. And it shows something fishy with regards to the version of pg_control. I would have expected to see 7.3. (Note to self: look into this). From the building of my system, it shows that my LL_COLLATE and LL_CTYPE are set to the usual for the US. It shows my maximum identifier length and maximum number of arguments I can use when defining a server function. The runtime information it shows are the details of the write-ahead logging (WAL) and the checkpoint processing. This is useful for down and dirty debugging. $ pg_controldata pg_control version number: 72 Catalog version number: 200211021 Database cluster state: in production pg_control last modified: Fri 09 May 2003 10:59:22 AM PDT Current log file ID: 0 Next log file segment: 65 Latest checkpoint location: 0/40776550 Prior checkpoint location: 0/4070CDB0 Latest checkpoint's REDO location: 0/40776550 Latest checkpoint's UNDO location: 0/0 Latest checkpoint's StartUpID: 47 Latest checkpoint's NextXID: 216539 Latest checkpoint's NextOID: 1409824 Time of latest checkpoint: Fri 09 May 2003 10:59:20 AM PDT Database block size: 8192 Blocks per segment of large relation: 131072 Maximum length of identifiers: 64 Maximum number of function arguments: 32 Date/time type storage: Floating point Maximum length of locale name: 128 LC_COLLATE: C LC_CTYPE: en_US All of this information is installation specific. It applies to the group of databases in the installation.
There are several different ways to handle storing of images with PostgreSQL. The primary suggestions are:
There are plusses and minuses for each of these methodologies, but Jan Weick focussed in on the central point of design, "it depends on the requirements. The problem is that people tend to do again and again what they've done before (cut'n'paste programmers) without really reevaluating how the solution fits into the new requirements." The requirements outlined were:
bytea
text with Base-64 encoding
large objects
File System
File System vs. Database storage Images stored on the file system require separate storage management and backup as well as client filesystem management, including permissions. There is no lock coordination between writing the column value and writing the file file. Rolling back a transaction does not rollback changes to the file system. These pros and cons must be considered with the requirements. Is speed absolutely the most important requirement? Are the images never moved or updated? Are the images managed from other image management applications which require file system storage? Is nanosecond accuracy required? Either the database methods or the file system method is appropriate under different circumstances. Arjen van der Meijden sums up the choices, "...it becomes a "correctness and ease of use vs raw performance and system load (including distribution of tasks)" comparison
In order to maximize the speed for realtime data insertion, it may be faster to use COPY instead of INSERT. In one example, realtime data incoming at a rate of 500 packets of 512 bytes a second created INSERT statements with just two other columns. The maximum rate that was achieved was 350 inserts per second on a Sun Blade 2000. The inserts were grouped in transactions and committed every 1200 rows via libpq. It was suggested that a COPY be used instead of inserts. This would require formatting the data for COPY--usually only delimiters need to be added. To COPY from stdin, you could pipe your input into psql using copy: generatedata | psql -c "copy copytest from stdin;"This, in one person's case was able to achieve thousands of rows per second. NOTE: YMMV
There are several ways to see how much disk space your database and its individual tables are using. To see the size of all of your tables in kilobytes, VACUUM and then run; select relname, relpages * 8 from pg_class;The relpages value is only updated by VACUUM so you must have vacuumed recently to get accurate numbers. An alternative method uses the file system tools ls and du. The database directories are under PGDATA/base/database oid and the individual tables are under PGDATA/base/database oid/table oid. In Issue 10 we discussed how to find the physical file storage of any given table with oid2name and via the system catalog tables. Using that information, the following shell script can be used to determine the sizes of your database and tables. The database name and optionally the table name are input and verified. They are used to extract their respective oids. The information for database directory is then shown using du and if a table name is specified it is shown using ls -l ######################################################### #!/bin/bash # dbtablespace.sh # Show PostgreSQL disk usage by table or by database # (elein@varlena.com) ######################################################### USAGE() { echo; echo "$1" echo "USAGE: dbtablespace.sh [-d \"
|
|||||||||||||||||||||||||
Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in! Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009 |