Varlena, LLC | PostgreSQL General Bits Newsletter

PostgreSQL Training,
Consulting & Support

By A. Elein Mustain

22-Sep-2003 Issue: 44

Archives |

General Tidbits |

Google General Bits |

Prev |

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Table defaults with View Rules
[GENERAL] Column defaults fail with rules on view	19-Sep-2003

A RULE to update a table based on the view does not "automatically" inherit the DEFAULT values for columns in a table. The rewrite RULE is very explicit--it does just what it says and no more. The time for handling the default values is earlier than the RULE.

This said, constraints on the table (as distinct from DEFAULT values) are enforced by the underlying table. SERIAL columns also work correctly with update rules as long as the value is not set.

Berend Tober has a particularly nice test case to show the behavior of update rules with DEFAULT values. This test case shows the creation of the table with default values set for columns three and four as well as a SERIAL in column 2.

   CREATE TABLE test_table (
      field1 char(1) NOT NULL,
      field2 SERIAL,
      field3 INTEGER DEFAULT 1,
      field4 VARCHAR(24) DEFAULT '(default value)',
      CONSTRAINT testdb_pkey PRIMARY KEY (field2));
   
   CREATE VIEW test_table_v AS
      SELECT field1, field3, field4 FROM test_table;
   
   CREATE RULE test_table_rd AS ON DELETE TO test_table_v DO INSTEAD
      DELETE FROM test_table WHERE field1 = old.field1;
   
   CREATE RULE test_table_ri AS ON INSERT TO test_table_v DO INSTEAD
      INSERT INTO test_table (field1, field3, field4)
      VALUES (new.field1, new.field3, new.field4);
   
   CREATE RULE test_table_ru AS ON UPDATE TO test_table_v DO INSTEAD
      UPDATE test_table SET
         field1 = new.field1, field3 = new.field3, field4 = new.field4
      WHERE field1 = old.field1;
   
   INSERT INTO test_table VALUES ('A');
   INSERT INTO test_table_v VALUES ('B');
   
   -- Result --
   =# SELECT * FROM test_table;
    field1 | field2 | field3 |     field4      
   --------+--------+--------+-----------------
    A      |      1 |      1 | (default value)
    B      |      2 |        | 
   (2 rows)
   
   -- Just for fun, show NON NULL constraints work
   =# insert into test_table_v values (NULL);
   ERROR:  ExecInsert: Fail to add null value in not null attribute field1

The solution to having updateable views with proper defaults is to ALTER TABLE ... SET DEFAULT. Most ALTER TABLE commands will not let you alter a view, but ALTER TABLE ... SET DEFAULT will. It is also possible that you may want the view to have different defaults than the underlying table and that can also implemented with the ALTER.

   ALTER TABLE test_table_v
       ALTER field3 SET DEFAULT 1;
   ALTER TABLE test_table_v 
      ALTER field4 SET DEFAULT '(default value)';
                                                 
   =# insert into test_table_v VALUES ('C');
   INSERT 3702953 1
   # select * from test_table;
    field1 | field2 | field3 |     field4      
   --------+--------+--------+-----------------
    A      |      1 |      1 | (default value)
    B      |      2 |        | 
    C      |      3 |      1 | (default value)
   (3 rows)

This shows the default value settings for the view enabled by the ALTER TABLE statement.

Contributors: btober at seaworthysys.com, Richard Huxton dev at archonet.com, Tom Lane tgl at sss.pgh.pa.us

Large Objects General Description
Several Threads 20-Sep-2003

A large object is a storage method (not a data type) which enables postgresql to store large columns separately from the enclosing tuple row data. The large object appears as an OID in the tuple.

A TOASTed column is one where the server automatically stores large values out of the tuple for various types. The fact that a column is TOASTed is invisible at the SQL layer.

Large objects are not needed as much as they were needed when before TOAST was introduced in 7.1. However, large objects can be accessed with the client and server C interfaces to do Unix like file operations of open, close, lseek, read, and write. TOASTED objects are always treated as whole objects.

The downside of large objects is that they are not automatically stored and must be handled separately from their in row values. This means that you must use special commands for inserting, updating and deleting large objects. pg_dump also has special arguments for dumping and restoring the large objects.

If you were storing images in the database which you were then selecting out and displaying then probably a bytea type would work well. The image, if large, would be TOASTED. If you had an image management application which manipulated or analyzed the image contents, then a large object storage would be better. With the large object storage you could seek to a specific location, read and write out changes using the C interface. If you used a simple (TOASTED) bytea column, you would have to read in the entire image and replace the image entirely in order to change even a bit.

The C interfaces are also useful as server side functions for analysis of the data. Your image knowledgeable server side C function can seek and search for patterns or perform morphing of the image without having to transfer any part of the image over to the client. pgtcl, the client interface, also supports large object functionality.

Contributors: elein at varlena.com

Large Object Basics with SQL Only
[GENERAL] psql and blob 17-Sep-2003

To create a table using which contains a large object, define the table with an OID typed column. The OID will be the large object identifier in this case.

The next problem is how to get the large object from your file system into the table storage. Is your large object file on the client or server side? A client side file is a file that lives on the machine where the client is running. It is only important to make that distinction when the client is on a different host than the database server. A server side file is a file residing on the database server host.

There is a difference between server side file loading and client side file loading. For server side files, you can invoke lo_import() in your insert statement. All of the work is handled by the database server.

For files on the client, the client has to do the work of opening the file and sending it to the server. This is done in the client program. It can also be done using psql (which is a special client program) using two steps. First import the large object and then insert its OID into the the table row using the psql variable :LASTOID.

When you select out a row which contains a large object, what you will see in the large object column is the OID. In order to access the contents of the large object, you must either extract it to a file or open and manipulate it in your client interface or server function.

To extract the image from the mypictures table onto the server host, use lo_export(). Specify the large object column and the target output file. The target output file must be writable by the database server and will be owned by the database process owner.

To extract the image from the server into a file on the client host, using psql it is a bit tricky. Without resorting to a shell script, you must eyeball value of the picture column and then use it in the \lo_export statement. (If anyone knows how to do this with only SQL and psql please let me know!)

The following code shows the creation of the table containing a large object. Then it loads an image from the server host and exports a copy of it to the server host. Then it loads an image from the client host and exports a copy of it to the client host machine.

   --
   -- My Picture table.
   --
   CREATE TABLE mypictures (
      title TEXT NOT NULL primary key,
      picture OID);
   
   -- -- Red Roses picture is on the Server Host
   -- Load and export a copy on the Server Host
   --
   INSERT INTO mypictures (title, picture)
       VALUES ('Red Roses', lo_import('/tmp/redroses.jpg'));
   SELECT lo_export(picture, '/tmp/redroses_copy.jpg') FROM mypictures
       WHERE title = 'Red Roses';
   
   --
   -- White Roses picture is on the Client Host
   -- Load and export a copy on the Client Host
   --
   \lo_import '/tmp/whiteroses.jpg'
   INSERT INTO mypictures (title, picture) VALUES ('White Roses', :LASTOID);
   
   SELECT * from mypictures;
   
   --     title    | picture
   -- -------------+---------
   --  Red Roses   | 3715516
   --  White Roses | 3715518
   -- (2 rows)
   
   \lo_export 3715518 '/tmp/whiteroses_copy.jpg'

This all works, however, what happens when you delete a row or update a large object? Deleting a row from the table in the usual way does delete that that row, however, the large object is left dangling. You can tell that the large object still exists by using \lo_list in psql. This command lists the oids of the large objects known to the system.

On the other hand, perhaps the large object is not left dangling because another row references the same image. Since large objects are, well, large, it was thought that it would best to only carry around the object's identifier, the OID. That OID may be referenced by other rows if the developer does not want multiple copies of the same large object and that OID may also be involved in a transaction in the same or different rows. This is a problem that the developer must resolve.

In this example, we will assume that when the row is deleted then we want to also delete the large object. This implies that there is never another row referring to the same large object.

To do this use a rule to call the server function lo_unlink(). Using the large object interfaces it is possible to write to large objects and modify them. When you are only using SQL, then you must drop and replace the large object. This would also be done with a rule. As you assign a new value for the large object, just unlink the old one. But only unlink it if it is getting a new value.

   CREATE RULE droppicture AS ON DELETE TO mypictures 
   DO SELECT lo_unlink( OLD.picture );

   CREATE RULE reppicture AS ON UPDATE TO mypictures 
   DO SELECT lo_unlink( OLD.picture ) where OLD.picture <> NEW.picture;

To check that the rule is working, in psql select the row from the table and observe the row you want to delete. Use \lo_list to show the large objects. Drop the row in the usual way. By selecting from the table, you know the table row is gone and by using \lo_list you can see that the corresponding large object is also gone.

   =# select * from mypictures;
       title    | picture 
   -------------+---------
    White Roses | 3715592
    Red Roses   | 3715593
   (2 rows)

   =# update mypictures set picture=lo_import('/tmp/redroses_copy.jpg') 
      where title='Red Roses';
    lo_unlink 
   -----------
            1
   (1 row)
   
   =# \lo_list
        Large objects
      ID    | Description 
   ---------+-------------
    3715592 | 
    3715598 | 
   (2 rows)
   
   =# select * from mypictures;
       title    | picture 
   -------------+---------
    White Roses | 3715592
    Red Roses   | 3715598
   (2 rows)

Caution: Note that if you unlink a large object which is being referenced by an existing row AND you have these rules in place, you cannot drop the row. You must drop the rule, drop the row and recreate the RULE. LO_TRANSACTION, the variable affecting transactions for large object operations has no effect on this functionality and no longer exists in 7.4.

Contributors: elein at varlena.com

Smart Blobs and Dumb Blobs
What is a Smart Blob? 20-Sep-2003

It was the original intent of the postgres designers that a large object be a part of a User Defined data type. The difference here is the difference between a smart blob (a sblob :-) and a dumb blob. The smart blob "knows" its data type and what operations and functions are appropriate for its type.

For example, a document type could be created and it could contain the large object, but may also contain interesting attributes of the document such as a set of key words, the word count, the size and the language of the document.

Functions would be written to manage the document type. Basic functions might include:

Input from text functions. (required) This could calculate the size and word word count on insertion. And it would handle the large object using the large object routines.
Output to text functions. (required) Output to text may not need to output the actual document.
An equal comparison. Several types of equality possibilities come to mind:
- Compare the text with or without spaces
- Compare the large object identifier
- Compare all or some of the attributes of the document
Greater than and Less than
Perhaps a comparison of the document's word count or size.
Get and Set functions for all of the attributes, including the document.

Document specific functions might include the following. Note that these functions give the smart blob its character.

A spell checker function, with or with corrections
A formatting exporter for pretty printing the output to a file or printer.
A phrase finder
Search by specific key word.
Search by full text
A smart indexing function to create the keyword list

The equality functions/operators and the greater than and less than functions/operators can be used in an ordinary BTREE index. The accessor functions can be used as functional indexes on language, size and word count. With some wrangling, a keyword index can also be created.

The implementation of this document data type is left as an exercise for the reader :-) Any similarities between this document data type and any existing datatypes is purely common sense and coincidental.

These ideas all relate, for the most part, to the creation of any datatype and are meant to illustrate how large objects need not be dumb blobs.

Contributors: elein at varlena.com

Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!

Top