varlena
varlena
PostgreSQL Training,
Consulting & Support
General Bits
By A. Elein Mustain

21-Feb-2005 Issue: 98

Archives | General Tidbits | Google General Bits | Docs | Castellano | Português | Subscriptions | Notifications | | Prev | Next

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Table Inheritance Overview
Table Inheritance 18-Feb-2005

Table inheritance is the ability to create a child table based on the definition of a parent table. In the PostgreSQL implementation, this is not true object inheritance. Do not confuse the two.

Table inheritance allows you to create a table as the basis for one or more child tables where the child tables contains all of the columns of the base table and zero or more additional columns. The related tables together are referred to as a table hierarchy. Each table in the hierarchy is a separate table. However, the common attributes can be selected and manipulated via the parent table, even though in general the parent table usually does not contain any rows at all.

Child tables inherit columns and DEFAULTS and CONSTRAINTS, including SERIAL constraints. Child tables do not inherit indexes, including primary key indexes. Lack of index inheritance is somewhat controversial but is useful in its own right for distributing indexes across tables and possibly tablespaces.

In the following illustration we have a schema for a library collection. Every item in the collection carries a common set of attributes defined in the parent table. There are vertical queries on collections and horizontal queries on specific item types.

Table Inheritance
  Collection
 
       
Books
           
           
Records
             
             
Magazines
           
           

For example, the library's inventory system cares about items in general. It wants to know the number of items at each branch and how many are available or lost or checked out and so on. By defining the type of object in the parent table, it can easily group by item type for inventory purposes. It does not need to know, nor does it care about the particulars of any given item. This query counts the number of each type of item in each branch.

	SELECT count(*), collection_type, branch_name 
	FROM collection
	GROUP BY branch_name, collection_type
	ORDER BY branch_name, collection_type;

The book catalog system cares about books. It will do its operations directly on the book table which contains information pertinent to books specifically. For example, the book record would contain Title, Author, and ISBN number in addition to the collection information. The periodicals part of the application has different information to store and display for each magazine. The operations and queries necessary would be applied directly to the magazine table.

In a purely relational context, each item table would have a one-to-one foreign key into the collection table instead of storing the collection information directly in the same table as the item. There will be two records for each book, for example, one in the collection table and one in the book table. The indication that a table is an element of the collection is that it has a foreign key to the common collection information. Each item type's table has no other relation to any other collection table.

Relational Model
Collection  
       
       
       
       
       
       
       
Books
     
     

Records
       
       
Magazines
     
     
This seems common enough, but it is a little odd because the same keys in the collection table relate to several different tables.

In a relational context, the aggregation of collection information would still be in the collection table. In either the relational or inheritance model queries with general collection information would still have to query the item tables individually.

It is more compelling to use a single book row as the complete description of a book. A view of books joined to collection information is not necessary. Updates are simpler and the indexes into the book table are independent of all of the other items in the collection making it smaller and faster.

Aside: If jagged rows were enabled, then queries on collections in the table inheritance model would also be able to fetch rows from all of the item types in one query. The application would check the type of item and do the right thing based on the different data in each row type. PostgreSQL does not implement jagged rows. Illustra did.

Contributors: elein at varlena.com
Distributed Indexing with Table Inheritance
Using Table Inheritance for Local and Distributed queries 19-Feb-2005

One advantage of table inheritance is that you can treat a set of tables as if they were a single table while distributing the indexing amongst them. Indexing is on each individual table and not on the table hierarchy. The indexes on the children tables must be created explicitly. They are not inherited. (Because child tables are truly individual tables, you may also want to distribute the children tables on different tablespaces.)

The best example of this is a table which grows in time. Suppose you have a log table of customer support calls that you keep over a very long period of time. The application requires 1) most queries are on the current time set and 2) occassional queries are required for the entire data set. You can create a parent table and one child table of current information and other child tables populated with archived data. The child tables look exactly like the parent table in this case.

You set up an empty parent table called parent_log and a child called log_current. On a periodic basis, create a new child table inheriting parent_log with an appropriate name, say log_2004 and move the rows corresponding to that date range from the log_current table to the child table (and vacuum). Each child table should have the same indexes as the parent table, but remember that these indexes must be created explicitly.

If you create this scenario without table inheritance, the ability to query across the entire data set is very slow. It requires a union of all log tables (and you have to keep track of new log tables). Qualification is done on the union of the tables not on the individual tables.

Without inheritance, selection across the data set would be like this:

	SELECT ....
	FROM log_current UNION log_2000 UNION log_2001 
		UNION log_2002 UNION log_2003 UNION log_2004 
	WHERE cust_name = 'george'; 
This query will perform the UNIONs and then apply the WHERE clause. You can go out to lunch and come back later to see the results.

Using table inheritance, selection across the data set would be like this:

	SELECT ....
	FROM parent_log
	WHERE cust_name = 'george';
Because parent_log is the parent of a table hierarchy, each child table will be examined individually. If there is an index on cust_name and, for example, there were no questions from this person from 2000 to 2002, the first three tables will only have their indexes examined and then will be skipped. The indexes will be used on the other tables if it is deemed appropriate by the optimizer. This is much faster than performing the UNIONs and then qualifying the larger data set.

Applications will generally operate only on the log_current table ignoring the table hierarchy. This allows fast access to the most needed information.

Careful design of the table hierarchy can greatly speed up both simple and especially aggregated queries across the entire data set.

Contributors: elein at varlena.com


Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!
Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009

Top
Google
Search General Bits & varlena.com Search WWW