varlena
varlena
PostgreSQL Training,
Consulting & Support
General Bits
By A. Elein Mustain

20-Feb-2006 Issue: 128

Archives | General Tidbits | Google General Bits | Docs | Castellano | Português | Subscriptions | Notifications | | Prev | Next

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Base Type using Domains
Creating an Inherited Type Using Domains 19-Feb-2006

Updated 27-Feb-2006

Base type inheritance is useful when you want to inherit some or all of the parent type's functions, including input and output. For example, if you want a text type, but want to have it in a particular format and be able to override the equality and comparison operators.

Type inheritance for base types should work by inheriting all of the parent's definitions and overriding them only as necessary. What I will try to do is to demonstrate that we can weasel this functionality using domains and only SQL and plperl.

In an ideal implementation CREATE DOMAINS would work the same as CREATE TYPE foo UNDER bar. The CREATE TYPE foo UNDER bar should have bar inherit all of the operators and functions from foo unless they are overridden.

In this exercise we will create an email type under the text type using a domain, functions, operators and an operator class. This creates the elements of a type inheritance. All of the implementation will be in either plperl (good for parsing) and SQL (keeping things as simple as possible). The point is to create a base type, not a composite type using only procedural languages.

In creating a type there are many things to consider.

  • What parent type to use? Which is the closest to the type we are implementing? In this case, our email type is closest to a text type.
  • What things make this type different than its parent type? In our case it is these things:
    • There should be some parsing to ensure the email is in a proper format.
    • The comparison and equality functions of the email type need to be case insensitive.
    • Some accessor functions to different parts of the email address would be helpful.

Let us start with the domain definition and its check constraint. The following statement creates email as a type under type text, inheriting all of its functions and operators, etc. The CHECK constraint is a function which validates the format of the value, making sure it is an email format.

	CREATE DOMAIN email AS text CHECK ( isemail( VALUE) );
We will need to create the isemail() function first! For this we will use perl because it matches strings very nicely. And we will use a "pretty good" regular expression we found in a Cookbook at Active State Programmers Network. For a proper validation of email addresses you should use Email::Valid and RFC::RFC822::Address perl modules. (Thank goodness for pre-written perl recipes!)

This plperl function runs the input through a regular expression match to determine if it is in the right format for email. If it is the correct format it returns TRUE otherwise it returns FALSE. If it is passed NULL, a NULL is returned--this we assert by stating the function is STRICT. The function is also IMMUTABLE which means that given the same input it always returns the same output.

	--
	-- Check constraint isemail for email base type
	-- Using "pretty good" but not strict rfc compliant regex
	--
	CREATE OR REPLACE FUNCTION isemail(text)
	RETURNS BOOLEAN
	AS $$
	   if ( $_[0] =~ m/
	       ^                  # Start.
	        (                 # Start of $1
	            [A-Z0-9]+     # Must start with some A-Z or 0-9s.
	            [._]?         # May have a . or an _ after that
	        )                 # End of $1
	        {1,}              # One or more of $1.
	        [A-Z0-9]+         # More alphanumerics
	        \@                # Literal @ sign
	        (                 # Start of $2
	            (             # Start of $3
	                [A-Z0-9]+ # More alphanumerics
	                [-]?      # At most one - sign.
	            )             # End of $3
	            {1,}          # One or more of $3
	            [A-Z0-9]+\.   # More alphanumerics
	        )                 # End of $2
	        {1,}              # One or more of $2.
	        [A-Z]{2,4}        # 2-4 letters
	        $                 # End.
	    /ix ) {
	      return TRUE;
	   }
	   else {
	      return FALSE;
	   }
	$$ LANGUAGE 'plperl' STRICT IMMUTABLE;
OK, now the CREATE DOMAIN should succeed. Let us test the function. Our table aliases contains an email and a text field. We'll test a CAST, a bad email format and direct calls to the isemail() function.
	elein=#  select '-x-'::email;
	ERROR:  value for domain email violates check constraint "email_check"
	elein=# insert into aliases values ('***', 'BAD EMAIL');
	ERROR:  value for domain email violates check constraint "email_check"
	elein=# select isemail( '555@123.com');
	 isemail
	---------
	 t
	(1 row)
	
	elein=# select isemail( '555@123#com');
	 isemail
	---------
	 f
	(1 row)
So far so good. We cannot cast or insert into a email typed column unless the data is formatted correctly. The function works as expected.

The next criteria we need to address is that all comparisons of email should be case insensitive. There are several aspects of these comparisons to consider. Can we create a UNIQUE index? Can we use ORDER BY? How would LIKE work? The implementation is fairly simple, but the use is a tiny bit quirky.

The implementation of the comparison and equality operators simply needed to leverage the text versions of these functions operating on the lower case version of data. This should finesse any issues of encoding and locale. Here are a few of the functions. The rest can be found in the full code. The functions we implemented are: comparison, equals, greater than, greater than or equals, less than and less than equals. We also added a "like" function so we could do case insensitive like comparisons. The thing to notice about the like function is that its second parameter is of type text. This is because a partial match string like 'bozo%' is not an email type according to our definition of an email type.

	-- cmp: 0 if equal, 1 if greater than, -1 if less than
	CREATE OR REPLACE FUNCTION email_cmp(email,email)
	RETURNS INTEGER AS
	$$
	   SELECT CASE WHEN lower($1) = lower($2) THEN 0
	               WHEN lower($1) > lower($2) THEN 1
	   ELSE -1
	   END;
	$$ LANGUAGE 'SQL' IMMUTABLE STRICT;
	
	-- Equals: lower(text) = lower(text)
	CREATE OR REPLACE FUNCTION email_eq (email, email) RETURNS boolean AS
	$$
	   SELECT CASE WHEN email_cmp($1, $2) = 0
	   THEN TRUE ELSE FALSE END;
	$$ LANGUAGE 'SQL' IMMUTABLE STRICT;
	
	-- Greater than: lower(text) > lower(text)
	CREATE OR REPLACE FUNCTION email_gt (email, email) RETURNS boolean AS
	$$
	   SELECT CASE WHEN email_cmp($1, $2) > 0
	   THEN TRUE ELSE FALSE END;
	$$ LANGUAGE 'SQL' IMMUTABLE STRICT;

	--Like
	CREATE OR REPLACE FUNCTION email_like(email, text) RETURNS boolean AS
	$$
   	SELECT lower($1) LIKE lower($2);
	$$ LANGUAGE 'SQL' IMMUTABLE STRICT;
As you can see the functions are written as simply as possible in SQL. They are all STRICT, returning NULL on NULL input and they are all IMMUTABLE.

Once we have the comparison functions, we need to tie them to the appropriate operators. This is so that we use the operator instead of calling the function explicity. The CREATE OPERATOR statement ties a symbol to a function and specifies its right and left argument types. The LIKE keyword is tied to the operator '~~' so we will tie that operator to our like function.

	--
	-- Operators
	--
	CREATE OPERATOR =  ( PROCEDURE = email_eq, LEFTARG = email, RIGHTARG = email);
	CREATE OPERATOR >  ( PROCEDURE = email_gt, LEFTARG = email, RIGHTARG = email);
	CREATE OPERATOR >= ( PROCEDURE = email_gte, LEFTARG = email, RIGHTARG = email);
	CREATE OPERATOR <  ( PROCEDURE = email_lt, LEFTARG = email, RIGHTARG = email);
	CREATE OPERATOR <= ( PROCEDURE = email_lte, LEFTARG = email, RIGHTARG = email);
	CREATE OPERATOR ~~  ( PROCEDURE = email_like, LEFTARG = email, RIGHTARG = text);
Each of these operators are tested in statements like this. The full set of tests can be found in the test code. Notice that it must be using our equality function because the comparison is case insensitive.
	elein=# select 'PK@unique.com'::email = 'pk@UNIQUE.com'::email;
	 ?column?
	----------
	 t
	(1 row)

In testing LIKE, we found some anomolies. When the right hand value is not cast explicitly to text, then the text like function is used instead of the email like function. These tests demonstrate the varied behavior of LIKE, ~~ and direct calls to the like function. This is a bug--but it may or may not be fixed. In the meanwhile we will cast the right operand to text or use the like function directly as necessary. The full set of tests can be found in the test code.

	\echo select column like value: works
	select lname, email from aliases where email like 'BOZO%'::text;
	select lname, email from aliases where email ~~ 'BOZO%'::text;
	select lname, email from aliases where email like lname || '%';
	\echo  direct function all to email_like works without casts
	select lname, email from aliases where email_like(email, 'BOZO%');
	select lname, email from aliases where email_like(email, lname || '%');
	\echo EXPECT NO ROWS: LIKE and ~~ with unknown Right type constants dont work
	select lname, email from aliases where email like 'BOZO%';
	select lname, email from aliases where email ~~ 'BOZO%';

In order to create a UNIQUE index on our test table, we will need to create an index which uses these operators and knows about the comparison function. Indexes do this by using OPERATOR CLASSES based on an index type, for example, BTREE or RTREE. An operator class groups a set of operators for a type or types. Our parent type, text, uses text_ops OPERATOR CLASS for BTREE indexes. We want to set up a BTREE type of index with email_ops. The OPERATOR CLASS for BTREEs requires the functions and operators we just defined and the comparison function we snuck in when we wrote the operator functions. We will group those operators into an email operator class.

	--
	-- Operator Class
	--
	CREATE OPERATOR CLASS email_ops
	DEFAULT FOR TYPE email USING BTREE AS
	   OPERATOR 1 <,
	   OPERATOR 2 <=,
	   OPERATOR 3 =,
	   OPERATOR 4 >=,
	   OPERATOR 5 >,
	   FUNCTION 1 email_cmp(email,email) ;
The operator class specifies it is using a BTREE indexing type. The operators for the BTREE are, in order, less than, less than or equal to, equal, greater than or equal to and greater than. BTREES also require one function, the comparison function. The numbering of these operations, in order, is pertinent. We could be using different operator symbols than the traditional ones, but that would confuse things. The operator class is tied to the operators which are tied in turn to each function.

Now on our test table aliases, we can add a unique index.

	CREATE UNIQUE INDEX aliases_email ON aliases USING BTREE (email email_ops);
And let us test it. We want it to reject duplicate keys based on the lower cased values of the email.
	elein=# \echo expect PK ERROR
	expect PK ERROR
	elein=# insert into aliases values ('pk@email.com', 'PK');
	INSERT 0 1
	elein=# insert into aliases values ('PK@email.com', 'PK');
	ERROR:  duplicate key violates unique constraint "aliases_email"

Notice that we are creating an explicit UNIQUE INDEX instead of using the PRIMARY KEY syntax in CREATE TABLE. We are doing this because the PRIMARY KEY syntax has no place for specifing an OPERATOR CLASS on the PRIMARY KEY. This is a missing feature which may or may not be fixed.

Now let us look at sort. We found that when we did a plain sort, it defaulted to the text comparison function. We know this because the values are sorted with capitalized letters first. Then we tried using the USING operator clause on the ORDER BY command. In that case it was sorted correctly.

	elein=# select * from aliases order by email;
	           email            |  lname
	----------------------------+----------
	 Abalone@little.fishies.com | Abe
	 Carolina@my.mind.edu       | Carolina
	 DDuck@ducks.disney.com     | Donald
	 bozo@Clowns.com            | Boz
	 et@galaxy.net              | ET
	 frodo@TheShire.net         | Frodo
	 galahad@camelot.org        | Sir
	 pk@email.com               | PK
	(8 rows)
	
	elein=# select * from aliases order by email using <;
	           email            |  lname
	----------------------------+----------
	 Abalone@little.fishies.com | Abe
	 bozo@Clowns.com            | Boz
	 Carolina@my.mind.edu       | Carolina
	 DDuck@ducks.disney.com     | Donald
	 et@galaxy.net              | ET
	 frodo@TheShire.net         | Frodo
	 galahad@camelot.org        | Sir
	 pk@email.com               | PK
	(8 rows)
In this case ORDER BY does have a way to let us tell it which operator to use. We will have to remember to always use the USING operator clause with the ORDER BY clause when sorting our email type.

To ensure that our email datatype is useful, we've added some accessor functions. These perl functions extract the login name, the full domain, the domain type and the reverse domain. For any datatype you create, you may want manipulator functions like these that can be called directly in SQL.

	--
	-- Accessor functions
	--
	-- login: before the @ sign
	CREATE OR REPLACE FUNCTION email_login (email) RETURNS text AS
	$$
	   return (split(/\@/, $_[0]))[0];
	$$ LANGUAGE 'plperl' IMMUTABLE STRICT;
	
	-- Full Domain: after the @ sign
	CREATE OR REPLACE FUNCTION email_fdomain (email) RETURNS text AS
	$$
	   return  (split /\@/, $_[0])[-1];
	$$ LANGUAGE 'plperl' IMMUTABLE STRICT;
	
	-- Top Level Domain:: after the last dot
	CREATE OR REPLACE FUNCTION email_domain_type (email) RETURNS text AS
	$$
	   return (split(   # Split the output of split below
	        /\./,  # Split the following on "."
	        (
	            split(
	                /\@/,  # Split whole input on "@"
	                $_[0]
	            )
	        )[-1]  # last part of inner split
	    ))[-1];  # last part of outer split
	$$ LANGUAGE 'plperl' IMMUTABLE STRICT;
	
	-- Reverse Domain
	CREATE OR REPLACE FUNCTION email_reverse_domain(email) RETURNS text AS
	$$
	   return join(".", reverse split(   # Split the output of split below
	        /\./,  # Split the following on "."
	        (
	            split(
	                /\@/,  # Split whole input on "@"
	                $_[0]
	            )
	        )[-1]  # last part of inner split
	    ));
	$$ LANGUAGE 'plperl' IMMUTABLE STRICT;
And of course some tests for the accessor functions.
	elein=# select email, email_login(email) from aliases limit 2;
	           email            | email_login
	----------------------------+-------------
	 bozo@Clowns.com            | bozo
	 Abalone@little.fishies.com | Abalone
	(2 rows)
	
	elein=# select email, email_fdomain(email) from aliases limit 2;
	           email            |   email_fdomain
	----------------------------+--------------------
	 bozo@Clowns.com            | Clowns.com
	 Abalone@little.fishies.com | little.fishies.com
	(2 rows)
	
	elein=# select email, email_domain_type(email) from aliases limit 2;
	           email            | email_domain_type
	----------------------------+-------------------
	 bozo@Clowns.com            | com
	 Abalone@little.fishies.com | com
	(2 rows)
	
	elein=# select email, email_reverse_domain(email) from aliases limit 2;
	           email            | email_reverse_domain
	----------------------------+----------------------
	 bozo@Clowns.com            | com.Clowns
	 Abalone@little.fishies.com | com.fishies.little
	(2 rows)
Notice that the original case is always preserved in storage.

Summary

This is what we did to create the email type under the text type:
  • Created a check function returning a boolean to validate the format of the input.
  • Created a domain, email, based on the text type, using the check function as a constraint.
  • Created a set of comparison, equals and like functions.
  • Associated the comparison, equals and like functions to operators.
  • Created an BTREE based operator class for the email type using the set of operators and the comparison function.
  • Created accessor and manipulation functions for the email type.
When you create your own type, your requirements will be different. If it is based on text, you should be able to use this model fairly directly, substituting your requirements for the check and comparison functions. If your type is based on a numeric type, it should still be straightforward, but you will have to take into account any arithmetic that you want to behave differently from the base type. A complex type, like a geometric type, could also be done, but you may have to use RTREE comparison functions and requirements rather than the simple BTREE functions. The most important thing to know is how your data type will behave in as many circumstances as you can forsee and then write a function to make it so.

Problems and Issues:

  • Creating the table with an email PRIMARY KEY did not use our comparison function. It was necessary to create a unique index which explicitly used the email operator class.
  • ORDER BY requires USING op clause.
  • LIKE requires explicit casting of the second argument to text.
There are convincing arguments for and against this behavior. Feel free to argue one way or the other.

The code for the email type and tests for the email type are available at these links. Special thanks to David Fetter for making the Perl functions better and prettier.

Contributors: elein at varlena.com, david at fetter.org, Martijn van Oosterhout kleptog at svana.org Tom Lane tgl at sss.pgh.pa.us


Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!
Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009

Top
Google
Search General Bits & varlena.com Search WWW