|
||||||||||||
|
||||||||||||
|
||||||||||||
|
||||||||||||
|
||||||||||||
Updated 27-Feb-2006 Base type inheritance is useful when you want to inherit some or all of the parent type's functions, including input and output. For example, if you want a text type, but want to have it in a particular format and be able to override the equality and comparison operators. Type inheritance for base types should work by inheriting all of the parent's definitions and overriding them only as necessary. What I will try to do is to demonstrate that we can weasel this functionality using domains and only SQL and plperl. In an ideal implementation CREATE DOMAINS would work the same as CREATE TYPE foo UNDER bar. The CREATE TYPE foo UNDER bar should have bar inherit all of the operators and functions from foo unless they are overridden. In this exercise we will create an email type under the text type using a domain, functions, operators and an operator class. This creates the elements of a type inheritance. All of the implementation will be in either plperl (good for parsing) and SQL (keeping things as simple as possible). The point is to create a base type, not a composite type using only procedural languages. In creating a type there are many things to consider.
Let us start with the domain definition and its check constraint. The following statement creates email as a type under type text, inheriting all of its functions and operators, etc. The CHECK constraint is a function which validates the format of the value, making sure it is an email format. CREATE DOMAIN email AS text CHECK ( isemail( VALUE) );We will need to create the isemail() function first! For this we will use perl because it matches strings very nicely. And we will use a "pretty good" regular expression we found in a Cookbook at Active State Programmers Network. For a proper validation of email addresses you should use Email::Valid and RFC::RFC822::Address perl modules. (Thank goodness for pre-written perl recipes!) This plperl function runs the input through a regular expression match to determine if it is in the right format for email. If it is the correct format it returns TRUE otherwise it returns FALSE. If it is passed NULL, a NULL is returned--this we assert by stating the function is STRICT. The function is also IMMUTABLE which means that given the same input it always returns the same output. -- -- Check constraint isemail for email base type -- Using "pretty good" but not strict rfc compliant regex -- CREATE OR REPLACE FUNCTION isemail(text) RETURNS BOOLEAN AS $$ if ( $_[0] =~ m/ ^ # Start. ( # Start of $1 [A-Z0-9]+ # Must start with some A-Z or 0-9s. [._]? # May have a . or an _ after that ) # End of $1 {1,} # One or more of $1. [A-Z0-9]+ # More alphanumerics \@ # Literal @ sign ( # Start of $2 ( # Start of $3 [A-Z0-9]+ # More alphanumerics [-]? # At most one - sign. ) # End of $3 {1,} # One or more of $3 [A-Z0-9]+\. # More alphanumerics ) # End of $2 {1,} # One or more of $2. [A-Z]{2,4} # 2-4 letters $ # End. /ix ) { return TRUE; } else { return FALSE; } $$ LANGUAGE 'plperl' STRICT IMMUTABLE;OK, now the CREATE DOMAIN should succeed. Let us test the function. Our table aliases contains an email and a text field. We'll test a CAST, a bad email format and direct calls to the isemail() function. elein=# select '-x-'::email; ERROR: value for domain email violates check constraint "email_check" elein=# insert into aliases values ('***', 'BAD EMAIL'); ERROR: value for domain email violates check constraint "email_check" elein=# select isemail( '555@123.com'); isemail --------- t (1 row) elein=# select isemail( '555@123#com'); isemail --------- f (1 row)So far so good. We cannot cast or insert into a email typed column unless the data is formatted correctly. The function works as expected. The next criteria we need to address is that all comparisons of email should be case insensitive. There are several aspects of these comparisons to consider. Can we create a UNIQUE index? Can we use ORDER BY? How would LIKE work? The implementation is fairly simple, but the use is a tiny bit quirky. The implementation of the comparison and equality operators simply needed to leverage the text versions of these functions operating on the lower case version of data. This should finesse any issues of encoding and locale. Here are a few of the functions. The rest can be found in the full code. The functions we implemented are: comparison, equals, greater than, greater than or equals, less than and less than equals. We also added a "like" function so we could do case insensitive like comparisons. The thing to notice about the like function is that its second parameter is of type text. This is because a partial match string like 'bozo%' is not an email type according to our definition of an email type. -- cmp: 0 if equal, 1 if greater than, -1 if less than CREATE OR REPLACE FUNCTION email_cmp(email,email) RETURNS INTEGER AS $$ SELECT CASE WHEN lower($1) = lower($2) THEN 0 WHEN lower($1) > lower($2) THEN 1 ELSE -1 END; $$ LANGUAGE 'SQL' IMMUTABLE STRICT; -- Equals: lower(text) = lower(text) CREATE OR REPLACE FUNCTION email_eq (email, email) RETURNS boolean AS $$ SELECT CASE WHEN email_cmp($1, $2) = 0 THEN TRUE ELSE FALSE END; $$ LANGUAGE 'SQL' IMMUTABLE STRICT; -- Greater than: lower(text) > lower(text) CREATE OR REPLACE FUNCTION email_gt (email, email) RETURNS boolean AS $$ SELECT CASE WHEN email_cmp($1, $2) > 0 THEN TRUE ELSE FALSE END; $$ LANGUAGE 'SQL' IMMUTABLE STRICT; --Like CREATE OR REPLACE FUNCTION email_like(email, text) RETURNS boolean AS $$ SELECT lower($1) LIKE lower($2); $$ LANGUAGE 'SQL' IMMUTABLE STRICT;As you can see the functions are written as simply as possible in SQL. They are all STRICT, returning NULL on NULL input and they are all IMMUTABLE. Once we have the comparison functions, we need to tie them to the appropriate operators. This is so that we use the operator instead of calling the function explicity. The CREATE OPERATOR statement ties a symbol to a function and specifies its right and left argument types. The LIKE keyword is tied to the operator '~~' so we will tie that operator to our like function. -- -- Operators -- CREATE OPERATOR = ( PROCEDURE = email_eq, LEFTARG = email, RIGHTARG = email); CREATE OPERATOR > ( PROCEDURE = email_gt, LEFTARG = email, RIGHTARG = email); CREATE OPERATOR >= ( PROCEDURE = email_gte, LEFTARG = email, RIGHTARG = email); CREATE OPERATOR < ( PROCEDURE = email_lt, LEFTARG = email, RIGHTARG = email); CREATE OPERATOR <= ( PROCEDURE = email_lte, LEFTARG = email, RIGHTARG = email); CREATE OPERATOR ~~ ( PROCEDURE = email_like, LEFTARG = email, RIGHTARG = text);Each of these operators are tested in statements like this. The full set of tests can be found in the test code. Notice that it must be using our equality function because the comparison is case insensitive. elein=# select 'PK@unique.com'::email = 'pk@UNIQUE.com'::email; ?column? ---------- t (1 row) In testing LIKE, we found some anomolies. When the right hand value is not cast explicitly to text, then the text like function is used instead of the email like function. These tests demonstrate the varied behavior of LIKE, ~~ and direct calls to the like function. This is a bug--but it may or may not be fixed. In the meanwhile we will cast the right operand to text or use the like function directly as necessary. The full set of tests can be found in the test code.
\echo select column like value: works select lname, email from aliases where email like 'BOZO%'::text; select lname, email from aliases where email ~~ 'BOZO%'::text; select lname, email from aliases where email like lname || '%'; \echo direct function all to email_like works without casts select lname, email from aliases where email_like(email, 'BOZO%'); select lname, email from aliases where email_like(email, lname || '%'); \echo EXPECT NO ROWS: LIKE and ~~ with unknown Right type constants dont work select lname, email from aliases where email like 'BOZO%'; select lname, email from aliases where email ~~ 'BOZO%'; In order to create a UNIQUE index on our test table, we will need to create an index which uses these operators and knows about the comparison function. Indexes do this by using OPERATOR CLASSES based on an index type, for example, BTREE or RTREE. An operator class groups a set of operators for a type or types. Our parent type, text, uses text_ops OPERATOR CLASS for BTREE indexes. We want to set up a BTREE type of index with email_ops. The OPERATOR CLASS for BTREEs requires the functions and operators we just defined and the comparison function we snuck in when we wrote the operator functions. We will group those operators into an email operator class. -- -- Operator Class -- CREATE OPERATOR CLASS email_ops DEFAULT FOR TYPE email USING BTREE AS OPERATOR 1 <, OPERATOR 2 <=, OPERATOR 3 =, OPERATOR 4 >=, OPERATOR 5 >, FUNCTION 1 email_cmp(email,email) ;The operator class specifies it is using a BTREE indexing type. The operators for the BTREE are, in order, less than, less than or equal to, equal, greater than or equal to and greater than. BTREES also require one function, the comparison function. The numbering of these operations, in order, is pertinent. We could be using different operator symbols than the traditional ones, but that would confuse things. The operator class is tied to the operators which are tied in turn to each function. Now on our test table aliases, we can add a unique index. CREATE UNIQUE INDEX aliases_email ON aliases USING BTREE (email email_ops);And let us test it. We want it to reject duplicate keys based on the lower cased values of the email. elein=# \echo expect PK ERROR expect PK ERROR elein=# insert into aliases values ('pk@email.com', 'PK'); INSERT 0 1 elein=# insert into aliases values ('PK@email.com', 'PK'); ERROR: duplicate key violates unique constraint "aliases_email" Notice that we are creating an explicit UNIQUE INDEX instead of using the PRIMARY KEY syntax in CREATE TABLE. We are doing this because the PRIMARY KEY syntax has no place for specifing an OPERATOR CLASS on the PRIMARY KEY. This is a missing feature which may or may not be fixed. Now let us look at sort. We found that when we did a plain sort, it defaulted to the text comparison function. We know this because the values are sorted with capitalized letters first. Then we tried using the USING operator clause on the ORDER BY command. In that case it was sorted correctly. elein=# select * from aliases order by email; email | lname ----------------------------+---------- Abalone@little.fishies.com | Abe Carolina@my.mind.edu | Carolina DDuck@ducks.disney.com | Donald bozo@Clowns.com | Boz et@galaxy.net | ET frodo@TheShire.net | Frodo galahad@camelot.org | Sir pk@email.com | PK (8 rows) elein=# select * from aliases order by email using <; email | lname ----------------------------+---------- Abalone@little.fishies.com | Abe bozo@Clowns.com | Boz Carolina@my.mind.edu | Carolina DDuck@ducks.disney.com | Donald et@galaxy.net | ET frodo@TheShire.net | Frodo galahad@camelot.org | Sir pk@email.com | PK (8 rows)In this case ORDER BY does have a way to let us tell it which operator to use. We will have to remember to always use the USING operator clause with the ORDER BY clause when sorting our email type. To ensure that our email datatype is useful, we've added some accessor functions. These perl functions extract the login name, the full domain, the domain type and the reverse domain. For any datatype you create, you may want manipulator functions like these that can be called directly in SQL. -- -- Accessor functions -- -- login: before the @ sign CREATE OR REPLACE FUNCTION email_login (email) RETURNS text AS $$ return (split(/\@/, $_[0]))[0]; $$ LANGUAGE 'plperl' IMMUTABLE STRICT; -- Full Domain: after the @ sign CREATE OR REPLACE FUNCTION email_fdomain (email) RETURNS text AS $$ return (split /\@/, $_[0])[-1]; $$ LANGUAGE 'plperl' IMMUTABLE STRICT; -- Top Level Domain:: after the last dot CREATE OR REPLACE FUNCTION email_domain_type (email) RETURNS text AS $$ return (split( # Split the output of split below /\./, # Split the following on "." ( split( /\@/, # Split whole input on "@" $_[0] ) )[-1] # last part of inner split ))[-1]; # last part of outer split $$ LANGUAGE 'plperl' IMMUTABLE STRICT; -- Reverse Domain CREATE OR REPLACE FUNCTION email_reverse_domain(email) RETURNS text AS $$ return join(".", reverse split( # Split the output of split below /\./, # Split the following on "." ( split( /\@/, # Split whole input on "@" $_[0] ) )[-1] # last part of inner split )); $$ LANGUAGE 'plperl' IMMUTABLE STRICT;And of course some tests for the accessor functions. elein=# select email, email_login(email) from aliases limit 2; email | email_login ----------------------------+------------- bozo@Clowns.com | bozo Abalone@little.fishies.com | Abalone (2 rows) elein=# select email, email_fdomain(email) from aliases limit 2; email | email_fdomain ----------------------------+-------------------- bozo@Clowns.com | Clowns.com Abalone@little.fishies.com | little.fishies.com (2 rows) elein=# select email, email_domain_type(email) from aliases limit 2; email | email_domain_type ----------------------------+------------------- bozo@Clowns.com | com Abalone@little.fishies.com | com (2 rows) elein=# select email, email_reverse_domain(email) from aliases limit 2; email | email_reverse_domain ----------------------------+---------------------- bozo@Clowns.com | com.Clowns Abalone@little.fishies.com | com.fishies.little (2 rows)Notice that the original case is always preserved in storage.
SummaryThis is what we did to create the email type under the text type:
Problems and Issues:
The code for the email type and tests for the email type are available at these links. Special thanks to David Fetter for making the Perl functions better and prettier.
|
||||||||||||
Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in! Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009 |