Varlena, LLC | PostgreSQL General Bits Newsletter

PostgreSQL Training,
Consulting & Support

By A. Elein Mustain

21-Aug-2006 Issue: 130

Archives |

General Tidbits |

Google General Bits |

Prev |

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Gapless Sequences for Primary Keys
[GENERAL] Best approach for a "gap-less" sequence	21-Aug-2006

By defining a column in your table to be of type SERIAL, you can guarantee that column will increase in value and be unique. However, you cannot guarantee that the values will be gapless.

SERIALs are implemented using SEQUENCES. The value of the column is DEFAULTed to the next value of the associated sequence. If there are many connections doing inserts and requesting sequences, they may be allocated in bunches at the connection level. If the connection closes before using all of the allocated sequences, then those remaining will be discarded. This can create a possible gap in the sequences.

In most circumstances a SERIAL works perfectly for primary keys. The exception is when there is an external business rule which says that there must be no gaps in a sequence. For example, German tax law requires invoices to be numbered sequentially without gaps.

What I will show you here is first a simple primary key implemented with a a gapless sequence-like value. Then we will look into multi-part primary keys which also require gapless sequences.

Single Level Gapless Sequence

The first situation will be an employee table which requires a gapless sequential primary key. We will disallow deletes in order to prevent gaps in the sequences. We could also add a trigger to the table to disallow updates to the key fields if we are unsure of the application code. But we'll leave that as an exercise for the reader. (Hint: compare OLD.value and NEW.value )

   -- Employee Data Field
   CREATE TABLE employees
   (
      employee_pk int8 DEFAULT emp_pk_next(), -- Identifies the employee.
      employee_name text,
      -- non-relevant columns omitted
      ex-p_report_sew int4 DEFAULT 0, -- Compound sequence control.
      CONSTRAINT employee_pkey PRIMARY KEY (employee_pk)
   );

CREATE RULE nodel_employees AS ON DELETE TO employees
DO NOTHING;

Next we will create a table with a single row and column emp_pk_counter and initialize it with one row. Then to ensure that this table has one and only one row, we add rules to ignore further inserts and all deletes.

   -- Employee Primary Key Counter Table --
   CREATE TABLE emp_pk_counter
   (       employee_pk int8
   );
   
   -- Initialize table with one row on creation --
   INSERT INTO emp_pk_counter VALUES (0);
   
   -- Disallow further insertions and deletions --
   CREATE RULE noins_emp_pk AS ON INSERT TO emp_pk_counter
   DO NOTHING;
   CREATE RULE nodel_only_emp_pk AS ON DELETE TO emp_pk_counter
   DO NOTHING;

With the counter table set up and in place we want to write a "next" function to return the next value. This is a simple function: an UPDATE followed by the select. It is important to note that the UPDATE will lock the row for the duration of the transaction. No other connection will be able to access the value until this transaction is closed. This gives us the transactional integrity that sequences (SERIAL) does not. But it is a bit slower than SERIAL values. If a second connection is waiting on the counter row, it will wait for the commit of the first transaction and so be ready to add the next value.

   -- Get next available emp_pk value from counter --
   CREATE OR REPLACE FUNCTION emp_pk_next()
   returns int8 AS
   $$
   DECLARE
      next_pk int8;
   BEGIN
      UPDATE emp_pk_counter set employee_pk = employee_pk + 1;
      SELECT INTO next_pk employee_pk from emp_pk_counter;
      RETURN next_pk;
   END;
   $$ LANGUAGE 'plpgsql';

That is all that is necessary for a single column gapless sequence. There is not much to it: a counter table, a default for the column, and a function to return the next value.

Two Level Gapless Sequence

The next part is to create a table of expense reports for the employees. The numbering of the expense reports must be a gapless sequence unique to each employee. That is, employee 7 must have expense reports numbered from 1 to 3 and employee 10 must have expense reports numbered from 1 to 5. The primary key on the employee expenses table is (employee_pk, expense_pk).

The expense table looks like this. Notice it also has a no delete rule to prevent gaps in the sequences.

CREATE TABLE expenses
(
   employee_pk int4 NOT NULL,
   expense_report_pk int4 NOT NULL, -- set by trigger
   expense_descr text NOT NULL,
   -- non-relevant columns omitted 
   CONSTRAINT expense_report_pkey
      PRIMARY KEY (employee_pk, expense_report_pk),
   CONSTRAINT expense_fkey FOREIGN KEY (employee_pk)
      REFERENCES employees (employee_pk)
);
CREATE RULE nodel_expenses AS ON DELETE TO expenses
DO NOTHING;

The counter table for this case is different. It will have one column for each employee and we will store the counter in the employee table rather than have a separate table for counters. That counter column will contain the counter for that employee. The counter column will be added by an ALTER TABLE statement and must default to 0.

   ALTER TABLE employees ADD COLUMN exp_report_counter int8 DEFAULT 0;

The next thing we need is the next function for the employee expense record. This will be different in that it will take as a parameter the primary key of the employee. In order to make this work, we will need the get next counter function called by a trigger function. The trigger function will have access to the value of the employee primary key. These two functions could be collapsed into one trigger function if it suits you better.

CREATE OR REPLACE FUNCTION emp_exp_next(emp_pk int8)
RETURNS int8 AS
$$
DECLARE
   next_pk int8;
BEGIN
   UPDATE employees set exp_report_counter = exp_report_counter + 1
      WHERE employee_pk = emp_pk;
   SELECT INTO next_pk exp_report_counter from employees
      WHERE employee_pk = emp_pk;
   RETURN next_pk;
END;
$$ LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION set_exp_pk ()
RETURNS TRIGGER AS
$$
BEGIN
   NEW.expense_report_pk = emp_exp_next(NEW.employee_pk);
   RETURN NEW;
END
$$ language 'plpgsql';
CREATE TRIGGER set_exp_pk BEFORE INSERT ON expenses
FOR EACH ROW EXECUTE PROCEDURE set_exp_pk();

The primary difference between the single primary key and the multi-part primary key is that the first can be handled with a simple DEFAULT column value. The second must be handled by a trigger which has the other parts of the primary key available.

There should be no gaps or deadlocks using this code. There may be contention for the tables with the counters in them, but this should not slow things down too much. If anyone has examples with heavy concurrency that do not show this, I would be interested in hearing about it.

The code and test data for this example can be found in 130_code.sql 130_test.sql 130_test2.sql

Jorge Godoy jgodoy at gmail.com, Ron Johnson ron.l.johnson at cox.net, Chris dmagick at gmail.com, Michael Fuhr mike at fuhr.org, Alvaro Herrera alvherre at commandprompt.com, Harald Fuchs hf0731x at protecting.net, Richard Broersma Jr rabroersma at yahoo.com, Scott Ribe scott_ribe at killerbytes.com, Berend Tober btober at seaworthysys.com, Brad Nicholson bnichols at ca.afilias.info, Adrian Klaver aklaver at comcast.net, Christian Kratzer ck-lists at cksoft.de, Dawid Kuroczko qnex42 at gmail.com. Merlin Moncure mmoncure at gmail.com. Elein Mustain elein at varlena.com

Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!

Top