Varlena, LLC | PostgreSQL General Bits Newsletter

PostgreSQL Training,
Consulting & Support

By A. Elein Mustain

29-Aug-2005 Issue: 118

Archives |

General Tidbits |

Google General Bits |

Prev |

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Overlapping Revisions
Finding and Preventing Overlapping Revisions	27-Aug-2005

A revision table usually has an item and a start date and and end date. An end_date of NULL usually means that the item is the most current version of the item.

	DROP TABLE items_118;
	CREATE TABLE items_118 (
   	item_id  SERIAL,
   	start_date  DATE,
   	end_date DATE,
	PRIMARY KEY (item_id, start_date)
	);

Selecting out the latest version is easy but we have to account for both open and close ended date intervals. Choose the ones with an open end date or a where today is between the start and end date.

	SELECT item_id, max(start_date), end_date
	FROM items_118
	WHERE end_date IS NULL
	   OR (end_date IS NOT NULL AND end_date >= current_date )
	GROUP BY item_id, end_date
	ORDER BY item_id;

Unfortunately this table definition alone does not prevent input of overlapping date intervals for any given item. We find items in the data with values like these:

	=# select * from items_118;
	 item_id | start_date |  end_date
	---------+------------+------------
	       1 | 2005-01-01 | 2005-03-01
	       1 | 2005-02-01 | 2005-03-15
	       1 | 2005-03-01 |
	       2 | 2005-04-01 | 2005-06-01
	       2 | 2005-05-01 |
	       3 | 2005-07-01 |
	       4 | 2005-08-01 |
	       5 | 2005-08-01 | 2005-12-01
	(8 rows)

Notice the overlaps in items 1 and 2. Oops. The first thing we need to do is to be able to see the overlaps clearly and then we need to fix them. To see which item overlaps which item, we want to draw together the item (first) and the item it overlaps with (second). This means a self join on the item_id. To identify the overlapping item we want the one where the first start date is before the second's start date and the second's end date is null. Or if the second's end date is not null, then if our end date is after its start date. Pay attention to this query. We will use it later.

	SELECT i.item_id, i.start_date, i.end_date,
	       i2.start_date as overlap_start_date, i2.end_date as overlap_end_date
	FROM items_118 i JOIN items_118 i2 USING (item_id)
	WHERE i.start_date < i2.start_date AND
	     (i.end_date IS NULL OR i.end_date > i2.start_date);

If our data set were just this small we could do a few updates by hand and be done with it. However, if you are dealing with a great many records, it is worthwhile to write the not so trivial update statement. Using the query we created above, we select out the old and new start and end dates. If the item_id and start_date matches our overlapped record, we update the record's end_date with the new start date minus one day.

	UPDATE items_118 SET end_date = new_start - interval '1 day'
	FROM (
	   SELECT i.item_id, i.start_date AS old_start, i.end_date AS old_end,
	      i2.start_date AS new_start, i2.end_date as new_end
	   FROM items_118 i JOIN items_118 i2 USING (item_id)
	   WHERE i.start_date < i2.start_date AND
	      (i.end_date IS NULL OR i.end_date > i2.start_date)
	) foo
	WHERE items_118.item_id = foo.item_id
	AND items_118.start_date = old_start;

And then we double check that was right by looking at the overlapping records query.

When you have a problem like this, it is correctable. But why not prevent it in the first place. An insert trigger can go back and fix an existing overlapping record right away, preventing the overlapping from occurring at all.

The trigger function does much the same thing as the UPDATE statement did. It uses the same query we used before to bring up any overlapping record. If it find one, it updates the older record before committing the update on the new one.

	CREATE OR REPLACE FUNCTION no_overlap()
	RETURNS TRIGGER AS
	$$
	   DECLARE
	      r record;
	   BEGIN
	      SELECT INTO r
	         i.item_id, i.start_date AS old_start, i.end_date AS old_end,
	         i2.start_date AS new_start, i2.end_date AS new_end
	      FROM items_118 i JOIN items_118 i2 USING (item_id)
	      WHERE i.item_id = NEW.item_id AND i.start_date < i2.start_date AND
	         (i.end_date IS NULL OR i.end_date > i2.start_date);

	      IF FOUND AND r.old_end IS NULL OR r.old_end > r.new_start THEN
	         UPDATE items_118 SET end_date = r.new_start - interval '1 day'
	         WHERE r.item_id=items_118.item_id 
	            AND r.old_start = items_118.start_date;
	      END IF;

	RETURN NEW;
	END;
	$$ LANGUAGE 'plpgsql';

	CREATE TRIGGER nooverlaps AFTER INSERT ON items_118
	FOR EACH ROW EXECUTE PROCEDURE no_overlap();

Now any further updates of overlapping date intervals will be fixed and NULL ended end_dates will be set to the correct end date when the next revision is inserted..

The entire code for this article is available.

Contributors: elein at varlena.com

Generating Scheduling Entries
[sfpug] Calculating the number of months between two dates 28-Aug-2005

For scheduling applications is is sometimes necessary to generate a set of dates and times for repeating appointments. For example, you may want to schedule something for every two weeks, or every six months or every two days.

The interval arithmetic is not that obvious, but once you know the trick it is simple. timestamp + (N * interval '1 day') will get you the next day, same time. Using the multiplication on the interval of one day (or second or month, etc.) makes it easier to deal with the difficult construction of an interval value.

To start off, we wrote a function that generates dates between a start date and an end date. We threw in an increment so that we could have every two days or every seven days. It is the basic for(n=start; i < end ; n+=increment) formula loop. Notice that this is a set returning function that returns a date.

	-- Generate dates for recurring intervals
	
	CREATE OR REPLACE FUNCTION every_day(start_date date,end_date date,incr integer)
	RETURNS SETOF date AS
	$$
	DECLARE
	   N  integer=0;
	   next_date date = start_date;
	BEGIN
	   WHILE end_date > next_date LOOP
	      RETURN NEXT next_date;
	      next_date = start_date + (N + incr) * interval '1 day';
	      N = N + incr;
	   END LOOP;
	   RETURN;
	END;
	$$
	LANGUAGE 'plpgsql';
	-- every 2 days from today for 10 days.
	=# select * from every_day( current_date, current_date + 10, 2 );
	 every_day
	------------
	 2005-08-28
	 2005-08-30
	 2005-09-01
	 2005-09-03
	 2005-09-05
	(5 rows)

That is helpful. We could write an every_hour(), every_year(), every_month() function. But that would be tedious. Suppose we passed in the interval unit to a generic every_what() function. The only trick in this is to convert the interval unit into a interval type of one unit. We can do this by declaring a variable of type interval and constructing its assignment from the unit parameter.

	CREATE OR REPLACE FUNCTION every_what(    
		start_date date,end_date date,incr integer, unit text)
	RETURNS SETOF date AS
	$$
	DECLARE
	   N  integer=0;
	   next_date date = start_date;
	   int_type interval = '1 ' || unit;
	BEGIN
	   WHILE end_date > next_date LOOP
	      RETURN NEXT next_date;
	      next_date = start_date + (N + incr) * int_type;
	      N = N + incr;
	   END LOOP;
	   RETURN;
	END;
	$$
	LANGUAGE 'plpgsql';
	
	-- select this day for each month for five months
	
	=#  select * from every_what( 
		current_date, (current_date + interval '5 months')::date, 1, 'months' );
	 every_what
	------------
	 2005-08-28
	 2005-09-28
	 2005-10-28
	 2005-11-28
	 2005-12-28
	(5 rows)

The functions here are written for date. If you write them for timestamp, you can cast the result to date, thus making the function even more versatile. This is left as an exercise for the reader.

Contributors: Brian Ghidinelli brian at pukkasoft.com, elein at varlena.com

Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!

Top