Copyright © 2004 Jonathan Gardner (jgardner@jonathangardner.net). All rights reserved. Contact me if you'd like to copy or distribute this document.
Materialized views are certainly possible in PostgreSQL. Because of PostgreSQL's powerful PL/pgSQL language, and the functional trigger system, materialized views are somewhat easy to implement. I will examine several methods of implementing materialized views in PostgreSQL.
now()
.now()
.I assume you are not new to PostgreSQL. I assume that you have a fairly solid understanding of how a database works, and in particular, the PostgreSQL database. I also assume you are comfortable with PL/pgSQL, PostgreSQL's SQL syntax, and tools to access and modify the database. All the information you need is in the PostgreSQL documentation.
Note that unless otherwise noted, all of the commands are executed in the database as a root user. If you know what you are doing, you don't have to do this as a root user. If you don't know what a root user is, or how to create one, then read the previous paragraph.
Snapshot materialized views are very easy to implement. They will serve as a foundation for all other materialized views I discuss in this article.
I use a system where the materialized view is based off of a view. I assume that the view definition will never change. All bets are off if it does. Of course, I do expect that the data in the view will change as data in the database is modified.
CREATE TABLE matviews ( mv_name NAME NOT NULL PRIMARY KEY , v_name NAME NOT NULL , last_refresh TIMESTAMP WITH TIME ZONE );
matviews
TableI create a table called matviews
to store the information
about a materialized view.
mv_name
v_name
last_refresh
create_matview
FunctionHere is a function written in PL/pgSQL to insert a row into the
matviews
table and to create the materialized view. Pass in the
name of the materialized view, and the name of the view that it is based on.
Note that you have to create the view first, of course.
This function will see if a materialized view with that name is already
created. If so, it raises an exception. Otherwise, it creates a new table from
the view, and inserts a row into the matviews
table.
CREATE OR REPLACE FUNCTION create_matview(NAME, NAME) RETURNS VOID SECURITY DEFINER LANGUAGE plpgsql AS ' DECLARE matview ALIAS FOR $1; view_name ALIAS FOR $2; entry matviews%ROWTYPE; BEGIN SELECT * INTO entry FROM matviews WHERE mv_name = matview; IF FOUND THEN RAISE EXCEPTION ''Materialized view ''''%'''' already exists.'', matview; END IF; EXECUTE ''REVOKE ALL ON '' || view_name || '' FROM PUBLIC''; EXECUTE ''GRANT SELECT ON '' || view_name || '' TO PUBLIC''; EXECUTE ''CREATE TABLE '' || matview || '' AS SELECT * FROM '' || view_name; EXECUTE ''REVOKE ALL ON '' || matview || '' FROM PUBLIC''; EXECUTE ''GRANT SELECT ON '' || matview || '' TO PUBLIC''; INSERT INTO matviews (mv_name, mv_view, last_refresh) VALUES (matview, view_name, CURRENT_TIMESTAMP); RETURN; END ';
drop_matview
FunctionIf there is a function to create, there must be a function to destroy.
drop_matview
only drops the materialized view and removes the
entry from matviews
. It will leave the view alone.
CREATE OR REPLACE FUNCTION drop_matview(NAME) RETURNS VOID SECURITY DEFINER LANGUAGE plpgsql AS ' DECLARE matview ALIAS FOR $1; entry matviews%ROWTYPE; BEGIN SELECT * INTO entry FROM matviews WHERE mv_name = matview; IF NOT FOUND THEN RAISE EXCEPTION ''Materialized view % does not exist.'', matview; END IF; EXECUTE ''DROP TABLE '' || matview; DELETE FROM matviews WHERE mv_name=matview; RETURN; END ';
refresh_matview
FunctionFinally, you need a way to refresh the materialized views so that the data does not become completely stale. This function only needs the name of the matview. It uses a brute-force algorithm that will delete all the rows and reinsert them from the view. Note that you may want to drop the indexes on you materialized view before executing this, and recreate them when it finished.
CREATE OR REPLACE FUNCTION refresh_matview(name) RETURNS VOID SECURITY DEFINER LANGUAGE plpgsql AS ' DECLARE matview ALIAS FOR $1; entry matviews%ROWTYPE; BEGIN SELECT * INTO entry FROM matviews WHERE mv_name = matview; IF NOT FOUND THEN RAISE EXCEPTION ''Materialized view % does not exist.'', matview; END IF; EXECUTE ''DELETE FROM '' || matview; EXECUTE ''INSERT INTO '' || matview || '' SELECT * FROM '' || entry.mv_view; UPDATE matviews SET last_refresh=CURRENT_TIMESTAMP WHERE mv_name=matview; RETURN; END ';
Let's pretend you are running a database that has the following tables and views:
CREATE TABLE player ( pname VARCHAR(255) PRIMARY KEY ); CREATE TABLE game_score ( pname VARCHAR(255) NOT NULL, score INTEGER NOT NULL ); CREATE VIEW player_total_score_v AS SELECT pname, sum(score) AS total_score FROM game_score GROUP BY pname;
Since a lot of players play games each day, and running the view is kind of
expensive, you decide you want to implement a materialized view on
player_total_score_v
. This is done with the following
command.
SELECT create_matview('player_total_score_mv', 'player_total_score_v'); CREATE INDEX pname_idx ON player_total_score_mv(pname);
Every night (or every hour, depending on how eager the players are to see their score), you can refresh the materialized view with the following command.
DROP INDEX pname_idx ON player_total_score_mv; SELECT refresh_matview('player_total_score_mv'); CREATE INDEX pname_idx ON player_total_score_mv(pname);
Since many players are always playing, many rows in the table will be changed. Also, players don't need their scores updated whenever they finish a game. But the players are constantly querying for their scores, with the understanding that the scores are updated every night. This is a good approach that will improve performance tremendously.
An eager materialized view will be updated whenever the view changes. This
is done with a system of triggers on all of the underlying tables.
Dependencies on mutable functions (like now()
) will cause the
materialized view to become corrupt, but that can be corrected with minor
refreshes, that only refresh affected rows.
First, let's step back and consider what actually makes up the data in a materialized view, or where the data in the view is coming from. Obviously, information in the materialized view will come from any tables mentioned in the view definition. If the view definition relies on other views, then we'll have to consider all the tables that compose that view as well.
Now we need to consider how the data in the underlying table relates to or affects the data in the materialized view.
The simplest case is a one-to-one relation. The view is merely selecting
all or some of the columns from a table. Mutiple rows in the view do not
depend on the same row in the underlying table. In the example below, if the
username
of a single user changes, then only one row in the
user_v
will change.
CREATE VIEW user_v AS SELECT username, password AS '******', uid FROM users;
A more complicated case is many-to-one. Many rows in the view depend on a
single row in an underlying relation. This is most common when you are joining
two tables. In the example below, if the groupname
changes, many
rows in the user_group_v
may change.
CREATE VIEW user_group_v AS SELECT username, groupname FROM groups, users WHERE users.group = groups.group;
Another complicated case is one-to-many. One row in the view is derived
from multiple rows in the underlying table. This is most common with
aggregates. In this example, the total_score
is derived from many
rows in game_score
.
CREATE VIEW player_total_score_v AS SELECT pname, sum(score) AS total_score FROM game_score;
There is also some rare cases where many rows in the view will be affected by many rows in the underlying tables. We need to keep in mind all the cases as we design the functions and triggers below. If one row changes in the underlying table, many rows in the materialized view may be affected.
mv_refresh_row
FunctionWe first need to design an mv_refresh_row
function. I don't
know how to make a generic function that will work for all materialized views,
so we have to hand-craft one for each materialized view we create. It isn't
hard to do. This simple algorithm will get you through this process.
We'll see an example below.
mv_refresh
FunctionIf the view relies on some mutable functions, then you will have to run a refresh function that will only refresh those rows that are affected. Most commonly, this occurs when there is some sort of time-dependence.
A lot of thought needs to go into the mv_refresh
function.
There may be a way to write a generic one for all views, but for now, you'll
have to hand-craft your own. I don't even have a generic algorithm to pass
along. I think this is largely dependent on the mutable and how the mutable
behaves.
You will have to create triggers for every action on every underlying
table. I write three triggers for each table, one each for
INSERT
, UPDATE
, and DELETE
. I feel it
is more efficient than writing one function that handles all three cases.
The basic algorithm for the trigger functions follows. I'll show you a concrete example below of these functions.
DELETE
and INSERT
, merely call the
mv_refresh_row
function for each primary key value.UPDATE
, identify whether the update is going to change
which row(s) in the materialized view this row will affect. If so, then
you'll have to refresh rows for both the old and new values. Otherwise,
only the old or new values will do.Apply the triggers so that they are called after the operation is performed.
These examples are long and contrived; I am using them merely to demonstrate how it all works. Don't bother trying to make sense of the tables. I couldn't come up with any real world examples that would show all three instances.
CREATE TABLE a ( a_id INT PRIMARY KEY, v INT ); CREATE TABLE b ( b_id INT PRIMARY KEY, a_id INT REFERENCES a, v INT, expires TIMESTAMP ); CREATE TABLE c ( c_id INT PRIMARY KEY, b_id INT REFERENCES b, v INT ); CREATE VIEW b_v AS SELECT b.b_id AS b_id, a.v AS a_v, b.v AS b_v, sum(c.v) AS sum_c_v FROM a JOIN b USING (a_id) JOIN c USING (b_id) WHERE (b.expires IS NULL OR b.expires >= now()) GROUP BY b.b_id, a.v, b.v; SELECT create_matview('b_mv', 'b_v');
Notice that one row in a
may contribute to multiple rows in
b_v
. Only one row in b
contribute to one row in
b_v
Also, many rows in c
contribute to a single row
in b_v
. The primary key of b_v
is
b_id
. Also, the view has a dependence on the mutable function
now()
.
The mv_refresh_row
function is defined as follows.
CREATE FUNCTION b_mv_refresh_row(b_mv.b_id%TYPE) RETURNS VOID SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN DELETE FROM b_mv WHERE b_id = $1; INSERT INTO b_mv SELECT * FROM b_v WHERE b_id = $1; RETURN; END ';
The mv_refresh
function is defined as follows. If it weren't
for the dependence on now()
, this would be unnecessary. We store
the last time this was refreshed in the matviews
table.
CREATE FUNCTION b_mv_refresh() RETURNS VOID SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM b_mv_refresh_row(b_id) FROM b, matviews WHERE matviews.mv_name = ''b_mv'' AND b.expires >= matviews.last_refresh AND b.expires < now(); UPDATE matviews SET last_refresh = now() WHERE mv_name = ''b_mv''; RETURN; END ';
The trigger function definitions are long and tedious, but this is where all the magic is.
-- a triggers CREATE FUNCTION b_mv_a_ut() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN IF OLD.a_id = NEW.a_id THEN PERFORM b_mv_refresh_row(b.b_id) FROM b WHERE b.a_id = NEW.a_id; ELSE PERFORM b_mv_refresh_row(b.b_id) FROM b WHERE b.a_id = OLD.a_id; PERFORM b_mv_refresh_row(b.b_id) FROM b WHERE b.a_id = NEW.a_id; END IF; RETURN NULL; END '; CREATE TRIGGER b_mv_ut AFTER UPDATE ON a FOR EACH ROW EXECUTE PROCEDURE b_mv_a_ut(); CREATE FUNCTION b_mv_a_dt() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM b_mv_refresh_row(b.b_id) FROM b WHERE b.a_id = OLD.a_id; RETURN NULL; END '; CREATE TRIGGER b_mv_dt AFTER DELETE ON a FOR EACH ROW EXECUTE PROCEDURE b_mv_a_dt(); CREATE FUNCTION b_mv_a_it() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM b_mv_refresh_row(b.b_id) FROM b WHERE b.a_id = NEW.a_id; RETURN NULL; END '; CREATE TRIGGER b_mv_it AFTER INSERT ON a FOR EACH ROW EXECUTE PROCEDURE b_mv_a_it(); -- b triggers CREATE FUNCTION b_mv_b_ut() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN IF OLD.b_id = NEW.b_id THEN PERFORM b_mv_refresh_row(NEW.b_id); ELSE PERFORM b_mv_refresh_row(OLD.b_id); PERFORM b_mv_refresh_row(NEW.b_id); END IF; RETURN NULL; END '; CREATE TRIGGER b_mv_ut AFTER UPDATE ON b FOR EACH ROW EXECUTE PROCEDURE b_mv_b_ut(); CREATE FUNCTION b_mv_b_dt() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM b_mv_refresh_row(OLD.b_id); RETURN NULL; END '; CREATE TRIGGER b_mv_dt AFTER DELETE ON b FOR EACH ROW EXECUTE PROCEDURE b_mv_b_dt(); CREATE FUNCTION b_mv_b_it() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM b_mv_refresh_row(NEW.b_id); RETURN NULL; END '; CREATE TRIGGER b_mv_it AFTER INSERT ON b FOR EACH ROW EXECUTE PROCEDURE b_mv_b_it(); -- c triggers CREATE FUNCTION b_mv_c_ut() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN IF OLD.b_id = NEW.b_id THEN PERFORM b_mv_refresh_row(NEW.b_id); ELSE PERFORM b_mv_refresh_row(OLD.b_id); PERFORM b_mv_refresh_row(NEW.b_id); END IF; RETURN NULL; END '; CREATE TRIGGER b_mv_ut AFTER UPDATE ON c FOR EACH ROW EXECUTE PROCEDURE b_mv_c_ut(); CREATE FUNCTION b_mv_c_dt() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM b_mv_refresh_row(OLD.b_id); RETURN NULL; END '; CREATE TRIGGER b_mv_dt AFTER DELETE ON c FOR EACH ROW EXECUTE PROCEDURE b_mv_c_dt(); CREATE FUNCTION b_mv_c_it() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM b_mv_refresh_row(NEW.b_id); RETURN NULL; END '; CREATE TRIGGER b_mv_it AFTER INSERT ON c FOR EACH ROW EXECUTE PROCEDURE b_mv_c_it();
The lazy materialized view would record which rows in the materialized views need to be updated, and update them when the transaction is committed. This useful if many change will affect the same rows, and will also allow those changes to be made much more quickly.
Currently, I know of no way to put a hook in when the transaction is committed. Pending that development, this materialized view scheme cannot be implemented.
Very lazy materialized views would record which rows in the materialized view need to be updated, but won't update until directed to. This would be useful if you are committing multiple transactions that affect the materialized views, but don't want to actually update the materialized view until later. Does this sound familiar? It should, because it is functionally equivalent to the snapshot materialized view I described earlier.
CREATE TABLE matview_changes ( mv_oid OID PRIMARY KEY , pkey INTEGER NOT NULL );
matview_changes
Replacing the mv_refresh_row
function with one that merely
records the change to be made at a later time is pretty easy. First, we need a
table to store our changes -- one that could store any change to any
materialized view (assuming the primary key is a single column of integers).
matview_queue_refresh_row()
FunctionNext, we need to create a generic matview_queue_refresh_row()
function that inserts the primary key into matview_changes
if it
is not already present.
CREATE FUNCTION matview_queue_refresh_row(NAME, INTEGER) RETURNS VOID SECURITY DEFINER LANGUAGE 'plpgsql' AS ' DECLARE mv OID; test INTEGER; BEGIN SELECT INTO mv oid FROM matviews WHERE mv_name = $1; SELECT INTO test pkey FROM matview_changes WHERE matview_changes.mv_oid = mv AND matview_changes.pkey = $2; IF NOT FOUND THEN INSERT INTO matview_changes (mv_oid, pkey) VALUES (mv, $2); END IF; RETURN NULL; END ';
The very lazy materialized view works mostly like the eager materialized view system, with a few modification. Apply the following changes on top of the eager materialized view setup above.
matview_queue_refresh_row()
rather than
mv_refresh_row()
.mv_refresh()
so
that it first applies the mutable-function dependencies first, and then
performs the actual changes in bulk with mv_refresh_row()
,
deleting all the changes from the matview_changes
table.Now that refreshing the materialized view won't be so expensive, we can look at some alternative refresh strategies that are out of the question for snapshot materialized views.
Borrowing from our eager example, we are now going to implement the
triggers and mv_refresh()
function so that it uses the very lazy
updating technique. It also implements a rule that will refresh the
materialized view if 10 seconds have passed since the last refresh.
-- b_mv_refresh() CREATE OR REPLCE FUNCTION b_mv_refresh() RETURNS VOID SECURITY DEFINER LANGUAGE 'plpgsql' AS ' DECLARE mv OID; BEGIN SELECT INTO mv oid FROM matviews WHERE mv_name = ''b_mv''; PERFORM b_mv_refresh_row(b_id) FROM b, matviews WHERE matviews.oid = mv AND b.expires >= matviews.last_refresh AND b.expires < now(); PERFORM b_mv_refresh_row(pkey) FROM matview_changes WHERE mv_oid = mv; UPDATE matviews SET last_refresh = now() WHERE mv_name = ''b_mv''; END '; -- a triggers CREATE FUNCTION b_mv_a_ut() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN IF OLD.a_id = NEW.a_id THEN PERFORM matview_queue_refresh_row(''b_mv'', b.b_id) FROM b WHERE b.a_id = NEW.a_id; ELSE PERFORM matview_queue_refresh_row(''b_mv'', b.b_id) FROM b WHERE b.a_id = OLD.a_id; PERFORM matview_queue_refresh_row(''b_mv'', b.b_id) FROM b WHERE b.a_id = NEW.a_id; END IF; RETURN NULL; END '; CREATE TRIGGER b_mv_ut AFTER UPDATE ON a FOR EACH ROW EXECUTE PROCEDURE b_mv_a_ut(); CREATE FUNCTION b_mv_a_dt() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM matview_queue_refresh_row(''b_mv'', b.b_id) FROM b WHERE b.a_id = OLD.a_id; RETURN NULL; END '; CREATE TRIGGER b_mv_dt AFTER DELETE ON a FOR EACH ROW EXECUTE PROCEDURE b_mv_a_dt(); CREATE FUNCTION b_mv_a_it() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM matview_queue_refresh_row(''b_mv'', b.b_id) FROM b WHERE b.a_id = NEW.a_id; RETURN NULL; END '; CREATE TRIGGER b_mv_it AFTER INSERT ON a FOR EACH ROW EXECUTE PROCEDURE b_mv_a_it(); -- b triggers CREATE FUNCTION b_mv_b_ut() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN IF OLD.b_id = NEW.b_id THEN PERFORM matview_queue_refresh_row(''b_mv'', NEW.b_id); ELSE PERFORM matview_queue_refresh_row(''b_mv'', OLD.b_id); PERFORM matview_queue_refresh_row(''b_mv'', NEW.b_id); END IF; RETURN NULL; END '; CREATE TRIGGER b_mv_ut AFTER UPDATE ON b FOR EACH ROW EXECUTE PROCEDURE b_mv_b_ut(); CREATE FUNCTION b_mv_b_dt() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM matview_queue_refresh_row(''b_mv'', OLD.b_id); RETURN NULL; END '; CREATE TRIGGER b_mv_dt AFTER DELETE ON b FOR EACH ROW EXECUTE PROCEDURE b_mv_b_dt(); CREATE FUNCTION b_mv_b_it() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM matview_queue_refresh_row(''b_mv'', NEW.b_id); RETURN NULL; END '; CREATE TRIGGER b_mv_it AFTER INSERT ON b FOR EACH ROW EXECUTE PROCEDURE b_mv_b_it(); -- c triggers CREATE FUNCTION b_mv_c_ut() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN IF OLD.b_id = NEW.b_id THEN PERFORM matview_queue_refresh_row(''b_mv'', NEW.b_id); ELSE PERFORM matview_queue_refresh_row(''b_mv'', OLD.b_id); PERFORM matview_queue_refresh_row(''b_mv'', NEW.b_id); END IF; RETURN NULL; END '; CREATE TRIGGER b_mv_ut AFTER UPDATE ON c FOR EACH ROW EXECUTE PROCEDURE b_mv_c_ut(); CREATE FUNCTION b_mv_c_dt() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM matview_queue_refresh_row(''b_mv'', OLD.b_id); RETURN NULL; END '; CREATE TRIGGER b_mv_dt AFTER DELETE ON c FOR EACH ROW EXECUTE PROCEDURE b_mv_c_dt(); CREATE FUNCTION b_mv_c_it() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM matview_queue_refresh_row(''b_mv'', NEW.b_id); RETURN NULL; END '; CREATE TRIGGER b_mv_it AFTER INSERT ON c FOR EACH ROW EXECUTE PROCEDURE b_mv_c_it();
With the right support functions, and a column to track the type of
materialized view, it should be possible to switch between the various
techniques. Right now, I don't have a way to do this because generating the
triggers, the mv_refresh_row
function, and the
mv_refresh
function is not generic. However, should I discover a
way to make them generic, then it should be possible to track what type of
technique the materialized view is using in the matviews
table.
Then, I would write a generic function to apply the necessary changes to
switch between techniques.
Let's review the various techniques and their benefits and disadvantages.
The snapshots are really easy to implement. However, they take a lot of work to regenerate the data. This is good if it is going to take a lot of work anyway, but bad if there are only a few small changes to make at each update. The refresh function must be called regularly.
The data will be out-of-sync with the view as soon as the data starts to change. This may be good or bad, depending on what you want.
The eager updated materialized views are always updated - even in a transaction. This comes at a cost - changes that affect the materialized view are going to be more expensive. This is fine if you don't change the data much, but can lead to problems when you do bulk imports or modifications.
The data may fall out of sync if there is a dependence on mutable functions. A regular specialized refresh function can be called to remedy the data. However, discovering the correct algorithm for this function can be difficult.
Lazy materialized views offer a balance between the eager and snapshot types. Changes to the data are not instantly propagated, allowing multiple changes to the same records to be applied in one shot rather than mutliple shots. However, the data is not consistent during a transaction, and so it must be carefully handled.
Like eager materialized views, the data may fall out of sync if there is a dependence on a mutable function. A similar solution must be implemented.
The only drawback to lazy materialized views is that I don't know how to implement them yet, as I can't put a hook in before the transaction is committed.
Very lazy materialized views are like snapshots, except the updates can be lighter-weight. This is good if there are few updates to the data. Like snapshots, the data will begin to be out of sync as soon as the data changes, but the refresh function is much faster and uses less resources.
It doesn't take a genius to realize that materialized views are not simple. If you can find a way to improve your database performance with partitions or indexes, by all means, do so. However, certain situations are best served with materialized views. These situations are obvious in that there are relatively few data modifications compared to the queries being performed, and the queries are very complicated and heavy-weight. It also goes without saying that the different kinds of materialized views are useful in different situations.
Avoid mutable functions if you can. Sometimes, like the example above with
the expires
column, the solution is obvious. Othertimes, it is
not. If possible, summarize the mutable function in a column in the table. For
instance, we could've added a column called expired
, set it to a
boolean, and ran a query to update that column with respect to the
expires
column nightly. Then we could create the view and thus
the materialized view with a dependence on the expired
column,
and ignore the expires
column. This would've made the
materialized view simpler and always synchronized with the data.
Due to the sheer volume of SQL code required to write a materialized view that is either eagerly updated or very lazily updated (nearly 1,000 in one instance I have implemented), it is important to combine all the functions together into a script that is executed in one transaction. I also like to have a partner script to back out all the changes. This way, when I create the materialized view quickly, and I can quickly back out if things go bad. It also helps me to keep track of all the changes in one neat bundle.
I would like to begin work on generic functions to implement the
mv_refresh_row()
functions, as well as the triggers and trigger
functions. I think I will need to begin writing code in C for this to work
properly.
If the generic functions become available, I would like to drop the
snapshot method and instead replace the mv_refresh()
function
with one that will determine if it makes more sense to delete all the rows and
reselect them, or apply the changes incrementally.
I would also like to investigate how to put a hook into the transaction process. Having that ability would add a powerful materialized view to the existing arsenal.
I welcome your comments on this document, as well as any questions you may have. Please email me at jgardner@jonathangardner.net and CC the SQL, Performance, or Hackers list on the PostgreSQL site.