varlena
varlena
PostgreSQL Training,
Consulting & Support
General Bits
By A. Elein Mustain

3-Oct-2005 Issue: 120

Archives | General Tidbits | Google General Bits | Docs | Castellano | Português | Subscriptions | Notifications | | Prev | Next

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Counting Rows
Accurate or Approximate Row Counts 01-Oct-2006

It is often the case that you want to know how many rows are in a table. If you find yourself in this situation, you should also ask how accurate that count should be. If you are doing accounting, you want it to be exactly accurate. If you are decorating a web page with counts, perhaps it would be OK to be off by a few.

COUNT(*)

The traditional method of counting rows in a table is to do a select count(*) from the table. COUNT(*) is notoriously slow, especially on large tables, because it checks each row.
	webstat=# select count(*) from rawlog;
  	count
	---------
 	2058704
	(1 row)

	Time: 7202.873 ms
You should avoid count(*) queries if possible but it is the definitive answer for accuracy.

Reltuples

One alternative for count(*) is to select the estimated number of tuples from the pg_class table. This value is updated with each vacuum of the table. If your count of the number of tuples can be off by the number of tuples you add or delete between vacuums, this is the best choice. Don't use this method for accounting purposes, though. The more often vacuum passes over your table the more accurate the number is.

This number is stored in pg_class.reltuples. To see all of the row count estimates for all the tables in the public schema of your database use the following:

	SELECT relname, reltuples 
	FROM pg_class r JOIN pg_namespace n ON (relnamespace = n.oid) 
	WHERE relkind = 'r' AND n.nspname = 'public';
The kind of relation is a table relation ('r') and the namespace name is 'public'. Obviously, to just see one named table, use:
	SELECT reltuples 
	FROM pg_class r 
	WHERE relkind = 'r' AND relname = 'mytable';
where mytable is the name of the table in question.

Simple Count Triggers

If you must have an accurate count and using count(*) is prohibitive, then you may want to consider spreading the cost of maintaining the count through triggers. This technique involves creating an INSERT TRIGGER that increases the count and a DELETE TRIGGER which decrements the count. The count can be stored in a separate table.

Create a table called, for instance, row_counts. row_counts would consist of the table name (relname) and the row count for that table. First you would create table, then the triggers and then initialize the row_counts table.

	CREATE TABLE row_counts (
   relname  text PRIMARY KEY,
   reltuples   numeric);

Being properly lazy, I've written one trigger function which handles both the insert and delete cases on the table. This is easy to do using the TG_OP constant which says what the operation is and the TG_RELNAME constant which contains the table name. These are trigger specific contants. And thank goodness, once more, for dollar quoting.

	CREATE OR REPLACE FUNCTION count_trig()
	RETURNS TRIGGER AS
	$$
	   DECLARE
	   BEGIN
	   IF TG_OP = 'INSERT' THEN
	      EXECUTE 'UPDATE row_counts set reltuples=reltuples +1 where relname = ''' || TG_RELNAME || '''';
	      RETURN NEW;
	   ELSIF TG_OP = 'DELETE' THEN
	      EXECUTE 'UPDATE row_counts set reltuples=reltuples -1 where relname = ''' || TG_RELNAME || '''';
	      RETURN OLD;
	   END IF;
	   END;
	$$
	LANGUAGE 'plpgsql';

Still being lazy I wrote a function to add the triggers to all of my tables in the public schema. You may or may not want to do this. If you do, I suggest writing a corresponding drop trigger function (which is left as an exercise for the reader :).

	CREATE OR REPLACE FUNCTION add_count_trigs()
	RETURNS void AS
	$$
	   DECLARE
	      rec   RECORD;
	      q     text;
	   BEGIN
	      FOR rec IN SELECT relname
	               FROM pg_class r JOIN pg_namespace n ON (relnamespace = n.oid)
	               WHERE relkind = 'r' AND n.nspname = 'public' LOOP
	         q := 'CREATE TRIGGER ' || rec.relname || '_count BEFORE INSERT OR DELETE ON ' ;
	         q := q || rec.relname || ' FOR EACH ROW EXECUTE PROCEDURE count_trig()';
	         EXECUTE q;
	      END LOOP;
	   RETURN;
	   END;
	$$
	LANGUAGE 'plpgsql';
Initialize the row_counts table immediately after a vacuum using:
	insert into row_counts select relname, reltuples from pg_class;
This creates a small window of error, however. Any transactions completed between the vacuum and the creation of the row_counts table and triggers would be missed. To be absolutely sure of the counts, you must also stop activity on the server.

Vacuum the tables explicitly if you can. I found a case where the count was off by one on a quiescent database after a normal vacuum. It corrected itself after an explicit "vacuum mytable".

Even though you should run vacuum for this case explicitly on each table, if for any reason you are not sure of when your vacuum has run, you can write a function to do the count(*) initialization instead of using vacuum's counts to start off with. This function is slower and can still be off by some if your database is active during this time.

	CREATE OR REPLACE FUNCTION init_row_counts()
	RETURNS void AS
	$$
	   DECLARE
	      rec   RECORD;
	      crec  RECORD;
	   BEGIN
	      FOR rec IN SELECT relname
	               FROM pg_class r JOIN pg_namespace n ON (relnamespace = n.oid)
	               WHERE relkind = 'r' AND n.nspname = 'public' LOOP
	         FOR crec IN EXECUTE 'SELECT count(*) as rows from '|| rec.relname LOOP
	            -- nothing here, move along
	         END LOOP;
	         INSERT INTO row_counts values (rec.relname, crec.rows) ;
	      END LOOP;
	
	   RETURN;
	   END;
	$$
	LANGUAGE 'plpgsql';
This function selects the table names from the public schema from the pg_class table. For each table it finds, it executes a count(*) and stores the result in the row_counts table. The FOR crec IN EXECUTE...LOOP is required in order to get the result from an EXECUTE statement. This is the common workaround for that lack of functionality.

Putting it all together, this is the order of events:

  • Create row counts table
  • Create trigger function
  • Stop server activity if possible
  • Vacuum tables -- cannot be done in a transaction
  • In a transaction
    • Add triggers
    • Initialize row counts
From then on, you should be able to see your current row counts by selecting the table name at any time from the row_counts table.

Contributors: elein at varlena.com


Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!
Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009

Top
Google
Search General Bits & varlena.com Search WWW