There were many great responses to our survey in PostgreSQL General Bits issue #56. Many thanks to all who took the time to respond. Your thoughtful and sometimes funny and pointed responses are very insightful.
The first installment of the survey analysis is available in PostgreSQL General Bits issue #58 and it covers the platforms and versions that we use.
This installment will cover the features we use and the features we are looking forward to.
Features We Use
2. To what extent do you or your company use PostgreSQL for its object or object relational features?
The specific feature that we use the most is user defined functions written in plpgsql. plpgsql was mentioned by all who use user defined functions followed closely by plperl and sql. plpython and c made a fair showing, as well. Very surprisingly pltcl did not show up at all until later discussions of the survey.
The spatial/geometric datatypes are used quite commonly, usually with RTREE indexes. LTree, GiST, tsearch2 and functional indexing are also used. Inheritance had a couple of proponents.
People did not use extensibility features for several reasons. plpgsql was deemed "too hard". Certainly this 4GL is not the most elegant of languages and requires six lines for, say, plperl's two; however, it is generally thought to be simple. It seems that we need more examples of user defined functions available.
It was suggested that object-relational concepts might be considered immature. This latter suggestion is unfounded. They may not be wide spread, but they are spreading. Object-relational databases have been around for 15+ years and have heavily influenced the latest SQL standards. This possible misconception shows us that we do not explain very well what these concepts are and how they are used.
The 7.4 features we are using are mostly behind the scenes performance improvements, particularly, the NOT IN query improvements. The visible features mentioned were the completed domain support, autovacuum and IPv6 support. Also mentioned were the new client/server protocol with SSL handling in libpq.
In the list of general features, "all of the above" (see #14 above) was the leader. Pointy-Haired-Bosses and the business side of the projects all preferred the cost. Extensibility in terms of working well with middle ware (ODBC and JDBC) as well as standard client ware was lauded. The extensibility and openness of PostgreSQL enables many ways to connect and use the database in different environments. Usually extensibility is thought of as the user defined functions and user defined data types and inheritance, but this aspect of extensibility as "plays well with others" is also a good way to look at it.
The PostgreSQL community was mentioned as a feature of PostgreSQL. This is certainly a draw to Postgres. Our relatively good natured and focused mailing list support for users and developers are very important and sometimes taken for granted.
Features We Want and Recommend
7. What features would make PostgreSQL more competitive with Oracle or DB2?
Far and away the most requested feature is a good and easy replication for master/slave and intended for hot fail over situations. Multi-slave and clustering variations were are also mentioned. Current replication options do not fit the bill as "easy" or hot fail over.
Point in time recovery (PITR), including an urgent plea, was the next most requested feature, followed closely by the native win32 port which is seen as an important marketing strategy.
Nested transactions and inheritance of foreign keys and indexes were also on the list as well. Update in place (as opposed to upgrading versions using dump and restore) got a worthy mention and further improvement on the dependency issues in pg_dump were also mentioned.
Some features that we do support already were also on the list, for example, prepared plans and the ability to return a record set. Incremental backup was also mentioned. This can be implemented in any given database by the application developer with the tools we do have. (Hint: timestamps) This again leads to the conclusion that there is a problem with our ability to propagate information to the user community.
In addition to these desirable features, there was a set of recommendations with regards to the documentation. People want published tutorials in online journals (e.g. PostgreSQL General Bits :-) and a newbie guide was requested. Two flavors of newbie guides would be helpful on further thought--one for those who are new to databases and one for those transitioning from other databases to PostgreSQL. We do have a tutorial in the documentation (Chapter 1), however, it does not seem to have the visibility or clout that maybe it should. There was praise for the Performance Guides as a good start for improved documentation.
Better GUI administration, visualization and design tools were requested, including the ability to use a GUI to set the optimization configurations. These are particularly useful to new users and I've received a hint that perhaps one is being spec'd out. There was also complaints about the often discussed low default optimization settings where on installation, the database configurations are set notoriously low.
A request for an application builder for PHP was requested. I am currently reviewing CodeCharge Studio myself and would like to hear others' opinions on this and other application development tools like this.
Packaged solutions for small businesses were suggested as a very good way to get into the market place. This will be our goal as users and developers of applications with PostgreSQL.
And finally, the key to attracting new users: simplicity. PostgreSQL is not a simple product. It handles large amounts of complex data efficiently and effectively and with absolution data integrity. However, there is always room for simple interfaces and tools to Postgres for those who want and need them.
Training, Support, Hosting and Certification
4. Should there be a standard worldwide PostgreSQL training course?
There was a general positive response with regards to standardized PostgreSQL training--if it were available. There were no negative responses but there were a handful of "don't know" or "don't care" responses. Training continues to be an issue that is growing more pertinent.
The reaction to certification was generally negative, as would be expected in a crowd of programmers. Several people acknowledged that their companies would be interested and that it would be helpful as a learning tool to show accomplishment. This issue has been discussed on several lists and off-lists all around. There are several companies preparing to propose certification tests for PostgreSQL. This may assuage the business community's need for a feeling of comfort and stability.
Most people supported their own server machines without the help of a PostgreSQL hosting company. Some of these also did not require outside access to their data and so did not feel the need of an outside hosting service. There were several people without the infrastructure of an established company or data center who did either use a hosting company or were considering doing so. The only hosting company mentioned explicitly was PgHoster.
No one who responded actually had purchased support. One company provided support and others use the mailing lists, the Internet and PostgreSQL General Bits. I think that this shows two things: the level of postgresql user who responded to the survey and the excellent level of support provided by our mailing lists.
More results will be available in future issues.
Backups and Fail over
(Questions: 19, 20)
19. What is your backup strategy for production?
Almost everyone uses pg_dump from every four hours up to once a week. Daily dumps were the most common. pg_dump is usually called in a cron job and the results backed up with the system backup to tape or another disk.
For fail over strategies, prayer, hope, crossed fingers and running around in circles tearing hair out stood out for some reason. Using high quality servers in conjunction with hope seemed to be a good plan. In an oblique way this tells us that PostgreSQL is not failing often. If it were, no one would rely on these techniques alone. Prudence insists that we do have a fail over methodology available. Delayed dumps were the most popular showing that it is important to evaluate the acceptable downtime and lost data time when planning fail over methodologies. Hot slave replication and redundancy were also mentioned as fail over techniques.
Overall, this survey conveyed that our user community is fairly knowledgeable in spite of the fact that our features and advantages are not always well known.
I would like to thank each person who responded to the survey as well as the San Francisco PostgreSQL Users Group who allowed me to sound out the results at our recent meeting.
In the future, smaller, interactive polls will be available with PostgreSQL General Bits and I always look forward to feedback and comments from you.
It is very simple to add a column representing an modification timestamp to any row in any table. This question is asked fairly often, particularly from MySQL converts. MySQL supports a feature where the first timestamp typed column in a table is automatically updated to the current timestamp any time a row is inserted or updated. To avoid the automatic update in MySQL you must use a datetime type instead of the timestamp type. (Some believe it is sacrilegious for the server to update data without specific requests, but the feature is useful.)
If you have a modification timestamp column with the DEFAULT set to now(), then each INSERT will record the current timestamp. This only works for INSERTS and not UPDATES since the DEFAULT values are only evaluated for INSERTS.
To update the modification timestamp on update, you will need a very simple function and a very simple trigger. If you had several tables with the same column name for the modification timestamp, you can reuse the function, but you will need to define the trigger for each table explicitly.
The trigger function, simply sets the NEW column to now().
create or replace function ts_row() returns TRIGGER as ' BEGIN NEW.mod_date = now(); RETURN NEW; END; ' language 'plpgsql';
For any table which has a timestamp column named mod_date, this function can be invoked as an UPDATE trigger.
create trigger upd_any_table BEFORE UPDATE ON any_table for each row EXECUTE PROCEDURE ts_row(); create trigger upd_other_table BEFORE UPDATE ON other_table for each row EXECUTE PROCEDURE ts_row();
Further discussion on this topic included a discussion on the implementation of the modification timestamp column as a RULE instead of a TRIGGER and a proposal to add an "ON UPDATE" qualification to the DEFAULT specification. RULES require more mechanics since they require an updatable view and the rule needs to be qualified so that it is not recursive. It was also pointed out that adding an ON UPDATE to the DEFAULT specification would not (necessarily) override explicit updates to the modification timestamp field.
Timestamping of rows in conjunction with the system of no deletions used to be a part of the University Postgres and was called "timetravel". Timetravel was the ability (pre-vacuum) to qualify a data set based on a time frame and enabled you to ask what the data values were for a week ago at 2:00pm. Timetravel was removed from PostgreSQL because the overhead of row size and update costs was considered too high. Timetravel is not too hard to reconstruct, however, The logging of a modification timestamp as described above is the key requirements.
A pg_dumpall which is to be loaded into a new installation often has "little" problems, particularly between versions.
On of the more prominent problems is the location of plpgsql.so. The dump file has an explicit path to the library so if the location of this library has changed--and it has when you have a new version or installation, then you must change the dump file by hand to update the path.
When plpgsql.so is not found, then you should see an error message complaining about a "missing plpgsql handler". The repercussions of this is that your plpgsql procedures will not be loaded into your new database.
A bug in 7.0.3 causes a 7.4.1 dump to complain about the environment variable LOCATION. This was corrected in Tom-Lane-speed with a patch to the 7.4.1 dump program so that you can use the 7.4.1 dump to dump your older database. The patch should be available shortly.
Another issue was that older versions of pg_dump could not understand a CR (carriage return) instead of \r in the file. To work around this problem, you could perl the file or you could dump the data using INSERTS instead of COPY. The latter of course would be slower.
Keep the problems rolling in as bugs and little by little each of these incompatibilities will be fixed. It will also raise consciousness about the requirement for in place upgrades and better integration between versions.
Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!
Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009