varlena
varlena
PostgreSQL Training,
Consulting & Support
General Bits
By A. Elein Mustain

23-May-2005 Issue: 110

Archives | General Tidbits | Google General Bits | Docs | Castellano | PortuguÍs | Subscriptions | Notifications | | Prev

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Questions and Answers Problem
Accessing Inherited Table Data 20-May-2005

A problem I have come across in many different applications is that of questions and answers.

Questions can be grouped in many conventional ways, creating questionnaires tied to askers of questions, answerers of questions and so on. You can have college applications, polls, job applications, sign up forms for all kinds of things. This is a pervasive issue.

The interesting problem is that a group of questions has many answers and those answers have many types. You might have a date type as an answer to "When is your birthday?". You might have a numeric type to answer "How much money do you have?". And then you want to see the answers for each question.

Generally the question and answer problem is solved by storing all answers as text. This usually works fine until you need to do reporting and statistics on the questions and answers.

    How many people who answered this questionnaire were born in 1960?
    What is the average salary of the people who answered?
    What is the average salary of the people who were born in 1960 who answered?

There are two issues to address--the first is to be able to look at any answers to any questions at once, regardless of type. This means that all types must be either text types or folded to text type. The second is to be able to query and aggregate the answers in their native type semantics.

I am proposing one solution to the question and answer problem that addresses both issues. It can present any questions and any answers consistently. And it allows querying and manipulation of the native type of the answer. In addition, it allows answers which are multi-valued, for example, a date range which would be the answer to the question "When is your vacation planned?" The multi-valued capabilities can be expanded further to include complex types which are commonly used, for example an address type. The address fields can be an answer to a question.

The solution to the question and answer problem that I am about to propose is also a "killer app" for table inheritance. In the course of these proposal I will suggest naming conventions for reasons I hope are apparent. You can, of course, choose your own naming conventions to meet your own needs, but do try to understand the reasoning behind the original naming first.

Suppose there is a question table. The questions may be grouped into questionnaires by askers.

Each question has an answer type with a known number of columns. Each question gets answered by answerers. We are tying answers directly to questions and not questionnaires, although that would also be possible if desired.
So far that looks almost ordinary. Now for the fun part. answer_values is a parent table and has no rows. Many children tables are defined under that parent table. Each child table also carries with it the base data type, the number of columns involved in the answer. This is information that is mirrored in the question table in order to tie the answer to a specific expected type. The idea is to ensure that any piece of the application looking for answers via the questions table has enough information to find them.

Each answer_value child table is altered immediately after creation in order to share the sequence which assigns unique ids to the answers. It is also altered so that the primary key (orid, qid, avid) (or the answerer id, the question id and the answer value id) is the same for each child table.

The child tables are named intentionally so that the information we are providing gives the application enough information to find them given the type and the number of columns. The text answer is called answer_text. The date range answer is answer_date_2. The complex type is just named by its type name answer_addr.

Also in this presentation not all obvious child types have been added, for example, an integer range is missing.

Indexes on the child table, in addition to the previously added primary key indexes should probably include an index on the question identifier, qid and the answerer identifier, orid.

Given this set up, our tasks are to

  • Create an any question any answer view
  • Insert, Update, Delete from the children tables directly and Query the children tables in their native types.

Below is a selection from a special view which returns the question id, the question and all of the answers of various types as text. Notice that answer types with one column are returned as text. Answers types with multiple columns are returned as lists for easy parsing of the perl, python, etc.

   db=# select qid, question, av_v.avalue
   db-# from av_v JOIN questions USING (qid);
    qid |                         question                          |                avalue
   -----+-----------------------------------------------------------+--------------------------------------
      1 | What is your name?                                        | Elein
      2 | What is your birthdate?                                   | 31-dec-2059
      3 | When did you start and end?                               | (01-may-2005,06-may-2005)
      5 | When much money did you have at the start and at the end? | (25.00,10.00)
      6 | What is your address?                                     | ("4514 Cherry St.",Oakland,CA,94609)
      (5 rows)
This view av_v is a UNION of all of the children tables. Date and numeric values use to_char() functions to format them. When there are more than one column associated with the answer value then they are composed into the text values of a row. You can see the view here.

Every answer type requires a child table and needs a paragraph added to the av_v view. This assures that all answers are available as text. The addition of a new type requires only the "monkey see, monkey do" technique of development and formatting numbers and dates to your own taste.

Because the answer values are stored in separate children tables, each field in each answer can also be accessed directly as its native type. The names of the fields are set up to be as consistent as possible since the answers are used for various questions. In the single column tables all values are avalue. In two column answers where the type is the same, the values are astart and aend. In complex types, it is up to you to choose appropriate names.

It is a key concept that the client program is able to construct the correct table name of the answer to a question. We may have an over abundance of places where we flag the type and number of columns, but it has not yet been determined conclusively which we could eliminate.

To facilitate queries on answers we have created a view here called qna which joins the answer view to the questions table.

Here are some examples of queries. We will look at some text queries and some queries using the actual value types. Some queries can be written in both ways; some cannot without some further wrangling.

   --
   -- of all addressed entered for any question,
   -- how many are in zip code 94609?
   --
   db=# select question, addr, city, zip
   db-# from questions JOIN answer_addr USING (qid)
   db-# where zip = '94609';
          question        |      addr       |  city   |  zip
   -----------------------+-----------------+---------+-------
    What is your address? | 4514 Cherry St. | Oakland | 94609

   --
   -- What question used a date range?
   --
   =# select qid, question from questions where atype = 'date' and ncols = 2;
    qid |          question
   -----+-----------------------------
      3 | When did you start and end?

   --
   -- What are the date ranges for question 3?
   -- (sorry not a lot of test data here :)
   --
   db=# select qid, question, avalue from qna where qid = 3;
    qid |          question           |          avalue
   -----+-----------------------------+---------------------------
      3 | When did you start and end? | (01-may-2005,06-may-2005)

   db=# select qid, question, astart, aend from questions q JOIN answer_date_2 using (qid);
    qid |          question           |   astart   |    aend
   -----+-----------------------------+------------+------------
      3 | When did you start and end? | 2005-05-01 | 2005-05-06

   --
   -- Who answered a start date after 4/1/05 for question 3
   --
   db=# select orid, orname, qid, question, astart, aend
   db-# from questions q JOIN answer_date_2 using (qid)
   db-#    JOIN answerers using (orid)
   db-# where astart > '4/1/05' and qid = 3;
    orid |    orname    | qid |          question           |   astart   |    aend
   ------+--------------+-----+-----------------------------+------------+------------
       1 | the_answerer |   3 | When did you start and end? | 2005-05-01 | 2005-05-06

   --
   -- What was the average amount of start and end for question 5?
   --
   db=# select avg(astart), avg(aend)
   db-# from questions JOIN answer_numeric_2 using (qid)
   db-# where qid = 5;
            avg         |         avg
   ---------------------+---------------------
    25.0000000000000000 | 10.0000000000000000

This data model is good for applications which need both general access to questions and answers and detailed, aggregatable questions and answers. It is flexible enough to support any answers, simple and complex. And it is solid enough to be an extensible model in a simple and consistent manner. It is tunable to your own needs. The costs involve maintenance when updating the views and children tables when adding new answer types, so choose your answer types in advance well. Also the client program must be flexible enough to understand the underlying name of the child table for insert, update and delete as well as querying. You can and probably will want use partial indexes on the underlying children tables for greater speed for warehousing and reporting.

The code to implement this model and example can be found in General Tidbits under examples or here:

Share and enjoy.

Contributors: elein at varlena.com


Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!
Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009

Top
Google
Search General Bits & varlena.com Search WWW