Varlena, LLC | PostgreSQL General Bits Newsletter

PostgreSQL Training,
Consulting & Support

By A. Elein Mustain

31-Oct-2005 Issue: 123

Archives |

General Tidbits |

Google General Bits |

Prev |

General Bits is a column loosely based on the PostgreSQL mailing list pgsql-general.
To find out more about the pgsql-general list and PostgreSQL, see www.PostgreSQL.org.

Intuitive Design of a Database
Thoughts on Database Designing	28-Oct-2005

Designing databases gets to be intuitive if you do it long enough. There are guidelines to follow with regards to normalization--primarily don't duplicate data attributes. Generally, the topics to keep in mind are objects, attributes of objects and relationships of objects. Again there are known technique and guidelines to follow for constructing these things using UML and ER diagrams. In this article we will use an intuitive approach for normalized data and to specify objects, attributes and relationships. Resulting objects will be represented as tables, attributes are columns in tables and relationships are foreign keys with or without aggregate tables.

Note that this angle of design is wordy as each issue and question arises and is answered. The written form of a thought process can be tedious, but here is a guideline to my intuitive database design process.

First we will identify the primary objects in our database by asking questions that we will want to know about from the database. Then for each object we will determine what uniquely identifies that object and define a primary key. Then we will look at the relationships of that object to other objects and define those relationships. This is the intuitive approach I will follow.

Of course you can create a database to store data on any kind of objects. In this case we will store information about houses. This may look different to you if you think about houses differently than I do, but go with the flow. Also note that this small example could represent a part of a larger database.

First let us think of questions or queries that we will want to ask about the houses.

What houses are there? Who owns them?
Who lives there Who lives where?
How many rooms does any or each house have?
What kinds of rooms do any or each house have?
How many TVs are in this house or all houses?
What kind of pets live in any or each house?
What furniture and appliances are in the living room?

Then let us think about what describes a house: Address, Owner, Rooms, Residents, Furniture and Appliances, and Pets come to mind from this sample query list.

The first object is the house itself. What is a unique characteristic of a house? An address. But we know from experience that addresses are not good to use as primary keys because in real life they are often variably described, like names. Let us then choose an artificial key for a house, house_id.

Of the list of house attributes, what are objects in their own right? All of them except address are objects that can use further description. That leads us to having additional tables for the other objects.

So far we have tables:

Houses ( house_id SERIAL, address, owner )
Rooms ()
Residents ()
Furniture_Appliances ()
Pets ()

It is good to get the major objects identified early in the design.

A house has an owner. Let us say, in this case houses can only have one owner and that owner is a resident of the house. How do we represent this? The standard way to do that is to put the key of the resident table into the house table as the owner column and define a foreign key relationship on the resident_id (that we have not thought about until this minute).

Hold the thoughts on Houses for a minute and let us think for a moment on Residents. What describes a Resident? Name, age and job are what come to mind. None of these seems to be an object in its own right, so they are attributes of residents. But wait. Age is variable. Let us use birth date instead so that the age can be calculated at any time.

What uniquely identifies residents? The name does. However, names, we know are not good unique identifiers because they vary in real life. So, like Houses, we'll give it an artificial key of resident_id.

Houses (
- house_id SERIAL,
- address,
- owner (FK resident(resident_id))
Residents (
- resident_id SERIAL,
- res_name,
- res_bdate,
- res_job )

Thinking about Residents and Houses, we muse that many residents may reside in a house. The owner is a special case which is noted by being stored in the Houses table. How is the residence of the residents stored in general? Aha! We forgot that the house one lives in in an attribute of a Resident. Let us add house_id to the Residents table and connect it to the Houses table with a foreign key.

Houses (
- house_id SERIAL,
- address,
- owner (FK resident(resident_id))
Residents (
- resident_id SERIAL,
- house_id (FK house(house_id),
- res_name,
- res_bdate,
- res_job )

Moving onto Rooms, what uniquely describes a Room? Well, we know a room is in a house, so house_id will be fitting into it somewhere. But we also know that some houses have two bathrooms and many bedrooms so room kind alone could not do it. Hmmmm. OK, hold that. What describes a room? The house and the kind of room. And the room may have a familiar name like "front bathroom", "center bedroom", "upstairs bath", "office", etc. So, the familiar name seems like a good key for the Rooms table along with house_id. In this we are choosing an text field for our key rather than an integer field. Some people created artificial integer fields for all primary keys. However it is much better to use the key the makes the most sense for the object, except for fields like people names and addresses which do vary. It is better because then the table represents more accurately the object it describes.

So the room table will have a two part primary key, the house_id and the familiar name, room_name.

Houses (
- house_id SERIAL,
- address,
- owner (FK resident(resident_id))
Rooms (
- house_id <(FK houses(house_id)),
- room_name,
- room_kind )
I was thinking about room kind as a standard kind of room, office, bathroom, bedroom, etc. This is different from the familiar name of the room which uniquely identifies it in the house. We want a standard list of room kinds so we can query across houses to see how many bedrooms they have, for example. We want to query on the kind 'bedroom' not the details of the familiar name like 'Chris' room'.
When you have a list of standard values for a field, the field needs to be limited to that set of values. You can do this by setting a constraint on the table or by creating a lookup table. You would choose a constraint if the values were few and changed almost never. For a longer list that you could add to in the future, you would use a lookup table.
We'll create a look up table called room_kinds. The sole purpose of this table will be to validate the room_kind field in the rooms table and the room kinds will be unique in the room_kinds table. There is no need to create an alternative key for Room_kinds since the short value itself is unique. Then we just have to make a foreign key reference from rooms.room_kind to room_kinds.room_kind.
This make make you say, "wait! what about duplicated data?". But only the key is duplicated and that is alright. Room_kinds can at a later time have more attributes, but until then we also get the benefit of not having to JOIN it in on queries.
- Houses (
  - house_id SERIAL,
  - address,
  - owner (FK resident(resident_id))
- Room_Kinds (
  - room_kind)
- Rooms (
  - house_id (FK houses(house_id)),
  - room_name,
  - room_kind (FK room_kinds(room_kind)))
Rooms have appliances and furniture in them. What uniquely identifies appliances or furniture? The house and the room keys should figure into this. But I can't think of anything, can you? OK, let us make the primary key an artificial key. What else describes appliances and furniture? Type, make, model? That seems to be too much information. How about what house it is in, what room it is in and what kind it is? That is not narrow enough. A room may have two sofas. Let us make the artificial id the primary key and have one foreign key on the unique combination of house_id and room_name. We can have a look up table on appliances and furniture like we do with room_kinds and be sure to include "other". Notice that the appliances and furniture relate to the room table, not to houses and room table separately.
- Houses (
  - house_id SERIAL,
  - address,
  - owner (FK resident(resident_id))
- Room_Kinds (
  - room_kind)
- Rooms (
  - house_id (FK houses(house_id)),
  - room_name,
  - room_kind (FK room_kinds(room_kind)))
- AF_Kinds (
  - af_kind )
- App_Furn (
  - fa_id SERIAL,
  - house_id,
  - room_name
  - af_kind (FK af_kinds(af_kind))
  - (FK rooms(house_id, room_name)))
One last table needs to be defined: Pets. We're getting the hang of this. A pet is uniquely defined by its owner and name. There are also kinds of pets. So the primary key is the owner or resident id and the pet name. The kinds of pets can follow the model of room_kinds and af_kinds.
- Houses (
  - house_id SERIAL,
  - address,
  - owner (FK resident(resident_id))
- Residents (
  - resident_id SERIAL,
  - house_id (FK house(house_id),
  - res_name,
  - res_bdate,
  - res_job )
- Pet_Kinds (
  - pet_kind )
- Pets (
  - owner (FK resident(resident_id)),
  - pet_name,
  - pet_kind (FK pet_kinds(pet_kind))
  Now we can write the CREATE TABLE statements. We will choose the obvious type for each column. Most are text types, SERIALS or int8s which are references to SERIALS.
  Immediately, we are forced to notice that there is a circular dependency between residents and houses on the owner field. We could re-design this relationship to avoid this circular reference by pushing the ownership notation as a boolean into the residents table. This would also allow us to have more than one owner per house which seems like a better model. Houses and residents would now look like this:
  - Houses (
    - house_id SERIAL,
    - address)
  - Residents (
    - resident_id SERIAL,
    - house_id (FK house(house_id),
    - res_owner,
    - res_name,
    - res_bdate,
    - res_job )
  That is better. Now back to the CREATE TABLE statements. The entire list looks like this:
  - Houses (
    - house_id SERIAL,
    - address)
  - Residents (
    - resident_id SERIAL,
    - house_id (FK house(house_id),
    - res_owner,
    - res_name,
    - res_bdate,
    - res_job )
  - Room_Kinds (
    - room_kind)
  - Rooms (
    - house_id (FK houses(house_id)),
    - room_name,
    - room_kind (FK room_kinds(room_kind)))
  - AF_Kinds (
    - af_kind )
  - App_Furn (
    - fa_id SERIAL,
    - house_id,
    - room_name
    - af_kind (FK af_kinds(af_kind))
    - Pet_Kinds (
      - pet_kind )
    - Pets (
      - owner (FK resident(resident_id)),
      - pet_name,
      - pet_kind (FK pet_kinds(pet_kind))
      It is highly recommended that you write the script in a file and then execute the script in a transaction. This way, while you find all of your typos it is easier to start over by rolling back the transaction. This is also the time to straighten out inconsistent names.
      The create table script created these tables and relations.
      
      This diagram was created using pgaccess
      Our next step is to populate the look up tables, af_kinds, pet_kinds and room_kinds. Always be sure to include "other" just in case. The retrieval of the data will likely involve answering the questions originally asked. A good test would be to write those queries and test them on realistic data. Then you are ready to write your application to do what computer programs always do: store information, munch it around and retrieve it in a sensible format.
      Contributors: elein at varlena.com

Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in!

Top