Intuitive Design of a Database
|
Thoughts on Database Designing
|
28-Oct-2005
| |
Designing databases gets to be intuitive if you do it
long enough. There are guidelines to follow with regards
to normalization--primarily don't duplicate data attributes.
Generally, the topics to keep in mind are objects, attributes of objects and relationships of
objects. Again there are known technique and guidelines
to follow for constructing these things using UML and ER diagrams.
In this article we will use an intuitive approach for normalized data
and to specify objects, attributes and relationships.
Resulting objects will be represented as tables, attributes are
columns in tables and relationships are foreign keys
with or without aggregate tables.
Note that this angle of design is wordy as each issue and
question arises and is answered. The written form
of a thought process can be tedious, but here is a
guideline to my intuitive database design process.
First we will identify the primary objects in our database by
asking questions that we will want to know about from the
database. Then for each object we will determine what uniquely
identifies that object and define a primary key. Then we will look
at the relationships of that object to other objects and define those relationships.
This is the intuitive approach I will follow.
Of course you can create a database to store data on
any kind of objects. In this case we will store
information about houses. This may look different to
you if you think about houses differently than I do, but
go with the flow. Also note that this small example could
represent a part of a larger database.
First let us think of questions or queries that we will
want to ask about the houses.
- What houses are there? Who owns them?
- Who lives there Who lives where?
- How many rooms does any or each house have?
- What kinds of rooms do any or each house have?
- How many TVs are in this house or all houses?
- What kind of pets live in any or each house?
- What furniture and appliances are in the living room?
Then let us think about what describes a house:
Address, Owner, Rooms, Residents, Furniture and Appliances, and Pets
come to mind from this sample query list.
The first object is the house itself.
What is a unique characteristic of a house? An address.
But we know from experience that addresses are not good to
use as primary keys because in real life they are often
variably described, like names. Let us then choose an
artificial key for a house, house_id.
Of the list of house attributes, what are objects in their own
right? All of them except address are objects that can use further
description. That leads us to having additional tables for the other objects.
So far we have tables:
- Houses ( house_id SERIAL, address, owner )
- Rooms ()
- Residents ()
- Furniture_Appliances ()
- Pets ()
It is good to get the major objects identified early in the design.
A house has an owner. Let us say, in this case houses can only have
one owner and that owner is a resident of the house. How do we represent this?
The standard way to do that is to put the key of the resident table into
the house table as the owner column and define a foreign key relationship
on the resident_id (that we have not thought about until this minute).
Hold the thoughts on Houses for a minute and let us think for a
moment on Residents. What describes a Resident?
Name, age and job are what come to mind. None of these seems to
be an object in its own right, so they are attributes of residents.
But wait. Age is variable. Let us use birth date instead so that the
age can be calculated at any time.
What uniquely identifies residents? The name does. However, names,
we know are not good unique identifiers because they vary in real
life. So, like Houses, we'll give it an artificial key of resident_id.
- Houses (
- house_id SERIAL,
- address,
- owner (FK resident(resident_id))
- Residents (
- resident_id SERIAL,
- res_name,
- res_bdate,
- res_job )
Thinking about Residents and Houses, we muse that many residents may
reside in a house. The owner is a special case which is noted by
being stored in the Houses table. How is the residence of the residents
stored in general? Aha! We forgot that the house one lives in in an
attribute of a Resident. Let us add house_id to the Residents table
and connect it to the Houses table with a foreign key.
- Houses (
- house_id SERIAL,
- address,
- owner (FK resident(resident_id))
- Residents (
- resident_id SERIAL,
- house_id (FK house(house_id),
- res_name,
- res_bdate,
- res_job )
Moving onto Rooms, what uniquely describes a Room? Well, we know a room
is in a house, so house_id will be fitting into it somewhere. But we also
know that some houses have two bathrooms and many bedrooms so room kind
alone could not do it. Hmmmm. OK, hold that. What describes a room?
The house and the kind of room.
And the room may have a familiar name like "front bathroom", "center bedroom",
"upstairs bath", "office", etc. So, the familiar name seems like a good
key for the Rooms table along with house_id. In this we are choosing an
text field for our key rather than an integer field. Some people
created artificial integer fields for all primary keys.
However it is much better to use the key the makes the most sense for the
object, except for fields like people names and addresses which do vary.
It is better because then the table represents more accurately the object it
describes.
So the room table will have a two part primary key, the house_id and the familiar
name, room_name.
- Houses (
- house_id SERIAL,
- address,
- owner (FK resident(resident_id))
- Rooms (
- house_id <(FK houses(house_id)),
- room_name,
- room_kind )
I was thinking about room kind as a standard kind of room, office, bathroom,
bedroom, etc. This is different from the familiar name of the room which
uniquely identifies it in the house. We want a standard list of room kinds
so we can query across houses to see how many bedrooms they have, for example.
We want to query on the kind 'bedroom' not the details of the familiar name
like 'Chris' room'.
When you have a list of standard values for a field, the field needs to
be limited to that set of values. You can do this by setting a constraint on
the table or by creating a lookup table. You would choose a constraint if the
values were few and changed almost never. For a longer list that you could
add to in the future, you would use a lookup table.
We'll create a look up
table called room_kinds. The sole purpose of this table will be to validate
the room_kind field in the rooms table and the room kinds will be unique
in the room_kinds table. There is no need to create an alternative key for
Room_kinds since the short value itself is unique. Then we just have to make a
foreign key reference from rooms.room_kind to room_kinds.room_kind.
This make make you say, "wait! what about duplicated data?".
But only the key is duplicated and that is alright. Room_kinds can at
a later time have more attributes, but until then we also get the
benefit of not having to JOIN it in on queries.
- Houses (
- house_id SERIAL,
- address,
- owner (FK resident(resident_id))
- Room_Kinds (
- Rooms (
- house_id (FK houses(house_id)),
- room_name,
- room_kind (FK room_kinds(room_kind)))
Rooms have appliances and furniture in them. What uniquely identifies appliances or furniture?
The house and the room keys should figure into this. But
I can't think of anything, can you? OK, let us make the primary key an artificial
key. What else describes appliances and furniture? Type, make, model? That seems
to be too much information. How about what house it is in, what room it is in and
what kind it is? That is not narrow enough. A room may have two sofas. Let us
make the artificial id the primary key and have one foreign key on
the unique combination of house_id and room_name.
We can have a look up table on appliances and furniture like we
do with room_kinds and be sure to include "other".
Notice that the appliances and furniture relate to the room table, not
to houses and room table separately.
- Houses (
- house_id SERIAL,
- address,
- owner (FK resident(resident_id))
- Room_Kinds (
- Rooms (
- house_id (FK houses(house_id)),
- room_name,
- room_kind (FK room_kinds(room_kind)))
- AF_Kinds (
- App_Furn (
- fa_id SERIAL,
- house_id,
- room_name
- af_kind (FK af_kinds(af_kind))
- (FK rooms(house_id, room_name)))
One last table needs to be defined: Pets. We're getting the hang of this.
A pet is uniquely defined by its owner and name. There are also kinds of pets.
So the primary key is the owner or resident id and the pet name. The kinds
of pets can follow the model of room_kinds and af_kinds.
- Houses (
- house_id SERIAL,
- address,
- owner (FK resident(resident_id))
- Residents (
- resident_id SERIAL,
- house_id (FK house(house_id),
- res_name,
- res_bdate,
- res_job )
- Pet_Kinds (
- Pets (
- owner (FK resident(resident_id)),
- pet_name,
- pet_kind (FK pet_kinds(pet_kind))
Now we can write the CREATE TABLE statements. We will choose the obvious type for each
column. Most are text types, SERIALS or int8s which are references to SERIALS.
Immediately, we are forced to notice that there is a circular dependency
between residents and houses on the owner field. We could re-design this relationship
to avoid this circular reference by pushing the ownership notation as a boolean into
the residents table. This would also allow us to have more than one owner per
house which seems like a better model. Houses and residents would now look like this:
- Houses (
- house_id SERIAL,
- address)
- Residents (
- resident_id SERIAL,
- house_id (FK house(house_id),
- res_owner,
- res_name,
- res_bdate,
- res_job )
That is better. Now back to the CREATE TABLE statements. The entire list
looks like this:
- Houses (
- house_id SERIAL,
- address)
- Residents (
- resident_id SERIAL,
- house_id (FK house(house_id),
- res_owner,
- res_name,
- res_bdate,
- res_job )
- Room_Kinds (
- Rooms (
- house_id (FK houses(house_id)),
- room_name,
- room_kind (FK room_kinds(room_kind)))
- AF_Kinds (
- App_Furn (
- fa_id SERIAL,
- house_id,
- room_name
- af_kind (FK af_kinds(af_kind))
- Pet_Kinds (
- Pets (
- owner (FK resident(resident_id)),
- pet_name,
- pet_kind (FK pet_kinds(pet_kind))
It is highly recommended that you write the script in a file
and then execute the script in a transaction. This way, while you find all of your
typos it is easier to start over by rolling back the transaction.
This is also the time to straighten out inconsistent names.
The create table script created
these tables and relations.
This diagram was created using pgaccess
Our next step is to populate the look up tables, af_kinds, pet_kinds and room_kinds. Always be
sure to include "other" just in case. The retrieval of the data will likely involve
answering the questions originally asked. A good test would be to write those queries
and test them on realistic data.
Then you are ready to write your application to do what computer programs always do: store
information, munch it around and retrieve it in a sensible format.
Contributors:
elein at varlena.com
|