Tips for better analysing the needs of your customer: Has data modelling become old-fashioned?

When object-oriented programming and UML were becoming popular, the trend was to focus on OO principles such as polymorphism and encapsulation, and on design patterns. As a result, data modelling became less popular with developers. Although UML gives us class diagrams, the old-fashioned data modelling principles are no longer fashionable. Too much emphasis was (and still is) placed on the functional side of the analysis (using Activity diagrams and Use Case diagrams) and on OO and CBD design principles and patterns.

But I have seen too many projects where the only data analysis technique applied was: we need a table to store some information, so I need a primary key (preferably a system-generated one) and some columns, and that's all. When I then ask what the logical unique key is, people look at me as if I had come straight from Mars. "That is something we will handle in our code!" Or: "We have a business rule engine for handling that!"

This article promotes some simple principles that may seem old-fashioned but help ensure that your data is stored properly, which lets you write better and simpler code and makes your data easier to retrieve and manipulate. And when you choose a data modelling tool, it will give you insight into the features that make the difference.

Logical and Physical Model

Although UML class diagrams support a great deal of the classical data modelling techniques, they serve a different purpose and lack the flexibility of data modelling tools that work with both a logical and a physical model. This twofold approach gives you greater flexibility and fewer headaches when you want to change relations or logical keys, or when you try to keep the logical model in sync with your physical model. It also lets you focus on logical decisions in the logical model and postpone physical decisions to the physical model.

Suppose you have a classical Ordering database with tables like customer, order, order line, delivery, delivery line, invoice, invoice line, product, product category, service, discount, stock and warehouse. Most of these tables are linked together with foreign keys. During development you discover that you made a crucial error in one of the master tables, product: the product code is only unique in combination with the product category. The impact on all tables linked to product by a foreign key may be huge.

Some less experienced analysts try to avoid this problem by giving every table a single meaningless, system-generated primary key, which is then used as the foreign key in the linked tables. This is a good physical approach, but these analysts skip one important step: performing proper data analysis.

Logical Model

In a logical (data) model, you can change relationships, promote identifiers to primary identifiers and demote primary identifiers to secondary identifiers, without any impact on related entities. That is something you cannot do in a physical model or, even worse, in a physical database containing company data. In a logical model, you focus on logical decisions: you define entities, attributes, relations and candidate identifiers, and decide which identifier is primary and which are secondary. And you normalise your entities up to the Third Normal Form (see Normalisation process).

Take the entity Order Line as an example:

The relation with Order together with the attribute Line Nr is the primary identifier for an Order Line.
Two different Order Lines on the same Order cannot order the same Product, so the relation with Order together with the relation with Product is a unique (secondary) identifier.

You do not bother about system-generated primary keys: they are a physical decision and belong to the physical model.
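To make the idea of promoting and demoting identifiers concrete, here is a minimal sketch in Python of how a logical entity and its candidate identifiers could be represented. The class names `Entity` and `Identifier`, the `promote` method and the `Quantity` attribute are assumptions made for this illustration; they do not come from any particular modelling tool.

```python
from dataclasses import dataclass, field


@dataclass
class Identifier:
    """A candidate identifier made of attribute and/or relation names."""
    parts: tuple[str, ...]
    primary: bool = False


@dataclass
class Entity:
    """A logical entity (hypothetical structure, for illustration only)."""
    name: str
    attributes: list[str]
    relations: list[str]  # names of the parent entities
    identifiers: list[Identifier] = field(default_factory=list)

    def promote(self, parts: tuple[str, ...]) -> None:
        """Make the identifier with these parts primary and demote the others."""
        if not any(ident.parts == parts for ident in self.identifiers):
            raise ValueError(f"unknown identifier: {parts}")
        for ident in self.identifiers:
            ident.primary = ident.parts == parts

    def primary_identifier(self) -> Identifier:
        """Return the identifier currently marked as primary."""
        return next(ident for ident in self.identifiers if ident.primary)


order_line = Entity(
    name="Order Line",
    attributes=["Line Nr", "Quantity"],  # Quantity is an assumed attribute
    relations=["Order", "Product"],
    identifiers=[
        Identifier(("Order", "Line Nr"), primary=True),
        Identifier(("Order", "Product")),
    ],
)

# Swapping the primary identifier is a change to this one entity only;
# no related entity has to be touched at the logical level.
order_line.promote(("Order", "Product"))
print(order_line.primary_identifier().parts)
```

Note that an identifier here may mix attributes and relations freely, which is exactly the flexibility the logical model is supposed to offer.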

Physical Model

Once you are satisfied with the result, you can transform the logical model into a physical model:

entities will become tables
attributes become columns
relations become foreign keys
primary identifiers become primary keys
secondary identifiers become unique keys

For the table Order Line, the transformation results in:

A primary key, consisting of two columns: Order Nr and Line Nr
A unique key, consisting of two columns: Order Nr and Product Code
A foreign key column Order Nr
A foreign key column Product Code
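The four items above can be sketched as a table definition. The example below uses Python's built-in `sqlite3` module with an in-memory database purely for illustration; the table and column names (`customer_order`, `order_nr`, `product_code`, and so on) and the sample values are assumptions, and `customer_order` is used because `order` is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE customer_order (
    order_nr INTEGER PRIMARY KEY
);
CREATE TABLE product (
    product_code TEXT PRIMARY KEY
);
CREATE TABLE order_line (
    order_nr     INTEGER NOT NULL REFERENCES customer_order (order_nr),
    line_nr      INTEGER NOT NULL,
    product_code TEXT    NOT NULL REFERENCES product (product_code),
    PRIMARY KEY (order_nr, line_nr),   -- primary identifier
    UNIQUE (order_nr, product_code)    -- secondary identifier
);
""")

conn.execute("INSERT INTO customer_order VALUES (1)")
conn.execute("INSERT INTO product VALUES ('P-100')")
conn.execute("INSERT INTO product VALUES ('P-200')")
conn.execute("INSERT INTO order_line VALUES (1, 1, 'P-100')")


def try_insert(row):
    """Insert one order line; return None on success or the error text."""
    try:
        conn.execute("INSERT INTO order_line VALUES (?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


same_product_again = try_insert((1, 2, "P-100"))  # violates the unique key
same_line_nr_again = try_insert((1, 1, "P-200"))  # violates the primary key
other_product = try_insert((1, 2, "P-200"))       # satisfies both keys
```

Both identifiers from the logical model are enforced by the database itself, with no application code or rule engine involved.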

This might be a very nice solution from a logical point of view, but for technical reasons it is, as we saw, not a good solution:

What happens if the code changes in the Product table? Do you also have to change it in Order Line?
Since Order Line will be linked to other tables like Discount and Invoice Line, these tables would inherit all the columns of its primary key (Order Nr and Line Nr) as a foreign key. And what if one of those numbers changes?

This is where your data modelling tool can prove its added value. What you want to achieve when transforming your logical model into a physical model is:

Create for all tables a meaningless system-generated primary key
Use this primary key as foreign key column to all related child tables
Translate the primary identifier as unique key
Translate the secondary identifier(s) as unique key(s)

The end result should look similar to this:

Of course you can perform these tasks yourself, but this is very cumbersome and you are bound to make mistakes.
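As a rough illustration of the four transformation rules listed above, the sketch below builds the Order Line example by hand with Python's built-in `sqlite3` module. It is only one possible outcome, not the output of any specific tool; the table names, the `*_id` surrogate columns and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE customer_order (
    order_id INTEGER PRIMARY KEY,         -- meaningless surrogate key
    order_nr INTEGER NOT NULL UNIQUE      -- former primary identifier
);
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,     -- meaningless surrogate key
    product_code TEXT NOT NULL UNIQUE     -- former primary identifier
);
CREATE TABLE order_line (
    order_line_id INTEGER PRIMARY KEY,    -- meaningless surrogate key
    order_id      INTEGER NOT NULL REFERENCES customer_order (order_id),
    line_nr       INTEGER NOT NULL,
    product_id    INTEGER NOT NULL REFERENCES product (product_id),
    UNIQUE (order_id, line_nr),           -- primary identifier as unique key
    UNIQUE (order_id, product_id)         -- secondary identifier as unique key
);
""")

conn.execute("INSERT INTO customer_order (order_id, order_nr) VALUES (1, 1001)")
conn.execute("INSERT INTO product (product_id, product_code) VALUES (1, 'P-100')")
conn.execute(
    "INSERT INTO order_line (order_id, line_nr, product_id) VALUES (1, 1, 1)"
)

# Changing the product code now touches a single row in product only.
conn.execute("UPDATE product SET product_code = 'P-200' WHERE product_id = 1")
code_seen_from_line = conn.execute("""
    SELECT p.product_code
    FROM order_line ol
    JOIN product p ON p.product_id = ol.product_id
""").fetchone()[0]

# The logical rule "one product per order" is still enforced by the database.
try:
    conn.execute(
        "INSERT INTO order_line (order_id, line_nr, product_id) VALUES (1, 2, 1)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The surrogate keys absorb the technical concerns (stable, single-column foreign keys), while the unique keys keep the results of the data analysis intact.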

Tool Features

So when considering a data modelling tool, the following features are indispensable:

Logical Model

Promote unique identifiers to become primary identifiers
Demote primary identifiers to become unique identifiers
Ability to define multiple unique identifiers per entity based on attributes, relations or a combination of both. Even an expensive and renowned tool like PowerDesigner does not allow defining a relationship as part of a unique identifier, only as part of the primary identifier.

Transformation

Generate all relationships as foreign keys in the physical model
Translate Primary Identifiers as Primary keys
Translate Unique Identifiers as Unique Keys
Ability to generate a meaningless primary key for each table, leaving all unique keys coming from the logical model intact.
Apply changes in the Logical Model to the Physical Model to keep both synchronised
Generate DDL statements (CREATE, ALTER, ...) for your preferred database.
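To give a feel for what such a transformation step does, here is a deliberately small sketch in Python. The function `to_ddl`, its parameters and the `<table>_id` naming convention are hypothetical; a real tool would also generate foreign keys, ALTER statements and database-specific types, all of which are left out here. The generated statement is run against an in-memory SQLite database only to check that it is valid.

```python
import sqlite3


def to_ddl(table: str, columns: dict[str, str],
           unique_keys: list[tuple[str, ...]]) -> str:
    """Build a CREATE TABLE statement with a generated surrogate primary key.

    Names are interpolated as-is, so this sketch assumes trusted input.
    """
    lines = [f"    {table}_id INTEGER PRIMARY KEY"]
    lines += [f"    {name} {sql_type} NOT NULL" for name, sql_type in columns.items()]
    # Every identifier from the logical model survives as a unique key.
    lines += [f"    UNIQUE ({', '.join(key)})" for key in unique_keys]
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);"


ddl = to_ddl(
    "order_line",
    {"order_id": "INTEGER", "line_nr": "INTEGER", "product_id": "INTEGER"},
    [("order_id", "line_nr"), ("order_id", "product_id")],
)
print(ddl)

# Sanity check: the statement runs and yields one unique index per identifier.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
unique_indexes = [
    row for row in conn.execute("PRAGMA index_list(order_line)") if row[2] == 1
]
```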

Reference

A nice discussion regarding the same issue can be found on https://forums.oracle.com/forums/thread.jspa?threadID=1665886.

At http://www.databaseanswers.org/modelling_tools.htm you can find an extensive list of data modelling tools. I preferred Oracle Designer, but it is quite expensive!

If you want to know how to read ER diagrams in the Barker notation, and specifically their relationships, please refer to http://www.essentialstrategies.com/publications/modeling/barker.htm.

2 comments:

Anonymous, 15 February 2012 at 12:23
Nice article, though I don't agree with creating surrogate keys all over the place just in case a code changes.
I checked out what you said about PowerDesigner. What you said is correct, but not complete. Via the relationship definition, selecting the 'Dependent' option adds the relationship to the primary identifier in the child entity. If you want the relationship to participate in a non-primary identifier, you have to edit the identifier in the child entity.
Either:
1. Change the existing identifier to 'non-primary'
Or:
2. Create a new non-primary identifier, and add the FK attribute(s) via the 'attributes' tab on the identifier's property sheet.

Tuesday 14 February 2012