Most
seasoned developers and database
administrators know that primary
key values should be intrinsically
meaningless. They exist for internal
use only as permanent record identifiers.
They really shouldn't be shown
to users.
By such reasoning,
the primary key of an Order record
should be completely arbitrary.
It shouldn't matter if it is "1234567" or "96b28a7b-a328-4bf3-95a0-06a29e45f582" (which
many of you recognize as the
string representation of a "Globally
Unique Identifier" or "Guid").
Our customers
- the end users of our applications
- are accustomed to referring
to Orders by a unique reference
number. Over the years, we've
trained them to like calling
an order by its number, "1234567".
Thus acclimated, they start to
expect that the next invoice
created will be numbered "1234568".
They start to worry if there
are gaps in the sequence - if
the order jumps to "1234570",
for example. "What happened to
'68' and '69' ?", they cry. They'll
have no patience at all for "96b28a7b-a328-4bf3-95a0-06a29e45f582" or
its successor, "81063553-6686-41f4-8b50-130a4deb444d".
Key
Misery
The customer
demand for an uninterrupted progression
of reference numbers is at cross
purposes with our need as developers
for programming flexibility and
primary key stability.
We want to
set a record's primary key upon
insertion and never change it
again. If the user creates a
temporary record and then deletes
it, we don't want to care that
we just burned the next id.
It's a dead
certainty that the customers'
affection for meaningful primary
keys is going to plague us. Next
thing you know, they're going
to ask us to fill in the sequence
gaps. Then they will want to
prepend the reference number
with the year of the order ("06-1234567")
and suffix it with the customer
initials ("06-1234567-IB"). If
the customer changes its name,
they will want us to update the
initials ("06-1234567-JC").
Did I mention
that the customer wants to sort
the orders by reference number?
Imagine the fun we'll have sorting
a string consisting of a mix
of alpha decorations and an integer
counter; one of my clients requires
that orders sort by integer value,
regardless of prefix, yielding
sequences such as "A23", "A123", "B124", "A1123".
No "leading zero" trick allowed!
"Surrogate" vs. "Natural" Keys
The customer
is always right.
Sure, the
customer vaguely understands
that our pain will become his
pain if his requirements become
too demanding. He sort of understands
that messing with the primary
key could lead to bugs and data
integrity problems. But, at the
end of the day, that is our problem,
not his.
Our best hope
lies in damage control. Damage
control begins with a distinction
between the "surrogate key" and
the "natural key".
The "surrogate
key" is the permanent, arbitrary
primary key that keeps our code
and data consistent and simple.
The "natural key" is the protean
beast our customer loves. We
can finesse the key misery problem
if we can store both values in
our database tables.
Woe to the
developer whose database schema
is fixed. I offer no words of
comfort, no path to salvation.
If you are
fortunate and can change the
database even a little, this
tip is for you.
Separating
Surrogate and Natural Keys
I'm going
to assume that you have an existing
database and that you want to
make as few changes as necessary.
In most applications,
there are only a small number
of tables with natural keys;
the customer doesn't care about
the ids of code tables (e.g.,
OrderStatus) or child tables
(OrderItemDetail). The tables
with natural keys tend to have
comparatively few records; there
are many more OrderItemDetail
records than parent Order records.
So our approach
will waste some table space in
order to achieve simplicity and
flexibility.
Create a script
to do the following to each table
with a natural key:
- Commit to yourself and the
world that the current primary
key is permanent and inviolate.
- Add a Natural Key column,
preferrably a string.
- Add a Natural Key Sort column
(typically an integer type).
- Copy the primary key values
into the two new columns (converting
the primary key to string for
the Natural Key column).
- Constrain the new columns
to be non-null.
- Constrain the new columns
to be unique.
- Add an insert trigger to set
the values of the new columns
(more about this shortly).
You play your
scripts and update your development
database. You confirm that you
get exactly what you expected.
You check the scripts into source
control.
The
Insert Trigger
I'll get to
the definition of the insert
trigger in a moment. Let's establish
the game plan first.
While a natural
key may change, it isn't going
to change often. Our objective
is to establish a routine for
creating the natural key when
we insert a new record into the
table.
The new value
should be unique and should avoid
sequence gaps. The easiest way
to satisfy these constraints
is to let the data tier calculate
the natural key at the moment
of insertion. That's the job
of the trigger.
DevForce will
play along nicely as we'll soon
see.
Rebuild
the Business Object Model
You have to
rebuild the business model to
pick up both new natural key
column properties. That's easy.
- Launch the DevForce Object
Mapper.
- Confirm that you want it
to update the model based on
the schema changes.
- Include the two new columns.
- Save and close the mapper.
- Rebuild the Model project.
Update
the UI to use the Natural Key
Property
You are probably
displaying the primary key right
now. You want to switch to the
natural key.
Fortunately,
your application binds your UI
controls to business object properties
and these properties are easy
to find. You should be able to
search and replace references
to the primary key property with
the natural key property, as
in replacing "OrderId" with "OrderNumber".
Obviously
you'll be careful about preserving
the few places where you really
want the primary key.
Saving
a New Business Object
This part
is automatic. You won't have
to change your code at all. Here's
the save sequence.
- The client application indicates
its intention to save the new
object (e.g., user presses
the "Save" button).
- DevForce id generation sets
the permanent primary key of
the new object (the key was
temporary until this point).
- DevForce id fixup corrects
all other entity references
to the new object so that their
foreign key values now reflect
the new, permanent id.
- The natural key properties
in the new object may be null
or may have temporary values; it
doesn't matter.
- DevForce requests that the
database insert the new object.
- The database insert trigger
fires, calculates the natural
key and natural key sort values,
and replaces those values in
the newly inserted record.
- DevForce re-reads the inserted
record.
- DevForce converts the updated record
back into a business object.
- DevForce returns the business object
to the client.
- The client UI refreshes,
revealing the natural key.
Step #7 is
what makes this process so easy.
DevForce always re-reads the
entity after saving - both on
inserts and updates - in case
a database trigger updates the
record. That's what happens here.
Inside
the Insert Trigger
Your trigger
will be simple if you're lucky.
Suppose the
customer requires that the order
reference number be an integer
that increments without gaps.
The ideal solution is to define
the Natural Key Sort column as
an autoincrement column (assuming
MS SQL Server).
Note:
Add a step to
your SQL script: you will
initiallize the Natural
Key Sort Column the seed
to the next available integer.
|
The insert
operation automatically sets
the next Natural Key Sort Column
value via the autoincrement;
the insert trigger merely converts
this to a string and inserts
it into the Natural Key Column.
A slight
variation on this scheme will
satisfy the scenarios we considered
earlier in which extra information
is prepended (the year) and appended
(the customer initials) to an
integer value that otherwise
increments by one with each insert.
Your task is to write a function
that reads the Natural Key Sort
Column value and returns the
calculated string; the trigger
stores this string into the Natural
Key Column.
More exotic
scenarios require more extreme
calculations. You may have to
abandon the autoincrementing
Natural Key Sort Column in favor
of a next-integer-key lookup
table. This is why they pay you
the big money.
The essential
points are (a) that you have
separated natural key generation
from surrogate key generation
and (b) you've consolidated the
logic into a function and trigger
on the data tier.
Why
both a Natural Key and Natural
Key Sort Column?
The Natural
Key is superflous in the simple
case wherein the reference number
is an integer. You don't really
need both columns; you can convert
the Natural Key Sort column into
a string in the client UI. You
can add a custom property ("OrderNumber")
to the business object for this
purpose.
The wheels
will come off when your customer
demands complex decoration of
the reference number.
You may think
you are still in luck if you
can calculate the reference number
on the fly (as you could in the
scenarios we've described). Your
Natural Key property is a tad
more complicated but not by much.
This approach
can get you in trouble. The reference
number will change if the user
modifies the order date or the
customer name; "06-1234567-IB" becomes "07-1234567-JC" .
That may be ok but I'll bet your
customer is not going to be happy
when reconciling printed invoices
and payments to the altered order
reference number.
Note:
At least your
program is safe because
the data for orders, invoices,
and payments are linked
internally via the permanent
surrogate keys of these
records!
|
I suggest
that you inscribe the calculated
Natural Key value into the record
rather than calculate it in the
UI. You can run a database scrub
process to update the Natural
Key column when (and if) your
customer decides to "corrrect" the
Natural Key to reflect year and
customer name changes.
Keep
the Natural Key Sort Column
You may be
tempted to keep the Natural Key
column and do away with the Natural
Key Sort column. After all, once
you've generated the Natural
Key, it can be sorted easily
right?
My bet is
your luck will not hold and the
customer is going to require
complex sorting rules that are
tough to implement. A designated
sort column makes life easier
for your application and for
other applications (including
server side reports) that should all
apply the same sorting rules
consistently. It doesn't cost
much; leave it in the data.
Conclusion
It's never
easy to keep your customer happy
and keep your sanity. Separating
surrogate keys from natural keys
can help. Use a trigger, rely
on Object Mapping, and let the
DevForce persistence implementation
carry the load.
Happy Coding!
|