GUID
values used for primary keys have many adherents because of their guaranteed universal
uniqueness.
You can give a new record a GUID primary key value on your desktop machine and
save the record to the corporate database without concern that another local
user will duplicate your key value. Since you don’t have to mess with temporary
primary key values, you also get to skip the work of updating the foreign keys
of other records that point to those new local records (with their temporary
keys).
Of course, DevForce includes
great technology for managing temporary key values, and that technology takes away
the pain ordinarily associated with permitting disconnected local creation of
new records. But even with that, GUID
keys still have benefits that make them attractive. Suppose, for example, that you have to merge multiple
tables of like items from disparate sources. Unless the tables use GUIDs for
primary keys, it is very unlikely that the primary key values will be unique
across the tables. That means you have to perform an involved process not only
of reassigning primary key values, but also of mapping the foreign keys of
records in related tables to the new values. It can get quite involved. And all
of that work becomes completely unnecessary with GUIDs.
Even within DevForce apps,
GUIDs can make certain strategies a lot easier to implement. Consider
application designs that use multiple PersistenceManagers to maintain separate
editing spaces, or which store cache data in multiple entity sets on the local
machine. The problems of keeping data straight across these separate spaces are
considerably simplified if objects can be guaranteed always to have unique key
values.
On the other hand, GUIDs are
big: 16 bytes compared to 4 bytes for a (SQL Server) int or 8 bytes for a
bigint. Swedish software developer Jimmy Nilsson had listened to lots of
speculation about the performance impact of GUID keys and decided to do a bit
of testing to see the effect for himself.
What he found was a considerable penalty related to the use of GUIDs compared
to integer keys. However, he also discovered that most of the performance
difference between the two seemed to stem, not from the size difference, but
rather from the effect of the GUIDs’ randomness
on the maintenance of clustered indexes.
Consider what happens to
records with identity keys: they sort in the same order in which the key
values are generated (1, 2, 3, etc.). When
a record with such a primary key gets added to a table, the generated key value
is such that the record will always sort to the bottom of the table.
GUID key values, by
contrast, sort randomly. The integer equivalent would be to generate a sequence
of values like 102918, 2, 44763, 5991, 733298, and 843. With such random key
values, you can see that regular major adjustments will be necessary to stick
each new record in the right place in the file to support the clustered index.
Of course, database engines perform wonderful tricks to optimize such position
shuffling, but even with all the tricks, there appears to be a signficant
penalty.
Improving Upon the GUID
Working
from his insight
about the cause of GUIDs’ performance penalty, developer Nilsson designed a
variant of the GUID value that – just like integer identity keys – reflects the
creation order. He did this by replacing
some of the bytes of a GUID value with corresponding bytes that reflect the
time of creation of the key value. He
named his variant a “GUIDCOMB” – short for “GUID combination”. Then he tested the GUIDCOMB keys in various
common operations and found that they yielded data access performance very
nearly as good as uniformly increasing integer keys – and 20 to 30 times better
than regular GUIDs!
As an extra added bonus, the
last record added to a table is also the last record shown in its default sort.
Nilsson
describes both his
tests and his GUIDCOMB key in his article, “The Cost of GUIDs As Primary Keys”,
available at http://www.informit.com/articles/article.aspx?p=25862&rl=1. Highly recommended!
Other
developers, including
IdeaBlade’s Ward Bell, have built upon Nilsson’s work to produce improved
variations of the GUIDCOMB. Below find
Ward’s
version (along with a complementary method that extracts the datetime part back
out of a GUIDCOMB):
C#:
using System;
using System.Collections.Generic;
using System.Text;
namespace ConsoleApplication1 {
/// <summary>
/// Create a Guid.COMB
/// (A COMBination of an ordinary GUID and
the current datetime.)
/// </summary>
/// <remarks>
/// This Guid generator returns sequential
Guids.
/// The why's, how's, and perf tests are in
/// Jimmy Nilsson's article
/// http://www.informit.com/articles/article.asp?p=25862.
/// <para>
/// It appears that NHibernate uses the
implementation
/// described in that article and reproduced
below.
/// We do not know about the perf
characteristics
/// of Guid.Comb vs. Guid in databases other
than
/// MS SQL Server.
/// </para><para>
/// The current datetime should be in
/// SQL Server format. That is days after
/// 1st of Jan 1900 and the no of
milliseconds
/// after midnight, divided by 3.3333.
/// It is the lowest six bytes of the GUID
/// that get exchanged for the current
datetime.
/// </para>
/// </remarks>
public static class GuidComb {
/*********************************
*
/// <summary>Create a new Guid.Comb.</summary>
/// <remarks>
/// Version by
jtmueller, Jul 9, 2002.
/// <![CDATA[
///
http://www.informit.com/discussion/index.asp?postid=a8275a70-0698-46f0-8c8f-bf687464628c&rl=1
/// ]]>
/// The author
wonders:
/// "One issue
is that while the resolution for a DATETIME in SQL Server is 1/300th of a
second,
/// the resolution
for a DateTime in .NET is 1/10000th of a second (100 nanoseconds).
/// This means that
the sequential part of the COMB will change more frequently
/// than in the SQL
Server version, but I'm not sure if this is a good or a bad thing."
/// On the chance
that this is a problem, we've gone with the version below.
/// </remarks>
public static Guid
NewComb() {
byte[] dateBytes
= BitConverter.GetBytes(DateTime.UtcNow.Ticks);
byte[] guidBytes
= Guid.NewGuid().ToByteArray();
// copy the last
six bytes from the date to the last six bytes of the GUID
Array.Copy(dateBytes, dateBytes.Length - 7, guidBytes, guidBytes.Length
- 7, 6);
return new
Guid(guidBytes);
}
*
*********************************/
/// <summary>Create a new Guid.Comb.</summary>
/// <remarks>
/// Version by glapointe, Aug 16, 2002
/// <![CDATA[
///
http://www.informit.com/discussion/index.asp?postid=a8275a70-0698-46f0-8c8f-bf687464628c&rl=1
/// ]]>
/// This one purports to "match exactly
(or pretty darn close) the combs created in SQL Server."
/// However, I modified slightly to use
UtcNow and static BaseDate values.
/// UtcNow gives cross-timezone ordering of
records inserted with Guid.Comb keys.
/// </remarks>
public static Guid NewComb() {
byte[] guidArray = System.Guid.NewGuid().ToByteArray();
DateTime now = DateTime.UtcNow; // was: DateTime now = DateTime.Now;
// Get the days and milliseconds which
will be used to build the byte string
TimeSpan days = new TimeSpan(now.Ticks - msBaseDateTicks);
TimeSpan msecs = new TimeSpan(now.Ticks - (new DateTime(now.Year, now.Month, now.Day).Ticks));
// Convert days and msecs to byte arrays
// SQL Server is accurate to 1/300th of a
millisecond
// .NET DateTime ticks are in milliseconds
// so we divide .NET ticks by 3.333333
byte[] daysArray = BitConverter.GetBytes(days.Days);
byte[] msecsArray = BitConverter.GetBytes((long)(msecs.TotalMilliseconds /
3.333333));
// Reverse the bytes to match SQL Servers
ordering
Array.Reverse(daysArray);
Array.Reverse(msecsArray);
// Copy the bytes into the guid
Array.Copy(daysArray, daysArray.Length - 2, guidArray,
guidArray.Length - 6, 2);
Array.Copy(msecsArray, msecsArray.Length - 4, guidArray,
guidArray.Length - 4, 4);
return new System.Guid(guidArray);
}
/// <summary>Extract datetime part of a Guid.Comb.</summary>
/// <remarks>
/// Please note: Because I modified <see
cref="NewComb"/> to use Utc datetime,
/// this function returns the date in
Universal Time;
/// you may want to convert back to local
time with
/// <code>GetDateFromComb(aGuidComb).ToLocalTime();</code>
/// </remarks>
public static DateTime GetDateFromComb(System.Guid guid) {
byte[] daysArray = new byte[4];
byte[] msecsArray = new byte[4];
byte[] guidArray = guid.ToByteArray();
// Copy the date parts of the guid to the
respective byte arrays.
Array.Copy(guidArray, guidArray.Length - 6, daysArray, 2,
2);
Array.Copy(guidArray, guidArray.Length - 4, msecsArray, 0,
4);
// Reverse the arrays to put them into the
appropriate order
Array.Reverse(daysArray);
Array.Reverse(msecsArray);
// Convert the bytes to ints
int days = BitConverter.ToInt32(daysArray, 0);
int msecs = BitConverter.ToInt32(msecsArray, 0);
DateTime date = msBaseDate.AddDays(days);
date = date.AddMilliseconds(msecs *
3.333333);
return date;
}
private static DateTime msBaseDate = new DateTime(1900, 1, 1);
private static long msBaseDateTicks = msBaseDate.Ticks;
}
}
VB:
Imports Microsoft.VisualBasic
Imports System
Imports System.Collections.Generic
Imports System.Text
''' <summary>
''' Create a Guid.COMB
''' (A COMBination of an ordinary GUID and the current
datetime.)
''' </summary>
''' <remarks>
''' This Guid generator returns sequential Guids.
''' The why's, how's, and perf tests are in
''' Jimmy Nilsson's article
''' http://www.informit.com/articles/article.asp?p=25862.
''' <para>
''' It appears that NHibernate uses the implementation
''' described in that article and reproduced below.
''' We do not know about the perf characteristics
''' of Guid.Comb vs. Guid in databases other than
''' MS SQL Server.
''' </para><para>
''' The current datetime should be in
''' SQL Server format. That is days after
''' 1st of Jan 1900 and the no of milliseconds
''' after midnight, divided by 3.3333.
''' It is the lowest six bytes of the GUID
''' that get exchanged for the current datetime.
''' </para>
''' </remarks>
Public Class GuidComb
'********************************
'*
'''' <summary>Create a new
Guid.Comb.<'summary>
'''' <remarks>
'''' Version by jtmueller, Jul 9, 2002.
'''' <![CDATA[
''''
http:''www.informit.com'discussion'index.asp?postid=a8275a70-0698-46f0-8c8f-bf687464628c&rl=1
'''' ]]>
'''' The author wonders:
'''' "One
issue is that while the resolution for a DATETIME in SQL Server is 1'300th of
a
second,
'''' the resolution for a DateTime in .NET
is 1'10000th of a second (100 nanoseconds).
'''' This means that the sequential part
of the COMB will change more frequently
'''' than in the SQL Server version, but
I'm not sure if this is a good or a bad thing."
'''' On the chance that this is a problem,
we've gone with the version below.
'''' <'remarks>
'Public Shared Function NewComb() As Guid
'
Dim dateBytes As Byte() = BitConverter.GetBytes(DateTime.UtcNow.Ticks)
'
Dim guidBytes As Byte() = Guid.NewGuid().ToByteArray()
'
' copy the last six bytes from the date to the last six bytes of the
GUID
'
Array.Copy(dateBytes, dateBytes.Length - 7, guidBytes, guidBytes.Length
- 7, 6)
'
Return New Guid(guidBytes)
'End Function
'*
'********************************
''' <summary>Create a new Guid.Comb.</summary>
''' <remarks>
''' Version by glapointe, Aug 16, 2002
''' <![CDATA[
'''
http://www.informit.com/discussion/index.asp?postid=a8275a70-0698-46f0-8c8f-bf687464628c&rl=1
''' ]]>
''' This one purports to "match
exactly (or pretty darn close) the combs created in SQL Server."
''' However, I
modified slightly to use UtcNow and static BaseDate values.
''' UtcNow gives cross-timezone ordering
of records inserted with Guid.Comb keys.
''' </remarks>
Private Sub New()
End Sub
Public Shared Function NewComb() As Guid
Dim guidArray As Byte() = System.Guid.NewGuid().ToByteArray()
Dim now As DateTime = DateTime.UtcNow ' was: DateTime now = DateTime.Now;
' Get the days and milliseconds which will
be used to build the byte string
Dim days As TimeSpan = New TimeSpan(now.Ticks - msBaseDateTicks)
Dim msecs As TimeSpan = New TimeSpan(now.Ticks - (New DateTime(now.Year, now.Month, now.Day).Ticks))
' Convert days and msecs to byte arrays
' SQL Server is accurate to 1/300th of a
millisecond
' .NET DateTime ticks are in milliseconds
' so we divide .NET ticks by 3.333333
Dim daysArray As Byte() = BitConverter.GetBytes(days.Days)
Dim msecsArray As Byte() = BitConverter.GetBytes(CLng(Fix(msecs.TotalMilliseconds / 3.333333)))
' Reverse the bytes to match SQL Servers
ordering
Array.Reverse(daysArray)
Array.Reverse(msecsArray)
' Copy the bytes into the guid
Array.Copy(daysArray, daysArray.Length
- 2, guidArray, guidArray.Length - 6, 2)
Array.Copy(msecsArray,
msecsArray.Length - 4, guidArray, guidArray.Length - 4, 4)
Return New System.Guid(guidArray)
End Function
''' <summary>Extract datetime part of a Guid.Comb.</summary>
''' <remarks>
''' Please note: because I modified <see
cref="NewComb"/> to use Utc datetime,
''' this function returns the date in
Universal Time;
''' you may want to convert back to local
time with
''' <code>GetDateFromComb(aGuidComb).ToLocalTime();</code>
''' </remarks>
Public Shared Function GetDateFromComb(ByVal guid As System.Guid) As DateTime
Dim daysArray As Byte() = New Byte(3) {}
Dim msecsArray As Byte() = New Byte(3) {}
Dim guidArray As Byte() = guid.ToByteArray()
' Copy the date parts of the guid to the
respective byte arrays.
Array.Copy(guidArray, guidArray.Length
- 6, daysArray, 2, 2)
Array.Copy(guidArray, guidArray.Length
- 4, msecsArray, 0, 4)
' Reverse the arrays to put them into the
appropriate order
Array.Reverse(daysArray)
Array.Reverse(msecsArray)
' Convert the bytes to ints
Dim days As Integer = BitConverter.ToInt32(daysArray, 0)
Dim msecs As Integer = BitConverter.ToInt32(msecsArray, 0)
Dim [date] As DateTime = msBaseDate.AddDays(days)
[date] = [date].AddMilliseconds(msecs *
3.333333)
Return [date]
End Function
Private Shared msBaseDate As DateTime = New DateTime(1900, 1, 1)
Private Shared msBaseDateTicks As Long = msBaseDate.Ticks
End Class
Upon examing the above code,
you may find yourself wondering why the date bytes are substituted at the end of the GUID value rather than at the
beginning. That’s because SQL Server compares uniqueidentifier
values by looking at byte “groups” right-to-left, and then looking
left-to-right within a byte “group”. The
byte groups are those little clusters you see delimited by hyphens (“-“) when
you examine the data in a uniqueidentifier column. For more information on the SQL Server
uniqueidentifier sort process, see the following short article:
http://blogs.msdn.com/sqlprogrammability/archive/2006/11/06/how-are-guids-compared-in-sql-server-2005.aspx
Exercising the NewComb() Method
Finally, here’s a little
program that uses the NewComb() and GetDateFromComb() methods to generate some
GUIDs and then extract the datetime info from them:
C#:
using System;
using System.Collections.Generic;
using System.Text;
namespace ConsoleApplication1 {
class Program {
static void Main(string[] args) {
GenerateGuidCombs();
}
private static void GenerateGuidCombs() {
for (int i = 0; i < 10; i++) {
Guid aGuid = GuidComb.NewComb();
Console.WriteLine(aGuid.ToString() + " " +
GuidComb.GetDateFromComb(aGuid).ToString(
"g yyyy/MM/dd hh:mm:ss:ffffff tt"));
System.Threading.Thread.Sleep(250);
}
Console.ReadLine();
}
}
}
VB:
Module Module1
Sub Main()
GenerateGuidCombs()
End Sub
Private Sub GenerateGuidCombs()
For i As Integer = 0 To 9
Dim aGuid As Guid = GuidComb.NewComb()
Console.WriteLine(aGuid.ToString() & " " & _
GuidComb.GetDateFromComb(aGuid).ToString( _
"g
yyyy/MM/dd hh:mm:ss:ffffff tt"))
System.Threading.Thread.Sleep(250)
Next i
Console.ReadLine()
End Sub
End Module
The program produces output
similar to the following. Note that the program introduces a quarter-second
delay between generation of each succeeding value. You can see that 250-millisecond
delay reflected in the significant digits of the seconds fraction displayed at
the very end of the time value, just before the “AM”.

If you remove the time delay
in the for loop you’ll get a bunch
of identical GUIDCOMB values generated, because the program can generate many
of them
under the time resolution of the system clock.
Of course, you’re not likely to get records added to your database table
as rapidly as you can generate GUIDCOMBs in a tight loop. But even if you did, it would be just fine
from a sorting perspective: the last record added could still be placed at the
bottom of the table, so there would be no shuffling of records!

Using a GUIDCOMB in an Entity’s Create Method
You use a GUIDCOMB in a Create()
method just as you would use a GUID: you assign it directly to the new object’s
primary key column. Note that primary keys are marked ReadOnly by the DevForce
Object Mapper, so you cannot go through the primary key’s setter. Instead you
use the column indexer:
C#:
public static Tomato Create(PersistenceManager pManager) {
Tomato aTomato =
pManager.CreateEntity<Tomato>();
// Assign GUIDCOMB Id value directly...
aTomato["Id"] = GuidComb.NewComb();
aTomato.AddToManager();
return aTomato;
}
VB:
Public Shared Function Create(ByVal
pManager As PersistenceManager) As Tomato
Dim aTomato As Tomato = pManager.CreateEntity(Of Tomato)()
' Assign GUIDCOMB Id
value directly...
aTomato("Id") = GuidComb.NewComb()
aTomato.AddToManager()
Return aTomato
End Function
Conclusions
To
the widely acknowledged
benefits of GUID keys, GUIDCOMBs add – at least when using SQL Server – a
dramatic performance improvement, and a handy ordering benefit. It seems likely
that a similar performance benefit will be experienced on other database platforms
as well, but as Nilsson says, you should always test in your own environment
to
get results you can bank on.
The only apparent downside
of GUIDCOMBS is an increased chance (relative to straight GUIDs) of producing a
duplicate; however, the probability of that remains exceedingly small. (See Nilsson’s comments on this in his
article.) Looks like a winner!
In
a clustered index, the table rows are physically sorted in the order of the
values in the indexed column or columns. As such, a given table can support
only one clustered index. In SQL Server, by default, primary key indexes are
clustered.
|