DevForce® Classic Tech Tips
Improve Your Guid Key Performance with GuidCombs 



Improve Your Guid Key Performance with GuidCombs Level 200
DevForce Express
Aug 23, 2007

GUID [1] values used for primary keys have many adherents because of their guaranteed universal uniqueness [2] . You can give a new record a GUID primary key value on your desktop machine and save the record to the corporate database without concern that another local user will duplicate your key value. Since you don’t have to mess with temporary primary key values, you also get to skip the work of updating the foreign keys of other records that point to those new local records (with their temporary keys).

Of course, DevForce includes great technology for managing temporary key values, and that technology takes away the pain ordinarily associated with permitting disconnected local creation of new records.  But even with that, GUID keys still have benefits that make them attractive.  Suppose, for example, that you have to merge multiple tables of like items from disparate sources. Unless the tables use GUIDs for primary keys, it is very unlikely that the primary key values will be unique across the tables. That means you have to perform an involved process not only of reassigning primary key values, but also of mapping the foreign keys of records in related tables to the new values. It can get quite involved. And all of that work becomes completely unnecessary with GUIDs.

Even within DevForce apps, GUIDs can make certain strategies a lot easier to implement. Consider application designs that use multiple PersistenceManagers to maintain separate editing spaces, or which store cache data in multiple entity sets on the local machine. The problems of keeping data straight across these separate spaces are considerably simplified if objects can be guaranteed always to have unique key values.

On the other hand, GUIDs are big: 16 bytes compared to 4 bytes for a (SQL Server) int or 8 bytes for a bigint. Swedish software developer Jimmy Nilsson had listened to lots of speculation about the performance impact of GUID keys and decided to do a bit of testing to see the effect for himself.  What he found was a considerable penalty related to the use of GUIDs compared to integer keys. However, he also discovered that most of the performance difference between the two seemed to stem, not from the size difference, but rather from the effect of the GUIDs’ randomness on the maintenance of clustered indexes. [3]

Consider what happens to records with identity keys:  they sort in the same order in which the key values are generated (1, 2, 3, etc.).  When a record with such a primary key gets added to a table, the generated key value is such that the record will always sort to the bottom of the table.

GUID key values, by contrast, sort randomly. The integer equivalent would be to generate a sequence of values like 102918, 2, 44763, 5991, 733298, and 843. With such random key values, you can see that regular major adjustments will be necessary to stick each new record in the right place in the file to support the clustered index. Of course, database engines perform wonderful tricks to optimize such position shuffling, but even with all the tricks, there appears to be a signficant penalty.

Improving Upon the GUID

Working from his insight about the cause of GUIDs’ performance penalty, developer Nilsson designed a variant of the GUID value that – just like integer identity keys – reflects the creation order.  He did this by replacing some of the bytes of a GUID value with corresponding bytes that reflect the time of creation of the key value.  He named his variant a “GUIDCOMB” – short for “GUID combination”.  Then he tested the GUIDCOMB keys in various common operations and found that they yielded data access performance very nearly as good as uniformly increasing integer keys – and 20 to 30 times better than regular GUIDs!

As an extra added bonus, the last record added to a table is also the last record shown in its default sort. [4]

Nilsson describes both his tests and his GUIDCOMB key in his article, “The Cost of GUIDs As Primary Keys”, available at http://www.informit.com/articles/article.aspx?p=25862&rl=1.  Highly recommended!

Other developers, including IdeaBlade’s Ward Bell, have built upon Nilsson’s work to produce improved variations of the GUIDCOMB.  Below find Ward’s version (along with a complementary method that extracts the datetime part back out of a GUIDCOMB):

 

C#:

using System;
using System.Collections.Generic;
using System.Text; 

namespace ConsoleApplication1 {
  /// <summary>
  /// Create a Guid.COMB
  /// (A COMBination of an ordinary GUID and the current datetime.)
  /// </summary>
  /// <remarks>
  /// This Guid generator returns sequential Guids.
  /// The why's, how's, and perf tests are in
  /// Jimmy Nilsson's article
  /// http://www.informit.com/articles/article.asp?p=25862.
  /// <para>
  /// It appears that NHibernate uses the implementation
  /// described in that article and reproduced below.
  /// We do not know about the perf characteristics
  /// of Guid.Comb vs. Guid in databases other than
  /// MS SQL Server.
  /// </para><para>
  /// The current datetime should be in
  /// SQL Server format. That is days after
  /// 1st of Jan 1900 and the no of milliseconds
  /// after midnight, divided by 3.3333.

  /// It is the lowest six bytes of the GUID
  /// that get exchanged for the current datetime.
  /// </para>
  /// </remarks>

  public static class GuidComb { 

    /*********************************
     *
    /// <summary>Create a new Guid.Comb.</summary>
    /// <remarks>
    /// Version by jtmueller, Jul 9, 2002.
    /// <![CDATA[
    /// http://www.informit.com/discussion/index.asp?postid=a8275a70-0698-46f0-8c8f-bf687464628c&rl=1
    /// ]]>
    /// The author wonders:
    /// "One issue is that while the resolution for a DATETIME in SQL Server is 1/300th of a second,
    /// the resolution for a DateTime in .NET is 1/10000th of a second (100 nanoseconds).
    /// This means that the sequential part of the COMB will change more frequently
    /// than in the SQL Server version, but I'm not sure if this is a good or a bad thing."
    /// On the chance that this is a problem, we've gone with the version below.
    /// </remarks>
    public static Guid NewComb() {
      byte[] dateBytes = BitConverter.GetBytes(DateTime.UtcNow.Ticks);
      byte[] guidBytes = Guid.NewGuid().ToByteArray();
      // copy the last six bytes from the date to the last six bytes of the GUID
      Array.Copy(dateBytes, dateBytes.Length - 7, guidBytes, guidBytes.Length - 7, 6);
      return new Guid(guidBytes);
    }
    *
    *********************************/
    /// <summary>Create a new Guid.Comb.</summary>
    ///
<remarks>
    /// Version by glapointe, Aug 16, 2002
    /// <![CDATA[
    /// http://www.informit.com/discussion/index.asp?postid=a8275a70-0698-46f0-8c8f-bf687464628c&rl=1
    /// ]]>
    ///
This one purports to "match exactly (or pretty darn close) the combs created in SQL Server."
    /// However, I modified slightly to use UtcNow and static BaseDate values.
    /// UtcNow gives cross-timezone ordering of records inserted with Guid.Comb keys.
    /// </remarks>
    public static Guid NewComb() {
      byte[] guidArray = System.Guid.NewGuid().ToByteArray(); 

      DateTime now = DateTime.UtcNow; // was: DateTime now = DateTime.Now; 

      // Get the days and milliseconds which will be used to build the byte string
      TimeSpan days = new TimeSpan(now.Ticks - msBaseDateTicks);
      TimeSpan msecs = new TimeSpan(now.Ticks - (new DateTime(now.Year, now.Month, now.Day).Ticks));  

      // Convert days and msecs to byte arrays
      // SQL Server is accurate to 1/300th of a millisecond
      // .NET DateTime ticks are in milliseconds
      // so we divide .NET ticks by 3.333333
      byte[] daysArray = BitConverter.GetBytes(days.Days);
     
byte[] msecsArray = BitConverter.GetBytes((long)(msecs.TotalMilliseconds / 3.333333)); 

      // Reverse the bytes to match SQL Servers ordering
      Array.Reverse(daysArray);
      Array.Reverse(msecsArray); 

      // Copy the bytes into the guid
      Array.Copy(daysArray, daysArray.Length - 2, guidArray, guidArray.Length - 6, 2);
      Array.Copy(msecsArray, msecsArray.Length - 4, guidArray, guidArray.Length - 4, 4); 

      return new System.Guid(guidArray);
    } 

    /// <summary>Extract datetime part of a Guid.Comb.</summary>
    ///
<remarks>
    /// Please note: Because I modified <see cref="NewComb"/> to use Utc datetime,
    /// this function returns the date in Universal Time;
    /// you may want to convert back to local time with
    /// <code>GetDateFromComb(aGuidComb).ToLocalTime();</code>
    ///
</remarks>
    public static DateTime GetDateFromComb(System.Guid guid) {
      byte[] daysArray = new byte[4];
      byte[] msecsArray = new byte[4];
      byte[] guidArray = guid.ToByteArray(); 

      // Copy the date parts of the guid to the respective byte arrays.
      Array.Copy(guidArray, guidArray.Length - 6, daysArray, 2, 2);
      Array.Copy(guidArray, guidArray.Length - 4, msecsArray, 0, 4); 

      // Reverse the arrays to put them into the appropriate order
      Array.Reverse(daysArray);
      Array.Reverse(msecsArray); 

      // Convert the bytes to ints
      int days = BitConverter.ToInt32(daysArray, 0);
      int msecs = BitConverter.ToInt32(msecsArray, 0); 

      DateTime date = msBaseDate.AddDays(days);
      date = date.AddMilliseconds(msecs * 3.333333); 

      return date;
    } 

    private static DateTime msBaseDate = new DateTime(1900, 1, 1);
    private static long msBaseDateTicks = msBaseDate.Ticks;
  }
}

 

VB:

Imports Microsoft.VisualBasic
Imports System
Imports System.Collections.Generic
Imports System.Text 

''' <summary>
''' Create a Guid.COMB
''' (A COMBination of an ordinary GUID and the current datetime.)
'''
</summary>
''' <remarks>
''' This Guid generator returns sequential Guids.
''' The why's, how's, and perf tests are in
''' Jimmy Nilsson's article
''' http://www.informit.com/articles/article.asp?p=25862.
'''
<para>
''' It appears that NHibernate uses the implementation
''' described in that article and reproduced below.
''' We do not know about the perf characteristics
''' of Guid.Comb vs. Guid in databases other than
''' MS SQL Server.
'''
</para><para>
''' The current datetime should be in
''' SQL Server format. That is days after
''' 1st of Jan 1900 and the no of milliseconds
''' after midnight, divided by 3.3333.
''' It is the lowest six bytes of the GUID
''' that get exchanged for the current datetime.
'''
</para>
''' </remarks>
Public Class GuidComb 

    '********************************
    '*
    '''' <summary>Create a new Guid.Comb.<'summary>
    '''' <remarks>
    '''' Version by jtmueller, Jul 9, 2002.
    '''' <![CDATA[
    '''' http:''www.informit.com'discussion'index.asp?postid=a8275a70-0698-46f0-8c8f-bf687464628c&rl=1
    '''' ]]>
    '''' The author wonders:
    '''' "One issue is that while the resolution for a DATETIME in SQL Server is 1'300th of a second,
    '''' the resolution for a DateTime in .NET is 1'10000th of a second (100 nanoseconds).
    '''' This means that the sequential part of the COMB will change more frequently
    '''' than in the SQL Server version, but I'm not sure if this is a good or a bad thing."
    '''' On the chance that this is a problem, we've gone with the version below.
    '''' <'remarks>
    'Public Shared Function NewComb() As Guid
    '    Dim dateBytes As Byte() = BitConverter.GetBytes(DateTime.UtcNow.Ticks)
    '    Dim guidBytes As Byte() = Guid.NewGuid().ToByteArray()
    '    ' copy the last six bytes from the date to the last six bytes of the GUID
    '    Array.Copy(dateBytes, dateBytes.Length - 7, guidBytes, guidBytes.Length - 7, 6)
    '    Return New Guid(guidBytes)
    'End Function
    '*
    '******************************** 

    ''' <summary>Create a new Guid.Comb.</summary>
   
''' <remarks>
    ''' Version by glapointe, Aug 16, 2002
    ''' <![CDATA[
    ''' http://www.informit.com/discussion/index.asp?postid=a8275a70-0698-46f0-8c8f-bf687464628c&rl=1
    ''' ]]>
    ''' This one purports to "match exactly (or pretty darn close) the combs created in SQL Server."
    ''' However, I modified slightly to use UtcNow and static BaseDate values.
    ''' UtcNow gives cross-timezone ordering of records inserted with Guid.Comb keys.
    ''' </remarks>
    Private Sub New()
    End Sub  

    Public Shared Function NewComb() As Guid
        Dim guidArray As Byte() = System.Guid.NewGuid().ToByteArray() 

        Dim now As DateTime = DateTime.UtcNow ' was: DateTime now = DateTime.Now; 

        ' Get the days and milliseconds which will be used to build the byte string
        Dim days As TimeSpan = New TimeSpan(now.Ticks - msBaseDateTicks)
        Dim msecs As TimeSpan = New TimeSpan(now.Ticks - (New DateTime(now.Year, now.Month, now.Day).Ticks)) 

        ' Convert days and msecs to byte arrays
        ' SQL Server is accurate to 1/300th of a millisecond
        ' .NET DateTime ticks are in milliseconds
        ' so we divide .NET ticks by 3.333333
        Dim daysArray As Byte() = BitConverter.GetBytes(days.Days)
        Dim msecsArray As Byte() = BitConverter.GetBytes(CLng(Fix(msecs.TotalMilliseconds / 3.333333)))

        ' Reverse the bytes to match SQL Servers ordering
        Array.Reverse(daysArray)
        Array.Reverse(msecsArray) 

        ' Copy the bytes into the guid
        Array.Copy(daysArray, daysArray.Length - 2, guidArray, guidArray.Length - 6, 2)
        Array.Copy(msecsArray, msecsArray.Length - 4, guidArray, guidArray.Length - 4, 4) 

        Return New System.Guid(guidArray)
    End Function 

    ''' <summary>Extract datetime part of a Guid.Comb.</summary>
    
''' <remarks>
   
''' Please note: because I modified <see cref="NewComb"/> to use Utc datetime,
    ''' this function returns the date in Universal Time;
    ''' you may want to convert back to local time with
    ''' <code>GetDateFromComb(aGuidComb).ToLocalTime();</code>
   
''' </remarks>
    Public Shared Function GetDateFromComb(ByVal guid As System.Guid) As DateTime
        Dim daysArray As Byte() = New Byte(3) {}
        Dim msecsArray As Byte() = New Byte(3) {}
        Dim guidArray As Byte() = guid.ToByteArray() 

        ' Copy the date parts of the guid to the respective byte arrays.
        Array.Copy(guidArray, guidArray.Length - 6, daysArray, 2, 2)
        Array.Copy(guidArray, guidArray.Length - 4, msecsArray, 0, 4) 

        ' Reverse the arrays to put them into the appropriate order
        Array.Reverse(daysArray)
        Array.Reverse(msecsArray) 

        ' Convert the bytes to ints
        Dim days As Integer = BitConverter.ToInt32(daysArray, 0)
        Dim msecs As Integer = BitConverter.ToInt32(msecsArray, 0) 

        Dim [date] As DateTime = msBaseDate.AddDays(days)
        [date] = [date].AddMilliseconds(msecs * 3.333333) 

        Return [date]
    End Function 

    Private Shared msBaseDate As DateTime = New DateTime(1900, 1, 1)
    Private Shared msBaseDateTicks As Long = msBaseDate.Ticks

End Class 

Upon examing the above code, you may find yourself wondering why the date bytes are substituted at the end of the GUID value rather than at the beginning. That’s because SQL Server compares uniqueidentifier[5] values by looking at byte “groups” right-to-left, and then looking left-to-right within a byte “group”.  The byte groups are those little clusters you see delimited by hyphens (“-“) when you examine the data in a uniqueidentifier column.  For more information on the SQL Server uniqueidentifier sort process, see the following short article:

http://blogs.msdn.com/sqlprogrammability/archive/2006/11/06/how-are-guids-compared-in-sql-server-2005.aspx

 

Exercising the NewComb() Method

Finally, here’s a little program that uses the NewComb() and GetDateFromComb() methods to generate some GUIDs and then extract the datetime info from them:

C#:

using System;
using System.Collections.Generic;
using System.Text; 

namespace ConsoleApplication1 {
  class Program {
    static void Main(string[] args) {
      GenerateGuidCombs();
    } 

    private static void GenerateGuidCombs() {
      for (int i = 0; i < 10; i++) {
        Guid aGuid = GuidComb.NewComb();
        Console.WriteLine(aGuid.ToString() + "     " +
          GuidComb.GetDateFromComb(aGuid).ToString(
          "g yyyy/MM/dd  hh:mm:ss:ffffff tt"));
        System.Threading.Thread.Sleep(250);
      } 

      Console.ReadLine();   

    }
  }
} 

 

VB:

Module Module1 

    Sub Main()
        GenerateGuidCombs()
    End Sub 

    Private Sub GenerateGuidCombs()
        For i As Integer = 0 To 9
            Dim aGuid As Guid = GuidComb.NewComb()
            Console.WriteLine(aGuid.ToString() & "     " & _
              GuidComb.GetDateFromComb(aGuid).ToString( _
              "g yyyy/MM/dd  hh:mm:ss:ffffff tt"))
            System.Threading.Thread.Sleep(250)
        Next i
        Console.ReadLine()
    End Sub 

End Module 

The program produces output similar to the following. Note that the program introduces a quarter-second delay between generation of each succeeding value. You can see that 250-millisecond delay reflected in the significant digits of the seconds fraction displayed at the very end of the time value, just before the “AM”.

 

If you remove the time delay in the for loop you’ll get a bunch of identical GUIDCOMB values generated, because the program can generate many of them under the time resolution of the system clock.  Of course, you’re not likely to get records added to your database table as rapidly as you can generate GUIDCOMBs in a tight loop.  But even if you did, it would be just fine from a sorting perspective: the last record added could still be placed at the bottom of the table, so there would be no shuffling of records!

 

Using a GUIDCOMB in an Entity’s Create Method

You use a GUIDCOMB in a Create() method just as you would use a GUID: you assign it directly to the new object’s primary key column. Note that primary keys are marked ReadOnly by the DevForce Object Mapper, so you cannot go through the primary key’s setter. Instead you use the column indexer:

C#:

  public static Tomato Create(PersistenceManager pManager) {
     Tomato aTomato = pManager.CreateEntity<Tomato>();
     // Assign GUIDCOMB Id value directly...
     aTomato["Id"] = GuidComb.NewComb();
     aTomato.AddToManager();
     return aTomato;
  } 

 

VB:

   Public Shared Function Create(ByVal pManager As PersistenceManager) As Tomato
    Dim aTomato As Tomato = pManager.CreateEntity(Of Tomato)()
    ' Assign GUIDCOMB Id value directly...
    aTomato("Id") = GuidComb.NewComb()
    aTomato.AddToManager()
    Return aTomato
  End Function
 

Conclusions

To the widely acknowledged benefits of GUID keys, GUIDCOMBs add – at least when using SQL Server – a dramatic performance improvement, and a handy ordering benefit. It seems likely that a similar performance benefit will be experienced on other database platforms as well, but as Nilsson says, you should always test in your own environment to get results you can bank on.

The only apparent downside of GUIDCOMBS is an increased chance (relative to straight GUIDs) of producing a duplicate; however, the probability of that remains exceedingly small.  (See Nilsson’s comments on this in his article.)   Looks like a winner!



[1] pronounced “goo’-id” or “gwid”: your choice

[2] Okay, nobody really guarantees that GUID values will be unique, but there is widespread agreement that the odds of getting duplicates are exceedingly slim. Here, for example, is a nice little discussion about those odds:  http://www.msnewsgroups.net/group/microsoft.public.dotnet.languages.csharp/topic3363.aspx . Or just search the web for “duplicate guid values” or some such to find your own discussions.

[3] In a clustered index, the table rows are physically sorted in the order of the values in the indexed column or columns. As such, a given table can support only one clustered index. In SQL Server, by default, primary key indexes are clustered. 

[4] Note that the “last record added”, as determined from the GUIDCOMB, refers to the last record created – wherever – rather than to the record most recently inserted into the database. Sally Sue may create a new Order locally at 10 AM sharp on Tuesday, but delay committing it to the database until 10:30 AM, or until the next day, or the next week. Meanwhile, other records created later than Tuesday 10 AM may have been committed by their creators.

[5] “Uniqueidentifier” is the name SQL Server gives to a column of type GUID.