Where have you been all my life?

Every once in a while a technology comes along that once you have become accustomed to it you wonder how you ever got by without it. For me the ‘stand out by a mile’ example of this is of course LINQ and lambda which if I do not use within any given day means that I’m probably on holiday or procrastinating.

Sometimes, and much more rarely a tech comes to my attention that has been in existence for many years and once used I think…

‘Man, where have you been all my life?’

In such circumstances I feel cheated! Just such an example came across my desk this week and I feel compelled to share it, especially as my coworker had also never used this tech. He too immediately saw how he could use it in a current project.

The technology is called ‘Common Table Expressions’ or CTE and has been around since SQL Server 2008.

I know right!!!

Essentially this is a table structure within SQL Server that can be self referencing and so is perfect for examining and returning data that is of a hierarchical nature. Lets dive straight in with my very simple, but real world sample. I have a database where one table called IndexCategories contains potentially infinite hierarchical data. There may be many root records each with many child records, each of those with many child records ad infinitum. The table is structured thus:-

IndexCategories

  • ID
  • Deleted
  • UpdateCount
  • Metric_ID
  • Parent_ID – Links to the parent of this child record, null indicates a root record

 

The key thing about a CTE is that can be self referencing which is so amazingly flexible I can hardly contain myself, but I’ll try. So my CTE is defined as below

WITH HeirarchicalIndexCategories(ID, Deleted, UpdateCount,Metric_ID,Parent_ID, Level)

AS
(
— Anchor member definition
    SELECT m.ID, m.Deleted, m.UpdateCount, m.Metric_ID ,m.Parent_ID,
        0 AS Level
    FROM dbo.IndexCategories AS m
    WHERE Parent_ID IS NULL
    UNION ALL
— Recursive member definition
    SELECT m.ID, m.Deleted, m.UpdateCount, m.Metric_ID ,m.Parent_ID,
        Level+1
    FROM dbo.IndexCategories AS m
    INNER JOIN HeirarchicalIndexCategories AS h
        ON m.Parent_ID=h.ID
)

— Statement that executes the CTE

SELECT ID, Deleted, UpdateCount, Metric_ID ,Parent_ID,Level
FROM HeirarchicalIndexCategories
END

So walking through this we start with the definition of the CTE itself, we start with the name of HeirarchicalIndexCategories. The arguments list are not strictly an arguments list at all but are instead a declaration of columns that the CTE surfaces to the outside world.

WITH HeirarchicalIndexCategories(ID, Deleted, UpdateCount,Metric_ID,Parent_ID, Level)

We then come to the first part of the SQL, nothing complicated here just a standard SELECT on the IndexCategories table. This is however the ‘anchor’ part of the query and as such ONLY returns IndexCategories that have no parent records. Thus we constrain on only records where the ParentID is null. We also pass in a zero value for the Level to indicate that it is at the very top of the ‘tree’.

— Anchor member definition
    SELECT m.ID, m.Deleted, m.UpdateCount, m.Metric_ID ,m.Parent_ID,
        0 AS Level
    FROM dbo.IndexCategories AS m
    WHERE Parent_ID IS NULL

Next comes the real magic, the ‘recursion’ part of the query. Again this is fairly simple and is defined thus

    SELECT m.ID, m.Deleted, m.UpdateCount, m.Metric_ID ,m.Parent_ID,
        Level+1
    FROM dbo.IndexCategories AS m
    INNER JOIN HeirarchicalIndexCategories AS h
        ON m.Parent_ID=h.ID

Nothing too taxing here, as you can see we are returning the same data with the exception that we add 1 to the Level value returned for the ‘Parent’ Index Category. They key part of this statement is however that we are joining to the results of HeirarchicalIndexCategories so in the first instance we will be joining to the root rows on the parent id linkage, and on the second call we would be joining  to any ‘Level 1’ results. This continues ad infinitum until there are no results returned at which point the recursion will unwind itself.

A note of caution, as this of course uses recursion you need to ensure that you structure your call correctly otherwise you will fall foul of unbounded recursion errors. There is, I believe an optimisation hint (MAXRECURSION ) you can use to prevent this from occurring but I think that if you need it you may already be lost!

All that remains is to call the CTE itself, here I think for me is the one real weakness of these structures, You can only consume them once, I guess you can get round that by putting them inside a stored procedure but why do I HAVE to! We call our CTE thus:-

SELECT ID, Deleted, UpdateCount, Metric_ID ,Parent_ID,Level
FROM HeirarchicalIndexCategories

Simple! You can of course use this like any other table, join to it, constrain it , group etc. As yet I have no benchmarks for performance and have not yet read no articles concerning this. I’d be keen for any feedback more experienced users than I may have on using them in the wild.

I hope this helps.