Caching results in SQL Server

I have a few SQL scripts that provide data to be displayed on our home page. These queries calculate data over the past 4 months and display as a chart for the users to get a quick overview of the information. The problem is these queries take over 10 seconds to run. Obviously this would be a problem. The usual way to deal with this is create a table that has the data you want displayed (the cache) and you update the table in a background process, such as a SQL agent job. When searching for a way to do this online this was the solution everybody recommended and ended up using. My problem is I didn’t want to create a whole bunch of tables to hold the cached data for each one. I needed a way to basically have a “cache” table that holds the results for various queries, each with different types of data (different column types/values, etc.).

What I ended up using was a combination of the FOR XML and OPENXML commands and a single “blob” table to store the data with a timestamp for when it was last updated. Since it will probably be easier to just see an example I will give a short example of how to go about doing this.

DECLARE @Results TABLE (Ordering INT,
    Campus VARCHAR(30),
    RealDate DATETIME,
    HeadCount INT)
SET @guid = '00000000-0000-0000-0000-000000000000'

-- Check for cached data, only use it if it is less than 2 hours old.
IF EXISTS (SELECT * FROM util_cache WHERE [guid] = @guid AND date_modified >= DATEADD(HOUR, -2, GETDATE()))
    DECLARE @xdoc AS nvarchar(max)

    SELECT @xdoc = blob FROM util_cache WHERE [guid] = @guid

    EXEC sp_xml_preparedocument @doc OUTPUT, @xdoc
    SELECT * FROM OPENXML(@doc, N'/root/Result') WITH (Ordering INT, Campus VARCHAR(30), RealDate DATETIME, HeadCount INT)
    EXEC sp_xml_removedocument @doc

-- No cache found, populate @Results with the data that takes awhile to build...
INSERT INTO @Results ...

-- Save in cache
        IF NOT EXISTS (SELECT * FROM util_cache WHERE [guid] = @guid)
            INSERT INTO util_cache ([guid], date_modified, blob)
                VALUES (@guid, GETDATE(), (SELECT * FROM @Results FOR XML RAW ('Result'), ROOT))
            UPDATE util_cache SET
                 date_modified = GETDATE()
                ,blob = (SELECT * FROM @Results FOR XML RAW ('Result'), ROOT)
                WHERE [guid] = @guid

-- Select data for display
SELECT * FROM @Results

So what we are doing is declaring in in-memory table and GUID to use. The in-memory table is what is used to build the actual data if the cache is not valid. The GUID is used to uniquely identify this cache object.

First we check the util_cache table to see if a valid entry for our guid exists. If it does, we take the nvarchar(max) value and throw it into the XML code to convert it back to a SQL result set. The sp_xml_preparedocument stored procedure basically opens the XML stream and parses it into something that OPENXML can use. The sp_xml_removedocument stored procedure closes the XML stream and frees up memory. In the middle is the OPENXML function which takes the XML document and an XML query for which data to return, in our case we want all the Result records under the root element. The WITH clause tells it how to format the data, in this case we just match the table format that we used to store it.

So if the cache data was not found or was expired, we need to build new results. The reason I do everything inside a TRY/CATCH block is so that if something goes wrong with query instead of getting an error page the user just gets and empty result set (i.e. a blank graph). Then the reason I do it in a transaction is to lock the util_cache table (if I understand how things work) so we don’t have 2 threads trying to INSERT at the same time. Converting data into XML is as simple as adding the FOR XML RAW (‘RecordName‘), ROOT clause to the end of the SELECT statement. RecordName is the name of the node to generate for each record, this needs to match what you use in the OPENXML statement.

In reality the setup I have is pretty similar to the above except that everything is inside a stored procedure that accepts a parameter for how old the cached data can be before it is considered stale. In my code that runs on the homepage I use 36 hours (the data doesn’t change that often, so this is fine). Then I have a SQL agent job run every 24 hours that calls the same stored procedure with and passes it a value of 0 which forces the data to be re-build immediately. So the user should never notice a delay, but if the sql job fails to run a single user might notice a 10 second delay once a day.

Leave a Reply

Your email address will not be published. Required fields are marked *