I have a few SQL scripts that provide data to be displayed on our home page. These queries calculate data over the past 4 months, and the results are displayed as a chart so users get a quick overview of the information. The problem is that these queries take over 10 seconds to run, which is obviously too slow for a home page. The usual way to deal with this is to create a table that holds the data you want displayed (the cache) and update that table in a background process, such as a SQL Agent job. When I searched for a way to do this online, that was the solution everybody recommended and ended up using. My problem is that I didn't want to create a separate table to hold the cached data for every query. I needed a way to have a single "cache" table that holds the results of various queries, each with different types of data (different column types/values, etc.).
What I ended up using was a combination of the FOR XML and OPENXML commands and a single "blob" table that stores the data along with a timestamp for when it was last updated. Since it is probably easier to just see it in action, here is a short example of how to go about doing this.
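For reference, the util_cache table itself never appears in the example below. Its column names can be read off the queries that follow, but the exact types and the key are my assumption; a minimal sketch would look something like this:

-- Sketch of the cache table. Column names come from the queries below;
-- the data types and primary key are assumptions on my part.
CREATE TABLE util_cache (
    [guid]        UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
    date_modified DATETIME         NOT NULL,
    blob          NVARCHAR(MAX)    NOT NULL
)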
DECLARE @Results TABLE (Ordering INT,
                        Campus VARCHAR(30),
                        RealDate DATETIME,
                        HeadCount INT)

DECLARE @guid UNIQUEIDENTIFIER
SET @guid = '00000000-0000-0000-0000-000000000000'

-- Check for cached data, only use it if it is less than 2 hours old.
IF EXISTS (SELECT * FROM util_cache WHERE [guid] = @guid AND date_modified >= DATEADD(HOUR, -2, GETDATE()))
BEGIN
    DECLARE @doc AS INT
    DECLARE @xdoc AS NVARCHAR(MAX)

    -- Pull the stored XML and turn it back into a result set.
    SELECT @xdoc = blob FROM util_cache WHERE [guid] = @guid
    EXEC sp_xml_preparedocument @doc OUTPUT, @xdoc
    SELECT * FROM OPENXML(@doc, N'/root/Result')
        WITH (Ordering INT, Campus VARCHAR(30), RealDate DATETIME, HeadCount INT)
    EXEC sp_xml_removedocument @doc
    RETURN
END

-- No cache found, populate @Results with the data that takes awhile to build...
INSERT INTO @Results ...

-- Save in cache
BEGIN TRY
    BEGIN TRANSACTION
    IF NOT EXISTS (SELECT * FROM util_cache WHERE [guid] = @guid)
    BEGIN
        INSERT INTO util_cache ([guid], date_modified, blob)
        VALUES (@guid, GETDATE(), (SELECT * FROM @Results FOR XML RAW ('Result'), ROOT))
    END ELSE BEGIN
        UPDATE util_cache SET
            date_modified = GETDATE()
            ,blob = (SELECT * FROM @Results FOR XML RAW ('Result'), ROOT)
        WHERE [guid] = @guid
    END
    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    -- Swallow any error so the page shows an empty chart instead of failing.
END CATCH

-- Select data for display
SELECT * FROM @Results
So what we are doing is declaring an in-memory table (a table variable) and a GUID to use. The in-memory table is what is used to build the actual data if the cache is not valid, and the GUID uniquely identifies this cache entry.
First we check the util_cache table to see if a valid entry for our GUID exists. If it does, we take the nvarchar(max) value and run it back through the XML functions to convert it into a SQL result set. The sp_xml_preparedocument stored procedure basically opens the XML stream and parses it into something that OPENXML can use, and the sp_xml_removedocument stored procedure closes the XML stream and frees up the memory afterwards. In the middle is the OPENXML function, which takes the XML document handle and an XPath query describing which data to return; in our case we want all the Result records under the root element. The WITH clause tells it how to shape the result, and here we just match the table format we used to store it.
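To make the round trip a bit more concrete, here is a tiny standalone version of it with made-up values. The XML shown in the comment is roughly what FOR XML RAW ('Result'), ROOT produces; the exact attribute formatting may differ slightly.

-- Tiny standalone round trip with made-up values, just to show the shape of the XML.
DECLARE @Demo TABLE (Ordering INT, Campus VARCHAR(30), RealDate DATETIME, HeadCount INT)
INSERT INTO @Demo VALUES (1, 'Main', '2010-01-01', 250)

DECLARE @xml NVARCHAR(MAX)
SET @xml = (SELECT * FROM @Demo FOR XML RAW ('Result'), ROOT)
-- @xml now holds something like:
-- <root><Result Ordering="1" Campus="Main" RealDate="2010-01-01T00:00:00" HeadCount="250"/></root>

-- And back again:
DECLARE @doc INT
EXEC sp_xml_preparedocument @doc OUTPUT, @xml
SELECT * FROM OPENXML(@doc, N'/root/Result')
    WITH (Ordering INT, Campus VARCHAR(30), RealDate DATETIME, HeadCount INT)
EXEC sp_xml_removedocument @doc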
So if the cached data was not found or was expired, we need to build new results. The reason I do everything inside a TRY/CATCH block is that if something goes wrong with the query, instead of getting an error page the user just gets an empty result set (i.e. a blank graph). The reason I do it in a transaction is to lock the util_cache table (if I understand how things work) so we don't have two threads trying to INSERT at the same time. Converting data into XML is as simple as adding the FOR XML RAW ('RecordName'), ROOT clause to the end of the SELECT statement. RecordName is the name of the node to generate for each record, and it needs to match what you use in the OPENXML statement.
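One caveat on the transaction: by itself it does not stop two sessions from both passing the NOT EXISTS check before either inserts. If that ever becomes a real problem, one common way to serialize the upsert is to add locking hints to the existence check. This is a variation, not what the code above does:

-- Variation on the upsert with locking hints, so two sessions can't both
-- pass the NOT EXISTS check at the same time. Not part of the original script.
BEGIN TRANSACTION
IF NOT EXISTS (SELECT * FROM util_cache WITH (UPDLOCK, HOLDLOCK) WHERE [guid] = @guid)
BEGIN
    INSERT INTO util_cache ([guid], date_modified, blob)
    VALUES (@guid, GETDATE(), (SELECT * FROM @Results FOR XML RAW ('Result'), ROOT))
END ELSE BEGIN
    UPDATE util_cache SET
        date_modified = GETDATE()
        ,blob = (SELECT * FROM @Results FOR XML RAW ('Result'), ROOT)
    WHERE [guid] = @guid
END
COMMIT TRANSACTION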
In reality the setup I have is pretty similar to the above, except that everything is inside a stored procedure that accepts a parameter for how old the cached data can be before it is considered stale. In the code that runs on the homepage I use 36 hours (the data doesn't change that often, so this is fine). Then I have a SQL Agent job that runs every 24 hours and calls the same stored procedure with a value of 0, which forces the data to be rebuilt immediately. So the user should never notice a delay, but if the SQL job fails to run, a single user might notice a 10 second delay once a day.
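The stored-procedure version isn't shown in the post, but its shape would be roughly the following. The procedure name, parameter name, and default are my own inventions; the body is just the script above wrapped up, with the hard-coded 2 hours replaced by the parameter.

-- Rough shape of the wrapped version; names and defaults are made up.
CREATE PROCEDURE dbo.GetHeadCountChart
    @MaxAgeHours INT = 2   -- 0 forces an immediate rebuild
AS
BEGIN
    DECLARE @guid UNIQUEIDENTIFIER
    SET @guid = '00000000-0000-0000-0000-000000000000'

    IF @MaxAgeHours > 0
       AND EXISTS (SELECT * FROM util_cache
                   WHERE [guid] = @guid
                     AND date_modified >= DATEADD(HOUR, -@MaxAgeHours, GETDATE()))
    BEGIN
        -- ...return the cached XML via OPENXML, exactly as in the script above...
        RETURN
    END

    -- ...rebuild the results, save them to util_cache, and return them...
END

-- The home page would call:     EXEC dbo.GetHeadCountChart @MaxAgeHours = 36
-- The nightly SQL Agent job:    EXEC dbo.GetHeadCountChart @MaxAgeHours = 0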