What Time Is It?

This is a problem that I am sure that all business intelligence developers face at one point or another:  how to move the clock forward or background by business hours or business days.  Moving the clock forward by business days is not as simple as DATEADD(dd, 2, @StartDate).  SQL Server does not have any built-in intelligence for business time and business days, and besides, how could it?  Each business has their own schedule.  For example, my company works a four day work week, and has skeleton operations on Fridays, but Fridays are not considered a business day.

 The Environment

In my environment, I am moving IT ticket data from SolarWinds’ help desk product into a data warehouse.  In my data warehouse, I have a Date dimension table and a Time of Day dimension table.  Both tables have indicators that indicate if the current row represents a date or time that is within business hours.  Since we only run one shift, we don’t need to worry about multiple shifts.  Employees stagger their lunch breaks, so that there is no break in the middle of the day for lunch.

The Date table is set to have one row equal to one day.  The Time of Day table is set to have one row equal one minute.

My Initial Approach

The first attempt to solve this problem was a scalar valued function that uses a CROSS APPLY statement to create a table in-memory that combines the Date and Time of Day tables such that there is one row for each business minute within each business day.  The actual query that I am using to accomplish this is below:

SELECT @StartRow = RowNum FROM
SELECT ROW_NUMBER OVER(ORDER BY d.DateKey, t.TimeOfDayKey) AS RowNum, t.HourOfDay, t.MinuteOfHour, d.FullDate
FROM dbo.[Time of Day Dimension] AS t
(SELECT DateKey, Fulldate FROM dim.Date WHERE BusinessDay = 'Business Day') AS d
WHERE Schedule = 'During Business Hours'
) AS TimeRows
WHERE FullDate = @StartDate AND HourOfDay = @StartHour AND MinuteOfHour = @StartMinute

Ok, so there is a lot going on in this query.  Let’s step through the logic of this function before diving too deep into the code.

First, the function takes start date and time in its component parts:  date, hour, and minute.  The function also takes the same arguments for the duration to move the clock forward.  The function then creates a giant table in memory consisting of only business days and business hours, then numbers each row.  It finds the starting row number, and then converts every duration argument into business minutes, adds it to the row number of the starting row, then returns the component parts of the row it lands on.

Now that we have the logic, let’s break this query down.  This query contains nested sub-queries, which is a terrible design in my opinion.  However, the road to good designs is paved with the iterations of bad designs, right?  Let’s break this down working from the inner query and stepping outward.

SELECT DateKey, Fulldate FROM dim.Date WHERE BusinessDay = 'Business Day'

This query is returning a data set of the key from the date table, and the date stored in the FullDate.  This sub-query acts as the right hand table in the CROSS APPLY statement.

SELECT ROW_NUMBER OVER(ORDER BY d.DateKey, t.TimeOfDayKey) AS RowNum, t.HourOfDay, t.MinuteOfHour, d.FullDate
FROM dbo.[Time of Day Dimension] AS t
(SELECT DateKey, Fulldate FROM dim.Date WHERE BusinessDay = 'Business Day') AS d
WHERE Schedule = 'During Business Hours'
) AS TimeRows

This query does the heavy lifting.  It is executed as a sub-query in the FROM clause of the outermost query.  It’s mission is to create a table where one hour is equal to one business minute.  The WHERE clauses in both queries filter out the rows that do not correspond to times within business hours.  The CROSS APPLY statement is used because we are using table expressions and not actual tables.  While this does create a look-up table for us to use to make our date calculations, it is terribly inefficient as it has to be run twice for each function execution – once to get the starting row number and once to retrieve the data after we advance our row number.  Since we are using a scalar-valued function, that means the function runs once for each service ticket, which equates to 42,000 executions of the function or 84,000 executions of this expensive query.  Yikes!

The outer most query in this example simply retrieves the row number of the row matching the start date and time using the second query as the basis for the FROM clause.

Second Iteration, Same as the First

The biggest problem with this function so far is that it has to execute an expensive query twice, leading to high run times.  In this iteration, my goal is to store the sub-query used as the FROM clause in a table variable so we only have to execute it once.  I think the final end result is going to be a look-up table of sort, but let’s step up to that and see if the table variable can get us to where we need to be.

First, we declare our table variable and give it our structure with this code.

DECLARE @BusinessTime table (RowNum int, HourOfDay int, MinuteOfHour int, FullDate date)

Our table variable will hold four columns and will allow us to return the data in the composite parts required.

Now, we fill our table variable.

INSERT @BusinessTime
(RowNum, HouOfDay, MinuteOfHour, FullDate)
ROW_NUMBER() OVER(ORDER BY d.DateKey, t.TimeOfDayKey) AS RowNum,t.HourOfDay, t.MinuteOfHour, d.FullDate
FROM dbo.[Time of Day Dimension] AS t
(SELECT DateKey, FullDate FROM dim.Date WHERE BusinessDay = 'Business Day') AS d

Even with this change, we still have 42,000 executions of a very expensive query.  Our target window is less than 8 minutes of execution time, as our ETL job that encompasses this calculation has to run every 15 minutes to update the data warehouse.  Given this, I think it makes the most sense to create a physical table to act as our lookup table.

Third Time’s the Charm — or not

In this iteration, we are going to move the expensive query out of the function and turn it into a table.  Our first step is to build a table to hold our result columns from the expensive query.

CREATE TABLE [Lookup].[Business Minutes]
(RowNum int not null, HourOfDay int not null, MinuteOfHour int not null, FullDate date not null)

Now we fill our table with our expensive INSERT statement above that we used to fill our table variable in the second iteration.  The result is a table with 5,647,620 rows containing every business minute from 1/3/2000 through 12/31/2040.  Since I haven’t yet gone through and flagged the holidays out within the date table, this table erroneously includes holidays.  However, that is a problem for a different day.

The last step is to alter the function code to look-up dates against this table and test it in the ETL query.  Below is the new query to retrieve the starting row in our business time table.

SELECT @StartRow = RowNum FROM [Lookup].[Business Minutes] WHERE FullDate = @StartDate AND HourOfDay = @StartHour AND HourOfMinute = @StartMinute

Much cleaner than the first iteration of that query, don’t you think?  After this query we can simply add our number of minutes and then query the same table with the new row number to get the date parts, then use DATETIMEFROMPARTS to return a datetime result to the ETL query.  At least, that’s what I thought.  A test run of the ETL query resulted in the query running past the 8 minute execution window.

Looking at the execution plan I see a missing index on my staging table that could improve query performance by 99%.  I’ll gladly take that by simply implementing an index.  After executing the command suggested by SQL SERVER to create the index, I test run the query again and find that it’s returning 706 rows after two minutes.  If the number of rows returned follows a linear curve, then we only have 2,824 rows out of 42,000 rows after our max execution time.  Another look at the execution plan shows me another index to implement.

Success!  After implementing this second index, run time has fallen to 11 seconds.  Now we just need to sort out the data accuracy – our function is returning too many null values.  Every row is a datetime column listing the time at which a ticket is created, and our function is adding business time to that so theoretically, there should be no null values.

Sorting out the Accuracy Issues

My accuracy problems stemmed from a use case I failed to consider – what happens when I process a record that was opened outside of business hours?  The current logic targets one row by looking for date, hour and minute to match what was provided, but if a time outside of business hours was provided than there would be no rows returned, resulting in null value.  By changing the query to use >= for the criteria, and a TOP (1) ordered by row number, we return the next business time row from the supplied inputs.

SELECT TOP (1) @StartRow = RowNum FROM [Lookup].[Business Minutes] AS d WHERE FullDate >= @StartDate AND HourOfDay >= @StartHour AND MinuteOfHour >= @StartMinute

We can rely on RowNum because it was ordered over the date, hour, and minute sort scheme when we loaded the table.

Lessons Learned

In solving this problem, I learned two important lessons.  First, when utilizing large look-up tables of data that doesn’t change, it’s best to capture it in a physical table.  Second, indexes are critical to high performing queries and large data sets.

If you have any other pointers or want to share your approach to solving this problem, please comment below!  I’d love to hear your opinion and yes, I know that’s a dangerous thing to say on the Internet.