Friday, April 3, 2009

"Average" Date

Have you ever tried to calculate the average of a datetime column, and gotten this error:

"Msg 8117, Level 16, State 1, Line 1
Operand data type datetime is invalid for avg operator."

When you first think about it, the error makes sense - are you trying to determine the average month, year, hour, or second? But shouldn't there be such a thing as an "average" date? If we have a bunch of sales orders in a given month, doesn't the "average" date of sale actually mean something?

What if I calculated the MIN value, then calc'd the DATEDIFF between the MIN and all other values? At that point I'd essentially have an integer value, which of course I could average, and then derive the "average" date:

CvrtToDate AS (
/* "DataValue" is assumed to be varchar(max) */
DataValue = CONVERT(datetime, DataValue)
FROM DataSet
WHERE ISDATE(DataValue) = 1
,MinAndMax AS (
ValueMin = MIN(DataValue)
,ValueMax = MAX(DataValue)
FROM CvrtToDate
,DateDiffs AS (
DaysFromMin = DATEDIFF(d, MinAndMax.ValueMin, DataValue)
FROM CvrtToDate, MinAndMax
,AvgDaysFromMin AS (
SELECT DataValue = AVG(DaysFromMin)
FROM DateDiffs
AvgDate = DATEADD(d, AvgDaysFromMin.DataValue, MinAndMax.ValueMin)
FROM MinAndMax, AvgDaysFromMin

This query bears a result that makes sense - we have a date that is between the oldest, and most recent, that is somewhere near the midway point.

A little Google research bears fruit for a much simpler calculation. From "Ask Ben: Averaging Date/Time Stamps In SQL": "The secret to date/time averaging is that date/time stamps can be represented as a floating point number. I have covered this a number of times on this blog so I won't go into too much detail, but the idea is that as a floating point number, the integer part represents the number of days since the beginning of time (as the SQL server defines it) and the decimal part represents the time or rather, the fraction of days. SQL does not make this conversion for you; you have to CAST the date/time stamp as a FLOAT value."

This leads to the revised calculation:

ValueMin = MIN(CONVERT(datetime, DataValue))
,ValueMax = MAX(CONVERT(datetime, DataValue))
,ValueAvg = CONVERT(datetime, AVG(CONVERT(float,
CONVERT(datetime, DataValue))))
FROM DataSet
WHERE ISDATE(DataValue) = 1

Not only is this calc far simpler, but it is slightly more precise, as it includes time-of-day in the result.

No comments:

Post a Comment