In a previous posting, I analyzed the comparison algorithm to determine its complexity, and found it to be of cubic order. The next step is to test the actual performance of the code and see if my estimate was correct. To do this, I wrote a testing script that runs the code several times at each of several input sizes (this code can be found here). Using the results table populated by the testing script, I ran a couple more queries to reformat the results to be more suitable for my specific code:
SELECT
    InputSize,
    NextInputSize = InputSize + 50.0,
    NumberOfRuns = COUNT(*),
    AveragePercentMatch = AVG(PcntMatch),
    -- total elapsed milliseconds divided by 1000.0 gives fractional seconds
    AverageRunTime = AVG(CONVERT(decimal(9, 2), DATEDIFF(ms, RunStart, RunDone) / 1000.0))
INTO #AggregatedPerformanceResults
FROM #PerformanceResults -- the table populated by the testing script; adjust the name to match yours
GROUP BY InputSize

SELECT
    A.InputSize,
    A.NumberOfRuns,
    A.AveragePercentMatch,
    A.AverageRunTime,
    ActualNextAverageRunTime = B.AverageRunTime,
    ExpectedNextAverageRunTime = POWER(A.NextInputSize / A.InputSize, 3.0) * A.AverageRunTime
FROM #AggregatedPerformanceResults A
JOIN #AggregatedPerformanceResults B ON B.InputSize = A.NextInputSize
and here are the results I saw:
The 'ExpectedNextAverageRunTime' values are, as the name indicates, the expected average run times for the next input size, projected from the results at the given input size. Each is computed by taking the ratio of the next input size to the given one, cubing it, and multiplying the result by the given average run time. For InputSize = 50, the ratio is 100/50 = 2; 2 to the third power is 8; and 8 times the average run time of 4.618 seconds gives 36.944 seconds. Notice that this differs only slightly from the actual average run time for InputSize = 100, which is 37.27 seconds. Comparing subsequent expected versus actual average run times shows similar consistency, which confirms the conclusion of the previous posting: the matching algorithm is of cubic order of complexity.
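The extrapolation step above can be sketched outside of SQL as well; here is a minimal Python version of the same arithmetic, using the InputSize = 50 figures from the results above (the function name and the `order` parameter are my own, not from the testing script):

```python
def expected_next_run_time(input_size, next_input_size, avg_run_time, order=3):
    """Extrapolate the average run time at next_input_size, assuming the
    algorithm's run time grows as input_size raised to the given order."""
    return (next_input_size / input_size) ** order * avg_run_time

# Figures from the table above: InputSize = 50 averaged 4.618 seconds.
print(expected_next_run_time(50, 100, 4.618))  # 36.944, close to the measured 37.27 s
```

The same function with `order=2` or `order=4` predicts 18.5 or 73.9 seconds for InputSize = 100, both far from the measured 37.27 seconds, which is why the close cubic fit is good evidence for the complexity estimate.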
Next: optimizing the matching algorithm.