Tuesday, December 24, 2019

Complicated Comments

This is a simpler version of a post I made 10 years ago.


DECLARE @x int

/*

-- */

SET @x = 1

-- /*

--*/ SET @x = 2

PRINT @x





Question: after running the statement above, what will the output be?

  1. 1
  2. 2
  3. Null
  4. An error occurs.

Monday, December 16, 2019

Removing 100% Duplicates

In a table that contains duplicates as identified by some business rule (same name/address, etc.), but the rows included in the sets of duplicates are actually 100% the same, it's impossible to identify a "survivor" of a de-duplication process that deletes all but one of the rows in each set. There is a way in T-SQL to do this, by using the TOP(n) clause after the DELETE. It's kind of unusual, and you should be careful with it. Here's an example:

CREATE TABLE #src (x int, y int, z int)
GO
INSERT #src VALUES (1, 2, 3)
INSERT #src VALUES (1, 2, 3)
INSERT #src VALUES (1, 2, 3)
INSERT #src VALUES (10, 20, 30)
INSERT #src VALUES (100, 200, 300)
GO

SELECT * FROM #src

At this point, #src looks like this (I cannot seem to remove the odd white space in this table):
x   y   z
1   2   3
1   2   3
1   2   3
10  20  30
100 200 300


Now let's remove our duplicates:
DELETE TOP(2)
FROM #src
WHERE x = 1


SELECT * FROM #src

And we see the results:
x   y   z
1   2   3
10  20  30
100 200 300


In the DELETE TOP(n) part of the statement, n=#dupes - 1. In our case, we had 3 duplicates and removed 2 of them. Which brings up an important point - we used the "x=1" condition to identify the group, and we knew that it had exactly 3 rows in that dupe group. So if we generalize the problem, we may want to use dynamic SQL to build a solution:

; WITH DupeGroups AS (
 SELECT
  [x]
  ,[cnt] = COUNT(*)
 FROM #src
 GROUP BY [x]
 HAVING COUNT(*) > 1
)
SELECT

 [sql] = 'DELETE TOP(' + CONVERT(varchar, [cnt] - 1)
 + ') FROM #src WHERE [x] = ' + CONVERT(varchar, [x]) + ';'
FROM DupeGroups


This can then be executed using sp_SqlExec to remove the duplicates.