<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5937947540405767218</id><updated>2011-12-28T19:57:33.819Z</updated><category term='Data Profiler'/><category term='Median'/><category term='Metadata'/><category term='SQL Gurus'/><category term='introduction'/><category term='ETL'/><category term='SQL'/><category term='admin'/><category term='Powershell'/><category term='CRM'/><category term='DW'/><category term='Complexity Analysis'/><category term='TSQL Challenge'/><category term='Query Performance'/><category term='Statistics'/><category term='Data Patterns'/><category term='101'/><category term='XML'/><category term='CEP'/><category term='Jesse McLain'/><category term='Big O Notation'/><category term='file import'/><category term='CTE'/><category term='Text Matching'/><category term='Mathematical Series'/><category term='BI'/><category term='Links'/><category term='DBA'/><category term='Celko'/><category term='probability'/><category term='Articles'/><category term='SSC'/><category term='hardware'/><category term='recursion'/><category term='code review'/><category term='Books'/><title type='text'>Jesse's SQL Blog</title><subtitle type='html'>Focuses on SQL Server, especially T-SQL development and programming, with the aim of presenting projects and problems in SQL Server and the thought process behind delivering a solution.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>90</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-4165363325207226202</id><published>2011-12-28T19:51:00.003Z</published><updated>2011-12-28T19:57:33.828Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Parsing Stored Procedure Result Set</title><content type='html'>One thing I've always wanted to be able to do is to parse the result set(s) of a stored procedure, so that I can easily create a structure to contain it's output. This &lt;a href="http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70626"&gt;thread on sqlteam.com&lt;/a&gt; details a way to capture the output without knowing the structure ahead of time. This is done by using a linked server that loops back to itself, and then runs this statement:&lt;br /&gt;&lt;br /&gt;select * into #t from openquery(loopback, 'exec yourSproc')&lt;br /&gt;&lt;br /&gt;to select the results into a temp table, the metadata of which can then be parsed as needed. This leads to the 'pie in the sky' idea of a stored proc that, given the name of another stored proc, will output a CREATE TABLE script that mirrors the result set of the second stored proc.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-4165363325207226202?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/4165363325207226202/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2011/12/parsing-stored-procedure-result-set.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4165363325207226202'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4165363325207226202'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2011/12/parsing-stored-procedure-result-set.html' title='Parsing Stored Procedure Result Set'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-257900166771957420</id><published>2011-12-28T19:39:00.002Z</published><updated>2011-12-28T19:42:50.889Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Removing Trailing Zeroes</title><content type='html'>I recently encountered the classic problem of removing trailing zeroes from formatted numeric output, and after mocking up some complicated code and doing some research, I came across a &lt;a href="http://database.ittoolbox.com/groups/technical-functional/sql-server-l/how-to-remove-trailing-zeros-264879"&gt;thread that says to convert it to float before varchar&lt;/a&gt;. It's a perfect solution that is simple and works correctly.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-257900166771957420?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/257900166771957420/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2011/12/removing-trailing-zeroes.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/257900166771957420'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/257900166771957420'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2011/12/removing-trailing-zeroes.html' title='Removing Trailing Zeroes'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-310095815818526996</id><published>2011-07-13T16:22:00.001+01:00</published><updated>2011-07-13T16:24:30.786+01:00</updated><title type='text'>View EPs Neatly</title><content type='html'>Here's some code to view Extended Properties of a table in a neat denormalized report:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;-- replace 'TABLENAME' with the name of your table:&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;DECLARE @ObjId int; SET @ObjId = OBJECT_ID('TABLENAME')&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;DECLARE @sql varchar(max)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;SET @sql = &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;'SELECT&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    TableName = T.Name&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    ,ColumnName = C.Name&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;'&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;; WITH TableNameBase AS (&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    SELECT *&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    FROM SYS.TABLES &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    WHERE OBJECT_ID = @ObjId&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;), PropNames0 AS (&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    SELECT DISTINCT Name &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    FROM SYS.EXTENDED_PROPERTIES&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    WHERE Major_Id = (SELECT [OBJECT_ID] FROM TableNameBase)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    AND Minor_Id &amp;gt; 0&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;), PropNames AS (&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    SELECT Name, NameOrder = ROW_NUMBER() OVER (ORDER BY Name)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    FROM PropNames0&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;) &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;SELECT * INTO #PropNames FROM PropNames&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;SELECT @sql = @sql + &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;'    ,[' + P.Name + '] = ISNULL(P' + LTRIM(STR(P.NameOrder)) + '.Value, '''')&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;'&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;FROM #PropNames P&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;SET @sql = @sql + &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;'FROM SYS.TABLES T&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;JOIN SYS.COLUMNS C&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    ON C.OBJECT_ID = T.OBJECT_ID&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;'&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;SELECT @sql = @sql + &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;'LEFT JOIN SYS.EXTENDED_PROPERTIES P' + LTRIM(STR(P.NameOrder)) + '&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    ON P' + LTRIM(STR(P.NameOrder)) + '.Major_Id = T.OBJECT_ID&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    AND P' + LTRIM(STR(P.NameOrder)) + '.Minor_Id = C.Column_Id&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    AND P' + LTRIM(STR(P.NameOrder)) + '.Name = ''' + P.Name + '''&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;'&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;FROM #PropNames P&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;SET @sql = @sql + &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;'WHERE T.OBJECT_ID = ' + LTRIM(STR(@ObjId))&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;PRINT @sql&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;EXEC(@sql)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;DROP TABLE #PropNames&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-310095815818526996?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/310095815818526996/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2011/07/view-eps-neatly.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/310095815818526996'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/310095815818526996'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2011/07/view-eps-neatly.html' title='View EPs Neatly'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-755512966910759747</id><published>2011-06-09T16:17:00.000+01:00</published><updated>2011-06-09T16:19:15.499+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Query to List Tables and their Primary Keys</title><content type='html'>This query produces a resultset of two columns: every table in the current database, and the corresponding primary key expression if one exists:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;; WITH Base AS (&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    SELECT&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        TABLE_NAME = QUOTENAME(T.TABLE_SCHEMA) + '.' + QUOTENAME(T.TABLE_NAME)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        ,C.Column_Name&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        ,C.ORDINAL_POSITION&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    FROM INFORMATION_SCHEMA.TABLES T&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    LEFT JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS PK&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        ON PK.TABLE_NAME = T.TABLE_NAME&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        AND PK.TABLE_SCHEMA = T.TABLE_SCHEMA&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        AND PK.CONSTRAINT_TYPE = 'PRIMARY KEY'&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    LEFT JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE C&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        ON C.CONSTRAINT_NAME = PK.CONSTRAINT_NAME&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        AND C.TABLE_NAME = PK.TABLE_NAME&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        AND C.TABLE_SCHEMA = C.TABLE_SCHEMA&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;), OrderIt AS (&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    SELECT&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        TABLE_NAME&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        ,Column_Name = CONVERT(varchar(max), Column_Name)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        ,ORDINAL_POSITION&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        ,RecurseVar = ROW_NUMBER() OVER (PARTITION BY TABLE_NAME ORDER BY ORDINAL_POSITION)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    FROM Base&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;), Recursed AS (&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    SELECT&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        Table_Name&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        ,Column_Name&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        ,RecurseVar&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    FROM OrderIt&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    WHERE RecurseVar = 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    UNION ALL &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    SELECT&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        B.Table_Name&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        ,Column_Name = RTRIM(R.Column_Name) + ', ' + B.Column_Name&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        ,B.RecurseVar&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    FROM Recursed R&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    JOIN OrderIt B&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        ON B.Table_Name = R.Table_Name &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        AND B.RecurseVar = R.RecurseVar + 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;), GetMax AS (&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    SELECT&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        Table_Name&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        ,MAX_RecurseVar = MAX(RecurseVar)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    FROM Recursed&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    GROUP BY Table_Name&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;), Results AS (&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    SELECT &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        R.Table_Name&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        ,Primary_Key = R.Column_Name&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    FROM Recursed R&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    JOIN GetMax G&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        ON G.Table_Name = R.Table_Name&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        AND G.MAX_RecurseVar = R.RecurseVar&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;SELECT * FROM Results&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;ORDER BY Table_Name&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-755512966910759747?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/755512966910759747/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2011/06/query-to-list-tables-and-their-primary.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/755512966910759747'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/755512966910759747'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2011/06/query-to-list-tables-and-their-primary.html' title='Query to List Tables and their Primary Keys'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-1019445496240342554</id><published>2011-05-17T15:45:00.002+01:00</published><updated>2011-05-17T22:17:15.879+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='XML'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>XML SQL - Probing Depth of Document</title><content type='html'>Now let's say that you want to know how many levels deep this relationship goes. Without knowing the answer ahead of time, we have to use a recursive query to determine this:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;; WITH XML_Doc AS (&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;   SELECT&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;       id&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;       ,parentid&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;       ,[level] = 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;       ,nodetype&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;       ,localname&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;       ,prev&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;       ,text&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;   FROM #tmp2&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;   WHERE LocalName = 'RootNode'&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;   UNION ALL&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;   SELECT&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;       T.id&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;       ,T.parentid&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;       ,[level] = R.[level] + 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;       ,T.nodetype&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;       ,T.localname&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;       ,T.prev&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;       ,T.text&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;   FROM XML_Doc R&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;   JOIN #tmp2 T&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;       ON R.Id = T.ParentId&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;SELECT *&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;INTO #XML_Doc&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;FROM XML_Doc&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-1019445496240342554?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/1019445496240342554/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2011/05/xml-sql-probing-depth-of-document.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/1019445496240342554'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/1019445496240342554'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2011/05/xml-sql-probing-depth-of-document.html' title='XML SQL - Probing Depth of Document'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-711215360949963105</id><published>2011-05-17T15:32:00.006+01:00</published><updated>2011-05-17T15:45:31.196+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='XML'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>XML Parsing with Recursive Table Structure</title><content type='html'>After you load the XML data into the temp table using the code from &lt;a href="http://jessesql.blogspot.com/2011/05/xml-into-sql.html"&gt;yesterday's post&lt;/a&gt;, you end up with a hierarchical structure representing the XML data with simple id/parentid columns. So a node at the top level with an id=1 would have nodes at the next level with parentid=1, where id is unique and parentid always references id. There is also a column called 'nodetype' that seems to determine what this node represents. Now just on inspection, I deduced that nodetype=1 are root nodes, 3's are leaf nodes, and 2's are a combination (they have attribute info and children). This code will condense type 3's into their parent:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;SELECT&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    X1.Id&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    ,X1.ParentId&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    ,X1.NodeType&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    ,X1.LocalName&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    ,X1.prev&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    ,text = COALESCE(X1.text, X3.text)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;INTO #tmp2&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;FROM #tmp X1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;LEFT JOIN #tmp X3&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    ON X3.ParentId = X1.Id&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    AND X3.NodeType = 3&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;WHERE X1.NodeType &amp;lt;&amp;gt; 3&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;ORDER BY Id&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-711215360949963105?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/711215360949963105/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2011/05/xml-parsing-with-recursive-table.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/711215360949963105'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/711215360949963105'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2011/05/xml-parsing-with-recursive-table.html' title='XML Parsing with Recursive Table Structure'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-962874113024812312</id><published>2011-05-16T18:57:00.004+01:00</published><updated>2011-05-17T15:33:41.998+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='XML'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>XML into SQL</title><content type='html'>I'm working on loading an XML file into SQL right now, and struggling a bit with it. Not so much the coding itself, but figuring out how to represent the hierarchical XML data in a relational SQL table.&lt;br /&gt;&lt;br /&gt;I did discover this tidbit for taking advantage of SQL's built-in tools for handling XML:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;DECLARE @hdoc int&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;DECLARE @doc varchar(max)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;SELECT @doc = CONVERT(varchar(max), XML_Column) FROM dbo.XML_Table&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;EXEC sp_xml_preparedocument @hdoc OUTPUT, @doc&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;PRINT @hdoc&lt;/span&gt;  &lt;span style="font-family:courier new;"&gt;SELECT *&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;INTO #tmp&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;FROM OPENXML (@hdoc, '/RootNodeName',2)&lt;/span&gt;  &lt;span style="font-family:courier new;"&gt;EXEC sp_xml_removedocument @hdoc&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Works well.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-962874113024812312?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/962874113024812312/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2011/05/xml-into-sql.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/962874113024812312'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/962874113024812312'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2011/05/xml-into-sql.html' title='XML into SQL'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-8714299095876782933</id><published>2011-05-12T15:40:00.000+01:00</published><updated>2011-05-13T21:30:14.330+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Getting the Real Name of a Temp Table</title><content type='html'>If you've ever wanted to retrieve the schema for a temp table, you probably noticed that in the TempDb database, the name of your temp table is not quite the same (unless you're using a global temp table - this entry does not apply to those). Say that you create a temp table named "#tmp" in your development database. If you go into TempDb, for example in the Sys.Tables view, you will find your table name, but padded on the right with underscores and a string of digits, to a length of 128 characters. So if you try to look up the schema in TempDb.Information_Schema.Columns using "#tmp", it will fail. To alleviate this problem, I wrote a simple SELECT that returns the true name of that temp table as it is stored in TempDb:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;SELECT TempTableName = OBJECT_NAME(OBJECT_ID('TempDb..#tmp'), (SELECT Database_Id FROM SYS.DATABASES WHERE Name = 'TempDb'))&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-8714299095876782933?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/8714299095876782933/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2011/05/getting-real-name-of-temp-table.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/8714299095876782933'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/8714299095876782933'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2011/05/getting-real-name-of-temp-table.html' title='Getting the Real Name of a Temp Table'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-7377607246354749973</id><published>2010-08-23T20:08:00.002Z</published><updated>2010-08-23T20:13:20.723Z</updated><title type='text'>Getting Row Counts</title><content type='html'>Have you ever tried to get the row count of a table by a simple "SELECT COUNT(*)" statement, and been stupefied by how long it took to return the result? &lt;a href="http://www.mssqlcity.com/Articles/KnowHow/RowCount.htm"&gt;This article&lt;/a&gt; demonstrates how to use sysindexes to speed up that row count:&lt;br /&gt;&lt;br /&gt;&lt;i&gt;SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table_name') AND indid &lt; 2&lt;/i&gt;&lt;br&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-7377607246354749973?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/7377607246354749973/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/08/getting-row-counts.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/7377607246354749973'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/7377607246354749973'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/08/getting-row-counts.html' title='Getting Row Counts'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-789226401425826819</id><published>2010-08-23T19:06:00.002Z</published><updated>2010-08-23T19:10:07.669Z</updated><title type='text'>Params for Pivoting</title><content type='html'>So let's add a second parameter to indicate the format of the return set. The default value will return it as&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;AttribName    AttribValue    Row Number&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;----------    -----------    ----------&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;&lt;/span&gt;&lt;br /&gt;and the other value will return the fully-pivoted data set:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;AttribName    AttribValue_Row1  ...  AttribValue_RowN&lt;br /&gt;----------    ----------------  ...  ----------------&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-789226401425826819?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/789226401425826819/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/08/params-for-pivoting.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/789226401425826819'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/789226401425826819'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/08/params-for-pivoting.html' title='Params for Pivoting'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-1570325083780392864</id><published>2010-08-19T18:34:00.000Z</published><updated>2010-08-19T18:35:19.561Z</updated><title type='text'>How to Pivot</title><content type='html'>How about we construct a CTE using dynamic SQL, and we pivot the attribute values by constructing a series of SELECT...UNION SELECT statements, so that we have one SELECT per attribute???&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-1570325083780392864?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/1570325083780392864/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/08/how-to-pivot.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/1570325083780392864'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/1570325083780392864'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/08/how-to-pivot.html' title='How to Pivot'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-6361214209592017634</id><published>2010-08-19T17:50:00.003Z</published><updated>2010-08-19T18:05:25.255Z</updated><title type='text'>PivotTable SQL Code</title><content type='html'>I want to write TSQL code that will pivot any given table. The way I imagine it, is that I implement this code as a stored proc, which accepts a table name as parameter. It returns a result set comprising the perfect pivot of that table. The result set need not be named, but could be inserted into a table. It will have to convert all values to varchar (or char for easier printing; perhaps can have a switch as a parameter).&lt;br /&gt;&lt;br /&gt;Structure of return set:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;AttribName    AttribValue_Row1  ...  AttribValue_RowN&lt;br /&gt;----------    ----------------  ...  ----------------&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;So, how do we do this? We can easily pivot the column names into values for 'AttribName' by pulling from Sys.Columns view. The problem then becomes how do we pivot the attribute values into the genericized column?&lt;br /&gt;&lt;br /&gt;We need to first pivot the data into this structure:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;AttribName    AttribValue    Row Number&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;----------    -----------    ----------&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Then we can self join the data set as many times as there are rows in the original data. Or not.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-6361214209592017634?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/6361214209592017634/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/08/pivottable-sql-code.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/6361214209592017634'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/6361214209592017634'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/08/pivottable-sql-code.html' title='PivotTable SQL Code'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-3215771618749534601</id><published>2010-08-19T15:59:00.000Z</published><updated>2010-08-19T16:00:09.753Z</updated><title type='text'>Stairway to Database Design</title><content type='html'>"Designing a database is very easy to do, but hard to do well. There are many databases that have been built in the world that do not meet the needs of the applications they support.&lt;br /&gt;&lt;br /&gt;The fundamentals of design are often not taught to developers or DBAs when they are given the task of building tables. Many people do not understand the logic used to decide what columns to use, the data types for each one, constraints between columns, and the keys that should be defined.&lt;br /&gt;&lt;br /&gt;Joe Celko, widely viewed as an expert in the SQL language, brings us a series that looks to help you understand the fundamentals of the design process. &lt;a href="http://www.sqlservercentral.com/articles/Stairway+Series/70199/"&gt;The articles in the series are linked below&lt;/a&gt;..."&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-3215771618749534601?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/3215771618749534601/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/08/stairway-to-database-design.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/3215771618749534601'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/3215771618749534601'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/08/stairway-to-database-design.html' title='Stairway to Database Design'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-7278655965861989521</id><published>2010-08-09T18:47:00.000Z</published><updated>2010-08-09T18:48:28.636Z</updated><title type='text'>Random Rows</title><content type='html'>Here's an easy way to select a random row from a table:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;SELECT TOP 1 * FROM Test4 ORDER BY NEWID()&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-7278655965861989521?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/7278655965861989521/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/08/random-rows.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/7278655965861989521'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/7278655965861989521'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/08/random-rows.html' title='Random Rows'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-303079467354947818</id><published>2010-08-03T20:50:00.003+01:00</published><updated>2010-08-05T16:01:33.507+01:00</updated><title type='text'>Data Profiling, Data Quality</title><content type='html'>A couple of good Wikipedia articles on &lt;a href="http://en.wikipedia.org/wiki/Data_profiling"&gt;data profiling&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Data_quality"&gt;data quality&lt;/a&gt;, and one on the (relatively) new &lt;a href="http://www.sqlservercentral.com/articles/Integration+Services+%28SSIS%29/64133/"&gt;data profiling tool in SQL Server 2008 SSIS&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-303079467354947818?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/303079467354947818/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/08/data-profiling-data-quality.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/303079467354947818'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/303079467354947818'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/08/data-profiling-data-quality.html' title='Data Profiling, Data Quality'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-5876220887758898990</id><published>2010-07-30T16:15:00.002+01:00</published><updated>2010-07-30T16:22:49.354+01:00</updated><title type='text'>SQL Date Functions Syntax</title><content type='html'>One thing that drives me a little crazy with T-SQL is the syntax of the date functions:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;DATEADD &lt;b&gt;(&lt;/b&gt;&lt;i&gt;datepart &lt;/i&gt;&lt;b&gt;,&lt;/b&gt; number, d&lt;i&gt;ate &lt;/i&gt;&lt;b&gt;)&lt;br /&gt;&lt;/b&gt;DATEDIFF &lt;b&gt;(&lt;/b&gt; &lt;i&gt;datepart &lt;/i&gt;&lt;b&gt;, &lt;/b&gt;&lt;i&gt;startdate &lt;/i&gt;&lt;b&gt;, &lt;/b&gt;&lt;i&gt;enddate &lt;/i&gt;&lt;b&gt;)&lt;br /&gt;&lt;/b&gt;DATEPART &lt;b&gt;(&lt;/b&gt; &lt;i&gt;datepart &lt;/i&gt;&lt;b&gt;, &lt;/b&gt;&lt;i&gt;date &lt;/i&gt;&lt;b&gt;)&lt;br /&gt;&lt;br /&gt;&lt;/b&gt;&lt;/pre&gt;The designers were consistent with putting the 'datepart' as the first parameter, but for DATEADD, why did they made the base date the last parameter? I think it should be consistent with DATEPART and DATEDIFF, so that the second parameter is always a DATETIME.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-5876220887758898990?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/5876220887758898990/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/07/sql-date-functions-syntax.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/5876220887758898990'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/5876220887758898990'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/07/sql-date-functions-syntax.html' title='SQL Date Functions Syntax'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-5441963050411080605</id><published>2010-07-30T14:09:00.000+01:00</published><updated>2010-07-30T14:11:19.039+01:00</updated><title type='text'>Tips to optimize your SQL statements</title><content type='html'>&lt;a href="http://www.sqlservercentral.com/articles/Performance+Tuning/70647/"&gt;Interesting article&lt;/a&gt; on &lt;a href="http://www.SQLServerCentral.com"&gt;SQLServerCentral.com&lt;/a&gt; By Brian Ellul, 2010/07/29:&lt;br /&gt;&lt;br /&gt;There is a huge difference between writing an SQL statement which works  and one which works well and performs well. Sometimes developers are too  focused on just writing their SQL to perform the required task, without  taking into consideration its performance and most importantly its  impact on the SQL Server Instance, i.e. the amount of CPU, IO, and  memory resources their SQL is consuming&lt;strong&gt;.&lt;/strong&gt; Thus, they  starve other SQL Server processes during the SQL statement execution  bringing the whole instance to its knees. This article is intended to  provide the SQL developer with a set of easy checks to perform to try  and optimize the SQL Statements."&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-5441963050411080605?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/5441963050411080605/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/07/tips-to-optimize-your-sql-statements.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/5441963050411080605'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/5441963050411080605'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/07/tips-to-optimize-your-sql-statements.html' title='Tips to optimize your SQL statements'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-8168911465743942008</id><published>2010-07-27T22:00:00.000+01:00</published><updated>2010-07-27T22:01:37.123+01:00</updated><title type='text'>Level of measurement</title><content type='html'>Interesting article in Wikipedia about metrics, and the different sorts:&lt;br /&gt;&lt;br /&gt;The "&lt;a href="http://en.wikipedia.org/wiki/Level_of_measurement"&gt;&lt;b&gt;levels of measurement&lt;/b&gt;&lt;/a&gt;", or &lt;b&gt;scales of measure&lt;/b&gt; are expressions that typically refer to the &lt;b&gt;theory of scale types&lt;/b&gt; developed by the psychologist &lt;a href="http://en.wikipedia.org/wiki/Stanley_Smith_Stevens" title="Stanley Smith Stevens"&gt;Stanley Smith Stevens&lt;/a&gt;. Stevens proposed his theory in a 1946 &lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/Science_%28journal%29" title="Science (journal)"&gt;Science&lt;/a&gt;&lt;/i&gt; article titled "On the theory of scales of measurement"&lt;sup id="cite_ref-0" class="reference"&gt;&lt;a href="http://en.wikipedia.org/wiki/Level_of_measurement#cite_note-0"&gt;&lt;span&gt;[&lt;/span&gt;1&lt;span&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/sup&gt;. In this article Stevens claimed that all &lt;a href="http://en.wikipedia.org/wiki/Measurement" title="Measurement"&gt;measurement&lt;/a&gt; in science was conducted using four different types of scales that he called "nominal", "ordinal", "interval" and "ratio".&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-8168911465743942008?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/8168911465743942008/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/07/level-of-measurement.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/8168911465743942008'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/8168911465743942008'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/07/level-of-measurement.html' title='Level of measurement'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-174382445829070290</id><published>2010-07-26T15:14:00.002+01:00</published><updated>2010-07-26T15:17:15.858+01:00</updated><title type='text'>NK Search</title><content type='html'>I just tested the NK search algorithm on an unindexed table of 9 columns and over 10 million rows, and it ran for 4 minutes before returning the result that no natural key of 4 columns or fewer exists. This ran 255 queries in that time. The biggest performance boost was the use of the sample check, whereby the duplicate count is ran on the TOP 1000 rows first, then the full data set.&lt;br /&gt;&lt;br /&gt;I'm polishing off the code and should post it by the end of the week.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-174382445829070290?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/174382445829070290/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/07/nk-search.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/174382445829070290'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/174382445829070290'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/07/nk-search.html' title='NK Search'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-7095896329549550834</id><published>2010-07-15T17:11:00.003+01:00</published><updated>2010-07-15T17:14:00.026+01:00</updated><title type='text'>New and Improved Natural Key Discovery Algorithm</title><content type='html'>While preparing for my presentation on code I wrote that discovers natural keys,for the Baltimore SQL Server Users Group meeting on Monday, I discovered a blind spot in the design. I hit the panic button a few days ago, but today I think I've got a new solution that solves a lot of problems.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-7095896329549550834?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/7095896329549550834/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/07/new-and-improved-natural-key-discovery.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/7095896329549550834'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/7095896329549550834'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/07/new-and-improved-natural-key-discovery.html' title='New and Improved Natural Key Discovery Algorithm'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-3939251192695515234</id><published>2010-07-14T19:30:00.003+01:00</published><updated>2010-07-26T15:26:07.307+01:00</updated><title type='text'>Table Variables and Query Optimizer</title><content type='html'>In yesterday's post I mentioned that I replaced the temp tables in a stored procedure with table variables. This worked perfectly functionally, but doomed the performance by orders of magnitude. The problem was that the query optimizer was choosing MERGE JOINs instead of HASH JOINs for the table variables. Forcing the hash joins with query hints fixed the problem.&lt;br /&gt;&lt;br /&gt;[Ed. note 7/26]&lt;a href="http://ask.sqlservercentral.com/questions/15150/temp-table-performance"&gt;Here's another interesting discussion&lt;/a&gt; about this. Note the post that reads "And also, because table variables do not have statistics, the query  optimizer will often take dubious choices because it believes the row  count to be 1 all the time - leading to inappropriate join methods  etc..."&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-3939251192695515234?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/3939251192695515234/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/07/table-variables-and-query-optimizer.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/3939251192695515234'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/3939251192695515234'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/07/table-variables-and-query-optimizer.html' title='Table Variables and Query Optimizer'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-1860778016380710656</id><published>2010-07-13T21:51:00.001+01:00</published><updated>2010-07-13T21:53:28.777+01:00</updated><title type='text'>VS2008 SSRS and Temp Tables in Spds</title><content type='html'>Ran into a problem using VS2008 to report the results of data from a SQL Server 2005 stored procedure. Apparently there is an issue with VS2008 and stored procedures that make use of temp tables. According to &lt;a href="http://www.eggheadcafe.com/software/aspnet/31537275/reporting-services-and-temp-tables.aspx"&gt;this article&lt;/a&gt;, I replaced all of the temp tables with variable tables, and the problem went away.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-1860778016380710656?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/1860778016380710656/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/07/vs2008-ssrs-and-temp-tables-in-spds.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/1860778016380710656'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/1860778016380710656'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/07/vs2008-ssrs-and-temp-tables-in-spds.html' title='VS2008 SSRS and Temp Tables in Spds'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-9186128213830361061</id><published>2010-07-13T03:40:00.003+01:00</published><updated>2010-07-13T03:49:48.694+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><category scheme='http://www.blogger.com/atom/ns#' term='code review'/><title type='text'>Code Review, Years Later</title><content type='html'>I'm reviewing code I wrote a couple of years ago to &lt;a href="http://www.sqlservercentral.com/scripts/T-SQL/62086/"&gt;determine natural keys of raw data sets&lt;/a&gt;, in order to do a talk on it at the &lt;a href="http://bssug.org/"&gt;local SQL Server users' group&lt;/a&gt; meeting next week. After an initial reading, I thought that the code worked a certain way (doing breadth-first searches), but it actually works in a hybrid breadth/depth first fashion.&lt;br /&gt;&lt;br /&gt;The first thought one might have would be something along the lines of "comment much?", but truthfully, I do, or at least I think I do. The code is found in a SQL Server Central article, explaining much of the background of the problem I was trying to solve, and the code itself has an entire page of comments, and yet I find now that there simply isn't one concise explanation of the resultant design of my algorithm.&lt;br /&gt;&lt;br /&gt;There must be a magic number of weeks after code is finished and polished off, when the original intent of the developer is still in his or her brain, but beginning to fade. &lt;span style="font-weight: bold;"&gt;That&lt;/span&gt; is the perfect time to finish commenting the code, because at that time the developer will have to approach the code as an outsider, but still be able to summon those fuzzy thoughts about why something was done a certain way. Does anyone know what that magic number of weeks is?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-9186128213830361061?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/9186128213830361061/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/07/code-review-years-later.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/9186128213830361061'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/9186128213830361061'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/07/code-review-years-later.html' title='Code Review, Years Later'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-425013451440230478</id><published>2010-07-11T17:55:00.003+01:00</published><updated>2010-07-11T17:56:35.039+01:00</updated><title type='text'>Recursive Stored Procedures and Temp Tables</title><content type='html'>If a stored procedure creates a temp table and then calls itself recursively, will the newly instantiated stored procedure have access to the temp table?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-425013451440230478?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/425013451440230478/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/07/recursive-stored-procedures-and-temp.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/425013451440230478'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/425013451440230478'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/07/recursive-stored-procedures-and-temp.html' title='Recursive Stored Procedures and Temp Tables'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-1464060284620501990</id><published>2010-07-02T21:37:00.005+01:00</published><updated>2010-07-02T21:58:05.867+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Temp Table Scope</title><content type='html'>One of the gotchas of TSQL is temp table scope. The fact that I can create a temp table within a stored procedure, then call a second stored procedure and reference that temp table, scares me a little. It reminds me of the 3G languages, with their global variables and modified private scope and such.&lt;br /&gt;&lt;br /&gt;A big enhancement in SQL in this department: table variables. The scope of a table variable is the same as that of a "regular" variable - local to the scope of the procedure. So replacing the use of a temp table in a stored procedure eliminates the concern that a called inner procedure will unexpectedly change the data. However, there are major performance declines when using table variables to hold large amounts of data, as &lt;a href="http://www.sql-server-performance.com/articles/per/temp_tables_vs_variables_p1.aspx"&gt;this article explains&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Another link: &lt;a href="http://www.mssqltips.com/tip.asp?tip=1556"&gt;http://www.mssqltips.com/tip.asp?tip=1556&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-1464060284620501990?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/1464060284620501990/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/07/temp-table-scope.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/1464060284620501990'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/1464060284620501990'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/07/temp-table-scope.html' title='Temp Table Scope'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-3594886381795052842</id><published>2010-06-30T16:21:00.005+01:00</published><updated>2010-07-01T02:18:03.652+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='admin'/><category scheme='http://www.blogger.com/atom/ns#' term='hardware'/><title type='text'>Installing Second Hard Drive</title><content type='html'>DISCLAIMER: I am not a hardware guy, nor am I a sysadmin type. But I'm trying to do things on the cheap in the DIY spirit. A friend gave me some old hardware, and I'm using it to try and upgrade my old desktop. I had some misadventures last night trying to install a second IDE drive on my Dell Optiplex 270 (yes I know it's old, it's just a lab computer). The drive is a 40gb Seagate that I'm planning on using for TempDb. When I first put it in, I didn't change the jumper settings, so the pc thought I had two master IDE drives. This caused BIG problems in BIOS. I could no longer boot, and my computer no longer recognized my original master drive. I changed the jumper settings of the second drive and tried to reboot - no luck. Now the pc was reporting TWO unknown drives (which I suppose is progress). I played around with the BIOS settings, but again, no luck. Then today after googling the task, I came across &lt;a href="http://www.helpwithpcs.com/upgrading/install-hard-drive.htm"&gt;this how-to article&lt;/a&gt; that explains that the IDE cable that connects the IDE drives to the motherboard must be plugged in to the drives in a very specific way. The instructions worked perfectly. When I booted up in Windows, I noticed that that drive had two partitions of 20gb each. Wanting to use the entire drive for my new TempDb, I researched &lt;a href="http://wiki.answers.com/Q/How_do_you_remove_a_partition_on_your_hard_drive"&gt;how to delete the partitions and reformat&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-3594886381795052842?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/3594886381795052842/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/06/installing-second-hard-drive.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/3594886381795052842'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/3594886381795052842'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/06/installing-second-hard-drive.html' title='Installing Second Hard Drive'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-4565678198246878528</id><published>2010-06-29T14:41:00.003+01:00</published><updated>2010-06-29T15:47:11.509+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='TSQL Challenge'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>T-SQL Challenge #33</title><content type='html'>Just finished &lt;a href="http://beyondrelational.com/blogs/tc/archive/2010/06/28/TSQL-Challenge-33-Calculate-list-of-bookings-fully-paid-on-the-booking-date.aspx"&gt;T-SQL Challenge #33&lt;/a&gt;. It was less challenging than some of the others (I finished it in under 20 minutes), but the problem was interesting enough and still required the declarative thinking that is the aim of these challenges.&lt;br /&gt;&lt;br /&gt;Also last week &lt;a href="http://www.sqlservercentral.com/Authors/Articles/RBarry_Young/659055/"&gt;R. Barry Young&lt;/a&gt; published articles on how to gain this mode of thinking: "&lt;a href="http://www.sqlservercentral.com/articles/T-SQL/66097/"&gt;There Must Be 15 Ways To Lose Your Cursors... part 1, Introduction&lt;/a&gt;".&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-4565678198246878528?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/4565678198246878528/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/06/t-sql-challenge-33.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4565678198246878528'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4565678198246878528'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/06/t-sql-challenge-33.html' title='T-SQL Challenge #33'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-4520030011215240587</id><published>2010-06-24T04:33:00.002+01:00</published><updated>2010-06-30T21:48:51.453+01:00</updated><title type='text'>SQL Search Tool</title><content type='html'>I'm not big on 3rd party plug-in tools, and am not much for shilling for companies, but I LOVE Red Gate's &lt;a href="http://www.red-gate.com/products/SQL_Search/"&gt;SQL Search product&lt;/a&gt;. It makes searching your database for text fragments very easy (like when you analyze the impact of changing a column name, for example), and displays the results in an easy-to-use interface that enables you to click and edit the affected objects.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-4520030011215240587?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/4520030011215240587/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/06/sql-search-tool.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4520030011215240587'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4520030011215240587'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/06/sql-search-tool.html' title='SQL Search Tool'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-8178424005271526205</id><published>2010-06-11T06:05:00.002+01:00</published><updated>2010-06-11T06:09:31.305+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Needle in a Haystack</title><content type='html'>While working on loading 13m rows into a staging table, a name-parsing routine came across unexpected data, and the load failed. One of the greatest drawbacks to set-based ETL such as T-SQL, versus row- or batch-based ETL tools such as Informatica or SSIS, is that data discrepancies cause failures that are difficult to diagnose. In this case, the load failed with the error:&lt;br /&gt;&lt;br /&gt;"Msg 537, Level 16, State 5, Procedure spd_Load_Stg_Names, Line 31&lt;br /&gt;Invalid length parameter passed to the LEFT or SUBSTRING function.&lt;br /&gt;The statement has been terminated."&lt;br /&gt;&lt;br /&gt;This is almost useless as far as finding the problem data, as it does not tell me what value, or which row, caused the failure.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-8178424005271526205?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/8178424005271526205/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/06/needle-in-haystack.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/8178424005271526205'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/8178424005271526205'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/06/needle-in-haystack.html' title='Needle in a Haystack'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-6090686451384447431</id><published>2010-05-23T17:53:00.003+01:00</published><updated>2010-05-23T19:52:12.945+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='file import'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Bulk Insert CSV with Text Qualifiers</title><content type='html'>One of the biggest shortcomings of SQL Server's BCP/Bulk Insert tool is the lack of specifying a text qualifier in a comma-delimited ("CSV"), text file for import. Let's take a look at an example of a record from a CSV file that we wish to import into a table:&lt;br /&gt;&lt;br /&gt;"John Smith", "123 Main St, Apt 3", "Anytown", "XX", "12345"&lt;br /&gt;&lt;br /&gt;So the format of the record is Name, Address, City, State, ZIP. The fields are separated by commas, and encapsulated with double-quotes. The "Address" field demonstrates the need for the text qualifier: the embedded comma that separates the first address line from the second. Without text qualifiers each line of address would appear as a separate field, rather than one discrete value. That in itself would not be a problem, but because some addresses only have one line, the number of delimiters becomes variable, and the import is then faulty.&lt;br /&gt;&lt;br /&gt;The problem crystalizes when we try to import a file of that format into SQL Server. The BCP/Bulk Insert utilities do not have an option to specify the text qualifier. We have a couple of "kloodgy" options before us: We can specify the delimiter as ','. The problem here is that every field encapsulated with the double quotes will retain that character in the database, leaving us to update the table to remove those characters. We can specify that the delimiter is '","' instead of ','; this is a step in the right direction, but it will still leave a double quote leading in the first field and trailing in the last, leaving us with less work than in the first case.&lt;br /&gt;&lt;br /&gt;Given the maturity of the SQL Server product, I'm surprised that Microsoft hasn't added this feature. I suppose that is their way of moving developers towards SSIS, which of course does have it.&lt;br /&gt;&lt;br /&gt;If we really want to properly import this file using BCP or BULK INSERT without any weird cleanup kludges, we have to use a format file. &lt;a href="http://msdn.microsoft.com/en-us/library/ms191516.aspx"&gt;Here's a good article on MSDN&lt;/a&gt; about how to create a format file from scratch using a BCP option. To complete my task, I will take the resultant format file and modify it to account for my comma delimiters and text qualifiers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-6090686451384447431?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/6090686451384447431/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2010/05/bulk-insert-csv-with-text-qualifiers.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/6090686451384447431'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/6090686451384447431'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2010/05/bulk-insert-csv-with-text-qualifiers.html' title='Bulk Insert CSV with Text Qualifiers'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-2982298838376908040</id><published>2009-12-08T19:37:00.003Z</published><updated>2010-06-30T22:04:59.623+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='TSQL Challenge'/><title type='text'>TSQL Challenge #18 Blues</title><content type='html'>I'm working on &lt;a href="http://beyondrelational.com/blogs/tc/archive/2009/11/30/tsql-challenge-18-generate-text-formatted-month-calendars.aspx"&gt;TSQL Challenge #18&lt;/a&gt;, and hitting up against a brick wall. The challenge involves building a calendar for given months of interest (month/year that is). So I build a CTE that first figures out the first &amp;amp; last day of the month, then the week numbers for those days. Matching the week numbers against a tally table, I can then tell which weeks that the days of a particular month will span - those weeks then become rows in the results. I then build a CTE to hold all of my days of interest, with the week number and day-of-month as columns. Using my 'Weeks' CTE as a base, I LEFT JOIN the 'Days' dataset seven times - one for each day of the week.&lt;br /&gt;&lt;br /&gt;This is the error I encounter: &lt;span style="color: rgb(255, 0, 0);font-family:courier new;" &gt;"The conversion of a char data type to a datetime data type resulted in an out-of-range datetime value."&lt;/span&gt; If I limit my day-related tally table to a max of 28 (rather than 31), the error disappears, but I don't get any days in my calendar past the 28th of course. The weird thing is, if I set the tally to max of 31, and I select from the 'Days2' CTE, I don't get an error, and all the days I expect to see are present. Something strange is going on under the hood of SQL.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-2982298838376908040?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/2982298838376908040/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/12/tsql-challenge-18.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2982298838376908040'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2982298838376908040'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/12/tsql-challenge-18.html' title='TSQL Challenge #18 Blues'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-2995483376837067524</id><published>2009-12-03T18:53:00.004Z</published><updated>2009-12-04T21:14:09.370Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='ETL'/><category scheme='http://www.blogger.com/atom/ns#' term='DW'/><title type='text'>Kimball Design Tip #119 Updating the Date Dimension</title><content type='html'>I've always thought of the Time or Date dimension to be a static one, rather than slowly-changing, until I read this &lt;a href="http://www.kimballgroup.com/html/designtips.html"&gt;tip from Kimball U&lt;/a&gt;: "...there are some attributes you may add to the basic date dimension that will change over time. These include indicators such as IsCurrentDay, IsCurrentMonth, IsPriorDay, IsPriorMonth, and so on. IsCurrentDay obviously must be updated each day."&lt;br /&gt;&lt;br /&gt;Creating and automatically updating these attributes of the Date dimension will save time (sorry, no pun intended), for report writers and data mart developers, and also standardize such calculations across the enterprise. For example, let's say you are responsible for the data warehouse of a catalog company, and you want to calculate lag time between ordering and shipping, a common metric of customer satisfaction. Furthermore, management decides to give everyone an extra holiday as a bonus for company profitability from last year, and so lets everyone take off Groundhog Day. To account for this in your metric, you are going to want to have an attribute like "BusinessDayOfYear" as an incrementing integer, so that the first business day in January gets assigned BusinessDayOfYear=1, the second day is 2, and so forth, until you have the highest number as the last business day in December (which is most likely Dec 30th). If the company now has Groundhog Day as a holiday, then the IsHoliday for Feb 2nd is changed from 0 to 1, and then the BusinessDayOfYear attribute is recalculated.&lt;br /&gt;&lt;br /&gt;Calculating how many business days between ordering and shipping is then trivial. Of course, this does not account for orders placed in December that ship in January, so you might want to have an attribute BusinessDayLifetime, which starts at 1 the first day the company was in business, and increments freely forever.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-2995483376837067524?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/2995483376837067524/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/12/kimball-design-tip-119-updating-date.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2995483376837067524'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2995483376837067524'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/12/kimball-design-tip-119-updating-date.html' title='Kimball Design Tip #119 Updating the Date Dimension'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-256290849696089794</id><published>2009-11-30T16:01:00.003Z</published><updated>2009-11-30T16:46:09.834Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='CEP'/><category scheme='http://www.blogger.com/atom/ns#' term='SSC'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Building a Yahoo Finance input adapter for SQL Server StreamInsight</title><content type='html'>&lt;a href="http://www.sqlservercentral.com/articles/StreamInsight/68375/"&gt;Here's an excellent SSC article&lt;/a&gt; by &lt;a href="http://blogical.se/blogs/jahlen/default.aspx"&gt;Johan Åhlén&lt;/a&gt; on building &lt;a href="http://en.wikipedia.org/wiki/Complex_event_processing"&gt;CEP (Complex Event Processing) &lt;/a&gt;solutions in SQL 2008. "What is StreamInsight and what is it good for? StreamInsight is a platform for developing and deploying applications that handle high-speed streaming data. It could be used for near real-time processing of data from production environments, structured data such as financial information and unstructured data such as Facebook, Twitter and blogs. Multiple sources can be combined and refined before they are being output. This technology is called CEP - complex event processing.&lt;br /&gt;&lt;br /&gt;"Combine StreamInsight with data mining, and you can have real-time fraud detection, stock market prediction, you name it... Imagine a real-time Business Intelligence-application where the management can actually see the KPI gauges moving on the screen. &lt;strong&gt;I would say that monitoring real-time flow of information is one of the key success factors for tomorrow's Business Intelligence.&lt;/strong&gt; This is also supported by Ralph Kimball, saying in an interview that Business Intelligence is moving from the strategic level towards operational level processes such as customer support, logistics and sales. At the operational level data must be continuously updated, not just once per day. I would add also that the new generation, that has grown up with Facebook and Twitter, will make it necessary to monitor new sources for successful Business Intelligence."&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-256290849696089794?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/256290849696089794/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/11/building-yahoo-finance-input-adapter.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/256290849696089794'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/256290849696089794'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/11/building-yahoo-finance-input-adapter.html' title='Building a Yahoo Finance input adapter for SQL Server StreamInsight'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-4777235479656733310</id><published>2009-11-23T15:56:00.003Z</published><updated>2009-11-23T16:08:24.586Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='ETL'/><category scheme='http://www.blogger.com/atom/ns#' term='DW'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Data Vault Institute</title><content type='html'>While reading the &lt;a href="http://www.sqlservercentral.com/Forums/Topic823043-1693-1.aspx"&gt;comments&lt;/a&gt; on an SSC article about fault-tolerant ETL loading, I came across a link to the &lt;a href="http://www.datavaultinstitute.com/"&gt;Data Vault Institute&lt;/a&gt;, and what sparked my attention was the comment by &lt;a href="http://www.danlinstedt.com/"&gt;Dan Linstedt&lt;/a&gt; that "today's data warehouse has become a system of record. Due in part for the need of compliance." This struck me as an important point, given that many of the databases on which I've worked in the past were marketing databases, where bad data was generally disposed of. But in financial, health care, and governmental data warehouses, this approach would be completely unsatisfactory.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-4777235479656733310?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/4777235479656733310/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/11/data-vault-institute.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4777235479656733310'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4777235479656733310'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/11/data-vault-institute.html' title='Data Vault Institute'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-5786647300700171968</id><published>2009-11-16T14:34:00.000Z</published><updated>2009-11-16T14:35:53.797Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='TSQL Challenge'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>TSQL Challenge #17</title><content type='html'>&lt;a href="http://beyondrelational.com/blogs/tc/archive/2009/11/16/tsql-challenge-17-creating-cross-rows-references-with-inline-hyperlinks.aspx"&gt;"TSQL Challenge 17 - Creating cross rows references with inline hyperlinks"&lt;/a&gt; is now out. Due in - a couple of weeks?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-5786647300700171968?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/5786647300700171968/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/11/tsql-challenge-17.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/5786647300700171968'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/5786647300700171968'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/11/tsql-challenge-17.html' title='TSQL Challenge #17'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-2944934355993662788</id><published>2009-11-09T18:13:00.004Z</published><updated>2009-11-10T06:51:04.320Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Intelligent Keys</title><content type='html'>I'm engaged in an internal debate about the use of &lt;a href="http://www.bcarter.com/intsurr1.htm"&gt;intelligent versus surrogate keys.&lt;/a&gt; Typically when this issue arises, the debate centers around whether we want to use a key that is already present in the data (such as SSN in an employee table - this is an intelligent key, also known as a &lt;a href="http://en.wikipedia.org/wiki/Natural_key"&gt;natural key&lt;/a&gt;), or if it's better to generate a new meaningless key (such as an auto-incrementing integer - this is a &lt;a href="http://en.wikipedia.org/wiki/Surrogate_key"&gt;surrogate key&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;Now the internal debate isn't over that issue per se - I fall on the side that favors the surrogate key creation. The real debate I'm in is whether it's okay to create an &lt;em&gt;intelligent surrogate key&lt;/em&gt;. The most typical surrogate as mentioned previously is an auto-incrementing integer identity - every time a row is inserted into the table, a new key is created by adding one to the max value. These keys have zero business meaning - that's their advantage, that they are decoupled from the business data. However, there are situations where it makes sense to create this value intelligently. One example is creating a time dimension in a data warehouse, whereby the primary key consists of an integer in the form "YYYYMMDD". Microsoft favors this method (as I discovered in their &lt;a href="http://www.amazon.com/MCTS-Self-Paced-Training-Exam-70-448/dp/0735626367/ref=sr_1_1?ie=UTF8&amp;amp;s=books&amp;amp;qid=1257790985&amp;amp;sr=1-1"&gt;training kit for exam 70-448&lt;/a&gt;). A big advantage to this approach is that if you create a clustered index on that intelligent surrogate key, all of your data will be sorted in date order (of course, if you insert into that table by earliest date first, it will also be in that order - unless you add an earlier time period at a later date).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-2944934355993662788?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/2944934355993662788/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/11/intelligent-keys.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2944934355993662788'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2944934355993662788'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/11/intelligent-keys.html' title='Intelligent Keys'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-8913920321075716445</id><published>2009-11-09T15:05:00.000Z</published><updated>2009-11-09T15:06:04.882Z</updated><title type='text'>TSQL Challenge #16 Out</title><content type='html'>I missed it (was out a week ago): &lt;a href="http://beyondrelational.com/blogs/tc/archive/2009/11/02/tsql-challenge-16-find-intersections-in-date-ranges-and-concatenate-aggregated-labels.aspx"&gt;http://beyondrelational.com/blogs/tc/archive/2009/11/02/tsql-challenge-16-find-intersections-in-date-ranges-and-concatenate-aggregated-labels.aspx&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-8913920321075716445?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/8913920321075716445/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/11/tsql-challenge-16-out.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/8913920321075716445'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/8913920321075716445'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/11/tsql-challenge-16-out.html' title='TSQL Challenge #16 Out'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-2366769100428465011</id><published>2009-10-27T14:15:00.006Z</published><updated>2009-11-10T18:55:42.098Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='ETL'/><category scheme='http://www.blogger.com/atom/ns#' term='BI'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><category scheme='http://www.blogger.com/atom/ns#' term='101'/><title type='text'>SQL/ETL/BI 101</title><content type='html'>I'm going to compile a list of articles that provide introductions to topics in SQL and BI.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.sqlservercentral.com/articles/Indexing/68439/"&gt;Introduction to Indexes&lt;/a&gt;, by &lt;a href="http://www.sqlservercentral.com/Authors/Articles/Gail_Shaw/148127/"&gt;Gail Shaw&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.databasejournal.com/features/article.php/3593466/MS-SQL-Series.htm"&gt;Index of MS-SQL Articles on DatabaseJournal.com&lt;/a&gt;, a gold mine of introductory articles.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.b-eye-network.com/view/11352"&gt;"What a Data Warehouse is Not"&lt;/a&gt; by Bill Inmon.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-2366769100428465011?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/2366769100428465011/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/10/sql-101.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2366769100428465011'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2366769100428465011'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/10/sql-101.html' title='SQL/ETL/BI 101'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-5930615805086062342</id><published>2009-10-23T20:01:00.002Z</published><updated>2009-10-23T20:05:23.813Z</updated><title type='text'>SSIS Data Cleansing Webinar</title><content type='html'>I watched a good webinar on &lt;a href="http://blogs.pragmaticworks.com/pragmatic-works/2009/10/data-cleansing-webinar.html"&gt;SSIS Data Cleansing&lt;/a&gt;, by Brian Knight and the good folks of &lt;a href="http://www.pragmaticworks.com/"&gt;Pragmatic Works&lt;/a&gt; of Jacksonville FL. "In this session with SQL Server MVP, Brian Knight, you'll learn how to cleanse your data and apply business rules to your data in SSIS. Learn how to solve complex data problems quickly in SSIS using simple techniques in the data flow. Brian will start by showing you the Data Profiling Task. Then he'll show how to use transforms like Fuzzy Grouping to de-duplicate your data and SSIS scripts to satisfy common scenarios he sees in the industry."&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-5930615805086062342?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/5930615805086062342/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/10/ssis-data-cleansing-webinar.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/5930615805086062342'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/5930615805086062342'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/10/ssis-data-cleansing-webinar.html' title='SSIS Data Cleansing Webinar'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-2869563691989347704</id><published>2009-10-23T13:54:00.005Z</published><updated>2009-10-23T19:03:37.331Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='SSC'/><category scheme='http://www.blogger.com/atom/ns#' term='DBA'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Help, my database is corrupt. Now what?</title><content type='html'>&lt;p&gt;Found a &lt;a href="http://www.sqlservercentral.com/articles/65804/"&gt;good article on SSC about database corruption &lt;/a&gt;by &lt;a href="http://www.sqlservercentral.com/Authors/Articles/Gail_Shaw/148127/"&gt;Gail Shaw&lt;/a&gt;:&lt;/p&gt;&lt;p&gt;"What to do when the database is corrupt.&lt;/p&gt;&lt;ol&gt;&lt;li&gt;Don't panic&lt;/li&gt;&lt;li&gt;Don't detach the database&lt;/li&gt;&lt;li&gt;Don't restart SQL&lt;/li&gt;&lt;li&gt;Don't just run repair.&lt;/li&gt;&lt;li&gt;Run an integrity check&lt;/li&gt;&lt;li&gt;Afterwards, do a root-cause analysis"&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;Don't have a corrupt database, but still want to play in the sandbox? Click &lt;a href="http://sqlblogcasts.com/blogs/tonyrogerson/archive/2007/03/10/how-to-create-a-corrupt-database-using-bulk-insert-update-and-bcp-sql-server-as-a-hex-editor.aspx"&gt;here&lt;/a&gt; and &lt;a href="http://sqlblog.com/blogs/linchi_shea/archive/2007/08/29/how-to-corrupt-a-sql-server-2005-database.aspx"&gt;here&lt;/a&gt; for a couple of ways to &lt;em&gt;corrupt&lt;/em&gt; your database. (Don't do this to production data!)&lt;/p&gt;&lt;p&gt;And if all else fails, there's a product called &lt;a href="http://www.officerecovery.com/mssql/index.htm"&gt;Recovery for SQL Server&lt;/a&gt; to help fix your files.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-2869563691989347704?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/2869563691989347704/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/10/help-my-database-is-corrupt-now-what.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2869563691989347704'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2869563691989347704'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/10/help-my-database-is-corrupt-now-what.html' title='Help, my database is corrupt. Now what?'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-6440140627228017602</id><published>2009-10-22T14:59:00.001Z</published><updated>2009-10-22T15:02:11.534Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='SSC'/><category scheme='http://www.blogger.com/atom/ns#' term='DBA'/><category scheme='http://www.blogger.com/atom/ns#' term='Query Performance'/><title type='text'>SQL Server Execution Plans</title><content type='html'>&lt;a href="http://www.sqlservercentral.com/articles/books/65831/"&gt;Here's a link on SSC&lt;/a&gt; to a free PDF download of Grant Fitchley's "SQL Server Execution Plans", which is, funnily enough, about how to interpret and act upon an execution plan in SQL Server. "Every day, out in the various SQL Server forums, the same types of questions come up again and again: why is this query running slow? Why isn't my index getting used? And on and on. In order to arrive at the answer you have to ask the same return question in each case: have you looked at the execution plan? " – Grant Fritchey&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-6440140627228017602?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/6440140627228017602/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/10/sql-server-execution-plans.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/6440140627228017602'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/6440140627228017602'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/10/sql-server-execution-plans.html' title='SQL Server Execution Plans'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-2678722637921481827</id><published>2009-10-20T16:29:00.003Z</published><updated>2009-10-20T17:17:27.258Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='ETL'/><title type='text'>Kimball University: Six Key Decisions for ETL Architectures</title><content type='html'>&lt;a href="http://www.sqlservercentral.com/"&gt;SSC&lt;/a&gt; posted a link to a good article titled &lt;a href="http://intelligent-enterprise.informationweek.com/channels/information_management/showArticle.jhtml?articleID=220600174"&gt;"Kimball University: Six Key Decisions for ETL Architectures"&lt;/a&gt;, by &lt;a class="hpBlogAuthor" href="http://www.intelligententerprise.com/experts/kimball;jsessionid=TCXNFAWUVH3IJQE1GHPSKH4ATMY32JVN"&gt;Bob Becker&lt;/a&gt;. Although written for directors/managers, I think developers will also find many of the points useful to understanding the context of an ETL project.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-2678722637921481827?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/2678722637921481827/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/10/kimball-university-six-key-decisions.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2678722637921481827'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2678722637921481827'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/10/kimball-university-six-key-decisions.html' title='Kimball University: Six Key Decisions for ETL Architectures'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-1946794452331249843</id><published>2009-10-19T19:12:00.002Z</published><updated>2009-10-19T19:16:21.700Z</updated><title type='text'>The ‘Subscription List’ SQL Problem</title><content type='html'>SQL ServerCentral now has a "Stack-Overflow"-type forum for SQL questions, &lt;a href="http://ask.sqlservercentral.com/"&gt;ask.sqlservercentral.com&lt;/a&gt;. Phil Factor posted an interesting problem &lt;a href="http://ask.sqlservercentral.com/questions/92/the-subscription-list-sql-problem"&gt;here&lt;/a&gt; that asks developers to post solutions to a report on a subscription list.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-1946794452331249843?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/1946794452331249843/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/10/subscription-list-sql-problem.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/1946794452331249843'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/1946794452331249843'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/10/subscription-list-sql-problem.html' title='The ‘Subscription List’ SQL Problem'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-7689482700921791995</id><published>2009-10-19T14:47:00.004Z</published><updated>2009-10-19T18:49:15.308Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Celko'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Celko on Stats</title><content type='html'>&lt;a href="http://www.simple-talk.com/sql/t-sql-programming/causation,-correlation-and-crackpots/?utm_source=simpletalk&amp;amp;utm_medium=email&amp;amp;utm_content=Causation20090921&amp;amp;utm_campaign=SQL"&gt;Click here&lt;/a&gt; for a really good article by Joe Celko on statistics. He discusses the difference between causation and correlation, and how to compute such things in SQL.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-7689482700921791995?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/7689482700921791995/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/10/celko-on-stats.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/7689482700921791995'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/7689482700921791995'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/10/celko-on-stats.html' title='Celko on Stats'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-2375775645089249034</id><published>2009-10-19T14:21:00.003Z</published><updated>2009-10-22T15:03:00.177Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='TSQL Challenge'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>T-SQL Challenge #15</title><content type='html'>&lt;a href="http://beyondrelational.com/blogs/tc/archive/2009/10/19/TSQL-Challenge-15-matching-data-between-rows-and-columns.aspx"&gt;T-SQL Challenge #15&lt;/a&gt; is out. It requires the use of PIVOT; an example can be found on MSDN at the &lt;a href="http://msdn.microsoft.com/en-us/library/ms177634.aspx"&gt;bottom of this entry&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-2375775645089249034?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/2375775645089249034/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/10/t-sql-challenge-15.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2375775645089249034'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2375775645089249034'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/10/t-sql-challenge-15.html' title='T-SQL Challenge #15'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-46035001578235502</id><published>2009-10-13T19:15:00.002Z</published><updated>2009-10-13T19:18:29.441Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='DBA'/><category scheme='http://www.blogger.com/atom/ns#' term='Powershell'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Database Space Capacity Planning</title><content type='html'>Good article on &lt;a href="http://www.sqlservercentral.com/articles/powershell/68011/"&gt;"Database Space Capacity Planning"&lt;/a&gt; by &lt;a href="http://www.sqlservercentral.com/Authors/Articles/Chad_Miller/421/"&gt;Chad Miller&lt;/a&gt;, that uses Powershell to query the space capacity of every SQL Server on your network. From his summary: "The need to monitor and forecast database and volume space is a critical task for database administrators. You can use the process described in this article to create a consolidated space forecasting report, which focuses on a "days remaining" metric. In addition, the use of PowerShell to collect data and load into a SQL table as demonstrated in this article, provides a solution you can easily adapt to many database administration problems."&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-46035001578235502?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/46035001578235502/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/10/database-space-capacity-planning.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/46035001578235502'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/46035001578235502'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/10/database-space-capacity-planning.html' title='Database Space Capacity Planning'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-7629536359778424595</id><published>2009-10-03T19:25:00.002Z</published><updated>2009-10-03T19:30:16.153Z</updated><title type='text'>Are Row Based DBs the Problem in BI?</title><content type='html'>"Using a traditional, row-based database to run critical reporting and analytics systems is like entering a delivery truck in a Grand Prix race. It's just not what it was designed to do." - Dan Lahl, director of analytics for Sybase. Interesting article by &lt;a href="http://www.ciozone.com/index.php?option=com_myblog&amp;amp;blogger=charlesb2k&amp;amp;Itemid=732"&gt;charlesb2k&lt;/a&gt; on the pros and cons of transactional, star-schema, and columnar databases for BI and analytics: &lt;a href="http://www.ciozone.com/index.php?option=com_myblog&amp;amp;show=Are-Row-Based-DBs-the-Problem-in-BI-.html&amp;amp;Itemid=732"&gt;"Are Row Based DBs the Problem in BI?"&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-7629536359778424595?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/7629536359778424595/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/10/are-row-based-dbs-problem-in-bi.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/7629536359778424595'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/7629536359778424595'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/10/are-row-based-dbs-problem-in-bi.html' title='Are Row Based DBs the Problem in BI?'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-992650218143301822</id><published>2009-09-30T21:33:00.001Z</published><updated>2009-09-30T21:35:18.828Z</updated><title type='text'>Setting up Change Data Capture in SQL Server 2008</title><content type='html'>Good article on &lt;a href="http://www.builderau.com.au/program/sqlserver/soa/Setting-up-Change-Data-Capture-in-SQL-Server-2008/0,339028455,339297651,00.htm"&gt;"Setting up Change Data Capture in SQL Server 2008"&lt;/a&gt; by &lt;a href="http://www.builderau.com.au/program/sqlserver/email.htm?TYPE=editor&amp;amp;AT=339297651-339028455t-320002016c"&gt;Tim Chapman, TechRepublic&lt;/a&gt;. As some of the commentators noted, the article explains how to easily setup CDC in SQL 2008, but provides no help for querying the CDC tables.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-992650218143301822?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/992650218143301822/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/09/setting-up-change-data-capture-in-sql.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/992650218143301822'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/992650218143301822'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/09/setting-up-change-data-capture-in-sql.html' title='Setting up Change Data Capture in SQL Server 2008'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-4970119020335204246</id><published>2009-09-30T20:49:00.002Z</published><updated>2009-09-30T20:54:06.115Z</updated><title type='text'>How to Calculate Data Warehouse Reliability</title><content type='html'>Very good article on &lt;a href="http://www.information-management.com/infodirect/2009_138/extract_transform_load_data_warehouse-10016063-1.html"&gt;"How to Calculate Data Warehouse Reliability"&lt;/a&gt; by &lt;a href="http://www.information-management.com/authors/1062851.html"&gt;Ashok Nayak&lt;/a&gt;. He makes the point that even if every stage in your data warehouse processing has 90%+ reliability, your DW as a whole might only have 70 or 80% reliability. Conventional thinking leads us to believe that every stage is like a link in a chain (where the chain is only as strong as its weakest link), but he concludes that it's much worse than that, that processes P1, P2, and P3, with independent reliabilities R1, R2, and R3, respectively, is not MIN(R1, R2, R3), but is actually R1 * R2 * R3.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-4970119020335204246?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/4970119020335204246/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/09/how-to-calculate-data-warehouse.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4970119020335204246'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4970119020335204246'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/09/how-to-calculate-data-warehouse.html' title='How to Calculate Data Warehouse Reliability'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-4536894248273008269</id><published>2009-09-28T19:17:00.002Z</published><updated>2009-09-28T19:38:05.633Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>T-SQL Challenge #14</title><content type='html'>&lt;a href="http://beyondrelational.com/blogs/tc/archive/2009/09/28/tsql-challenge-14-identify-the-longest-sequence-of-characters-in-a-string.aspx"&gt;T-SQL Challenge #14&lt;/a&gt; is out. I've completed the challenge functionally, now I will tweak my solution to improve performance.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/_rPFttr3fnLI/SsEQiHE9Q0I/AAAAAAAAAEg/ARZ3ssofKt4/s1600-h/TSQL14.GIF"&gt;&lt;img id="BLOGGER_PHOTO_ID_5386604807513260866" style="WIDTH: 216px; CURSOR: hand; HEIGHT: 320px" alt="" src="http://1.bp.blogspot.com/_rPFttr3fnLI/SsEQiHE9Q0I/AAAAAAAAAEg/ARZ3ssofKt4/s320/TSQL14.GIF" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-4536894248273008269?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/4536894248273008269/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/09/t-sql-challenge-14.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4536894248273008269'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4536894248273008269'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/09/t-sql-challenge-14.html' title='T-SQL Challenge #14'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_rPFttr3fnLI/SsEQiHE9Q0I/AAAAAAAAAEg/ARZ3ssofKt4/s72-c/TSQL14.GIF' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-8733007074960690873</id><published>2009-09-18T17:44:00.004Z</published><updated>2009-09-28T19:20:02.991Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Books'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL Gurus'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Books by Joe Celko</title><content type='html'>&lt;a href="http://www.sqlservercentral.com/blogs/sqlmanofmystery/archive/2009/09/12/what-i-ve-read-and-recommend-to-others-joe-celko-edition.aspx"&gt;Excellent article&lt;/a&gt; by Wesley Brown that summarizes a bunch of books by the legendary &lt;a href="http://en.wikipedia.org/wiki/Joe_Celko"&gt;Joe Celko&lt;/a&gt;. Don't know Joe? "He has participated on the &lt;a class="mw-redirect" title="ANSI" href="http://en.wikipedia.org/wiki/ANSI"&gt;ANSI&lt;/a&gt; X3H2 Database Standards Committee, and helped write the &lt;a class="mw-redirect" title="Structured Query Language" href="http://en.wikipedia.org/wiki/Structured_Query_Language"&gt;SQL&lt;/a&gt;-89 and SQL-92 standards. He is the author of seven books on SQL, and over 800 published articles on SQL and other database topics."&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-8733007074960690873?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/8733007074960690873/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/09/books-by-joe-celko.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/8733007074960690873'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/8733007074960690873'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/09/books-by-joe-celko.html' title='Books by Joe Celko'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-8660501065407579650</id><published>2009-09-09T13:52:00.001Z</published><updated>2009-09-09T13:54:01.599Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>SQL Formatting</title><content type='html'>&lt;a href="http://www.simple-talk.com/sql/t-sql-programming/transact-sql-formatting-standards-%28coding-styles%29/?utm_source=simpletalk&amp;amp;utm_medium=email&amp;amp;utm_content=CodeFormatting20090908&amp;amp;utm_campaign=SQL"&gt;Here&lt;/a&gt; is a good article on the need to develop formatting standards for T-SQL code within an organization.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-8660501065407579650?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/8660501065407579650/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/09/sql-formatting.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/8660501065407579650'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/8660501065407579650'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/09/sql-formatting.html' title='SQL Formatting'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-2756613788509471021</id><published>2009-09-03T17:23:00.002Z</published><updated>2009-09-03T17:27:07.597Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='TSQL Challenge'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>TSQL Challenge #12</title><content type='html'>Click &lt;a href="http://www.jessemclain.com/Downloads/Code/SQL/TSQL_Challenge/jesse_mclain_tsqlchallenge_12.sql.htm"&gt;here&lt;/a&gt; to view my solution for &lt;a href="http://beyondrelational.com/blogs/tc/archive/2009/08/09/tsql-challenge-12-build-sequential-ranges-of-dates-with-propagation-to-missing-values.aspx"&gt;TSQL Challenge #12&lt;/a&gt;, which involves identifying missing dates in a range, and propagating values for those missing dates.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-2756613788509471021?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/2756613788509471021/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/09/tsql-challenge-12.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2756613788509471021'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2756613788509471021'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/09/tsql-challenge-12.html' title='TSQL Challenge #12'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-6908516085237623822</id><published>2009-09-01T16:36:00.005Z</published><updated>2009-09-01T17:46:01.053Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='recursion'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><category scheme='http://www.blogger.com/atom/ns#' term='CTE'/><title type='text'>TSQL Challenge #13 - Set-based solution</title><content type='html'>&lt;div&gt;The set-based solution took me 3 or 4 times as long to develop as the &lt;a href="http://jessesql.blogspot.com/2009/08/tsql-challenge-13.html"&gt;cursor-based solution&lt;/a&gt; yesterday. I'm not sure if that's the nature of set-based development, my own learning curve, or both (like going from algebra to calculas). The set-based solution took a couple of interesting tricks to solve, which I will hold off publishing until after the deadline to the &lt;a href="http://beyondrelational.com/blogs/tc/archive/2009/08/31/tsql-challenge-13-break-a-batch-of-invoices-into-multiple-parts-based-on-rules.aspx"&gt;challenge&lt;/a&gt;.&lt;/div&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/_rPFttr3fnLI/Sp1dpdNLbfI/AAAAAAAAAEA/b6WCV8zP8f8/s1600-h/results.GIF"&gt;&lt;img id="BLOGGER_PHOTO_ID_5376556496946163186" style="WIDTH: 254px; CURSOR: hand; HEIGHT: 320px" alt="" src="http://4.bp.blogspot.com/_rPFttr3fnLI/Sp1dpdNLbfI/AAAAAAAAAEA/b6WCV8zP8f8/s320/results.GIF" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-6908516085237623822?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/6908516085237623822/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/09/tsql-challenge-13-set-based-solution.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/6908516085237623822'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/6908516085237623822'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/09/tsql-challenge-13-set-based-solution.html' title='TSQL Challenge #13 - Set-based solution'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_rPFttr3fnLI/Sp1dpdNLbfI/AAAAAAAAAEA/b6WCV8zP8f8/s72-c/results.GIF' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-53161939772595391</id><published>2009-08-31T15:52:00.006Z</published><updated>2009-09-01T17:46:47.139Z</updated><title type='text'>TSQL Challenge #13</title><content type='html'>Looking at &lt;a href="http://beyondrelational.com/blogs/tc/archive/2009/08/31/tsql-challenge-13-break-a-batch-of-invoices-into-multiple-parts-based-on-rules.aspx"&gt;TSQL Challenge #13&lt;/a&gt;, I created a grouping by batch and invoice number:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/_rPFttr3fnLI/SpvysXM-5vI/AAAAAAAAAD4/bmyJ-kY1mPE/s1600-h/Cut1.GIF"&gt;&lt;img id="BLOGGER_PHOTO_ID_5376157424153454322" style="WIDTH: 264px; CURSOR: hand; HEIGHT: 205px" alt="" src="http://1.bp.blogspot.com/_rPFttr3fnLI/SpvysXM-5vI/AAAAAAAAAD4/bmyJ-kY1mPE/s320/Cut1.GIF" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;div&gt;&lt;/div&gt;The most straight-forward approach from here is to insert this into a temp table, open a cursor on it, run through the data and modify the "Set" column so that the batch/invoice numbers are appropriately grouped together. Then join the results of that temp table back to the original data set via batch/invoice number, so that the modified "Set" column is appended. This cursor-based solution is &lt;a href="http://www.jessemclain.com/Downloads/Code/SQL/TSQL_Challenge/No13/Cursor-based_solution.htm"&gt;here&lt;/a&gt; (the rules require a set-based solution, so I did the cursor-based one just as a baseline to compare to the set-based solution).&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-53161939772595391?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/53161939772595391/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/08/tsql-challenge-13.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/53161939772595391'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/53161939772595391'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/08/tsql-challenge-13.html' title='TSQL Challenge #13'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_rPFttr3fnLI/SpvysXM-5vI/AAAAAAAAAD4/bmyJ-kY1mPE/s72-c/Cut1.GIF' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-2018098271852445095</id><published>2009-08-01T02:30:00.002+01:00</published><updated>2009-08-01T02:34:52.861+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Links'/><category scheme='http://www.blogger.com/atom/ns#' term='Articles'/><title type='text'>Interesting SQL Articles</title><content type='html'>&lt;a href="http://www.sqlservercentral.com/articles/SQL+Server+2008/65539/"&gt;SQL Server 2008 T-SQL Enhancements Part - I&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.sqlservercentral.com/articles/T-SQL/67666/"&gt;Celko's Summer SQL Stumpers: Prime numbers&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.sqlservercentral.com/articles/Automation/67547/"&gt;A Simple way of Automating Scripts&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.databasejournal.com/features/mssql/article.php/3829781/Attribute-Discretization--Customize-Grouping-Names.htm"&gt;Attribute Discretization: Customize Grouping Names&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-2018098271852445095?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/2018098271852445095/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/07/interesting-sql-articles.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2018098271852445095'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2018098271852445095'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/07/interesting-sql-articles.html' title='Interesting SQL Articles'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-4099242690582131935</id><published>2009-07-07T19:22:00.001+01:00</published><updated>2009-07-07T21:38:26.447+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Avoid Logging When Populating a Table</title><content type='html'>I recently ran into a brick wall while trying to populate a skinny table with just over 130 million rows, in that the log filled up way before the table did (I was down to about 40gb free on my local machine). This is a scenario where I have multiple recursive CTEs preceeding a INSERT .. SELECT FROM used to create those rows. To get around this problem, I created a stored procedure that outputs the results of those CTEs as a straightforward SELECT, then I redirect that output to a text file via a BCP batch file. I then BCP that file back into my destination table, thereby bypassing the extraneous logging that, in this case, is just a waste of space and time.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-4099242690582131935?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/4099242690582131935/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/07/avoid-logging-when-populating-table.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4099242690582131935'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4099242690582131935'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/07/avoid-logging-when-populating-table.html' title='Avoid Logging When Populating a Table'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-3558691301061398586</id><published>2009-06-18T22:27:00.000+01:00</published><updated>2009-06-18T22:29:48.332+01:00</updated><title type='text'>Cool Use of CTEs to create number table</title><content type='html'>Many developers have needed to create a numbers table for matching data to ranges, etc., and this article shows various ways to do it, including multiple uses of CTEs: &lt;a href="http://www.projectdmx.com/tsql/tblnumbers.aspx"&gt;http://www.projectdmx.com/tsql/tblnumbers.aspx&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-3558691301061398586?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/3558691301061398586/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/06/cool-use-of-ctes-to-create-number-table.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/3558691301061398586'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/3558691301061398586'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/06/cool-use-of-ctes-to-create-number-table.html' title='Cool Use of CTEs to create number table'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-181219214082276877</id><published>2009-06-18T19:56:00.000+01:00</published><updated>2009-06-22T15:03:37.627+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Poker DW: Stacking the Deck Part 2</title><content type='html'>&lt;a href="http://jessesql.blogspot.com/2009/06/poker-dw-stacking-deck.html"&gt;Earlier&lt;/a&gt; we showed how to construct a list of all possible 5 card poker hands, and verified the result in &lt;a href="http://en.wikipedia.org/wiki/Poker_probability"&gt;Wikipedia&lt;/a&gt;. What I want to create is a table of those hands, all 2,598,960 of them, with their high- and low-hand ranks also. I want to be able to analyze the odds of one hand winning over another with &lt;em&gt;n&lt;/em&gt; cards to come, which means that I'll need to create a temp copy of the deck, pull out the known cards from it, and make a hand with every card left and evaluate those hands against my table of all possible.&lt;br /&gt;&lt;br /&gt;Now how will I represent the cards in the table of poker hands? The most obvious way is to create a five column key consisting of the 5 CardId values that make that hand. Question: does the &lt;em&gt;order&lt;/em&gt; of those CardIds matter? Remember what we talked about &lt;a href="http://jessesql.blogspot.com/2009/06/poker-dw-stacking-deck.html"&gt;last entry&lt;/a&gt;, the order should not matter. But we have to put them in some sort of order - if we have five columns, say CardId1, CardId2, CardId3, CardId4, and CardId5, something is going to have to go somewhere. Let's say that we arbitrarily enter the CardIds into the columns in no particular order - how will we now query them? Let's make a trivial example of querying for two cards. Our WHERE clause of such a query would look like:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;WHERE CardId1 = @CardId1 AND CardId2 = @CardId2&lt;br /&gt;OR CardId1 = @CardId2 AND CardId2 = @CardId1&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;We have to match every permutation of variables to columns. With three cards:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;WHERE &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;CardId1 = @CardId1 AND CardId2 = @CardId2 AND CardId3 = @CardId3&lt;br /&gt;OR CardId1 = @CardId1 AND CardId2 = @CardId3 AND CardId3 = @CardId2 OR CardId1 = @CardId2 AND CardId2 = @CardId1 AND CardId3 = @CardId3 OR CardId1 = @CardId2 AND CardId2 = @CardId3 AND CardId3 = @CardId1 OR CardId1 = @CardId3 AND CardId2 = @CardId1 AND CardId3 = @CardId2 OR CardId1 = @CardId3 AND CardId2 = @CardId2 AND CardId3 = @CardId1&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Going back to our &lt;a href="http://en.wikipedia.org/wiki/Permutation"&gt;research on permutations&lt;/a&gt;, the number of permutations of &lt;em&gt;n&lt;/em&gt; elements is &lt;em&gt;n&lt;/em&gt;!, which is also equal to &lt;em&gt;n&lt;/em&gt;(&lt;em&gt;n&lt;/em&gt; + 1)/2. With five cards we're looking at a WHERE clause that is 5(5+1)/2 = 5*6/2 = 15 lines long. The coding for that isn't so bad (try not to make a mistake - you'll be matching 5 variable/column pairs per line for 15 lines, for a total of 75 equality checks), but think of how slowly that would perform! And that's just to evaluate one hand - imagine the gears grinding away to find all possible 5 card hands with two cards to come - if you're on the flop, and you want to evaluate your chances to the river, you have &lt;a href="http://en.wikipedia.org/wiki/Combinations"&gt;"47 choose 2"&lt;/a&gt; =&lt;br /&gt;1081 possible outcomes.&lt;br /&gt;&lt;br /&gt;What I came up with is a solution using prime numbers that I learned while studying &lt;a href="http://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorem"&gt;Gödel's incompleteness theorems&lt;/a&gt;. We assign every card in the deck a unique &lt;a href="http://en.wikipedia.org/wiki/List_of_prime_numbers"&gt;prime number&lt;/a&gt;; the first card gets 2, the second card 3, all the way up to the last card, which gets prime number 239. Now what happens if we want to look at a two-card hand and match it to a table of all possible two-card hands? If we multiply the prime numbers corresponding to those cards, we will get a number that is unique to those two cards (the primes of &lt;em&gt;any other two&lt;/em&gt; cards will result in a different number when multiplied). Obviously it doesn't matter which order the primes are multiplied, so we have just found the perfect primary key for our poker hands table. When we want to evaluate a hand, we multiply the primes corresponding to the cards and match the result to the primary key.&lt;br /&gt;&lt;br /&gt;We have an &lt;a href="http://www.jessemclain.com/downloads/code/sql/HandHistory/CreateDeck2.sql.html"&gt;updated Dim_Deck creation script&lt;/a&gt; that adds a "PrimeFactor" column to it. Now I'm working on a creating the table of all possible hands.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-181219214082276877?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/181219214082276877/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/06/poker-dw-stacking-deck-part-2.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/181219214082276877'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/181219214082276877'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/06/poker-dw-stacking-deck-part-2.html' title='Poker DW: Stacking the Deck Part 2'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-2444179809697653892</id><published>2009-06-18T16:25:00.001+01:00</published><updated>2009-10-05T19:40:29.752Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='probability'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><category scheme='http://www.blogger.com/atom/ns#' term='CTE'/><title type='text'>Poker DW: Stacking the Deck</title><content type='html'>Returning to the Poker Data Warehouse, I dove into the text-parsing process after setting up a sketch of an initial database design, with the idea of polishing out the design at a later date. I recently completed the parsing of a sample hand history, and man was it more work than I expected! Fortunately I love writing text-parsing routines (perhaps from early career work in merge-purge duplicate removal), so the 750 line stored procedure was more a labor of love than a tedious chore. Getting into the dirty details of the source data made me think more about the 30,000 ft overview also.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I &lt;a href="http://www.jessemclain.com/downloads/code/sql/HandHistory/CREATE%20TABLE%20Dim_Deck.html"&gt;created a representation of a 52 card deck of cards&lt;/a&gt; in the Poker DW, and I started thinking about how to evaluate 5 card poker hands (i.e., determining what a player had at the end of the hand). What I really want is to be able to evaluate the odds of making the best hand on the next card or the river, which would ultimately allow me to judge whether a player made the right decision. This result would be similar to the "% to win" stats that you see on TV.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;After I created my deck of cards, I started playing around with representing a hand of 5 cards. How many possible 5 card hands are there? Easy - think of it like this. Take 1 card from the deck, there's 52 cards to choose from. Take another card, there's 51 to choose from. Keep picking until you have 5 cards in your hand, that leaves 52 * 51 * 50 * 49 * 48 = 311,875,200 possible 5 card hands.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The problem with this method is that I'm picking &lt;a href="http://en.wikipedia.org/wiki/Permutations"&gt;permutations&lt;/a&gt; of 5 card hands, rather than &lt;a href="http://en.wikipedia.org/wiki/Combinations"&gt;combinations&lt;/a&gt;. Let's reduce my example above to picking two cards rather than five. According to that math, there are 52 * 51 = 2,652 possible two card hands. Using the card deck created above, this query will return that count, 2652 rows:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;;WITH Draw1 AS (&lt;br /&gt;SELECT Card1 = CardId&lt;br /&gt;FROM Dim_Deck&lt;br /&gt;),&lt;br /&gt;Draw2 AS (&lt;br /&gt;SELECT&lt;br /&gt;Card1,&lt;br /&gt;Card2 = CardId&lt;br /&gt;FROM Dim_Deck D2&lt;br /&gt;JOIN Draw1 D1&lt;br /&gt;ON D1.Card1 &lt;&gt; D2.CardId&lt;br /&gt;)&lt;br /&gt;SELECT COUNT(*) FROM Draw2&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Note the use of the recursive CTE to create the second draw, &lt;span style="font-family:courier new;"&gt;Draw2&lt;/span&gt;. So let's say that I picked the five of clubs first, and the four of hearts second. That is one of the 2,652 possible events. But the reversal of that order is also one of the possible events (picking the four of hearts first, and the five of clubs second). But I really don't care which &lt;em&gt;order&lt;/em&gt; the two cards come in (the permutation), I only care about the set of cards that results.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Looking at an even simpler example of a deck of 5 cards, ace to five, how many ways are there to pick two? Here's a simple matrix:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/_rPFttr3fnLI/Sjp_gbR6wlI/AAAAAAAAACs/1gCmSJR-ibw/s1600-h/Five_Card_Deck_Pick_Two.GIF"&gt;&lt;img id="BLOGGER_PHOTO_ID_5348727702511338066" style="WIDTH: 227px; CURSOR: hand; HEIGHT: 141px" alt="" src="http://4.bp.blogspot.com/_rPFttr3fnLI/Sjp_gbR6wlI/AAAAAAAAACs/1gCmSJR-ibw/s320/Five_Card_Deck_Pick_Two.GIF" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;The code above will pick everything except the diagonal that shows pairs:&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/_rPFttr3fnLI/SjqANiYTTMI/AAAAAAAAAC0/V7vfjDQy_v4/s1600-h/Five_Card_Deck_Pick_Two_Ordered.GIF"&gt;&lt;img id="BLOGGER_PHOTO_ID_5348728477511273666" style="WIDTH: 227px; CURSOR: hand; HEIGHT: 141px" alt="" src="http://2.bp.blogspot.com/_rPFttr3fnLI/SjqANiYTTMI/AAAAAAAAAC0/V7vfjDQy_v4/s320/Five_Card_Deck_Pick_Two_Ordered.GIF" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;p&gt;but what we &lt;em&gt;really &lt;/em&gt;want is this:&lt;/p&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/_rPFttr3fnLI/SjqBxCzRItI/AAAAAAAAAC8/9HeaGjsuvv8/s1600-h/Five_Card_Deck_Pick_Two_Unordered.GIF"&gt;&lt;img id="BLOGGER_PHOTO_ID_5348730187021361874" style="WIDTH: 227px; CURSOR: hand; HEIGHT: 141px" alt="" src="http://2.bp.blogspot.com/_rPFttr3fnLI/SjqBxCzRItI/AAAAAAAAAC8/9HeaGjsuvv8/s320/Five_Card_Deck_Pick_Two_Unordered.GIF" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;p&gt;And in order to get it, we change the "&lt;&gt;" operators to "&gt;":&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;;WITH Draw1 AS (&lt;br /&gt;SELECT Card1 = CardId&lt;br /&gt;FROM Dim_Deck&lt;br /&gt;),&lt;br /&gt;Draw2 AS (&lt;br /&gt;SELECT&lt;br /&gt;Card1,&lt;br /&gt;Card2 = CardId&lt;br /&gt;FROM Dim_Deck D2&lt;br /&gt;JOIN Draw1 D1&lt;br /&gt;ON D1.Card1 &gt; D2.CardId&lt;br /&gt;)&lt;br /&gt;SELECT COUNT(*) FROM Draw2&lt;/span&gt;&lt;/p&gt;&lt;p&gt;and we obtain the correct result, 1326 rows.&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-2444179809697653892?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/2444179809697653892/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/06/poker-dw-stacking-deck.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2444179809697653892'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2444179809697653892'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/06/poker-dw-stacking-deck.html' title='Poker DW: Stacking the Deck'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_rPFttr3fnLI/Sjp_gbR6wlI/AAAAAAAAACs/1gCmSJR-ibw/s72-c/Five_Card_Deck_Pick_Two.GIF' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-7958556815093130467</id><published>2009-06-15T21:54:00.000+01:00</published><updated>2009-06-15T21:58:43.257+01:00</updated><title type='text'>StackOverflow</title><content type='html'>I started &lt;a href="http://stackoverflow.com/users/122073/jesse"&gt;answering questions on StackOverflow&lt;/a&gt;, and came across a couple of interesting sites. The first is &lt;a href="http://vyaskn.tripod.com/code.htm"&gt;Vyas code page&lt;/a&gt;, a small library of useful SQL code. The other is an &lt;a href="http://www.ssmstoolspack.com/Main.aspx"&gt;add-on toolkit for SSMS&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-7958556815093130467?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/7958556815093130467/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/06/stackoverflow.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/7958556815093130467'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/7958556815093130467'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/06/stackoverflow.html' title='StackOverflow'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-3259301789981303097</id><published>2009-06-12T17:18:00.001+01:00</published><updated>2009-06-12T17:19:08.407+01:00</updated><title type='text'>T-SQL to Export Table Structure to a script</title><content type='html'>Good script for scripting out SQL tables via T-SQL rather than Enterprise Manager/Dev Studio: &lt;a href="http://www.sqlservercentral.com/scripts/Miscellaneous/30730/"&gt;http://www.sqlservercentral.com/scripts/Miscellaneous/30730/&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-3259301789981303097?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/3259301789981303097/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/06/t-sql-to-export-table-structure-to.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/3259301789981303097'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/3259301789981303097'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/06/t-sql-to-export-table-structure-to.html' title='T-SQL to Export Table Structure to a script'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-1079251044519006236</id><published>2009-05-27T19:11:00.000Z</published><updated>2009-05-27T19:16:15.499Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='CRM'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>CRM Import - Importing Into Drop-Down Combobox Targets</title><content type='html'>During the initial try to load accounts into MS-CRM 4.0, my source file failed for columns that had drop-down combobox columns as targets. A little research lead to this &lt;a href="http://social.microsoft.com/Forums/en-US/crm/thread/aec7fa52-0c80-440f-ad06-4fde609faf46"&gt;forum post&lt;/a&gt;, which advises using the GUIDs of the target system. I was a little mislead by this post, since it actually applied to lookups, not drop-down combobox values. The solution I needed involved converting the source values to the AttributeValue equivalents in the StringMap table of the MS-CRM database.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-1079251044519006236?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/1079251044519006236/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/05/crm-import-importing-into-drop-down.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/1079251044519006236'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/1079251044519006236'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/05/crm-import-importing-into-drop-down.html' title='CRM Import - Importing Into Drop-Down Combobox Targets'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-3748345929692920225</id><published>2009-05-21T19:00:00.001Z</published><updated>2009-05-21T19:05:25.987Z</updated><title type='text'>BCP out Temp Tables</title><content type='html'>Hit a little snag today trying to output, via BCP, the results of a stored procedure that created and dropped a local temp table. The spd would run fine in Query Analyser, but when run from the DOS prompt I got the error message that the temp table didn't exist. Perhaps the problem is caused by BCP compiling and running the spd in different threads? Anyways, the results of a little googling provide the workaround of keeping the table in TempDb rather than creating as a real temp table. That works as long as the table is not created and dropped within the spd that BCP calls.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-3748345929692920225?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/3748345929692920225/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/05/bcp-out-temp-tables.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/3748345929692920225'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/3748345929692920225'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/05/bcp-out-temp-tables.html' title='BCP out Temp Tables'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-7436338031942287605</id><published>2009-05-18T19:25:00.001Z</published><updated>2009-05-18T19:27:02.587Z</updated><title type='text'>Case Study: Poker DW: Reporting Questions</title><content type='html'>After reviewing the entities and their relationships, we next want to look at some of the questions we might want to answer from the data in our DW. Some possible questions:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Which players will call a big checkraise on the flop with an overpair?&lt;/li&gt;&lt;li&gt;What is the actual expected outcome of reraising on the button with a suited connector and bluffing the flop?&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Next: ???&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-7436338031942287605?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/7436338031942287605/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/05/case-study-poker-dw-reporting-questions.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/7436338031942287605'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/7436338031942287605'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/05/case-study-poker-dw-reporting-questions.html' title='Case Study: Poker DW: Reporting Questions'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-6821489075217867891</id><published>2009-05-18T13:53:00.000Z</published><updated>2009-05-18T19:15:11.797Z</updated><title type='text'>Case Study: Poker DW: Entities</title><content type='html'>Continuing from the &lt;a href="http://jessesql.blogspot.com/2009/05/case-study-data-warehouse-for-poker.html"&gt;introduction&lt;/a&gt;, one of the first things we'll have to do in designing our Poker DW (Data Warehouse), is to identify all of the entities of interest. Taking a look &lt;a href="http://www.jessemclain.com/Downloads/Code/SQL/HandHistory/SampleHandHistory.txt"&gt;at our sample hand history file&lt;/a&gt; (ignore the last two lines of HTML code, the hosting company stamped them when I uploaded the file), the first line starts with a sequence number, the format of the game and the datetime stamp of when a particular hand took place. The second line indicates the name of the table, whether it is real or play money, and which seat is the dealer. The next ten or so lines tell us who is in which seat, and what their stack size is (at the beginning of the hand), followed by a line for each player who posts a blind (small, big, and other). So up to now, we've seen such entities as Time, Table, Player, Seat, and Money (stack and pot size).&lt;br /&gt;&lt;br /&gt;The next section begins with the header line "&lt;span style="font-family:courier new;"&gt;*** POCKET CARDS ***&lt;/span&gt;". Here we have such information as the hole cards dealt to the player, and all of the preflop action (fold, check, call, or raise). We can identify three more entities here: Betting Stage, Cards and Actions. The next section, "&lt;span style="font-family:courier new;"&gt;*** FLOP *** [10s 9d 3h]&lt;/span&gt;", contains the same entities, but this time we have community cards. At each step in these sections, we can calculate the pot size and stack sizes for each player. Two more sections, "Turn" and "River", provide similar info.&lt;br /&gt;&lt;br /&gt;Special consideration should be given to the next section, "&lt;span style="font-family:courier new;"&gt;*** SHOW DOWN ***&lt;/span&gt;", as it will show us exactly what cards other players held during the hand, allowing us to "backfill" that info for earlier rounds of betting. This will help us answer some important questions in the hand histories. The final section, "&lt;span style="font-family:courier new;"&gt;*** SUMMARY ***&lt;/span&gt;", provides info such as the rake, the Hi hand (and Low if this is a hi/lo split game), and the final pot size (which we can use to verify our "running" pot size throughout the hand).&lt;br /&gt;&lt;br /&gt;So let's summarize our entities and their relationships. Central to this is Hands. Hands occur at certain Times at a particular Table, which have Seats. Players make Actions with Money based on Cards appearing at a Betting Stage.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-6821489075217867891?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/6821489075217867891/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/05/case-study-poker-dw-entities.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/6821489075217867891'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/6821489075217867891'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/05/case-study-poker-dw-entities.html' title='Case Study: Poker DW: Entities'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-114202951798956504</id><published>2009-05-15T15:20:00.000Z</published><updated>2009-05-18T18:42:53.468Z</updated><title type='text'>Case Study: Data Warehouse for Poker (Intro)</title><content type='html'>A friend of mine plays a good deal of online poker, and wants to improve his game by studying the hands he has played. I suggested creating a data warehouse from the hand histories held in the text files that the app saves, and using that data to identify winning and losing trends in his game. This entry will serve as the first in a series of the steps we will take to develop this.&lt;br /&gt;&lt;br /&gt;Guide to Entries:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://jessesql.blogspot.com/2009/05/case-study-data-warehouse-for-poker.html"&gt;Intro&lt;/a&gt; (this entry)&lt;/li&gt;&lt;li&gt;&lt;a href="http://jessesql.blogspot.com/2009/05/case-study-poker-dw-entities.html"&gt;Identifying Entities&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Reporting Questions&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-114202951798956504?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/114202951798956504/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/05/case-study-data-warehouse-for-poker.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/114202951798956504'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/114202951798956504'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/05/case-study-data-warehouse-for-poker.html' title='Case Study: Data Warehouse for Poker (Intro)'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-9124540781281866563</id><published>2009-05-14T14:03:00.000Z</published><updated>2009-05-14T16:11:23.024Z</updated><title type='text'>Grouping Datetimes to Identify Sessions</title><content type='html'>Let's say that I have a record of events that are unique by a datetime "timestamp" marker, and that these events can be grouped by sessions, so that every event that occurs within a certain period of time of the preceeding and/or subsequent events to it are considered as part of the same session. For example, let's say that we are examining cars driving by a &lt;a href="http://en.wikipedia.org/wiki/Traffic_counter"&gt;traffic counter&lt;/a&gt; where each car passing is recorded as an event, and we want to organize the events by sessions so that any two cars passing within 5 seconds of one another constitutes a session (the events will "chain" together so that if four cars pass within 2 seconds of one another, but the first and last cars are within 10 seconds, all four car events are a part of the same session).&lt;br /&gt;&lt;br /&gt;Now, if I only have the events, how do I create sessions around them?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-9124540781281866563?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/9124540781281866563/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/05/grouping-datetimes-to-identify-sessions.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/9124540781281866563'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/9124540781281866563'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/05/grouping-datetimes-to-identify-sessions.html' title='Grouping Datetimes to Identify Sessions'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-7366118932703442751</id><published>2009-05-06T13:55:00.000Z</published><updated>2009-05-06T13:57:52.105Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Median'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Another Version for Calculating Median</title><content type='html'>Joe Celko published a "history" of calculating the median in SQL, along with a final version that seems to work similarly to mine: &lt;a href="http://www.simple-talk.com/sql/t-sql-programming/median-workbench/"&gt;http://www.simple-talk.com/sql/t-sql-programming/median-workbench/&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-7366118932703442751?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/7366118932703442751/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/05/another-version-for-calculating-median.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/7366118932703442751'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/7366118932703442751'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/05/another-version-for-calculating-median.html' title='Another Version for Calculating Median'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-3184378095380859887</id><published>2009-05-05T20:06:00.001Z</published><updated>2009-05-07T01:33:12.800Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Metadata'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Querying Sys.Columns &amp; Sys.Types</title><content type='html'>If you want to &lt;a href="http://msdn.microsoft.com/en-us/library/ms345522.aspx#_FAQ7"&gt;query the structure of a table&lt;/a&gt; that includes column names and data types, you have to perform a join between catalog views Sys.Columns and Sys.Types. There are some caveats to this. If any of your columns are defined as nvarchar or user-defined data types, you must qualify the data coming from Sys.Types. When you add a user-defined data type, it is entered into Sys.Types with a reference to it's native data type. In the example of the AdventureWorks database, the data type "AccountNumber" is defined as nvarchar, and shows up in Sys.Types with system_type_id = 231 (which points to "nvarchar", where system_type_id = 231 and user_type_id = 231).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Running the query:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;SELECT&lt;br /&gt;Tb.Name,&lt;br /&gt;C.Name,&lt;br /&gt;Tp.Name&lt;br /&gt;FROM Sys.Tables Tb&lt;br /&gt;JOIN Sys.Schemas Sch&lt;br /&gt;ON Sch.Schema_Id = Tb.Schema_Id&lt;br /&gt;JOIN Sys.Columns C&lt;br /&gt;ON C.Object_Id = Tb.Object_Id&lt;br /&gt;JOIN Sys.Types Tp&lt;br /&gt;ON Tp.System_Type_Id = C.System_Type_Id&lt;br /&gt;WHERE Tb.Name = 'Address'&lt;br /&gt;ORDER BY Tb.Name, C.Name, Tp.Name&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;produces these results:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/_rPFttr3fnLI/SgGuwnoBgMI/AAAAAAAAACc/duENrt74JUs/s1600-h/20090506+-+sys-types.GIF"&gt;&lt;img id="BLOGGER_PHOTO_ID_5332735584077185218" style="WIDTH: 163px; CURSOR: hand; HEIGHT: 320px" alt="" src="http://3.bp.blogspot.com/_rPFttr3fnLI/SgGuwnoBgMI/AAAAAAAAACc/duENrt74JUs/s320/20090506+-+sys-types.GIF" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Weird, huh? Why did 'AddressLine1' show up six times with six different data types? The reason is two-fold. First, 'AddressLine1' is defined as nvarchar(60), which means that it will also show up as "sysname" datatype (think of "sysname" as MicroSoft's built-in user-defined data type).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Take a look at the results of the query below. It shows that, including itself, six different data types are based on nvarchar! That's why 'AddressLine1' showed up six times in the query above.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;SELECT Name FROM Sys.Types Tp&lt;br /&gt;WHERE System_Type_Id = 231&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;span style="font-family:courier new;"&gt;Name&lt;br /&gt;-------------------&lt;br /&gt;nvarchar&lt;br /&gt;sysname&lt;br /&gt;AccountNumber&lt;br /&gt;Name&lt;br /&gt;OrderNumber&lt;br /&gt;Phone&lt;br /&gt;(6 row(s) affected)&lt;/span&gt;&lt;/p&gt;&lt;br /&gt;&lt;p&gt;So let's change our query to use this 'User_Type_Id' column instead:&lt;/p&gt;&lt;p&gt;&lt;span style="font-family:courier new;"&gt;SELECT&lt;br /&gt;Tb.Name,&lt;br /&gt;C.Name,&lt;br /&gt;Tp.Name&lt;br /&gt;FROM Sys.Tables Tb&lt;br /&gt;JOIN Sys.Schemas Sch&lt;br /&gt;ON Sch.Schema_Id = Tb.Schema_Id&lt;br /&gt;JOIN Sys.Columns C&lt;br /&gt;ON C.Object_Id = Tb.Object_Id&lt;br /&gt;JOIN Sys.Types Tp&lt;br /&gt;ON Tp.User_Type_Id = C.System_Type_Id&lt;br /&gt;WHERE Tb.Name = 'Address'&lt;br /&gt;ORDER BY Tb.Name, C.Name, Tp.Name&lt;/span&gt;&lt;/p&gt;This produces the results we want:&lt;br /&gt;&lt;p&gt;&lt;/p&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/_rPFttr3fnLI/SgGxR_KDPnI/AAAAAAAAACk/WbIo1o0XO8g/s1600-h/20090506+-+sys-types2.GIF"&gt;&lt;img id="BLOGGER_PHOTO_ID_5332738356352859762" style="WIDTH: 282px; CURSOR: hand; HEIGHT: 205px" alt="" src="http://2.bp.blogspot.com/_rPFttr3fnLI/SgGxR_KDPnI/AAAAAAAAACk/WbIo1o0XO8g/s320/20090506+-+sys-types2.GIF" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-3184378095380859887?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/3184378095380859887/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/05/querying-syscolumns-systypes.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/3184378095380859887'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/3184378095380859887'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/05/querying-syscolumns-systypes.html' title='Querying Sys.Columns &amp; Sys.Types'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_rPFttr3fnLI/SgGuwnoBgMI/AAAAAAAAACc/duENrt74JUs/s72-c/20090506+-+sys-types.GIF' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-1294370396621224432</id><published>2009-04-21T15:52:00.000Z</published><updated>2009-05-06T14:40:18.945Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Data Patterns'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Data Patterns and the LIKE Clause</title><content type='html'>The &lt;a href="http://msdn.microsoft.com/en-us/library/ms179859.aspx"&gt;"LIKE" clause of the SQL SELECT&lt;/a&gt; statement is one of the more interesting features of SQL's character processing. Most everyone is familiar with the "%" wildcard, which allows queries such as &lt;span style="font-family:courier new;"&gt;"SELECT LastName FROM Customers WHERE LastName LIKE 'Mc%'"&lt;/span&gt;. This returns all the customer last names beginning with "Mc" (such as yours truly). But I suspect that many developers are unaware of some of the deeper uses of data pattern expressions.&lt;br /&gt;&lt;br /&gt;The list of other wildcard characters related to LIKE includes "_", "[", "-", "]", and "^". The first, "_", is the 'any single character' expression. The "[]" characters act as a single character wildcard, but allow us to specify which characters will match. The WHERE clause above is equivalent to &lt;span style="font-family:courier new;"&gt;"WHERE LastName LIKE '[M][c]%'"&lt;/span&gt;. When multiple characters reside within the brackets, the filter acts like an "or" expression. So changing the filter to &lt;span style="font-family:courier new;"&gt;"WHERE LastName LIKE '[M][c][aeiou]%'" &lt;/span&gt;would produce last names beginning with "Mc", then followed by a vowel, then any terminating string.&lt;br /&gt;&lt;br /&gt;If you use the "-" with the brackets, you can specify ranges of characters (ranges defined by ASCII order). For example, let's say we want to search for user names that begin with 'jmclain' and are then followed by a single digit number. We would execute "&lt;span style="font-family:courier new;"&gt;SELECT * FROM Users WHERE UserName LIKE 'jmclain[0-9&lt;/span&gt;]'".&lt;br /&gt;&lt;br /&gt;Where it gets complicated is when you want to search a column for wildcard literals. For example, let's say that you have a column called 'SalesDescription', and you want to count the rows where the SalesDescription column contains the string "50% off". If you were to execute "&lt;span style="font-family:courier new;"&gt;SELECT COUNT(*) FROM Sales WHERE SalesDescription LIKE '50% off'&lt;/span&gt;", you would mistakenly pull in rows with SalesDescription values such as '50 cents off', since the "%" wildcard represents "any string". To correct this, you have two options. The simplest is to enclose the "%" wildcard with brackets, so that the filter changes to "&lt;span style="font-family:courier new;"&gt;WHERE SalesDescription LIKE '50[%] off'&lt;/span&gt;".&lt;br /&gt;&lt;br /&gt;The second option is to make use of the ESCAPE clause of the LIKE operator. What this method lacks in simplicity, it make up in robustness (and isn't really that complicated anyways). To solve the above problem suchwise, the filter changes to "&lt;span style="font-family:courier new;"&gt;WHERE SalesDescription LIKE '50!% off' ESCAPE '!'&lt;/span&gt;". I prefer the first method above because 1. it is simpler, and 2. in order to use the ESCAPE clause, you must be certain that your target expression doesn't contain the escape character. So if a given SalesDescription value in the table was, unbeknowst to you, something like '50% off!!!', the results start to become unreliable. Best practices for using ESCAPE stipulate first starting with uncommon characters such as "~" or "", and then querying your column to make sure they are not present.&lt;br /&gt;&lt;br /&gt;The best use of ESCAPE is when you want to find brackets in your target. Let's say that you wanted to find the SalesDescription value "[50% off]". After checking to ensure that the column values don't contain the tilde ("~") character, you would use the filter "&lt;span style="font-family:courier new;"&gt;WHERE SalesDescription LIKE '~[50~% off~]' ESCAPE '~'&lt;/span&gt;".&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-1294370396621224432?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/1294370396621224432/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/04/data-patterns-and-like-clause.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/1294370396621224432'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/1294370396621224432'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/04/data-patterns-and-like-clause.html' title='Data Patterns and the LIKE Clause'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-4469099115433779287</id><published>2009-04-17T15:21:00.001Z</published><updated>2009-04-23T14:02:34.812Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Converting Datetime Values to Varchar</title><content type='html'>Back when I used to code in Foxpro, I had to write custom code to convert date/time values to various formats. T-SQL provides for a great number of formats using the CONVERT function. An &lt;a href="http://www.mssqltips.com/tip.asp?tip=1145"&gt;article on MSSQLTips&lt;/a&gt; lists many (but not all) of these formats. This code (which can easily be wrapped into a stored procedure), will list out all valid format codes and an example of how they will appear:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;SET NOCOUNT ON&lt;br /&gt;CREATE TABLE #Fmts (FmtNo tinyint, Example varchar(max))&lt;br /&gt;DECLARE @fmt int; SET @fmt = 0&lt;br /&gt;DECLARE @dt datetime; SET @dt = GETDATE()&lt;br /&gt;WHILE @fmt &lt; 132&lt;br /&gt;BEGIN&lt;br /&gt;BEGIN TRY&lt;br /&gt;INSERT INTO #Fmts (FmtNo, Example)&lt;br /&gt;VALUES (@fmt, CONVERT(varchar, @dt, @fmt))&lt;br /&gt;END TRY&lt;br /&gt;BEGIN CATCH&lt;br /&gt;PRINT '@fmt = ' + LTRIM(STR(@fmt)) + ' is not valid.'&lt;br /&gt;END CATCH&lt;br /&gt;SET @fmt = @fmt + 1&lt;br /&gt;END&lt;br /&gt;SELECT FmtNo, Example = LEFT(Example, 30) FROM #Fmts&lt;br /&gt;DROP TABLE #Fmts&lt;br /&gt;SET NOCOUNT OFF &lt;div&gt; &lt;/div&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&lt;br /&gt;And sample output:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/_rPFttr3fnLI/SfB0pMvm1vI/AAAAAAAAAB0/v-4OmmAJXdg/s1600-h/Convert+Datetime+to+Varchar.sql.gif"&gt;&lt;img id="BLOGGER_PHOTO_ID_5327886610323658482" style="WIDTH: 211px; CURSOR: hand; HEIGHT: 320px" alt="" src="http://1.bp.blogspot.com/_rPFttr3fnLI/SfB0pMvm1vI/AAAAAAAAAB0/v-4OmmAJXdg/s320/Convert+Datetime+to+Varchar.sql.gif" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-4469099115433779287?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/4469099115433779287/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/04/converting-datetime-values-to-varchar.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4469099115433779287'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4469099115433779287'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/04/converting-datetime-values-to-varchar.html' title='Converting Datetime Values to Varchar'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_rPFttr3fnLI/SfB0pMvm1vI/AAAAAAAAAB0/v-4OmmAJXdg/s72-c/Convert+Datetime+to+Varchar.sql.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-7487878115771522199</id><published>2009-04-15T15:09:00.001Z</published><updated>2009-04-20T14:08:45.252Z</updated><title type='text'>Question of the Day</title><content type='html'>I stumped the gurus on &lt;a href="http://www.sqlservercentral.com/"&gt;SQLServerCentral.com&lt;/a&gt; with another challenging &lt;a href="http://www.sqlservercentral.com/questions/TSQL/66116/"&gt;Question of the Day&lt;/a&gt; on April 6th, 2009:&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Given this code,&lt;/p&gt;&lt;pre class="code"&gt;DECLARE @val int;&lt;br /&gt;SET @val = -1&lt;br /&gt;CREATE TABLE #empty (val int)&lt;br /&gt;&lt;/pre&gt;&lt;p&gt;which statement(s) will result in @val being NULL? (select all that apply)&lt;/p&gt;&lt;ol&gt;&lt;li&gt;SET @val = NULL&lt;/li&gt;&lt;li&gt;SELECT @val = NULL FROM #empty&lt;/li&gt;&lt;li&gt;SELECT @val = val FROM #empty&lt;/li&gt;&lt;li&gt;SELECT @val = (SELECT val FROM #empty)&lt;/li&gt;&lt;/ol&gt;As of today, only about 30% of respondents answered correctly, and judging from the &lt;a href="http://http//www.sqlservercentral.com/Forums/Topic690594-1181-1.aspx"&gt;comments in the discussion section&lt;/a&gt;, a lot of them gained a deeper understanding of the concept of null in SQL. I try to make my questions (and answers), tricky enough so as to not be obvious, but my goal isn't to trick people with arcane technicalities - I want to make them aware of certain subtleties of the database engine. This question arose after some unexpected results made me delve into a bit of code, and tested scenarios just like those in the question.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-7487878115771522199?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/7487878115771522199/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/04/question-of-day.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/7487878115771522199'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/7487878115771522199'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/04/question-of-day.html' title='Question of the Day'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-2552075107849966516</id><published>2009-04-13T21:42:00.001Z</published><updated>2009-04-17T15:19:47.931Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Collation Sequences</title><content type='html'>Being a database developer rather than a DBA, I rarely deal with collation types, but I came across a situation recently where I had to dig into the issue. My objective was to produce a breakdown of how many rows contained each ASCII character. I considered two approaches: slice the values into one-character chunks, or loop through all 256 ASCII values and count the number of rows containing each character. The former approach has the advantage of not only counting the rows but the frequency of characters (e.g., "100 rows contain 250 instances of the character 'A'"), but I opted for the second approach since it intuitively seemed faster. If your database was created with case-insensitive collation (such as "SQL_Latin1_General_CP1_CI_AS"), checking for the characters 'A', would pull in values of 'a':&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;USE NorthWind&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;GO&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;SELECT DISTINCT City&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;FROM dbo.Customers&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;WHERE CHARINDEX('A', City) &gt; 0&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_rPFttr3fnLI/SeiSqHSifjI/AAAAAAAAABk/4FG7vb64cgs/s1600-h/Northwind+Customer+City.GIF"&gt;&lt;img style="cursor: pointer; width: 177px; height: 320px;" src="http://1.bp.blogspot.com/_rPFttr3fnLI/SeiSqHSifjI/AAAAAAAAABk/4FG7vb64cgs/s320/Northwind+Customer+City.GIF" alt="" id="BLOGGER_PHOTO_ID_5325667811574578738" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;To fix, simply add the "COLLATE" clause to the query:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;SELECT DISTINCT City&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;FROM dbo.Customers&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;WHERE CHARINDEX('A' COLLATE Latin1_General_BIN, City) &gt; 0&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_rPFttr3fnLI/SeiTV8XOoTI/AAAAAAAAABs/9LUcmTathNg/s1600-h/Northwind+Customer+City2.GIF"&gt;&lt;img style="cursor: pointer; width: 153px; height: 138px;" src="http://4.bp.blogspot.com/_rPFttr3fnLI/SeiTV8XOoTI/AAAAAAAAABs/9LUcmTathNg/s320/Northwind+Customer+City2.GIF" alt="" id="BLOGGER_PHOTO_ID_5325668564555702578" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-2552075107849966516?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/2552075107849966516/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/04/collation-sequences.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2552075107849966516'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2552075107849966516'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/04/collation-sequences.html' title='Collation Sequences'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_rPFttr3fnLI/SeiSqHSifjI/AAAAAAAAABk/4FG7vb64cgs/s72-c/Northwind+Customer+City.GIF' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-7971144397151421620</id><published>2009-04-03T20:04:00.001Z</published><updated>2009-04-03T20:15:44.931Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>"Average" Date</title><content type='html'>Have you ever tried to calculate the average of a datetime column, and gotten this error:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;"Msg 8117, Level 16, State 1, Line 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;Operand data type datetime is invalid for avg operator."&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;When you first think about it, the error makes sense - are you trying to determine the average month, year, hour, or second? But shouldn't there be such a thing as an "average" date? If we have a bunch of sales orders in a given month, doesn't the "average" date of sale actually mean something?&lt;br /&gt;&lt;br /&gt;What if I calculated the MIN value, then calc'd the DATEDIFF between the MIN and all other values? At that point I'd essentially have an integer value, which of course I could average, and then derive the "average" date:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;;WITH &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;CvrtToDate AS (&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    SELECT &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        /* "DataValue" is assumed to be varchar(max) */&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        DataValue = CONVERT(datetime, DataValue)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    FROM DataSet&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    WHERE ISDATE(DataValue) = 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;,MinAndMax AS (&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    SELECT&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        ValueMin = MIN(DataValue)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        ,ValueMax = MAX(DataValue)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    FROM CvrtToDate&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;,DateDiffs AS (&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    SELECT&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;        DaysFromMin = DATEDIFF(d, MinAndMax.ValueMin, DataValue)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    FROM CvrtToDate, MinAndMax&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;,AvgDaysFromMin AS (&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    SELECT DataValue = AVG(DaysFromMin)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    FROM DateDiffs&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;SELECT&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    AvgDate = DATEADD(d, AvgDaysFromMin.DataValue, MinAndMax.ValueMin)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;FROM MinAndMax, AvgDaysFromMin&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;This query bears a result that makes sense - we have a date that is between the oldest, and most recent, that is somewhere near the midway point.&lt;br /&gt;&lt;br /&gt;A little Google research bears fruit for a much simpler calculation. From "&lt;a href="http://www.bennadel.com/blog/175-Ask-Ben-Averaging-Date-Time-Stamps-In-SQL.htm"&gt;Ask Ben: Averaging Date/Time Stamps In SQL&lt;/a&gt;": "The secret to date/time averaging is that date/time stamps can be represented as a floating point number. I have covered this a number of times on this blog so I won't go into too much detail, but the idea is that as a floating point number, the integer part represents the number of days since the beginning of time (as the SQL server defines it) and the decimal part represents the time or rather, the fraction of days. SQL does not make this conversion for you; you have to CAST the date/time stamp as a FLOAT value."&lt;br /&gt;&lt;br /&gt;This leads to the revised calculation:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;SELECT&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    ValueMin = MIN(CONVERT(datetime, DataValue))&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    ,ValueMax = MAX(CONVERT(datetime, DataValue))&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;    ,ValueAvg = CONVERT(datetime, AVG(CONVERT(float,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;     CONVERT(datetime, DataValue))))&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;FROM DataSet&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;WHERE ISDATE(DataValue) = 1&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Not only is this calc &lt;span style="font-style: italic;"&gt;far simpler&lt;/span&gt;, but it is slightly more precise, as it includes time-of-day in the result.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-7971144397151421620?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/7971144397151421620/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/04/average-date.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/7971144397151421620'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/7971144397151421620'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/04/average-date.html' title='&quot;Average&quot; Date'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-3791462288641943799</id><published>2009-04-03T18:23:00.000Z</published><updated>2009-04-14T01:50:38.577Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Calculating Median</title><content type='html'>I found an article on calculating the median in SQL (&lt;a href="http://www.sqlservercentral.com/scripts/Miscellaneous/31775/"&gt;http://www.sqlservercentral.com/scripts/Miscellaneous/31775/&lt;/a&gt;), and after reading a &lt;a href="http://en.wikipedia.org/wiki/Median"&gt;Wikipedia article&lt;/a&gt; on it, I realized that it was incorrect for sample sets of even size. I left a &lt;a href="http://www.sqlservercentral.com/Forums/Topic404673-687-1.aspx?Update=1"&gt;comment&lt;/a&gt; with a version of the calc that accounts for this, with a copy of this code:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;;WITH &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;TopHalf AS (&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    SELECT TOP 50 PERCENT DataValue&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    FROM DataSet&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    ORDER BY DataValue ASC&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;,BottomHalf AS (&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    SELECT TOP 50 PERCENT DataValue&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    FROM DataSet&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    ORDER BY DataValue DESC&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;,BottomOfTopHalf AS (&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    SELECT TOP 1 DataValue&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    FROM TopHalf&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    ORDER BY DataValue DESC&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;,TopOfBottomHalf AS (&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    SELECT TOP 1 DataValue&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    FROM BottomHalf&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    ORDER BY DataValue ASC&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;SELECT &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    Median = (BottomOfTopHalf.DataValue&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;     + TopOfBottomHalf.DataValue) / 2.0&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;FROM BottomOfTopHalf, TopOfBottomHalf&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-3791462288641943799?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/3791462288641943799/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/04/calculating-median.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/3791462288641943799'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/3791462288641943799'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/04/calculating-median.html' title='Calculating Median'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-5191727132386289</id><published>2009-04-01T04:46:00.000Z</published><updated>2009-04-23T13:47:49.393Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Function to Insert Commas Into Number String</title><content type='html'>While looking into some data quality issues, I considered the case of source data arriving as a string of numbers with embedded commas (for example, "1,526,734.56"), and as I didn't have any such data to test with, I decided to create some, and also created this function in the process (which can be used to create some): &lt;a href="http://www.sqlservercentral.com/scripts/TSQL/66428/"&gt;fn_AddCommasToNumberString&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;(Followup on 4/23/09): A &lt;a href="http://www.sqlservercentral.com/Forums/Topic701564-338-1.aspx"&gt;forum post on SQLServerCentral.com&lt;/a&gt; explained a very easy way to do this using varchar conversion:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;declare @test float&lt;br /&gt;set @test = 7265342.12&lt;br /&gt;select @test, convert(varchar(20),cast(@test as money),1)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-5191727132386289?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/5191727132386289/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/03/function-to-insert-commas-into-number.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/5191727132386289'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/5191727132386289'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/03/function-to-insert-commas-into-number.html' title='Function to Insert Commas Into Number String'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-6486339782234153104</id><published>2009-03-25T19:28:00.000Z</published><updated>2009-05-20T16:36:19.055Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Data Profiler'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>SQL Server Data Profiler</title><content type='html'>With the newest edition of SQL Server, 2008, comes a tool that serves as a good first-cut at providing something for data profiling. Here are some good articles on it:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;A good&lt;a href="http://www.sqlservercentral.com/articles/Integration+Services/64133/"&gt; intro to the Data Profiler tool&lt;/a&gt; &lt;/li&gt;&lt;li&gt;A more &lt;a href="http://www.simple-talk.com/sql/learn-sql-server/sql-server-2008--ssis-data-profiling-task/"&gt;detailed look&lt;/a&gt; into the tool&lt;/li&gt;&lt;li&gt;An article that &lt;a href="http://www.sql-server-performance.com/articles/biz/SSIS_New_Features_in_SQL_Server_2008_Part3_p1.aspx"&gt;focuses on the payoff&lt;/a&gt; &lt;/li&gt;&lt;li&gt;a look at how to incorporate &lt;a href="http://www.sqlservercentral.com/articles/Data+Profiling/65615/"&gt;functional dependency checks between columns&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-6486339782234153104?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/6486339782234153104/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/03/sql-server-data-profiler.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/6486339782234153104'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/6486339782234153104'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/03/sql-server-data-profiler.html' title='SQL Server Data Profiler'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-6319967279718103050</id><published>2009-03-25T15:11:00.000Z</published><updated>2009-04-14T01:51:53.458Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Fuzzy Matching</title><content type='html'>While doing some research on data quality, I came across an &lt;a href="http://www.sqlservercentral.com/articles/SSIS/65616/"&gt;article&lt;/a&gt; about implementing the &lt;a href="http://en.wikipedia.org/wiki/Jaro-Winkler"&gt;Jaro-Winkler distance metri&lt;/a&gt;&lt;a href="http://en.wikipedia.org/wiki/Jaro-Winkler"&gt;c&lt;/a&gt; in SSIS. Given two character strings, this algorithm will return a numeric between 0 and 1 indicating similarity (so that 0 is no similarity and 1 means exact match). The Wikipedia entry uses the names MARTHA and MARHTA as an example, producing the result 0.94 (indicating a 94% similarity). &lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-6319967279718103050?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/6319967279718103050/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/03/fuzzy-matching.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/6319967279718103050'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/6319967279718103050'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/03/fuzzy-matching.html' title='Fuzzy Matching'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-3954884975115788987</id><published>2009-03-06T05:04:00.000Z</published><updated>2009-03-06T06:05:30.573Z</updated><title type='text'></title><content type='html'>When I ran the comparison of the original versus the most recent versions of the sequence comparisons, I saved the results of the Seq1 and Seq2 tables. While running some tests on the result tables, I discovered a mistake:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_rPFttr3fnLI/SbCvkM5nvmI/AAAAAAAAABc/soOunYpQwRU/s1600-h/20090305+-+mismatched+results.GIF"&gt;&lt;img style="cursor: pointer; width: 153px; height: 320px;" src="http://2.bp.blogspot.com/_rPFttr3fnLI/SbCvkM5nvmI/AAAAAAAAABc/soOunYpQwRU/s320/20090305+-+mismatched+results.GIF" alt="" id="BLOGGER_PHOTO_ID_5309936997142871650" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The four counts should all be equal, and because they are not, we know that there is a mistake in at least one of them. Because the version 1 results are not even consistent between one another, I have to think that the problem is with the original version.&lt;br /&gt;&lt;br /&gt;This query:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;SELECT &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    Seq1_Line = ISNULL(LTRIM(STR(S1.CodeLineNum)), '')&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    ,ISNULL(S1.CodeLineTxt, '')&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    ,ISNULL(S2.CodeLineTxt, '')&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    ,Seq2_Line = ISNULL(LTRIM(STR(S2.CodeLineNum)), '')&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    ,OrderBy = ISNULL(S1.CodeLineNum, S2.CodeLineNum)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;FROM Seq1 S1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;FULL OUTER JOIN Seq2 S2&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    ON S1.CodeLineNum = S2.MatchLineNum&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;ORDER BY OrderBy&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;will display the matches between the two sequences in a "Windiff"-fashion, and running it on the results of the most recent matching algorithm shows that that one worked correctly.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-3954884975115788987?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/3954884975115788987/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/03/when-i-ran-comparison-of-original.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/3954884975115788987'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/3954884975115788987'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/03/when-i-ran-comparison-of-original.html' title=''/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_rPFttr3fnLI/SbCvkM5nvmI/AAAAAAAAABc/soOunYpQwRU/s72-c/20090305+-+mismatched+results.GIF' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-2527614268895919415</id><published>2009-03-04T02:37:00.000Z</published><updated>2009-03-04T04:25:43.899Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Comparing Stored Procedures, Part 7</title><content type='html'>&lt;a href="http://jessesql.blogspot.com/2009/02/comparing-stored-procedures-part-6.html"&gt;Last week&lt;/a&gt; we discovered a set-based method of implementing our sequence-matching algorithm using a Common Table Expression (CTE), and found that it beat our iterative code by several orders of magnitude.  Remember the original impetus for this effort, comparing the two stored procedures of more than 1,000 lines each, which initially took about six-and-a-half hours? Running this &lt;a href="http://www.sqlservercentral.com/scripts/TSQL/66074/"&gt;new version&lt;/a&gt; in TempDb on an idle database server took just over 30 seconds! How could this be, given that in Part 6 a comparison of 800 random 2-letter values took about 70 seconds? (Omitted from those results - an input size of 1,000 took over 2 minutes.) How could a test of the same input size (1000 values), but with vastly more complex values (a real stored procedure rather than random two-letter values), take one-quarter of the time to complete? The answer lies in the length of the matching subsequences. If we run this query:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;SELECT &lt;/span&gt; &lt;span style="font-family:courier new;"&gt;  MatchLineNum, &lt;/span&gt; &lt;span style="font-family:courier new;"&gt;  SubSeqLen = COUNT(*)&lt;br /&gt;&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;FROM Seq1&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;GROUP BY MatchLineNum&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;ORDER BY COUNT(*) DESC&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;we get the results:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_rPFttr3fnLI/Sa36tBu6oUI/AAAAAAAAABU/qhOUu4zEg-A/s1600-h/20090303+-+big+spd+matching.GIF"&gt;&lt;img style="cursor: pointer; width: 164px; height: 320px;" src="http://2.bp.blogspot.com/_rPFttr3fnLI/Sa36tBu6oUI/AAAAAAAAABU/qhOUu4zEg-A/s320/20090303+-+big+spd+matching.GIF" alt="" id="BLOGGER_PHOTO_ID_5309175187205955906" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;So on the first matching subsequence, we match 121 values, which is over 10% of the entire sequence. The next matches are composed of 84, 82, 64, and 56 values, which when combined with the first, comprise about 30% of all values. With the sequences of random values, the matching subsequences are much shorter, and therefore requre more executions of the CTE block.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-2527614268895919415?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/2527614268895919415/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/03/comparing-stored-procedures-part-7.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2527614268895919415'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2527614268895919415'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/03/comparing-stored-procedures-part-7.html' title='Comparing Stored Procedures, Part 7'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_rPFttr3fnLI/Sa36tBu6oUI/AAAAAAAAABU/qhOUu4zEg-A/s72-c/20090303+-+big+spd+matching.GIF' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-4899512621648362214</id><published>2009-02-25T18:05:00.000Z</published><updated>2009-03-04T04:02:57.081Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><category scheme='http://www.blogger.com/atom/ns#' term='CTE'/><title type='text'>Comparing Stored Procedures, Part 6</title><content type='html'>&lt;div&gt;In the &lt;a href="http://jessesql.blogspot.com/2009/02/comparing-spds-part-5-optimization.html"&gt;previous entry&lt;/a&gt;, we found that computing the maximum search size prior to the sequence matching using SQL Server 2005's CTE feature saved a considerable amount of time, but we were still performing at an unacceptable rate. I decided it was time to rework the core logic of the matching algorithm. Impressed by the blazing performance of the CTE used to precalculate the max search length, I wanted to recode the algorithm using the set-based logic of the CTE.&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Initially using an iterative approach to this problem made a lot of sense, since we need to control the order of the subsequences checked (longer before shorter). There didn't seem to be an obvious way to do this with a CTE. The real power of the CTE comes from its feature of recursion - the ability to have a set of data in the CTE refer to itself (please go &lt;a href="http://www.sqlservercentral.com/articles/Development/recursivequeriesinsqlserver2005/1760/"&gt;here&lt;/a&gt; for an excellent article on this). I developed a hybrid approach that would combine iterative and recursive code. The recursive CTE in it will select off the longest matching subsequence between the two sequences it can find whose values are all unmatched, save it in a temp table, and then update the two sequences to show the matching values. It would then continue doing this until no more matching subsequences can be found. This new version can be found &lt;a href="http://www.sqlservercentral.com/scripts/TSQL/66074/"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;The code is much more concise, much more &lt;span style="font-style: italic;"&gt;elegant&lt;/span&gt;, than the iterative algorithm. But what is truly amazing is the performance increase. After altering the testing script to test all three versions of the code under the same conditions, in the same testing session, the improved performance speaks for itself:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/_rPFttr3fnLI/SagsO0zNgyI/AAAAAAAAABE/bRXHJhlKDDQ/s1600-h/20090226+-+graph+of+version+compare.bmp"&gt;&lt;img id="BLOGGER_PHOTO_ID_5307540794059359010" style="width: 294px; height: 167px;" alt="" src="http://3.bp.blogspot.com/_rPFttr3fnLI/SagsO0zNgyI/AAAAAAAAABE/bRXHJhlKDDQ/s320/20090226+-+graph+of+version+compare.bmp" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;span style="font-family:courier new;"&gt;&lt;/span&gt;&lt;/p&gt;The 'Improved' column compares the latest version with the first. The improved performance represents not just absolute time; notice how slowly the rate of extra time required increases for each increment of the input size. After running the new version for input sizes from 100 to 800 values, I created this graph of the performance time in seconds (graph developed using &lt;a href="http://my.hrw.com/math06_07/nsmedia/tools/Graph_Calculator/graphCalc.html"&gt;The Graph Calculator&lt;/a&gt; by Holt, Rinehart, and Winston):&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://3.bp.blogspot.com/_rPFttr3fnLI/Sag0HYuTPiI/AAAAAAAAABM/a_ANCzeXHu0/s1600-h/20090226+-+graph+of+vers+3+results+%28small%29+%28with+axis+headings%29.GIF"&gt;&lt;img id="BLOGGER_PHOTO_ID_5307549462356508194" style="width: 320px; height: 208px;" alt="" src="http://3.bp.blogspot.com/_rPFttr3fnLI/Sag0HYuTPiI/AAAAAAAAABM/a_ANCzeXHu0/s320/20090226+-+graph+of+vers+3+results+%28small%29+%28with+axis+headings%29.GIF" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;The graph rises very slowly at first, then the performance time starts to grow very quickly towards the right end of the graph. Using &lt;a href="http://www.xuru.org/rt/PowR.asp"&gt;this website for power regression analysis&lt;/a&gt;, we arrive at the approximate formula of y=3.5x^2.5/10^-6. If this is accurate, then the new version is somewhere between &lt;a href="http://en.wikipedia.org/wiki/Big_O_notation"&gt;square and cubic order of complexity&lt;/a&gt;, and although considerably faster than the previous versions, will suffer at larger input sizes.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="http://jessesql.blogspot.com/2009/03/comparing-stored-procedures-part-7.html"&gt;Next&lt;/a&gt;: we examine the results of running our new code on the original large stored procedures.&lt;br /&gt;&lt;/p&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-4899512621648362214?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/4899512621648362214/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/02/comparing-stored-procedures-part-6.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4899512621648362214'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4899512621648362214'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/02/comparing-stored-procedures-part-6.html' title='Comparing Stored Procedures, Part 6'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_rPFttr3fnLI/SagsO0zNgyI/AAAAAAAAABE/bRXHJhlKDDQ/s72-c/20090226+-+graph+of+version+compare.bmp' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-4532558412580538721</id><published>2009-02-24T05:01:00.000Z</published><updated>2009-02-25T06:02:26.779Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><category scheme='http://www.blogger.com/atom/ns#' term='CTE'/><title type='text'>Comparing Spds, Part 5 - Optimization</title><content type='html'>Testing out our &lt;a href="http://jessesql.blogspot.com/2009/02/comparing-spds-part-4-performance.html"&gt;sequence matching algorithm&lt;/a&gt; showed that our estimate of cubic order of complexity was accurate; unfortunately, this performance is what might be called "sub-optimal". If we have any hope of testing our 1,000 line stored procedures in a reasonable timeframe, we're going to have to optimize our algorithm.&lt;br /&gt;&lt;br /&gt;The most obvious optimization to make is to simply reduce the number of comparisons, and the way we're going to do that is by checking the sequences ahead of the actual matching. We first check each value in the sequences to see if there exists a match in the other sequence, and then overload the MatchLineNum column to indicate match or no-match. At this point we have a series of subsequences within each sequence of &lt;span style="font-style: italic;"&gt;possible&lt;/span&gt; matches. But what's important here is the &lt;span style="font-style: italic;"&gt;length&lt;/span&gt; of the subsequences - the length of the longest subsequence is the maximum search size for that sequence - we need not search for longer ones in it because we already know that longer ones already contain a mismatch. Also, the longest possible matching subsequence &lt;span style="font-style: italic;"&gt;between &lt;/span&gt;the two main ones will be the &lt;span style="font-style: italic;"&gt;smaller &lt;/span&gt;of these two maximum lengths.&lt;br /&gt;&lt;br /&gt;As an example, let's say that we have two sequences, &lt;span style="font-style: italic;"&gt;S1&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;S2&lt;/span&gt;, and we find that the length of the longest possible subsequences of &lt;span style="font-style: italic;"&gt;S1&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;S2&lt;/span&gt; is &lt;span style="font-style: italic;"&gt;n&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;m&lt;/span&gt;, resp. For the purposes of matching the subsequences between &lt;span style="font-style: italic;"&gt;S1 &lt;/span&gt;and &lt;span style="font-style: italic;"&gt;S2&lt;/span&gt;, the maximum subsequence to compare will be of length min(&lt;span style="font-style: italic;"&gt;n&lt;/span&gt;, &lt;span style="font-style: italic;"&gt;m&lt;/span&gt;).&lt;br /&gt;&lt;br /&gt;The code to determine this maximum subsequence utilizes CTEs, one of the more important enhancements of SQL Server 2005. This was an interesting exercise. I wrote CTEs that first separately picked out the starting and ending points of the subsequences with matching values, constructed the sequences by matching starting and ending points, then calculated the maximum length of these subsequences. CTEs require a set-based approarch, rather than the more algorithmic row-by-row processing; the resulting code is more intuitive, and ultimately more powerful. Please look &lt;a href="http://www.sqlservercentral.com/articles/Development/commontableexpressionsinsqlserver2005/1758/"&gt;here&lt;/a&gt; for an excellent introductory article to CTEs.&lt;br /&gt;&lt;br /&gt;In comparing this improved version against the original code, I tested both with input sizes of 100, 200, 300, 400, and 500, running each code base 5 times for each input size. Here are the results (columns 'Version1' and 'Version2' represent performance time in seconds):&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;strong&gt;InputSize___Version1___Version2____Improvement&lt;/strong&gt;&lt;br /&gt;100_________5.064______1.222_______75.9%&lt;br /&gt;200_________48.680_____13.534______72.2%&lt;br /&gt;300_________92.918_____35.958______61.3%&lt;br /&gt;400_________461.862____318.318_____31.1%&lt;br /&gt;500_________614.412____417.818_____32.0%&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We can see that the new version improved performance considerably for the smaller input sizes, but peeters out as the size increases. Analyzing the print output from the stored procedure, paying special attention to the @CmprSz values that were output, I can see that the starting compare sizes grew larger as a ratio to the input size as the input size increased. In other words, the larger the sample size the smaller the cost savings. Given that the test sequences consisted of "words" of two letters of the form [A-J][A-J], obviously the larger the sample size the more likely that random values will be shared between the sequences. In two particular runs of the larger sample sets, all values were shared between the two sequences, resulting in zero cost savings.&lt;br /&gt;&lt;br /&gt;This is not exactly a failure of testing design - for many real-world scenarios, the likelihood of shared values should follow this trend of increasing as the sample size increases. Consider comparing the words from works of literature - comparing the text of any random two English language authors of the 20th century, the larger the works the more likely that they will share words between them. Another example is DNA - consider the &lt;a href="http://en.wikipedia.org/wiki/Dna#Base_pairing"&gt;base pairing&lt;/a&gt; of AT and GC combinations - since there are only two possibilities in the domain of values, we are certain to find 100% likelihood of shared values between any two DNA sequences (thus reducing our cost savings to zero).&lt;br /&gt;&lt;br /&gt;You can find a complete copy of the improved stored procedure &lt;a href="http://www.sqlservercentral.com/scripts/TSQL/66038/"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-4532558412580538721?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/4532558412580538721/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/02/comparing-spds-part-5-optimization.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4532558412580538721'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4532558412580538721'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/02/comparing-spds-part-5-optimization.html' title='Comparing Spds, Part 5 - Optimization'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-6501705302372529703</id><published>2009-02-19T16:09:00.000Z</published><updated>2009-02-25T18:55:20.219Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Comparing Spds, Part 4 - Performance</title><content type='html'>In a &lt;a href="http://jessesql.blogspot.com/2009/02/comparing-spds-part-3-performance.html"&gt;previous posting&lt;/a&gt;, I analyzed the comparison algorithm to determine its complexity, and found it to be of cubic order. The next step is to test the actual performance of the code and see if my estimate was correct. To do this, I wrote a testing script that tests the code several times under several different input sizes (this code can be found &lt;a href="http://www.sqlservercentral.com/scripts/TSQL/65943/"&gt;here&lt;/a&gt;). Using the table in the testing script, I ran a couple more queries to reformat the results to be more suitable for my specific code:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;SELECT &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;InputSize, &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;NextInputSize = InputSize + 50.0, &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;NumberOfRuns = COUNT(*), &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;AveragePercentMatch = AVG(PcntMatch), &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;AverageRunTime = AVG(CONVERT(decimal(9, 2), CONVERT(varchar(max), DATEDIFF(ss, RunStart, RunDone)) + '.' + CONVERT(varchar(max), DATEDIFF(ms, RunStart, RunDone))))&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;INTO #AggregatedPerformanceResults&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;FROM #PerformanceResults&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;GROUP BY InputSize&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;SELECT&lt;br /&gt;A.InputSize, &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;A.AverageRunTime,&lt;br /&gt;ExpectedNextAverageRunTime = POWER(A.NextInputSize / A.InputSize,&lt;/span&gt;&lt;span style="font-family:courier new;"&gt; 3.0) * A.AverageRunTime&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;FROM #AggregatedPerformanceResults A&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;JOIN #AggregatedPerformanceResults B ON B.InputSize = A.NextInputSize&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;and here are the results I saw:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;InputSize___AverageRunTime___ExpectedNextAvgRunTime&lt;br /&gt;50__________4.618000_________36.944000_____________&lt;br /&gt;100_________37.270000________125.786250____________&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;150_________142.642000_______338.114370____________&lt;br /&gt;200_________366.866000_______716.535156____________&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;/span&gt;The 'ExpectedNextAverageRunTime' values are, as the name indicates, the expected average time of the next input size based on the results of the given input size. This is computed by computing the percentage difference between the given and next input size, cubing it, and multiplying the result by the given average run time. For InputSize = 50, the percentage difference between 100 and 50 is 100/50 = 2, 2 to the third power is 8, times the average run time of 4.618 seconds is 36.944 seconds. Notice that differs only slightly from the actual average run time for InputSize of 100, which is 37.27 seconds. Comparing subsequent expected versus actual average run times shows a similar consistency, which confirms the analysis of the previous posting, that the matching algorithm is of cubic order of complexity.&lt;br /&gt;&lt;br /&gt;Next: &lt;a href="http://jessesql.blogspot.com/2009/02/comparing-spds-part-5-optimization.html"&gt;optimizing the matching algorithm&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-6501705302372529703?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/6501705302372529703/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/02/comparing-spds-part-4-performance.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/6501705302372529703'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/6501705302372529703'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/02/comparing-spds-part-4-performance.html' title='Comparing Spds, Part 4 - Performance'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-3232777559251324023</id><published>2009-02-12T16:50:00.000Z</published><updated>2009-02-27T20:39:33.981Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Complicated Comments</title><content type='html'>A "&lt;a href="http://www.sqlservercentral.com/questions/TSQL/65712/"&gt;Question of the Day&lt;/a&gt;" I submitted to &lt;a href="http://www.sqlservercentral.com/"&gt;SQLServerCentral.com&lt;/a&gt; was published yesterday, and it seemed to generate a lot of interest, due to the rather surprising answer.&lt;br /&gt;&lt;br /&gt;The question was, "Given the code below, which numbers will be printed?"&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;PRINT '1' -- /* ;PRINT '2' */ ;PRINT '3' /*&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;PRINT '4' --*/&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;--/*&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;PRINT '5'&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;--*/&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;/*&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;PRINT '6'&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;--/*&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;*/&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;PRINT '7'&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;--*/&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;PRINT '8'&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Possible answers:&lt;br /&gt;A. 1, 8&lt;br /&gt;B. 1, 4, 8&lt;br /&gt;C. 1, 4, 5, 8&lt;br /&gt;D. 1, 4, 5, 7, 8&lt;br /&gt;&lt;br /&gt;Correct answer: C&lt;br /&gt;&lt;br /&gt;Fewer than half of the respondents answered correctly (I certainly would have gotten the question wrong myself - this question derived from research that completely surprised me). It's not exactly a trick question, but the answer centers around a little-known feature of nested comments. It's fairly obvious that the numbers 1, 4, 5, and 8 will print. The tricky part is near the bottom of the code. Most people look at the block comment starting above the line "&lt;span style="font-family:courier new;"&gt;PRINT '6'&lt;/span&gt;" and think that it ends above the line "&lt;span style="font-family:courier new;"&gt;PRINT '7'&lt;/span&gt;", following the reasoning that the line below "&lt;span style="font-family:courier new;"&gt;PRINT '6'&lt;/span&gt;", "&lt;span style="font-family:courier new;"&gt;--/*&lt;/span&gt;" is ignored by the SQL parser. But because the inline comment of that line is already inside a block comment, the inline comment is ignored! Which means that the line "&lt;span style="font-family:courier new;"&gt;--/*&lt;/span&gt;" actually begins a new, nested block comment that closes on the next line, but causes the line "&lt;span style="font-family:courier new;"&gt;PRINT '7'&lt;/span&gt;" to remain commented-out. In other words, the block comment containing "&lt;span style="font-family:courier new;"&gt;PRINT '6'&lt;/span&gt;" also contains "&lt;span style="font-family:courier new;"&gt;PRINT '7'&lt;/span&gt;", and does not close until the line above "&lt;span style="font-family:courier new;"&gt;PRINT '8'&lt;/span&gt;". This confusion explains why, as of right now, 40% of the respondents chose answer "D".&lt;br /&gt;&lt;br /&gt;When I submitted the question for publication, I wondered that many people would find it too abstruse and esoteric to be of value. A number of participants in the &lt;a href="http://www.sqlservercentral.com/Forums/Topic654391-1181-1.aspx"&gt;discussion of the question&lt;/a&gt; pointed out that if they saw comments similar to this in a code review, they would send the developer back to his or her desk to clean up. And point taken. Nesting comments in this fashion has very little, if any, upside and almost 100% downside in that it can lead to largely unexpected results. A couple of articles on the site help to establish best practices for commenting: &lt;a href="http://www.sqlservercentral.com/scripts/30672/"&gt;"The case against using single-line comments"&lt;/a&gt;, by Cade Bryant, and &lt;a href="http://www.sqlservercentral.com/articles/Miscellaneous/worstpracticebadcomments/893/"&gt;"Worst Practice - Bad Comments"&lt;/a&gt;, by Andy Warren.&lt;br /&gt;&lt;br /&gt;This problem originally came to mind when I started work on a tool for &lt;a href="http://jessesql.blogspot.com/2009/02/comparing-stored-procedures-part-1.html"&gt;comparing the text from stored procedures&lt;/a&gt;. I wanted to allow the option of omitting all comments from the comparison, which of course would require me to identify and parse out comments from the text. In my research, I came across the question of how to treat nested comments - the "order of operation" of inline versus block comments, nested blocks, etc., and that's when I came across the scenario that formed this question of the day. I have to say that I was surprised, almost to the point of shock, that commenting functioned this way.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-3232777559251324023?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/3232777559251324023/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/02/complicated-comments.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/3232777559251324023'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/3232777559251324023'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/02/complicated-comments.html' title='Complicated Comments'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-4846732705392148261</id><published>2009-02-10T05:47:00.000Z</published><updated>2009-02-23T20:30:58.404Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Mathematical Series'/><category scheme='http://www.blogger.com/atom/ns#' term='Big O Notation'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Complexity Analysis'/><title type='text'>Comparing Spds Part 3 - Performance</title><content type='html'>In the previous post, &lt;a href="http://jessesql.blogspot.com/2009/02/comparing-stored-procedures-part-2.html"&gt;"Comparing Stored Procedures, Part 2,"&lt;/a&gt; we looked at an algorithm that would match smaller and smaller subsequences between two sequences of interest. When I coded it, I approached it as a proof-of-concept, and thus sidelined any performance concerns. When I tested it on two stored procedures of about 1000 lines a piece on an idle server, the comparison took about 6.5 hours! Some possible reasons why:&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;the code is &lt;a href="http://www.sqlservercentral.com/articles/T-SQL/61539/"&gt;RBAR-based&lt;/a&gt; rather than set-based (see &lt;a href="https://www.sqlservercentral.com/blogs/philfactor/archive/tags/tsql+rbar/default.aspx"&gt;this article&lt;/a&gt; for a refutation of that)&lt;/li&gt;&lt;li&gt;there is no indexing of the text elements&lt;/li&gt;&lt;li&gt;very little optimization has been done&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;How many comparisons are going on in this algorithm? To figure this out, let's split the accounting between how many subsequences exist, and how those are compared to the other sequence. &lt;/p&gt;&lt;br /&gt;&lt;p&gt;Let's say that we have two sequences, N and M, of lengths &lt;em&gt;n&lt;/em&gt; and &lt;em&gt;m&lt;/em&gt; respectively, where &lt;em&gt;m&lt;/em&gt; &gt;= &lt;em&gt;n.&lt;/em&gt; Let's take subsequences out of N and compare to M. So how many are there? This count will be denoted by x. If &lt;em&gt;n&lt;/em&gt; = 1, then there is only one subsequence, so x = 1. If &lt;em&gt;n&lt;/em&gt; = 2, then x = 3 (with sequence 'AB', there is 'AB', 'A', and 'B'). If &lt;em&gt;n&lt;/em&gt; = 3, then x = 6. I will omit the proof here, but this results in the simple series (taken from &lt;a href="http://en.wikipedia.org/wiki/List_of_mathematical_series"&gt;Wikipedia&lt;/a&gt;):&lt;/p&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/List_of_mathematical_series"&gt;&lt;img id="BLOGGER_PHOTO_ID_5301056013547136850" style="WIDTH: 153px; CURSOR: hand; HEIGHT: 62px" alt="" src="http://3.bp.blogspot.com/_rPFttr3fnLI/SZEiW63s51I/AAAAAAAAAAk/SbaCoseyWZo/s320/20090210+-+math+series.GIF" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;p&gt;If n = 1000, as is the case in my comparison, then x &gt; 500,000! So we have upwards of a half-million comparisons to do already, and we're only counting the left side! Note that this is the worst-case scenario, where none of the elements match between the sequences (any elements contained in a matching subsequence are excluded from future comparisons). &lt;/p&gt;&lt;br /&gt;&lt;p&gt;&lt;/p&gt;&lt;br /&gt;&lt;p&gt;So now the question is, how many subsequences on the right side must we compare to those on the left? Let's look at a simple example comparing strings 'ABC' and 'XYZ':&lt;/p&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;compare subsequences of length 3: ABC:XYZ (1 comparison)&lt;/li&gt;&lt;li&gt;compare subsequences of length 2: AB:XY, AB:YZ, BC:XY, BC:YZ (4 comparisons)&lt;/li&gt;&lt;li&gt;compare subsequences of length 1: A:X, A:Y, A:Z, B:X, B:Y, B:Z, C:X, C:Y, C:Z (9 comparisons)&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;p&gt;The total number of comparisons is 1 + 4 + 9 = 14. Note that the number of comparisons of length n is n^2, and that the sum is another series:&lt;/p&gt;&lt;br /&gt;&lt;p&gt;&lt;/p&gt;&lt;a href="http://en.wikipedia.org/wiki/List_of_mathematical_series"&gt;&lt;img id="BLOGGER_PHOTO_ID_5301206210295853522" style="WIDTH: 320px; CURSOR: hand; HEIGHT: 57px" alt="" src="http://1.bp.blogspot.com/_rPFttr3fnLI/SZGq9hXHcdI/AAAAAAAAAAs/imEe_2sl1wI/s320/20090210+-+math+series+square.GIF" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Now let's say that we were comparing 'ABCD' and 'XYZ'. How many extra comparisons does this add (extras &lt;strong&gt;highlighted in bold&lt;/strong&gt;)?&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;compare subsequences of length 3: ABC:XYZ, &lt;strong&gt;BCD:XYZ&lt;/strong&gt; (1 extra comparison)&lt;/li&gt;&lt;li&gt;compare subsequences of length 2: AB:XY, AB:YZ, BC:XY, BC:YZ, &lt;strong&gt;CD:XY, CD:YZ&lt;/strong&gt; (2 extra comparisons)&lt;/li&gt;&lt;li&gt;compare subsequences of length 1: A:X, A:Y, A:Z, B:X, B:Y, B:Z, C:X, C:Y, C:Z, &lt;strong&gt;D:X, D:Y, D:Z&lt;/strong&gt; (3 extra comparisons)&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;p&gt;Does this look familiar? The extra comparisons are the first series summation we saw earlier. And although it certainly adds processing resources, what we are most interested in is the &lt;a href="http://en.wikipedia.org/wiki/Big_O_notation"&gt;order of complexity&lt;/a&gt;, which in our case is&lt;br /&gt;&lt;br /&gt;&lt;a href="http://en.wikipedia.org/wiki/Big_O_notation"&gt;&lt;img id="BLOGGER_PHOTO_ID_5301215422632449570" style="WIDTH: 67px; CURSOR: hand; HEIGHT: 43px" alt="" src="http://2.bp.blogspot.com/_rPFttr3fnLI/SZGzVv_UViI/AAAAAAAAAA0/iAmgHisuWMY/s320/20090210+-+cubic+order.GIF" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Examining our example comparison of two stored procedures of around 1000 lines each, this results in processing involving on the order of 1 billion comparisons. In an unoptimized algorithm, it's no wonder why this took almost seven hours to complete!&lt;/p&gt;&lt;p&gt;Next: &lt;a href="http://jessesql.blogspot.com/2009/02/comparing-spds-part-4-performance.html"&gt;testing our hypothesis in estimating performance&lt;/a&gt;.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-4846732705392148261?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/4846732705392148261/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/02/comparing-spds-part-3-performance.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4846732705392148261'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/4846732705392148261'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/02/comparing-spds-part-3-performance.html' title='Comparing Spds Part 3 - Performance'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_rPFttr3fnLI/SZEiW63s51I/AAAAAAAAAAk/SbaCoseyWZo/s72-c/20090210+-+math+series.GIF' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-2289823485793636632</id><published>2009-02-06T18:30:00.000Z</published><updated>2009-02-25T15:59:19.326Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Text Matching'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Comparing Stored Procedures, Part 2</title><content type='html'>In the previous posting on this topic, &lt;a href="http://jessesql.blogspot.com/2009/02/comparing-stored-procedures-part-1.html"&gt;Comparing Stored Procedures, Part 1&lt;/a&gt;, we found that the stored procedures of interest were more than 90% alike, when counting matching lines of code between them. This result is certainly helpful, but the approach has certain downsides. For example, if the two spds were composed of many blank lines, then we might have a false positive result. If the spds had exactly the same lines, but in different orders, then we would see a 100% match, but we couldn't really say that they were the same. Another thing I was really interested in was, For the lines of matching code, how large were the blocks of matches (matching lines where the next lines matched also)? That sort of result could lead on to something developers familiar with Visual Studio 98 had at their disposal - &lt;a href="http://en.wikipedia.org/wiki/Windiff"&gt;Windiff&lt;/a&gt;, a visual comparison of the spds.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;In order to provide this more detailed picture of how the stored procedures matched one another, I devised an algorithm that would compare blocks of lines between them, starting with the largest possible sequence to compare (the size of this would be the number of lines of the smaller spd), and then compare smaller and smaller sequences until the comparison size was just one line (which is what was compared in &lt;a href="http://jessesql.blogspot.com/2009/02/comparing-stored-procedures-part-1.html"&gt;Comparing Stored Procedures, Part 1&lt;/a&gt;). The trick to this approach is that if a match was found between to large sequences of lines, the subsequences in them would never be compared. Although this certainly helps with performance, the real reason behind this is that the fact that large blocks of code matched is very meaningful. Take this example: the string 'ABCD' matches 'ABDC' more than it matches 'DCBA'. How much "more" it matches is the subject of a later posting, where we establish our scoring algorithm.&lt;br /&gt;&lt;br /&gt;For a complete list of the code, please go &lt;a href="http://www.sqlservercentral.com/scripts/T-SQL/65798/"&gt;here&lt;/a&gt;. The code runs a comparison of the strings "AZXYZBC" and "BCDAEAXYZ" (regarding each letter as a line of code to compare). Because "AZXYZBC" is the smaller string, the size of subsequences to compare starts at its length, 7, and decrements from there. The comparison first checks to see if any instances of "AZXYZBC" are in "BCDAEAXYZ"; obviously there are none, so it next checks for matching subsequences of length 6. No matches are found until it reaches length 3, at which point it finds matches of "XYZ" in both sequences. When it finds that subsequence, it marks each row in the subsequence to indicate the starting matching position in the other sequence; this mark will prevent the letter from being compared again. This allows the comparison to show the matching subsequences in their greatest length, which also indicates the strength of the match.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;This image shows the final matching between the sequences:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/_rPFttr3fnLI/SZEGfc_ztcI/AAAAAAAAAAc/lcNOXeaLxac/s1600-h/20090209+-+sample+compare.GIF"&gt;&lt;img id="BLOGGER_PHOTO_ID_5301025373821318594" style="WIDTH: 320px; CURSOR: hand; HEIGHT: 213px" alt="" src="http://4.bp.blogspot.com/_rPFttr3fnLI/SZEGfc_ztcI/AAAAAAAAAAc/lcNOXeaLxac/s320/20090209+-+sample+compare.GIF" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Notice that row 2 on the left shows no match, even though there is a 'Z' on the right in row 9 - this is because the matching 'XYZ' subsequences are much more important given their longer length.&lt;br /&gt;&lt;br /&gt;Next: &lt;a href="http://jessesql.blogspot.com/2009/02/comparing-spds-part-3-performance.html"&gt;performance analysis of our algorithm&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-2289823485793636632?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/2289823485793636632/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/02/comparing-stored-procedures-part-2.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2289823485793636632'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/2289823485793636632'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/02/comparing-stored-procedures-part-2.html' title='Comparing Stored Procedures, Part 2'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_rPFttr3fnLI/SZEGfc_ztcI/AAAAAAAAAAc/lcNOXeaLxac/s72-c/20090209+-+sample+compare.GIF' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-9171994814594890540</id><published>2009-02-04T17:45:00.000Z</published><updated>2009-02-11T21:12:34.170Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='ETL'/><category scheme='http://www.blogger.com/atom/ns#' term='BI'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Combination of Columns with Unique Values</title><content type='html'>One of the side projects that I am most proud of is the code I wrote to query a table and return a list of columns comprising a potential unique key, which can be found &lt;a href="http://www.sqlservercentral.com/scripts/T-SQL/62086/"&gt;here&lt;/a&gt;. Here's a snippet of the write-up I did for it:&lt;br /&gt;&lt;br /&gt;"As an ETL developer, I often receive files in which the natural key is either unidentified or misidentified, and I need to determine a combination of columns that uniquely identifies each row (the natural key, which is usually also the primary or unique key of the table holding this data), in order to integrate the data from that file into the database. Many times it is simply not clear what the unique key should be, and this requires a painstaking cycle of guessing and testing combinations of columns for unique values. To address this problem, I developed a stored procedure that will automatically query the table to discover a candidate for the unique key. At first, I wanted to create a tool that would report all unique combos, but due to performance reasons, I later limited that to just the first one discovered. I added a way to retest and exclude that result if the first one is not satisfactory."&lt;br /&gt;&lt;br /&gt;You can also follow this link for the complete article:&lt;br /&gt;&lt;a href="http://www.sqlservercentral.com/scripts/T-SQL/62086/"&gt;http://www.sqlservercentral.com/scripts/T-SQL/62086/&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-9171994814594890540?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/9171994814594890540/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/02/combination-of-columns-with-unique.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/9171994814594890540'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/9171994814594890540'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/02/combination-of-columns-with-unique.html' title='Combination of Columns with Unique Values'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-115397154611799658</id><published>2009-02-04T03:34:00.001Z</published><updated>2009-02-23T20:27:19.909Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Text Matching'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Comparing Stored Procedures, Part 1</title><content type='html'>I recently completed a data mart requiring me to reproduce, and improve upon, a critical departmental report created in Excel from the results of three stored procedures. These spds appeared to be almost, but not quite identical - they queried similar data sources and output results in identical data structures, using slightly varying processes. I surmised that there was a single stored procedure at one time, that was copied and modified to create the other two, almost like a genetic mutation. Unfortunately all three were actively used - and independently maintained (or not maintained - the risk of keeping duplicate code in use). What would've simplified the analysis greatly is if the original developer had overloaded the original spd to output each of the three result sets, perhaps accepting a parameter to indicate the result of interest.&lt;br /&gt;&lt;br /&gt;What I really wanted to know was how similar were the spds, and what made them different from one another? Given that each spd was about 1000 lines long, a casual comparison would not be feasible.&lt;br /&gt;&lt;br /&gt;I started my analysis by reading each spd into a table, line by line, so that each row in the table represented a line of code from the spd, and then compared the number of lines from each spd that matched lines in the other spd. &lt;a href="http://www.sqlservercentral.com/scripts/T-SQL/65787/"&gt;Please go here for the complete code.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;This produced some interesting results:&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;spd1: rpt_ReportingSpd1 &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;spd2: rpt_ReportingSpd2 &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;Percentage match between the spds: 92.27% &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;Percentage of spd1 found in spd2: 94.64% &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;Percentage of spd2 found in spd1: 90.12% &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So based on the raw numbers of lines matching between the stored procedures, they were mostly the same.&lt;br /&gt;&lt;br /&gt;Next: an &lt;a href="http://jessesql.blogspot.com/2009/02/comparing-stored-procedures-part-2.html"&gt;improved algorithm providing more advanced matching&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-115397154611799658?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/115397154611799658/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/02/comparing-stored-procedures-part-1.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/115397154611799658'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/115397154611799658'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/02/comparing-stored-procedures-part-1.html' title='Comparing Stored Procedures, Part 1'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5937947540405767218.post-5473522070940241891</id><published>2009-02-03T16:59:00.000Z</published><updated>2009-02-04T03:23:22.698Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='introduction'/><category scheme='http://www.blogger.com/atom/ns#' term='Jesse McLain'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL'/><title type='text'>Welcome to My Blog</title><content type='html'>I am a database developer focusing on ETL/BI development. My technological background centers on Microsoft SQL Server technologies such as T-SQL programming, SSIS, and SSRS, but also includes Informatica, Visual Foxpro, and application development such as Visual Basic and Java. This weblog will focus on SQL Server, especially T-SQL development and programming. My aim is to present projects and problems in SQL Server and explain my thought process at delivering a solution.&lt;br /&gt;&lt;br /&gt;My webpage is &lt;a href="http://www.jessemclain.com/"&gt;http://www.jessemclain.com/&lt;/a&gt;, and I can be reached via email at &lt;a href="mailto:jesse@jessemclain.com"&gt;jesse@jessemclain.com&lt;/a&gt;. I have contributions on SQL Server Central that can be accessed via the directory at &lt;a href="http://www.sqlservercentral.com/Authors/Scripts/Jesse_McLain/413474/"&gt;http://www.sqlservercentral.com/Authors/Scripts/Jesse_McLain/413474/&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5937947540405767218-5473522070940241891?l=jessesql.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jessesql.blogspot.com/feeds/5473522070940241891/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jessesql.blogspot.com/2009/02/welcome-to-my-blog.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/5473522070940241891'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5937947540405767218/posts/default/5473522070940241891'/><link rel='alternate' type='text/html' href='http://jessesql.blogspot.com/2009/02/welcome-to-my-blog.html' title='Welcome to My Blog'/><author><name>Jesse McLain</name><uri>http://www.blogger.com/profile/03811523504082394802</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
