Friday, July 30, 2010

SQL Date Functions Syntax

One thing that drives me a little crazy with T-SQL is the syntax of the date functions:

DATEADD (datepart, number, date)
DATEDIFF (datepart, startdate, enddate)
DATEPART (datepart, date)

The designers were consistent in making 'datepart' the first parameter, but for DATEADD, why did they make the base date the last parameter? I think it should be consistent with DATEPART and DATEDIFF, so that the second parameter is always a DATETIME.
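For illustration, here are the three calls side by side; the datepart is always first, but the DATETIME argument wanders:

DECLARE @d DATETIME;
SET @d = '2010-07-30';

SELECT DATEADD(DAY, 7, @d)          AS OneWeekLater,   -- date is the third parameter
       DATEDIFF(DAY, @d, GETDATE()) AS DaysSince,      -- dates are parameters two and three
       DATEPART(DAY, @d)            AS DayOfMonth;     -- date is the second parameter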

Tips to optimize your SQL statements

Interesting article on SQLServerCentral.com by Brian Ellul, 2010/07/29:

"There is a huge difference between writing an SQL statement which works and one which works well and performs well. Sometimes developers are too focused on just writing their SQL to perform the required task, without taking into consideration its performance and most importantly its impact on the SQL Server Instance, i.e. the amount of CPU, IO, and memory resources their SQL is consuming. Thus, they starve other SQL Server processes during the SQL statement execution bringing the whole instance to its knees. This article is intended to provide the SQL developer with a set of easy checks to perform to try and optimize the SQL Statements."

Tuesday, July 27, 2010

Level of measurement

Interesting article in Wikipedia about metrics and the different kinds of measurement scales:

The "levels of measurement", or scales of measure are expressions that typically refer to the theory of scale types developed by the psychologist Stanley Smith Stevens. Stevens proposed his theory in a 1946 Science article titled "On the theory of scales of measurement"[1]. In this article Stevens claimed that all measurement in science was conducted using four different types of scales that he called "nominal", "ordinal", "interval" and "ratio".

Monday, July 26, 2010

NK Search

I just tested the NK search algorithm on an unindexed table of 9 columns and over 10 million rows, and it ran for 4 minutes before returning the result that no natural key of 4 columns or fewer exists. It ran 255 queries in that time. The biggest performance boost was the use of the sample check, whereby the duplicate count is run on the TOP 1000 rows first, and only then on the full data set.
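Here's a rough sketch of the sample-check idea (the table and column names are hypothetical, not the actual NK search code): test a candidate key against a small sample first, and only run the expensive full-table duplicate count if the sample comes back clean.

IF NOT EXISTS (
    SELECT Col1, Col2, Col3, Col4
    FROM (SELECT TOP 1000 Col1, Col2, Col3, Col4 FROM dbo.RawData) AS Sample
    GROUP BY Col1, Col2, Col3, Col4
    HAVING COUNT(*) > 1
)
BEGIN
    -- The sample had no duplicates, so the candidate is worth testing for real.
    SELECT Col1, Col2, Col3, Col4
    FROM dbo.RawData
    GROUP BY Col1, Col2, Col3, Col4
    HAVING COUNT(*) > 1;   -- zero rows back means these four columns form a candidate key
END;

The win is that most failing candidates are rejected by the cheap 1000-row check, so the full scan of 10 million rows is only paid for the combinations that have a real chance of being keys.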

I'm polishing off the code and should post it by the end of the week.

Thursday, July 15, 2010

New and Improved Natural Key Discovery Algorithm

While preparing for my presentation on code I wrote that discovers natural keys, for the Baltimore SQL Server Users Group meeting on Monday, I discovered a blind spot in the design. I hit the panic button a few days ago, but today I think I've got a new solution that solves a lot of problems.

Wednesday, July 14, 2010

Table Variables and Query Optimizer

In yesterday's post I mentioned that I replaced the temp tables in a stored procedure with table variables. Functionally this worked perfectly, but it degraded performance by orders of magnitude. The problem was that the query optimizer was choosing MERGE JOINs instead of HASH JOINs for the table variables. Forcing the hash joins with query hints fixed the problem.
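Here's a minimal sketch of the fix (the table definitions are made up, not the actual procedure):

DECLARE @Orders TABLE (OrderID INT PRIMARY KEY, CustomerID INT);
DECLARE @OrderLines TABLE (OrderID INT, Amount MONEY);

SELECT o.OrderID, l.Amount
FROM @Orders AS o
INNER HASH JOIN @OrderLines AS l   -- join hint forces a hash join for this join only
    ON l.OrderID = o.OrderID;

Alternatively, adding OPTION (HASH JOIN) at the end of the statement hints every join in the query.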

[Ed. note 7/26] Here's another interesting discussion about this. Note the post that reads "And also, because table variables do not have statistics, the query optimizer will often take dubious choices because it believes the row count to be 1 all the time - leading to inappropriate join methods etc..."

Tuesday, July 13, 2010

VS2008 SSRS and Temp Tables in Spds

Ran into a problem using VS2008 to report the results of a SQL Server 2005 stored procedure. Apparently there is an issue with VS2008 and stored procedures that make use of temp tables. Following this article's advice, I replaced all of the temp tables with table variables, and the problem went away.
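A sketch of the substitution (table and column names are hypothetical); the report logic stays the same, only the storage changes from a temp table to a table variable:

-- Before:
-- CREATE TABLE #Results (CustomerID INT, Total MONEY);
-- INSERT INTO #Results ...

-- After:
DECLARE @Results TABLE (CustomerID INT, Total MONEY);

INSERT INTO @Results (CustomerID, Total)
SELECT CustomerID, SUM(Amount)
FROM dbo.Sales
GROUP BY CustomerID;

SELECT CustomerID, Total
FROM @Results;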

Code Review, Years Later

I'm reviewing code I wrote a couple of years ago to determine natural keys of raw data sets, in order to do a talk on it at the local SQL Server users' group meeting next week. After an initial reading, I thought that the code worked a certain way (doing breadth-first searches), but it actually works in a hybrid breadth/depth first fashion.

The first thought one might have is something along the lines of "comment much?", but truthfully, I do, or at least I think I do. The code appears in a SQL Server Central article that explains much of the background of the problem I was trying to solve, and the code itself has an entire page of comments. Yet I find now that there simply isn't one concise explanation of the resultant design of my algorithm.

There must be a magic number of weeks after code is finished and polished off, when the original intent of the developer is still in his or her brain, but beginning to fade. That is the perfect time to finish commenting the code, because at that time the developer will have to approach the code as an outsider, but still be able to summon those fuzzy thoughts about why something was done a certain way. Does anyone know what that magic number of weeks is?

Sunday, July 11, 2010

Recursive Stored Procedures and Temp Tables

If a stored procedure creates a temp table and then calls itself recursively, will the newly instantiated stored procedure have access to the temp table?
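One way to find out is simply to try it. Here's a minimal test sketch (the procedure and table names are made up); if temp table scope carries into the recursive call the same way it does for any other called procedure, the final SELECT should return levels 0 through 3:

CREATE PROCEDURE dbo.RecurseTest
    @Level INT
AS
BEGIN
    IF @Level = 0
        CREATE TABLE #Scratch (Lvl INT);   -- created only by the outermost call

    INSERT INTO #Scratch (Lvl) VALUES (@Level);

    IF @Level < 3
    BEGIN
        DECLARE @Next INT;
        SET @Next = @Level + 1;
        EXEC dbo.RecurseTest @Next;        -- does the recursive call still see #Scratch?
    END;

    IF @Level = 0
        SELECT Lvl FROM #Scratch;          -- rows from every level mean the answer is yes
END;
GO

EXEC dbo.RecurseTest 0;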

Friday, July 2, 2010

Temp Table Scope

One of the gotchas of TSQL is temp table scope. The fact that I can create a temp table within a stored procedure, then call a second stored procedure and reference that temp table, scares me a little. It reminds me of the old 3GLs, with their global variables and modified private scope and such.

A big enhancement in T-SQL in this department: table variables. The scope of a table variable is the same as that of a "regular" variable - local to the procedure. So replacing a temp table with a table variable in a stored procedure eliminates the concern that a called inner procedure will unexpectedly change the data. However, there are major performance declines when using table variables to hold large amounts of data, as this article explains.
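A sketch of the difference (procedure names are hypothetical):

CREATE PROCEDURE dbo.InnerProc
AS
    -- This compiles and runs even though #Work is created by the caller,
    -- because a temp table is visible to any procedure called from the scope that created it.
    UPDATE #Work SET Amount = 0;
GO

CREATE PROCEDURE dbo.OuterProc
AS
BEGIN
    CREATE TABLE #Work (Amount MONEY);
    INSERT INTO #Work (Amount) VALUES (100);

    EXEC dbo.InnerProc;           -- quietly changes the caller's data

    SELECT Amount FROM #Work;     -- returns 0, not 100

    -- A table variable declared here would not be visible to dbo.InnerProc at all;
    -- its scope ends with this procedure.
END;
GO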

Another link: http://www.mssqltips.com/tip.asp?tip=1556