Being a database developer rather than a DBA, I rarely deal with collation types, but I came across a situation recently where I had to dig into the issue. My objective was to produce a breakdown of how many rows contained each ASCII character. I considered two approaches: slice the values into one-character chunks, or loop through all 256 ASCII values and count the number of rows containing each character. The former approach has the advantage of not only counting the rows but the frequency of characters (e.g., "100 rows contain 250 instances of the character 'A'"), but I opted for the second approach since it intuitively seemed faster. If your database was created with case-insensitive collation (such as "SQL_Latin1_General_CP1_CI_AS"), checking for the characters 'A', would pull in values of 'a':
USE NorthWindGOSELECT DISTINCT CityFROM dbo.CustomersWHERE CHARINDEX('A', City) > 0data:image/s3,"s3://crabby-images/eea15/eea1543064c947b16b82bf7b59b4cd12f96d24b8" alt=""
To fix, simply add the "COLLATE" clause to the query:
SELECT DISTINCT CityFROM dbo.CustomersWHERE CHARINDEX('A' COLLATE Latin1_General_BIN, City) > 0
No comments:
Post a Comment