While doing some research on data quality, I came across an article about implementing the Jaro-Winkler distance metric in SSIS. Given two character strings, the algorithm returns a number between 0 and 1 indicating their similarity, where 0 means no similarity and 1 means an exact match. The Wikipedia entry uses the names MARTHA and MARHTA as an example: the base Jaro score is 0.94 (about 94% similarity), and the Winkler adjustment for the shared prefix MAR raises it to 0.96.
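To make the calculation concrete, here is a minimal Python sketch of the textbook Jaro and Jaro-Winkler definitions. It is not the code from the SSIS article, which I have not reproduced here. The function names `jaro` and `jaro_winkler`, and the parameters `prefix_scale` (0.1) and `max_prefix` (4), are my own choices based on the commonly cited defaults.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: 0.0 means no similarity, 1.0 means identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are equal and lie
    # within this many positions of each other.
    window = max(max(len1, len2) // 2 - 1, 0)

    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order count as
    # half a transposition each.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str,
                 prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (assumed defaults)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


print(round(jaro("MARTHA", "MARHTA"), 3))          # 0.944
print(round(jaro_winkler("MARTHA", "MARHTA"), 3))  # 0.961
```

For MARTHA and MARHTA, all six characters match and the T/H pair counts as one transposition, so the Jaro score is (1 + 1 + 5/6) / 3 ≈ 0.944. The shared three-letter prefix MAR then lifts the Jaro-Winkler score to roughly 0.961.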
Wednesday, March 25, 2009