Wednesday, March 25, 2009

Fuzzy Matching

While doing some research on data quality, I came across an article about implementing the Jaro-Winkler distance metric in SSIS. Given two character strings, the algorithm returns a number between 0 and 1 indicating how similar they are (0 means no similarity and 1 means an exact match). The Wikipedia entry uses the names MARTHA and MARHTA as an example: the plain Jaro score is about 0.94 (roughly 94% similarity), and the Winkler adjustment for the shared prefix MAR raises it to about 0.96.
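
To make the scoring concrete, here is a minimal Python sketch of the calculation as I understand it from the Wikipedia description. It is not the SSIS implementation from the article; the function names, the 0.1 prefix scale, and the 4-character prefix cap are my own choices based on the commonly cited defaults.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: 0.0 (nothing in common) to 1.0 (identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0

    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order; each swap
    # shows up as two mismatched positions, hence the division by 2.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro score boosted for a shared prefix (assumed defaults: 0.1, 4)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


if __name__ == "__main__":
    print(round(jaro("MARTHA", "MARHTA"), 3))
    print(round(jaro_winkler("MARTHA", "MARHTA"), 3))
```

For MARTHA and MARHTA all six characters match and one pair (T/H) is transposed, so the Jaro score is (1 + 1 + 5/6) / 3, about 0.944. The three-character shared prefix MAR then lifts the Jaro-Winkler score to about 0.961.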
