Sunday, May 23, 2010

Bulk Insert CSV with Text Qualifiers

One of the biggest shortcomings of SQL Server's BCP/Bulk Insert tool is the lack of specifying a text qualifier in a comma-delimited ("CSV"), text file for import. Let's take a look at an example of a record from a CSV file that we wish to import into a table:

"John Smith", "123 Main St, Apt 3", "Anytown", "XX", "12345"

So the format of the record is Name, Address, City, State, ZIP. The fields are separated by commas, and encapsulated with double-quotes. The "Address" field demonstrates the need for the text qualifier: the embedded comma that separates the first address line from the second. Without text qualifiers each line of address would appear as a separate field, rather than one discrete value. That in itself would not be a problem, but because some addresses only have one line, the number of delimiters becomes variable, and the import is then faulty.

The problem crystalizes when we try to import a file of that format into SQL Server. The BCP/Bulk Insert utilities do not have an option to specify the text qualifier. We have a couple of "kloodgy" options before us: We can specify the delimiter as ','. The problem here is that every field encapsulated with the double quotes will retain that character in the database, leaving us to update the table to remove those characters. We can specify that the delimiter is '","' instead of ','; this is a step in the right direction, but it will still leave a double quote leading in the first field and trailing in the last, leaving us with less work than in the first case.

Given the maturity of the SQL Server product, I'm surprised that Microsoft hasn't added this feature. I suppose that is their way of moving developers towards SSIS, which of course does have it.

If we really want to properly import this file using BCP or BULK INSERT without any weird cleanup kludges, we have to use a format file. Here's a good article on MSDN about how to create a format file from scratch using a BCP option. To complete my task, I will take the resultant format file and modify it to account for my comma delimiters and text qualifiers.