Importing Data with SQL Server 2005
Janae and I recently did a particularly tricky data import into SQL Server, utilizing the SQL Server Import and Export wizard. The originating data was a text file that originated on a Mac, exported from a FilemakerPro database on
a Mac. The simple data import from the text file into a table in SQL
Server 2005 took us more than an entire day, so I thought I’d share a
few tips we learned from our experience:
Import From Excel
After experimenting with importing CSV files, tab-delimited files and Excel files, I found that if you first import the data into Excel you will avoid several common problems.
We ran into difficulties with fields that had existing commas, tabs or
quotation marks in the actual data. They caused problems because CSV or
tab-delimited files use these characters as delimiters – so the fields didn’t break out properly.
Massage Data in Excel
Do as much of your data clean-up as possible directly in Excel.
We performed search and replace on funky Macintosh characters that we
didn’t want. We updated column names to match the destination SQL
Server table. We did as many bad-data checks as possible, such as
numeric columns not having non-numeric characters, and removing words
such as “n/a” or “no data.” We removed columns we wouldn’t need.
We were also importing fields that would be ultimately displayed in
HTML, so we added as much HTML markup directly in Excel as possible.
This can be done using concatenation and other Excel functions. One
simple example would be to replace all line breaks with
</p><p> and adding <p> to the beginning and
</p> to the end of the field. We also replaced all dashes with
</li><li> and added <ul class=”listClass”> to the beginning and
</ul> to the end of some fields, creating HTML unordered lists with only minor cleanup needed after the fact.
Data Types and Lengths
This was one of the lessons that took me over an hour to learn, so perk up and listen to this excellent advice: move any of your exceptional data into the first 8 rows.
When you do an import into a new table in SQL Server 2005, it does not inspect all of your data before it decides on column data types – it bases each column’s data type on those first 8 rows. So if you have some rows with particularly long values, put those at the top. Similarly, if you have a numeric field – make sure that the first 8 rows don’t happen to have null values. Read Microsoft’s article on the details of how data types are determined.
Hopefully these simple tricks will save you from the many hours of troubleshooting we’ve already been through. Best of luck with your import!