Thursday, April 5, 2007

Script for Importing Casualty Data

A few readers may know that the Pentagon publishes US military casualty data on a web page, and that every death from Operation Iraqi Freedom and Operation Enduring Freedom is cataloged and available for download on an Excel spreadsheet. (It's debatable whether every death is cataloged, but speculation is not the purpose of this post.) I have been attempting to mine data on this Excel sheet for some time, but it's not nearly as convenient as having it in a real database.

So after some thought and a little tinkering, I've developed a rough solution that will automatically download the latest spreadsheet, do some conversion, and import the data into a MySQL database. It's accomplished through a Linux shell script and a Perl script, so if you're on a Windows machine, you'll have to search for a different solution.


This is the shell script, I call it oifdata.sh:


The Perl script reference here is a slight edit from xls2csv from the Spreadsheet-ParseExcel Perl package (which you will need to install). Instead of comma-separated values, it outputs tab-separated values. Also, it is specialized to start and end at the correct rows. I would like to generalize this in the future, but it's a lot of extra code at the moment.


Lastly, my database table is set up like so. This was generated using the mysqldump script included with MySQL. I'm considering adding this file to the main script so that the auto-incrementing key will go back to zero each time the data is refreshed.


There you have it. Now you too can have a script to fill your database with the latest casualty information from the DoD. How you use that data is up to you...

No comments:

Post a Comment