Workflow: The Age of Fort Worth interactive map


Creating the interactive web map The Age of Fort Worth required multiple data sources and several pieces of software. All of the data used to create the map are open and free to download - however, they all came in different formats! Liberal arts students in AddRan College at TCU learn to solve these types of data problems in our courses; in fact, we teach students how to use all of the software that I used in this workflow.

The workflow below is described in a linear fashion for clarity; however, the real workflow required testing things out and moving back and forth between the steps!

Step 1: Data acquisition

To get started, I need to locate and acquire the requisite data sources to build the Age of Fort Worth visualization. Building footprint data for Fort Worth are available in shapefile format from the city’s GIS portal at http://mapitwest.fortworthtexas.gov/fwgisdata/. The footprint data themselves do not have building age information encoded; however, the data do have a field to link them to parcel data from the Tarrant Appraisal District, available at http://www.tad.org/data/downloads. I choose the “PropertyData(Delimited)” link, which allows for a data download in pipe-delimited format. Both datasets are quite large; the building footprints shapefile includes 313,370 buildings, and the Tarrant Appraisal District file is over 600MB unzipped, with nearly 1.7 million records!

Before going too much further with the data, I want to make it a little more manageable to work with. The Tarrant Appraisal District file includes a number of columns that I don’t need, which take up a lot of space; all I need, specifically, is the year the property was built and the Tarrant Appraisal District ID. To parse the pipe-delimited TAD file, I use csvkit, a Python command-line utility built for working with these types of files. You can get csvkit with pip install csvkit, so long as you have Python installed. To work with the TAD data, I cd into the data directory in my command prompt, and enter the following command:

in2csv -d "|" -e "latin-1" PropertyData.csv | csvcut -c "Account_Num","Year_Built" > tad.csv

This code requires some translation. The in2csv command converts a text file to a comma-separated values (CSV) file, which the rest of csvkit can work with; the -d option specifies the delimiter of the input file (a pipe); and the -e option specifies the file encoding, which in this case is Latin-1. I then pass the output with the pipe operator to the csvcut command, specifying the two columns I want to keep with the -c option, and redirect the result to a new CSV file, tad.csv. This reduces the size of the file to 21MB - much better!
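For readers who would rather stay inside Python than work at the command line, roughly the same column extraction can be sketched with the standard library's csv module. This is only a sketch under stated assumptions, not the method used for the map: the function name extract_columns is hypothetical, and it assumes the TAD file has a header row that contains the Account_Num and Year_Built columns.

```python
import csv


def extract_columns(src_path, dst_path, columns, delimiter="|", encoding="latin-1"):
    """Copy only `columns` from a delimited text file into a comma-separated file.

    Hypothetical helper for illustration; returns the number of data rows written.
    """
    with open(src_path, newline="", encoding=encoding) as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        # DictReader takes the column names from the first line of the input.
        reader = csv.DictReader(src, delimiter=delimiter)
        # extrasaction="ignore" drops every column not listed in `columns`.
        writer = csv.DictWriter(dst, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        rows_written = 0
        for row in reader:  # rows are streamed one at a time, not loaded at once
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

Calling extract_columns("PropertyData.csv", "tad.csv", ["Account_Num", "Year_Built"]) would then play the same role as the csvkit pipeline. Because rows are read and written one at a time, the 600MB input never has to fit in memory.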

I’m just about ready now to import my data into a Geographic Information System, QGIS, for further data processing. However, there is a catch. The building age data I obtained come from Tarrant County; however, some buildings in Fort Worth are in Denton County, not Tarrant County - such as the buildings around Texas Motor Speedway. In turn, I need to get the equivalent file from Denton County, which is available from this link as “GIS DCAD Data.” The Fort Worth building footprint data have a field to link them to the corresponding parcel IDs in data from Tarrant County and Denton County; however, there are some problems with this. Several buildings in the dataset have ID codes that match parcels in both Tarrant and Denton Counties - and there is no information about which one is the correct match! In turn, I need to do some GIS work to make sure I identify the buildings correctly. I download the “Denton County Parcel Map” geographic data as well, and move to QGIS for the next step.
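The size of this ambiguity can be estimated before any GIS work by checking which building IDs appear in one county's parcel list, the other's, or both. The following is a rough sketch, assuming the linking IDs from each source have already been loaded as strings with identical formatting; the function name classify_building_ids and the category labels are hypothetical and are not fields in any of the datasets.

```python
def classify_building_ids(building_ids, tarrant_ids, denton_ids):
    """Group building IDs by which county parcel lists contain them.

    Hypothetical helper for illustration; all IDs are compared as exact strings.
    """
    tarrant = set(tarrant_ids)  # sets make each membership test fast
    denton = set(denton_ids)
    groups = {"tarrant_only": [], "denton_only": [], "both": [], "neither": []}
    for building_id in building_ids:
        in_tarrant = building_id in tarrant
        in_denton = building_id in denton
        if in_tarrant and in_denton:
            groups["both"].append(building_id)  # ambiguous: needs a spatial check
        elif in_tarrant:
            groups["tarrant_only"].append(building_id)
        elif in_denton:
            groups["denton_only"].append(building_id)
        else:
            groups["neither"].append(building_id)
    return groups
```

Only the IDs that land in the "both" group need to be resolved by location; the rest can be matched on the ID alone.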

Step 2: GIS processing

Once I have all of the data sources I need, I open up QGIS (http://qgis.org/en/site/), a free and open-source geographic information system. I add the Fort Worth building footprints, the processed TAD CSV, the Denton County parcel data, and the Denton County appraisal data with information about building age. The Denton County appraisal file comes in DBF format, which is readable by QGIS. A quick note - csvkit can process DBF files, but only under Python 2, not Python 3, which I run; to keep things simpler for this write-up, I work with the DBF in QGIS.