Workflow: The Age of Fort Worth interactive map

Creating the interactive web map The Age of Fort Worth required the use of multiple data sources and pieces of software. All data used to create the map is open and free to download - however, it all came in different formats! Liberal arts students in AddRan College at TCU learn to solve these types of data problems in our courses; in fact, we teach students how to use all of the software that I used in this workflow.

The workflow below is described in a linear fashion for clarity; however, the real workflow required testing things out and moving back and forth between the steps!

Step 1: Data acquisition

To get started, I need to locate and acquire the requisite data sources to build the Age of Fort Worth visualization. Building footprint data for Fort Worth are available in shapefile format from the city’s GIS portal at http://mapitwest.fortworthtexas.gov/fwgisdata/. The footprint data themselves do not have building age information encoded; however, the data do have a field to link them to parcel data from the Tarrant Appraisal District, available at http://www.tad.org/data/downloads. I choose the “PropertyData(Delimited)” link, which allows for a data download in pipe-delimited format. Both datasets are quite large; the building footprints shapefile includes 313,370 buildings, and the Tarrant Appraisal District file is over 600MB unzipped, with nearly 1.7 million records!

Before going too much further with the data, I want to make it a little more manageable to work with. The Tarrant Appraisal District file includes a number of columns that I don’t need, which take up a lot of space; all I need, specifically, is the year the property was built and the Tarrant Appraisal District ID. To parse the pipe-delimited TAD file, I use csvkit, a Python command-line utility built for working with these types of files. You can get csvkit with pip install csvkit, so long as you have Python installed. To work with the TAD data, I cd into the data directory in my command prompt, and enter the following command:

in2csv -d "|" -e "latin-1" PropertyData.csv | csvcut -c "Account_Num","Year_Built" > tad.csv

This code requires some translation. The in2csv command will convert a text file to a comma-separated values (CSV) file, which csvkit can work with; the -d option refers to the delimiter of the input file (a pipe); and the -e option refers to the file encoding, which in this case is Latin-1. I then pass the output with the pipe operator to the csvcut command, specifying the two columns I want to keep with the -c option, and write the output to a new CSV file, tad.csv. This reduces the size of the file to 21MB - much better!
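
For readers who prefer to stay inside Python, roughly the same column cut can be sketched with the standard-library csv module. This is an illustration rather than the command used for the map: the function name cut_columns is made up, and it assumes the same column names and Latin-1 encoding as the csvkit command above.

```python
import csv


def cut_columns(src_path, dst_path, columns=("Account_Num", "Year_Built"),
                delimiter="|", encoding="latin-1"):
    """Copy only the named columns from a delimited text file to a CSV file.

    Returns the number of data rows written.
    """
    with open(src_path, newline="", encoding=encoding) as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src, delimiter=delimiter)
        # extrasaction="ignore" drops every column not listed in fieldnames.
        writer = csv.DictWriter(dst, fieldnames=list(columns),
                                extrasaction="ignore")
        writer.writeheader()
        rows = 0
        for row in reader:
            writer.writerow(row)
            rows += 1
    return rows
```

Calling cut_columns("PropertyData.csv", "tad.csv") would stream the file row by row, so the full 600MB never has to sit in memory at once.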

I’m just about ready now to import my data into a Geographic Information System, QGIS, for further data processing. However, there is a catch. The building age data I obtained come from Tarrant County; however, some buildings in Fort Worth are in Denton County, not Tarrant County - such as the buildings around Texas Motor Speedway. In turn, I need to get the equivalent file from Denton County, which is available from this link as “GIS DCAD Data.” The Fort Worth building footprint data have a field to link them to the corresponding parcel IDs in data from Tarrant County and Denton County; however, there are some problems with this. Several buildings in the dataset have ID codes that match parcels in both Tarrant and Denton Counties - and there is no information about which one is the correct match! In turn, I need to do some GIS work to make sure I identify the buildings correctly. I download the “Denton County Parcel Map” geographic data as well, and move to QGIS for the next step.
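
The ambiguity described above can be made concrete with a small Python sketch. The function and argument names are hypothetical, and the IDs in the usage note are invented; the point is only that an ID found in both counties' parcel tables cannot be resolved by ID alone.

```python
def ambiguous_ids(building_ids, tarrant_ids, denton_ids):
    """Return the building IDs that match a parcel in BOTH counties."""
    in_both_counties = set(tarrant_ids) & set(denton_ids)
    return sorted(set(building_ids) & in_both_counties)
```

For example, ambiguous_ids([1, 2, 3, 4], [1, 2, 9], [2, 3, 1]) returns [1, 2]: those two buildings would need a location-based check to decide which county they belong to.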

Step 2: GIS processing

Once I have all of the data sources I need, I open up QGIS (http://qgis.org/en/site/), a free and open-source geographic information system. I add the Fort Worth building footprints, the processed TAD CSV, the Denton County parcel data, and the Denton County appraisal data with information about building age. The appraisal data come in DBF format, which is readable by QGIS. A quick note - csvkit can process DBF files, but only in Python 2, not 3, which I run; to keep things simpler for this write-up, I work with the DBF in QGIS.

In the previous section, I noted that there is no field in the Fort Worth building footprint data that identifies whether buildings are in Tarrant or Denton Counties. However, as I have the parcel GIS data for Denton, I can link the data together spatially and identify which buildings match with Denton County in this way, which QGIS is ready-made to do. To get this done, I carry out the following steps:

Join the TAD data to the building footprint data on the TAD ID code, which is TAD_ACCOUN in the footprint data and Account_Num in the TAD data. There is one issue, however: QGIS has interpreted the Account_Num field in the TAD data as a string with leading zeroes, not an integer, so the two fields will not match as they are. I add a new field to the footprint data called AccountNum, left-padding the account number to eight characters with the following expression:

lpad(  to_string(  "TAD_ACCOUN" ),  8, '0')

Now, I can perform the join. Joins in QGIS are handled from the Joins menu in the Layer Properties of my footprints data. Once I carry out the join, I save to a new shapefile to make it permanent.
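
To show what the padding and the join are doing, here is a minimal Python sketch of the same idea using plain dictionaries. It is not how QGIS implements joins; the function names are made up, and the sample records in the usage note are invented. The field names TAD_ACCOUN, Account_Num, AccountNum, and tad_Year_B are the ones used in this write-up.

```python
def pad_account(value, width=8):
    """Left-pad an account number with zeroes, like lpad(to_string(x), 8, '0').

    Unlike QGIS's lpad, rjust never truncates a value longer than `width`.
    """
    return str(value).rjust(width, "0")


def join_years(footprints, tad_rows):
    """Left join: attach the TAD year to every footprint, matched or not."""
    year_by_account = {row["Account_Num"]: row["Year_Built"] for row in tad_rows}
    joined = []
    for feature in footprints:
        key = pad_account(feature["TAD_ACCOUN"])
        joined.append({**feature,
                       "AccountNum": key,
                       # None plays the role of NULL for unmatched footprints.
                       "tad_Year_B": year_by_account.get(key)})
    return joined
```

With footprints [{"TAD_ACCOUN": 12345}] and TAD rows [{"Account_Num": "00012345", "Year_Built": "1948"}], the unpadded integer 12345 would never equal the string "00012345", but the padded key does, so the footprint picks up the year 1948.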

I do the same for the Denton County parcel data and assessor data, with PROP_ID as the common field, saving to a new shapefile once more to make the join permanent. I limit the joined fields from the assessor data, however, to the field containing building ages.

I now need to identify which buildings are located in Denton County so that I get the building ages correct. To do this, I use a spatial join, which transfers attributes from one layer to another based on location. As such, I can transfer information on building age - which is currently contained in the Denton County parcel data - to buildings that are located on those parcels. Further, this operation will only match buildings that are located in Denton County, meaning that I can now distinguish between Tarrant and Denton in my dataset. Even better, the building footprint data and Denton County parcel data are already in the same projected coordinate system - NAD 1983/Texas State Plane North Central - so I don’t have to worry about misalignment issues.

Spatial joins in QGIS are accessed from Vector > Data Management Tools > Join Attributes by Location. I specify my footprints as the target vector layer, and the parcels as the layer I’d like to merge attributes from. I make sure to select the “Keep all records (including non-matching target records)” option, as I want all buildings available in the output shapefile.
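
For intuition about what a spatial join does, here is a simplified pure-Python sketch. It is not the QGIS algorithm: QGIS compares the feature geometries themselves, whereas this sketch tests a single representative point per building (the mean of its vertices) against each parcel using a ray-casting test. The function names and the "ring" key are assumptions for illustration; DCADDATA_Y is the Denton year field named in this write-up.

```python
def point_in_ring(x, y, ring):
    """Ray casting: True if (x, y) lies inside the polygon ring of (x, y) tuples."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def vertex_mean(ring):
    """A crude representative point: the average of the ring's vertices."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def join_by_location(buildings, parcels, field="DCADDATA_Y"):
    """Copy `field` from the first containing parcel; keep unmatched buildings."""
    joined = []
    for building in buildings:
        x, y = vertex_mean(building["ring"])
        match = next((p for p in parcels if point_in_ring(x, y, p["ring"])), None)
        # None stands in for NULL, mirroring the "keep all records" option.
        joined.append({**building, field: match[field] if match else None})
    return joined
```

Because both layers share one projected coordinate system, their coordinates can be compared directly, which is the same reason the QGIS join needs no reprojection here.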

In the output shapefile, I now have a field that shows the correct building age for buildings in Denton County, and NULL values for all Tarrant County buildings, which is what I want. I now need to take one more step to consolidate the building age information into one field. This can be accomplished in the QGIS field calculator with the following syntax, where DCADDATA_Y currently represents the Denton building year, and tad_Year_B represents the Tarrant building year:

CASE
    WHEN  "DCADDATA_Y"  IS NULL THEN  "tad_Year_B" 
    ELSE  "DCADDATA_Y"  
END

This SQL-like syntax tells QGIS to return the Tarrant value if there was no match for Denton, and otherwise return the Denton value. I now have a field that accurately represents building ages for Fort Worth. Before leaving QGIS, I create a “label” field (more to come on this), remove fields I don’t need with the Table Manager plugin to reduce file size, and then take a quick look at a potential map:

As I want the final product to be interactive, however, and because I have a lot of features, I want to display the Fort Worth building ages as a tiled web map. In turn, I move to Mapbox Studio to generate vector tiles from my data.

Step 3: Building vector tiles and styling the data

I’ve used Mapbox to build web maps before with its raster-tile generating software TileMill; however, Mapbox has since replaced TileMill with Studio (https://www.mapbox.com/mapbox-studio/). Studio is free cartographic design software that optionally can be used to create vector tiles to be hosted on and served from their cloud. In Mapbox Studio, users can upload their own data to their Mapbox account, which can then be styled with a custom stylesheet alongside OpenStreetMap data. Mapbox has recently released a new version of Studio that runs right in the web browser; this map was designed in Mapbox Studio Classic, a desktop application that preceded Studio’s current web-based iteration. TCU students will be working with the new Studio application next semester - so stay tuned for future examples with the new software!

To add my footprint data as a new data source, I tell Studio Classic to create a new project, and specify a “Blank source” to create custom vector tiles rather than use one of Mapbox’s pre-designed templates for OpenStreetMap. I then save my source and upload it to my Mapbox account, so that Mapbox can give the footprints back to me in Studio as vector tiles.

Once my data have been uploaded to Mapbox, I can create a new “style” from my source, meaning that I can apply a CartoCSS stylesheet to my data to style it just like I would any of the OpenStreetMap layers in Studio. Additionally, because my data have already been converted to vector tiles hosted by Mapbox, styling all 313,370 buildings in Fort Worth is very fast.

To style my buildings data, I am using the new viridis colormap, which is the new default color palette for the Python visualization library matplotlib. Watch this video for more information about the palette. Viridis satisfies most of the criteria I want for styling the buildings. It is a sequential palette, which represents the linear progression of time, and it passes through multiple hues, which is important given that buildings will be symbolized by decade, making up 11 categories. Additionally, it is a colorblind-safe palette, which is a big plus. The bright colors of the palette also make buildings stand out against the dark base layers I’ll be using. The following CartoCSS code styles my buildings in Studio:

#fw_footprints_mb {
  line-width: 0;
  [year_built = 0] { polygon-fill: #bdbbbb; }
  [year_built > 0] { polygon-fill: #440154; }
  [year_built >= 1920] { polygon-fill: #482575; }
  [year_built >= 1930] { polygon-fill: #414487; }
  [year_built >= 1940] { polygon-fill: #345F8D; }
  [year_built >= 1950] { polygon-fill: #2A788E; }
  [year_built >= 1960] { polygon-fill: #21908C; }
  [year_built >= 1970] { polygon-fill: #22A884; }
  [year_built >= 1980] { polygon-fill: #43BF71; }
  [year_built >= 1990] { polygon-fill: #7AD151; }
  [year_built >= 2000] { polygon-fill: #BCDF27; }
  [year_built >= 2010] { polygon-fill: #FDE725; }
}
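
The stylesheet above relies on later rules overriding earlier ones: a building from 1955 matches every rule from "> 0" up to ">= 1950", and the last match sets the fill. As a rough illustration of that cascade (not part of the map itself), the same lookup can be sketched in Python with a sorted list of decade thresholds; the function name is made up, and it assumes year_built is an integer.

```python
from bisect import bisect_right

# Decade breakpoints and fills copied from the CartoCSS rules above.
THRESHOLDS = [1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010]
COLORS = ["#440154", "#482575", "#414487", "#345F8D", "#2A788E", "#21908C",
          "#22A884", "#43BF71", "#7AD151", "#BCDF27", "#FDE725"]
NO_DATA = "#bdbbbb"


def fill_for_year(year_built):
    """Return the fill the stylesheet would assign to an integer year."""
    if year_built == 0:
        return NO_DATA
    if year_built < 0:
        return None  # no rule in the stylesheet matches a negative year
    # Count how many thresholds are <= year_built; that count is the class index.
    return COLORS[bisect_right(THRESHOLDS, year_built)]
```

For instance, fill_for_year(1955) gives "#2A788E", the 1950s color, while fill_for_year(1895) falls back to "#440154", the pre-1920 class.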

One of the brilliant features of Studio is that it gives you access to all of the OpenStreetMap layers to customize and add to your map as you like. However, this can make map-making very challenging, as you have so many options available to you! I want to include enough layers to give spatial reference to my buildings, but not too many as I want the buildings to be the focus of my map. I won’t go into every little detail here; however, I chose to include roads, water features, and surrounding place names, with some zoom dependencies baked in, drawing heavily from the “Mapbox Dark” built-in style; take a look at the stylesheet on GitHub if you want to know more. I opted against showing the OSM building footprints, which would include any missing buildings in Fort Worth and buildings in neighboring cities, as I want the map to be a visual representation of the city & county data.

Before finishing my style, I want to add some interactivity to the map, so that users can get addresses and building age information when hovering over buildings. This interactivity can be achieved in Studio with Mapbox’s UTFGrid specification, which it has inherited from Studio’s predecessor, TileMill. Enabling interactivity requires editing the project.yml file located in my style’s directory on disk. I close Studio and open up project.yml in a text editor, and specify my layer name, fw_footprints_mb, as the interactivity_layer. I then set an interactivity template with the following syntax:

  <strong>Address: </strong>{{ADDRESS}}
  <br /><strong>Year built: </strong>{{year_label}}

Note that the template uses the year_label field inside the mustache tags rather than year_built; this is the “label” field I alluded to in the previous section. In QGIS, I created the year_label field to contain the “year built” information for all buildings for which I had building age data, and to say “Data unavailable” for buildings that had either NULL values or 0 in the year_built field. I save project.yml and open up Studio again to view my style; I now have a tooltip that appears when hovering over each building. My style is now ready, so I click the “Upload to Mapbox” button in Studio to send it to my Mapbox account.
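
The two field calculations described in this write-up, consolidating the Denton and Tarrant years and then building the label, can be summarized in a short Python sketch. The QGIS expression used for year_label is not shown here, so this is only one plausible reading of the rule; the function names are hypothetical, and None stands in for NULL.

```python
def consolidate_year(denton_year, tarrant_year):
    """Mirror the CASE expression: use Denton's year unless it is NULL."""
    return tarrant_year if denton_year is None else denton_year


def year_label(year_built):
    """Text for the tooltip: the year, or a notice when no age is known."""
    if year_built is None or year_built == 0:
        return "Data unavailable"
    return str(year_built)
```

A building with no Denton match and a Tarrant year of 1962 would get year_label(consolidate_year(None, 1962)), which is "1962"; a building with 0 or NULL in both sources would show "Data unavailable" in the tooltip.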

Step 4: Building the website

Now that my style is in my Mapbox account, I need to build a website to share the map with the world. I only have a few criteria for the website:

It should be simple and easy-to-use;
It should have a legend and attribution for the data sources;
It should have a navigation bar for visitors to get more information about the map and link to the Center for Urban Studies website.

For simplicity and ease of use, I turn to Mapbox.js. Mapbox Studio integrates very nicely with Mapbox.js, Mapbox’s extension of the popular Leaflet library for web mapping. In my Mapbox account, I can add my new map style to a new map, save it, and get a map ID that I can use to build my website around.

The Mapbox.js documentation has many examples to help developers build websites around their maps, and I rely heavily on the documentation to make the map. I start with the “simple map” example, and replace the map ID with my own ID. I then also add the capacity for geolocation to the map so users can look up building ages based on where they are; I follow the Mapbox instructions here as well.

I now want to add a legend. Again, I use the Mapbox.js example, located here. The legend requires three elements: CSS that controls the legend appearance enclosed in <style></style> tags; a new <div> in the HTML to contain my legend; and a line of JavaScript that instructs the page to add the legend contained in the HTML, map.legendControl.addLegend(document.getElementById('legend').innerHTML);. However, I need to make some modifications to the code template provided by Mapbox, which is optimized for a five-class legend, and I have eleven classes. I first set up my legend <div>, with information corresponding to my classes and colors, and attribution for my sources.

<div id='legend' style='display:none;'>
  <nav class='legend clearfix'>
    <span style='background:#440154;'></span>
    <span style='background:#482575;'></span>
    <span style='background:#414487;'></span>
    <span style='background:#345F8D;'></span>
    <span style='background:#2A788E;'></span>
    <span style='background:#21908C;'></span>
    <span style='background:#22A884;'></span>
    <span style='background:#43BF71;'></span>
    <span style='background:#7AD151;'></span>
    <span style='background:#BCDF27;'></span>
    <span style='background:#FDE725;'></span>
    <label>&lt;1920</label>
    <label>1920s</label>
    <label>1930s</label>
    <label>1940s</label>
    <label>1950s</label>
    <label>1960s</label>
    <label>1970s</label>
    <label>1980s</label>
    <label>1990s</label>
    <label>2000s</label>
    <label>2010s</label>
    <small>Data: <a href="http://www.mapbox.com">Mapbox</a>; &copy; <a href="https://www.openstreetmap.org/copyright">OSM contributors</a>; <a href="http://mapitwest.fortworthtexas.gov/fwgisdata/">City of Ft. Worth</a>; <a href="http://www.tad.org/">Tarrant Appr. Dist.</a>; <a href="https://www.dentoncad.com/">Denton CAD</a></small>
  </nav>
</div>

However, this does not work out-of-the-box with the CSS provided by Mapbox, as the underlying map-legends class has a maximum width defined of 300 pixels, which causes my color boxes and labels to be misaligned. In turn, I increase the maximum width of the legend in my legend CSS, and make some small modifications to the legend style until it looks just the way I want. Still, there is one catch: I want the map to be accessible to mobile users as well. On small screens, the legend not only is misaligned, but also takes up too much space. In turn, I decide to simply hide the legend on small screens (on my iPhone 6, it is hidden in portrait orientation, but shows up again in landscape orientation). The following CSS gets this done:

@media (max-width: 600px) {
    .legend {display: none;}
    .map-legends {background-color: transparent;}
}
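
Returning to the legend markup for a moment: with eleven classes, the swatch and label lines are repetitive enough that they could be generated rather than typed. The sketch below is one hypothetical way to do that in Python; it was not part of the original workflow, and the names CLASSES and legend_rows are made up. The labels and colors are the ones from the legend above.

```python
from html import escape

# (label, color) pairs copied from the legend markup.
CLASSES = [("<1920", "#440154"), ("1920s", "#482575"), ("1930s", "#414487"),
           ("1940s", "#345F8D"), ("1950s", "#2A788E"), ("1960s", "#21908C"),
           ("1970s", "#22A884"), ("1980s", "#43BF71"), ("1990s", "#7AD151"),
           ("2000s", "#BCDF27"), ("2010s", "#FDE725")]


def legend_rows(classes):
    """Return the <span> swatch lines followed by the <label> lines."""
    spans = [f"<span style='background:{color};'></span>" for _, color in classes]
    # escape() turns the "<" in "<1920" into "&lt;" so it is valid HTML text.
    labels = [f"<label>{escape(label)}</label>" for label, _ in classes]
    return spans + labels
```

Printing "\n".join(legend_rows(CLASSES)) yields the 22 lines that sit inside the legend's <nav> element, and keeping the pairs in one list makes it harder for a label and its color to drift out of step.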

Finally, I want to add a navigation bar to the map - and I want it to be responsive, given my desire for mobile functionality. I’ve always liked the Bootswatch themes for Bootstrap, and the Cyborg theme pairs very well with my map style. I’ve long admired Bryan McBride’s BootLeaf, a template that integrates Leaflet with Bootstrap; however, I don’t need all that functionality. In turn, I use Bryan’s Leaflet/Bootstrap template that he provides here, and customize it to get the Bootswatch theme I want.

After all of this - I have the Age of Fort Worth map! In any data science/data visualization workflow like this, practitioners must be able to move back and forth between multiple types of data sources and software packages, as there is no “magic bullet” that can do everything at once. At TCU and in the Center for Urban Studies, we endeavor to train students to manage these types of complex workflows, and apply them in real-world scenarios to prepare them for their professional futures. Thanks for reading!