Showing Change with a Scatterplot in Tableau

Recently the VizWiz blog had a post showing how to create a 45-degree reference/trend line in a scatterplot. The original purpose was comparing countries based on literacy rates for females and males. Those above the line were countries where females had higher literacy rates and below were countries where males had the higher literacy rate. From there the post went on to talk about change in a couple of other datasets. But, there was an issue in producing the correct color coding to represent the change from year to year in those other datasets. This got me thinking that I wanted to see if I could figure out why the color coding of change wasn’t working as expected in those examples. (NOTE: Description of how to get the 45-degree line and fix the issue he encountered are at the end of this post.) Of course, trying to tie things to higher education, I decided to use my standby of accountability data from the Texas Higher Education Coordinating Board (THECB) which I’ve used previously.

To keep things simple I decided to only use a handful of comparison variables and limit the comparison years to Fall 2010 versus Fall 2014. The actual dataset contains more variables and years, so this could be expanded in the future to allow the user to select which years to compare as well as which set of colleges/universities. Anyway, here is what I came up with…

Notice that rather than the calculated field just determining whether change was positive or negative, I used the actual value. This allowed me to use color and shading to indicate not only the direction of change but also the degree of change over time.

To generate the 45-degree reference line you need to use a dual axis chart where you plot one of the values against itself. You then display the trend line for that chart and hide the 2nd axis header and marks (set them to transparent and make them as small as possible). Be sure to check out the VizWiz post linked above for more information on doing that.

To get the color coding, a calculated field was used to determine if the change was positive or negative. The problem is that in the original example (male vs female literacy rates) the values for each measure are in the same record (row). But in my case, and the last couple of datasets used in his example, the measures are in different records. The disaggregate approach attempts to compare the values in the same row, which results in a comparison against a null value. Using the aggregate measure in the calculated field, in my case SUM(), solved the problem for me since it now compares across the different records.