Sankey Diagram using D3.js Part 1 of 2

Among other things, I’ve been itching to master some D3.js tricks, mainly because the library lets you do some pretty gorgeous stuff, and there’s a wide variety of visualizations which are highly customizable. Recently, I finally had a few minutes to try something out. Since my work entails working with Statistics Canada data, or anything to do with startups in Ontario, I figured I would go for something that has nothing to do directly with that world.

This led me to track down some election donation data from the City of Toronto’s open data repository: the donor list from the 2006 mayoral election. The title said it included 2009 as well, which sucked me in because that’s what I really wanted to use. I was disappointed when I found out it was only 2006, but figured it was OK because either way I was just playing around.

The first part of this two-part series describes how to lay out the data to get it ready for a Sankey diagram. The second part will talk about how I actually got the visualization going in D3.js.

The data as it is presented shows each donor, their postal code, whether they are a corporation or not and (of course) the candidate who received the donation. Sankey diagrams don’t need that level of detail, and I just wanted to show how money flowed from different parts of the GTA (and beyond) to each candidate. So the first thing you do is summarize by FSA (the Forward Sortation Area, which is the first three characters of the postal code), while keeping the dollar amount, candidate name and type of donor (corporation vs. individual). I don’t really care how you do it: run a pivot table, or something, just get that dollar amount by FSA.
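If you would rather script it than run a pivot table, here is a minimal Python sketch of that summarizing step. The field names (postal_code, candidate, donor_type, amount) and the sample rows are my own placeholders, not the actual column names or records in the Toronto file, so adjust them to whatever your copy of the data uses.

```python
from collections import defaultdict


def summarize_by_fsa(rows):
    """Sum donation amounts per (FSA, candidate, donor type)."""
    totals = defaultdict(float)
    for row in rows:
        # The FSA is the first three characters of the postal code, e.g. "M5V".
        fsa = row["postal_code"].replace(" ", "").upper()[:3]
        key = (fsa, row["candidate"], row["donor_type"])
        totals[key] += float(row["amount"])
    return dict(totals)


# Made-up rows for illustration only; field names are assumptions.
sample_rows = [
    {"postal_code": "M5V 2T6", "candidate": "Candidate A",
     "donor_type": "Individual", "amount": "200"},
    {"postal_code": "m5v 1j1", "candidate": "Candidate A",
     "donor_type": "Individual", "amount": "300"},
    {"postal_code": "K6V 5T1", "candidate": "Candidate B",
     "donor_type": "Corporation", "amount": "500"},
]

summary = summarize_by_fsa(sample_rows)
```

Each key of the resulting dictionary is one flow in the eventual diagram, and the value is the dollar total for that flow.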

Next, get the area names by FSA region from Wikipedia so that you can distinguish different areas in a readable manner. Now you want to present the data in well-formed JSON like this:
{"nodes":[
{"name":"Brockville"},
{"name":"Central Toronto"},
.....
],
"links":[
{"source":0, "type": "Individual", "target":26, "value":500},
{"source":5, "type": "Individual", "target":16, "value":200},
....
]}

A quick note here: sankey.js (the plugin to include with D3.js) is kind of picky about key names, and “value” (as above) is pretty much non-negotiable. Don’t get caught like I did by using “amount” instead of “value”.
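To catch that kind of slip before it reaches the browser, a quick check along these lines might help. It is only a sketch: the function name and the example data are hypothetical, and I am assuming that “source”, “target” and “value” are the keys each link has to carry, which is what the sample above uses.

```python
REQUIRED_KEYS = ("source", "target", "value")


def links_missing_keys(graph):
    """Return (link index, missing keys) for every link lacking a required key."""
    problems = []
    for i, link in enumerate(graph["links"]):
        missing = [key for key in REQUIRED_KEYS if key not in link]
        if missing:
            problems.append((i, missing))
    return problems


# Made-up example: the second link uses "amount" where "value" is expected.
example = {
    "nodes": [{"name": "Brockville"}, {"name": "Candidate A"}],
    "links": [
        {"source": 0, "type": "Individual", "target": 1, "value": 500},
        {"source": 0, "type": "Individual", "target": 1, "amount": 200},
    ],
}

problems = links_missing_keys(example)
```

An empty list means every link has the three keys; anything else tells you which link to fix.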

Finally, every single node (both region and mayoral candidate) gets included in the list of “nodes”. Then the “source” and “target” of each link are the positions of the corresponding nodes in that list, in order, starting at zero. In the example above, Brockville = 0. So now that you have the JSON explained, just create the JSON file and you are ready to go to part 2.
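For what it’s worth, the node numbering can be generated rather than counted by hand. The sketch below assumes you already have totals keyed by (FSA, candidate, donor type) and a lookup from FSA to area name; the function name, the lookup and the sample totals are all placeholders of mine, not the real data.

```python
import json


def build_sankey_graph(summary, area_names):
    """Turn {(fsa, candidate, donor_type): total} into a nodes/links dict."""
    names = []   # node names in order; list position is the node index
    index = {}   # node name -> position in names

    def node_id(name):
        # Assign the next zero-based index the first time a name is seen.
        if name not in index:
            index[name] = len(names)
            names.append(name)
        return index[name]

    links = []
    for (fsa, candidate, donor_type), total in sorted(summary.items()):
        links.append({
            "source": node_id(area_names.get(fsa, fsa)),  # fall back to the FSA code
            "type": donor_type,
            "target": node_id(candidate),
            "value": total,
        })
    return {"nodes": [{"name": name} for name in names], "links": links}


# Illustrative inputs only; the area names and totals are assumptions.
area_names = {"K6V": "Brockville", "M5V": "Downtown Toronto"}
summary = {
    ("M5V", "Candidate A", "Individual"): 500.0,
    ("K6V", "Candidate B", "Corporation"): 500.0,
}

graph = build_sankey_graph(summary, area_names)
graph_json = json.dumps(graph, indent=1)  # write this string to your JSON file
```

Two FSAs that map to the same area name end up sharing one node here, which is what you want if the diagram is meant to show areas rather than individual FSAs.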
