Interesting blog post by Steve Wexler of Data Revelations. Long article, long title: “Circles, Labels, Colors, Legends, and Sankey Diagrams – Ask These Three Questions”.

The really interesting part for the Sankey diagram aficionados is Steve’s advice on when to use Sankey diagrams, and when you should avoid using them.

Steve illustrates his point with the example below by ‘Music Major – Data Miner’ Jeffrey A. Shaffer (original post is here).

A combination of a stacked bar chart with a distribution diagram, nicely decorated with a trumpet … “Within this context, this very creative chart works”, Steve writes.

He then goes on to show another one by Shaffer, also a distribution diagram: the original pie chart data from an energy bill was redesigned and presented as a distribution diagram (two stacked bars with bands linking them).

In this case, Steve concludes, the choice of a Sankey diagram is perhaps not that wise, since the really important information (44% of energy cost is for heating) doesn’t come across quickly and clearly. A bar chart might work better here. Sankey diagrams can create a “cool!” or a “crap!” response, depending on the context. See the original Shaffer post here.

Adding my 2c from a technical perspective, I would say that both diagrams have a shortcoming: the bands don’t maintain their width as they cross over the others diagonally. This is somewhat acceptable in the trumpet diagram, as the bar on the right side listing the composers is taller than the one at the trumpet bell (sound spreading out). It is not acceptable in the second diagram, where the two stacked bars have the same height. This is obviously an error in the curve radius calculation (read ‘The Math Behind those Curves’).
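To put a number on the width problem, here is a minimal sketch of my own. It assumes a straight diagonal band, not the curved bands of the two diagrams, and it is not necessarily the calculation described in ‘The Math Behind those Curves’. A band drawn with a constant vertical thickness gets thinner, measured at right angles to its direction, the steeper it runs. The function names and the sample numbers are made up for illustration.

```python
import math


def perpendicular_width(vertical_thickness, dx, dy):
    """True width of a straight band, measured at right angles to its direction.

    Assumes the band keeps a constant vertical thickness while it
    moves dy up or down over a horizontal run of dx (dx > 0).
    """
    cos_angle = dx / math.hypot(dx, dy)
    return vertical_thickness * cos_angle


def vertical_thickness_needed(true_width, dx, dy):
    """Vertical thickness to draw so that the band keeps its true width."""
    cos_angle = dx / math.hypot(dx, dy)
    return true_width / cos_angle


if __name__ == "__main__":
    # Hypothetical band, 10 units thick, over a horizontal run of 100 units.
    for dy in (0, 50, 100, 200):
        width = perpendicular_width(10, 100, dy)
        print(f"vertical shift {dy:>3}: perceived width {width:.2f}")
```

Under these assumptions a band running at 45 degrees looks roughly 29% thinner in the middle than at its ends, which is the kind of pinching you can see when the two stacked bars have the same height.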

Below is a section from a larger Sankey diagram by Adrián Chiogna, shown at visualize.org. It shows budget flows and activity-based costing.

Check out the full image at visualize.org.

The blog post I mentioned yesterday also has two fine examples of Sankey diagrams done with Mathematica. Nice colorful ones. Black background.

The first one is on cost flows and their distribution onto accounts, expenses, activities, and products (each represented by a column with nodes).

The other is by Sam Calisch (the author of the Mathematica workbook) and visualizes the efficiency of energy use in Australia.

Actually this second diagram is made up of two Sankey diagrams: the main one for Australia, the overlay one with the values for New South Wales. A cool idea to show the share of NSW.

The author of the post at visualign says that he “wouldn’t be surprised to see Sankey Diagrams make their way into modern data visualization tools such as Tableau or QlikView, perhaps even into Excel some day…”. This is of course an idea I like…

Chiqui Esteban, who runs the Spanish blog infografistas.com, had two posts back in March/April about a discussion he had with his colleague Xocas on how to name Sankey diagrams. Or, to be more precise: what the correct name should be for a certain type of diagram that is used more and more in infographics.

They are really funny, so I am trying to give you a translation of these two blog posts. This is part 1, a post from March 17 titled “Gráficos de erogación”. I left some words in Spanish and put my comments in square brackets.

– translation start –

Distribution Graphics

A couple of months ago, Xocas and I discussed via GTalk what the name is, or should be, of the diagrams with the little arms ['gráficos de bracitos']. As it turned out, the winning name was volume flow graphics ['gráfico de caudales'].

Today, we decided to withdraw our proposal and we are going to call them ‘distribution graphics’ instead ['gráficos de erogación'].

This is because of the coffee. The coffee machine of my new employer www.lainformacion.com (click the link, we are already up and running) shows the message ‘distributing’ ['erogando'] while you wait for your cup to be filled. Looking in the RAE [note: Real Academia Española], the verb ‘erogar’ is defined as:

(Del lat. erogāre).

1. tr. Distribuir, repartir bienes o caudales. [distribute, share the goods or funds]
2. tr. Méx. y Ven. Gastar el dinero. [México and Venezuela: spend money]

This definition is spot on. So we shouldn’t continue to call them ‘little arms’ ['de bracitos'], ‘tubing’ ['de tubería'], ‘squid’ ['de pulpo'], ‘tree-roots’ ['raíces'] or whatever diagrams any more. But don’t say that we didn’t work hard in finding the correct nomenclature. As we have to do. So Tufte will… ['A Tuftear'].

– translation end –

The accompanying Sankey diagram apparently is from the New York Times and shows how $21.4 billion in federal aid for NYC after 9/11 was distributed (hey! there you are, a ‘distribution diagram’ ;-) ). Funnily enough, the caption says: “The figure above is an attempt to bring sources of funds together and show how they add up (sic!) to $ 21.3 billion”.

So what is distribution for one, is “adding up” from another perspective.

Part 2, the translation of “Caudales, erogación… ¿flujo?” and a summary of the comments to follow.

Note (Aug 19): A case of DYRF, do your research first! I just discovered that Chiqui himself has an English version of his article here. So now you have the choice between two versions!

The GWP guy at the Green World Pictures blog posted an article on average energy spending in a U.S. household.

The data is from an Energy Star flyer, which presents it in a pie chart. The average US$1,900 spent on energy per year breaks down as follows:


Heating and cooling account for almost half of the energy spending, followed by water heating and lighting.

Nathan at FlowingData – Strength in Numbers presented a Sankey diagram by AP’s Nicolas Rapp and Damiko Morris (originally from this post on Nicolas’ blog). It shows where the $173 billion AIG received from the government went.

I especially like the inverse waterfall arrow endings and how they intersect with the grid of beneficiaries.

Nicolas, who works in Information Graphics for the Associated Press, later presented another Sankey diagram, displaying how the “nearly $12 trillion that was allocated in programs affecting the financial services industry” was used.

The author says “I spent the day researching and realizing this graphic” (@Nick: how much time was the research, how much the drawing?)

He adds “Fun stuff”, a comment which probably refers to the Sankey graphics part rather than to the content depicted… :-(

Sam Brenner, interactive design and development student at the Rochester Institute of Technology, has finished version 0.2 of his ‘Sankey Generator’ tool.

Inspired by state federal budgets, Sam aims to display financial figures in a clear and comprehensible way. Sources of state income are on the left, spending on the right. As Sam says himself, this is still a work in progress. “I’m trying to make a dynamic Sankey Diagram generator (…) What I would like to end up with is a program that can take numeric data like a budget and turn it into a diagram…”.

See that small step at the bottom of the middle part? Hey, here you have the “deficit”…
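That step is simply the gap between the two sides. As a minimal sketch (my own, not Sam’s code; the function name, the labels ‘Deficit’ and ‘Surplus’ and all figures are hypothetical), a budget Sankey can be balanced by adding the difference between spending and income as an extra flow:

```python
def balance_with_deficit(income, spending):
    """Return copies of both sides with a balancing entry added where needed.

    If spending exceeds income, the gap is added to the income side as
    'Deficit'; if income exceeds spending, it is added to the spending
    side as 'Surplus'. Both sides then add up to the same total.
    """
    income = dict(income)
    spending = dict(spending)
    gap = sum(spending.values()) - sum(income.values())
    if gap > 0:
        income["Deficit"] = gap
    elif gap < 0:
        spending["Surplus"] = -gap
    return income, spending


if __name__ == "__main__":
    # Made-up figures, for illustration only.
    sources = {"Taxes": 60, "Fees": 25}
    uses = {"Education": 50, "Health": 45}
    balanced_sources, balanced_uses = balance_with_deficit(sources, uses)
    print(balanced_sources)
    print(balanced_uses)
```

With the balancing entry in place, the left and the right side of the diagram have the same total height, and the deficit shows up as a flow of its own rather than as a step.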

Interesting new tool. Not sure if the Sankey Generator tool will reach a status that would allow Sam to release it publicly, but I have added it to my Sankey software list anyway. Hope version 0.3 has some fancier colors, though ;)

In early November a reader of this blog pointed me to an image on the Innovation Strategy Canada website [the website itself is not accessible any more]. Peter asked whether I knew of any Sankey diagrams for financial flows like the one shown below.

The diagram visualizes the sources of R&D funding and the institutions receiving these funds. The data is from Statistics Canada for 2006 and is shown in millions of (presumably) Canadian dollars.

While there are only four different arrow widths to show the financial flows, the interesting thing is that the sums of funds from each source and received by each beneficiary are shown as cylinders (database symbols, tanks, …).

I quickly did several versions of the diagram, but was not too happy with the results. The flow quantities are OK, but as it turns out, it is difficult to judge the volume of each cylinder, which is supposed to scale with the sums. This information is redundant anyway, since the width of the joined arrows at their base or at their head is exactly the sum that the cylinder volume is supposed to show.

Here is one version of my Sankey diagram for R&D funding in Canada for 2006, based on the original image. I decided to make the boxes in different sizes (the problem remains the same: can one immediately grasp the area of each box?).
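A small sketch of why areas and volumes are harder to read than arrow widths. It assumes square boxes and cylinders with fixed proportions, which may not be how the original image was constructed; the function names and the sample ratios are mine. An arrow width grows in direct proportion to the value, a box side only with the square root, and a cylinder’s dimensions only with the cube root.

```python
import math


def arrow_width_factor(value, reference_value):
    """Arrow width is proportional to the value itself."""
    return value / reference_value


def square_side_factor(value, reference_value):
    """Side of a square box whose AREA is proportional to the value."""
    return math.sqrt(value / reference_value)


def cylinder_size_factor(value, reference_value):
    """Linear size (height and diameter) of a cylinder with fixed
    proportions whose VOLUME is proportional to the value."""
    return (value / reference_value) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Hypothetical ratios, not taken from the Statistics Canada data.
    for ratio in (2, 4, 8):
        print(
            f"value x{ratio}: "
            f"arrow x{arrow_width_factor(ratio, 1):.2f}, "
            f"box side x{square_side_factor(ratio, 1):.2f}, "
            f"cylinder x{cylinder_size_factor(ratio, 1):.2f}"
        )
```

So a sum eight times larger gives an arrow eight times as wide, but a cylinder only twice as tall and twice as wide, which is easy to underestimate at a glance.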

Your comments are welcome. Is there a better way to display the sums?