In this post I criticized a Sankey diagram depicting FIFA accounts published at BBC News. Because the operating profits were drawn out of proportion, certain arrows were overemphasized.

Here is my version of the diagram, based on the values given in the article by Paul Sargeant (no warranty for the accuracy of these numbers). The orange arrow represents the operating profits, this time at the same scale.

I found out via the news feed from ifu Hamburg, maker of e!Sankey, that they have released an SDK based on e!Sankey that allows software makers to integrate Sankey visualizations into their applications.

Two main features help to achieve this: (1) building Sankey diagrams from an XML file that contains structural and layout information, and (2) feeding values into a Sankey diagram template by reading ID/value pairs from a CSV file.
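The second feature can be pictured with a small sketch. This is not the e!Sankey SDK API, which is not shown here. The function name apply_values, the two-column CSV layout without a header, and the template dictionary are all assumptions made purely for illustration.

```python
import csv
import io


def apply_values(template, csv_text):
    """Return a copy of the template with flow values filled in from ID/value pairs.

    template: dict mapping a flow ID to a dict describing that flow (hypothetical layout).
    csv_text: text with one "id,value" pair per line (assumed format, no header).
    """
    filled = {flow_id: dict(flow) for flow_id, flow in template.items()}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        flow_id, value = row[0].strip(), float(row[1])
        if flow_id in filled:  # IDs that are not in the template are ignored
            filled[flow_id]["value"] = value
    return filled


# Hypothetical template with two flows and no values yet.
template = {
    "F1": {"label": "Fuel input", "value": None},
    "F2": {"label": "Losses", "value": None},
}
result = apply_values(template, "F1,120.5\nF2,30\nF9,7\n")
```

The template itself is left untouched, so the same structure and layout could be reused with a fresh CSV file each time the values change.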

Interesting blog post by Steve Wexler of Data Revelations. Long article, long title: “Circles, Labels, Colors, Legends, and Sankey Diagrams – Ask These Three Questions”.

The really interesting part for the Sankey diagram aficionados is Steve’s advice on when to use Sankey diagrams, and when you should avoid using them.

Steve illustrates his point with the example below by ‘Music Major – Data Miner’ Jeffrey A. Shaffer (original post is here).

A combination of a stacked bar chart with a distribution diagram, nicely decorated with a trumpet … “Within this context, this very creative chart works”, Steve writes.

He then goes on to show another one by Shaffer, also a distribution diagram: the original pie chart data from an energy bill was redesigned and presented as a distribution diagram (two stacked bars with bands to link them).

In this case, Steve concludes, the choice of a Sankey diagram is maybe not that wise, since the really important information (44% of energy cost is for heating) doesn’t come across quickly and clearly. A bar chart might work better here. Sankey diagrams can create a “cool!” or a “crap!” response, depending on the context. See the original Shaffer post here.

Adding my 2c from a technical perspective, I would say that both diagrams have a shortcoming: the bands don’t maintain their width as they cross over the others diagonally. This is somewhat acceptable in the trumpet diagram, as the bar on the right side listing the composers is taller than the one at the trumpet bell (sound spreading out). It is not acceptable in the second diagram, where the two stacked bars have the same height. This is obviously an error in the curve radius calculation (read ‘The Math Behind those Curves’).
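The narrowing can be checked with a little arithmetic. If a band is drawn by shifting the same curve vertically by a fixed amount, its width measured at right angles to the flow direction shrinks locally with the cosine of the slope angle. The sketch below assumes that simple construction, which is not necessarily how the two diagrams were produced. The name perpendicular_width is made up for illustration.

```python
import math


def perpendicular_width(vertical_thickness, slope):
    """Width measured at right angles to a band whose two edges are copies
    of the same curve shifted vertically by vertical_thickness.

    slope is dy/dx of the curve at the point of interest.
    """
    angle = math.atan(slope)
    return vertical_thickness * math.cos(angle)


flat = perpendicular_width(10.0, 0.0)      # horizontal section of the band
diagonal = perpendicular_width(10.0, 1.0)  # section running at 45 degrees
```

Under this assumption a band that is 10 units thick where it runs horizontally is only about 7 units wide where it runs at 45 degrees, which is the kind of thinning visible in the diagonal parts of both diagrams.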

I got alerted by Google to a blog post by Maruthi Jampani at the Express Analytics blog. Sure, I am always excited to get fresh new Sankey diagrams worth reporting here. But more and more I find distribution diagrams like the one shown in the article ‘Power of Sankey Diagram in Data Visualization’ … and get disappointed. Well, not really. The term ‘Sankey diagram’ has gained a certain popularity over the past years, which is good. With the increasing use of d3.js, Parsets or Fineo we see more of these distribution diagrams.

My two posts back in 2009 (‘Infographics Experts on Sankey Diagrams (Part 1)’ and ‘Infographics Experts on Sankey Diagrams (Part 2)’) were based on a good and funny article by Chiqui Esteban. He suggested several names (in Spanish) for this type of diagram and concluded that the best term is distribution diagram.

The Parsets page explains that they are a “visualization … for categorical data, like census and survey data, inventory, and many other kinds of data that can be summed up in a cross-tabulation. (…) Between the dimension bars are ribbons that connect categories and split up. This shows you how combinations of categories are distributed, and how a particular subset (…) can be further subdivided.”

So we have categories and dimensions. And ribbons that connect them.
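To make the cross-tabulation idea concrete, here is a minimal sketch of how the quantity behind each ribbon could be counted from categorical records. It is not the Parsets implementation. The function name ribbon_counts and the survey records are made up for illustration.

```python
from collections import Counter


def ribbon_counts(records, dim_a, dim_b):
    """Count how many records fall into each pair of categories.

    Each count is the quantity that one ribbon between the two
    dimension bars would stand for.
    """
    return Counter((rec[dim_a], rec[dim_b]) for rec in records)


# Made-up survey records, for illustration only.
records = [
    {"region": "north", "answer": "yes"},
    {"region": "north", "answer": "yes"},
    {"region": "north", "answer": "no"},
    {"region": "south", "answer": "no"},
]
counts = ribbon_counts(records, "region", "answer")
```

Here the ribbon from "north" to "yes" would carry two records, and the ribbons from "north" to "no" and from "south" to "no" one record each. Note that nothing flows from a region to an answer; the counts only describe how the records are distributed.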

Distribution diagrams have commonalities with Sankey diagrams. In fact, one very central characteristic is that the width of the band is proportional to the quantity it represents. In Sankey diagrams the width of the arrow (!) is proportional to the quantity of the flow represented. So they do qualify as Sankey diagrams, but I would say they should be considered a subset or specific type of Sankey diagrams. As I pointed out in a May 2012 post:

It is exactly the fact that these are not directed flows, but rather quantities that are distributed over categories (or dimensions). There is no time relation in them, neither are there flows “from” (e.g. Finance) “to” (e.g. Reporting) or the other way round. These are bands hooked between nodes rather than arrows leading from one node to another. Each category could be represented by a pie chart as well

So I do agree that distribution diagrams (or spaghetti diagrams, swim lane diagrams) are a subset of Sankey diagrams. But there is more to Sankey diagrams than that.

I may have to emphasize the genuine Sankey diagrams in the future. Flows in process systems, from one machine to another. Energy input into a boiler, and heat being distributed as steam to other parts of the plant. Streams of people moving between halls at a trade fair. Water being pumped back in loops. Value streams along a supply chain, where each processing step adds to the value of the product. And much more…

Only a few hours left until the kick-off of the FIFA World Cup in Brazil … A reader from Germany recently sent me a clipping from the May edition of the Germanwings inflight magazine (read it online here). The article on pages 36/37 has this Sankey diagram:

Interesting visualization, though not fully in line with the basic rules for Sankey diagrams. The width of the bands represents the number of times the World Cup has been won. The main issue is that only eight of the participating countries have ever won the cup (Brazil, the pentacampeão, won it 5 times, so far…). For most of the nations shown, the green stream or arrow thus stands for zero wins. Zero (nil), however, is impossible to display in a Sankey diagram if you want to maintain the basic rule of arrows being proportional in width to the quantity they display.

Several approaches have been proposed for the “zero quantity flows”, such as a thin dotted line, a thin line with a label “no flow”, or a colourless line. In the above case the choice of the diagram type is, in my opinion, not the most fortunate one. The main message is that all teams are dreaming of getting to Rio’s Maracanã stadium on July 13.
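The proportionality rule and the zero-flow workaround can be written down in a few lines. This is only a sketch: arrow_style, the scale of 0.5 wins per pixel and the choice of a dotted line as fallback are assumptions, and the second team is a placeholder rather than a real entry from the diagram.

```python
def arrow_style(quantity, units_per_pixel, zero_style="dotted"):
    """Return (width_in_pixels, line_style) for a flow.

    Width is strictly proportional to the quantity, so a zero flow gets
    width 0 and can only be hinted at with a special line style.
    """
    if quantity < 0:
        raise ValueError("flow quantities must not be negative")
    if quantity == 0:
        return (0.0, zero_style)
    return (quantity / units_per_pixel, "solid")


# Brazil's five titles are from the text; the second entry is a placeholder.
wins = {"Brazil": 5, "Team without a title": 0}
styles = {team: arrow_style(n, units_per_pixel=0.5) for team, n in wins.items()}
```

With this scale Brazil gets a solid arrow 10 pixels wide, while the team without a title gets no width at all and only a dotted hairline, which shows why a diagram dominated by zero flows says little.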

Also see my two posts for the 2010 world cup here and here with a slightly different Sankey diagram.

This article on ‘A Pilot for Measuring Energy Retrofits’ describes how researchers from the EEB Hub used an old navy building in Philadelphia to “determine detailed system performance”.

EEB Hub researchers outfitted Building 101 with sensors and a data acquisition system to determine detailed system performance, building energy loads, indoor environmental quality (IEQ), and a detailed operation of the building control system. … The sensors read data from 509 sensing points, collecting 1,048 pieces of data at one-minute intervals. These data points track indoor air quality, occupant comfort, and building energy use.

The results of that “inverse modelling” (i.e. measuring) approach are presented in Sankey diagrams and are used “to identify discrepancies in the predicted versus actual energy balance”.

There are significant differences between the January energy use…

… and the energy picture in July

While in winter mainly natural gas is used for heating, the gas consumption in summer is down. In July electricity consumption is significantly higher due to air conditioning.

Unfortunately no unit of measurement is given (it could be kWh), but the proportions of the energy flows are nevertheless correct.

Just came across this video featuring a “Sankey diagram of the Taiwan economy, jobs and energy in 2010” by ARUP (uploaded to Vimeo by user Simon Roberts).

The underlying model is called the “4see-TW” framework and was created to “investigate the structure and function of an economy in a resource-constrained world”.

This is certainly exciting… however, one must be warned that the Sankey diagram includes different “dimensions”: energy flows, value streams (money flows) and jobs. These three perspectives probably have different unit types and units (e.g. TJ for energy; Euro, US$ or New Taiwan Dollar TWD for values; and persons or workplaces for jobs). Hence the widths of the Sankey arrows mustn’t be compared with each other across the unit types.
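Why widths are not comparable across unit types becomes clear once each unit type gets its own scale. The sketch below is not taken from the 4see-TW model; widths_by_unit, the maximum width of 40 and all flow values are invented for illustration.

```python
def widths_by_unit(flows, max_width=40.0):
    """Scale flow widths separately for every unit type.

    flows: list of (name, quantity, unit) tuples.
    Returns {unit: {name: width}}; the largest flow of each unit gets max_width.
    """
    largest = {}
    for _, quantity, unit in flows:
        largest[unit] = max(largest.get(unit, 0.0), quantity)
    widths = {}
    for name, quantity, unit in flows:
        share = quantity / largest[unit] if largest[unit] else 0.0
        widths.setdefault(unit, {})[name] = share * max_width
    return widths


# Invented numbers: two energy flows in TJ, two money flows in TWD.
flows = [
    ("electricity", 200.0, "TJ"),
    ("gas", 100.0, "TJ"),
    ("wages", 50.0, "TWD"),
    ("services", 25.0, "TWD"),
]
widths = widths_by_unit(flows)
```

In this made-up example the gas arrow and the services arrow end up equally wide, although one stands for 100 TJ and the other for 25 TWD. Comparing widths only makes sense within one unit type.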

Haven’t found the time yet to dig more into the 4see-TW model, but here is one starting point (edit: link doesn’t work any more) for those interested.

A great post on Sankey diagrams at the visualign blog led me to Sam Calisch’s PDF on GitHub. It contains some insight into the maths behind the drawing of Sankey diagram curves, especially for the type known as spaghetti diagram or distribution Sankey diagram (see discussion here).

There are some great scribbles in this paper that I wanted to share.

And I especially like this one, with the little man using the Sankey arrow as a slide…

The article is well worth reading, so if you are into programming Sankey software (the Mathematica workbook for Sankey might be a starting point), please download and study it.