Interesting blog post by Steve Wexler of Data Revelations. Long article, long title: “Circles, Labels, Colors, Legends, and Sankey Diagrams – Ask These Three Questions”.

The really interesting part for Sankey diagram aficionados is Steve’s advice on when to use Sankey diagrams and when to avoid them.

Steve illustrates his point with the example below by ‘Music Major – Data Miner’ Jeffrey A. Shaffer (original post is here).

A combination of a stacked bar chart with a distribution diagram, nicely decorated with a trumpet … “Within this context, this very creative chart works”, Steve writes.

He then shows another one by Shaffer, also a distribution diagram: pie chart data from an energy bill, redesigned as a distribution diagram (two stacked bars linked by bands).

In this case, Steve concludes, the choice of a Sankey diagram is perhaps not the wisest, since the most important information (44% of the energy cost is for heating) doesn’t come across quickly and clearly. A bar chart might work better here. Sankey diagrams can provoke a “cool!” or a “crap!” response, depending on the context. See the original Shaffer post here.
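The basic rule behind both chart types, band width proportional to quantity, can be sketched for a stacked bar like the ones in Shaffer’s energy-bill chart. This is a minimal sketch under my own assumptions: the bar has a fixed pixel height that stands for the sum of all quantities, and apart from the 44% heating share the category names and values are hypothetical. Each band gets a width proportional to its share plus a vertical offset, so the bands stack without gaps.

```python
def stack_bands(quantities: dict[str, float], bar_height: float) -> dict[str, tuple[float, float]]:
    """Map each category to (top_offset, width) within a stacked bar.

    Width is proportional to quantity: width = quantity / total * bar_height.
    Offsets accumulate so the bands sit on top of each other without gaps.
    """
    total = sum(quantities.values())
    if total <= 0:
        raise ValueError("total quantity must be positive")
    layout: dict[str, tuple[float, float]] = {}
    offset = 0.0
    for name, quantity in quantities.items():
        width = quantity / total * bar_height
        layout[name] = (offset, width)
        offset += width
    return layout


if __name__ == "__main__":
    # Heating share from the text; the other categories are hypothetical.
    bill = {"Heating": 44.0, "Hot water": 20.0, "Appliances": 36.0}
    for name, (top, width) in stack_bands(bill, bar_height=200.0).items():
        print(f"{name:10s} top={top:6.1f}px width={width:6.1f}px")
```

Because both stacked bars in such a chart use the same scale, a band that links them keeps the same width at both ends, which is what makes the comparison between the two bars readable.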

To add my two cents from a technical perspective: both diagrams have a shortcoming. The bands don’t maintain their width as they cross the others diagonally. This is somewhat acceptable in the trumpet diagram, since the bar on the right listing the composers is taller than the one at the trumpet bell (sound spreading out). It is not acceptable in the second diagram, where the two stacked bars have the same height. This is obviously an error in the curve radius calculation (read ‘The Math Behind those Curves’).
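Why does a band get thinner on the diagonal? A minimal sketch of one way to reproduce the effect, assuming the band is drawn by taking one edge curve and shifting a copy of it vertically by the band height (my guess at a common shortcut, not a claim about how these particular charts were made). On a section running at angle θ to the horizontal, the perpendicular thickness of such a band is only h·cos θ. To keep the thickness constant, the vertical extent has to grow to w / cos θ, which is what concentric arcs with radii r and r + w give you automatically. The function names are hypothetical.

```python
import math


def perpendicular_thickness(vertical_offset: float, slope_deg: float) -> float:
    """Visible thickness of a band whose lower edge is its upper edge shifted down.

    On a straight diagonal section at slope_deg to the horizontal, two parallel
    edges separated vertically by vertical_offset are only
    vertical_offset * cos(slope) apart, measured perpendicular to the band.
    """
    return vertical_offset * math.cos(math.radians(slope_deg))


def required_vertical_offset(band_width: float, slope_deg: float) -> float:
    """Vertical extent a diagonal section needs to keep a constant band width."""
    return band_width / math.cos(math.radians(slope_deg))


if __name__ == "__main__":
    width = 20.0  # hypothetical band width at the vertical bars, in pixels
    for slope in (0, 30, 45, 60):
        naive = perpendicular_thickness(width, slope)
        needed = required_vertical_offset(width, slope)
        print(f"slope {slope:2d} deg: naive band {naive:5.2f} px, "
              f"vertical extent needed {needed:5.2f} px")
```

At 45° the naively drawn band has lost almost 30% of its width, which is very visible when two equally tall stacked bars are linked.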

Austrian technical consulting firm pro-wel offers process engineering services to its customers. Their website features two Sankey diagrams, one of which is a rare circular one with curved arrows (see others).

I also like the technical frame around the diagram, a must-have in engineering and architecture.

I really liked Will Stahl-Timmins’ article on how he developed an infographic on energy consumption in a city.

Will’s blog is called ‘Seeing is Believing’ and his central claim is that information graphics are “the visual transformation of data into understanding”. I agree: infographics are more than just a diagram and labels. They are much more “visual”, and their design elements contribute to a better understanding. Diagrams convey data, infographics convey information. They also typically have a broader audience: you would find a diagram in a scientific paper, but an infographic in a daily newspaper.

The article ‘Visualising city energy policies’ gives very good insight into the reasoning of an infographer/designer when creating an infographic. Will describes how he started from an ordinary Sankey diagram and worked his way to an infographic step by step. This involved studying different alternatives, sketches on paper, discussions with colleagues, presentations, and many different versions of the infographic in Illustrator…

He experimented with an isometric, or what he calls a “pseudo-3D”, perspective, but also discovered some shortcomings in using it.

Crossing arrows were an issue. So were the stacked nodes (cubes) that hid parts of flows and were difficult to label.

The “intermediate” outcome of his meticulous work was the infographic below. It seems to have been a long learning process to reach this result.

Will then incorporated feedback from fellow researchers and decided to add more information on imported energy. At the same time, he had to reduce the level of detail. This is the final infographic.

Good work, I think! The resulting infographic is no longer a genuine Sankey diagram. There are only three arrow widths left; quantities are clustered into these groups. But as I said, an infographic has a different purpose.
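To see what “three arrow widths” means in practice, here is a minimal sketch that snaps each flow to one of three width classes and compares the result with proportional widths. The thresholds, pixel widths and flow values are hypothetical and not taken from Will’s infographic. Once widths are snapped, two flows in the same class look identical even if one is several times larger, which is why the result is no longer a genuine Sankey diagram.

```python
from bisect import bisect_right

# Hypothetical class boundaries (in the flows' unit) and the width used per class.
THRESHOLDS = [10.0, 100.0]       # < 10 -> small, 10 up to < 100 -> medium, >= 100 -> large
CLASS_WIDTHS = [2.0, 6.0, 14.0]  # pixel widths for small, medium, large


def width_class(quantity: float) -> float:
    """Return the clustered arrow width for a quantity (not proportional)."""
    return CLASS_WIDTHS[bisect_right(THRESHOLDS, quantity)]


def proportional_width(quantity: float, max_quantity: float, max_width: float = 14.0) -> float:
    """Genuine Sankey rule for comparison: width scales linearly with quantity."""
    return quantity / max_quantity * max_width


if __name__ == "__main__":
    flows = {"A": 12.0, "B": 95.0, "C": 400.0}  # hypothetical flows
    largest = max(flows.values())
    for name, q in flows.items():
        print(name, width_class(q), round(proportional_width(q, largest), 2))
```

Flows A and B end up with the same clustered width although B is almost eight times larger; with proportional widths the difference stays visible.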

The post doesn’t say clearly how this infographic will ultimately be used or who the target audience is. I imagine it will illustrate a brochure summarizing the findings of the URGENCHE project for a wider, non-technical audience.

Make sure you read the full blog post at ‘Seeing is Believing’.

I liked the 3-in-1 Sankey diagram below from the e!Sankey website. It is actually three different Sankey diagrams of a steam generation process.

The first is a quantitative (mass) view of the process, in which water, steam and gaseous emissions are shown in kilograms.

Using the same basic structure, the second shows the energy content within the flows. Values are in MJ. Temperature is shown as additional information with a lighter color.

And finally, the temperature-only Sankey diagram of the steam generation process. Here the width of the arrows shows the temperature of the steam or gas.

In the background is a transparent technical process diagram of the steam process. Thanks to Michael for providing these Sankey diagrams.

Rob has made a new online tool for distribution diagrams using d3.js. Read more about it here. Sankeybuilder.com can be tried out at the heatmap.ca website.

I added Sankeybuilder.com to the list of Sankey software.

A Google alert pointed me to a blog post by Maruthi Jampani on the Express Analytics blog. Sure, I am always excited to find fresh Sankey diagrams worth reporting here. But more and more often I find distribution diagrams like the one shown in the article ‘Power of Sankey Diagram in Data Visualization’ … and get disappointed. Well, not really. The term ‘Sankey diagram’ has gained a certain popularity over the past years, which is good. With the increasing use of d3.js, Parsets or Fineo, we see more of these distribution diagrams.

Time to talk about distribution diagrams again?

My two posts back in 2009 (‘Infographics Experts on Sankey Diagrams (Part 1)’ and ‘Infographics Experts on Sankey Diagrams (Part 2)’) were based on a good and funny article by Chiqui Esteban at infografistas.blogspot.com. He suggested several names (in Spanish) for this type of diagram and concluded that the best term is distribution diagram.

The Parsets page explains that they are a “visualization … for categorical data, like census and survey data, inventory, and many other kinds of data that can be summed up in a cross-tabulation. (…) Between the dimension bars are ribbons that connect categories and split up. This shows you how combinations of categories are distributed, and how a particular subset (…) can be further subdivided.”

So we have categories and dimensions. And ribbons that connect them.
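The cross-tabulation idea behind Parsets can be sketched in a few lines: count how many records fall into each combination of categories from two dimensions, and use each count as the width of the ribbon between the two category bars. The survey records and dimension names below are made up for illustration; this is not Parsets’ own code.

```python
from collections import Counter

# Hypothetical survey records: (dimension 1 = region, dimension 2 = heating type)
records = [
    ("North", "Gas"), ("North", "Gas"), ("North", "Electric"),
    ("South", "Electric"), ("South", "Electric"), ("South", "Oil"),
]


def ribbons(rows):
    """Cross-tabulate two categorical dimensions.

    Returns a Counter {(category_in_dim1, category_in_dim2): count}; each count
    is the quantity a ribbon between the two dimension bars would represent.
    """
    return Counter(rows)


def bar_totals(rows, dim):
    """Size of each category bar along one dimension (0 or 1)."""
    return Counter(row[dim] for row in rows)


if __name__ == "__main__":
    for (left, right), n in sorted(ribbons(records).items()):
        print(f"{left:5s} / {right:8s} {n}")
    print(bar_totals(records, 0), bar_totals(records, 1))
```

Note that nothing here flows “from” North “to” Gas: a ribbon only shows how the North records are distributed over heating types.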

Distribution diagrams have things in common with Sankey diagrams. In fact, one very central characteristic is that the width of a band is proportional to the quantity it represents, just as in Sankey diagrams the width of the arrow (!) is proportional to the quantity of the flow it represents. So they do qualify as Sankey diagrams, but I would say they should be considered a subset, or a specific type, of Sankey diagram. As I pointed out in a May 2012 post:

It is exactly the fact that these are not directed flows, but rather quantities that are distributed over categories (or dimensions). There is no time relation in them, neither are there flows “from” (e.g. Finance) “to” (e.g. Reporting) or the other way round. These are bands hooked between nodes rather than arrows leading from one node to another. Each category could be represented by a pie chart as well.

So I do agree that distribution diagrams (or spaghetti diagrams, swim lane diagrams) are a subset of Sankey diagrams. But there is more to Sankey diagrams.

I may have to put more emphasis on genuine Sankey diagrams in the future. Flows in process systems, from one machine to another. Energy input into a boiler, and heat being distributed as steam to other parts of the plant. Streams of people moving between halls at a trade fair. Water being pumped back in loops. Value streams along a supply chain, where each processing step adds value to the product. And much more…
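By contrast with a cross-tabulation, a genuine Sankey diagram describes directed flows from one node to another, possibly with loops. A minimal sketch of that structure, assuming flows are stored as (source, target, quantity) triples; the boiler plant, node names and numbers are hypothetical, chosen so that the boiler balances. Summing inflows and outflows per node shows where quantities enter, leave or are recirculated, something a table of category counts has no notion of.

```python
from collections import defaultdict

# Hypothetical steam plant energy flows in MW: (source, target, quantity)
flows = [
    ("Fuel", "Boiler", 100.0),
    ("Condensate return", "Boiler", 15.0),  # loop: condensate pumped back
    ("Boiler", "Steam header", 90.0),
    ("Boiler", "Flue gas losses", 25.0),
    ("Steam header", "Process A", 50.0),
    ("Steam header", "Process B", 40.0),
    ("Process A", "Condensate return", 15.0),
]


def node_balances(edges):
    """Return {node: (inflow, outflow)} for a list of directed flows."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for source, target, quantity in edges:
        outflow[source] += quantity
        inflow[target] += quantity
    nodes = set(inflow) | set(outflow)
    return {node: (inflow[node], outflow[node]) for node in sorted(nodes)}


if __name__ == "__main__":
    for node, (q_in, q_out) in node_balances(flows).items():
        print(f"{node:18s} in={q_in:6.1f} out={q_out:6.1f}")
```

Nodes with no inflow (Fuel) are sources, nodes with no outflow (Process B, Flue gas losses) are sinks, and the condensate return shows up as a cycle between Process A and the boiler.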

Another Sankey diagram of Canada’s energy flows is featured in a blog post titled ‘Dividing the Big Picture: Visualizing Provincial Diversity’. The post appeared May 5, 2014 on the Canadian Energy Systems Analysis Research (CESAR) blog by David B. Layzell, Professor at the University of Calgary. It is a follow-up to a previous CESAR blog post that showed “the big picture” for Canada (featured in a recent post here on the blog).

“The Sankey diagram below shows only the domestic portion of Canada’s energy systems. (…) It also shows how much of that demand is met by oil/petroleum (red), natural gas (blue), electricity (yellow), biomass-derived products (green) or other energy resources.”

Flows are given in GJ per capita. This relative unit differs from the units used in the other national energy flow diagrams I have presented here on the blog, but it is useful for comparing energy consumption across the provinces.
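Converting an absolute flow into GJ per capita is a simple division, but the units need care: 1 PJ = 10^6 GJ. A minimal sketch; the provincial totals and populations below are hypothetical placeholders, not CESAR’s figures.

```python
GJ_PER_PJ = 1_000_000  # 1 petajoule = 10**6 gigajoules


def gj_per_capita(total_pj: float, population: int) -> float:
    """Normalize an absolute energy flow (PJ per year) to GJ per person per year."""
    if population <= 0:
        raise ValueError("population must be positive")
    return total_pj * GJ_PER_PJ / population


if __name__ == "__main__":
    # Hypothetical provinces: (residential energy use in PJ/yr, population)
    provinces = {"Province X": (200.0, 4_000_000), "Province Y": (90.0, 3_000_000)}
    for name, (pj, pop) in provinces.items():
        print(f"{name}: {gj_per_capita(pj, pop):.1f} GJ/capita")
```

In absolute terms Province X uses more than twice as much energy as Province Y, but per person the gap is smaller (50 vs. 30 GJ/capita), which is why the relative unit helps when comparing provinces of very different size.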

The article explains:

“There are significant inter-provincial differences associated with each end-use category. For example, British Columbia (BC) residents had the lowest residential energy use in the nation, at 63% of the per capita energy use in Alberta (52 GJ/capita), the national leader in this category. The balmy BC climate compared to what Albertans face each winter accounts for most of this difference. However, our model also draws on government data showing that many BC buildings tend to be better insulated than those from much colder Alberta.”
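The quoted figures let us work out the approximate BC value, which the excerpt does not state explicitly (the 63% is itself rounded, so treat the result as approximate):

```latex
E_{\mathrm{BC}} \approx 0.63 \times E_{\mathrm{AB}} = 0.63 \times 52~\mathrm{GJ/capita} \approx 32.8~\mathrm{GJ/capita}
E_{\mathrm{AB}} - E_{\mathrm{BC}} \approx 52 - 32.8 \approx 19.2~\mathrm{GJ/capita}
```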

Check out all Sankey diagrams tagged ‘Canada’ here.

Only a few hours left until the kick-off of the FIFA World Cup in Brazil … A reader from Germany recently sent me a clipping from the May edition of the Germanwings inflight magazine (read it online here). The article on pages 36/37 has this Sankey diagram:

Interesting visualization, though not fully in line with the basic rules for Sankey diagrams. The width of the bands represents the number of times each country has won the World Cup. The main issue is that only eight of the participating countries have ever won the cup (Brazil, the pentacampeão, has won it five times so far…). For most of the nations shown, the green stream or arrow thus stands for zero wins. Zero (nil), however, is impossible to display in a Sankey diagram if you want to maintain the basic rule that arrows are proportional in width to the quantity they represent.

Several approaches have been proposed for “zero quantity flows”, such as a thin dotted line, a thin line labelled “no flow”, or a colourless line. In the case above, the choice of diagram type is, in my opinion, not the happiest one. The main message is that all teams are dreaming of reaching Rio’s Maracanã stadium on July 13.
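Here is a minimal sketch of how a drawing routine might handle the zero case, assuming widths are scaled linearly from the largest value and that zero values fall back to one of the placeholder styles mentioned above (a thin dotted line labelled “no flow”). The `BandStyle` class, the pixel values and the two teams other than Brazil are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BandStyle:
    width: float  # stroke width in pixels
    dashed: bool  # True: dotted placeholder instead of a solid band
    label: str    # text drawn next to the band


MAX_WIDTH = 30.0         # hypothetical width for the largest value, in pixels
PLACEHOLDER_WIDTH = 0.5  # hairline used only for zero values


def band_style(value: float, max_value: float) -> BandStyle:
    """Proportional width for positive values, a dotted 'no flow' hairline for zero."""
    if value < 0:
        raise ValueError("Sankey quantities must be non-negative")
    if value == 0:
        # Zero cannot be drawn proportionally (its width would be 0),
        # so fall back to a visibly different placeholder.
        return BandStyle(PLACEHOLDER_WIDTH, dashed=True, label="no flow")
    return BandStyle(value / max_value * MAX_WIDTH, dashed=False, label=f"{value:g}")


if __name__ == "__main__":
    wins = {"Brazil": 5, "Team X": 2, "Team Y": 0}  # Team X/Y are hypothetical
    top = max(wins.values())
    for team, n in wins.items():
        print(team, band_style(n, top))
```

The placeholder is a deliberate exception to the proportional-width rule, which is why it should look clearly different from a real band rather than just very thin.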

Also see my two posts on the 2010 World Cup (here and here), which feature a slightly different Sankey diagram.