I got alerted by Google to a blog post by Maruthi Jampani on the Express Analytics blog. Sure, I am always excited to find fresh new Sankey diagrams worth reporting here. But more and more I find distribution diagrams like the one shown in the article ‘Power of Sankey Diagram in Data Visualization’ … and get disappointed. Well, not really. The term ‘Sankey diagram’ has gained a certain popularity over the past years, which is good. With the increasing use of d3.js, Parsets or Fineo, we see more of these distribution diagrams.

Time to talk about distribution diagrams again?

My two posts back in 2009 (‘Infographics Experts on Sankey Diagrams (Part 1)’ and ‘Infographics Experts on Sankey Diagrams (Part 2)’) were based on a good and funny article by Chiqui Esteban at infografistas.blogspot.com. He suggested several names (in Spanish) for this type of diagram and concluded that the best term is ‘distribution diagram’.

The Parsets page explains that they are a “visualization … for categorical data, like census and survey data, inventory, and many other kinds of data that can be summed up in a cross-tabulation. (…) Between the dimension bars are ribbons that connect categories and split up. This shows you how combinations of categories are distributed, and how a particular subset (…) can be further subdivided.”

So we have categories and dimensions. And ribbons that connect them.
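To make the “categories, dimensions and ribbons” idea concrete, here is a minimal sketch in Python. It assumes the records arrive as dictionaries; the field names and the four toy records are made up for illustration and are not taken from Parsets. Two dimensions are cross-tabulated, and each ribbon gets a width proportional to the number of records in its combination of categories.

```python
from collections import Counter


def ribbon_widths(records, dim_a, dim_b, total_height=100.0):
    """Cross-tabulate two categorical dimensions and scale each ribbon.

    Each ribbon connects one category of dim_a to one category of dim_b.
    Its width is proportional to the number of records in that combination,
    so all ribbon widths add up to total_height.
    """
    counts = Counter((r[dim_a], r[dim_b]) for r in records)
    total = sum(counts.values())
    return {pair: total_height * n / total for pair, n in counts.items()}


# Hypothetical toy records, for illustration only.
records = [
    {"class": "first", "survived": "yes"},
    {"class": "first", "survived": "no"},
    {"class": "third", "survived": "no"},
    {"class": "third", "survived": "no"},
]
print(ribbon_widths(records, "class", "survived"))
# {('first', 'yes'): 25.0, ('first', 'no'): 25.0, ('third', 'no'): 50.0}
```

Note that nothing here has a direction or a time order: the same counts could just as well be drawn as one pie chart per category.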

Distribution diagrams have commonalities with Sankey diagrams. In fact, one very central characteristic is that the width of the band is proportional to the quantity it represents. In Sankey diagrams the width of the arrow (!) is proportional to the quantity of the flow represented. So they do qualify as Sankey diagrams, but I would say they should be considered a subset or specific type of Sankey diagrams. As I pointed out in a May 2012 post:

It is exactly the fact that these are not directed flows, but rather quantities that are distributed over categories (or dimensions). There is no time relation in them, neither are there flows “from” (e.g. Finance) “to” (e.g. Reporting) or the other way round. These are bands hooked between nodes rather than arrows leading from one node to another. Each category could be represented by a pie chart as well.

So I do agree that distribution diagrams (or spaghetti diagrams, swim lane diagrams) are a subset of Sankey diagrams. But there is more to Sankey diagrams than that.

I may have to emphasize the genuine Sankey diagrams in the future. Flows in process systems, from one machine to another. Energy input into a boiler, and heat being distributed as steam to other parts of the plant. Streams of people moving between halls at a trade fair. Water being pumped back in loops. Value streams along a supply chain, where each processing step adds to the value of the product. And much more…

Another Sankey diagram of Canada’s energy flows is featured in a blog post titled ‘Dividing the Big Picture: Visualizing Provincial Diversity’. The post appeared May 5, 2014 on the Canadian Energy Systems Analysis Research (CESAR) blog by David B. Layzell, Professor at the University of Calgary. It is a follow-up to a previous CESAR blog post that showed “the big picture” for Canada (featured in a recent post here on the blog).

“The Sankey diagram below shows only the domestic portion of Canada’s energy systems. (…) It also shows how much of that demand is met by oil/petroleum (red), natural gas (blue), electricity (yellow), biomass-derived products (green) or other energy resources.”

Flows are in GJ per capita. This relative unit is different from the one used in the other national energy flow diagrams I have presented here on the blog, but it is useful for comparing energy consumption across the provinces.
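Turning an absolute flow into a per-capita flow is a simple division plus a unit conversion. A minimal sketch, assuming a provincial total in PJ and a population count (the function name and the figures in the usage note are illustrative, not CESAR’s data):

```python
GJ_PER_PJ = 1_000_000  # 1 PJ = 10**15 J = 10**6 GJ


def gj_per_capita(energy_pj: float, population: int) -> float:
    """Convert a total energy flow in petajoules into gigajoules per person."""
    if population <= 0:
        raise ValueError("population must be positive")
    return energy_pj * GJ_PER_PJ / population
```

For a hypothetical province of 4 million people using 200 PJ, `gj_per_capita(200, 4_000_000)` gives 50 GJ per person. Dividing every flow by the same population keeps the band widths proportional, so the diagram’s shape is unchanged within a province; only comparisons between provinces shift.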

The article explains:

“There are significant inter-provincial differences associated with each end-use category. For example, British Columbia (BC) residents had the lowest residential energy use in the nation, at 63% of the per capita energy use in Alberta (52 GJ/capita), the national leader in this category. The balmy BC climate compared to what Albertans face each winter accounts for most of this difference. However, our model also draws on government data showing that many BC buildings tend to be better insulated than those from much colder Alberta.”

Check out all Sankey diagrams tagged ‘Canada’ here.

Only a few hours left until the kick-off of the FIFA World Cup in Brazil … A reader from Germany recently sent me a clipping from the May edition of Germanwings’ inflight magazine (read it online here). The article on pages 36/37 has this Sankey diagram:

Interesting visualization, though not fully in line with the basic rules for Sankey diagrams. The width of the bands represents the number of times the World Cup has been won. The main issue is that only eight of the participating countries have ever won the cup (Brazil, the pentacampeão, has won it five times so far…). For most of the nations shown, the green stream or arrow thus stands for zero wins. Zero (nil), however, is impossible to display in a Sankey diagram if you want to maintain the basic rule of arrows being proportional in width to the quantity they represent.

Several approaches have been proposed for “zero quantity flows”, such as a thin dotted line, a thin line with a “no flow” label, or a colourless line. In the above case the choice of diagram type is, in my opinion, not the most fortunate one. The main message is that all teams are dreaming of getting to Rio’s Maracanã stadium on July 13.
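The proportionality rule and one of the zero-flow conventions mentioned above can be sketched in a few lines. This is a minimal illustration assuming a fixed pixels-per-unit scale; the `BandStyle` type, the dash pattern and the 1-pixel placeholder width are arbitrary choices for the example, not a standard.

```python
from dataclasses import dataclass


@dataclass
class BandStyle:
    width: float  # drawn width in pixels
    dash: str     # SVG stroke-dasharray value, "" for a solid band
    label: str


def band_style(quantity: float, px_per_unit: float, zero_width: float = 1.0) -> BandStyle:
    """Map a flow quantity to a band style, keeping width proportional to quantity.

    A zero flow has no proportional width, so it gets a thin dotted
    placeholder line with a "no flow" label instead of a band.
    """
    if quantity < 0:
        raise ValueError("Sankey flows cannot be negative")
    if quantity == 0:
        return BandStyle(width=zero_width, dash="2,3", label="no flow")
    return BandStyle(width=quantity * px_per_unit, dash="", label=str(quantity))
```

With 4 pixels per win, `band_style(5, 4.0)` gives Brazil’s five titles a 20-pixel band, while a team without a title gets the dotted placeholder. The placeholder visibly breaks the proportionality rule, which is exactly why a chart where most values are zero is an awkward fit for a Sankey diagram.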

Also see my two posts on the 2010 World Cup (here and here), which feature a slightly different Sankey diagram.

Featured on the Canadian Energy Systems Analysis Research (CESAR) blog is the Sankey diagram below on Canada’s Energy Flows in 2010. The article reports on a new model called ‘CanESS’ (Canadian Energy Systems Simulator) developed by Technologies Inc. and the University of Calgary.

Pulling together data from different sources, the tool can visualize energy flows as Sankey diagrams.

The big picture of Canadian energy in 2010 is as follows:

“Canadian primary energy production in 2010 was nearly 25,600 PJ, and after including 3,700 PJ of imports, total primary energy availability was 29,500 PJ. As the Sankey diagram shows, 58% was exported, with the remaining 42% or 12,500 PJ being used domestically, 910 PJ for non-energy applications and 11,652 PJ for the provision of energy end use services to Canadians.”
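The quoted figures can be cross-checked with a few lines of arithmetic. They are rounded in the article, so the balances close only approximately: 25,600 + 3,700, for example, gives 29,300 PJ rather than 29,500 PJ. The sketch below simply restates the article’s numbers and reports each gap as a percentage; the helper name `rel_gap` is made up for this example.

```python
# Figures quoted in the CESAR article (PJ, Canada 2010); all are rounded.
production = 25_600
imports = 3_700
availability = 29_500
domestic = 12_500
non_energy = 910
end_use = 11_652
export_share = 0.58


def rel_gap(a: float, b: float) -> float:
    """Relative difference between two values, as a fraction of the larger one."""
    return abs(a - b) / max(abs(a), abs(b))


checks = {
    "production + imports vs availability": rel_gap(production + imports, availability),
    "42% of availability vs domestic use": rel_gap((1 - export_share) * availability, domestic),
    "non-energy + end use vs domestic use": rel_gap(non_energy + end_use, domestic),
}
for name, gap in checks.items():
    print(f"{name}: {gap:.1%}")
```

All three balances agree to within about 1%, which is what rounding to the nearest hundred PJ and to whole percentages would lead you to expect.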

Read the full article by Ralph Torrie on “the big picture” here.

There is an interactive version that allows you to choose the year (1978-2010), break the data down by province, or change the unit. Try it out!

Blog reader Panalion sent me a photo taken in Amsterdam’s Botanical Garden. It shows a map of coffee and tea flows from producing countries, mainly to Europe and North America. Panalion writes “I thought you might like this Sankey map I found attached to a cable between two palm trees. There were chairs set up to accommodate school classes”.

This map is for didactic purposes and features no absolute figures and no year. In addition to the export flows of coffee and tea shown as arrows, the map also has circles of three different sizes representing the originating country’s share of world production of coffee, tea and cocoa.

Infographers might have better ways of showing this information. But in this case I think it is sufficient to get the message across to the target audience, the school kids.

Following yesterday’s post on a tool that uses d3.js and its Sankey library to create Sankey diagrams online, I set out to catch up on developments around d3.js/Sankey.

I found this discussion on color gradients in Sankey diagrams on Stack Overflow quite interesting. User AmeliaBR has created this example:

AmeliaBR points out that this solution “will only work because the paths are almost straight lines, so a linear gradient will look half-decent — setting a path stroke to a gradient does not make the gradient curve with the path!”.
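The limitation AmeliaBR describes is easy to see in the SVG itself. The sketch below generates the markup for one gradient-stroked link from Python; it is not AmeliaBR’s code, and the function name, link id and colors are made up. With `gradientUnits="userSpaceOnUse"` the gradient is pinned to the link’s start and end x coordinates, but it remains a straight horizontal ramp that does not bend with the curve.

```python
def gradient_link_svg(link_id: str, x0: float, y0: float, x1: float, y1: float,
                      width: float, color_from: str, color_to: str) -> str:
    """Return SVG markup for one Sankey-style link stroked with a left-to-right gradient.

    The gradient runs horizontally from x0 to x1 in user space. It does not
    follow the curved path, so it only looks right on nearly straight links.
    """
    xm = (x0 + x1) / 2  # control points at mid-x give the usual S-shaped link
    grad_id = f"grad-{link_id}"
    return (
        f'<defs><linearGradient id="{grad_id}" gradientUnits="userSpaceOnUse" '
        f'x1="{x0}" y1="0" x2="{x1}" y2="0">'
        f'<stop offset="0%" stop-color="{color_from}"/>'
        f'<stop offset="100%" stop-color="{color_to}"/>'
        f'</linearGradient></defs>'
        f'<path d="M{x0},{y0} C{xm},{y0} {xm},{y1} {x1},{y1}" fill="none" '
        f'stroke="url(#{grad_id})" stroke-width="{width}" stroke-opacity="0.6"/>'
    )


print(gradient_link_svg("a-b", 100, 50, 400, 120, 30, "#1f77b4", "#ff7f0e"))
```

User-space units are used here because the default `objectBoundingBox` units are relative to the element’s bounding box, which collapses to zero height for a perfectly horizontal link.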

I remembered that e!Sankey had color gradients too and checked how the software handles this. Here are two different versions I quickly put together:

The first one shows the nodes (starting point/destination point of an arrow) and confirmed my guess that the to/from colors of the gradient must be based on the fill color of the nodes. In fact, you have the option to set the “Gradient from Source” and “Gradient to Destination” flags separately.

When dragging the nodes around, the arrow colors show some artifacts, and the image refreshes once the drag is finished. I then made an improved version of the diagram by hiding the nodes (leaving only node B visible for comparison) and setting a negative undercut at the node. This moves the spear-shaped arrowhead closer to the foot of the following arrow, which gives a nice effect, I think.
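The behaviour of the two flags can be summarized as a small decision rule. This is a hypothetical sketch that only mirrors the two option names; it is not e!Sankey’s internal logic, and the function and parameter names are made up. Each end of the gradient takes the adjacent node’s fill color if its flag is set, and otherwise keeps the arrow’s own color.

```python
from typing import Tuple


def gradient_stops(arrow_color: str, source_fill: str, target_fill: str,
                   gradient_from_source: bool,
                   gradient_to_destination: bool) -> Tuple[str, str]:
    """Pick the start and end colors of an arrow's gradient.

    With both flags off, start and end are the arrow's own color,
    i.e. the arrow is drawn in a plain, solid color.
    """
    start = source_fill if gradient_from_source else arrow_color
    end = target_fill if gradient_to_destination else arrow_color
    return start, end
```

For example, `gradient_stops("#888888", "#1f77b4", "#2ca02c", True, False)` fades from the source node’s blue into the arrow’s grey, while setting both flags blends from one node color to the other.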

Finally, I also learned it is probably wiser to limit oneself to a range of harmonizing colors rather than using too many different colors (as I did).

Steve Bogart has released a website for autoMATICally creating simple horizontal distribution diagrams. No need to install a tool: just go to sankeymatic.com and enter your values. On each line, define the source node, the quantity in square brackets and the destination node (e.g. “Budget [450] Housing” or “Budget [300] Food”). Columns and bands will be created automatically.
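The one-flow-per-line input format is simple enough to parse with a regular expression. The sketch below is a simplified illustration of that format, not SankeyMATIC’s actual parser (which is written in JavaScript and supports more options); the regex and function name are my own assumptions.

```python
import re
from typing import List, Tuple

# One flow per line: source name, amount in square brackets, target name.
LINE_RE = re.compile(r"^\s*(.+?)\s*\[\s*([0-9]*\.?[0-9]+)\s*\]\s*(.+?)\s*$")


def parse_flows(text: str) -> List[Tuple[str, float, str]]:
    """Parse lines such as 'Budget [450] Housing' into (source, amount, target) tuples.

    Blank lines are skipped; any other line that does not match raises ValueError.
    """
    flows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: cannot parse {line!r}")
        source, amount, target = m.groups()
        flows.append((source, float(amount), target))
    return flows


print(parse_flows("Budget [450] Housing\nBudget [300] Food"))
# [('Budget', 450.0, 'Housing'), ('Budget', 300.0, 'Food')]
```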

A number of options can be set, such as colors, spacing and labels. Finally, when you have created your diagram, you can download it directly (three sizes/resolutions available).

This simple online tool is based on the open-source JavaScript library D3.js and its Sankey library.
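How do the columns get “created automatically”? A Sankey layout has to place each node in a column and give it a height. The sketch below shows one common way to do this, roughly in the spirit of the d3 Sankey layout but simplified and not its actual code: a node’s column is the length of the longest path leading to it from a pure source node, and its value (which sets its bar height) is the larger of its total inflow and total outflow. The example flows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Flow = Tuple[str, float, str]  # (source, value, target)


def layout_columns(flows: List[Flow]) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Assign each node a column index and a value (assumes the flow graph has no cycles).

    Column = length of the longest path from a node with no inflows.
    Value  = max(total inflow, total outflow), which sets the node's bar height.
    """
    inflow: Dict[str, float] = defaultdict(float)
    outflow: Dict[str, float] = defaultdict(float)
    targets_of: Dict[str, List[str]] = defaultdict(list)
    for s, v, t in flows:
        outflow[s] += v
        inflow[t] += v
        targets_of[s].append(t)
    nodes = set(inflow) | set(outflow)
    column = {n: 0 for n in nodes}
    # Push each target at least one column to the right of its source.
    # For an acyclic graph this settles within len(nodes) passes.
    for _ in range(len(nodes)):
        changed = False
        for s, targets in targets_of.items():
            for t in targets:
                if column[t] < column[s] + 1:
                    column[t] = column[s] + 1
                    changed = True
        if not changed:
            break
    value = {n: max(inflow[n], outflow[n]) for n in nodes}
    return column, value


# Hypothetical household budget, for illustration only.
flows = [("Wages", 1000, "Budget"), ("Budget", 450, "Housing"),
         ("Budget", 300, "Food"), ("Food", 100, "Restaurants")]
column, value = layout_columns(flows)
```

Here “Budget” lands in column 1 with a value of 1000 (its inflow exceeds the 750 flowing out), and “Restaurants” ends up in column 3, one step to the right of “Food”.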

Try it out yourself!

I have added SankeyMATIC to the list of software tools for Sankey diagrams (I am seriously thinking about creating a separate group for d3.js-based products).

An energy flow chart for energy use in the residential building sector is shown on the Autodesk Sustainability Workshop page ‘Measuring Building Energy Use’. There is also a similar Sankey diagram of energy consumption by source in the commercial building sector.

Both are taken from a 2006 Pacific Northwest National Laboratory (PNNL) report prepared for the Department of Energy (DOE), titled ‘Energy End-Use Flow Maps for the Buildings Sector’, by D.B. Belzer (PNNL-16263).

Residential building sector energy flow chart:

Commercial building sector energy flow chart:

Both Sankey diagrams are built up the same way. The top part of each diagram shows electricity generation, the bottom part the energy flows for heating. Significant conversion and transmission losses can be identified by the arrow branching out at the top. Flows from the left represent the energy sources: coal (brown), natural gas (blue), biomass/solar (green). To the right, the flows are broken down by end use, such as heating, cooling, lighting, other electric appliances, etc.

All values are in quadrillion BTU (quads) for the U.S. in 2004.
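To compare these U.S. charts with the Canadian figures given in PJ earlier on this page, quads have to be converted to SI units. A quad is 10^15 BTU, and one International Table BTU is 1055.05585262 J, so one quad is roughly 1,055 PJ (about 1.055 EJ). A minimal conversion helper:

```python
# 1 quad = 10**15 BTU; 1 BTU (International Table) = 1055.05585262 J.
J_PER_BTU = 1055.05585262
BTU_PER_QUAD = 1e15
J_PER_PJ = 1e15


def quads_to_pj(quads: float) -> float:
    """Convert quadrillion BTU (quads) to petajoules."""
    return quads * BTU_PER_QUAD * J_PER_BTU / J_PER_PJ
```

Because 1 quad is so close to 1 EJ, a rough mental conversion (multiply by 1.055) is usually good enough when reading these diagrams side by side.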