Category: Methodology

US Trade Flows, yet another option

After showing two variants for visualizing the U.S. trade balance in my last post, I became aware of yet another option. The first figure (infographic by Spiegel Online) used the length of the arrows to express the value of imported and exported goods. My remake used the magnitude (width) of the arrows, as is typical for Sankey diagrams.

In this figure (by Anthony Cohen, University of Illinois, 2012 / Wikimedia Commons) of US trade in 2011, the arrows for imports (red) and exports (green) are proportional to the total value of goods, just as we are used to seeing in a Sankey diagram. But the arrows are superimposed, with the narrower green export arrow on top of the wider red import arrow. This creates a different, somewhat more dramatic impression.

The data shown is for 2011, in billion USD, for the 15 most important trade partners. Arrows are not labeled with absolute figures; instead, a legend at the bottom shows the widths of five reference arrows. The arrow to and from Mexico is a problem (no joke intended!), but the legend clarifies that the arrows don’t indicate a specific geographic routing.
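The superimposed arrows can be described with a little geometry. This minimal sketch assumes both arrows share one center line and one scale (the function name and the example values are hypothetical, not read from the Cohen figure): the red import arrow then peeks out on each side of the green export arrow by half the difference in width, so the total exposed red strip is proportional to the trade deficit.

```python
def overlay_widths(imports_bn, exports_bn, px_per_bn):
    """Widths for a narrower export arrow drawn on top of a wider import arrow.

    Assumes both arrows use the same scale and share one center line.
    Returns (import_px, export_px, exposed_red_px_per_side).
    """
    import_px = imports_bn * px_per_bn
    export_px = exports_bn * px_per_bn
    # The red import arrow remains visible on both sides of the green arrow.
    # A negative result means exports exceed imports (green is the wider arrow).
    exposed_per_side = (import_px - export_px) / 2
    return import_px, export_px, exposed_per_side

# Hypothetical example: 400 bn USD imports, 250 bn USD exports, 0.5 px per bn USD
print(overlay_widths(400, 250, 0.5))  # (200.0, 125.0, 37.5)
```

The two visible red strips together (2 × 37.5 px) correspond to the 150 bn USD deficit at the same scale. For a partner with which the US runs a surplus, the green arrow would be the wider one and would hide the red arrow completely unless the drawing order were swapped.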

US Trade Balance, two versions

When German Chancellor Angela Merkel meets with POTUS today, one topic that’s most likely going to be addressed is the trade deficit between the United States and the EU, Germany in particular.

Der Spiegel, a major German news outlet, has illustrated recent articles on this subject with the figure below. It shows the volume of trade between the United States and ‘selected countries’ (China, Canada, Mexico and the EU) in 2015. The figures give the value of goods exported from the U.S. to these countries (green arrows) and imported from them into the U.S. (blue arrows), in billion US$.


Source: Spiegel Online

The interesting thing about this infographic is that the length of the arrows represents the value of goods traded. For example, the green arrow for U.S. exports to Europe (274 bn US$ in 2015) is roughly two thirds the length (274/431 ≈ 0.64) of the blue incoming arrow (431 bn US$ in 2015). This works well, the only exception being the green arrow for exports to Mexico.
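Length encoding boils down to one linear scale shared by all arrows. A minimal sketch, assuming the longest arrow is given a fixed (hypothetical) maximum of 200 px and every other arrow is scaled proportionally, using the two EU values quoted from the Spiegel figure:

```python
def arrow_lengths(values, max_length_px):
    """Scale arrow lengths linearly so the largest value gets max_length_px."""
    largest = max(values.values())
    return {name: max_length_px * v / largest for name, v in values.items()}

# 2015 US-EU trade values from the Spiegel figure, in bn US$
eu = {"exports to EU": 274, "imports from EU": 431}
lengths = arrow_lengths(eu, 200)
for name, length in lengths.items():
    print(f"{name}: {length:.0f} px")
# exports to EU: 127 px
# imports from EU: 200 px
```

Because both arrows share the same scale, their length ratio equals the value ratio, which is what makes the length-based variant readable in the first place.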

This infographic of course invited a remake as a Sankey diagram. As you all know, in Sankey diagrams the width of the arrows represents the quantity.

I did two or three different versions, all very similar to the original infographic in style and color, even using the map icon of the lower 48 states (sorry, Alaska and Hawaii). At first I was not sure whether the separate arrows for Germany showed values already included in the EU trade volume, or whether they were meant to be added on top of it. A quick look at the original data revealed that they are indeed already included in the EU figures. I therefore decided to highlight the German share in the Sankey diagram with a slightly brighter color, but to keep it stacked within the EU arrows.
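Since the German values are part of the EU total, the EU arrow is split into two stacked bands rather than getting an extra arrow on top. A minimal sketch of that width calculation, assuming a hypothetical scale of 0.25 px per bn US$ and a HYPOTHETICAL German share of 100 bn US$ (placeholder only; the actual German figure is not given in this post):

```python
def stacked_bands(total_bn, part_bn, px_per_bn):
    """Split one arrow of width total_bn * px_per_bn into two stacked bands.

    The part (e.g. Germany) is already included in the total (e.g. the EU),
    so the bands are `part` and `total - part`, not `part` on top of `total`.
    """
    if not 0 <= part_bn <= total_bn:
        raise ValueError("part must lie within the total")
    part_px = part_bn * px_per_bn
    rest_px = (total_bn - part_bn) * px_per_bn
    return {"part": part_px, "rest": rest_px, "total": part_px + rest_px}

# EU imports into the US (431 bn US$, 2015) with a HYPOTHETICAL German share of 100 bn
bands = stacked_bands(431, 100, 0.25)
print(bands)  # {'part': 25.0, 'rest': 82.75, 'total': 107.75}
```

The overall arrow width stays exactly what the EU total alone would give; only the brighter band inside it changes.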

Here is my Sankey diagram version of the Spiegel infographic.

Not sure which version I prefer, but using the length instead of the width of the arrows to represent the flow quantity is definitely a unique approach. Worth sharing with you, I think.

River Flow Volume and Temperature

A nice idea for the use of Sankey diagrams can be found on this web page of the U.S. Army Corps of Engineers (USACE) in the Portland, Oregon area.

The diagram shows the flow of the Rogue River and its tributary streams. Because the river flows from east to west, this is one of the rare examples of a right-to-left oriented Sankey diagram.

The water volume is represented by the width of the arrow in each segment. Flows are presumably in cubic feet per second (cfs). At some points along the river, the volume seems to increase much more than the incoming tributary contributes (e.g. where Bear Creek joins).
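The Bear Creek observation is essentially a mass balance check: at a confluence, the downstream flow should be roughly the upstream flow plus the tributary inflow. A minimal sketch of such a check, assuming hypothetical function names, a 5 % tolerance for ungauged inflow and measurement error, and made-up flow values (not read from the USACE page):

```python
def check_junctions(segments, tributaries, tolerance=0.05):
    """Flag junctions where the downstream gain exceeds the tributary inflow.

    segments: main-stem flows (cfs), ordered upstream to downstream.
    tributaries: inflows (cfs); tributaries[i] joins between
                 segments[i] and segments[i + 1].
    tolerance: relative slack for ungauged inflow and measurement error.
    """
    if len(segments) != len(tributaries) + 1:
        raise ValueError("need exactly one more main-stem segment than tributaries")
    flagged = []
    for i, inflow in enumerate(tributaries):
        upstream, downstream = segments[i], segments[i + 1]
        expected = upstream + inflow
        if downstream > expected * (1 + tolerance):
            flagged.append((i, upstream, inflow, downstream))
    return flagged

# HYPOTHETICAL flows in cfs, for illustration only
main_stem = [1000, 1250, 1600]
inflows = [240, 150]
print(check_junctions(main_stem, inflows))  # [(1, 1250, 150, 1600)]
```

A flagged junction does not necessarily mean the diagram is wrong; it may simply point to inflows (groundwater, small creeks, reservoir releases) that are not drawn as separate arrows.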

As an additional layer of information, the color of the Sankey arrows indicates the trailing 7-day average temperature. The temperature color codes are shown below.
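A color layer like this takes two steps: compute a trailing 7-day average for each segment, then map the average onto a color class. A minimal sketch, assuming HYPOTHETICAL class limits and color names (the actual USACE legend values are not reproduced here) and using Python's `bisect` module for the class lookup:

```python
from bisect import bisect_right

def trailing_mean(values, window=7):
    """Trailing moving average; entry i averages values[i-window+1 .. i].

    The first window-1 days have no full window and return None.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

# HYPOTHETICAL temperature class limits (deg F) and colors, not the USACE legend
LIMITS = [55, 60, 65]
COLORS = ["blue", "green", "yellow", "red"]

def color_for(temp):
    """Map a temperature to a class: <55 blue, [55, 60) green, [60, 65) yellow, >=65 red."""
    return COLORS[bisect_right(LIMITS, temp)]

daily = [52, 54, 56, 58, 60, 62, 64, 66]
avg = trailing_mean(daily)
print([color_for(t) if t is not None else None for t in avg])
# [None, None, None, None, None, None, 'green', 'yellow']
```

Using a trailing (rather than centered) window means the color for a given day only depends on past measurements, which suits a diagram that is updated daily.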

Graedel REE wheel Sankey remake

In this post on rare earths I recently featured an alluvial diagram depicting rare earth use, taken from a presentation by T.E. Graedel (Yale). That same presentation also led me to an article by X. Du & T.E. Graedel titled ‘Uncovering the Global Life Cycles of the Rare Earth Elements’ (open access), which contains a number of circular flow diagrams I would call “REE wheels”.

The article describes how quantitative data on rare earths is available for mining and processing, but “very little quantitative information is available concerning the subsequent life cycle stages”. Also, data is mostly available for overall REE production, not for each individual rare earth element. The authors therefore estimate the quantities for ten individual REEs, based on sources from China and Japan.

Here is the REE wheel for Yttrium (element Y) from the article:

The diagram is read clockwise from 7 o’clock to 5 o’clock. The processing steps are “Mi” (mining), “S” (separation), “F” (fabrication), “Ma” (manufacturing), “U” (use) and “W” (waste management), showing the flow of the rare earth element through the economic cycle.

I made a Sankey diagram version of the above Yttrium REE wheel so that the arrow width represents the quantities. Flows are in gigagrams (thousand metric tons) per year.

Because the arrows connect to the nodes horizontally and vertically (rather than diagonally, as in the original), my remake looks somehow less “circular”… in fact it more closely resembles one of those retro indoor AM/FM loop antennas you would hook up to your hi-fi. So I am not fully satisfied with the outcome. Would it be better if the nodes were tilted by 45°?
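How much each node would need to be tilted depends on where it sits on the wheel. A minimal sketch, assuming the six stages are spaced evenly along the clockwise arc from 7 o’clock to 5 o’clock (an assumption; the original wheel may space them differently): each node’s position is computed on a circle, together with the direction of the clockwise tangent there. A node bar drawn perpendicular to that direction (i.e. radially) would let arrows enter and leave tangentially, like in the original.

```python
import math

STAGES = ["Mi", "S", "F", "Ma", "U", "W"]

def wheel_layout(stages, radius=1.0, start_hour=7, end_hour=5):
    """Place stages evenly on a clock face, clockwise from start_hour to end_hour.

    Returns (name, x, y, flow_angle_deg) per stage. x/y use a y-up system
    centred on the wheel; flow_angle_deg is the clockwise tangent direction,
    measured counter-clockwise from the +x axis.
    """
    span_hours = (end_hour - start_hour) % 12  # 7 -> 5 o'clock spans 10 hours
    step = span_hours / (len(stages) - 1)
    layout = []
    for k, name in enumerate(stages):
        clock_deg = (start_hour + k * step) * 30  # 30 deg per hour, from 12 o'clock
        theta = math.radians(clock_deg)
        x, y = radius * math.sin(theta), radius * math.cos(theta)
        # Moving clockwise, the tangent direction is (cos t, -sin t), i.e. angle -t.
        flow_angle = (-clock_deg) % 360
        layout.append((name, round(x, 3), round(y, 3), round(flow_angle, 1)))
    return layout

for row in wheel_layout(STAGES):
    print(row)
```

With even spacing, the flow directions come out as 150°, 90°, 30°, 330°, 270° and 210°, so under this assumption the nodes would be tilted in 30° steps rather than 45°; a 45° tilt corresponds to a node placed exactly halfway between two clock axes (e.g. at 1:30).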

What’s nice is that the extraction of ore (17.4 Gg) can be compared directly to the 2.9 Gg of Yttrium released to the environment. I swapped the ore input and the tailings output at the mining node so that they sit side by side.

Comments and suggestions for improvement are welcome.

FIFA accounts – my version

In this post I criticized a Sankey diagram depicting FIFA’s accounts, published by BBC News. By drawing the operating profits out of proportion, it overemphasized certain arrows.

Here is my version of the diagram, based on the values given in the article by Paul Sargeant (no warranty for the accuracy of these numbers). The orange arrow represents the operating profits, this time drawn at the same scale as all the other arrows.
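“Same scale” can be checked numerically: divide each arrow’s drawn width by the value it represents, and all the resulting pixels-per-unit factors should be (nearly) equal. A minimal sketch with hypothetical function names, a 2 % tolerance, and made-up values and widths (not FIFA’s actual figures):

```python
def scale_factors(values, widths_px):
    """Pixels per unit of value for each arrow."""
    return {name: widths_px[name] / values[name] for name in values}

def is_consistent(values, widths_px, rel_tol=0.02):
    """True if all arrows use the same scale (within rel_tol), as in a Sankey diagram."""
    factors = list(scale_factors(values, widths_px).values())
    smallest, largest = min(factors), max(factors)
    return (largest - smallest) / largest <= rel_tol

# HYPOTHETICAL values (million USD) and arrow widths (px), not FIFA's figures
values = {"revenue": 100, "costs": 80, "operating profit": 20}
exaggerated = {"revenue": 50, "costs": 40, "operating profit": 30}
faithful = {"revenue": 50, "costs": 40, "operating profit": 10}
print(is_consistent(values, exaggerated), is_consistent(values, faithful))  # False True
```

In the exaggerated variant the profit arrow is drawn at 1.5 px per unit while everything else uses 0.5 px per unit, so it appears three times as large as it should.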

Compare for yourself the impression the two diagrams make on you… and let me know by leaving a comment.

Integration of Sankey diagrams, e!Sankey

I found out via the news feed of ifu Hamburg, maker of e!Sankey, that they have released an SDK based on e!Sankey that allows software makers to integrate Sankey visualizations into their applications.

Two main features help to achieve this: (1) building Sankey diagrams from an XML file that contains structural and layout information, and (2) feeding values into a Sankey diagram template by reading ID/value pairs from a CSV file.
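The second feature (a template plus ID/value pairs) is a common pattern in general. The sketch below illustrates the idea with Python’s standard `xml.etree.ElementTree` and `csv` modules. The XML element and attribute names are HYPOTHETICAL and chosen for illustration only; they are not e!Sankey’s actual file format or SDK API.

```python
import csv
import io
import xml.etree.ElementTree as ET

# HYPOTHETICAL template format for illustration only (not e!Sankey's schema)
TEMPLATE = """<diagram>
  <flow id="f1" from="fuel" to="boiler" value="0"/>
  <flow id="f2" from="boiler" to="steam" value="0"/>
</diagram>"""

def fill_template(template_xml, csv_text):
    """Set the value attribute of each <flow> whose id appears in the CSV.

    Returns the updated XML root and a list of flow ids with no CSV value.
    """
    root = ET.fromstring(template_xml)
    values = {row["id"]: row["value"] for row in csv.DictReader(io.StringIO(csv_text))}
    missing = []
    for flow in root.iter("flow"):
        fid = flow.get("id")
        if fid in values:
            flow.set("value", values[fid])
        else:
            missing.append(fid)
    return root, missing

root, missing = fill_template(TEMPLATE, "id,value\nf1,120\nf2,95\n")
print([(f.get("id"), f.get("value")) for f in root.iter("flow")], missing)
# [('f1', '120'), ('f2', '95')] []
```

Separating structure (the XML template) from data (the CSV) means the layout only has to be designed once, while the values can be refreshed automatically from another application.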

Where do Sankey diagrams work? Data Revelations

Interesting blog post by Steve Wexler of Data Revelations. Long article, long title: “Circles, Labels, Colors, Legends, and Sankey Diagrams – Ask These Three Questions”.

The really interesting part for the Sankey diagram aficionados is Steve’s advice on when to use Sankey diagrams, and when you should avoid using them.

Steve illustrates his point with the example below by ‘Music Major – Data Miner’ Jeffrey A. Shaffer (original post is here).

A combination of a stacked bar chart with a distribution diagram, nicely decorated with a trumpet … “Within this context, this very creative chart works”, Steve writes.

He then goes on to show another one by Shaffer, also a distribution diagram: the pie chart data from an energy bill was redesigned as a distribution diagram (two stacked bars with bands linking them).

In this case, Steve concludes, the choice of a Sankey diagram is perhaps not that wise, since the most important information (44% of the energy cost is for heating) doesn’t come across quickly and clearly. A bar chart might work better here. Sankey diagrams can trigger a “cool!” or a “crap!” response, depending on the context. See the original Shaffer post here.

Adding my two cents from a technical perspective, I would say that both diagrams have a shortcoming: the bands don’t maintain their width as they cross over the others diagonally. This is somewhat acceptable in the trumpet diagram, since the bar on the right listing the composers is taller than the one at the trumpet bell (sound spreading out). It is not acceptable in the second diagram, where the two stacked bars have the same height. This is obviously an error in the curve radius calculation (read ‘The Math Behind those Curves’).
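One plausible way this narrowing comes about (an assumption about how such bands are often drawn, not a diagnosis of Shaffer’s actual tool): if the lower edge of a band is simply the upper edge shifted down by a constant vertical amount h, the band’s true width, measured perpendicular to its direction, is only h·cos(angle) on a diagonal stretch. The sketch below computes that effect for a straight section with horizontal run dx and vertical rise dy, and the vertical offset that would be needed to keep the intended width.

```python
import math

def perpendicular_width(vertical_thickness, dx, dy):
    """Perpendicular width of a band whose edges are the same path shifted vertically.

    On a straight section with run dx and rise dy, a purely vertical offset h
    yields h * cos(angle) = h * dx / hypot(dx, dy).
    """
    return vertical_thickness * dx / math.hypot(dx, dy)

def vertical_offset_for(width, dx, dy):
    """Vertical offset needed so the band keeps its intended perpendicular width."""
    return width * math.hypot(dx, dy) / dx

# A band meant to be 10 units wide, crossing diagonally with dx = 3, dy = 4
print(perpendicular_width(10, 3, 4))  # 6.0, i.e. the band shrinks to 60 %
print(round(vertical_offset_for(10, 3, 4), 2))  # 16.67
```

Offsetting the edges along the curve’s normal instead (i.e. using concentric arcs with inner and outer radii differing by the band width in the curved parts) keeps the width constant along the whole band.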

Yet another Distribution Diagram

Google alerted me to a blog post by Maruthi Jampani on the Express Analytics blog. Sure, I am always excited to find fresh new Sankey diagrams worth reporting here. But more and more often I find distribution diagrams like the one shown in the article ‘Power of Sankey Diagram in Data Visualization’… and get disappointed. Well, not really. The term ‘Sankey diagram’ has gained a certain popularity over the past years, which is good. With the increasing use of d3.js, Parsets or Fineo, we see more of these distribution diagrams.

Time to talk about distribution diagrams again?

My two posts back in 2009 (‘Infographics Experts on Sankey Diagrams (Part 1)’ and ‘Infographics Experts on Sankey Diagrams (Part 2)’) were based on a good and funny article by Chiqui Esteban at infografistas.blogspot.com. He suggested several names (in Spanish) for this type of diagram and concluded that the best term is distribution diagram.

The Parsets page explains that they are a “visualization … for categorical data, like census and survey data, inventory, and many other kinds of data that can be summed up in a cross-tabulation. (…) Between the dimension bars are ribbons that connect categories and split up. This shows you how combinations of categories are distributed, and how a particular subset (…) can be further subdivided.”

So we have categories and dimensions. And ribbons that connect them.
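The cross-tabulation behind such a diagram is easy to sketch. The example below uses Python’s `collections.Counter` on made-up survey records (hypothetical data; this is not how Parsets itself is implemented): each pair of categories from two dimensions becomes one ribbon whose width is proportional to its count, and the per-category totals give the bar segments.

```python
from collections import Counter

def ribbons(records, dim_a, dim_b):
    """Cross-tabulate two categorical dimensions.

    Each (category_a, category_b) count becomes one ribbon whose width is
    proportional to the count; the per-category totals are the bar segments.
    """
    pairs = Counter((r[dim_a], r[dim_b]) for r in records)
    totals_a = Counter(r[dim_a] for r in records)
    totals_b = Counter(r[dim_b] for r in records)
    return pairs, totals_a, totals_b

# HYPOTHETICAL survey records
records = [
    {"sex": "female", "class": "first"},
    {"sex": "female", "class": "third"},
    {"sex": "male", "class": "third"},
    {"sex": "male", "class": "third"},
    {"sex": "male", "class": "first"},
]
pairs, by_sex, by_class = ribbons(records, "sex", "class")
print(dict(pairs))
# {('female', 'first'): 1, ('female', 'third'): 1, ('male', 'third'): 2, ('male', 'first'): 1}
```

Note that nothing in the counts has a direction: swapping `dim_a` and `dim_b` yields the same ribbons with the pairs reversed. This is the difference from a flow “from” one node “to” another.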

Distribution diagrams have commonalities with Sankey diagrams. In fact, they share one central characteristic: the width of a band is proportional to the quantity it represents, just as in Sankey diagrams the width of the arrow (!) is proportional to the quantity of the flow represented. So they do qualify as Sankey diagrams, but I would say they should be considered a subset or specific type of Sankey diagram. As I pointed out in a May 2012 post:

It is exactly the fact that these are not directed flows, but rather quantities that are distributed over categories (or dimensions). There is no time relation in them, neither are there flows “from” (e.g. Finance) “to” (e.g. Reporting) or the other way round. These are bands hooked between nodes rather than arrows leading from one node to another. Each category could be represented by a pie chart as well.

So I do agree that distribution diagrams (or spaghetti diagrams, swim lane diagrams) are a subset of Sankey diagrams. But Sankey diagrams are more than that.

I may have to put more emphasis on genuine Sankey diagrams in the future. Flows in process systems, from one machine to another. Energy input into a boiler, and heat being distributed as steam to other parts of the plant. Streams of people moving between halls at a trade fair. Water being pumped back in loops. Value streams along a supply chain, where each processing step adds to the value of the product. And much more…