## Sankey Charts in Tableau

Found an interesting blog post over at TabVizExplorer, a casual blog maintained by Mithun Desai. Data visualization is one focus of his work.

He uses Tableau to draw Sankey charts (I prefer to call them relationship diagrams, alluvial diagrams or even Spaghetti diagrams). Here is a rather simple one, showing the relation between top 20 cricket players and their country of origin.

The diagram has two data categories: the country of origin, shown in the left stacked column in no particular order, and the top 20 players, ordered according to their ICC ranking score.
In between are the streams or bands (or ‘Spaghettis’ for the sake of it) color coded by country of origin.

Now, it is not up to me to criticize the choice of diagram type for conveying this specific information. The author seems to have chosen the cricket topic just as a sample, to explain how to do Sankey charts in Tableau in general. Actually, the colored list of the top 20 (right column) already tells us all we need to know, and you wouldn’t even need the left column and the streams.

The main reason I am not happy with this diagram is the fact that it does not stick to the most important characteristic of a Sankey diagram. The post itself comes with the definition: “Sankey diagrams are specific type of flow diagram in which the width of the arrows is shown proportionally to the flow quantity.”

So, what is the flow quantity here? I was thinking of the net worth in \$\$\$ of each player, or at least a translation of the ranking score to the width of the bands. But then Babar Azam, who ranked 4th with a score of 846, wouldn’t be shown with a band narrower than that of E.J.G. Morgan, who comes in 20th with a score of 650. My guess is that the widths of the streams were chosen arbitrarily…

Where the bands merge, they overlap rather than merge to show the sum of the flow quantities. This makes for a very odd visual effect, at least in terms of Sankey diagrams.

The blog article gives away some of the math behind the curves, so-called sigmoid curves, which is interesting.

This capture taken from the embedded Tableau graph shows how the curves are made up and how the width is maintained along the routing of each curve: you do it with cricket balls 😉 … or Christmas bulbs.
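
For the curious, here is a minimal Python sketch of how such a sigmoid routing could be computed. It is not taken from the Tableau workbook: the function name `sigmoid_band`, the parameter range of ±6 and the number of points are my own assumptions. Each returned point could serve as the center of one of the “balls” that make up a band.

```python
import math


def sigmoid_band(y_start, y_end, n_points=49, t_range=6.0):
    """Return (x, y) points of an S-shaped curve running from x=0 to x=1.

    Assumption: the curve follows the logistic function 1 / (1 + e^-t),
    sampled at evenly spaced t values between -t_range and +t_range.
    """
    points = []
    for i in range(n_points):
        x = i / (n_points - 1)                  # horizontal position, 0..1
        t = -t_range + 2.0 * t_range * x        # sigmoid parameter
        s = 1.0 / (1.0 + math.exp(-t))          # logistic value, 0 < s < 1
        y = y_start + (y_end - y_start) * s     # blend between the two heights
        points.append((x, y))
    return points


# Hypothetical band from height 1.0 (left column) to height 4.0 (right column)
curve = sigmoid_band(y_start=1.0, y_end=4.0)
```

Since the logistic function only approaches 0 and 1, the curve starts and ends very slightly off the target heights. A larger `t_range` narrows that gap but makes the middle part steeper.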

Other implementations of relationship diagrams use Bézier curves (which sometimes come with another downside, read here). But that’s for another time…

## Handling Different Scales in one Diagram

Those of you who have already created Sankey diagrams might have come across the issue: as long as the flow data you are about to visualize is more or less in the same value range, everything is fine, and there should be no problem in coming up with a nice Sankey diagram. However, sometimes we have very small flow quantities, while at the same time there are some large flows dominating the picture.

Sticking to the “golden rule” of Sankey diagrams (i.e. the width of the Sankey arrow corresponds to the flow quantity represented) and ensuring the proportionality of flows in relation to each other becomes very difficult. If you opt to show the larger flows at “normal” width, the smaller flows become difficult to perceive and are shown as hairlines (sometimes even invisible on a screen or in print). If, on the other hand, you decide to push up the scaling factor so that these smaller flow quantities can be seen in the diagram, then the large flows are really fat and spoil your diagram.

This seems to be an irresolvable issue… Nevertheless, there are some approaches to tackle it. Most of them resort to exempting the tiny flows or the very large flows from the scale used in the Sankey diagram. You may opt to use a minimum width (e.g. 1 or 2 pixels) for arrows that carry only a small flow quantity, or you may decide to set an upper flow threshold, corresponding to a maximum width for the Sankey arrow, independent of the actual flow quantity (beyond the threshold value). In both cases I would strongly recommend noting this decision in the diagram (e.g. in a footnote), since otherwise the person looking at the Sankey diagram will get a wrong idea of the quantities/proportions.
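
The minimum/maximum width approach can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function `arrow_width`, the flow values, the scale factor and the pixel limits are all made up for the example. The point is that the function also reports which arrows are no longer to scale, so that they can be flagged in a footnote.

```python
def arrow_width(quantity, scale, min_width=1.0, max_width=30.0):
    """Return (width_px, to_scale) for one flow.

    width_px is the proportional width, clamped to [min_width, max_width].
    to_scale is False whenever the clamping changed the width.
    """
    raw = quantity * scale
    clamped = min(max(raw, min_width), max_width)
    return clamped, clamped == raw


# Hypothetical flows (same unit) and a hypothetical scale in pixels per unit
flows = {"large": 6780.0, "medium": 900.0, "tiny": 0.66}
scale = 0.005

widths = {name: arrow_width(q, scale) for name, q in flows.items()}

# Arrows that need a "not to scale" note in the diagram
not_to_scale = [name for name, (_, ok) in widths.items() if not ok]
```

With these numbers the large flow is capped at 30 pixels and the tiny flow is lifted to 1 pixel, so both end up in `not_to_scale`, while the medium flow keeps its proportional width.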

The Sankey diagram from the PROSUM report I recently featured in this post has another, quite unique solution. Here is a zoomed cropped section:

The metals in the end-of-life vehicle (ELV) stream of 8 million tons (in 2016) are mainly aluminium, copper and iron. This stream is on the same scale as the overall Sankey diagram (see full diagram here). However, the other metals in the stream (such as gold, silver or platinum) are contained in much smaller amounts. The authors of the Sankey diagram hence opted to emphasize them by switching to another scale (1:5,000). As a result, the arrow representing the flow of approximately 660 tons of critical raw materials (CRMs) is almost as wide as the arrow that shows 6780 ktons!

The fact that the precious metal stream is highlighted and not to scale with the rest of the flows in the diagram is clearly signalled with a note, a dotted line that separates this diagram area, and even an exclamation mark symbol.

Since CRMs were the focus of the PROSUM study I think such a “trick” is justified. What are your experiences with flows on different scales? How would you handle this “dimension challenge” in a Sankey diagram? Let me know your ideas!

## Landscapes of Climate Finance, I4CE

What is a landscape of climate finance? A paper published in December 2016 by I4CE tells us that “Landscapes of climate finance are comprehensive studies mapping financial flows dedicated to climate change action and the energy transition. Covering both end-investment and supporting financial flows from public and private stakeholders, [they] draw the picture of how the financial value chain links sources, intermediaries, project managers and the end investment.”

The paper by Hadrian Hainaut (I4CE), Andreas Barkman (EEA) and Ian Cochran (I4CE) titled ‘Landscapes of domestic climate finance in Europe: Supporting and improving climate and energy policies for a low-carbon, resilient economy’ features two interesting Sankey diagrams.

This is the ‘Landscape of Climate Finance in France 2014’:

Flows are in billion euros. Sources and receiving sectors are indicated with distinctive black boxes. The authors opted for strictly horizontal/vertical arrow routing. There are no individual quantities at each arrow, so the actual numbers can only be estimated from the arrow proportions.

This is the ‘National Climate Finance in Belgium 2013’:

Flows are in million Euros. Some muddle here at the exit of the top light blue box where the arrows overlap instead of showing the sum of roughly 2000 m€ spending. This coincides with three overemphasized arrow heads for the arrows leading to “Public Investments”, “Policy Incentives” and “Grants”. Arriving arrows at the box “Climate Mitigation” overlap and the Sankey diagram could benefit from clearing up here.

Not sure about the ESDC voting: “France: huit points, La Belgique: dix points” maybe 😉

I had reported on climate finance diagrams back in 2014 when the concept was first presented by Climate Policy Initiative (CPI) but had since lost sight of them. I am happy to see that the idea is still alive and being taken up in a number of countries in Europe. Also good to see that the diagrams are not yet regulated by a standard and there is some “diversity” among these diagrams.

## Cape Town Water Use Sankey Diagram

From a post ‘Cape Town’s water crisis : Towards a more water secure future’ on the Future Cape Town blog comes this Sankey diagram on the water use in the city of Cape Town (South Africa).

The author of the diagram, Rebecca Cameron, is with MCA Urban and Environmental Planners and looks at how Cape Town could transition towards a more water secure future. This Sankey diagram was originally published in her article Cameron, R and Katzschner, T. 2016. The role of spatial planning in enhancing Integrated Urban Water Management in the City of Cape Town. South African Geographical Journal. 99(2), pp. 196 – 216.

Absolute flow values are not given in this version of the Sankey diagram. Flows are in million cubic metres per year (Mm³/a). Water from five different sources outside the municipality feeds the city of Cape Town, as does water from five sources within the city. A breakdown of water supplied by the municipal water works is shown. Additional color coding of the arrows indicates water quality (dark green = sewage, light green = treated water).

The author explains:

“This diagram is helpful in that it places all aspects of the water system in to one diagram. Here, water supply, water use, wastewater treatment and stormwater have been considered as a single system where too often the urban water cycle is fragmented when addressed within different sectors. The arrows of flow follow a key to represent the quantity and quality of water. The size of the arrow of flow is proportionally indicative of the quantity of water that flows from one process to one another. The colour of the arrows indicates the quality of the water flow; this includes non-potable, potable, sewage, treated sewage, and treated sewage for reuse. This is important to represent as, to intervene in an urban water cycle, both quantity and quality of water must be considered and used appropriately to move towards a more efficient and sustainable water system.”

From the rivers most of the water goes to the ocean. Through evaporation and precipitation it (hopefully) replenishes the reservoirs again that feed the city (this last part is not shown in the diagram).

## US Trade Flows, yet another option

After showing two variants for visualizing the U.S. trade balance in my last post, I became aware of yet another option. The first figure (infographic by Spiegel Online) used the length of the arrows to express the value of imported and exported goods. My remake version used the magnitude (width) of the arrows, as is typical for Sankey diagrams.

In this figure (by Anthony Cohen, University of Illinois, 2012 / Wikicommons) for US trade in 2011 the arrows for import (red) and export (green) are proportional to the total value of goods, just as we are used to seeing it in a Sankey diagram. But the arrows are superimposed, with the narrower green export arrow on top of the wider red import arrow. This creates another, somewhat more dramatic impression.

Data shown is for 2011 in billion USD for the 15 most important trade partners. Arrows are not labeled with absolute figures, instead a legend at the bottom indicates the width of five default arrows. The arrow from and to Mexico is a problem (no joke intended!), but the legend clarifies that arrows don’t indicate a specific geographic routing.

## US Trade Balance, two versions

When German Chancellor Angela Merkel meets with POTUS today, one topic that’s most likely going to be addressed is the trade deficit between the United States and the EU, Germany in particular.

Der Spiegel, a major German news outlet, has illustrated recent articles on this subject with the figure below. It shows the volume of trade between the United States and ‘selected countries’ (China, Canada, Mexico and the EU) in 2015. The figures indicate the value of goods exported (green arrows) to these countries, and imported (blue) from them into the U.S., in billion US\$.

Source: Spiegel Online

The interesting thing in this infographic is that the length of the arrows represents the value of goods traded. For example, the arrow for exports from the US to Europe (274 bnUS\$ in 2015) is a little over half the length of the blue incoming arrow (431 bnUS\$ in 2015). This works fine, with the only exception being the green arrow for exports to Mexico.

This infographic of course invited a remake as a Sankey diagram. As you all know, in Sankey diagrams the widths of the arrows represent the quantity.
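
To make the difference between the two encodings explicit, here is a small Python sketch. The two EU values are the ones quoted above. The function `arrow_geometry`, the scale of 0.5 pixels per billion US\$ and the fixed dimension of 20 pixels are assumptions of mine, not measurements from either graphic.

```python
def arrow_geometry(value, px_per_unit, encoding, fixed_px=20.0):
    """Return arrow length and width in pixels for one flow.

    encoding="length": the value drives the length, the width stays fixed
    (as in the original infographic).
    encoding="width": the value drives the width, the length stays fixed
    (as in a Sankey diagram).
    """
    scaled = value * px_per_unit
    if encoding == "length":
        return {"length": scaled, "width": fixed_px}
    if encoding == "width":
        return {"length": fixed_px, "width": scaled}
    raise ValueError("encoding must be 'length' or 'width'")


exports_eu, imports_eu = 274.0, 431.0   # bn US$, 2015, as quoted in the text
px_per_unit = 0.5                       # assumed scale

by_length = [arrow_geometry(v, px_per_unit, "length") for v in (exports_eu, imports_eu)]
by_width = [arrow_geometry(v, px_per_unit, "width") for v in (exports_eu, imports_eu)]

# The export/import proportion is the same in both encodings
ratio = exports_eu / imports_eu
```

Either way the export arrow comes out at roughly 64 % of the import arrow. Only the dimension that carries this proportion changes.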

I did two or three different versions, all very similar to the original infographic in style and color, even using the lower states map icon (sorry Alaska and Hawaii). I was not sure at first whether the separate arrows for Germany were values already included in the EU trade volume, or if they were meant to be on top of it. A quick look into the original data revealed that indeed they are included in the EU figures already. I therefore decided to highlight the German share in the Sankey diagram with a slightly brighter color, but keep those arrows stacked.

Here is my Sankey diagram version of the Spiegel infographic.

Not sure which version I prefer, but using the length instead of the width of the arrows to represent the flow quantity is definitely a unique approach. Worth sharing with you, I think.

## River Flow Volume and Temperature

A nice idea for the use of Sankey diagrams can be found on this web page of the U.S. Army Corps of Engineers (USACE) in the Portland OR area.

The diagram shows the flow of the Rogue River and its tributary streams. The fact that the river flows east to west makes this one of the rare examples of a right-to-left oriented Sankey diagram.

The water volume is represented by the width of the arrow in each segment. Flows are presumably in cubic feet per second (cfs). At some points along the river the volume seems to increase much more than the feed contributes (e.g. at the Bear Creek influx).
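
Such jumps could be spotted with a simple balance check at each confluence: the flow downstream should be roughly the upstream flow plus the tributary. Below is a minimal Python sketch. The function `confluence_gaps`, the 5 % tolerance and all numbers are hypothetical, since the actual values are not available to me.

```python
def confluence_gaps(segments, tolerance=0.05):
    """Return (name, gap) for confluences where the balance does not close.

    gap = downstream - (upstream + tributary), in the same unit as the input.
    A confluence is flagged if |gap| exceeds tolerance * expected flow.
    """
    flagged = []
    for seg in segments:
        expected = seg["upstream"] + seg["tributary"]
        gap = seg["downstream"] - expected
        if abs(gap) > tolerance * expected:
            flagged.append((seg["name"], gap))
    return flagged


# Hypothetical flows in cfs, NOT read from the USACE diagram
segments = [
    {"name": "Confluence A", "upstream": 1000.0, "tributary": 50.0, "downstream": 1300.0},
    {"name": "Confluence B", "upstream": 1300.0, "tributary": 200.0, "downstream": 1510.0},
]

flagged = confluence_gaps(segments)
```

In this made-up example only “Confluence A” is flagged, with 250 cfs more downstream than the two inflows add up to. On a real river such a gap may well have a good explanation (ungauged inflows, groundwater, measurements taken at different times), so it is a hint rather than proof of an error in the diagram.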

As an additional layer of information the color of the Sankey arrows indicates the trailing 7-day average temperature. Temperature color codes shown below.

## Graedel REE wheel Sankey remake

In this post on rare earths I recently featured an alluvial diagram depicting rare earths use from a presentation by T.E. Graedel (Yale). That same presentation also led me to another article by X. Du & T.E. Graedel titled ‘Uncovering the Global Life Cycles of the Rare Earth Elements’ (open access) that has a number of circular flow diagrams I would call “REE wheels”.

The article describes how quantitative data on rare earths is available for mining and processing, but “very little quantitative information is available concerning the subsequent life cycle stages”. Also, data is mostly available for the overall REE production, but not individually for every single rare earth element. They therefore aim to estimate and approximate the quantities for ten REEs, based on sources from China and Japan.

Here is the REE wheel for Yttrium (element Y) from the article:

The diagram can be read from 7 o’clock to 5 o’clock in a clockwise direction. The processing steps are “Mi” (mining), “S” (separation), “F”(fabrication), “Ma” (manufacturing), “U” (use) and “W” (waste management), thus showing the flow of the rare earth element through the economic cycle.

I did a Sankey diagram version of the above Yttrium REE wheel to have the arrow magnitude representing the quantities. Flows are in gigagrams (Gg, i.e. thousand metric tons) per year.

Due to the fact that the arrows connect horizontally and vertically to the node (and do not run diagonally like in the original) my remake looks less “circular” somehow… in fact it resembles more one of those retro indoor AM/FM loop antennas you would hook to your HiFi. So I am not fully satisfied with the outcome. Would it be better if the nodes were tilted 45°?

What’s nice is that the extraction of ore (17.4 Gg) can be directly compared to the 2.9 Gg Yttrium release to the environment. I switched ore input and tailings output at the mining node to have them side-by-side.
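
For a quick sanity check of these two figures, here is a tiny Python sketch. One gigagram is 10⁹ grams, i.e. 1,000 metric tons. The helper name `gg_to_tonnes` is my own.

```python
GRAMS_PER_GIGAGRAM = 1e9
GRAMS_PER_TONNE = 1e6


def gg_to_tonnes(gigagrams):
    """Convert gigagrams to metric tons (1 Gg = 1,000 t)."""
    return gigagrams * GRAMS_PER_GIGAGRAM / GRAMS_PER_TONNE


ore_extraction_gg = 17.4        # from the diagram
release_to_environment_gg = 2.9  # from the diagram

ore_extraction_t = gg_to_tonnes(ore_extraction_gg)
release_t = gg_to_tonnes(release_to_environment_gg)

# Share of the release relative to the extraction
release_share = release_to_environment_gg / ore_extraction_gg
```

So the release to the environment is about one sixth of the quantity extracted as ore.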

Comments and improvement suggestions welcome.