Yet another Distribution Diagram

I got alerted by Google to a blog post by Maruthi Jampani at the Express Analytics blog. Sure, I am always excited to find fresh Sankey diagrams worth reporting here. But more and more often I come across distribution diagrams like the one shown in the article ‘Power of Sankey Diagram in Data Visualization’ … and get disappointed. Well, not really. The term ‘Sankey diagram’ has gained a certain popularity over the past years, which is good. With the increasing use of d3.js, Parsets or Fineo, we see more of these distribution diagrams.

Time to talk about distribution diagrams again?

My two posts back in 2009 (‘Infographics Experts on Sankey Diagrams (Part 1)’ and ‘Infographics Experts on Sankey Diagrams (Part 2)’) were based on a good and funny article by Chiqui Esteban at infografistas.blogspot.com. He suggested several names (in Spanish) for this type of diagram and concluded that the best term is distribution diagram.

The Parsets page explains that they are a “visualization … for categorical data, like census and survey data, inventory, and many other kinds of data that can be summed up in a cross-tabulation. (…) Between the dimension bars are ribbons that connect categories and split up. This shows you how combinations of categories are distributed, and how a particular subset (…) can be further subdivided.”

So we have categories and dimensions. And ribbons that connect them.
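To make the idea concrete, here is a minimal sketch of how such ribbons could be sized from a cross-tabulation. It is not how Parsets or any other tool actually works internally. The records, the `ribbon_widths` function and the `total_width` parameter are all made up for illustration.

```python
from collections import Counter

# Hypothetical survey records (department, activity), invented for this example.
records = [
    ("Finance", "Reporting"),
    ("Finance", "Reporting"),
    ("Finance", "Planning"),
    ("Sales", "Reporting"),
    ("Sales", "Planning"),
    ("Sales", "Planning"),
    ("Sales", "Planning"),
    ("IT", "Planning"),
]


def ribbon_widths(records, total_width=400.0):
    """Cross-tabulate two dimensions and scale each ribbon to its share of the total."""
    counts = Counter(records)     # one cell per combination of categories
    total = sum(counts.values())  # all records together fill the full bar width
    return {pair: total_width * n / total for pair, n in counts.items()}


widths = ribbon_widths(records)
for (left, right), width in sorted(widths.items()):
    print(f"{left} / {right}: {width:.1f}")
```

Note that nothing in this calculation has a direction: swapping the two dimensions gives exactly the same ribbon widths.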

Distribution diagrams have a lot in common with Sankey diagrams. In fact, they share the central characteristic: the width of the band is proportional to the quantity it represents. In Sankey diagrams the width of the arrow (!) is proportional to the quantity of the flow represented. So they do qualify as Sankey diagrams, but I would say they should be considered a subset or specific type of Sankey diagram. As I pointed out in a May 2012 post:

It is exactly the fact that these are not directed flows, but rather quantities that are distributed over categories (or dimensions). There is no time relation in them, neither are there flows “from” (e.g. Finance) “to” (e.g. Reporting) or the other way round. These are bands hooked between nodes rather than arrows leading from one node to another. Each category could be represented by a pie chart as well.

So I do agree that distribution diagrams (or spaghetti diagrams, swim lane diagrams) are a subset of Sankey diagrams. But there is more to Sankey diagrams than that.

I may have to emphasize the genuine Sankey diagrams in the future. Flows in process systems, from one machine to another. Energy input into a boiler, and heat being distributed as steam to other parts of the plant. Streams of people moving between halls at a trade fair. Water being pumped back in loops. Value streams along a supply chain, where each processing step adds to the value of the product. And much more…