Visualization series: Insight from Cleveland and Tufte on plotting numeric data by groups

Posted on March 4, 2012 by Solomon

This article has moved: https://solomonmg.github.io/blog/2012/visualization-series-insight-from-cleveland-and-tufte-on-plotting-numeric-data-by-groups/

About Solomon

Political Scientist, Facebook Data Science

This entry was posted in R.

26 Responses to Visualization series: Insight from Cleveland and Tufte on plotting numeric data by groups

Martin says:

March 4, 2012 at 3:15 pm

I’ve read Nathan Yau’s “Visualize This” book, which was a pretty nice intro, but I have to admit you have really succeeded in pulling together all the main arguments for “perfect” visualisations. Great article, can’t wait for the next one. Good work!!

Robert Young says:

March 5, 2012 at 7:27 am

While I certainly agree with you, Tufte, and Cleveland, there remains the aphorism, “figures don’t lie, but liars figure.” If one’s venue is political communication, one’s agenda is the prime objective, not objectivity. Thus the goal is figuring with figures (in the pictorial sense) effectively, meaning convincing the unbelievers of one’s agenda (especially when that agenda actually runs contrary to the unbelievers’ interests; see Thomas Frank’s What’s the Matter with Kansas?). If that means manipulating weaknesses in human perception, then that will be done. The Mad Men have been doing it for decades. Nixon, the first in my memory, relied heavily on ad men.

From an academic point of view, where objectivity and fairness are the goal, then certainly. As a “know thine enemy” exercise, by all means, continue.

- Solomon says:
  
  March 9, 2012 at 10:08 am
  
  Thanks for the comment, Robert; it gives me an excuse to delve into some social-historical issues related to data and visualization. You are right that data can always be subject to cherry-picking, biases in measurement, subtle transformations, and sometimes outright fabrication, which can serve to manipulate the audience’s interpretation of the actual state of things. This is especially concerning because people who do not analyze data are generally not prepared to detect and counter these obfuscations. In fact, it is often difficult even for data analysts to detect such problems. But just because some people use visualization to manipulate doesn’t mean we should throw away such a valuable tool.
  
  Data visualization is intricately linked to the scientific revolution (i.e., the Enlightenment), which by most accounts has made the world a better place. I’ll argue that the critical catalysts of the scientific Enlightenment were Galileo, for his emphasis on objective measurement of physical phenomena and on experimental intervention (which led to an early theory of inertia and, eventually, to the acceptance of Copernicus’ heliocentric model of the solar system); and Descartes, for his emphasis on deductive logic and for the invention of the Cartesian plane, which brings together measurement, mathematics, and geometry, depicting the relationship between variables (or functions) visually. This, in my view, constitutes the birth of modern data visualization.
  
  The development of the Cartesian plane as a tool for understanding data, variables, functions, etc., had a powerful impact on our world. It was crucial to the discovery/invention of calculus, without which very little of our modern world would be possible. More immediately, it would be nearly impossible to understand or teach math, science, or any other quantitative endeavor without visualization.
  
  While folks do use visualization as a tool of manipulation, the way to neutralize any such deception is the same as for propaganda in prose: education. Rather than rejecting visualization as a tool of the propagandist, we ought to improve society’s understanding of data and visualization.
  
Fionn Hope says:

March 6, 2012 at 10:34 am

LOL

“I did see plenty of maps, however, which I suppose one could argue are reminiscent of noodles.”
Nice and needed article
Thanks

Roberto Osorio says:

March 8, 2012 at 9:24 pm

Interesting and useful. Just one objection: since you mention the interplay of lexical and visual expressions, I can’t help cringing when you write “I suppose I could extend the axis from 0-100.” A hyphen is not a synonym for the word “to.” The hyphenated expression “0-100” already means “the range from 0 to 100,” so you’re effectively writing “from from 0 to 100.” I thought I had to comment on this because I’d hate to see any similar construction absorbed into one of your nice graphic examples.

- Solomon says:
  
  March 9, 2012 at 8:49 am
  
  Thanks for the comment, you’re probably right. Fixed.
  
Andrew says:

October 31, 2014 at 9:54 am

Great read, thank you! I realize this is over two years old, but I was linked here from a newer article.

Regarding the Primary Elections example (plot vs. pie vs. stacked bar), I feel it’s worth mentioning that simply adding candidate and vote-percentage labels directly to the pie slices and stacked-bar segments debunks the cognitive-processing argument. That said, neither scales well as the number of candidates grows, and the plot is clearly more effective.

Andreas says:

March 13, 2015 at 5:41 am

Hi,

many thanks for this! I tried to run your script, but it throws up a message:
“Error: Use ‘theme’ instead. (Defunct; last used in version 0.9.1)”

Being ignorant of the finer details of R (just a beginner), could you tell me how to fix the script so that it runs on a current installation of R? That would be much appreciated.

Other than that, I have profited from reading your material and would like to say thank you for sharing it!

Best,

Andreas

Dan says:

October 5, 2016 at 12:03 pm

I enjoyed the article. The link to Cleveland’s paper was broken. I found it here: https://www.cs.ubc.ca/~tmm/courses/cpsc533c-04-spr/readings/cleveland.pdf.

Charles-André COSTE says:

October 24, 2016 at 11:09 pm

Great article!
Do you have the link for the “primaryres.csv” dataset?
Best,

- Solomon says:
  
  October 27, 2016 at 4:45 am
  
  Yes, I’ve now updated the code to point to the right location for the data files and updated the syntax to work with the latest versions of ggplot2 and dplyr.
  