Visualization series: Insight from Cleveland and Tufte on plotting numeric data by groups

This article has moved: https://solomonmg.github.io/blog/2012/visualization-series-insight-from-cleveland-and-tufte-on-plotting-numeric-data-by-groups/

About Solomon

Political Scientist, Facebook Data Science
This entry was posted in R.

26 Responses to Visualization series: Insight from Cleveland and Tufte on plotting numeric data by groups

  1. Pingback: Research tips - Data visualization

  2. Martin says:

    I’ve read Nathan Yau’s “Visualize This” book, which was a pretty nice intro, but I have to admit you have really succeeded in pulling all the main arguments for “perfect” visualisations together. Great article, can’t wait for the next one – good work!!

  3. Pingback: Visualização gráfica « De Gustibus Non Est Disputandum

  4. Pingback: HotPearls to be sorted | Pearltrees

  5. Pingback: Visualization series: Insight from Cleveland and Tufte on plotting numeric data by groups | R | Scoop.it

  6. Robert Young says:

    While I certainly agree with you, Tufte, and Cleveland, there remains the aphorism, “figures don’t lie, but liars figure”. If one’s venue is political communication, one’s agenda is the prime objective, not objectivity. Thus, figuring with figures (in the pictorial sense) in an effective way, meaning to convince the unbelievers of one’s agenda (especially when said agenda is actually contrary to the interests of the unbelievers; see: Frank on Kansas), is the goal. If that means manipulating weaknesses in human perception, then that’ll be done. The Mad Men have been doing it for decades. Nixon, the first in my memory, used Ad Men deeply.

    From an academic point of view, where objectivity and fairness are the goal, then, certainly. As a “know thine enemy” exercise, by all means, continue.

    • Solomon says:

      Thanks for the comment Robert, it gives me an excuse to delve into some social-historical issues related to data and visualization. You are right that data can always be subject to cherry-picking, biases in measurement, subtle transformations, and sometimes outright fabrication, which can serve to manipulate the audience’s interpretation of the actual state of things. This is especially concerning because people who do not analyze data are generally not prepared to detect and counter these obfuscations. In fact, it is often difficult for data analysts to detect such problems. But just because some people use visualization to manipulate doesn’t mean we should throw away such a valuable tool.

      Data visualization is intricately linked to the scientific revolution (i.e., the Enlightenment), which by most accounts has made the world a better place. I’ll argue that the critical catalysts of the scientific Enlightenment were Galileo, for his emphasis on objective measurement of physical phenomena and experimental intervention (which eventually led to an early theory of inertia and the eventual acceptance of Copernicus’ heliocentric model of the solar system); and Descartes, for his emphasis on deductive logic and for the invention of the Cartesian plane, which brings together measurement, mathematics, and geometry, depicting the relationship between variables (or functions) visually. This in my view constitutes the birth of modern data visualization.

      The development of the Cartesian plane as a tool for understanding data, variables, functions, etc., had a powerful impact on our world. It was crucial to the discovery/invention of Calculus, without which very little of our modern world would be possible. More immediately, it would be nearly impossible to understand or teach math or science or any quantitative endeavor without visualization.

      While folks do use visualization as a tool of manipulation, the way to neutralize any such deception is the same as for propaganda in prose: education. Rather than rejecting visualization as a tool of the propagandist, we ought to improve society’s understanding of data and visualization.

  7. Fionn Hope says:

    LOL

    “I did see plenty of maps, however, which I suppose one could argue are reminiscent of noodles.”
    Nice and needed article
    Thanks

  8. Roberto Osorio says:

    Interesting and useful. Just one objection: Since you mention the interplay of lexical and visual expressions, I can’t help cringing when you write “I suppose I could extend the axis from 0-100.” A hyphen is not a synonym for the word “to.” The hyphenated word “0-100” already means “the range from 0 to 100,” so you’re effectively writing “from from 0 to 100.” Thought I had to comment on this because I’d hate to see any similar construction be absorbed into one of your nice graphic examples.

  9. Pingback: Are pie charts really always bad? (and other thoughts on graphs…) | Aid Writing

  10. Pingback: Visualization series: Insight from Cleveland and Tufte on plotting numeric data by groups @ Solomon Messing | Martin Larsson

  11. Pingback: datanalytics » Representación de datos asociados a grupos

  12. Pingback: Visualization series: Insight from Cleveland and Tufte on plotting numeric data by groups | Estadística y R | Scoop.it

  13. Pingback: Visualization series: Insight from Cleveland and Tufte on plotting numeric data by groups @ Solomon Messing : Martin Larsson

  14. Pingback: Visualization series: Insight from Cleveland and Tufte on plotting numeric data by groups | Encouraging moderation: Clues from a simple model of ideological conflict | Scoop.it

  15. Pingback: Enlaces de la semana | Politikon

  16. Pingback: Visualization Series: Using Scatterplots and Models to Understand the Diamond Market (so You Don’t Get Ripped Off) | Solomon Messing

  17. Pingback: When to Use Stacked Barcharts? | Solomon Messing

  18. Pingback: visualization | Future Yada Yada Yada

  19. Andrew says:

    Great read, thank you! I realize this is over two years old, but I was linked here from a newer article.

    Regarding the Primary Elections example (plot vs pie vs stacked bar), I feel it’s worth mentioning that simply adding labels for the Candidate dimension and the vote percent measure directly to pie slices and stacked bar segments debunks the cognitive processing argument. That being said, neither of those scales well as the number of candidates grows, and the plot is clearly more effective.

  20. Pingback: How data visualizations are tools (and what you're building with them) - SHARP SIGHT LABS

  21. Andreas says:

    Hi,

    many thanks for this! I tried to run your script, but it throws up a message:
    “Error: Use ‘theme’ instead. (Defunct; last used in version 0.9.1)”

    Being ignorant of the finer details of R (just a beginner), could you tell me how to fix the script so that it runs on a current installation of R? That would be much appreciated.

    Other than that, I have profited from reading your material and would like to say thank you for sharing it!

    Best,

    Andreas

  22. Dan says:

    I enjoyed the article. The link to Cleveland’s paper was broken. I found it here: https://www.cs.ubc.ca/~tmm/courses/cpsc533c-04-spr/readings/cleveland.pdf.

  23. Charles-André COSTE says:

    Great article!
    Do you have a link to the “primaryres.csv” dataset?
    Best,

    • Solomon says:

      Yes, I’ve now updated the code to point to the right location for the data files and updated the syntax to work w/ latest versions of ggplot2 and dplyr.
