I analyze a lot of experiments and there are many times when I want to quickly look at means and standard errors for each cell (experimental condition), or the same for each cell *and* individual-level attribute level (e.g., Democrat, Independent, Republican). Many of these experiments are embedded in national or state-level surveys, for which each respondent is weighted so that the sample means for gender, age, and political affiliation approximate those of a probability sample. But if you use weights, it’s difficult to get unbiased estimates of the standard error of that weighted mean.

In this post, I’ll walk through how to use the powerful plyr package to summarize your weighted data, estimating weighted means and standard errors for each experimental cell using the bootstrap, and then generate dotplots to visualize the result using the ggplot2 package. Note that these are both Hadley Wickham’s packages; he’s also responsible for the immensely useful packages reshape and stringr, which I hope to write about in future posts.

First, I’ll describe the data we’re going to use. Just before the 2010 elections, my lab hired YouGov/Polimetrix to survey Illinois voters about a series of candidates for statewide office, most notably those for Secretary of State. The major party candidates included the Black Democrat incumbent, Jesse White, and a Hispanic Republican challenger, Robert Enriquez. We exposed respondents to different images of Jesse White, according to three experimental conditions: light, dark, and control (no picture). Our dependent measures included a question about how each respondent intended to vote and a feeling thermometer measuring affect toward each candidate. I use the net of the thermometer rating for Obama minus the thermometer rating for McCain (“netther”). For more on this type of measure, here’s a link to a study on feeling thermometers and their precision compared to questionnaire measures with fewer categories.

Even before we have to deal with the issue of weighted standard errors, it would be nice to find an elegant way to extract weighted statistics on subsets of our data corresponding to our experimental cells (or other qualitative grouping of our data, such as respondent party or ethnicity). This is clunky with other functions, such as base::by or doBy::summaryBy, because it is difficult to pass more than one variable (say, both the outcome and the weight that “weighted.mean()” needs) to a function applied to each subset of your data (see this discussion for details; thanks to David Freedman for the pointer to plyr::ddply). The survey::svymean function provides another potential alternative, but is nowhere near as flexible as ddply.

We’ll use plyr::ddply, which handles this problem quite well. In the code below, we use “ddply(),” which takes a data frame for input and returns a data frame for output. The “.data” parameter specifies the data frame and the “.variables” parameter specifies the qualitative variable used to subset the data. We use “summarise()” as the function, which allows us to specify not only the function we want to pass over the subsets of our data (experimental cells), but also to tell ddply which variables in our data subset to pass to the function. For example, below we run the function “mean()” on the variable “netther” for each treatment group.

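    install.packages("plyr")
    library("plyr")

    # load the Illinois survey data
    load(url("https://dl.dropboxusercontent.com/u/25710348/Blogposts/data/IL2010.Rda"))

    # look at ddply: unweighted mean and standard error of netther per treatment cell
    ddply(.data = IL2010, .variables = .(treat), .fun = summarise,
          mean = mean(x = netther, na.rm = TRUE),
          # divide by the number of non-missing responses, not length(netther)
          se = sd(netther, na.rm = TRUE) / sqrt(sum(!is.na(netther))))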
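The same pattern extends to weighted statistics, which is the real reason to prefer ddply here. Below is a minimal sketch on made-up data: the "toy" data frame, its column names, and its values are hypothetical and only meant to show that "summarise()" lets "weighted.mean()" see the outcome and the weight from the same cell at once.

```r
library("plyr")

# Hypothetical toy data: three treatment cells, an outcome, and survey weights
toy <- data.frame(
  treat  = rep(c("Dark", "Light", "Control"), each = 4),
  y      = c(10, 20, 30, 40,  5, 15, 25, 35,  0, 10, 20, 30),
  weight = rep(c(1, 1, 2, 4), times = 3)
)

# summarise() evaluates each expression within one cell at a time,
# so weighted.mean() gets that cell's y and weight together
cell.means <- ddply(.data = toy, .variables = .(treat), .fun = summarise,
                    mean     = mean(y),
                    wtd.mean = weighted.mean(x = y, w = weight))
cell.means
```

Because the heaviest weights sit on the largest values of y in every cell, each weighted mean comes out above the corresponding unweighted mean.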

Now that we have some nice machinery to quickly summarize weighted data in hand, let’s get into bootstrapping. The standard error of a weighted mean has no straightforward closed-form solution, and hence bootstrapping is not a bad way to go. Bootstrapping consists of computing a statistic over and over, each time on a new sample drawn from the data with replacement. For more, see the UCLA stats webpage on bootstrapping in R.

I’ll use the boot package to implement bootstrapping, which requires us to specify a bootstrapping function that returns a statistic.

    install.packages("boot")
    library("boot")

    # statistic for boot(): weighted mean of the resampled rows indexed by d
    sample.wtd.mean <- function(x, w, d) {
      return(weighted.mean(x = x[d], w = w[d], na.rm = TRUE))
    }

Why do we need this function? It allows the boot function to sample our data many times (with replacement), each time computing the statistic we want to estimate, as mentioned above. Specifying that function makes this process quite elegant by making use of R’s indexing capabilities. The boot function passes it an index of items to include from our original data in each of the resamples. To illustrate this, run the following code in R:

    playdata <- runif(20, 1, 10)
    playdata
    d <- sample(20, replace = TRUE)
    d
    playdata[d]

The code “playdata[d]” returns the elements of playdata at the positions listed in d, so an index that appears twice in d pulls the same observation twice. This is exactly how the boot function works. It passes our “sample.wtd.mean()” function a fresh index of items (d) over and over again, each one drawn by sampling with replacement, utilizing the fast C code that R uses for indexing rather than slow R for-loops.
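To make the indexing concrete, here is a small sketch that calls the statistic by hand on made-up numbers (the vectors x and w and the hand-picked index are hypothetical). The point is that the same index is applied to both the values and the weights, so each resampled observation keeps its own weight.

```r
# Same statistic function as defined earlier in the post
sample.wtd.mean <- function(x, w, d) {
  weighted.mean(x = x[d], w = w[d], na.rm = TRUE)
}

x <- c(2, 4, 6, 8)   # hypothetical outcome values
w <- c(1, 1, 1, 5)   # hypothetical survey weights

# The identity index reproduces the ordinary weighted mean of the full data
full <- sample.wtd.mean(x, w, d = 1:4)

# One hand-picked "resample": row 1 twice, row 3 left out
d <- c(1, 1, 2, 4)
one.rep <- sample.wtd.mean(x, w, d)

full      # (2*1 + 4*1 + 6*1 + 8*5) / 8 = 6.5
one.rep   # (2*1 + 2*1 + 4*1 + 8*5) / 8 = 6
```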

When boot is done, we take the mean or median of these statistics as our estimate, which is a loose definition of the bootstrap. We also use the standard deviation of these estimates as the estimate of the standard error of the statistic. Here’s what happens when we pass our function over our data:

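    # pass w via a wrapper: w= would partially match boot()'s weights argument
    b <- boot(IL2010$netther, R = 1000,
              statistic = function(x, d) sample.wtd.mean(x, w = IL2010$weight, d = d))
    b
    sd(b$t)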

[UPDATE 26 JAN 2012: previously this was coded incorrectly. The “weights” parameter has been changed to “w.” Thanks to Gaurav Sood and Yph Lelkes for pointing this out!]
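If you want to check what boot hands back, here is a minimal sketch on simulated data (the vectors x and w, the seed, and R = 500 are made up for illustration and have nothing to do with the survey). It supplies the weights through a small wrapper function rather than through boot’s extra arguments, because a bare “w =” in the boot call would partially match boot’s own “weights” argument.

```r
library("boot")

# Same statistic function as defined earlier in the post
sample.wtd.mean <- function(x, w, d) {
  weighted.mean(x = x[d], w = w[d], na.rm = TRUE)
}

set.seed(2010)
x <- rnorm(200, mean = 50, sd = 10)   # simulated outcome
w <- runif(200, 0.5, 2)               # simulated weights

# boot() calls the statistic with the data and an index vector;
# the wrapper adds the weights before handing off to sample.wtd.mean()
b <- boot(x, statistic = function(x, d) sample.wtd.mean(x, w = w, d = d),
          R = 500)

b$t0              # the statistic computed on the original data
dim(b$t)          # one row per bootstrap replicate
boot.se <- sd(b$t)
boot.se           # bootstrap estimate of the standard error
```

The element t0 should match “weighted.mean(x, w)” exactly, since it is just the statistic evaluated with the identity index.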

Now let’s put it all together using “plyr” and “ggplot2.” I use dotplots because they convey numbers more accurately than other types of plots, such as bar plots or (worst of all) pie charts. William Cleveland has conducted research showing that dots aligned on the same scale are indeed the best visualization to convey a series of numerical estimates (Correction, the actual study is here). His definition of “best” is based on the accuracy with which people interpret various graphical elements, or as he calls them, “perceptual units.”

We’ll first just visualize results by party:

    install.packages("ggplot2")
    library("ggplot2")

    # Create nice party ID variable:
    IL2010$pid[IL2010$pid == 8] <- NA
    IL2010$pidcut <- cut(IL2010$pid, c(-Inf, 1, 3, 4, 6, Inf),
                         labels = c("Strong Democrat", "Democrat", "Independent",
                                    "Republican", "Strong Republican"))

    # clean up treatment variable
    IL2010$treat <- factor(IL2010$treat, labels = c("Dark", "Light", "Control"))

    # Use plyr::ddply to produce estimates based on PID
    # (weights go in via a wrapper so w= cannot partially match boot()'s weights argument)
    pid.therm <- ddply(.data = IL2010, .variables = .(pidcut), .fun = summarise,
                       mean = weighted.mean(x = netther, w = weight, na.rm = TRUE),
                       se = sd(boot(netther, function(x, d) sample.wtd.mean(x, w = weight, d = d),
                                    R = 1000)$t))

    # plot it
    ggplot(na.omit(pid.therm), aes(x = mean, xmin = mean - se, xmax = mean + se, y = factor(pidcut))) +
      geom_point() +
      geom_segment(aes(x = mean - se, xend = mean + se, y = factor(pidcut), yend = factor(pidcut))) +
      theme_bw() +
      # theme()/element_text() and ggtitle() replace the older opts()/theme_text() calls
      theme(axis.title.x = element_text(size = 12, vjust = .25)) +
      xlab("Mean thermometer rating") + ylab("Party ID") +
      ggtitle("Thermometer ratings for Jesse White by party")

That’s a pretty big ggplot command, so let’s discuss each element. The first is the data to plot, which is the object produced via ddply. The second specifies the aesthetic elements of the plot, including the x and y values and the endpoints of the standard error bars (“xmin = mean-se, xmax = mean+se”). Next, we add points to the plot by invoking “geom_point().” If we only wanted points, we could stop here. We plot lines representing standard errors with “geom_segment(),” specifying x and xend. Next, we specify that we want to use theme_bw(), which I find cleaner than the default theme. Then we make some minor aesthetic adjustments to the x-axis title. Lastly, we set the x and y labels and the title.

Well, it’s pretty clear from our plot that there’s a lot of action on partisan ID. Let’s look at how each group is affected by the treatment. To do so, we’ll produce the plot at the top of this article, putting each partisan group in a panel via “facet_wrap(),” and plotting each treatment group. First, we’ll generate the summary data using plyr; all we need to do is add another grouping variable (“treat”).

    # PID x Treatment
    # (weights go in via a wrapper so w= cannot partially match boot()'s weights argument)
    trtbypid.therm <- ddply(.data = IL2010, .variables = .(treat, pidcut), .fun = summarise,
                            mean = weighted.mean(x = netther, w = weight, na.rm = TRUE),
                            se = sd(boot(netther, function(x, d) sample.wtd.mean(x, w = weight, d = d),
                                         R = 1000)$t))

    ggplot(na.omit(trtbypid.therm), aes(x = mean, xmin = mean - se, xmax = mean + se, y = factor(treat))) +
      geom_point() +
      geom_segment(aes(x = mean - se, xend = mean + se, y = factor(treat), yend = factor(treat))) +
      facet_wrap(~pidcut, ncol = 1) +
      theme_bw() +
      # theme()/element_text() and ggtitle() replace the older opts()/theme_text() calls
      theme(axis.title.x = element_text(size = 12, vjust = .25)) +
      xlab("Mean thermometer rating") + ylab("Treatment") +
      ggtitle("Thermometer ratings for Jesse White by treatment condition and pid")

It looks like Republicans generally don’t like the dark image, while Democrats do. But what if we want to look at the way that racial attitudes interact with the treatments? We measured implicit racial attitudes via the IAT. Let’s take a look.

    # Fix up the data to remove the "Control" condition and all NAs
    # (logical indexing is safe even when no rows are missing, unlike -which())
    ilplot <- IL2010[!is.na(IL2010$pidcut), ]
    ilplot$treat[ilplot$treat == "Control"] <- NA
    ilplot$treat <- ilplot$treat[drop = TRUE]
    ilplot <- ilplot[!is.na(ilplot$treat), ]
    ilplot$dscore[which(ilplot$dscore < -1)] <- NA

    # Plot it
    ggplot(ilplot, aes(x = dscore, y = netther, colour = treat)) +
      geom_point() +
      geom_smooth(method = "loess", size = 1.5) +
      facet_wrap(~pidcut, nrow = 1) +
      theme_bw() +
      # theme()/element_text() and ggtitle() replace the older opts()/theme_text() calls
      theme(axis.title.x = element_text(size = 12, vjust = .25)) +
      xlab("IAT d-score (- pro-black, + anti-black)") + ylab("Thermometer rating") +
      ggtitle("Thermometer ratings for Jesse White by treatment condition and iat")

With this plot, we don’t really need to summarize the data by cell first because we are using all of the points and just using loess to draw lines representing the conditional means across the range of IAT scores. The plot shows what seems to be an interaction between IAT score and treatment, such that those with high (anti-black) IAT scores generally favor the lighter-skinned image.