Data Visualization for Political Scientists

Session 2 - The Anatomy of a ggplot2 plot


Felix Haass

15 January 2018

1 / 33

ggplot2

The "gg" in ggplot2 stands for the "Grammar of Graphics." The grammar of graphics is a philosophy of data visualization that forces you to think about what you want to visualize and how. Hadley Wickham followed this philosophy when he implemented the ggplot2 package.

2 / 33

The anatomy of a ggplot2 plot

The grammar of graphics specifies building blocks out of which an analyst builds a plot. These include, in the order of application:

  1. Data (what do you want to plot?)
  2. Aesthetic mapping (what goes on the x and y axes?)
  3. Geometric object (geoms) (how do we want to see our data? Points, lines, bars, ...)
  4. Add more geoms (e.g. add regression lines to a scatterplot)
  5. Polish labels, scales, legends, and appearance.

(see this link for more details)
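
A minimal sketch of how the five building blocks can line up in a single plot, using the gapminder data that the following slides work with. The object name five_step_plot and the label text are illustrative choices, not part of the original slides.

```r
library(gapminder) # provides the gapminder data
library(ggplot2)

five_step_plot <- ggplot(data = gapminder,           # 1. data
                         aes(x = year,               # 2. aesthetic mapping
                             y = lifeExp)) +
  geom_point() +                                     # 3. geometric object
  geom_smooth(method = "lm") +                       # 4. one more geom
  labs(x = "Year", y = "Life expectancy in years") + # 5. polish labels ...
  theme_bw()                                         #    ... and appearance
print(five_step_plot)
```

Each step is explained in detail on the slides that follow.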

3 / 33

Useful tips from the dataviz ninja

  1. Think hard about what you want to visualize!

"Think of graphs as comparison" - Andrew Gelman

4 / 33

ggplot2 building blocks

Let's look at the ggplot2 building blocks in practice:

library(gapminder) # loads the gapminder data
library(tidyverse) # loads ggplot2 and other packages

example_plot <- ggplot(data = gapminder,   # specify which dataset to use
                       aes(x = year,       # what goes on the x axis?
                           y = lifeExp)) + # what's on the y axis?
  geom_point() # with which geometric object should the data be displayed?

Note the + that ties the building blocks together.
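
A small sketch of what the + does: each + adds one more layer to an existing plot object, so a plot can be built up step by step. The object names base_plot, with_points and with_trend are made up for this illustration.

```r
library(gapminder)
library(ggplot2)

# data and aesthetics only, no geom yet
base_plot <- ggplot(data = gapminder, aes(x = year, y = lifeExp))

with_points <- base_plot + geom_point()                  # add a layer of points
with_trend  <- with_points + geom_smooth(method = "lm")  # add a second layer

length(base_plot$layers)   # 0 (printing base_plot shows empty axes)
length(with_points$layers) # 1
length(with_trend$layers)  # 2
```

Because every intermediate object is itself a plot, you can print any of them to check what a single building block contributes.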

5 / 33

ggplot2 building blocks

print(example_plot)

6 / 33

Aesthetics - Size

library(gapminder)
library(tidyverse)

example_plot <- ggplot(data = gapminder,
                       aes(x = year,            # the aes() function defines aesthetics
                           y = lifeExp,
                           size = gdpPercap)) + # map the aesthetic 'size' to gdp/pc
  geom_point()
# print(example_plot)
7 / 33

Aesthetics - Size

print(example_plot)

8 / 33

Aesthetics II - Color

library(gapminder)
library(tidyverse)

example_plot <- ggplot(data = gapminder,
                       # the aes() function defines aesthetics
                       aes(x = year,            # x axis
                           y = lifeExp,         # y axis
                           color = continent,   # map color to continent
                           size = gdpPercap)) + # map the aesthetic 'size' to gdp/pc
  geom_point()
9 / 33

Aesthetics II - Color

print(example_plot)

10 / 33

Useful tips from the dataviz ninja

  1. Think hard about what you want to visualize!

  2. Don't use too many aesthetics - just use those that help you clarify your comparison!

    "When ggplot successfully makes a plot but the result looks insane, the reason is almost always that something has gone wrong in the mapping between the data and aesthetics for the geom being used" - Kieran Healy

11 / 33

geoms

library(gapminder)
library(tidyverse)

example_plot <- ggplot(data = gapminder,
                       aes(x = year,
                           y = lifeExp)) +
  geom_line() # lines instead of points
12 / 33

geoms

Whoops! What happened here?

print(example_plot)
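
One way to investigate the odd result is to count how many observations share each value of year. Without further information, geom_line() treats all rows as one series and connects them in order of x, so many observations per year produce vertical zig-zags. The object name obs_per_year is an illustrative choice.

```r
library(gapminder)
library(dplyr)

# How many rows (countries) are there for each year?
obs_per_year <- gapminder %>%
  count(year)
head(obs_per_year, 3)
```

Every year holds one observation per country, so a single line has to pass through all of them before moving on to the next year.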

13 / 33

geoms

library(gapminder)
library(tidyverse)

example_plot <- ggplot(data = gapminder,
                       aes(x = year,
                           y = lifeExp,
                           group = country)) + # tell ggplot2 which
                                               # observations belong together
  geom_line()
14 / 33

geoms

print(example_plot)

15 / 33

Combining geoms

library(gapminder)
library(tidyverse)

example_plot <- ggplot(data = gapminder,
                       aes(x = year,
                           y = lifeExp)) +
  geom_point() +
  geom_smooth(method = "lm") # add regression line
16 / 33

Combining geoms

print(example_plot)

17 / 33

Combining geoms II

library(gapminder)
library(tidyverse)

example_plot <- ggplot(data = gapminder,
                       aes(x = year,
                           y = lifeExp)) +
  geom_point() +
  geom_smooth(method = "lm") +
  geom_smooth(method = "loess",
              color = "firebrick") # fix smoother color

Bonus question: in this example we fix the color, i.e. we set it to a constant value ("firebrick", a shade of red) instead of mapping it to a variable. What happens if we map color to a variable in the gapminder dataset, such as continent?
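
A sketch for exploring the bonus question, under the assumption that the mapping is placed inside aes() of the smoother. For simplicity it uses the linear smoother rather than loess. The object name mapped_color_plot is made up for this illustration.

```r
library(gapminder)
library(ggplot2)

mapped_color_plot <- ggplot(data = gapminder,
                            aes(x = year,
                                y = lifeExp)) +
  geom_point() +
  # color now sits inside aes(), so it is mapped to a variable, not fixed
  geom_smooth(aes(color = continent), method = "lm")
print(mapped_color_plot)
```

Mapping color to a categorical variable also splits the data into groups, so ggplot2 fits one smoother per continent and adds a legend.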

18 / 33

Combining geoms II

print(example_plot)

19 / 33

Manipulate and Preprocess Data

Subsetting/filtering data helps to reduce complexity & get at the comparison that we want. To do that, we use the dplyr package which is part of the tidyverse.

To filter data, we use the filter() function.

library(tidyverse) # loads dplyr package, among others
library(gapminder)

gapminder_americas <- gapminder %>%  # the pipe %>% chains functions together
  filter(continent == "Americas")    # note the double "==" for comparison
head(gapminder_americas, 5)
## # A tibble: 5 x 6
##     country continent  year lifeExp      pop gdpPercap
##      <fctr>    <fctr> <int>   <dbl>    <int>     <dbl>
## 1 Argentina  Americas  1952  62.485 17876956  5911.315
## 2 Argentina  Americas  1957  64.399 19610538  6856.856
## 3 Argentina  Americas  1962  65.142 21283783  7133.166
## 4 Argentina  Americas  1967  65.634 22934225  8052.953
## 5 Argentina  Americas  1972  67.065 24779799  9443.039
20 / 33

Manipulate and Preprocess Data

Modify existing variables or add new ones to a data frame with the mutate() function. Several steps can be chained together using the pipe operator %>%.

library(tidyverse) # loads dplyr package, among others
library(gapminder)

gapminder_americas <- gapminder %>%
  filter(continent == "Americas") %>%
  # create a character/categorical variable
  # to distinguish between North/South America
  mutate(north_america = ifelse(country == "United States" |
                                  country == "Canada",
                                "north_america",
                                "south_america"))
head(gapminder_americas, 3)
## # A tibble: 3 x 7
##     country continent  year lifeExp      pop gdpPercap north_america
##      <fctr>    <fctr> <int>   <dbl>    <int>     <dbl>         <chr>
## 1 Argentina  Americas  1952  62.485 17876956  5911.315 south_america
## 2 Argentina  Americas  1957  64.399 19610538  6856.856 south_america
## 3 Argentina  Americas  1962  65.142 21283783  7133.166 south_america
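
A possible variant, not from the original slides: when the condition compares one variable against several values, the %in% operator can replace the chain of == and |. The object name gapminder_americas_alt is made up for this sketch, and the coding of countries is the same as above.

```r
library(gapminder)
library(dplyr)

gapminder_americas_alt <- gapminder %>%
  filter(continent == "Americas") %>%
  # %in% is TRUE if country matches any element of the vector
  mutate(north_america = ifelse(country %in% c("United States", "Canada"),
                                "north_america",
                                "south_america"))

# check how many rows fall into each category
count(gapminder_americas_alt, north_america)
```

Counting the new variable right after creating it is a quick way to confirm that the recoding did what you intended.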
21 / 33

Manipulate and Preprocess Data

Use filtered and preprocessed data to highlight comparisons in ggplot:

ggplot(gapminder_americas,               # only use data for Americas
       aes(x = year,
           y = gdpPercap,
           color = north_america)) +     # map "north_america" category to color
  geom_point()

22 / 33

Exercise

Plot the development of population size (pop variable in the gapminder data) over time (year variable in the gapminder data) in Asia (hint: continent == "Asia"). Add a trend line and/or smooth line.

Bonus exercise: Plot the relationship between population size pop and gdpPercap! (hint: might make sense to wrap pop and gdpPercap in log()).

23 / 33

Solution

library(tidyverse)
library(gapminder)

gapminder_asia <- gapminder %>%
  filter(continent == "Asia")

asia_pop <- ggplot(gapminder_asia,
                   aes(x = year, y = pop)) +
  geom_point() +
  geom_smooth(method = "lm")
print(asia_pop)
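
A possible sketch for the bonus exercise, assuming it also refers to Asia (the exercise does not say so explicitly). Both variables are wrapped in log() as the hint suggests. The object name asia_pop_gdp is an illustrative choice.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

gapminder_asia <- gapminder %>%
  filter(continent == "Asia")

asia_pop_gdp <- ggplot(gapminder_asia,
                       aes(x = log(gdpPercap), # transformations can go
                           y = log(pop))) +    # directly inside aes()
  geom_point() +
  geom_smooth(method = "lm")
print(asia_pop_gdp)
```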

24 / 33

Walkthrough Exercise

Goal:

25 / 33

What do we want to visualize?

Think about the data! What is the comparison?

Genocide vs. non-genocide countries => Rwanda vs. rest of Africa

library(gapminder)
library(tidyverse)

gapminder_africa <- gapminder %>%
  # filter only African countries
  filter(continent == "Africa") %>%
  # create a categorical variable that distinguishes
  # between Rwanda and other African countries
  mutate(color_plot = ifelse(country != "Rwanda", # "!=" means "not equal to"
                             "Other African Countries",
                             "Rwanda"))
26 / 33

Add geom_line() + map color/alpha

rwanda_plot <- ggplot(gapminder_africa,
                      aes(x = year,
                          y = lifeExp,
                          group = country,
                          color = color_plot)) +
  geom_line(aes(alpha = color_plot)) # map alpha to "color_plot" variable
                                     # ggplot chooses alpha level automatically
print(rwanda_plot)

27 / 33

Add color/alpha scales

rwanda_plot <- ggplot(gapminder_africa,
                      aes(x = year,
                          y = lifeExp,
                          group = country,
                          color = color_plot)) +
  geom_line(aes(alpha = color_plot)) +
  # we assign colors/alpha values/other "aes" through "scale" functions
  scale_alpha_discrete("", range = c(0.5, 1)) +
  scale_color_manual("", values = c("lightgrey", "black"))
print(rwanda_plot)
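
A note on scale_color_manual(): an unnamed values vector is matched to the categories in order, which for a character variable is alphabetical. "lightgrey" therefore goes to "Other African Countries" only because that label sorts before "Rwanda". A sketch of a more explicit variant uses a named vector. The names line_colors and rwanda_named are made up for this illustration, and the alpha mapping is left out to keep the example short.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

gapminder_africa <- gapminder %>%
  filter(continent == "Africa") %>%
  mutate(color_plot = ifelse(country != "Rwanda",
                             "Other African Countries",
                             "Rwanda"))

# names are the categories of color_plot, values are the colors
line_colors <- c("Rwanda" = "black",
                 "Other African Countries" = "lightgrey")

rwanda_named <- ggplot(gapminder_africa,
                       aes(x = year,
                           y = lifeExp,
                           group = country,
                           color = color_plot)) +
  geom_line() +
  scale_color_manual("", values = line_colors) # matched by name, not by order
print(rwanda_named)
```

With a named vector the order of the entries no longer matters, which helps when category labels change later.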

28 / 33

Manipulate appearance: add theme

rwanda_plot <- ggplot(gapminder_africa,
                      aes(x = year,
                          y = lifeExp,
                          group = country,
                          color = color_plot)) +
  geom_line(aes(alpha = color_plot)) +
  scale_alpha_discrete("", range = c(0.5, 1)) +
  scale_color_manual("", values = c("lightgrey", "black")) +
  # add theme
  theme_bw() +                        # black and white theme
  theme(legend.position = "bottom",   # legend position
        panel.grid = element_blank()) # remove grid lines
29 / 33

Manipulate appearance: add theme

print(rwanda_plot)

30 / 33

Manipulate appearance: change labels

rwanda_plot <- ggplot(gapminder_africa,
                      aes(x = year,
                          y = lifeExp,
                          group = country,
                          color = color_plot)) +
  geom_line(aes(alpha = color_plot)) +
  scale_alpha_discrete("", range = c(0.5, 1)) +
  scale_color_manual("", values = c("lightgrey", "black")) +
  theme_bw() +
  theme(legend.position = "bottom",
        panel.grid = element_blank()) +
  # labels, captions, and title/subtitle
  labs(x = "", y = "Life Expectancy in Years",
       title = "The Impact of Genocide on Life Expectancy",
       subtitle = "Life expectancy for newborns extrapolated from mortality rate in a given year.",
       caption = " Data source: gapminder.org")
31 / 33

Manipulate appearance: change labels

print(rwanda_plot)

32 / 33

Useful tips from the dataviz ninja

  1. Think hard about what you want to visualize!

  2. Don't use too many aesthetics - just use those that help you clarify your comparison!

  3. Trial and error is your friend!

    "If you are unsure of what each piece of code does, take advantage of ggplot's additive character. Working backwards from the bottom up, remove each + some_function(...) statement one at a time to see how the plot changes." - Kieran Healy

33 / 33
