class: center, middle, inverse, title-slide # Data Visualization for Political Scientists ## Session 3 - Facets and Small Multiples
### Felix Haass ### 15 Januar 2018 --- # facets `faceted` plots (or small multiple plots) are a way to divide your data up by a categorical variable. Facets are "not a geom, but rather a way of organizing a series of geoms" ([Kieran Healy](http://socviz.co/groupfacettx.html#facet-to-make-small-multiples)). <img src="session3_facets_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" /> --- # facets: think about the comparison! GDP/pc development by *continent*. In ggplot, we use the `facet_wrap()` building block to specify the faceting variable(s). ```r p <- ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap)) + geom_line(color="gray70", aes(group = country)) + # recall we need to map group to country facet_wrap(~ continent, # "~" ncol = 5) # how many columns? ``` --- # facets: think about the comparison! ```r print(p) ``` <img src="session3_facets_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" /> --- # facets: all elements ```r library(tidyverse) library(gapminder) p <- ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap)) + geom_line(color="gray70", aes(group = country)) + # add smoother geom_smooth(size = 1.1, method = "loess", se = FALSE) + # log y axis (could've also wrapped y=log(gdpPercap) in aes() above) scale_y_log10(labels=scales::dollar) + # facet command facet_wrap(~ continent, ncol = 5) + # labels and appearance tweaks labs(x = "Year", y = "GDP per capita", title = "GDP per capita on Five Continents") + theme_bw() + theme(axis.text.x = element_text(size = 5)) ``` --- # facets: all elements ```r print(p) ``` <img src="session3_facets_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" /> --- # facets: more applications Recall our example: relationship between GDP per capita and population in Asia. ```r gdp_pop_plot <- ggplot(gapminder %>% filter(continent == "Asia"), aes(x = log(pop), y = log(gdpPercap))) + geom_point(alpha = 0.5, size = 2) + geom_smooth(method = "lm") print(gdp_pop_plot) ``` <img src="session3_facets_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Really a negative relationship? --- # facets: more applications Plot regression line by country (without facets) ```r gdp_pop_plot <- ggplot(gapminder %>% filter(continent == "Asia"), aes(x = log(pop), y = log(gdpPercap))) + geom_point(aes(color = country), alpha = 0.5, size = 2) + geom_smooth(aes(fill = country, color = country), method = "lm") + theme(legend.position = "none") ``` --- # facets: more applications <img src="session3_facets_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" /> --- # facets: more applications Prior plot useful for MoMA, but not for the data analyst. How to do better? facets! ```r gapminder_asia <- gapminder %>% filter(continent == "Asia") gdp_pop_plot <- ggplot(gapminder_asia, aes(x = log(pop), y = log(gdpPercap))) + geom_point(alpha = 0.5, size = 1) + geom_smooth(method = "lm", size = 0.7) + facet_wrap(~ country, scales = "free") + # scales = "free" to vary axis limits + theme_bw() + theme(axis.text = element_text(size = 4), strip.text = element_text(size = 6)) ``` --- # facets: more applications <img src="session3_facets_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" /> --- class: inverse background-image: url("Ninja-header.svg_opacity1.png") background-size: contain # Useful tips from the dataviz ninja 1. Think hard about *what* you want to visualize! 2. Don't use too many aesthetics - just use those that help you clarify your comparison! 3. Trial and error is your friend! 4. **Alphabet is the least useful ways to organize information.** --- # facets: order by summary statistic ```r library(forcats) # useful to reorder factors or ordered categorical variables gapminder_asia <- gapminder %>% filter(continent == "Asia") %>% # do all data manipulation by country group_by(country) %>% # extract beta coefficient from reg of GDP on pop mutate(beta = coef(lm(log(gdpPercap) ~ log(pop)))[2]) %>% # remove country grouping ungroup() %>% # sort "country" variable by beta mutate(country_order = fct_reorder(country, beta)) head(gapminder_asia, 5) ``` ``` ## # A tibble: 5 x 8 ## country continent year lifeExp pop gdpPercap beta ## <fctr> <fctr> <int> <dbl> <int> <dbl> <dbl> ## 1 Afghanistan Asia 1952 28.801 8425333 779.4453 -0.03799305 ## 2 Afghanistan Asia 1957 30.332 9240934 820.8530 -0.03799305 ## 3 Afghanistan Asia 1962 31.997 10267083 853.1007 -0.03799305 ## 4 Afghanistan Asia 1967 34.020 11537966 836.1971 -0.03799305 ## 5 Afghanistan Asia 1972 36.088 13079460 739.9811 -0.03799305 ## # ... with 1 more variables: country_order <fctr> ``` --- # facets: order by summary statistic <img src="session3_facets_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" /> --- # facets: Exercise Plot the relationship between `gdpPercap` and `lifeExp` in the Americas, faceted by `country`. Bonus: sort `country` by the direction + strength of the relationship between `gdpPercap` and `lifeExp` What is surprising? --- # facets: Exercise Solution ```r gapminder_americas <- gapminder %>% filter(continent == "Americas") %>% group_by(country) %>% mutate(beta = coef(lm(lifeExp ~ log(gdpPercap)))[2]) %>% ungroup() %>% mutate(country_order = fct_reorder(country, beta)) gdp_lifeexp_americas <- ggplot(gapminder_americas, aes(x = log(gdpPercap), y = lifeExp)) + geom_point(alpha = 0.5, size = 1) + geom_smooth(method = "lm", size = 0.7) + facet_wrap(~ country_order, # facet by "country_order"! scales = "free") + # scales = "free" to vary axis limits + theme_bw() + theme(axis.text = element_text(size = 4), strip.text = element_text(size = 6)) ``` --- # facets: Exercise Solution <img src="session3_facets_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />