Felix Haass
ABI Freiburg / GIGA Hamburg
Political economy; conflict, democratization, United Nations
Briefly introduce yourself!
Your Name
Your Research
Your Motivation for Participation in the Workshop
ggplot2: the R package to create informative data visualizations
"You're doing it right if you get frustrated: if you're not frustrated, you're (probably) not stretching yourself mentally"
Hadley Wickham (@hadleywickham), 11 February 2015
Time | Topic |
---|---|
9.30-10.00 | Session 1: Introduction and R refresher |
10.00-10.30 | Session 1 continued: Reading data, project organization |
10.30-10.45 | Coffee Break |
10.45-12.15 | Session 2: The logic of a grammar of graphics & its implementation in ggplot2 |
12.15-13.15 | Lunch Break |
13.15-14.45 | Session 3: Use-Case I - Facets and small multiples; sorting facets |
14.45-15.00 | Coffee Break |
15.00-16.00 | Session 4: Use-Case II - Coefficient plots |
16.00-17.00 | Session 5: Wrap-up - Exporting plots; questions; where to get help |
R is a programming language for statistical analysis.
RStudio is the interactive software with which we write and execute R code, plot things, view the R memory environment (…and much more).
R uses different libraries or packages to load specific functions (read Excel files, talk to Twitter, generate plots, …). Packages are available from CRAN: https://cran.r-project.org/. You load a package or a library with the command:
library(readxl) # readxl is the package name (without quotation marks); read_excel() is one of its functions
If a command throws an error, chances are you either
To install a package we use:
install.packages("gapminder") # with quotation marks!
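Installing a package only needs to happen once, while library() is needed in every new R session. A minimal sketch of a helper that installs only what is missing; the name install_if_missing() is hypothetical and not part of any package:

```r
# Hypothetical helper: install only those packages that are not installed yet
install_if_missing <- function(pkgs) {
  # requireNamespace() quietly returns FALSE if a package is not available
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!is_installed]
  if (length(missing) > 0) install.packages(missing)
  invisible(missing)
}

# Usage (needs an internet connection if anything is missing):
# install_if_missing(c("gapminder", "readxl", "tidyverse"))
```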
In R, we assign stuff (numbers, characters, data frames) to things (objects)
url <- "http://gmi.bicc.de/index.php?page=ranking-table"
url: the object, in this case a character vector
"http://gmi.bicc.de/index.php?page=ranking-table": the “stuff” (a URL here, but it could be any text or number)
<-: the assignment command; type < and - (shortcut: Alt + - in RStudio)
In R, everything is an object, and you can have multiple objects in your memory at the same time!
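A small sketch of assignment and inspection: class() reports what kind of object a name holds, and ls() lists the objects currently in memory (what RStudio shows in the Environment pane).

```r
# Assign "stuff" to named objects with <-
url <- "http://gmi.bicc.de/index.php?page=ranking-table"
numbers <- 1:5

# Each object has a class that tells you what kind of thing it is
class(url)      # "character"
class(numbers)  # "integer"

# List all objects currently in memory
ls()
```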
# 1st object: assign numbers to a vector
numbers <- 1:5
# 2nd object: read data from an Excel sheet with read_excel() from the readxl package
library(readxl)
sipri <- read_excel("./data/SIPRI-Milex-data-1949-2016_cleaned.xlsx",
sheet = 5,
na = c("xxx", ". ."))
Executing these commands yields two objects in memory: numbers, the vector of numbers, and sipri, the data frame.
Data frames are rectangular data tables, like an Excel spreadsheet.
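To see that structure without any external file, here is a tiny data frame built by hand. The countries and values are made up for illustration only: each column is a variable, each row an observation.

```r
# A small, made-up data frame (hypothetical countries and values)
toy <- data.frame(
  country = c("Country A", "Country A", "Country B"),
  year    = c(2015L, 2016L, 2015L),
  milex   = c(1.2, 1.5, 3.4)
)

dim(toy)    # 3 rows, 3 columns
names(toy)  # "country" "year" "milex"
toy$milex   # extract one column as a vector with $
```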
library(gapminder)
library(tidyverse)
gapminder
## # A tibble: 1,704 x 6
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779
## 2 Afghanistan Asia 1957 30.3 9240934 821
## 3 Afghanistan Asia 1962 32.0 10267083 853
## 4 Afghanistan Asia 1967 34.0 11537966 836
## 5 Afghanistan Asia 1972 36.1 13079460 740
## 6 Afghanistan Asia 1977 38.4 14880372 786
## 7 Afghanistan Asia 1982 39.9 12881816 978
## 8 Afghanistan Asia 1987 40.8 13867957 852
## 9 Afghanistan Asia 1992 41.7 16317921 649
## 10 Afghanistan Asia 1997 41.8 22227415 635
## # ... with 1,694 more rows
Packages: library() (load) them or install.packages() them!
Use <- for assignments!
Check what kind of object you have with class()!
Use help(command_name) if you can’t remember a command’s options.
Having a structured way to organize your R code is useful for reproducibility (and your future sanity!)
There are two ways to improve your R code organization: a consistent folder structure and RStudio projects.
A useful way to organize your project folders:
project_name/          # name of your project
|-- code/              # here go all the .R script files
|-- data/              # here's your data
|   |-- input/         # raw input data files (experimental results, existing datasets)
|   |-- output/        # transformed and cleaned datasets for analysis
|-- manuscript/        # your manuscript, e.g. .docx or LaTeX files
|   |-- figures/       # your figures as separate files
|   |-- output/        # tables
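The folder skeleton can also be created from R itself. A minimal sketch; create_project_dirs() is a hypothetical helper, and it assumes input/ and output/ sit inside data/, while figures/ and a tables output/ folder sit inside manuscript/:

```r
# Hypothetical helper: create the project folder skeleton under `root`
create_project_dirs <- function(root = "project_name") {
  dirs <- file.path(root, c("code",
                            "data/input",
                            "data/output",
                            "manuscript/figures",
                            "manuscript/output"))
  # recursive = TRUE also creates missing parent folders (e.g. data/)
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  invisible(dirs)
}

# Usage (creates the folders inside the current working directory):
# create_project_dirs("project_name")
```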
An RStudio project takes care of several useful steps in your project. When you load an RStudio project, the following steps are taken:
In RStudio, go to File => New Project => “Existing Directory”
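Because an open RStudio project sets the working directory to the project folder, file paths can be written relative to it and keep working on other computers. A short sketch; data/input/my_data.csv is a hypothetical file name:

```r
# Inside an RStudio project, this is the project folder
getwd()

# Build a relative path piece by piece; file.path() inserts the separators
input_file <- file.path("data", "input", "my_data.csv")
input_file  # "data/input/my_data.csv"
```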
To read .csv files, the read_csv() function in the readr package is useful (it is loaded automatically through library(tidyverse)).
To read Excel files, use the read_excel() function from the readxl package, which needs to be loaded separately.
To read files from Stata or SPSS, use read_dta() or read_spss() from the haven package, which needs to be loaded separately.
Example:
library(tidyverse)
library(readxl)
sipri <- read_excel("./data/SIPRI-Milex-data-1949-2016_cleaned.xlsx",
sheet = 5, na = c("xxx", ". ."))
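A self-contained sketch of read_csv() that needs no external data: it writes a tiny, made-up CSV file to a temporary location and reads it back. Reading Excel or Stata files follows the same pattern with read_excel() or read_dta().

```r
library(readr)  # also attached by library(tidyverse)

# Write a small, made-up CSV file to a temporary location
csv_file <- tempfile(fileext = ".csv")
writeLines(c("country,year,milex",
             "Country A,2015,1.2",
             "Country B,2015,3.4"),
           csv_file)

# read_csv() returns a tibble and guesses each column's type
toy <- read_csv(csv_file)
toy
```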
To read R data files (.rda or .rdata), simply use load("name_of_my_file.rda").
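A round trip with base R's save() and load(), using a temporary file so nothing is left behind:

```r
# Save an object to an .rda file (here: a temporary file)
numbers <- 1:5
rda_file <- tempfile(fileext = ".rda")
save(numbers, file = rda_file)

# Remove the object, then restore it from disk
rm(numbers)
loaded <- load(rda_file)  # load() invisibly returns the names of restored objects
loaded   # "numbers"
numbers  # 1 2 3 4 5
```

Note that load() restores objects under the names they were saved with, so it can overwrite an existing object of the same name.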
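Also useful: the rio package, whose import() function reads many file formats (.csv, Excel, Stata, SPSS, …) with a single command!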
library(tidyverse)
library(readxl)
sipri <- read_excel("./data/SIPRI-Milex-data-1949-2016_cleaned.xlsx",
sheet = 5, na = c("xxx", ". ."))
sipri_plot <- sipri %>%
  # from wide to long format with the gather() function;
  # convert = TRUE turns the year column names into numbers
  gather(key = year,
         value = military_expenditure,
         -Country,
         convert = TRUE) %>%
  ggplot(., aes(x = year,
                y = military_expenditure,
                group = Country)) +
  geom_line(alpha = 0.5)
print(sipri_plot)
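To see what gather() does without the SIPRI file, here is a minimal sketch with a made-up wide table (hypothetical countries and values). It assumes the SIPRI sheet has the same shape: one Country column plus one column per year.

```r
library(tidyverse)

# Made-up wide data: one row per country, one column per year
wide <- tibble(
  Country = c("Country A", "Country B"),
  `2015`  = c(1.2, 3.4),
  `2016`  = c(1.5, 3.1)
)

# Year columns become key/value pairs; convert = TRUE makes year numeric
long <- gather(wide, key = year, value = military_expenditure,
               -Country, convert = TRUE)
long  # 4 rows: Country, year, military_expenditure

toy_plot <- ggplot(long, aes(x = year, y = military_expenditure,
                             group = Country)) +
  geom_line(alpha = 0.5)
print(toy_plot)
```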