read_csv() vs. read.csv()
# Base R
data_base <- read.csv("my_data.csv")
# Tidyverse (readr)
library(tidyverse)
data_tidy <- read_csv("my_data.csv")
Why prefer read_csv()?
Tip: To save a data frame to disk, use write_csv(data, "out.csv").
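To make the comparison concrete, here is a minimal sketch that saves a small made-up data frame with write_csv() and reads it back with both functions. The toy object and the temporary file path are assumptions for illustration and are not part of the original data.

```r
library(readr)

# Hypothetical toy data; one column name deliberately contains spaces
toy <- data.frame(
  country = c("Germany", "France"),
  `Number of deaths` = c(10, 20),
  check.names = FALSE
)

# Write to a temporary file, then read it back both ways
path <- tempfile(fileext = ".csv")
write_csv(toy, path)

data_base <- read.csv(path)
data_tidy <- read_csv(path)

class(data_base)  # a plain data.frame
class(data_tidy)  # a tibble (which is also a data.frame)

names(data_base)  # read.csv() rewrites the header to a syntactic name
names(data_tidy)  # read_csv() keeps the header exactly as in the file
```

On this toy file, read.csv() turns the header into Number.of.deaths, while read_csv() keeps Number of deaths unchanged, so the column has to be referenced with backticks.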
head(data_tidy) # first 6 rows
tail(data_tidy) # last 6 rows
str(data_tidy) # structure: types & sample values
glimpse(data_tidy) # tidyverse-friendly structure
dim(data_tidy) # rows, columns
names(data_tidy) # column names
Why inspect early?
Sometimes column names contain spaces or punctuation. You can’t refer to them directly without backticks.
# Suppose “Number of deaths” was imported:
data$`Number of deaths`
# Better: rename immediately
data <- data %>%
  rename(NumberOfDeaths = `Number of deaths`)
Tip: Use janitor’s clean_names() to automatically convert all names to snake_case:
library(janitor)
data <- data %>% clean_names()
# “Number of deaths” → number_of_deaths
data <- data %>%
  rename(
    deaths_total = `Number of deaths`,
    country_code = CountryCode
  )
Rename multiple columns in one call using new_name = old_name syntax.
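As a runnable sketch of the new_name = old_name pattern, the example below builds a small made-up data frame (the values are hypothetical) and renames two columns in one call. It also shows rename_with(), which applies a function to every name; this is offered as a simple alternative for bulk renaming and is not a replacement for clean_names().

```r
library(dplyr)

# Hypothetical toy data with awkward column names
data <- data.frame(
  `Number of deaths` = c(100, 80),
  CountryCode = c("DEU", "FRA"),
  check.names = FALSE
)

# Explicit renaming: new_name = old_name
renamed <- data %>%
  rename(
    deaths_total = `Number of deaths`,
    country_code = CountryCode
  )
names(renamed)

# Bulk renaming: lowercase every name and replace spaces with underscores.
# Unlike clean_names(), this does not split "CountryCode" into words.
bulk <- data %>%
  rename_with(~ gsub(" ", "_", tolower(.x)))
names(bulk)
```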
# Simple one-key join
merged <- main_data %>%
  left_join(data_to_add, by = "country_year_id")
# Two-key join
merged <- main_data %>%
  left_join(data_to_add, by = c("country", "year"))
# Other types:
# inner_join(): only keep rows present in both
# right_join(): keep all from data_to_add
Tip: Before joining, ensure keys have the same type and values:
unique(main_data$country)
unique(data_to_add$country)
Tip: If one dataset uses “DEU” and the other “Germany,” recode or create a lookup table before joining.
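The lookup-table approach can be sketched as follows. All tables and values here are made up for illustration, and the lookup and country_name names are assumptions rather than anything from the original data.

```r
library(dplyr)

# Hypothetical toy tables: one uses codes, the other uses names
main_data <- data.frame(
  country = c("DEU", "FRA"),
  year = c(2020, 2020),
  deaths_total = c(100, 80)
)
data_to_add <- data.frame(
  country = c("Germany", "France"),
  year = c(2020, 2020),
  population = c(83000000, 67000000)
)

# Key values in main_data that never appear in data_to_add
setdiff(unique(main_data$country), unique(data_to_add$country))

# Rows of main_data that would find no match in a join
unmatched <- main_data %>%
  anti_join(data_to_add, by = c("country", "year"))

# Lookup table mapping country names to codes
lookup <- data.frame(
  country_name = c("Germany", "France"),
  country = c("DEU", "FRA")
)

# Swap the names for codes, then join on both keys
data_to_add_recoded <- data_to_add %>%
  rename(country_name = country) %>%
  left_join(lookup, by = "country_name") %>%
  select(-country_name)

merged <- main_data %>%
  left_join(data_to_add_recoded, by = c("country", "year"))
```

Running anti_join() before the real join is a cheap way to see which rows would end up with NA in the added columns.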
This is how we would stack observations:
# Recommended
total <- bind_rows(data_for_germany, data_for_france)
# Base R equivalent:
total2 <- rbind(data_for_germany, data_for_france)
Note: bind_rows() will fill in missing columns with NA if one data frame has extra columns.
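A small sketch of that difference, using made-up data frames in which only one has a deaths_total column (the values are hypothetical). With mismatched columns, bind_rows() pads with NA, whereas rbind() on data frames stops with an error, which is caught here so the script keeps running.

```r
library(dplyr)

# Hypothetical toy data: the France table lacks deaths_total
data_for_germany <- data.frame(
  country = "Germany",
  year = 2020,
  deaths_total = 100
)
data_for_france <- data.frame(
  country = "France",
  year = 2020
)

# bind_rows() matches columns by name and fills the gap with NA
total <- bind_rows(data_for_germany, data_for_france)
total

# rbind() requires the same columns in both data frames;
# capture the error message instead of stopping
rbind_result <- tryCatch(
  rbind(data_for_germany, data_for_france),
  error = function(e) conditionMessage(e)
)
rbind_result
```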
%>%
result <- data %>%
  filter(year >= 2000) %>%
  select(country, year, deaths_total) %>%
  arrange(desc(deaths_total))
mutate() to create or transform columns
data <- data %>%
  mutate(
    deaths_per_100k = deaths_total / population * 100000,
    log_deaths = log(deaths_total)
  )
group_by() + summarise()
summary <- data %>%
  group_by(country) %>%
  summarise(
    total_deaths = sum(deaths_total, na.rm = TRUE),
    avg_deaths = mean(deaths_total, na.rm = TRUE)
  )
data %>%
  add_count(country, year) %>%
  filter(n > 1)
data <- data %>%
  mutate(
    country = as_factor(country),
    date = lubridate::ymd(date_string)
  )
glimpse() for a prettier, horizontally oriented overview.
See Grant McDermott’s excellent slides