1 Principles
1.1 There are always tradeoffs
A frequent tradeoff can be thought of as truthfulness vs. simplicity.
In other words, you will need to balance:
- Readability vs. “completeness”
- Concise vs. “attention-gabbing”
- Simplicity vs. other goals
If you drop outliers, for example, your chart’s readability will almost surely improve. But it could be less truthful.
1.2 Questions to ask yourself
- What chart type is appropriate for my situation?
- Can I try a few options?
- And is this a case where a (simple!) table would be more effective to communicate what you found?
- Did I spend at least a few minutes making some thoughtful design choice (amount of text, labels, annotations; color choice; font size) so that my chart is clear and reasonably self-contained?
- How much data would you want to display? …to convey your findings clearly & credibly?
- How much data is necessary?
- …to convince yourself that your story is truthful?
1.3 Back to readability vs. “completeness”
If you label a subset of your observations, then arguably some information “is lost”, unless you post your data.
You may agree that you should almost ever make graphs that look like this:
When you can make charts like this instead:
1.4 Example: Making scatterplots better
Starting in 2021, inflation increased in many countries and became a source of serious concern for citizens and politicians. One set of substantive debates dealt with this set of questions: should governments be blamed for excessive spending (and borrowing)? Were fiscal decisions responsible for inflation? Here, we’ll deal with one potential approach to designing visual exhibits which might faciliate some international comparisons.1
Let’s get some data (available via the Github repo)
library(tidyverse)
# Get fiscal data
<- read_csv("data_macro/imf-fiscalmonitor-apr2023.csv")
imf
# If you are just copying code from the website, use this:
# imf <- read_csv("https://raw.githubusercontent.com/zilinskyjan/DataViz/main/data_macro/imf-fiscalmonitor-apr2023.csv")
The following variables are available, all expressed as percentages of GDP:
unique(imf$variable)
[1] "General Government Expenditure" "Overall Balance"
[3] "Gross public debt"
Let’s view the first 3 rows to get a sense of the structure:
head(imf,3)
# A tibble: 3 × 12
variable country `2014` `2015` `2016` `2017` `2018` `2019` `2020` `2021`
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 General Gover… Austra… 36.9 37.4 37.4 36.9 37.0 39.1 44.6 42.2
2 General Gover… Austria 52.3 51.0 50.1 49.3 48.8 48.6 56.7 56.0
3 General Gover… Belgium 55.6 53.7 53.1 52.0 52.3 51.9 58.9 55.5
# ℹ 2 more variables: `2022` <dbl>, `2023` <dbl>
If we wanted each row to correspond to one country-year observation, we would run this snippet:
%>% pivot_longer(cols=`2014`:`2023`,
imf names_to = 'year') # this part is not necessary but it's useful
# A tibble: 1,080 × 4
variable country year value
<chr> <chr> <chr> <dbl>
1 General Government Expenditure Australia 2014 36.9
2 General Government Expenditure Australia 2015 37.4
3 General Government Expenditure Australia 2016 37.4
4 General Government Expenditure Australia 2017 36.9
5 General Government Expenditure Australia 2018 37.0
6 General Government Expenditure Australia 2019 39.1
7 General Government Expenditure Australia 2020 44.6
8 General Government Expenditure Australia 2021 42.2
9 General Government Expenditure Australia 2022 38.4
10 General Government Expenditure Australia 2023 39.0
# ℹ 1,070 more rows
But we’ll focus here on year 2022, so let’s simply create 3 informative columns:
# Reformat data to wide and keep only the latest year
<- imf %>% select(variable,country,`2022`) %>%
imf_wide2022 pivot_wider(names_from = variable, values_from = `2022`)
imf_wide2022
# A tibble: 36 × 4
country General Government Expendit…¹ `Overall Balance` `Gross public debt`
<chr> <dbl> <dbl> <dbl>
1 Australia 38.4 -3.30 55.7
2 Austria 52.4 -3.31 77.8
3 Belgium 53.7 -4.32 105.
4 Canada 41.5 -0.700 107.
5 Croatia 45.7 -0.943 67.5
6 Cyprus 39.9 2.26 86.5
7 Czechia 44.8 -3.59 42.3
8 Denmark 49.2 2.48 29.7
9 Estonia 40.2 -1.15 17.2
10 Finland 54.0 -1.86 74.8
# ℹ 26 more rows
# ℹ abbreviated name: ¹`General Government Expenditure`
Finally, we want to add inflation data to our fiscal data:
# Merge in inflation data:
<- read_csv("data_macro/inflation_WDI.csv")
inf <- inf %>% filter(year==2022)
inf2022
<- left_join(imf_wide2022,inf2022,by="country") econ2022
So we will work with the econ2022
data object for a moment.
This chart displays the relationship between public budget deficits and inflation.
# Make a simple plot:
%>%
econ2022 filter(country != "Norway") %>%
ggplot(aes(y=inflation,
x=`Overall Balance`,
label=country)) +
geom_point() +
geom_text() +
labs(title = "Economic situation in 2022",
x="Overall budget balance (% of GDP)", y= "Inflation (%)")
A few things to notice:
- We are not displaying Norway (can you check why?)
- We added an informative title to the scatterplot
- We made the questionable choice to use
geom_text()
which uses what we placed insideaes(... label=country)
.
The same data can be displayed this way; ggrepel::geom_text_repel()
is helpful in this context:
%>%
econ2022 filter(country != "Norway") %>%
ggplot(aes(y=inflation,
x=`Overall Balance`,
label=country)) +
geom_point() +
::geom_text_repel() +
ggrepellabs(title = "Economic situation in 2022",
x="Overall budget balance (% of GDP)", y= "Inflation (%)")
Or you can highlight a subset of subset of observations relevant for your analysis.
Let’s create a vector of country names:
<- c("Italy","Sweden","United States") subset
We’ll want to insert data=econ2022 %>% filter(country %in% subset)
into geom_text_repel
:
<- econ2022 %>%
scatter1 filter(country != "Norway") %>%
ggplot(aes(y=inflation,
x=`Overall Balance`)) +
geom_point() +
::geom_text_repel(data=econ2022 %>%
ggrepelfilter(country %in% subset),
aes(label=country),nudge_x=.75,nudge_y=.75) +
labs(title = "Economic situation in 2022",
x="Overall budget balance (% of GDP)", y= "Inflation (%)") +
theme_classic()
scatter1
Adding a layer and selecting attributes for a specific subset of points is a general, more widely applicable, approach:
+
scatter1 geom_point(data=econ2022 %>% filter(country %in% subset),
color="red")
Note also that I also that it could have been tempting to label the x-axis as showing “Public deficit” because we almost always talk about deficits. But that would have been misleading, given that only negative values on the x-axis would have denoted the deficit.
We see that deficit spending is not informative: moderately high inflation was common across OECD countries; neither large deficits, nor budget surpluses, were prognostic of better/worse outcomes.
1.4.1 Total government spending and (contemporaneous) inflation
What about total government spending? We can check:
%>%
econ2022 ggplot(aes(y=inflation,
x=`General Government Expenditure`)) +
geom_point() +
::geom_text_repel(data=econ2022 %>%
ggrepelfilter(country %in% c("Italy",
"Sweden",
"United States")),
aes(label=country),nudge_y=.5,nudge_x=-.5) +
labs(title = "Economic situation in 2022",
x="Public spending (% of GDP)", y= "Inflation (%)")
This would seem to suggest that higher (government) spending is not necessarily associate with faster price growth.
To be sure, a more careful analysis here would require looking it changes in government expenditures.
1.5 Inflation magnitudes
Here a few ways to show how sharply inflation increased between 2021 in 2022 in most OECD countries:
1.5.1 A basic dot plot
%>% filter(iso3c %in% econ2022$iso3c) %>%
inf filter(year>=2021) %>%
mutate(avg = mean(inflation), .by = country) %>%
ggplot(aes(x=inflation,
color= factor(year),
y=fct_reorder(country,avg))) +
geom_point() +
::geom_text_repel(aes(label=round(inflation,1)),
ggrepelshow.legend = FALSE) +
scale_color_brewer(palette = 2,type = "qual") +
labs(x="Inflation (%)",y="",color="Year")
1.5.2 A dumbbell chart
%>% filter(iso3c %in% econ2022$iso3c) %>%
inf filter(year>=2021) %>%
mutate(avg = mean(inflation), .by = country) %>%
ggplot(aes(x=inflation,
color= factor(year),
y=fct_reorder(country,avg))) +
geom_line(aes(group=country),color="grey50") +
geom_point(size=2) +
scale_color_brewer(palette = 2,type = "qual") +
labs(x="Inflation (%)",y="",color="Year")
Above we simply added a geom_line(aes(group=country),color="grey50")
and made sure that the line was plotted before the points were added for each country (for aesthetic reasons).
1.5.3 A standard chart
Or you could make a bar chart:
%>% filter(iso3c %in% econ2022$iso3c) %>%
inf filter(year>=2021) %>%
mutate(avg = mean(inflation), .by = country) %>%
ggplot(aes(x=inflation,
fill= factor(year),
y=fct_reorder(country,avg))) +
geom_col(position = position_dodge()) +
scale_fill_brewer(palette = 3,type = "qual") +
labs(x="Inflation (%)",y="",fill="Year",title = "Inflation in OECD countries")
Note that we:
- had to change
color
tofill
withinaes()
- we also had to make that change within
labs()
- changed
scale_fill_brewer
toscale_fill_brewer
- replaced
geom_point()
withgeom_col(position = position_dodge())
1.6 Exercise: Inflation and government spending
Use the provided datasets to explore whether an increase in government spending predicts (current or future) inflation.
Consider the following issues and provide brief justifications:
- Let the focal year of interest be T = 2022. Is it sensible to compare T with T-1? What if the fiscal effect materializes with a lag? And would it be appropriate to compare current spending to pre-pandemic spending?
- Should past spending be subtracted from “current” spending at time T? If your answer is yes, remember that you would be reporting differences expressed in percentage points. (Don’t slip, using the symbol “%” would be misleading…)
But also remember that evidence of this kind can inform factual debates but it wouldn’t settle the debate.↩︎