1  Principles

1.1 There are always tradeoffs

A frequent tradeoff can be thought of as truthfulness vs. simplicity.

In other words, you will need to balance:

  • Readability vs. “completeness”
  • Conciseness vs. “attention-grabbing”
  • Simplicity vs. other goals

If you drop outliers, for example, your chart’s readability will almost surely improve, but the chart may also become less truthful.

1.2 Questions to ask yourself

  1. What chart type is appropriate for my situation?
    • Can I try a few options?
    • Is this a case where a (simple!) table would communicate my findings more effectively?
  2. Did I spend at least a few minutes making thoughtful design choices (amount of text, labels, annotations; color choice; font size) so that my chart is clear and reasonably self-contained?
  3. How much data is necessary?
    • …to convince myself that my story is truthful?
    • …to convey my findings clearly and credibly? (And how much of it would I want to display?)

1.3 Back to readability vs. “completeness”

If you label a subset of your observations, then arguably some information “is lost”, unless you post your data.

You may agree that you should almost never make graphs that look like this:

When you can make charts like this instead:

1.4 Example: Making scatterplots better

Starting in 2021, inflation increased in many countries and became a source of serious concern for citizens and politicians. One set of substantive debates revolved around these questions: Should governments be blamed for excessive spending (and borrowing)? Were fiscal decisions responsible for inflation? Here, we’ll explore one potential approach to designing visual exhibits that might facilitate some international comparisons.[1]

Let’s get some data (available via the GitHub repo):

library(tidyverse)
# Get fiscal data
imf <- read_csv("data_macro/imf-fiscalmonitor-apr2023.csv")

# If you are just copying code from the website, use this:
# imf <- read_csv("https://raw.githubusercontent.com/zilinskyjan/DataViz/main/data_macro/imf-fiscalmonitor-apr2023.csv")

The following variables are available, all expressed as percentages of GDP:

unique(imf$variable)
[1] "General Government Expenditure" "Overall Balance"               
[3] "Gross public debt"             

Let’s view the first 3 rows to get a sense of the structure:

head(imf,3)
# A tibble: 3 × 12
  variable       country `2014` `2015` `2016` `2017` `2018` `2019` `2020` `2021`
  <chr>          <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 General Gover… Austra…   36.9   37.4   37.4   36.9   37.0   39.1   44.6   42.2
2 General Gover… Austria   52.3   51.0   50.1   49.3   48.8   48.6   56.7   56.0
3 General Gover… Belgium   55.6   53.7   53.1   52.0   52.3   51.9   58.9   55.5
# ℹ 2 more variables: `2022` <dbl>, `2023` <dbl>

If we wanted each row to correspond to one country-year observation, we would run this snippet:

imf %>% pivot_longer(cols=`2014`:`2023`,
                     names_to = 'year')  # this part is not necessary but it's useful
# A tibble: 1,080 × 4
   variable                       country   year  value
   <chr>                          <chr>     <chr> <dbl>
 1 General Government Expenditure Australia 2014   36.9
 2 General Government Expenditure Australia 2015   37.4
 3 General Government Expenditure Australia 2016   37.4
 4 General Government Expenditure Australia 2017   36.9
 5 General Government Expenditure Australia 2018   37.0
 6 General Government Expenditure Australia 2019   39.1
 7 General Government Expenditure Australia 2020   44.6
 8 General Government Expenditure Australia 2021   42.2
 9 General Government Expenditure Australia 2022   38.4
10 General Government Expenditure Australia 2023   39.0
# ℹ 1,070 more rows
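Note that the year column above is stored as text (<chr>), because it was created from column names. A minimal sketch, using only Australia’s row (values copied from the printed output above, rounded to one decimal), of how the names_transform argument of tidyr::pivot_longer() can convert it to an integer, which makes filtering and arithmetic on years simpler:

```r
library(tibble)
library(tidyr)  # also re-exports the %>% pipe

# Australia's general government expenditure (% of GDP), as printed above
aus <- tibble(
  variable = "General Government Expenditure",
  country  = "Australia",
  `2014` = 36.9, `2015` = 37.4, `2016` = 37.4, `2017` = 36.9, `2018` = 37.0,
  `2019` = 39.1, `2020` = 44.6, `2021` = 42.2, `2022` = 38.4, `2023` = 39.0
)

# names_transform turns the former column names ("2014", ...) into integers,
# so `year` becomes <int> instead of <chr>
aus_long <- aus %>%
  pivot_longer(cols = `2014`:`2023`,
               names_to = "year",
               names_transform = list(year = as.integer))
aus_long
```

With an integer year, comparisons such as filter(year >= 2021) behave numerically rather than as string comparisons.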

But we’ll focus here on 2022, so let’s simply create three informative columns, one per variable:

# Reformat data to wide and keep only the latest year
imf_wide2022 <- imf %>% select(variable,country,`2022`) %>%
  pivot_wider(names_from = variable, values_from = `2022`)
imf_wide2022
# A tibble: 36 × 4
   country   General Government Expendit…¹ `Overall Balance` `Gross public debt`
   <chr>                             <dbl>             <dbl>               <dbl>
 1 Australia                          38.4            -3.30                 55.7
 2 Austria                            52.4            -3.31                 77.8
 3 Belgium                            53.7            -4.32                105. 
 4 Canada                             41.5            -0.700               107. 
 5 Croatia                            45.7            -0.943                67.5
 6 Cyprus                             39.9             2.26                 86.5
 7 Czechia                            44.8            -3.59                 42.3
 8 Denmark                            49.2             2.48                 29.7
 9 Estonia                            40.2            -1.15                 17.2
10 Finland                            54.0            -1.86                 74.8
# ℹ 26 more rows
# ℹ abbreviated name: ¹​`General Government Expenditure`
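Because these column names contain spaces, every later reference needs backticks (e.g. x=`Overall Balance`). One option, sketched here on the first three rows printed above (values as displayed, so rounded), is dplyr::rename(); the short names spending, balance and debt are illustrative choices, not names used elsewhere in this section:

```r
library(dplyr)
library(tibble)

# First three rows of imf_wide2022, as printed above (rounded values)
wide <- tibble(
  country = c("Australia", "Austria", "Belgium"),
  `General Government Expenditure` = c(38.4, 52.4, 53.7),
  `Overall Balance` = c(-3.30, -3.31, -4.32),
  `Gross public debt` = c(55.7, 77.8, 105)
)

# rename(new_name = old_name): syntactic names need no backticks later on
wide_short <- wide %>%
  rename(spending = `General Government Expenditure`,
         balance  = `Overall Balance`,
         debt     = `Gross public debt`)
names(wide_short)
```

The trade-off is that any code written against the original names (as in the plots that follow) would need to be updated too.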

Finally, we want to add inflation data to our fiscal data:

# Merge in inflation data:
inf <- read_csv("data_macro/inflation_WDI.csv")
inf2022 <- inf %>% filter(year==2022)

econ2022 <- left_join(imf_wide2022,inf2022,by="country")

So we will work with the econ2022 data object for a moment.
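A left_join() keeps every row of imf_wide2022, even for countries with no match in inf2022; those rows simply get NA in the inflation columns. Before plotting, it may be worth listing unmatched countries with dplyr::anti_join(). The key sets below are hypothetical, chosen only to illustrate a spelling mismatch; with the real data, the equivalent call would be anti_join(imf_wide2022, inf2022, by = "country"):

```r
library(dplyr)
library(tibble)

# Hypothetical keys for illustration only: one source might spell a country
# differently from the other ("Czechia" vs "Czech Republic")
fiscal <- tibble(country = c("Australia", "Czechia", "Denmark"))
inflation_keys <- tibble(country = c("Australia", "Czech Republic", "Denmark"))

# Rows of `fiscal` that have no match in `inflation_keys`
unmatched <- anti_join(fiscal, inflation_keys, by = "country")
unmatched

# After a left_join(), the unmatched country is kept but gets NA
joined <- left_join(fiscal, mutate(inflation_keys, matched = TRUE),
                    by = "country")
joined
```

Countries that fail to match would silently drop out of the scatterplots below (ggplot2 removes rows with missing values, with a warning), so this check can save some confusion.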

This chart displays the relationship between public budget balances (deficits or surpluses) and inflation.

# Make a simple plot:
econ2022 %>%
  filter(country != "Norway") %>%
  ggplot(aes(y=inflation,
         x=`Overall Balance`,
         label=country)) + 
  geom_point() +
  geom_text() +
  labs(title = "Economic situation in 2022",
        x="Overall budget balance (% of GDP)", y= "Inflation (%)")

A few things to notice:

  • We are not displaying Norway (can you check why?)
  • We added an informative title to the scatterplot
  • We made the questionable choice to use geom_text(), which draws the labels we mapped via aes(... label=country); with many countries, these labels overlap and become hard to read.

The same data can be displayed this way; ggrepel::geom_text_repel() is helpful in this context because it pushes labels apart so that they do not overlap each other or the points:

econ2022 %>%
  filter(country != "Norway") %>%
  ggplot(aes(y=inflation,
         x=`Overall Balance`,
         label=country)) + 
  geom_point() +
  ggrepel::geom_text_repel() +
  labs(title = "Economic situation in 2022",
        x="Overall budget balance (% of GDP)", y= "Inflation (%)")

Or you can highlight a subset of observations relevant for your analysis.

Let’s create a vector of country names:

subset <- c("Italy","Sweden","United States")

We’ll pass data=econ2022 %>% filter(country %in% subset) to geom_text_repel(), so that only these countries get labels:

scatter1 <- econ2022 %>%
  filter(country != "Norway") %>%
  ggplot(aes(y=inflation,
         x=`Overall Balance`)) + 
  geom_point() +
  ggrepel::geom_text_repel(data=econ2022 %>% 
                             filter(country %in% subset),
           aes(label=country),nudge_x=.75,nudge_y=.75) +
  labs(title = "Economic situation in 2022",
        x="Overall budget balance (% of GDP)", y= "Inflation (%)") +
  theme_classic()

scatter1

Adding a layer with its own data and attributes for a specific subset of points is a general approach, applicable well beyond labels:

scatter1 +
  geom_point(data=econ2022 %>% filter(country %in% subset),
             color="red")
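A common alternative (a different technique from the extra layer above) is to compute a logical flag and map it to color within a single layer. This sketch uses the 2022 fiscal values from the imf_wide2022 printout (spending against balance, since the inflation values are not printed in this section); the highlighted countries are an arbitrary illustrative choice:

```r
library(dplyr)
library(ggplot2)
library(tibble)

# 2022 fiscal data (% of GDP), copied from the imf_wide2022 printout above
fiscal2022 <- tibble(
  country  = c("Australia", "Austria", "Belgium", "Canada", "Croatia",
               "Cyprus", "Czechia", "Denmark", "Estonia", "Finland"),
  spending = c(38.4, 52.4, 53.7, 41.5, 45.7, 39.9, 44.8, 49.2, 40.2, 54.0),
  balance  = c(-3.30, -3.31, -4.32, -0.700, -0.943, 2.26, -3.59, 2.48,
               -1.15, -1.86)
)
highlighted <- c("Denmark", "Finland")  # illustrative choice

# TRUE/FALSE flag mapped to color; the manual scale fixes the two colors
p <- fiscal2022 %>%
  mutate(highlight = country %in% highlighted) %>%
  ggplot(aes(x = balance, y = spending, color = highlight)) +
  geom_point() +
  scale_color_manual(values = c(`FALSE` = "grey60", `TRUE` = "red"),
                     guide = "none") +
  labs(x = "Overall budget balance (% of GDP)",
       y = "Public spending (% of GDP)")
# print(p) to display the chart
```

The flag approach keeps all points in one layer (so highlighted points are not drawn twice), while the extra-layer approach lets you change size, shape or labels for the subset without touching the base layer.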

Note also that it could have been tempting to label the x-axis as showing “Public deficit”, because we almost always talk about deficits. But that would have been misleading: only negative values on the x-axis denote a deficit, while positive values denote a surplus.

We see that the budget balance is not very informative here: moderately high inflation was common across OECD countries, and neither large deficits nor budget surpluses were associated with noticeably better or worse inflation outcomes.
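To put a number on a visual impression like this, one option is a correlation coefficient. Below is a minimal sketch with a hypothetical helper, association(), that reports Pearson’s r and the number of complete pairs. It is demonstrated on the printed 2022 fiscal columns; with the full data one might call association(filter(econ2022, country != "Norway"), "Overall Balance", "inflation"):

```r
library(tibble)

# Hypothetical helper (not from the original text): Pearson correlation
# between two numeric columns, using only rows where both are non-missing
association <- function(df, x, y) {
  ok <- stats::complete.cases(df[[x]], df[[y]])
  list(n = sum(ok),
       r = stats::cor(df[[x]][ok], df[[y]][ok]))
}

# 2022 fiscal values (% of GDP) copied from the imf_wide2022 printout above
fiscal2022 <- tibble(
  spending = c(38.4, 52.4, 53.7, 41.5, 45.7, 39.9, 44.8, 49.2, 40.2, 54.0),
  balance  = c(-3.30, -3.31, -4.32, -0.700, -0.943, 2.26, -3.59, 2.48,
               -1.15, -1.86)
)
association(fiscal2022, "balance", "spending")
```

With a few dozen countries, r is a descriptive summary of the scatterplot, not evidence about causes; it should be read alongside the chart rather than instead of it.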

1.4.1 Total government spending and (contemporaneous) inflation

What about total government spending? We can check:

econ2022 %>%
  ggplot(aes(y=inflation,
         x=`General Government Expenditure`)) + 
  geom_point() +
  ggrepel::geom_text_repel(data=econ2022 %>% 
                             filter(country %in% c("Italy",
                                                   "Sweden",
                                                   "United States")),
           aes(label=country),nudge_y=.5,nudge_x=-.5) +
  labs(title = "Economic situation in 2022",
        x="Public spending (% of GDP)", y= "Inflation (%)") 

This would seem to suggest that higher (government) spending is not necessarily associated with faster price growth.

To be sure, a more careful analysis here would require looking at changes in government expenditures.
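Such changes can be computed from the long format with dplyr::lag(). A minimal sketch using Australia’s expenditure series from the long-format printout above; since both values are shares of GDP, the difference is in percentage points, not percent:

```r
library(dplyr)
library(tibble)

# Australia's general government expenditure (% of GDP), as printed above
aus_long <- tibble(
  year  = 2014:2023,
  value = c(36.9, 37.4, 37.4, 36.9, 37.0, 39.1, 44.6, 42.2, 38.4, 39.0)
)

# Year-over-year change in percentage points; arrange() first so that
# lag() returns the previous year's value
aus_change <- aus_long %>%
  arrange(year) %>%
  mutate(change_pp = value - dplyr::lag(value))
aus_change
```

With several countries in one long table, adding .by = country to mutate() (as in the inflation charts below) would compute the lag within each country instead of across country boundaries.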

1.5 Inflation magnitudes

Here are a few ways to show how sharply inflation increased between 2021 and 2022 in most OECD countries:

1.5.1 A basic dot plot

inf %>% filter(iso3c %in% econ2022$iso3c) %>%
  filter(year>=2021) %>%
  mutate(avg = mean(inflation), .by = country) %>%
  ggplot(aes(x=inflation,
             color= factor(year),
             y=fct_reorder(country,avg))) +
  geom_point() +
  ggrepel::geom_text_repel(aes(label=round(inflation,1)),
                           show.legend = FALSE) +
  scale_color_brewer(palette = 2,type = "qual") +
  labs(x="Inflation (%)",y="",color="Year")

1.5.2 A dumbbell chart

inf %>% filter(iso3c %in% econ2022$iso3c) %>%
  filter(year>=2021) %>%
  mutate(avg = mean(inflation), .by = country) %>%
  ggplot(aes(x=inflation,
             color= factor(year),
             y=fct_reorder(country,avg))) +
  geom_line(aes(group=country),color="grey50") +
  geom_point(size=2) +
  scale_color_brewer(palette = 2,type = "qual") +
  labs(x="Inflation (%)",y="",color="Year")

Above, we simply added geom_line(aes(group=country),color="grey50") and placed it before geom_point(), so that the points are drawn on top of the lines (for aesthetic reasons).
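An alternative way to build a dumbbell chart (a different technique from the geom_line() approach above) starts from wide data, with one row per country, and draws each connector with geom_segment(). This sketch uses government expenditure for 2021 and 2022 from the printouts above as a stand-in, since the inflation values are not printed here; the two colors are assumed to be the first two of the ColorBrewer Dark2 palette:

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Spending (% of GDP) in 2021 and 2022, copied from the printouts above
spend <- tibble(
  country = c("Australia", "Austria", "Belgium"),
  y2021   = c(42.2, 56.0, 55.5),
  y2022   = c(38.4, 52.4, 53.7)
) %>%
  # order countries by their two-year average, like fct_reorder(country, avg)
  mutate(country = reorder(country, (y2021 + y2022) / 2))

p_dumbbell <- ggplot(spend, aes(y = country)) +
  # segment first, so that the points are drawn on top of it
  geom_segment(aes(x = y2021, xend = y2022, yend = country),
               color = "grey50") +
  # mapping a constant string to color creates a legend entry per year
  geom_point(aes(x = y2021, color = "2021"), size = 2) +
  geom_point(aes(x = y2022, color = "2022"), size = 2) +
  scale_color_manual(values = c("2021" = "#1b9e77", "2022" = "#d95f02")) +
  labs(x = "Public spending (% of GDP)", y = "", color = "Year")
# print(p_dumbbell) to display the chart
```

The long-format version above needs less code; the wide version can be handier when you also want to compute or annotate the gap between the two years.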

1.5.3 A standard chart

Or you could make a bar chart:

inf %>% filter(iso3c %in% econ2022$iso3c) %>%
  filter(year>=2021) %>%
  mutate(avg = mean(inflation), .by = country) %>%
  ggplot(aes(x=inflation,
             fill= factor(year),
             y=fct_reorder(country,avg))) +
  geom_col(position = position_dodge()) +
  scale_fill_brewer(palette = 3,type = "qual") +
  labs(x="Inflation (%)",y="",fill="Year",title = "Inflation in OECD countries")

Note that we:

  • changed color to fill within aes()
  • made the same change within labs()
  • changed scale_color_brewer to scale_fill_brewer
  • replaced geom_point() with geom_col(position = position_dodge())

1.6 Exercise: Inflation and government spending

Use the provided datasets to explore whether an increase in government spending predicts (current or future) inflation.

Consider the following issues and provide brief justifications:

  • Let the focal year of interest be T = 2022. Is it sensible to compare T with T-1? What if the fiscal effect materializes with a lag? And would it be appropriate to compare current spending to pre-pandemic spending?
  • Should past spending be subtracted from “current” spending at time T? If your answer is yes, remember that you would be reporting differences expressed in percentage points. (Don’t slip: using the symbol “%” would be misleading.)
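To make the percentage-point distinction concrete, here is a minimal sketch comparing 2022 spending with pre-pandemic (2019) spending for the three countries whose values are printed above. It computes both the difference in percentage points and, for contrast, the relative change in percent, which is a different quantity:

```r
library(dplyr)
library(tibble)

# Spending (% of GDP) in 2019 and 2022, copied from the printouts above
spend <- tibble(
  country = c("Australia", "Austria", "Belgium"),
  `2019`  = c(39.1, 48.6, 51.9),
  `2022`  = c(38.4, 52.4, 53.7)
)

spend_diff <- spend %>%
  mutate(
    # difference between two shares of GDP: percentage points (pp)
    change_vs_2019_pp = `2022` - `2019`,
    # relative change of the share itself: percent (%)
    pct_change = 100 * (`2022` / `2019` - 1)
  )
spend_diff
```

For Austria, for example, spending rose by 3.8 percentage points of GDP, which is not the same as a 3.8% increase.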

  1. Remember also that evidence of this kind can inform factual debates, but it would not settle them.