Behavioral Approaches to Political Persuasion
Understand the central tension between standard economic models and behavioral science
Principles of economic reasoning (Optimization, Equilibrium) and why these make persuasion seem hard
Introduce Behavioral Economics as an approach that accounts for human limitations (limited attention, willpower, etc.)
Key component: they would have been better off with another alternative, as judged by themselves.
A systematic study of political activities and institutions.
Choosing the best option (or trying to do so)
Everybody behaves this way
The recipient is skeptical of most claims.
Senders expect that their advertising / lobbying efforts will be matched by opponents.
In equilibrium, no benefit from deviation from (optimally chosen) actions.
But is information acquisition costless? (Frictionless information flow is assumed in many models…)
“the standard economic view that persuasion is conveyance of information seems to run into a rather basic problem that advertising is typically emotional, associative, and misleading — yet nonetheless effective” (Shleifer 2012)
The standard framework works well in markets (ideally: commodities with clear features)
Preferences.
Some function translates “goods” into “utility”
Consumption decisions are subject to constraints
People “maximize utility”
\[ \max_x ~ u(x) ~~~ s.t.~ p \times x \leq B \]
\[ Pr(H|R) = \frac{Pr(R|H)Pr(H)}{Pr(R)} \]
where Pr(R) = Pr(R|H)Pr(H) + Pr(R|\neg H)Pr(\neg H)
Pr(R|H) is the essential element.
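The update above can be sketched in a few lines of Python; the prior and likelihood values below are illustrative, not from the slides:

```python
# Posterior Pr(H|R) via Bayes' rule, given a prior and two likelihoods.
def posterior(prior_h, pr_r_given_h, pr_r_given_not_h):
    num = pr_r_given_h * prior_h
    return num / (num + pr_r_given_not_h * (1 - prior_h))

# A signal three times as likely under H as under not-H:
print(round(posterior(0.5, 0.6, 0.2), 2))  # 0.75
```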
More to come about cognitive distortions soon.
Some people object because “utility maximization” seems unnatural or unrealistic.
But consider unpleasant experiences that you wish to minimize
Each minute you spend waiting “hurts”.
Does each additional minute hurt equally?
\[ u(wait) = -minutes \]
\[ \text{or} ~~ u(wait) = -minutes^2 \]
What people are minimizing is not some spatial length/position but minutes spent waiting in the line.
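The two specifications above differ in their marginal cost of waiting; a minimal sketch (the minute values are illustrative):

```python
# Disutility of waiting: linear vs. quadratic in minutes spent in line.
def u_linear(minutes):
    return -minutes

def u_quadratic(minutes):
    return -minutes ** 2

# Marginal "pain" of the 10th minute under each specification:
marginal_linear = u_linear(10) - u_linear(9)            # -1: constant per minute
marginal_quadratic = u_quadratic(10) - u_quadratic(9)   # -19: later minutes hurt more
print(marginal_linear, marginal_quadratic)
```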
U = value of groceries - time costs - monetary costs
Optimization: people decide what to do by consciously or unconsciously weighing the pros and cons of the different available options.
Equilibrium: systems tend toward equilibrium, a state in which no agent would benefit by changing his or her own behavior.
Example: What is Pr(Social Democrats will increase the minimum wage)?
Assume that \(U(h) = V - h\).
h | grade | Value |
---|---|---|
0 hours | F | -5 |
… | … | … |
10 hours | B | 0 |
20 hours | A | 30 |
U(0) = - 5
U(10) = -10
U(20) = 30 - 20 = 10
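The calculations above can be reproduced directly from the table's values:

```python
# U(h) = V(grade) - h, using the grade values from the slide's table.
V = {0: -5, 10: 0, 20: 30}  # value of the grade earned at 0, 10, and 20 hours

def U(h):
    return V[h] - h

print(U(0), U(10), U(20))  # -5 -10 10: studying 20 hours is optimal
```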
Continue to assume that \(U(h) = V - h\).
h | grade | Value |
---|---|---|
0 hours | F | -5 |
… | … | … |
10 hours | B | 0 |
20 hours | Pr(A)=50% | ? |
20 hours | Pr(B)=50% | ? |
Then U(0) = - 5; U(10) = -10; and \(U(20) = 0.5 \times 30 + 0.5 \times 0 - 20 = - 5\).
You can think of any risky choice as making a gamble (used in a value-neutral way here).
In a gamble in which there is a probability \(p_1\) of winning $X, and a probability \(p_2\) of winning $Y, the expected take-home payout, or the expected value, is:
\[ EV=p_1X + p_2Y \]
Generalizing:
\[ \text{EU(gamble)} = \sum_{i=1}^{N} u(x_i)p_i \]
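A small sketch of the general formula; with u(x) = x it reduces to expected value, and with the slide's grade values it reproduces the risky 20-hour plan:

```python
# Expected utility of a gamble: EU = sum_i u(x_i) * p_i.
def expected_utility(outcomes, probs, u=lambda x: x):
    """EU over outcomes x_i with probabilities p_i; u defaults to identity (EV)."""
    return sum(u(x) * p for x, p in zip(outcomes, probs))

# With u(x) = x this is just expected value: EV = p1*X + p2*Y.
print(expected_utility([100, 0], [0.5, 0.5]))       # 50.0

# The risky study plan: EU(20) = 0.5*30 + 0.5*0 - 20 = -5
print(expected_utility([30, 0], [0.5, 0.5]) - 20)   # -5.0
```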
. . .
Changing behavior means that you need to target one of them
When there are two prospects, X and Y:
\[ EV=p_1X + p_2Y \]
Conceptualize persuasive messages as those that change perceptions about any component of the payoff.
Behavioral economics: similar to other fields, but includes psychological factors
A more aggressive definition (from a newspaper): the study of “how people actually make decisions rather than how the classic economic models say they make them.”
“Modern economics takes a relatively simple view of human behavior as governed by unlimited cognitive ability applied to a small number of concrete goals and unencumbered by emotion.” (Brunnermeier and Parker, 2005)
The traditional model assumes “that each individual has stable and coherent preferences and that she rationally maximizes those preferences given a set of options and beliefs” (Rabin)
Study political competition
Actors have different levels of sophistication
But they all seek to achieve well-defined objectives
\[ Pr(H|R) = \frac{Pr(R|H)Pr(H)}{Pr(R|H)Pr(H) + Pr(R|\neg H)Pr(\neg H)} \]
. . .
90% or 95%
. . .
What we know
That’s useful, but \(P(+|D)\) is not what we want.
Instead, we want \(P(D|+)\).
Some people will stress “good sensitivity”. That’s not enough…
\[ P(D|+) = \frac{P(+|D)P(D)}{P(+)} \]
\[ P(D|+) = \frac{P(+|D)P(D)}{P(+|D)P(D) + P(+|ND)P(ND)} \]
\[ P(D|+) = \frac{.9 P(D)}{.9 P(D) + .05 P(ND)} \]
\[ P(D|+) = \frac{.9 \times .01}{.9 \times .01 + .05 \times .99} = \frac{.009}{.009 + .0495} = 15.38\% \]
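The arithmetic can be verified in a few lines, using the slide's values (sensitivity 90%, false-positive rate 5%, prevalence 1%):

```python
# Pr(D|+) with sensitivity .9, false-positive rate .05, prevalence .01.
sens, fpr, prev = 0.9, 0.05, 0.01
p_pos = sens * prev + fpr * (1 - prev)   # Pr(+) = .009 + .0495 = .0585
p_d_given_pos = sens * prev / p_pos
print(round(100 * p_d_given_pos, 2))     # 15.38
```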
Disease | No disease | |
---|---|---|
Positive test | \(Pr(+ \cap D) \times N\) | \(Pr(+ \cap ND) \times N\) |
Negative | \(Pr(- \cap D) \times N\) | \(Pr(- \cap ND) \times N\) |
Disease | No disease | |
---|---|---|
Positive test | 90 | 495 |
Negative | 10 | 9,405 |
The correct answer is: \(\frac{90}{(90+495)}\)
Disease | No disease | |
---|---|---|
Positive test | 90 | \(\delta\) |
Negative | 10 | 9,405 |
Most people underestimate the \(\delta\) term: \(\frac{90}{(90+ \delta)}\)
When they do, the estimated \(P(D|+)^{biased}\) will approach 1, as \(\delta \rightarrow 0\).
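A short sketch of how the biased estimate behaves as \(\delta\) shrinks:

```python
# Biased posterior 90/(90 + delta): as the neglected false-positive count
# delta shrinks, the estimate climbs toward 1 (base rate neglect).
def biased_posterior(delta):
    return 90 / (90 + delta)

for delta in (495, 100, 10, 1, 0.1):
    print(delta, round(biased_posterior(delta), 3))
# delta = 495 gives the correct 0.154; smaller deltas inflate the belief.
```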
Disease | No disease | |
---|---|---|
Positive test | 90 | \(\delta\) |
Negative | 10 | 9,405 |
Disease | No disease | |
---|---|---|
Positive test | \(Pr(+|D)\) | \(Pr(+|ND)\) |
Negative | \(Pr(-|D)\) | \(Pr(-|ND)\) |
Disease | No disease | |
---|---|---|
Positive test | \(Pr(D|+)\) | \(Pr(ND|+)\) |
Negative | \(Pr(D|-)\) | \(Pr(ND|-)\) |
Think of these objects as likelihoods of different things you can observe - you only have tests (you never observe the virus)
Disease | No disease | |
---|---|---|
Positive test | \(Pr(+|D)\) | \(Pr(+|ND)\) |
Negative | \(Pr(-|D)\) | \(Pr(-|ND)\) |
Think of these objects as hypotheses:
Disease | No disease | |
---|---|---|
Positive test | \(Pr(D|+)\) | \(Pr(ND|+)\) |
Negative | \(Pr(D|-)\) | \(Pr(ND|-)\) |
You form beliefs about hypotheses based on:
signals
priors
The likelihood ratio \(LR = \frac{Pr(+|D)}{Pr(+|ND)}\) would ideally be greater than 1.
Disease | No disease | |
---|---|---|
Positive test | \(Pr(+|D)\) | 5% |
Negative | 10% | \(Pr(-|ND)\) |
Disease | No disease | |
---|---|---|
Positive test | ? | ? |
Negative | ? | ? |
Disease | No disease | |
---|---|---|
Positive test | 90% | 5% |
Negative | 10% | 95% |
Disease | No disease | |
---|---|---|
Positive test | True positive | False positive |
Negative | False negative | True negative |
Disease | No disease | |
---|---|---|
Positive test | 90% | 5% |
Negative | 10% | 95% |
Disease | No disease | |
---|---|---|
Positive test | True positive | False positive |
Negative | False negative | True negative |
Disease | No disease | |
---|---|---|
Positive test | 90% \(\times P(D) \times N\) | 5% \(\times P(ND) \times N\) |
Negative | 10% \(\times P(D) \times N\) | 95% \(\times P(ND) \times N\) |
Once cells are populated with people…
Disease | No disease | |
---|---|---|
Positive test | 90 | 495 |
Negative | 10 | 9,405 |
… simply calculate proportions PER ROW.
Recall that for hypotheses we are conditioning on rows.
Disease | No disease | |
---|---|---|
Positive test | TP = 90 / (90+495) | FP = 495 / (90+495) |
Negative | FN = 10 / (10+9405) | TN = 9405 / (10+9405) |
The correct beliefs, given the test results shown in a given row:
Disease | No disease | |
---|---|---|
Positive test | TP = 15.38% | FP = 84.62% |
Negative | FN = 0.11% | TN = 99.89% |
So what does accuracy mean?
It’s the ratio of all correct predictions to all classification attempts.
(90 + 9,405 ) / All observations = 94.95%
False positives / All positive results = False discovery rate
True positives / All positive results = Positive predictive value (PPV), or precision
A popular measure is the F-score: the harmonic mean of precision and recall.
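The metrics above, computed from the slide's 2×2 table:

```python
# Metrics from the confusion table: TP=90, FP=495, FN=10, TN=9405.
tp, fp, fn, tn = 90, 495, 10, 9405
total = tp + fp + fn + tn

accuracy = (tp + tn) / total        # correct predictions / all attempts
fdr = fp / (tp + fp)                # false discovery rate
precision = tp / (tp + fp)          # PPV = Pr(D|+)
recall = tp / (tp + fn)             # sensitivity = Pr(+|D)
f_score = 2 * precision * recall / (precision + recall)  # harmonic mean

print(round(accuracy, 4))   # 0.9495
print(round(precision, 4))  # 0.1538
print(round(f_score, 4))
```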
\[ P(honest | B) = ? \]
\[ P(honest | B)^{BAYES} = \frac{P(h)P(B|h)}{P(h)P(B|h) + P(d)P(B|d)} = \frac{\pi}{\pi + (1-\pi) \lambda} \]
Then \[ P(honest | B)^{AVAIL} = \frac{P(h)P(B|h)}{P(h)P(B|h) + \widehat{P(d)}P(B|d)} \]
Given that \(\widehat{P(d)} > P(d)\):
\[ P(honest | B)^{AVAIL} < P(honest | B)^{BAYES} \]
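A sketch comparing the Bayesian and availability-distorted posteriors; the values of \(\pi\), \(\lambda\), and the inflated prior \(\widehat{P(d)}\) below are illustrative:

```python
# Bayesian vs. availability-distorted belief that a sender is honest,
# given a bold claim B. pi = P(h); lam = likelihood ratio P(B|d)/P(B|h);
# p_d_hat = inflated (overly available) prior on deceivers.
def p_honest_bayes(pi, lam):
    return pi / (pi + (1 - pi) * lam)

def p_honest_avail(pi, lam, p_d_hat):
    # same formula, with the deceiver prior replaced by its inflated estimate
    return pi / (pi + p_d_hat * lam)

pi, lam = 0.8, 0.5
print(round(p_honest_bayes(pi, lam), 3))       # 0.889
print(round(p_honest_avail(pi, lam, 0.4), 3))  # 0.8 < Bayesian posterior
```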
A cognitive bias is a systematic error in thinking that occurs when people are processing and interpreting information in the world around them and affects the decisions and judgments that they make.
Important (but sometimes difficult) to remember
What is the probability that, during the next year, your car could be a “total loss” due to an accident?
What is the probability that, during the next year, your car could be a “total loss” due to:
Usual pattern: the first estimated probability is significantly lower.
(But the second way of asking questions simply suggests many reasons why accidents could happen.)
Hypothesis: people sample from memory
In 4 pages of a novel, do you expect to find more than 10 words that have the form:

(A) `_ _ _ _ _ n _` (n in the sixth position)

(B) `_ _ _ _ i n g` (ending in -ing)
Most respondents say that B is more likely.
a type of cognitive bias that involves favoring information that confirms your previously existing beliefs or biases.
For example, imagine that a person holds a belief that left-handed people are more creative than right-handed people. Whenever this person encounters a person that is both left-handed and creative, they place greater importance on this “evidence” that supports what they already believe.
A specific type of confirmation bias where people favor information that confirms their own existing beliefs or hypotheses. Involves focusing on the strengths of one’s own arguments while simultaneously concentrating on the weakest aspects of opposing claims.
Once you know something, it becomes obvious to you and you assume others know / should know.
A tendency to overestimate how much other people agree with us, view the world like us, etc.
Expectations about the behavior of others based on one’s own
. . . Students who cheat on their statistics exams believe that many others cheat as well whereas honest students think that cheating is rare.
Grossman and Hopkins find that
Blaming external forces for bad outcomes, claiming credit when good outcomes materialize
If you want something to happen
Poor Affective forecasting
Evaluating a choice based on realized consequences, rather than the quality of the decision at the time
The tendency to process information in a way that aligns with pre-existing beliefs and desired conclusions.
Rather than assessing evidence objectively, individuals are motivated to arrive at a particular outcome, often by selectively seeking out confirming evidence and dismissing contradictory information.
The public expression of condemnation of a moral transgressor. This can serve to enhance one’s own reputation and signal trustworthiness and adherence to group norms.
The act of publicly expressing opinions or sentiments intended to demonstrate one’s good character or moral standing within a community.
Key claim: what matters is whether a recipient is motivated to engage with a message
A subset of dual-processing theories.
Note: “elaboration” refers to the amount of effort devoted to processing new information and relating it to existing beliefs.
Considerations: are they ethical? Are they effective?
Channel factors (face-to-face, vs. digital; audio or video inclusion, etc.).
Style
Language
The route from initial (raw) information to an explicit (reported) response:
Incorrect report (color of the cab, % of people with an attribute, etc.) could mean there was an issue during any of the stages.
Raw information was biased in step #1
Bias in step #2 was “introduced when creating a perception from this raw information”
Translating of a perception into a survey report (#3) was too difficult
The subject misreported their perception on purpose
That’s the idea, born decades ago, that the way to understand the effect of media is to think of it as a direct injection of information, straight into your brain. Your reaction to this information will likely be rapid, predictable, and potent — just as much as a shot of adrenaline to the heart. You learn something terrible about a political candidate; this information injection makes you decide, instantly, not to vote for him.
Journalists and people around media are sophisticated enough to know that the work we produce doesn’t have huge, life-changing effects on people’s opinions every time we publish.
On the other hand, the hypodermic model can sometimes make media seem all-powerful, which holds a certain allure to those of us who make it.
It can be an especially comfortable mode to fall into when we talk about misinformation and “fake news.”
In general, people think they are sophisticated consumers of information, able to weigh new facts appropriately — but other people? If Facebook tells ’em 2 + 2 is 5, they’ll throw out their calculator.
(Related concept: Oppositional reading.)
\[ Pr(H|R) = \frac{Pr(R|H)Pr(H)}{Pr(R|H)Pr(H) + Pr(R|\neg H)Pr(\neg H).} \]
“over-simplified image of a type of person or thing”
(Dictionary definition)
View 1: Humans are good intuitive statisticians
View 2: Predictions are insensitive to reliable evidence
One subset of “middle-ground” explanations includes uncertainty-based re-scaling:
“an attribute is representative of a class if it is very diagnostic; that is, the relative frequency of this attribute is much higher in that class than in a relevant reference class.” (Kahneman and Tversky 1983)
\(Pr(Red~ hair|Irish)\) is mistakenly over-estimated. Why?
Formally, define representativeness as:
\(t_G^* = argmax_{t \in \{dark,light,red\}}\frac{Pr(t|G)}{Pr(t|\neg G)}\)
It follows that the representative color is red: \(\frac{Pr(red|Irish)}{Pr(red|\neg Irish)} = 10 > \frac{40\%}{14\%} > \frac{50\%}{85\%}\).
\(Pr(Irish|Red~ hair) = \frac{Pr(RH|I) \times Pr(I)}{\underbrace{Pr(RH|I) \times Pr(I) + Pr(RH|\neg I) \times Pr(\neg I)}_{Pr(\text{Red hair})}}\)
If P(I) is not properly taken into account, then updating is hindered by base rate neglect.
Following Bordalo et al. 2016: \(Pr(RH|I)^{st} \neq Pr(RH|I)\).
\(Pr(RH|I)^{st} = Pr(RH|I) \times \frac{h(R(RH,I))}{\sum_{t'} Pr(t'|I) \, h(R(t',I))}\)
Where:
G = African-American; T = {poor, middle-income, rich}
G = Democrat; T = {socialist,…}
G = Republican; T = {alt-right,…}
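A sketch of the distortion formula using the hair-color numbers from the slides; the choice h(R) = R is an illustrative assumption (the model only requires h to be increasing):

```python
# Stereotyped likelihood Pr(t|G)^st (Bordalo et al. 2016): true likelihoods
# are re-weighted by representativeness R(t,G) = Pr(t|G)/Pr(t|not G).
p_irish = {"dark": 0.50, "light": 0.40, "red": 0.10}
p_not_irish = {"dark": 0.85, "light": 0.14, "red": 0.01}

def stereotyped(p_g, p_not_g, h=lambda r: r):
    rep = {t: h(p_g[t] / p_not_g[t]) for t in p_g}   # representativeness
    denom = sum(p_g[t] * rep[t] for t in p_g)        # normalizer
    return {t: p_g[t] * rep[t] / denom for t in p_g}

st = stereotyped(p_irish, p_not_irish)
# Red hair (ratio 10) is heavily over-weighted relative to its true 10% share.
print({t: round(v, 3) for t, v in st.items()})
```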
If you hear “Democrat”…
… is it easy to imagine a young person/protester/voter?
But it can be easy to ask the wrong question here…
According to Catalist:
Among voters age 18-29, the (two-party) Democratic vote share was 62%.
So yes, most young voters supported Biden.
What beliefs would a Bayesian hold about the age distribution among Democrats?
Age groups are categories or groups which have a distribution of types (Democratic vs. Republican voters).
Pr(t|G) = Pr(Voted Dem|Age 18-29) is a likelihood, or signal, or (in other settings) an outcome of a test.
Pr(Age 18-29|Voted Dem) is the degree of confidence in a hypothesis that a randomly observed person is a member of an (age) group, given their trait/type. Also known as: posterior probability.
“Non-rational” learning can occur for any reason when departure from Bayesian learning is observed. Even if Bayesian learning is attempted, the inputs may be “wrong” (e.g. misrecorded, misremembered…).
In 2020, 62% of young voters voted for Biden.
Then what is Pr(Age 18-29|Voted Dem)?
Shorten notation:
Bayes rule:
\(Pr(Young|VD) = \frac{Pr(Voted~Dem|Y) \times Pr(Y)}{\underbrace{Pr(Voted~Dem|Y) \times Pr(Y) + Pr(Voted~Dem|Age~30+) \times Pr(Age~30+)}_{Pr(\text{Voted Dem})}}\)
2020 election data:
\(t^*_{YOUNG}= \frac{Pr(Dem|18-29)}{Pr(Dem|30+)} > \frac{Pr(Rep|18-29)}{Pr(Rep|30+)}\), thus a young voter is representative of Democrats…
\(t^*_{65+}= \frac{Pr(Rep|65+)}{Pr(Rep|64~or~less)} > \frac{Pr(Dem|65+)}{Pr(Dem|64~or~less)}\)
But a Bayesian would correctly calculate that Pr(65+|Rep) = 26.1%.
In fact, Pr(65+|Rep) < Pr(45-64|Rep) = 35.4%, because more voters are middle-aged than 65+ years old.
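Bayes' rule for Pr(Young|Voted Dem) can be sketched as follows; the 62% comes from the slides (Catalist), while the 15% electorate share for ages 18-29 and the 50% Dem share among voters 30+ are assumed for illustration only:

```python
# Pr(Young | Voted Dem) via Bayes. The 62% Dem share among 18-29 is from
# the slides; the 15% electorate share for 18-29 and the 50% Dem share
# among voters 30+ are ASSUMED, illustrative values.
p_dem_given_young, p_young = 0.62, 0.15
p_dem_given_older = 0.50

p_dem = p_dem_given_young * p_young + p_dem_given_older * (1 - p_young)
p_young_given_dem = p_dem_given_young * p_young / p_dem
print(round(p_young_given_dem, 2))  # 0.18: most Dem voters are NOT young
```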
Multiple perspectives
Information is served selectively
Persuasion has many forms
People observe data, and update their beliefs (applying Bayes’ rule)
Persuasion is data presented strategically
Truth will come to light because the market creates incentives for firms to reveal the truth about competitors: the relevant information will ultimately emerge.
Problem: Deception is excluded / inadmissible in the formal framework.
Prediction: some facts will be selectively omitted.
Faced with bad choices by consumers, such as smoking or undersaving, economists as System 2 thinkers tend to focus on education as a remedy. Show people statistics on deaths from lung cancer, or graphs of consumption drops after retirement, or data on returns on stocks versus bonds, and they will do better. As we have come to realize, such education usually fails. Kahneman’s book explains why: System 2 might not really engage until System 1 processes the message.
– Shleifer (2012)
Social persuasion
Authority (church leaders, parents)
Reciprocity (salespeople offer free samples or small gifts)
Consistency with the past self
Scarcity: an opportunity seems more attractive if it is scarce
Distant persuasion (intentional messaging) can only sometimes be social.
Individuals modify their opinions either through cognitive shortcuts (rules of thumb / heuristics / peripheral paths) or effortful reasoning.
Next slide: The Elaboration Likelihood Model of Persuasion (chart from Perloff, R. M. (2017). The dynamics of persuasion)
Social sciences
Shared interest in decision-making of humans