---
title: "Final Project"
author: "Ananya Pujary"
description: "Analyzing Snapchat Political Ads in the US in 2020"
date: "09/04/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
- final-project
- snapchat-political-ads
- ggplot
- dplyr
- stringr
- lubridate
- janitor
---
## Loading the Packages
```{r}
#| label: setup
#| warning: false
library(tidyverse)
library(googlesheets4)
library(skimr)
library(dplyr)
library(stringr)
library(lubridate)
library(purrr)
# install (if missing) and attach the remaining packages
extra_pkgs <- c("corrr", "janitor", "usmap", "viridis", "transformr", "patchwork", "kableExtra")
for (pkg in extra_pkgs) {
  if (!require(pkg, character.only = TRUE)) {
    install.packages(pkg, repos = "https://cran.us.r-project.org")
    # attach the package after a fresh install
    library(pkg, character.only = TRUE)
  }
}
knitr::opts_chunk$set(echo = TRUE)
```
## Introduction
Every election season, millions of dollars are spent on political advertisements that help candidates reach a wider audience of potential voters and influence the voting process (Nott, 2020). Political advertisements can be defined as those that "describe a political leader, organization, or party, a public office candidate, or an election/referendum" (Tomasi, 2021). These advertisements can also be created by entities other than the candidates themselves.
Now that social media has spread into almost every aspect of our lives, social media platforms are also playing their part in influencing the political process. Unlike traditional media like newspapers and television, social media platforms are not liable for what is displayed on them and can set their own content regulations (Nott, 2020). Political advertisements on social media are becoming popular because they allow for the 'micro-targeting' of demographics and help candidates understand and reach the masses better, in turn increasing voter engagement (Nott, 2020). Micro-targeting refers to a marketing strategy that employs consumer demographics and data to generate audience segments ("What is Micro-Targeting & How Does it Affect Advertising", n.d.).
While Facebook and Google have long been the dominant players in digital political advertising, Snapchat is becoming increasingly popular. In 2020, Snapchat had around 249 million active users on its platform, most of them in the age range of 13-29 (Rodriguez, 2020; "Snapchat statistics 2020", 2020). Snapchat has made data about the political ads shown on its app public, so this project uses its data for the year 2020 (Snap Inc., n.d.). In particular, I'll be looking at advertisements shown in the United States for the candidates Joe Biden and Donald Trump. I chose these candidates because the presidential election was held that year (November 3rd, 2020) and the two ran closely against each other. Using this dataset, I plan to look at relative advertisement expenditure, impressions, and location micro-targeting, and to explore the following broad questions:
::: {.callout-note appearance="simple"}
## Project Questions
- Is there a relationship between political ad expenditure and impressions?
- Which candidate's advertisements received more impressions?
- How much was spent on these advertisements and which candidate spent more on average?
- Which states were targeted by these advertisements, and which states did each candidate target more?
:::
\
## Reading in the Data
```{r}
#| label: reading in the data
polads_orig <- read_sheet('https://docs.google.com/spreadsheets/d/1S7jF0D2o8aC3gGndORVrksuSsvMMZwqVdKLmu4SYqUc/edit?usp=sharing')
```
Looking at the dataset's various characteristics:
```{r}
#| label: describing the data (1)
skim(polads_orig)
print(summarytools::dfSummary(polads_orig, varnumbers = FALSE, plain.ascii = FALSE, graph.magnif = 0.50, style = "grid", valid.col = FALSE),
method = 'render', table.classes = 'table-condensed')
```
This file contains the information for political ads that are/have been displayed on Snapchat's platform, such as the amount spent on them, the organization and advertisers behind them, the candidates/causes the ads support, demographic and location-based ad targeting, and so on. It has 12705 rows and 38 columns. There are 28 character-type, 1 list-type, 7 logical-type, and 2 numeric-type columns.
## Tidying the Data
Removing columns that only have missing values (by default, `remove_empty()` from 'janitor' also drops any rows that are entirely missing):
```{r}
#| label: tidying the data (1)
polads <- polads_orig %>%
remove_empty()
```
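As a small illustration of what `remove_empty()` does, the sketch below uses a made-up data frame (the `toy` object and its columns are hypothetical, not part of the Snapchat data). It compares dropping only the empty columns with dropping empty rows as well, which is what happens when `which` is left unspecified.

```r
library(janitor)

# Hypothetical toy data: column `b` and the second row are entirely NA
toy <- data.frame(
  a = c(1, NA),
  b = c(NA, NA),
  c = c("x", NA)
)

# Drop only the columns that are entirely missing
cols_only <- remove_empty(toy, which = "cols")

# Drop both empty rows and empty columns
rows_and_cols <- remove_empty(toy, which = c("rows", "cols"))

dim(cols_only)
dim(rows_and_cols)
```

Passing `which = "cols"` explicitly is one way to make sure that no rows are dropped at this step.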
Snake case is typically recommended by tidyverse's style guide for column names and object names. However, the column names in this dataset are written either in title case (e.g. `Currency Code`) or camel case (e.g. `OrganizationName`). Some of them also contain special characters like brackets, which could interfere with R functions.
::: callout-note
Snake case is the writing style in which all letters are lowercase and the spaces between words are replaced with underscores (\_). Title case capitalizes the first letter of each word and keeps the spaces between words. Camel case writes a phrase without spaces or punctuation and marks each new word with a capital letter; when the first word is capitalized too, as in `OrganizationName`, the style is also called upper camel case or Pascal case.
:::
Hence, using the `clean_names()` function from the 'janitor' package to convert all the column names to snake case.
```{r}
#| label: tidying the data (2)
polads <-clean_names(polads)
colnames(polads)
```
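To make the conversion concrete, here is a minimal sketch on a made-up data frame whose column names follow the styles described above. The `toy` object and its values are hypothetical; only the naming patterns mirror the dataset.

```r
library(janitor)

# Hypothetical columns: title case, camel case, and a name with brackets
toy <- data.frame(
  `Currency Code` = "USD",
  OrganizationName = "Example Org",
  `Regions (Included)` = "Texas",
  check.names = FALSE  # keep the original names exactly as typed
)

cleaned <- clean_names(toy)
names(cleaned)
```

The cleaned names can be used directly inside `dplyr` verbs without backticks.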
### Narrowing down the data
Let's look at the distribution of countries receiving political advertisements on Snapchat in 2020.
```{r}
#| label: tidying the data (3)
table(polads$country_code) %>%
knitr::kable(caption = "Countries Receiving Snapchat Political Ads (2020)",col.names = c("Country","Frequency")) %>%
kable_minimal()
```
Most of the political advertisements were delivered to places in the United States (11124). Only keeping rows that describe ads targeting the United States:
```{r}
#| label: tidying the data (4)
polads <- filter(polads,country_code =="united states")
polads
```
Verifying whether all ads targeted for the United States were paid for in USD:
```{r}
#| label: tidying the data (5)
table(select(polads,currency_code)) %>%
knitr::kable(caption = "Currency Code Frequency (2020)",col.names = c("Currency Code","Frequency")) %>%
kable_minimal()
```
Three ads were paid for in Canadian Dollars (CAD). Finding out more about these three rows:
```{r}
#| label: tidying the data (6)
polads %>%
filter(currency_code=="CAD")
```
Two ads were paid for by the University of British Columbia and were targeted at people aged 16-25 in the following areas: San Francisco, Oakland, and San Jose. The third was paid for by Point Digital Creative Studio. None of them provided information about the candidate associated with the ad.
Since we require information about the candidate in order to analyze relative ad spending, targeting and impressions, tidying up the `candidate_ballot_information` column by removing missing values:
```{r}
#| label: tidying the data (7)
sum(is.na(polads$candidate_ballot_information))
# removing rows with missing values
polads <- drop_na(polads,candidate_ballot_information)
polads
```
There were 5312 rows missing candidate information. Next, I'm only including those rows that explicitly state the candidate's name (i.e. that contain the word "Biden" or "Trump").
```{r}
#| label: tidying the data (8)
polads <- filter(polads, str_detect(candidate_ballot_information, 'Biden|Trump'))
# sanity check
table(select(polads,candidate_ballot_information)) %>%
knitr::kable(caption = "Frequency of Candidates", col.names = c("Candidate","Frequency"))
```
There are three entries that contain the string "Trump" but are in fact campaigning against him ("Against Trump", "Operation Dump Trump", "Titere de Trump"). There's also an entry called "Biden vs Trump", which doesn't clearly indicate which candidate the ad supports. Removing these rows so that they don't skew the results:
```{r}
#| label: tidying the data (9)
polads <- polads %>%
  # drop labels that mention a candidate without clearly supporting one
  filter(!candidate_ballot_information %in% c("Against Trump", "Operation Dump Trump", "Biden vs Trump", "Titere de Trump"))
```
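The two filtering steps above can be sketched on a handful of made-up labels. The `toy` tibble and the `ambiguous` vector are hypothetical and only loosely modelled on the values discussed above.

```r
library(dplyr)
library(stringr)

# Hypothetical labels, not the real contents of the column
toy <- tibble(
  candidate_ballot_information = c(
    "Joe Biden for President", "Donald Trump", "Against Trump",
    "Biden vs Trump", "Some Ballot Measure"
  )
)

# Labels that mention a candidate but do not clearly support one
ambiguous <- c("Against Trump", "Biden vs Trump")

kept <- toy %>%
  # keep labels that mention either candidate
  filter(str_detect(candidate_ballot_information, "Biden|Trump")) %>%
  # then drop the ambiguous or opposing labels
  filter(!candidate_ballot_information %in% ambiguous)

kept$candidate_ballot_information
```

Because `str_detect()` matches substrings, a manual look at the matched labels (as done in the sanity check above) is still needed to spot opposing ads.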
Since missing values in certain targeting columns indicate that either all or none of the categories in the column were targeted, I'm replacing them with explicit "ALL" or "None" labels to make the analysis easier.
```{r}
#| label: tidying the data (10)
polads <- polads %>%
replace_na(list(gender = "ALL",os_type = "ALL",language = "none",advanced_demographics = "None",targeting_connection_type = "None",targeting_carrier_isp = "ALL"))
# sanity check
table(select(polads,gender))
table(select(polads,os_type))
table(select(polads,language))
table(select(polads,advanced_demographics))
table(select(polads,targeting_connection_type))
table(select(polads,targeting_carrier_isp))
```
### The case of `age_bracket` and `advanced_demographics`
The `age_bracket` column's values are as follows:
```{r}
#| label: tidying the data (11)
table(select(polads,age_bracket)) %>%
knitr::kable(caption = "Age Targeting by Snapchat Political Ads (2020)",col.names = c("Ages","Frequency")) %>%
kable_minimal()
```
Clearly, the column's values overlap and tend to refer to similar age groups, for instance, 18-20, 18-24, and 18+.
As for `advanced_demographics`:
```{r}
#| label: tidying the data (12)
table(select(polads,advanced_demographics))%>%
knitr::kable(caption = "Advanced Demographics Targeting by Snapchat Political Ads (2020)",col.names = c("Advanced Demographics","Frequency")) %>%
kable_minimal()
```
Clearly, very few ads provided additional demographic information for ad targeting and the data aren't uniform (i.e. there are details on people's household incomes, occupations, languages spoken, educational levels, number of children etc.), so I wouldn't be able to effectively analyze it in relation to other columns. Though I was looking forward to analyzing these columns, the data they had were too sparse to work with.
### Wrangling with the date columns
The "Z" at the end of each timestamp indicates that the time zone is UTC. I won't need it as a separate piece of information, and `ymd_hms()` reads it while parsing and returns UTC date-times, so no separate step is needed to strip it. Also, I'm arranging the rows by the start date set for the advertisement and converting the data types of the date columns (`start_date` and `end_date`) from character to date-time.
```{r}
#| label: tidying the data (13)
polads <- polads %>%
  # refer to the column directly instead of polads$start_date inside the pipe
  arrange(ymd_hms(start_date))
#sanity check
head(polads)
# converting data types of date columns from character to datetime
polads <- polads %>%
  mutate(start_date = ymd_hms(start_date),
         end_date = ymd_hms(end_date))
# rechecking class of these columns
class(polads$start_date)
class(polads$end_date)
head(polads)
```
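As a quick check of how `ymd_hms()` treats the trailing "Z", the sketch below parses two made-up timestamps (the values in `raw` are hypothetical examples in the same ISO 8601 style, not rows from the dataset).

```r
library(lubridate)

# Hypothetical timestamps with a trailing "Z"
raw <- c("2020-11-03T05:00:00Z", "2020-06-15T18:30:00Z")

parsed <- ymd_hms(raw)

class(parsed)   # date-time class of the parsed values
tz(parsed)      # time zone stored on the parsed values
sort(parsed)    # chronological order, earliest first
```

Once the column is a date-time, sorting and arithmetic work on actual points in time rather than on text.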
Next, I want to create a new column that gives the duration for which the ad was run on Snapchat. I chose to display this information in hours.
```{r}
#| label: tidying the data (14)
polads <- polads %>%
mutate(ad_duration = difftime(end_date,start_date,units= c("hours")))
unique(polads$ad_duration)
```
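The duration calculation can be illustrated with two made-up ads, one of which has no end date. The `start` and `end` vectors below are hypothetical.

```r
# Hypothetical start and end times; the second ad has no end date
start <- as.POSIXct(c("2020-11-02 05:00:00", "2020-06-15 18:30:00"), tz = "UTC")
end   <- as.POSIXct(c("2020-11-03 05:00:00", NA), tz = "UTC")

duration <- difftime(end, start, units = "hours")

as.numeric(duration)   # plain numbers, NA where the end date is missing
sum(is.na(duration))   # number of ads without an end date
```

Fixing `units = "hours"` keeps every row on the same scale; without it, `difftime()` picks a unit based on the size of the differences.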
Missing values show up for those rows without an end date for the advertisement. Plotting the distribution of political advertisement duration:
```{r}
#| label: tidying the data (15)
ggplot(polads, aes(x=as.numeric(ad_duration))) + geom_histogram(binwidth=15) + labs(title = "Distribution of Snapchat Political Ad Duration (2020)",x = "Duration in Hours", y = "Frequency", caption = "Note: This plot does not include ads that did not specify an end date") + theme_minimal()
```
A large proportion of the ads ran for less than 250 hours.
Lastly, I'll be changing the entries in `candidate_ballot_information` to either "Biden" or "Trump" to make the column more uniform and easier to analyze. For instance, "Joe Biden for President" will be changed to "Biden".
```{r}
#| label: tidying the data (16)
# changing `candidate_ballot_information` to either "Biden" or "Trump"
polads <- polads %>%
mutate(candidate_ballot_information = case_when(
str_detect(candidate_ballot_information, "Biden") ~ "Biden",
str_detect(candidate_ballot_information, "Trump") ~ "Trump",
TRUE ~ candidate_ballot_information))
# sanity check
polads %>%
filter(str_detect(candidate_ballot_information, "Trump")) %>%
tally() %>%
knitr::kable(col.names = "Number of Trump ads")
polads %>%
filter(str_detect(candidate_ballot_information, "Biden")) %>%
tally() %>%
knitr::kable(col.names = "Number of Biden ads")
```
There are 1251 political advertisements supporting Biden's campaign and 483 political advertisements for Trump's campaign.
## Analyzing and Visualizing the Data
### Ad Expenditure and Impression Analysis
I want to determine whether there's a correlation between two variables I'm interested in: `spend` and `impressions`.
```{r}
#| label: correlation between spend and impressions
polads_cor <- polads %>%
select(spend,impressions) %>%
correlate()
polads_cor_plot <- rplot(polads_cor) + labs(title = "Correlation between Ad Expenditure and Impressions Received\n")
polads_cor_plot
```
This plot shows us that these two variables are moderately correlated with each other.
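Because both variables are heavily skewed, it may be worth comparing the default Pearson correlation with a rank-based one. The sketch below uses base R's `cor()` on made-up numbers (the `spend` and `impressions` vectors here are hypothetical) and is an optional robustness check, not part of the analysis above.

```r
# Hypothetical values: impressions grow with spend, but not linearly
spend       <- c(1, 10, 100, 1000, 10000)
impressions <- spend^2

pearson  <- cor(spend, impressions, method = "pearson")   # linear association
spearman <- cor(spend, impressions, method = "spearman")  # rank (monotonic) association

c(pearson = pearson, spearman = spearman)
```

A rank correlation that is clearly higher than the Pearson value would suggest a monotonic but non-linear relationship.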
```{r}
#| label: total ad expenditure and impressions
# total amount spent by both candidates' ads
polads %>%
select(candidate_ballot_information, spend)%>%
group_by(candidate_ballot_information)%>%
summarize(spend_sum = sum(spend)) %>%
knitr::kable(caption = "Total Snapchat Political Ad Expenditure (2020)", col.names = c("Candidate","Total Amount in USD")) %>%
kable_minimal()
```
More funds were allocated to Snapchat political ads supporting Biden (\$4,367,549) than to those supporting Trump (\$613,733).
```{r}
# total impressions received by both candidates' ads
polads %>%
select(candidate_ballot_information, impressions)%>%
group_by(candidate_ballot_information)%>%
summarize(impressions_sum = sum(impressions)) %>%
knitr::kable(caption = "Total Snapchat Political Ad Impressions (2020)", col.names = c("Candidate","Total Impressions")) %>%
kable_minimal()
```
Ads supporting Biden received more impressions (804,943,566) than those supporting Trump (378,452,979).
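One way to relate the two sets of totals is the cost per thousand impressions. The sketch below is a derived calculation that simply reuses the totals reported above; the `totals` data frame and the `cost_per_1000` column are names introduced here for illustration.

```r
# Totals reported above, typed in by hand
totals <- data.frame(
  candidate   = c("Biden", "Trump"),
  spend       = c(4367549, 613733),       # USD
  impressions = c(804943566, 378452979)
)

# Cost per thousand impressions
totals$cost_per_1000 <- totals$spend / totals$impressions * 1000
totals
```

On these totals, ads supporting Biden cost roughly \$5.43 per thousand impressions, against roughly \$1.62 for ads supporting Trump.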
```{r}
#| label: highest expenditure on a single ad by candidate
polads %>%
select(candidate_ballot_information,spend,organization_name,paying_advertiser_name,impressions,start_date,end_date) %>%
group_by(candidate_ballot_information) %>%
slice(which.max(spend)) %>%
knitr::kable(caption = "Highest Singular Ad Expenditure by Candidate", col.names = c("Candidate","Expenditure","Organization Name","Paying Advertiser Name","Impressions","Start Date","End Date")) %>%
kable_minimal()
```
The largest amount allocated to a single political advertisement supporting Biden (and overall) was \$151,724, while the \$33,349 spent by Albbiom Marketing LLC was the most expensive political advertisement for Trump's campaign. The Biden advertisement was displayed for almost all of election day (11/03/2020), as indicated by its start and end dates. Even though more money was spent on the Biden advertisement, Trump's advertisement had more impressions (30,383,613).
```{r}
#| label: highest impressions on a single ad by candidate
polads %>%
select(candidate_ballot_information,spend,organization_name,paying_advertiser_name,impressions,start_date,end_date) %>%
group_by(candidate_ballot_information) %>%
slice(which.max(impressions)) %>%
knitr::kable(caption = "Highest Singular Ad Impressions by Candidate (2020)", col.names = c("Candidate","Spend","Organization Name","Paying Advertiser Name","Impressions","Start Date","End Date")) %>%
kable_minimal()
```
The highest number of impressions received by a single ad supporting Biden was 17,927,667, while for Trump it was 31,848,256. The higher number of impressions for Trump's ad could be attributed to it not having a set end date.
I'm interested in knowing the relative expenditure and impressions for advertisements by candidate as well. First, I want to extract the month from the `start_date` and `end_date` columns and use it to determine spending over the months.
```{r}
#| label: creating new columns for months
polads <- polads %>%
mutate(start_month = month(start_date,label = TRUE),end_month = month(end_date, label = TRUE))
# sanity check
str(polads$start_month)
str(polads$end_month)
```
I'll need to take a log transformation because the values in the `spend` column are skewed. I'm using a smooth plot to track expenditure and impressions over the months.
```{r}
#| label: political ad expenditure by month
exp_by_month_plot <- polads %>%
  ggplot(aes(x = start_date, y = log(spend), group = candidate_ballot_information, color = candidate_ballot_information)) +
  geom_smooth() +
  labs(title = "Snapchat Political Ad Expenditure per Month by Candidate (2020)", x = "Month", y = "Expenditure (log scale)", colour = "Candidate") +
  scale_color_brewer(palette = "Set2") +
  theme_minimal()
exp_by_month_plot
```
More funds were spent on political ads supporting Biden's campaign in the months leading up to election day, i.e. July to November. Ads for Trump's campaign received more funds in the first half of the year. It would be worthwhile to compare the impressions of advertisements for both candidates too:
```{r}
#| label: political ad impressions by month
imp_by_month_plot <- polads %>%
  ggplot(aes(x = start_date, y = log(impressions), group = candidate_ballot_information, color = candidate_ballot_information)) +
  geom_smooth() +
  labs(title = "Political Ad Impressions per Month by Candidate (2020)", x = "Month", y = "Impressions (log scale)", colour = "Candidate") +
  scale_color_brewer(palette = "Set2") +
  theme_minimal()
imp_by_month_plot
```
Advertisements supporting Trump's campaign seem to have reached more people than Biden's advertisements in the first half of the year. However, mirroring the expenditure trend noted before, advertisements for Biden's campaign received more impressions in the later months of the year.
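One caveat of the log transformation used in the two plots above is its behaviour at zero. The sketch below uses made-up amounts (the `spend` vector here is hypothetical) to compare `log()` with `log1p()`; whether the dataset actually contains zero values is an assumption that would need checking.

```r
# Hypothetical USD amounts, including a zero
spend <- c(0, 5, 50, 5000)

log(spend)     # the zero becomes -Inf
log1p(spend)   # log(1 + x): the zero stays at 0
```

Non-finite values are dropped from a ggplot2 smooth with a warning, so `log1p()` is one option if ads with zero spend or zero impressions should remain in the plot.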
I want to know which ads had the longest and shortest duration by candidate, to see whether impressions vary greatly:
```{r}
#| label: ad duration by candidate
polads %>%
select(candidate_ballot_information,organization_name,paying_advertiser_name,start_date,end_date,ad_duration,impressions)%>%
group_by(candidate_ballot_information)%>%
slice(which.max(ad_duration),which.min(ad_duration)) %>%
knitr::kable(caption = "Longest and Shortest Snapchat Political Ads by Candidate (2020)", col.names = c("Candidate","Organization Name","Paying Advertiser Name","Start Date","End Date","Ad Duration","Impressions"))%>%
kable_minimal()
```
The longest-running ad supporting Biden ran for more than 835 hours, until election day. It's interesting that the shortest ad supporting this candidate (29 hours) received far more impressions than the longest one. This could be because the shorter ad was run on election day. On the other hand, the longest-running ad for Trump ran for more than 1860 hours, also until the end of election day. The shortest ad for this candidate (6.6 hours) was displayed in June and also received fewer impressions. The paying advertisers' names indicate that these ads were probably issued directly by the respective candidates' campaigns and not by an outside entity (except for the shortest ad supporting Trump).
### Location Targeting Analysis
#### Wrangling with the location columns
The following columns indicate different types of information about the locations targeted by the advertisements: `regions_included`, `regions_excluded`, `electoral_districts_included`, `radius_targeting_included`, `radius_targeting_excluded`, `metros_included`, `metros_excluded`, `postal_codes_included`, `postal_codes_excluded`. Most of these columns do not have enough values to be effectively analyzed, and due to a lack of time, the list column `postal_codes_included` could not be included in my analysis.
I'll be using the `regions_included` and `regions_excluded` columns. They have multiple states in each row which need to be separated into different rows:
```{r}
#| label: tidying `regions_included` and `regions_excluded`
# regions_included
polads <- polads %>%
separate_rows(regions_included, sep = ",")
# sanity check
unique(polads$regions_included)
# `regions_excluded`
polads <- polads %>%
separate_rows(regions_excluded, sep = ",")
# sanity check
unique(polads$regions_excluded)
```
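To show what this reshaping does to the row count, here is a minimal sketch on two made-up ads (the `toy` tibble, its `ad_id` column, and its values are hypothetical). It assumes the regions are separated by a comma, possibly followed by a space, which is why the separator is written as the regular expression `",\\s*"`.

```r
library(dplyr)
library(tidyr)

# Hypothetical ads: the first lists two included and two excluded regions
toy <- tibble(
  ad_id            = c(1, 2),
  spend            = c(100, 40),
  regions_included = c("Texas, Ohio", NA),
  regions_excluded = c("Alaska, Hawaii", NA)
)

long <- toy %>%
  separate_rows(regions_included, sep = ",\\s*") %>%  # also drops the space after each comma
  separate_rows(regions_excluded, sep = ",\\s*")

nrow(long)        # ad 1 expands to 2 x 2 = 4 rows, ad 2 stays as 1 row
sum(long$spend)   # spend is repeated on every copy of an ad

# Totals per ad need the duplicates collapsed first
per_ad_total <- long %>%
  distinct(ad_id, spend) %>%
  summarize(total = sum(spend))
per_ad_total
```

Since each ad's `spend` and `impressions` are repeated on every row it expands into, per-state summaries computed after this step describe the ads that targeted a state rather than the amount spent within that state.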
The states of Alaska, Hawaii, and California were excluded from some of the candidates' political advertisements. This could be due either to the stringent laws these states have for reporting campaign contributions and expenditure activities or to historic voting patterns ("Campaign Disclosure, Filer Resources, Alaska Public Offices Commission, Department of Administration, State of Alaska", n.d.; "Contribution Limits", n.d.; "Electronic Media Advertisements", 2020).
Checking whether information on `organization_name` and `paying_advertiser_name` is available for those advertisements excluding these states:
```{r}
#| label: information on ads excluding certain states
polads %>%
select(organization_name,paying_advertiser_name,spend,candidate_ballot_information,regions_excluded) %>%
filter(str_detect(regions_excluded, 'California|Hawaii|Alaska')) %>%
distinct()
```
All of the ads that excluded these regions supported Donald Trump as a candidate, were by an organization called 'Marud Khan', and were paid for by Albbiom Marketing LLC. According to Markay (2020), Albbiom Marketing LLC is a marketing company without a proper address that provides "free" Trump merchandise and has scammed people in the past. They also found no evidence that 'Marud Khan' was a real person.
#### Creating a data subset for location analysis
Checking the distribution of values in the `spend` and `impressions` columns:
```{r}
#| label: distribution of spend and impressions
ggplot(polads, aes(x=spend)) + geom_histogram() + theme_minimal() + labs(title = "Expenditure Distribution", x = "Expenditure", y = "Frequency")
ggplot(polads, aes(x=impressions)) + geom_histogram() + theme_minimal() + labs(title = "Impressions Distribution", x = "Impressions", y = "Frequency")
```
Clearly, both distributions are skewed to the right and are not symmetric. Hence, I'm taking the median of these columns for analysis. Creating a subset of the data for further analysis:
```{r}
#| label: creating subset for location analysis
polads_loc1 <- polads %>%
select(regions_included,spend,impressions,candidate_ballot_information) %>%
drop_na(regions_included) %>%
group_by(regions_included,candidate_ballot_information)%>%
summarize(spend_median = median(spend),impressions_median = median(impressions)) %>%
rename(state=regions_included)
polads_loc1
```
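The reason for preferring the median on right-skewed data can be seen in a small made-up example (the `toy` tibble and its values are hypothetical): a single very large ad pulls the mean far more than the median.

```r
library(dplyr)

# Hypothetical spend values with one very large ad in Texas
toy <- tibble(
  state     = c("Texas", "Texas", "Texas", "Ohio", "Ohio"),
  candidate = c("Trump", "Trump", "Trump", "Biden", "Biden"),
  spend     = c(10, 20, 3000, 50, 70)
)

by_state <- toy %>%
  group_by(state, candidate) %>%
  summarize(
    spend_mean   = mean(spend),
    spend_median = median(spend),
    .groups = "drop"   # return an ungrouped tibble
  )
by_state
```

Setting `.groups = "drop"` is optional; it returns an ungrouped result and avoids the grouping message that `summarize()` prints otherwise.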
#### Ad expenditure across states
```{r}
#| label: plotting expenditure across states
loc_spend_plot <- plot_usmap(data = polads_loc1, values = "spend_median", labels = FALSE) +
  scale_fill_viridis_c(name = "Median Ad Expenditure") +
  labs(title = "Snapchat Targeted Political Ad Expenditure Across the States", caption = "Note: This plot excludes ads targeting no states in particular") +
  theme(legend.position = "right")
loc_spend_plot
```
From this plot, we can observe that ads targeting Pennsylvania and Nebraska had relatively higher median expenditure. Now, let's look at median ad expenditure across states by the candidate they supported.
```{r}
#| label: median ad expenditure across states by candidate
# Biden ads
polads_loc1_biden <- polads %>%
select(regions_included,spend,impressions,candidate_ballot_information) %>%
filter(candidate_ballot_information=="Biden")%>%
drop_na(regions_included) %>%
group_by(regions_included,candidate_ballot_information)%>%
summarize(spend_median = median(spend),impressions_median = median(impressions)) %>%
rename(state=regions_included)
polads_loc1_biden
# plotting expenditure for Biden ads
loc_spend_biden_plot <- plot_usmap(data = polads_loc1_biden, values = "spend_median", labels = FALSE) +
  scale_fill_viridis_c(name = "Median Ad Expenditure") +
  labs(title = "Snapchat Targeted Political Ad Expenditure for Biden Across the States (2020)", caption = "Note: This plot excludes ads targeting no states in particular") +
  theme(legend.position = "right")
# Trump ads
polads_loc1_trump <- polads %>%
select(regions_included,spend,impressions,candidate_ballot_information) %>%
filter(candidate_ballot_information=="Trump")%>%
drop_na(regions_included) %>%
group_by(regions_included,candidate_ballot_information)%>%
summarize(spend_median = median(spend),impressions_median = median(impressions)) %>%
rename(state=regions_included)
polads_loc1_trump
# plotting expenditure for Trump ads
loc_spend_trump_plot <- plot_usmap(data = polads_loc1_trump, values = "spend_median", labels = FALSE) +
  scale_fill_viridis_c(name = "Median Ad Expenditure") +
  labs(title = "Snapchat Targeted Political Ad Expenditure for Trump Across the States (2020)", caption = "Note: This plot excludes ads targeting no states in particular") +
  theme(legend.position = "right")
# comparing plots
loc_spend_biden_plot
loc_spend_trump_plot
```
For Biden's ads, the median expenditure was highest in Pennsylvania and Nebraska. For Trump's, it was highest in Texas, Mississippi, and South Carolina. While ads supporting Trump explicitly targeted all states, ads supporting Biden were limited to particular states.
Next, I'm looking at how ad expenditure across targeted states changes over the months:
```{r}
#| eval: false
polads_loc_month1 <- polads %>%
select(regions_included,spend,impressions,candidate_ballot_information,start_month) %>%
drop_na(regions_included) %>%
group_by(regions_included,candidate_ballot_information,start_month)%>%
summarize(spend_median = median(spend),impressions_median = median(impressions)) %>%
rename(state=regions_included)
polads_loc_month1
# plotting expenditure
loc_spend_month_plot <- plot_usmap(data = polads_loc_month1, values = "spend_median", labels = FALSE,label_color = "black") + scale_fill_viridis_c(name = "Ad Expenditure Amount by Month") +
labs(title = "Snapchat Targeted Political Ad Expenditure Across the States") + theme(legend.position = "right")
loc_spend_month_plot
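# animating change in median expenditure by month
# (transition_time(), ease_aes(), and animate() come from gganimate, which must be attached first)
loc_spend_month_transition <- loc_spend_month_plot +
  labs(title = "Median Political Ad Expenditure in Month {as.integer(frame_time)}") +
  transition_time(as.numeric(start_month)) +
  ease_aes('linear')  # easing is added to the plot, not to the rendered animation
loc_spend_anim <- animate(loc_spend_month_transition, fps = 10)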
loc_spend_anim
```
::: callout-note
I couldn't get the above block of code to display any output even though it ran perfectly fine on my RStudio.
:::
#### Ad impressions across states
```{r}
#| label: plotting impressions across states
loc_imp_plot <- plot_usmap(data = polads_loc1, values = "impressions_median", labels = FALSE) +
  scale_fill_viridis_c(name = "Median Ad Impressions") +
  labs(title = "Snapchat Targeted Political Ad Impressions Across the States (2020)", caption = "Note: This plot excludes ads targeting no states in particular") +
  theme(legend.position = "right")
loc_imp_plot
```
Overall, Mississippi and South Carolina had the highest median ad impressions.
Now, looking at the median ad impressions across states by the candidate they supported.
```{r}
#| label: median ad impressions across states by candidate
# plotting impressions for Biden ads
loc_imp_biden_plot <- plot_usmap(data = polads_loc1_biden, values = "impressions_median", labels = FALSE) +
  scale_fill_viridis_c(name = "Median Ad Impressions") +
  labs(title = "Snapchat Targeted Political Ad Impressions for Biden Across the States (2020)", caption = "Note: This plot excludes ads targeting no states in particular") +
  theme(legend.position = "right")
# plotting impressions for Trump ads
loc_imp_trump_plot <- plot_usmap(data = polads_loc1_trump, values = "impressions_median", labels = FALSE) +
  scale_fill_viridis_c(name = "Median Ad Impressions") +
  labs(title = "Snapchat Targeted Political Ad Impressions for Trump Across the States (2020)", caption = "Note: This plot excludes ads targeting no states in particular") +
  theme(legend.position = "right")
# comparing plots
loc_imp_biden_plot
loc_imp_trump_plot
```
In a similar trend to median expenditure, Biden's ads got their highest median impressions in Nebraska and Pennsylvania, while median impressions for Trump's ads were highest in Texas, Mississippi, and South Carolina.
Conducting a sanity check with the data:
```{r}
#| label: sanity check
# checking highest median expenditure by candidate
# (arrange() sorts all rows; slice() then keeps the first three rows within each group)
polads_loc1 %>%
  select(state, candidate_ballot_information, spend_median) %>%
  group_by(candidate_ballot_information) %>%
  arrange(desc(spend_median)) %>%
  slice(1:3) %>%
  knitr::kable(caption = "Highest Median Ad Expenditure by State and Candidate (2020)", col.names = c("State", "Candidate", "Median Expenditure")) %>%
  kable_minimal()
# checking highest median impressions by candidate
polads_loc1 %>%
  select(state, candidate_ballot_information, impressions_median) %>%
  group_by(candidate_ballot_information) %>%
  arrange(desc(impressions_median)) %>%
  slice(1:3) %>%
  knitr::kable(caption = "Highest Median Ad Impressions by State and Candidate (2020)", col.names = c("State", "Candidate", "Median Impressions")) %>%
  kable_minimal()
```
## Reflections
In 2020, the United States made far more use of the Snapchat social media platform for political ads than other countries did. The above analysis showcased the reach and funding of advertisements supporting the candidates Joe Biden and Donald Trump prior to and during the 2020 presidential election season. The variables that I focused on - ad expenditure, ad impressions, location micro-targeting, and even ad duration - all contribute to forming an effective political advertising strategy. It is important to note that there were more ads that supported Biden in this dataset, which may have skewed the results summarized below.
The data revealed a close correlation between the amount spent on ads and the impressions they received. Ads supporting Biden had more funds allocated to them and also received more impressions. This may have played a part in his election victory, although this analysis cannot establish that. Biden's ads were more frequent in the second half of 2020, while the opposite was true of Trump's ads. The timing of an ad also matters: a shorter ad supporting Biden displayed on election day received more impressions than the one that ran for more than 800 hours from September 2020. From the location visualizations, it seems that candidates were targeting states that were predominantly Democratic or Republican in order to either win them over or maintain their party's dominance.
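One way to put a number on the expenditure-impressions relationship described above is a rank correlation, which is less sensitive to the heavy right skew that ad spend usually shows. The chunk below is only a sketch: the values are hypothetical toy numbers, not figures from the Snapchat dataset, and the column names `spend` and `impressions` are assumed to match the cleaned data.

```{r}
#| label: sketch rank correlation
# Hypothetical toy values for illustration only; not taken from the dataset
toy_ads <- data.frame(
  spend       = c(10, 50, 200, 1000, 5000),
  impressions = c(1500, 9000, 30000, 250000, 900000)
)

# Spearman's rho compares ranks, so a few very large ads do not dominate
rho <- cor(toy_ads$spend, toy_ads$impressions, method = "spearman")
rho
```

Running the same call separately on the Biden and Trump subsets would show whether the relationship holds for each candidate rather than only in the pooled data.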
In terms of the data used, I wish I had checked how sparse columns like `advanced_demographics` and `radius_targeting_included` were before beginning, because I was really looking forward to using them in my analysis. Another caveat of this data was that, since a single entry of the `regions_included` column could list multiple regions, it became hard to determine the individual ad expenditure and impressions for each state.
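The multi-region caveat above can be illustrated with a small sketch. It assumes that regions within `regions_included` are separated by commas and, as a deliberately crude assumption, splits each ad's spend equally across the states it lists; the dataset itself does not report per-state spend, so the `spend_equal_split` column is an approximation and not a recovered value. The toy rows and the `adid` and `spend` column names are hypothetical.

```{r}
#| label: sketch splitting multi-region ads
library(dplyr)
library(tidyr)

# Hypothetical toy rows; the comma separator is an assumption
toy_ads <- tibble(
  adid             = c("a1", "a2"),
  regions_included = c("Texas,Ohio", "Nevada"),
  spend            = c(100, 40)
)

ads_by_state <- toy_ads %>%
  # count the regions per ad before the rows are split
  mutate(n_regions = lengths(strsplit(regions_included, ","))) %>%
  # one row per ad-state pair
  separate_rows(regions_included, sep = ",") %>%
  mutate(
    state             = trimws(regions_included),
    # crude assumption: spend is shared equally across listed states
    spend_equal_split = spend / n_regions
  ) %>%
  select(adid, state, spend, spend_equal_split)

ads_by_state
```

Without the equal-split step, every state listed in a multi-state ad inherits the ad's full spend, so sums across states would overstate the total.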
Nevertheless, I enjoyed the process of completing this project. Though I can still improve, I learnt a lot about coding in R - from writing tidy code to creating publication-worthy plots. At the same time, I think I got a bit overwhelmed with everything that can be done in R since I kept going down an online rabbit hole of endless packages and techniques. Also, I learnt not to underestimate the importance of the data cleaning process; I spent a lot more time on that than actually analyzing the data.
## Future Directions
Further analysis can be done with this dataset. One could determine the type of entity paying for the ads for both candidates (whether it was funded by their own campaign or an outside organization), and the highest paying advertisers. I wanted to do more with the `ad_duration` column I'd created, but I found working with the difftime object more difficult than expected. Plots showing change in expenditure and impressions by state over time could also be generated. More analysis can be conducted with the postal codes data provided in this dataset to map more specific regions that the ads were targeting. Lastly, future projects could join data on individual state populations and analyze expenditure and impressions in relation to that.
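For the `ad_duration` idea above, one possible way around the difftime difficulty is to fix the units when the difference is computed and then convert the result to a plain number. This is a minimal base R sketch with hypothetical start and end times; it is not how `ad_duration` was built in this project.

```{r}
#| label: sketch difftime to numeric
# Hypothetical start and end times for a single ad
start_date <- as.POSIXct("2020-09-01 00:00:00", tz = "UTC")
end_date   <- as.POSIXct("2020-10-05 12:00:00", tz = "UTC")

# difftime() chooses units automatically unless they are fixed explicitly
ad_duration <- difftime(end_date, start_date, units = "hours")

# as.numeric() with a units argument returns an ordinary number
duration_hours <- as.numeric(ad_duration, units = "hours")
duration_days  <- as.numeric(ad_duration, units = "days")
```

Once the duration is an ordinary numeric column, it can be summarised, binned, or plotted like any other variable.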
## References
::: {#refs}
California Fair Political Practices Commission. (2020). *Electronic Media Advertisements* \[PDF\]. Retrieved 3 September 2022, from <https://www.fppc.ca.gov/content/dam/fppc/NS-Documents/AgendaDocuments/Task-Force/dttf-2020/march-2020/Legal.pdf>.
*Campaign Disclosure, Filer Resources, Alaska Public Offices Commission, Department of Administration, State of Alaska*. Alaska Department of Administration. Retrieved 3 September 2022, from <https://doa.alaska.gov/apoc/FilerResources/campaignDisclosure.html>.
*Contribution Limits*. Campaign Spending Commission. Retrieved 3 September 2022, from <https://ags.hawaii.gov/campaign/contribution-limits/>.
Grolemund, G., & Wickham, H. (2016). *R for Data Science: Import, Tidy, Transform, Visualize, and Model Data*. O'Reilly Media.
*Lookalike Audiences*. Business Help Center - Snapchat. Retrieved 3 September 2022, from <https://businesshelp.snapchat.com/s/article/create-lookalike-audience?language=en_US>.
Markay, L. (2020). *The Trump-Scam-Industrial-Complex Now Extends to Snapchat*. Daily Beast. Retrieved 2 September 2022, from <https://www.thedailybeast.com/the-trump-scam-industrial-complex-now-extends-to-snapchat?ref=scroll>.
R Core Team. (2020). *R: A language and environment for statistical computing*. R Foundation for Statistical Computing, Vienna, Austria. <https://www.r-project.org>.
Rodriguez, S. (2020). *Snap stock rockets up after surprise earnings beat*. CNBC. Retrieved 3 September 2022, from <https://www.cnbc.com/2020/10/20/snap-earnings-q3-2020.html>.
RStudio Team. (2019). *RStudio: Integrated Development for R*. RStudio, Inc., Boston, MA. <https://www.rstudio.com>.
*Snap Audience Match Terms*. Snap Inc. Retrieved 3 September 2022, from <https://snap.com/en-US/terms/snap-audience-match>.
*Snapchat statistics 2020*. (2020). Smart Insights. Retrieved 4 September 2022, from <https://www.smartinsights.com/social-media-marketing/social-media-strategy/snapchat-statistics/>.
Snap Inc. (n.d.). *PoliticalAds* \[Data set\]. <https://snap.com/en-US/political-ads>.
Tomasi, R. (2021). *Quick guide on social media advertising for campaigns and public institutions*. The European Campaign Playbook. Retrieved 2 September 2022, from <https://www.campaignplaybook.eu/blog_quick_guide_on_social_media_advertising>.
*What is Micro-Targeting & How Does it Affect Advertising*. (n.d.). MNI Targeted Media. Retrieved 4 September 2022, from <https://www.mni.com/blog/advertmarket/what-is-micro-targeting-how-does-it-affect-advertising/>.
:::
## Appendix
### Dataframe variable names and descriptions
| **Variable Name** | **Description** |
|------------------------------------|------------------------------------|
| `ADID` | A unique value for each political advertisement. |
| `CreativeURL` | A URL to the advertisement's creative content. |
| `Currency Code` | The currency used by the account creating the advertisement. |
| `Spend` | The amount spent by the advertiser for the ad campaign expressed in local currency. |
| `Impressions` | The number of times the advertisement has been viewed by Snapchat users. |
| `StartDate` | The time at which the advertisement was set to start running on the platform. |
| `EndDate` | The time at which the advertisement was set to stop running on the platform. |
| `OrganizationName` | The organization that is responsible for creating the advertisement. |
| `BillingAddress` | The address of the organization that is responsible for creating the advertisement. |
| `CandidateBallotInformation` | Information on the candidate (for California elections: also the office they are contesting) or ballot initiative that the advertisement is associated with. |
| `PayingAdvertiserName` | The entity that is providing funds for the advertisement. |
| `CommitteeName` | The name of the committee paying for the advertisement. |
| `CommitteeIdentificationNumber` | The identification number of the committee paying for the advertisement. |
| `DisclosureNameOfCommittee` | The disclosure name of the committee paying for the advertisement, as stipulated by California law. |
| `AdvertisingJurisdiction` | The jurisdiction that the advertisement refers to. |
| `Gender` | The genders targeted by the advertisement. If this field is empty, all genders were targeted. |
| `AgeBracket` | The ages targeted by the advertisement. If this field is empty, all ages were targeted. |
| `CountryCode` | The country that the advertisement is targeting. |
| `Regions (Included)` | The region(s) included in the advertisement's targeting criteria (states or provinces). |
| `Regions (Excluded)` | The region(s) excluded in the advertisement's targeting criteria (states or provinces). |
| `Electoral Districts (Included)` | The electoral district(s) included in the advertisement's targeting criteria. |
| `Electoral Districts (Excluded)` | The electoral district(s) excluded in the advertisement's targeting criteria. |
| `Radius Targeting (Included)` | The point-radius circles included in the advertisement's targeting criteria. |
| `Radius Targeting (Excluded)` | The point-radius circles excluded in the advertisement's targeting criteria. |
| `Metros (Included)` | The metro(s) included in the advertisement's targeting criteria. |
| `Metros (Excluded)` | The metro(s) excluded in the advertisement's targeting criteria. |
| `Postal Code (Included)` | The postal code(s) included in the advertisement's targeting criteria. |
| `Postal Code (Excluded)` | The postal code(s) excluded in the advertisement's targeting criteria. |
| `Location Categories (Included)` | The location categories included in the advertisement's targeting criteria. |
| `Location Categories (Excluded)` | The location categories excluded in the advertisement's targeting criteria. |
| `Interests` | The interest audience(s) included in the advertisement's targeting criteria. If this field is empty, then no interest targeting was used. |
| `OsType` | The operating systems included in the advertisement's targeting criteria. If this field is empty, then all operating systems were targeted. |
| `Segments` | The segments included in the advertisement's targeting criteria. This is advertiser-specific data, such as Snap Audience Match[^1] or Lookalike audiences[^2]. |
| `Language` | The languages targeted by the advertisement. If this field is empty, then no language-based targeting was used. |
| `AdvancedDemographics` | The third-party data segments targeted by the advertisement. If this field is empty, then no third-party data segments were used. |
| `Targeting Connection Type` | The internet connection type targeted by the advertisement. If this field is empty, then no targeting based on internet connection type was used. |
| `Targeting Carrier (ISP)` | The carrier type targeted by the advertisement. If this field is empty, all carrier types were targeted. |
| `CreativeProperties` | The URL specified in advertisement's call to action. |
[^1]: Snap Audience Match or Customer List Audience is a Snapchat feature that allows users to send their data to the platform and its affiliates to form custom audiences ("Snap Audience Match Terms", n.d.).
[^2]: A Lookalike audience reaches Snapchat users that have similar characteristics to an organization account's existing customers. There are three different options: Similarity (a small-size audience that closely resembles the seed audience), Balance (a medium-size audience that balances similarity and reach), and Reach (a large-size audience that broadly resembles the seed audience) ("Lookalike Audiences", n.d.).