---
title: "Final Project"
author: "Ananya Pujary"
description: "Analyzing Snapchat Political Ads in the US in 2020"
date: "09/04/2022"  
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - final-project
  - snapchat-political-ads
  - ggplot
  - dplyr
  - stringr
  - lubridate
  - janitor
---

## Loading the Packages

```{r}
#| label: setup
#| warning: false

library(tidyverse)
library(googlesheets4)
library(skimr)
library(dplyr)
library(stringr)
library(lubridate)
library(purrr)
# install any missing add-on packages, then attach them
# (install.packages() alone does not load a package)
extra_pkgs <- c("corrr", "janitor", "usmap", "viridis", "transformr", "patchwork", "kableExtra")
for (pkg in extra_pkgs) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cran.us.r-project.org")
  }
  library(pkg, character.only = TRUE)
}

knitr::opts_chunk$set(echo = TRUE)
```

## Introduction

Every election season, millions of dollars are spent on political advertisements that help candidates reach a wider audience of potential voters and influence the voting process (Nott, 2020). Political advertisements can be defined as those that "describe a political leader, organization, or party, a public office candidate, or an election/referendum" (Tomasi, 2021). These advertisements can also be created by entities other than the candidates themselves.

With social media now present in almost every aspect of our lives, these platforms are also playing a part in influencing the political process. Unlike traditional media such as newspapers and television, social media platforms are not liable for what is displayed on them and can set their own content regulations (Nott, 2020). Political advertisements on social media are becoming popular because they allow for 'micro-targeting' of demographics and help candidates understand and reach the masses better, in turn increasing voter engagement (Nott, 2020). Micro-targeting refers to a marketing strategy that uses consumer demographics and data to generate audience segments ("What is Micro-Targeting & How Does it Affect Advertising", n.d.).

While Facebook and Google have long been the dominant players in digital political advertising, Snapchat is becoming increasingly popular. In 2020, Snapchat had around 249 million active users on its platform, most of them in the age range of 13-29 (Rodriguez, 2020; "Snapchat statistics 2020", 2020). Snapchat has made data about the political ads shown on its app public, so this project will use its data for the year 2020 (Snap Inc., n.d.). In particular, I'll be looking at advertisements shown in the United States for the candidates Joe Biden and Donald Trump. I chose these candidates because the presidential election was held that year (November 3rd, 2020) and they ran against each other in a closely contested race. Using this dataset, I plan on looking at relative advertisement expenditure, impressions, and location micro-targeting, and explore the following broad questions:

::: {.callout-note appearance="simple"}
## Project Questions

-   Is there a relationship between political ad expenditure and impressions?

-   Which candidate's advertisements received more impressions?

-   How much was spent on these advertisements and which candidate spent more on average?

-   Which states were targeted by these advertisements, and which states did each candidate target more?
:::

\

## Reading in the Data

```{r}
#| label: reading in the data

polads_orig <- read_sheet('https://docs.google.com/spreadsheets/d/1S7jF0D2o8aC3gGndORVrksuSsvMMZwqVdKLmu4SYqUc/edit?usp=sharing')
```

Looking at the dataset's various characteristics:

```{r}
#| label: describing the data (1)

skim(polads_orig)

print(summarytools::dfSummary(polads_orig, varnumbers = FALSE, plain.ascii = FALSE, graph.magnif = 0.50, style = "grid", valid.col = FALSE), 
      method = 'render', table.classes = 'table-condensed')

```

This file contains the information for political ads that are/have been displayed on Snapchat's platform, such as the amount spent on them, the organization and advertisers behind them, the candidates/causes the ads support, demographic and location-based ad targeting, and so on. It has 12705 rows and 38 columns. There are 28 character-type, 1 list-type, 7 logical-type, and 2 numeric-type columns.

## Tidying the Data

Removing rows and columns that contain only missing values (`remove_empty()` drops both by default):

```{r}
#| label: tidying the data (1)

polads <- polads_orig %>% 
  remove_empty(which = c("rows", "cols"))

```

Snake case is typically recommended by tidyverse's style guide for column names and object names. However, the column names in this dataset are written either in title case (e.g. `Currency Code`) or camel case (e.g. `OrganizationName`). Some of them also contain special characters like brackets, which could interfere with implementing R functions.

::: callout-note
Snake case is a writing style in which all letters are lowercase and spaces between words are replaced with underscores (\_). Title case, on the other hand, capitalizes the first letter of each word and keeps the spaces between words. A third style is camel case, in which phrases are written without spaces or punctuation and each word after the first begins with a capital letter (when the first word is capitalized too, as in `OrganizationName`, the style is also called Pascal case).
:::

Hence, I'm using the `clean_names()` function from the 'janitor' package to convert all the column names to snake case.

```{r}
#| label: tidying the data (2)

polads <-clean_names(polads)

colnames(polads)
  
```
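As a rough illustration of what converting to snake case involves, here is a simplified base-R sketch. The helper `to_snake()` and the third example name are hypothetical; this is not how `clean_names()` is implemented, and janitor handles many more edge cases (duplicate names, accents, leading digits).

```r
# Hypothetical helper: a simplified approximation of snake-case conversion
to_snake <- function(x) {
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x)  # split lower-to-upper boundaries
  x <- gsub("[^A-Za-z0-9]+", "_", x)            # spaces and symbols become underscores
  x <- gsub("^_+|_+$", "", x)                   # trim leading/trailing underscores
  tolower(x)
}

snake_names <- to_snake(c("Currency Code", "OrganizationName", "Regions (Included)"))
snake_names
```

The three steps mirror the cases described above: camel-case boundaries, spaces, and special characters such as brackets.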

### Narrowing down the data

Let's look at the distribution of countries receiving political advertisements on Snapchat in 2020.

```{r}
#| label: tidying the data (3)
  
table(polads$country_code) %>%
  knitr::kable(caption = "Countries Receiving Snapchat Political Ads (2020)",col.names = c("Country","Frequency")) %>%
  kable_minimal()

```

Most of the political advertisements were delivered to places in the United States (11124). Only keeping rows that describe ads targeting the United States:

```{r}
#| label: tidying the data (4)

polads <- filter(polads,country_code =="united states")
polads
```

Verifying whether all ads targeted for the United States were paid for in USD:

```{r}
#| label: tidying the data (5)

table(select(polads,currency_code)) %>%
  knitr::kable(caption = "Currency Code Frequency (2020)",col.names = c("Currency Code","Frequency")) %>%
  kable_minimal()

```

Three ads were paid for in Canadian Dollars (CAD). Finding out more about these three rows:

```{r}
#| label: tidying the data (6)

polads %>%
  filter(currency_code=="CAD")

```

Two ads were paid for by the University of British Columbia and were targeted at people aged 16-25 in the following areas: San Francisco, Oakland, and San Jose. The third was paid for by Point Digital Creative Studio. None of them provided information about the candidate associated with the ad.

Since we require information about the candidate in order to analyze relative ad spending, targeting and impressions, tidying up the `candidate_ballot_information` column by removing missing values:

```{r}
#| label: tidying the data (7)

sum(is.na(polads$candidate_ballot_information))

# removing rows with missing values
polads <- drop_na(polads,candidate_ballot_information)
polads

```

There were 5312 rows missing candidate information. Next, I'm only including those rows that explicitly state a candidate's name (i.e. contain the word "Biden" or "Trump").

```{r}
#| label: tidying the data (8)

polads <- filter(polads, str_detect(candidate_ballot_information, 'Biden|Trump'))

# sanity check
table(select(polads,candidate_ballot_information)) %>%
  knitr::kable(caption = "Frequency of Candidates", col.names = c("Candidate","Frequency"))

```

There are three entries that contain the string "Trump" but are in fact campaigning against him ("Against Trump", "Operation Dump Trump", "Titere de Trump"). There's also an entry called "Biden vs Trump", which doesn't clearly indicate which candidate the ad supports. Removing these rows so that they don't skew the results:

```{r}
#| label: tidying the data (9)

polads <- polads %>%
  filter(!candidate_ballot_information %in% c("Against Trump", "Operation Dump Trump", "Biden vs Trump", "Titere de Trump"))

```
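The match-then-exclude logic above can be sketched on a handful of labels. This is only an illustration: "Donald J. Trump" is a made-up label, and the vector names are hypothetical.

```r
library(stringr)

# Toy labels; "Donald J. Trump" is invented for illustration
labels <- c("Joe Biden for President", "Donald J. Trump", "Against Trump",
            "Operation Dump Trump", "Biden vs Trump", "Titere de Trump")

# Step 1: a broad pattern match keeps anything that mentions either name
mentions <- labels[str_detect(labels, "Biden|Trump")]

# Step 2: drop labels that oppose a candidate or are ambiguous
excluded <- c("Against Trump", "Operation Dump Trump",
              "Biden vs Trump", "Titere de Trump")
supporting <- mentions[!mentions %in% excluded]
supporting
```

A pattern like `'Biden|Trump'` matches every label here, which is why the frequency table is worth inspecting before trusting the filter.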

Since missing values in certain columns indicate that either all or none of the categories in the column were targeted, I'm changing their missing values accordingly for easy analysis.

```{r}
#| label: tidying the data (10)
          
polads <- polads %>%
  replace_na(list(gender = "ALL",os_type = "ALL",language = "none",advanced_demographics = "None",targeting_connection_type = "None",targeting_carrier_isp = "ALL"))

# sanity check
table(select(polads,gender))
table(select(polads,os_type))
table(select(polads,language))
table(select(polads,advanced_demographics))
table(select(polads,targeting_connection_type))
table(select(polads,targeting_carrier_isp))

```
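For readers unfamiliar with `replace_na()`, here is a minimal sketch of how the named list maps each column to its own fill value. The toy data frame is hypothetical and only mimics two of the targeting columns.

```r
library(tidyr)

# Toy targeting data; NA means no restriction was recorded
targeting <- data.frame(
  gender   = c("FEMALE", NA, "MALE"),
  language = c(NA, "en", NA),
  stringsAsFactors = FALSE
)

# Each list element names a column and the value that replaces its NAs
filled <- replace_na(targeting, list(gender = "ALL", language = "None"))
filled
```

Columns not named in the list are left untouched, so other missing values still show up as `NA` afterwards.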

### The case of `age_bracket` and `advanced_demographics`

The `age_bracket` column's values are as follows:

```{r}
#| label: tidying the data (11)

table(select(polads,age_bracket)) %>%
    knitr::kable(caption = "Age Targeting by Snapchat Political Ads (2020)",col.names = c("Ages","Frequency")) %>%
  kable_minimal()
```

Clearly, the column's values overlap and tend to refer to similar age groups, for instance, 18-20, 18-24, and 18+.

As for `advanced_demographics`:

```{r}
#| label: tidying the data (12)

table(select(polads,advanced_demographics))%>%
  knitr::kable(caption = "Advanced Demographics Targeting by Snapchat Political Ads (2020)",col.names = c("Advanced Demographics","Frequency")) %>%
  kable_minimal()

```

Clearly, very few ads provided additional demographic information for ad targeting, and the data aren't uniform (i.e. there are details on people's household incomes, occupations, languages spoken, educational levels, number of children etc.), so I wouldn't be able to effectively analyze it in relation to other columns. Though I was looking forward to analyzing these columns, the data they had were too sparse to work with.

### Wrangling with the date columns

The "Z" at the end of each date-time stamp indicates that the time zone is UTC. `ymd_hms()` parses it directly, so it doesn't need to be removed by hand. I'm arranging the rows by the start date set for the advertisement and converting the data types of the date columns (`start_date` and `end_date`) from character to date-time.

```{r}
#| label: tidying the data (13)

polads <- polads %>%
  arrange(ymd_hms(start_date))

#sanity check

head(polads)

# converting data types of date columns from character to datetime

polads <- polads %>%
  mutate(start_date = ymd_hms(start_date)) %>%
  mutate(end_date = ymd_hms(end_date))

# rechecking class of these columns
class(polads$start_date)
class(polads$end_date)

head(polads)

```
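A small sketch of what `ymd_hms()` does with a trailing "Z", assuming ISO 8601-style strings. The two timestamps are made up and may not match the exact format of the dataset.

```r
library(lubridate)

# Hypothetical timestamps in ISO 8601 form with a trailing "Z" (UTC)
raw <- c("2020-11-03T05:00:00Z", "2020-01-15T17:30:00Z")

parsed <- ymd_hms(raw)   # character -> POSIXct, stored in UTC
class(parsed)
tz(parsed)

# Once parsed, the values sort chronologically rather than alphabetically
sorted <- sort(parsed)
sorted
```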

Next, I want to create a new column that gives the duration for which the ad was run on Snapchat. I chose to display this information in hours.

```{r}
#| label: tidying the data (14)

polads <- polads %>%
  mutate(ad_duration = difftime(end_date,start_date,units= c("hours")))

unique(polads$ad_duration)

```
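Since `difftime` objects carry their units with them, it can help to convert them to plain numbers before summarizing or plotting. A minimal base-R sketch with made-up start and end times:

```r
# Hypothetical start and end times for one ad
start <- as.POSIXct("2020-11-02 06:00:00", tz = "UTC")
end   <- as.POSIXct("2020-11-03 11:00:00", tz = "UTC")

duration <- difftime(end, start, units = "hours")

# as.numeric() on a difftime accepts a units argument,
# so the same object can be read in hours or in days
hours <- as.numeric(duration, units = "hours")
days  <- as.numeric(duration, units = "days")
hours
days
```

Passing `units` explicitly avoids surprises when a `difftime` was created with automatically chosen units.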

Missing values show up for those rows without an end date for the advertisement. Plotting the distribution of political advertisement duration:

```{r}
#| label: tidying the data (15)

ggplot(polads, aes(x=as.numeric(ad_duration))) + geom_histogram(binwidth=15) + labs(title = "Distribution of Snapchat Political Ad Duration (2020)",x = "Duration in Hours", y = "Frequency", caption = "Note: This plot does not include ads that did not specify an end date") + theme_minimal()

```

A large proportion of the ads ran for less than 250 hours.

Lastly, I'll be changing the entries in `candidate_ballot_information` to either "Biden" or "Trump" to make the column more uniform and easier to analyze. For instance, "Joe Biden for President" will be changed to "Biden".

```{r}
#| label: tidying the data (16)

# changing `candidate_ballot_information` to either "Biden" or "Trump"
polads <- polads %>%
  mutate(candidate_ballot_information = case_when(
    str_detect(candidate_ballot_information, "Biden") ~ "Biden",
    str_detect(candidate_ballot_information, "Trump")  ~ "Trump",
    TRUE ~ candidate_ballot_information))

# sanity check
polads %>%
  filter(str_detect(candidate_ballot_information, "Trump")) %>%
  tally() %>%
  knitr::kable(col.names = "Number of Trump ads")

polads %>%
  filter(str_detect(candidate_ballot_information, "Biden")) %>%
  tally() %>%
  knitr::kable(col.names = "Number of Biden ads")

```

There are 1251 political advertisements supporting Biden's campaign and 483 political advertisements for Trump's campaign.

## Analyzing and Visualizing the Data

### Ad Expenditure and Impression Analysis

I want to determine whether there's a correlation between two variables I'm interested in: `spend` and `impressions`.

```{r}
#| label: correlation between spend and impressions

polads_cor <- polads %>% 
  select(spend,impressions) %>% 
  correlate()

polads_cor_plot <- rplot(polads_cor) + labs(title = "Correlation between Ad Expenditure and Impressions Received\n")
polads_cor_plot

```

This plot shows us that these two variables are moderately correlated with each other.
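To make the idea of a correlation coefficient concrete, here is a base-R sketch on made-up numbers (not values from the dataset). To my understanding `correlate()` computes Pearson correlations by default; with skewed variables like these, a rank-based (Spearman) coefficient is a common complementary check.

```r
# Toy values, invented for illustration only
spend       <- c(10, 50, 120, 400, 1500, 9000)
impressions <- c(2000, 9000, 30000, 60000, 400000, 1200000)

# Pearson measures linear association and is sensitive to extreme values
pearson <- cor(spend, impressions, use = "complete.obs")

# Spearman works on ranks, so it only asks whether the ordering agrees
spearman <- cor(spend, impressions, method = "spearman", use = "complete.obs")

c(pearson = pearson, spearman = spearman)
```

In this toy example the ranks agree perfectly, so the Spearman coefficient is 1 even though the relationship is not exactly linear.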

```{r}
#| label: total ad expenditure and impressions

# total amount spent by both candidates' ads
polads %>%
  select(candidate_ballot_information, spend)%>%
  group_by(candidate_ballot_information)%>%
  summarize(spend_sum = sum(spend)) %>%
  knitr::kable(caption = "Total Snapchat Political Ad Expenditure (2020)", col.names = c("Candidate","Total Amount in USD")) %>%
  kable_minimal()
```

More funds were allocated to political ads supporting Biden on Snapchat (\$4,367,549) than Trump (\$613,733).

```{r}
# total impressions received by both candidates' ads
polads %>%
  select(candidate_ballot_information, impressions)%>%
  group_by(candidate_ballot_information)%>%
  summarize(impressions_sum = sum(impressions)) %>%
  knitr::kable(caption = "Total Snapchat Political Ad Impressions (2020)", col.names = c("Candidate","Total Impressions")) %>%
  kable_minimal()

```

Ads supporting Biden received more impressions (804,943,566) than those supporting Trump (378,452,979).

```{r}
#| label: highest expenditure on a single ad by candidate

polads %>%
  select(candidate_ballot_information,spend,organization_name,paying_advertiser_name,impressions,start_date,end_date) %>%
  group_by(candidate_ballot_information) %>% 
  slice(which.max(spend)) %>%
  knitr::kable(caption = "Highest Singular Ad Expenditure by Candidate", col.names = c("Candidate","Expenditure","Organization Name","Paying Advertiser Name","Impressions","Start Date","End Date")) %>%
  kable_minimal()

```

The largest amount allocated to a single political advertisement supporting Biden (and overall) was \$151,724, while the \$33,349 spent by Albbiom Marketing LLC was the most expensive political advertisement for Trump's campaign. The Biden advertisement was displayed for almost all of election day (11/03/2020), as indicated by its start and end dates. Even though more funds were spent on the Biden advertisement, Trump's advertisement had more impressions (30,383,613).

```{r}
#| label: highest impressions on a single ad by candidate

polads %>%
  select(candidate_ballot_information,spend,organization_name,paying_advertiser_name,impressions,start_date,end_date) %>%
  group_by(candidate_ballot_information) %>% 
  slice(which.max(impressions)) %>%
  knitr::kable(caption = "Highest Singular Ad Impressions by Candidate (2020)", col.names = c("Candidate","Spend","Organization Name","Paying Advertiser Name","Impressions","Start Date","End Date")) %>%
  kable_minimal()

```

The highest number of impressions received by a single ad supporting Biden was 17,927,667, while for Trump it was 31,848,256. The higher number of impressions for Trump's ad could be attributed to it not having a set end date.

I'm interested in knowing the relative expenditure and impressions for advertisements by candidate as well. First, I want to extract the month from the `start_date` and `end_date` columns and use it to determine spending over the months.

```{r}
#| label: creating new columns for months

polads <- polads %>%
  mutate(start_month = month(start_date,label = TRUE),end_month = month(end_date, label = TRUE)) 

# sanity check
str(polads$start_month)
str(polads$end_month)
```

I'll need to take a log transformation because the values in the `spend` column are skewed. I'm using a smooth plot to track expenditure and impressions over the months.

```{r}
#| label: political ad expenditure by month

exp_by_month_plot <- polads %>%
  ggplot(aes(x=start_date, y=log(spend), group=candidate_ballot_information, color=candidate_ballot_information)) + geom_smooth() + labs(title = "Snapchat Political Ad Expenditure per Month by Candidate (2020)", x = "Month", y = "Expenditure (log scale)", colour = "Candidate") + scale_color_brewer(palette = "Set2") + theme_minimal()

exp_by_month_plot

```
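One caveat with the log transformation used above: `log(0)` is `-Inf`, so any ad with zero spend would be dropped from the smoother with a warning. I have not checked whether such rows exist here; if they do, `log1p()` is one possible alternative. A sketch on made-up values:

```r
# Toy spend values, including a zero
spend <- c(0, 9, 99, 999)

plain_log <- log(spend)     # the zero becomes -Inf
safe_log  <- log1p(spend)   # log(1 + x), finite at zero

plain_log
safe_log

# The same shift on a base-10 scale gives round numbers for these values
log10(spend + 1)
```

The shift by one barely changes large values but keeps zero-spend rows in the plot.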

More funds were spent on political ads supporting Biden's campaign in the months leading up to election day, i.e. July to November. Ads for Trump's campaign received more funds in the first half of the year. It would be worthwhile to compare the impressions of advertisements for both candidates too:

```{r}
#| label: political ad impressions by month

imp_by_month_plot <-polads %>%
  ggplot(aes(x=start_date, y=log(impressions), group=candidate_ballot_information, color=candidate_ballot_information)) + geom_smooth() + labs(title = "Political Ad Impressions per Month by Candidate (2020)", x = "Month", y = "Impressions (log scale)", colour = "Candidate") + scale_color_brewer(palette = "Set2") + theme_minimal()

imp_by_month_plot

```

Advertisements supporting Trump's campaign seem to have reached more people than Biden's advertisements in the first half of the year. However, in line with the expenditure trend, advertisements for Biden's campaign received more impressions in the later months of the year.

I want to know which ads had the longest and shortest duration by candidate, to see whether impressions vary greatly:

```{r}
#| label: ad duration by candidate

polads %>%
  select(candidate_ballot_information,organization_name,paying_advertiser_name,start_date,end_date,ad_duration,impressions)%>%
  group_by(candidate_ballot_information)%>%
  slice(which.max(ad_duration),which.min(ad_duration)) %>%
  knitr::kable(caption = "Longest and Shortest Snapchat Political Ads by Candidate (2020)", col.names = c("Candidate","Organization Name","Paying Advertiser Name","Start Date","End Date","Ad Duration","Impressions"))%>%
  kable_minimal()

```

The longest-running ad supporting Biden ran for more than 835 hours, until election day. Interestingly, the shortest ad supporting this candidate (29 hours) received far more impressions than the longest one. This could be because the shorter ad was run on election day. On the other hand, the longest-running ad for Trump ran for more than 1860 hours, also until the end of election day. The shortest ad for this candidate (6.6 hours) was displayed in June and received fewer impressions too. The paying advertisers' names indicate that these ads were probably issued directly by the respective candidates' campaigns and not by an outside entity (except for the shortest ad supporting Trump).

### Location Targeting Analysis

#### Wrangling with the location columns

The following columns indicate different types of information about the locations targeted by the advertisements: `regions_included`, `regions_excluded`, `electoral_districts_included`, `radius_targeting_included`, `radius_targeting_excluded`, `metros_included`, `metros_excluded`, `postal_codes_included`, `postal_codes_excluded`. Most of these columns do not have enough values to be effectively analyzed, and due to a lack of time, the list column `postal_codes_included` could not be included in my analysis.

I'll be using the `regions_included` and `regions_excluded` columns. They have multiple states in each row which need to be separated into different rows:

```{r}
#| label: tidying `regions_included` and `regions_excluded`

# regions_included
polads <- polads %>%
  separate_rows(regions_included, sep = ",")

# sanity check
unique(polads$regions_included)

# `regions_excluded`
polads <- polads %>%
  separate_rows(regions_excluded, sep = ",")

# sanity check
unique(polads$regions_excluded)

```
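A toy sketch of what `separate_rows()` does, and of a side effect worth keeping in mind: every other column, including `spend`, is repeated on each new row. The data frame below is hypothetical.

```r
library(tidyr)

# Two made-up ads: one targets three states, the other targets one
ads <- data.frame(
  ad_id            = c("a1", "a2"),
  regions_included = c("Texas,Ohio,Iowa", "Ohio"),
  spend            = c(300, 50),
  stringsAsFactors = FALSE
)

long <- separate_rows(ads, regions_included, sep = ",")
long

# The ad-level spend is copied onto every state row, so summing
# after the split counts the first ad three times
sum(ads$spend)
sum(long$spend)
```

If the region strings contained spaces after the commas, a separator such as `",\\s*"` would be one way to avoid state names with leading whitespace.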

Certain political advertisements for the candidates excluded the states of Alaska, Hawaii, and California. This could be due either to the stringent laws these states have for reporting campaign contributions and expenditure activities or to historic voting patterns ("Campaign Disclosure, Filer Resources, Alaska Public Offices Commission, Department of Administration, State of Alaska", n.d.; "Contribution Limits", n.d.; California Fair Political Practices Commission, 2020).

Checking whether information on `organization_name` and `paying_advertiser_name` is available for those advertisements excluding these states:

```{r}
#| label: information on ads excluding certain states

polads %>%
  select(organization_name,paying_advertiser_name,spend,candidate_ballot_information,regions_excluded) %>%
  filter(str_detect(regions_excluded, 'California|Hawaii|Alaska')) %>%
  distinct()

```

All of the ads that excluded these regions supported Donald Trump as a candidate, were by an organization called 'Marud Khan', and were paid for by Albbiom Marketing LLC. According to Markay (2020), Albbiom Marketing LLC is a marketing company without a proper address that provides "free" Trump merchandise and has scammed people in the past. Markay also found no evidence that 'Marud Khan' was a real person.

#### Creating a data subset for location analysis

Checking the distribution of values in the `spend` and `impressions` columns:

```{r}
#| label: distribution of spend and impressions

ggplot(polads, aes(x=spend)) + geom_histogram() + theme_minimal() + labs(title = "Expenditure Distribution", x = "Expenditure", y = "Frequency")

ggplot(polads, aes(x=impressions)) + geom_histogram() + theme_minimal() + labs(title = "Impressions Distribution", x = "Impressions", y = "Frequency")


```

Clearly, both distributions are skewed to the right and are not symmetric. Hence, I'm taking the median of these columns for analysis. Creating a subset of the data for further analysis:

```{r}
#| label: creating subset for location analysis

polads_loc1 <- polads %>%
  select(regions_included,spend,impressions,candidate_ballot_information) %>%
  drop_na(regions_included) %>%
  group_by(regions_included,candidate_ballot_information)%>%
  summarize(spend_median = median(spend),impressions_median = median(impressions)) %>%
  rename(state=regions_included)

polads_loc1
```
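To illustrate why the median is preferred for right-skewed values, here is a small sketch with made-up numbers that follows the same `group_by()` and `summarise()` pattern. The `.groups = "drop"` argument is an optional addition that returns an ungrouped result.

```r
library(dplyr)

# Toy data: one very large spend pulls the Ohio mean upwards
ads <- tibble(
  state = c("Ohio", "Ohio", "Ohio", "Texas", "Texas"),
  spend = c(10, 20, 3000, 100, 200)
)

by_state <- ads %>%
  group_by(state) %>%
  summarise(spend_mean   = mean(spend),
            spend_median = median(spend),
            .groups = "drop")
by_state
```

For Ohio the mean is dominated by a single expensive ad, while the median stays close to what a typical ad cost.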

#### Ad expenditure across states

```{r}
#| label: plotting expenditure across states

loc_spend_plot <- plot_usmap(data = polads_loc1, values = "spend_median", labels = FALSE) + scale_fill_viridis_c(name = "Median Ad Expenditure") +
labs(title = "Snapchat Targeted Political Ad Expenditure Across the States",caption = "Note: This plot excludes ads targeting no states in particular") + theme(legend.position = "right")

loc_spend_plot
```

From this plot, we can observe that ads targeting Pennsylvania and Nebraska had relatively higher median expenditure. Now, let's look at median ad expenditure across states by the candidate they supported.

```{r}
#| label: median ad expenditure across states by candidate

# Biden ads
polads_loc1_biden <- polads %>%
  select(regions_included,spend,impressions,candidate_ballot_information) %>%
  filter(candidate_ballot_information=="Biden")%>%
  drop_na(regions_included) %>%
  group_by(regions_included,candidate_ballot_information)%>%
  summarize(spend_median = median(spend),impressions_median = median(impressions)) %>%
  rename(state=regions_included)

polads_loc1_biden

# plotting expenditure for Biden ads
loc_spend_biden_plot <- plot_usmap(data = polads_loc1_biden, values = "spend_median", labels = FALSE) + scale_fill_viridis_c(name = "Median Ad Expenditure") +
labs(title = "Snapchat Targeted Political Ad Expenditure for Biden Across the States (2020)",caption = "Note: This plot excludes ads targeting no states in particular") + theme(legend.position = "right")

# Trump ads
polads_loc1_trump <- polads %>%
  select(regions_included,spend,impressions,candidate_ballot_information) %>%
  filter(candidate_ballot_information=="Trump")%>%
  drop_na(regions_included) %>%
  group_by(regions_included,candidate_ballot_information)%>%
  summarize(spend_median = median(spend),impressions_median = median(impressions)) %>%
  rename(state=regions_included)

polads_loc1_trump

# plotting expenditure for Trump ads
loc_spend_trump_plot <- plot_usmap(data = polads_loc1_trump, values = "spend_median", labels = FALSE) + scale_fill_viridis_c(name = "Median Ad Expenditure") +
labs(title = "Snapchat Targeted Political Ad Expenditure for Trump Across the States (2020)",caption = "Note: This plot excludes ads targeting no states in particular") + theme(legend.position = "right")

# comparing plots
loc_spend_biden_plot
loc_spend_trump_plot

```

For Biden's ads, the median expenditure was highest in Pennsylvania and Nebraska. For Trump's, it was highest in Texas, Mississippi, and South Carolina. While Trump's ads explicitly targeted all states, Biden's ads were limited to particular states.

Next, I'm looking at how ad expenditure across targeted states changes over the months:

```{r}
#| eval: false

polads_loc_month1 <- polads %>%
  select(regions_included,spend,impressions,candidate_ballot_information,start_month) %>%
  drop_na(regions_included) %>%
  group_by(regions_included,candidate_ballot_information,start_month)%>%
  summarize(spend_median = median(spend),impressions_median = median(impressions)) %>%
  rename(state=regions_included)

polads_loc_month1

# plotting expenditure
loc_spend_month_plot <- plot_usmap(data = polads_loc_month1, values = "spend_median", labels = FALSE,label_color = "black") + scale_fill_viridis_c(name = "Ad Expenditure Amount by Month") +
labs(title = "Snapchat Targeted Political Ad Expenditure Across the States") + theme(legend.position = "right")

loc_spend_month_plot

# animating change in median expenditure by month (requires the gganimate package)
# ease_aes() is added to the plot, not to the output of animate()
loc_spend_month_transition <- loc_spend_month_plot +
  labs(title = "Median Political Ad Expenditure in Month {as.integer(frame_time)}") + transition_time(as.numeric(start_month)) + ease_aes('linear')

loc_spend_anim <- animate(loc_spend_month_transition, fps=10)
loc_spend_anim

```

::: callout-note
I couldn't get the above block of code to display any output when rendering this document, even though it ran fine in RStudio, so it is shown here without being evaluated.
:::

#### Ad impressions across states

```{r}
#| label: plotting impressions across states

loc_imp_plot <- plot_usmap(data = polads_loc1, values = "impressions_median", labels = FALSE) + scale_fill_viridis_c(name = "Median Ad Impressions") +
labs(title = "Snapchat Targeted Political Ad Impressions Across the States (2020)",caption = "Note: This plot excludes ads targeting no states in particular") + theme(legend.position = "right")

loc_imp_plot
```

Overall, Mississippi and South Carolina had the highest median ad impressions.

Now, looking at the median ad impressions across states by the candidate they supported.

```{r}
#| label: median ad impressions across states by candidate

# plotting impressions for Biden ads
loc_imp_biden_plot <- plot_usmap(data = polads_loc1_biden, values = "impressions_median", labels = FALSE) + scale_fill_viridis_c(name = "Median Ad Impressions") +
labs(title = "Snapchat Targeted Political Ad Impressions for Biden Across the States (2020)",caption = "Note: This plot excludes ads targeting no states in particular") + theme(legend.position = "right")

# plotting impressions for Trump ads
loc_imp_trump_plot <- plot_usmap(data = polads_loc1_trump, values = "impressions_median", labels = FALSE) + scale_fill_viridis_c(name = "Median Ad Impressions") +
labs(title = "Snapchat Targeted Political Ad Impressions for Trump Across the States (2020)",caption = "Note: This plot excludes ads targeting no states in particular") + theme(legend.position = "right")

# comparing plots
loc_imp_biden_plot
loc_imp_trump_plot

```

In a similar trend to median expenditure, Biden's ads got their highest median impressions in Nebraska and Pennsylvania, while median impressions for Trump's ads were highest in Texas, Mississippi, and South Carolina.

Conducting a sanity check with the data:

```{r}
#| label: sanity check

# checking highest median expenditure by candidate
polads_loc1%>%
  select(state,candidate_ballot_information,spend_median) %>%
  group_by(candidate_ballot_information) %>%
  arrange(desc(spend_median)) %>%
  slice(1:3) %>%
  knitr::kable(caption = "Highest Median Ad Expenditure by State and Candidate (2020)",col.names = c("State","Candidate","Median Expenditure"))%>%
    kable_minimal()

# checking highest median impressions by candidate
polads_loc1%>%
  select(state,candidate_ballot_information,impressions_median) %>%
  group_by(candidate_ballot_information) %>%
  arrange(desc(impressions_median)) %>%
  slice(1:3) %>%
    knitr::kable(caption = "Highest Median Ad Impressions by State and Candidate (2020)",col.names = c("State","Candidate","Median Impressions"))%>%
    kable_minimal()

```
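As an aside, the top-n-per-group lookup above can also be written with `slice_max()`, which picks the largest values within each group directly. This is a sketch of an alternative on made-up medians, not the approach used in this analysis.

```r
library(dplyr)

# Toy per-state medians, invented for illustration
medians <- tibble(
  state        = c("Texas", "Ohio", "Iowa", "Utah", "Ohio", "Iowa"),
  candidate    = c("Trump", "Trump", "Trump", "Trump", "Biden", "Biden"),
  spend_median = c(500, 300, 100, 50, 400, 700)
)

# Keep the two highest medians for each candidate
top2 <- medians %>%
  group_by(candidate) %>%
  slice_max(spend_median, n = 2, with_ties = FALSE) %>%
  ungroup()
top2
```

Setting `with_ties = FALSE` guarantees at most `n` rows per group even when values are tied.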

## Reflections

In 2020, far more Snapchat political ads in this dataset targeted the United States than any other country. The above analysis showcased the reach and funding of advertisements supporting the candidates Joe Biden and Donald Trump prior to and during the 2020 presidential election season. The variables that I focused on - ad expenditure, ad impressions, location micro-targeting, and even ad duration - all contribute to forming an effective political advertising strategy. It is important to note that there were more ads that supported Biden in this dataset, which may have skewed the results summarized below.

The data revealed a moderate correlation between the amount spent on ads and the impressions they received. Ads supporting Biden had more funds allocated to them and also received more impressions. This may have played a part in his election victory. Biden's ads were more frequent in the second half of 2020, while the opposite was true for Trump's ads. The timing of an ad also matters: a shorter ad supporting Biden displayed on election day received more impressions than the one that ran for more than 800 hours from September 2020. From the location visualizations, it seems that candidates were targeting states that were predominantly Democratic or Republican in order to either win them over or maintain their party dominance.

In terms of the data used, I wish I had looked at how sparse the data were in columns like `advanced_demographics` and `radius_targeting_included` before beginning, because I was really looking forward to using them in my analysis. Another caveat of this data was that, since there were multiple regions in a single entry of the `regions_included` column, it was hard to determine the individual ad expenditure and impressions for each state.

Nevertheless, I enjoyed the process of completing this project. Though I can still improve, I learnt a lot about coding in R - from writing tidy code to creating publication-worthy plots. At the same time, I think I got a bit overwhelmed with everything that can be done in R since I kept going down an online rabbit hole of endless packages and techniques. Also, I learnt not to underestimate the importance of the data cleaning process; I spent a lot more time on that than actually analyzing the data.

## Future Directions

Further analysis can be done with this dataset. One could determine the type of entity paying for each candidate's ads (whether they were funded by the candidate's own campaign or by an outside organization), as well as the highest-paying advertisers. I wanted to do more with the `ad_duration` column I'd created, but I found working with `difftime` objects more difficult than expected. Plots showing change in expenditure and impressions by state over time could also be generated. More analysis can be conducted with the postal code data provided in this dataset to map the more specific regions that the ads were targeting. Lastly, future projects could join data on individual state populations and analyze expenditure and impressions in relation to population.
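
Two of these directions can be sketched on toy data: converting a `difftime` duration to plain numeric hours so that it behaves like any other numeric column, and joining state populations to express spending per 100,000 residents. The data frames `ads_toy` and `state_pop_toy`, their column names, and all values (including the populations) are invented placeholders, not figures from the dataset.

```{r}
library(dplyr)

# Toy ads data; dates and spend are invented placeholders
ads_toy <- tibble(
  state = c("Ohio", "Texas"),
  spend = c(500, 1200),
  start_date = as.POSIXct(c("2020-09-01 00:00:00", "2020-11-03 06:00:00"), tz = "UTC"),
  end_date = as.POSIXct(c("2020-10-05 08:00:00", "2020-11-03 18:00:00"), tz = "UTC")
)

# Placeholder populations (NOT real census figures)
state_pop_toy <- tibble(
  state = c("Ohio", "Texas"),
  population = c(1e6, 2e6)
)

ads_summary <- ads_toy %>%
  mutate(
    # difftime keeps its units as an attribute ...
    ad_duration = difftime(end_date, start_date, units = "hours"),
    # ... so convert explicitly to a plain number for plotting or modelling
    duration_hours = as.numeric(ad_duration, units = "hours")
  ) %>%
  left_join(state_pop_toy, by = "state") %>%
  mutate(spend_per_100k = spend / population * 1e5)

ads_summary
```

Fixing the units in both `difftime()` and `as.numeric()` avoids the automatic unit selection that can make `difftime` columns awkward to compare across rows.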

## References

::: {#refs}
California Fair Political Practices Commission. (2020). *Electronic Media Advertisements* \[PDF\]. Retrieved 3 September 2022, from <https://www.fppc.ca.gov/content/dam/fppc/NS-Documents/AgendaDocuments/Task-Force/dttf-2020/march-2020/Legal.pdf>.

*Campaign Disclosure, Filer Resources, Alaska Public Offices Commission, Department of Administration, State of Alaska*. (n.d.). Alaska Department of Administration. Retrieved 3 September 2022, from <https://doa.alaska.gov/apoc/FilerResources/campaignDisclosure.html>.

*Contribution Limits*. (n.d.). Campaign Spending Commission. Retrieved 3 September 2022, from <https://ags.hawaii.gov/campaign/contribution-limits/>.

Grolemund, G., & Wickham, H. (2016). *R for Data Science: Import, Tidy, Transform, Visualize, and Model Data*. O'Reilly Media.

*Lookalike Audiences*. (n.d.). Business Help Center - Snapchat. Retrieved 3 September 2022, from <https://businesshelp.snapchat.com/s/article/create-lookalike-audience?language=en_US>.

Markay, L. (2020). *The Trump-Scam-Industrial-Complex Now Extends to Snapchat*. Daily Beast. Retrieved 2 September 2022, from <https://www.thedailybeast.com/the-trump-scam-industrial-complex-now-extends-to-snapchat?ref=scroll>.

R Core Team. (2020). *R: A language and environment for statistical computing*. R Foundation for Statistical Computing, Vienna, Austria. <https://www.r-project.org>.

Rodriguez, S. (2020). *Snap stock rockets up after surprise earnings beat*. CNBC. Retrieved 3 September 2022, from <https://www.cnbc.com/2020/10/20/snap-earnings-q3-2020.html>.

RStudio Team. (2019). *RStudio: Integrated Development for R*. RStudio, Inc., Boston, MA. <https://www.rstudio.com>.

*Snap Audience Match Terms*. (n.d.). Snap Inc. Retrieved 3 September 2022, from <https://snap.com/en-US/terms/snap-audience-match>.

*Snapchat statistics 2020*. (2020). Smart Insights. Retrieved 4 September 2022, from <https://www.smartinsights.com/social-media-marketing/social-media-strategy/snapchat-statistics/>.

Snap Inc. (n.d.). *PoliticalAds* \[Data set\]. <https://snap.com/en-US/political-ads>.

Tomasi, R. (2021). *Quick guide on social media advertising for campaigns and public institutions*. The European Campaign Playbook. Retrieved 2 September 2022, from <https://www.campaignplaybook.eu/blog_quick_guide_on_social_media_advertising>.

*What is Micro-Targeting & How Does it Affect Advertising*. (n.d.). MNI Targeted Media. Retrieved 4 September 2022, from <https://www.mni.com/blog/advertmarket/what-is-micro-targeting-how-does-it-affect-advertising/>.
:::

## Appendix

### Dataframe variable names and descriptions

| **Variable Name**                | **Description**                                                                                                                                                                    |
|------------------------------------|------------------------------------|
| `ADID`                           | A unique value for each political advertisement.                                                                                                                                   |
| `CreativeURL`                    | A URL to the advertisement's creative content.                                                                                                                                     |
| `Currency Code`                  | The currency used by the account creating the advertisement.                                                                                                                       |
| `Spend`                          | The amount spent by the advertiser for the ad campaign expressed in local currency.                                                                                                |
| `Impressions`                    | The number of times the advertisement has been viewed by Snapchat users.                                                                                                           |
| `StartDate`                      | The time at which the advertisement was set to start running on the platform.                                                                                                      |
| `EndDate`                        | The time at which the advertisement was set to stop running on the platform.                                                                                                       |
| `OrganizationName`               | The organization that is responsible for creating the advertisement.                                                                                                               |
| `BillingAddress`                 | The address of the organization that is responsible for creating the advertisement.                                                                                                |
| `CandidateBallotInformation`     | Information on the candidate (for California elections, also the office they are contesting) or ballot initiative that the advertisement is associated with.                       |
| `PayingAdvertiserName`           | The entity that is providing funds for the advertisement.                                                                                                                          |
| `CommitteeName`                  | The name of the committee paying for the advertisement.                                                                                                                            |
| `CommitteeIdentificationNumber`  | The identification number of the committee paying for the advertisement.                                                                                                           |
| `DisclosureNameOfCommittee`      | The disclosure name of the committee paying for the advertisement, as stipulated by California law.                                                                                |
| `AdvertisingJurisdiction`        | The jurisdiction that the advertisement refers to.                                                                                                                                 |
| `Gender`                         | The genders targeted by the advertisement. If this field is empty, all genders were targeted.                                                                                      |
| `AgeBracket`                     | The ages targeted by the advertisement. If this field is empty, all ages were targeted.                                                                                            |
| `CountryCode`                    | The country that the advertisement is targeting.                                                                                                                                   |
| `Regions (Included)`             | The region(s) included in the advertisement's targeting criteria (states or provinces).                                                                                            |
| `Regions (Excluded)`             | The region(s) excluded in the advertisement's targeting criteria (states or provinces).                                                                                            |
| `Electoral Districts (Included)` | The electoral district(s) included in the advertisement's targeting criteria.                                                                                                      |
| `Electoral Districts (Excluded)` | The electoral district(s) excluded in the advertisement's targeting criteria.                                                                                                      |
| `Radius Targeting (Included)`    | The point-radius circles included in the advertisement's targeting criteria.                                                                                                       |
| `Radius Targeting (Excluded)`    | The point-radius circles excluded in the advertisement's targeting criteria.                                                                                                       |
| `Metros (Included)`              | The metro(s) included in the advertisement's targeting criteria.                                                                                                                   |
| `Metros (Excluded)`              | The metro(s) excluded in the advertisement's targeting criteria.                                                                                                                   |
| `Postal Code (Included)`         | The postal code(s) included in the advertisement's targeting criteria.                                                                                                             |
| `Postal Code (Excluded)`         | The postal code(s) excluded in the advertisement's targeting criteria.                                                                                                             |
| `Location Categories (Included)` | The location categories included in the advertisement's targeting criteria.                                                                                                        |
| `Location Categories (Excluded)` | The location categories excluded in the advertisement's targeting criteria.                                                                                                        |
| `Interests`                      | The interest audience(s) included in the advertisement's targeting criteria. If this field is empty, then no interest targeting was used.                                          |
| `OsType`                         | The operating systems included in the advertisement's targeting criteria. If this field is empty, then all operating systems were targeted.                                        |
| `Segments`                       | The segments included in the advertisement's targeting criteria. This is advertiser-specific data, such as Snap Audience Match[^1] or Lookalike audiences[^2].                     |
| `Language`                       | The languages targeted by the advertisement. If this field is empty, then no language-based targeting was used.                                                                    |
| `AdvancedDemographics`           | The third-party data segments targeted by the advertisement. If this field is empty, then no third-party data segments were used.                                                  |
| `Targeting Connection Type`      | The internet connection type targeted by the advertisement. If this field is empty, then no targeting based on internet connection type was used.                                  |
| `Targeting Carrier (ISP)`        | The carrier type targeted by the advertisement. If this field is empty, all carrier types are targeted.                                                                            |
| `CreativeProperties`             | The URL specified in the advertisement's call to action.                                                                                                                           |

[^1]: Snap Audience Match, or Customer List Audience, is a Snapchat feature that allows users to send their data to the platform and its affiliates to form custom audiences ("Snap Audience Match Terms", n.d.).

[^2]: A Lookalike audience reaches Snapchat users that have similar characteristics to an organization account's existing customers. There are three different options: Similarity (a small-size audience that closely resembles the seed audience), Balance (a medium-size audience that balances similarity and reach), and Reach (a large-size audience that broadly resembles the seed audience) ("Lookalike Audiences", n.d.).