Skip to content

Files

Latest commit

 

History

History
795 lines (700 loc) · 43.4 KB

variable-importance.md

File metadata and controls

795 lines (700 loc) · 43.4 KB

Data/variable investigation

25 February 2021, Andreas Beger

For the forecast update done in early 2021, I investigated whether all variables and external data sources are really needed in the forecasting models. To do that I looked at variable importance scores from the random forest models, using the old data that was created at the update in spring 2020. This note describes the results.


Are all data sources and variables that were included in the 2020 version needed?

To investigate this, I ran random forest models using the 2020 data version (v2) that ranges from 1970 to 2019. The corresponding code is in modelrunner/R/variable-importance.R. Variable importances were computed using the permutation method (in R, see ?ranger::ranger). I used average variable importances both for specific variables but also for entire groups of variables, to assess whether to keep or drop variables and/or data sources.

The goal is twofold:

  1. Reduce the number of external data sources that have to be updated. This will make future updates quicker and easier.
  2. By reducing the number of variables that go into the forecasting models, this will also decrease the time needed to run the full set of models. The 2020 forecasts took 12 hours to run on a Digital Ocean server.

Summary of changes:

  • The 2020 data involved 464 features from 8 data sources (ACD, Archigos, EPR, G&W statelist, P&T Coups, V-Dem, WDI (which has some other sources for missing pop/GDP as well)).
  • The new 2021 data will retain 3 external data sources in addition to V-Dem, with a total of ~230 columns. changes will drop 3 of the 7 data sources, and 230 of 464 columns.

Changes:

  • Retain P&T coups, but keep only the indicator for years since last P&T Coup attempt (drop 17 others)
  • Retain the GW state age indicator (SL prefix), but only raw or logged, not both (drop 1 column)
  • Retain the WDI and related indicators for infant mortality, population, and GDP
    • Drop 2 growth variables (drop 2 columns)
    • Drop the raw pop variable and keep only logged pop (drop 1 column)
  • In the V-Dem variables:
    • Drop the year to year change transformations (VD-diff below; 181 columns)
  • Drop ACD as a data source: variables are not important for prediction (drop 15 columns)
  • Drop Archigos: not very important (drop 5 columns)
  • Drop EPR: also not very important, and utility likely to decrease since most recent data cover to 2017 only (drop 8 columns)

Overview of variable importance data

I have multiplied the raw variable importance values by 1,000 to make comparisons easier. The resulting value range is a bit under 0 to 18.28.

Here are the number of variables by group:

## # A tibble: 11 x 2
##    group          n
##  * <chr>      <int>
##  1 ACD           15
##  2 Archigos       5
##  3 EPR            8
##  4 P&T Coups     18
##  5 SL             4
##  6 VD-diff      181
##  7 VD-v2        134
##  8 VD-v2x        50
##  9 VD-y           6
## 10 VD-y-trans    36
## 11 WDI            7
  • ACD: Armed Conflict Dataset
  • Archigos: state leader data
  • EPR: Ethnic Power Relations
  • P&T Coups: Powell & Thyne coups
  • SL: statelist indicators (gwcode, year, state time since independence)
  • VD-y: V-Dem outcome variables (N=6)
  • VD-y-trans: transformations of the outcome vars (y2y diff, MA5, MA10, squared)
  • VD-v2x: V-Dem variables that include “v2x”
  • VD-v2: Other V-Dem variables
  • VD-diff: year to year change in the VD-v2x and VD-v2 variable groups
  • WDI: World Development Indicators

The next plot is a histogram of variable importance values:

And basic summary stats:

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.03172  0.04262  0.20910  0.96202  1.29064 18.24742

Note that the top quintile is around 1.3 or higher. I’m going to use this for comparison in tables below.

We have 12 outcomes (6 spaces x 2 directions). Here are the variable importance values for each variable, by group:

Variable groups on the x-axis are ordered by average VI value. Groups on the bottom have lower averages and are less useful. Basically the points to the right are what we need to keep.

Next I’m going to focus on specific variable groups and go through the reasons I decided to keep or drop variables (or the complete data group/source).

ACD

The ACD data are regularly updated, but it doesn’t seem that they are informative for predicting changes in the democratic space variables.

The plot below shows the variable importance values for all outcomes and directions. The x-scale has the same range as the ones in the plot above.

Easy to see that they just don’t seem to matter for prediction.

Drop the whole data source.

EPR

The EPR data were last updated in November 2019, with data through 2017. We now ideally would have data through 2020.

The variable importances are slightly higher than with ACD, but none reach the top quintile:

variable mean max n_over_1.3
lag2_epr_excluded_group_pop 0.5 1.0 0
lag2_epr_inpower_groups_pop 0.5 0.8 0
lag2_epr_elf 0.5 1.1 0
lag2_epr_groups 0.3 1.1 0
lag2_epr_excluded_groups_count 0.3 0.7 0
lag2_epr_inpower_groups_count 0.2 0.5 0
lag2_epr_regaut_group_pop 0.1 0.2 0
lag2_epr_regaut_groups_count 0.1 0.2 0

Data are quite stale as they haven’t been updated recently, and they do not have high VI values; drop the whole data source.

Archigos

Variables related to the leader/head of a state.

variable mean max n_over_1.3
lag1_ldr_yr_in_power 0.7 1.2 0
lag1_ldr_age 0.7 1.3 0
lag1_ldr_irr_entry 0.1 0.6 0
lag1_ldr_male 0.0 0.1 0
lag1_ldr_imputed 0.0 0.0 0

Don’t seem to be very important; drop.

P&T Coups

variable mean max n_over_1.3
lag0_years_since_last_pt_attempt 1.6 2.4 8
lag0_years_since_last_pt_coup 1.4 2.1 8
lag0_years_since_last_pt_failed 1.2 1.9 6
lag0_pt_attempt_total 0.3 0.8 0
lag0_pt_coup_total 0.3 0.7 0
lag0_pt_attempt_num10yrs 0.3 1.0 0
lag0_pt_failed_total 0.3 0.5 0
lag0_pt_coup_num10yrs 0.2 0.7 0
lag0_pt_attempt_num5yrs 0.2 0.9 0
lag0_pt_failed_num10yrs 0.1 0.4 0
lag0_pt_coup_num5yrs 0.1 0.4 0
lag0_pt_failed_num5yrs 0.1 0.4 0
lag0_pt_coup 0.0 0.3 0
lag0_pt_coup_num 0.0 0.3 0
lag0_pt_attempt 0.0 0.1 0
lag0_pt_attempt_num 0.0 0.1 0
lag0_pt_failed_num 0.0 0.1 0
lag0_pt_failed 0.0 0.0 0

Only the “years_since_last_…” features seem to be important.

Correlations in the 3 “years_since_last” measures:

pt_coup pt_failed pt_attempt
pt_coup 1.00 0.78 0.90
pt_failed 0.78 1.00 0.91
pt_attempt 0.90 0.91 1.00

Since “years_since_last_pt_coup_attempt” is highly correlated with the other 2, just keep that one.

WDI

Ok, a couple seem to be high VI.

variable mean max n_over_1.3
lag2_infmort 2.6 5.5 11
lag2_log_gdp_pc 2.0 3.9 10
lag2_log_gdp 1.7 2.6 8
lag0_pop 1.3 2.1 7
lag0_log_pop 1.3 2.1 7
lag2_gdp_growth 0.2 0.4 0
lag2_gdp_pc_growth 0.2 0.4 0

Infmort, GDP, and pop.

infmort gdp_growth gdp_pc_growth log_gdp log_gdp_pc pop log_pop
infmort 1.00 0.02 -0.04 -0.57 -0.72 -0.03 -0.07
gdp_growth 0.02 1.00 0.97 0.00 0.00 0.04 0.01
gdp_pc_growth -0.04 0.97 1.00 0.05 0.04 0.05 0.02
log_gdp -0.57 0.00 0.05 1.00 0.67 0.35 0.70
log_gdp_pc -0.72 0.00 0.04 0.67 1.00 -0.06 -0.05
pop -0.03 0.04 0.05 0.35 -0.06 1.00 0.52
log_pop -0.07 0.01 0.02 0.70 -0.05 0.52 1.00

The 2 growth variables are not useful; also, drop raw pop and just keep logged pop. Although they are not very highly correlated, I’d rather keep the logged version.

SL

Things derived from the G&W state list and basic data structure (country codes and years).

variable mean max n_over_1.3
year 2.6 7.3 8
lag0_state_age 1.7 3.3 7
lag0_log_state_age 1.6 2.9 7
gwcode 0.9 2.2 2

Mildly informative and low-cost to keep (they are in the data anyways). Keep either state age or log state age but not both.

On to V-Dem groups of indicators.

VD-y

These are the 12 outcome indicators. They are used, unlagged, in each model, when forecasting next year’s outcome value.

variable mean max n_over_1.3
v2x_freexp_altinf 6.3 18.0 12
v2xcs_ccsi 5.7 16.4 12
v2xcl_rol 5.1 12.6 12
v2x_veracc_osp 4.6 15.8 11
v2x_horacc_osp 4.4 11.3 12
v2x_pubcorr 2.7 10.8 7

Keep.

VD-y-trans

Transformations derived from the outcome indicators. Five and ten year moving averages; squared terms; year to year diff.

## Warning: Removed 1 rows containing missing values (geom_point).

variable mean max n_over_1.3
v2x_freexp_altinf_squared 6.4 18.2 12
v2xcs_ccsi_squared 5.8 16.1 12
v2xcl_rol_squared 5.4 12.7 12
v2x_horacc_osp_squared 4.8 14.0 12
v2x_veracc_osp_squared 4.7 14.3 11
v2x_pubcorr_squared 2.6 9.3 7
v2x_freexp_altinf_diff_y2y 0.7 1.8 1
v2xcl_rol_up_ma10 0.5 1.3 1
v2xcs_ccsi_up_ma10 0.4 1.1 0
v2x_horacc_osp_up_ma10 0.4 1.3 0
v2xcs_ccsi_diff_y2y 0.4 0.8 0
v2xcl_rol_diff_y2y 0.3 0.5 0
v2x_horacc_osp_down_ma10 0.3 0.8 0
v2x_freexp_altinf_up_ma10 0.3 0.8 0
v2xcs_ccsi_up_ma5 0.3 1.2 0
v2xcs_ccsi_down_ma10 0.3 1.4 1
v2x_pubcorr_down_ma10 0.3 0.8 0
v2x_horacc_osp_diff_y2y 0.3 0.5 0
v2x_veracc_osp_down_ma10 0.2 0.7 0
v2x_veracc_osp_diff_y2y 0.2 0.4 0
v2x_horacc_osp_up_ma5 0.2 1.3 0
v2xcl_rol_down_ma10 0.2 0.4 0
v2x_freexp_altinf_down_ma10 0.2 0.5 0
v2xcl_rol_down_ma5 0.2 0.7 0
v2xcl_rol_up_ma5 0.2 0.4 0
v2x_horacc_osp_down_ma5 0.2 0.5 0
v2x_freexp_altinf_up_ma5 0.2 0.7 0
v2xcs_ccsi_down_ma5 0.2 0.4 0
v2x_veracc_osp_up_ma10 0.1 0.4 0
v2x_veracc_osp_down_ma5 0.1 0.5 0
v2x_pubcorr_up_ma10 0.1 0.3 0
v2x_pubcorr_down_ma5 0.1 0.3 0
v2x_freexp_altinf_down_ma5 0.1 0.2 0
v2x_pubcorr_diff_y2y 0.1 0.4 0
v2x_veracc_osp_up_ma5 0.1 0.5 0
v2x_pubcorr_up_ma5 0.1 0.2 0

The squared versions (_squared) matter, the rest (_ma5, _ma10, _diff_y2y) can be dropped.

VD-v2x

variable mean max n_over_1.3
lag0_v2x_diagacc 7.2 13.0 12
lag0_v2x_clpol 6.9 13.8 12
lag0_v2x_freexp 6.1 12.7 12
lag0_v2x_civlib 5.5 11.0 12
lag0_v2x_liberal 5.4 9.2 12
lag0_v2x_frassoc_thick 5.3 10.0 12
lag0_v2x_polyarchy 5.3 11.0 12
lag0_v2xcl_disc 4.8 9.2 12
lag0_v2xnp_pres 4.3 8.7 12
lag0_v2x_EDcomp_thick 4.1 10.3 12
lag0_v2x_clpriv 4.1 10.4 12
lag0_v2xel_frefair 3.8 12.9 12
lag0_v2x_neopat 3.5 5.5 12
lag0_v2xlg_legcon 3.5 11.3 10
lag0_v2x_rule 3.4 5.9 11
lag0_v2x_clphy 3.2 5.4 12
lag0_v2x_cspart 3.1 5.6 12
lag0_v2x_gencl 3.1 6.6 12
lag0_v2xcl_acjst 3.1 5.9 11
lag0_v2xdl_delib 3.0 4.5 12
lag0_v2x_jucon 2.8 4.9 11
lag0_v2xnp_regcorr 2.6 5.4 11
lag0_v2x_corr 2.4 6.7 9
lag0_v2x_egal 2.3 3.2 12
lag0_v2x_execorr 2.3 5.3 11
lag0_v2xcl_prpty 2.3 3.3 11
lag0_v2xcl_dmove 2.2 6.0 11
lag0_v2x_partip 2.2 3.9 12
lag0_v2xeg_eqaccess 2.0 2.9 10
lag0_v2x_gencs 1.9 3.3 9
lag0_v2xeg_eqprotec 1.8 2.8 9
lag0_v2xnp_client 1.8 2.7 10
lag0_v2xeg_eqdr 1.5 2.1 7
lag0_v2x_ex_military 1.5 2.8 6
lag0_v2xcl_slave 1.4 1.8 6
lag0_v2x_elecreg 0.9 8.9 1
lag0_v2xex_elecleg 0.8 3.9 2
lag0_v2xlg_elecreg 0.6 5.5 1
lag0_v2x_elecoff 0.6 2.9 1
lag0_v2x_ex_party 0.4 0.7 0
lag0_v2x_ex_confidence 0.4 1.1 0
lag0_v2x_ex_hereditary 0.1 0.3 0
lag0_v2x_ex_direlect 0.1 0.3 0
lag0_v2xex_elecreg 0.1 0.4 0
lag0_v2xlg_leginter 0.1 0.5 0
lag0_v2x_hosinter 0.0 0.2 0
lag0_v2xel_elecparl 0.0 0.1 0
lag0_v2x_legabort 0.0 0.1 0
lag0_v2xel_elecpres 0.0 0.0 0
lag0_v2x_hosabort 0.0 0.0 0

Quite a few of these are good to keep. Arbitrarily: keep those with a max value over 1.3.

Keep:

## c("lag0_v2x_civlib", "lag0_v2x_clphy", "lag0_v2x_clpol", "lag0_v2x_clpriv", 
## "lag0_v2x_corr", "lag0_v2x_cspart", "lag0_v2x_diagacc", "lag0_v2x_EDcomp_thick", 
## "lag0_v2x_egal", "lag0_v2x_elecoff", "lag0_v2x_elecreg", "lag0_v2x_ex_military", 
## "lag0_v2x_execorr", "lag0_v2x_frassoc_thick", "lag0_v2x_freexp", 
## "lag0_v2x_gencl", "lag0_v2x_gencs", "lag0_v2x_jucon", "lag0_v2x_liberal", 
## "lag0_v2x_neopat", "lag0_v2x_partip", "lag0_v2x_polyarchy", "lag0_v2x_rule", 
## "lag0_v2xcl_acjst", "lag0_v2xcl_disc", "lag0_v2xcl_dmove", "lag0_v2xcl_prpty", 
## "lag0_v2xcl_slave", "lag0_v2xdl_delib", "lag0_v2xeg_eqaccess", 
## "lag0_v2xeg_eqdr", "lag0_v2xeg_eqprotec", "lag0_v2xel_frefair", 
## "lag0_v2xex_elecleg", "lag0_v2xlg_elecreg", "lag0_v2xlg_legcon", 
## "lag0_v2xnp_client", "lag0_v2xnp_pres", "lag0_v2xnp_regcorr")

Drop:

## c("lag0_v2x_ex_confidence", "lag0_v2x_ex_direlect", "lag0_v2x_ex_hereditary", 
## "lag0_v2x_ex_party", "lag0_v2x_hosabort", "lag0_v2x_hosinter", 
## "lag0_v2x_legabort", "lag0_v2xel_elecparl", "lag0_v2xel_elecpres", 
## "lag0_v2xex_elecreg", "lag0_v2xlg_leginter")

VD-v2

variable mean max n_over_1.3
lag0_v2cseeorgs 4.7 11.1 12
lag0_v2csreprss 4.0 13.5 12
lag0_v2cldiscm 3.9 7.8 12
lag0_v2cldiscw 3.7 6.0 12
lag0_v2meharjrn 3.6 4.9 12
lag0_v2cltrnslw 3.5 6.0 11
lag0_v2merange 3.4 10.2 12
lag0_v2mecrit 3.1 7.6 12
lag0_v2clacjstm 3.1 5.6 12
lag0_v2mebias 3.1 5.3 12
lag0_v2mecenefm 3.0 6.5 12
lag0_v2meslfcen 3.0 8.4 11
lag0_v2mecorrpt 3.0 5.1 12
lag0_v2psbars 3.0 6.8 12
lag0_v2clacfree 3.0 5.9 12
lag0_v2elembaut 2.9 4.7 11
lag0_v2clfmove 2.9 8.0 11
lag0_v2lgoppart 2.8 7.3 10
lag0_v2cltort 2.7 4.6 12
lag0_v2dlcountr 2.7 5.8 12
lag0_v2clkill 2.6 4.2 11
lag0_v2clacjstw 2.6 5.8 10
lag0_v2exembez 2.6 4.7 11
lag0_v2cscnsult 2.6 4.3 11
lag0_v2lginvstp 2.5 7.7 8
lag0_v2psoppaut 2.5 3.7 11
lag0_v2exthftps 2.4 9.7 7
lag0_v2excrptps 2.4 9.2 6
lag0_v2clrspct 2.4 4.6 12
lag0_v2dlengage 2.4 3.8 12
lag0_v2csprtcpt 2.4 3.8 11
lag0_v2lgotovst 2.3 6.8 9
lag0_v2csantimv 2.3 4.3 11
lag0_v2elfrfair 2.2 7.4 7
lag0_v2csrlgrep 2.2 4.3 9
lag0_v2ellocons 2.2 8.8 8
lag0_v2dlconslt 2.2 4.0 11
lag0_v2exrescon 2.1 3.6 10
lag0_v2pssunpar 2.1 4.3 10
lag0_v2clstown 2.1 5.6 9
lag0_v2psparban 2.0 3.2 10
lag0_v2jucomp 1.9 3.4 10
lag0_v2clprptyw 1.8 3.0 11
lag0_v2cldmovem 1.8 6.3 6
lag0_v2exbribe 1.8 4.7 9
lag0_v2elintim 1.8 4.1 7
lag0_v2lgqstexp 1.8 3.0 8
lag0_v2clrelig 1.8 3.5 7
lag0_v2elmulpar 1.7 3.8 6
lag0_v2juhccomp 1.7 2.8 9
lag0_v2clprptym 1.7 2.8 8
lag0_v2pepwrsoc 1.7 2.4 9
lag0_v2juaccnt 1.7 4.0 8
lag0_v2elembcap 1.6 3.1 6
lag0_v2lgcomslo 1.6 3.1 6
lag0_v2cldmovew 1.6 3.4 8
lag0_v2juhcind 1.6 2.3 9
lag0_v2elirreg 1.6 3.0 7
lag0_v2dlreason 1.6 2.2 8
lag0_v2jucorrdc 1.5 3.2 6
lag0_v2juncind 1.5 2.5 7
lag0_v2clacjust 1.5 2.7 7
lag0_v2elaccept 1.5 4.9 6
lag0_v2elrgstry 1.5 2.8 6
lag0_v2pepwrort 1.5 3.2 6
lag0_v2elvotbuy 1.5 2.3 7
lag0_v2jureview 1.4 3.6 5
lag0_v2lgfunds 1.4 3.4 4
lag0_v2dlencmps 1.4 3.1 4
lag0_v2ellocumul 1.4 2.4 7
lag0_v2lglegplo 1.4 2.1 5
lag0_v2lgcrrpt 1.4 2.4 5
lag0_v2csrlgcon 1.3 2.8 5
lag0_v2pehealth 1.3 3.0 6
lag0_v2jupurge 1.3 2.0 6
lag0_v2psplats 1.3 2.0 6
lag0_v2elpdcamp 1.3 2.4 6
lag0_v2pepwrgen 1.3 2.4 4
lag0_v2clslavem 1.2 1.7 5
lag0_v2peedueq 1.2 1.8 6
lag0_v2csgender 1.2 2.3 5
lag0_v2eldonate 1.1 2.2 3
lag0_v2clslavef 1.1 1.9 3
lag0_v2elpaidig 1.1 2.1 5
lag0_v2exdfpphs 1.1 1.8 5
lag0_v2psprlnks 1.1 1.8 5
lag0_v2psorgs 1.1 1.9 4
lag0_v2clsocgrp 1.1 2.4 2
lag0_v2exdfvths 1.1 1.7 3
lag0_v2exremhsp 1.1 1.5 4
lag0_v2psprbrch 1.1 1.9 1
lag0_v2jupoatck 1.1 2.1 4
lag0_v2elpeace 1.1 1.6 2
lag0_v2pepwrses 1.0 1.8 3
lag0_v2elboycot 1.0 3.1 2
lag0_v2dlcommon 1.0 1.7 2
lag0_v2elfrcamp 1.0 1.5 4
lag0_v2pscnslnl 0.9 1.5 1
lag0_v2jupack 0.9 1.6 2
lag0_v2exdfdmhs 0.9 1.6 1
lag0_v2svstterr 0.9 2.2 1
lag0_v2elpubfin 0.8 1.1 0
lag0_v2exdfdshs 0.8 1.5 2
lag0_v2psnatpar 0.8 1.3 1
lag0_v2pscohesv 0.8 2.6 1
lag0_is_elec 0.8 8.5 1
lag0_v2exdfcbhs 0.8 1.1 0
lag0_v2dlunivl 0.8 1.2 0
lag0_v2clrgunev 0.8 1.6 1
lag0_v2elasmoff 0.8 1.7 1
lag0_v2lgsrvlo 0.8 1.3 0
lag0_v2pscomprg 0.7 1.7 1
lag0_v2lgbicam 0.6 3.4 2
lag0_v2svdomaut 0.6 1.2 0
lag0_v2jureform 0.6 0.9 0
lag0_v2svinlaut 0.6 1.0 0
lag0_v2elprescumul 0.6 1.1 0
lag0_v2expathhs 0.5 2.1 1
lag0_v2elprescons 0.5 0.7 0
lag0_v2lgdsadlobin 0.5 0.7 0
lag0_is_leg 0.5 2.9 2
lag0_v2ex_hosw 0.2 0.3 0
lag0_v2ex_hogw 0.2 0.4 0
lag0_v2ex_elechos 0.1 0.2 0
lag0_v2ex_legconhog 0.1 0.1 0
lag0_v2elreggov 0.1 0.3 0
lag0_v2ex_legconhos 0.1 0.1 0
lag0_v2exhoshog 0.1 0.1 0
lag0_v2elrsthos 0.0 0.1 0
lag0_v2ellocgov 0.0 0.1 0
lag0_v2elrstrct 0.0 0.1 0
lag0_v2elmonref 0.0 0.1 0
lag0_v2elmonden 0.0 0.1 0
lag0_is_election_year 0.0 0.1 0

Like above, this is more variable specific. Again using an arbitrary 1.3 max value.

Keep:

## c("lag0_is_elec", "lag0_is_leg", "lag0_v2clacfree", "lag0_v2clacjstm", 
## "lag0_v2clacjstw", "lag0_v2clacjust", "lag0_v2cldiscm", "lag0_v2cldiscw", 
## "lag0_v2cldmovem", "lag0_v2cldmovew", "lag0_v2clfmove", "lag0_v2clkill", 
## "lag0_v2clprptym", "lag0_v2clprptyw", "lag0_v2clrelig", "lag0_v2clrgunev", 
## "lag0_v2clrspct", "lag0_v2clslavef", "lag0_v2clslavem", "lag0_v2clsocgrp", 
## "lag0_v2clstown", "lag0_v2cltort", "lag0_v2cltrnslw", "lag0_v2csantimv", 
## "lag0_v2cscnsult", "lag0_v2cseeorgs", "lag0_v2csgender", "lag0_v2csprtcpt", 
## "lag0_v2csreprss", "lag0_v2csrlgcon", "lag0_v2csrlgrep", "lag0_v2dlcommon", 
## "lag0_v2dlconslt", "lag0_v2dlcountr", "lag0_v2dlencmps", "lag0_v2dlengage", 
## "lag0_v2dlreason", "lag0_v2elaccept", "lag0_v2elasmoff", "lag0_v2elboycot", 
## "lag0_v2eldonate", "lag0_v2elembaut", "lag0_v2elembcap", "lag0_v2elfrcamp", 
## "lag0_v2elfrfair", "lag0_v2elintim", "lag0_v2elirreg", "lag0_v2ellocons", 
## "lag0_v2ellocumul", "lag0_v2elmulpar", "lag0_v2elpaidig", "lag0_v2elpdcamp", 
## "lag0_v2elpeace", "lag0_v2elrgstry", "lag0_v2elvotbuy", "lag0_v2exbribe", 
## "lag0_v2excrptps", "lag0_v2exdfdmhs", "lag0_v2exdfdshs", "lag0_v2exdfpphs", 
## "lag0_v2exdfvths", "lag0_v2exembez", "lag0_v2expathhs", "lag0_v2exremhsp", 
## "lag0_v2exrescon", "lag0_v2exthftps", "lag0_v2juaccnt", "lag0_v2jucomp", 
## "lag0_v2jucorrdc", "lag0_v2juhccomp", "lag0_v2juhcind", "lag0_v2juncind", 
## "lag0_v2jupack", "lag0_v2jupoatck", "lag0_v2jupurge", "lag0_v2jureview", 
## "lag0_v2lgbicam", "lag0_v2lgcomslo", "lag0_v2lgcrrpt", "lag0_v2lgfunds", 
## "lag0_v2lginvstp", "lag0_v2lglegplo", "lag0_v2lgoppart", "lag0_v2lgotovst", 
## "lag0_v2lgqstexp", "lag0_v2mebias", "lag0_v2mecenefm", "lag0_v2mecorrpt", 
## "lag0_v2mecrit", "lag0_v2meharjrn", "lag0_v2merange", "lag0_v2meslfcen", 
## "lag0_v2peedueq", "lag0_v2pehealth", "lag0_v2pepwrgen", "lag0_v2pepwrort", 
## "lag0_v2pepwrses", "lag0_v2pepwrsoc", "lag0_v2psbars", "lag0_v2pscnslnl", 
## "lag0_v2pscohesv", "lag0_v2pscomprg", "lag0_v2psnatpar", "lag0_v2psoppaut", 
## "lag0_v2psorgs", "lag0_v2psparban", "lag0_v2psplats", "lag0_v2psprbrch", 
## "lag0_v2psprlnks", "lag0_v2pssunpar", "lag0_v2svstterr")

Drop:

## c("lag0_is_election_year", "lag0_v2dlunivl", "lag0_v2ellocgov", 
## "lag0_v2elmonden", "lag0_v2elmonref", "lag0_v2elprescons", "lag0_v2elprescumul", 
## "lag0_v2elpubfin", "lag0_v2elreggov", "lag0_v2elrsthos", "lag0_v2elrstrct", 
## "lag0_v2ex_elechos", "lag0_v2ex_hogw", "lag0_v2ex_hosw", "lag0_v2ex_legconhog", 
## "lag0_v2ex_legconhos", "lag0_v2exdfcbhs", "lag0_v2exhoshog", 
## "lag0_v2jureform", "lag0_v2lgdsadlobin", "lag0_v2lgsrvlo", "lag0_v2svdomaut", 
## "lag0_v2svinlaut")

VD-diff (V-Dem variable changes from previous year)

These are year to year changes in the “v2” and “v2x” sets of variables.

## Warning: Removed 1 rows containing missing values (geom_point).

variable mean max n_over_1.3
lag0_diff_year_prior_v2x_diagacc 1.0 2.1 3
lag0_diff_year_prior_v2x_clpol 0.9 1.3 1
lag0_diff_year_prior_v2x_freexp 0.7 1.7 1
lag0_diff_year_prior_v2x_civlib 0.6 1.1 0
lag0_diff_year_prior_v2psbars 0.5 0.9 0
lag0_diff_year_prior_v2xcl_disc 0.4 0.8 0
lag0_diff_year_prior_v2x_clphy 0.4 1.0 0
lag0_diff_year_prior_v2x_frassoc_thick 0.3 0.7 0
lag0_diff_year_prior_v2psoppaut 0.3 0.7 0
lag0_diff_year_prior_v2xdl_delib 0.3 0.7 0
lag0_diff_year_prior_v2x_liberal 0.3 0.4 0
lag0_diff_year_prior_v2cltort 0.3 0.8 0
lag0_diff_year_prior_v2x_polyarchy 0.3 0.4 0
lag0_diff_year_prior_v2psparban 0.3 0.6 0
lag0_diff_year_prior_v2merange 0.3 0.8 0
lag0_diff_year_prior_v2csreprss 0.3 0.5 0
lag0_diff_year_prior_v2xnp_pres 0.3 0.5 0
lag0_diff_year_prior_v2cldiscw 0.2 0.5 0
lag0_diff_year_prior_v2cseeorgs 0.2 0.6 0
lag0_diff_year_prior_v2cldiscm 0.2 0.4 0
lag0_diff_year_prior_v2mebias 0.2 0.6 0
lag0_diff_year_prior_v2clkill 0.2 0.5 0
lag0_diff_year_prior_v2mecenefm 0.2 0.5 0
lag0_diff_year_prior_v2x_cspart 0.2 0.6 0
lag0_diff_year_prior_v2x_rule 0.2 0.6 0
lag0_diff_year_prior_v2xlg_legcon 0.2 0.5 0
lag0_diff_year_prior_v2pssunpar 0.2 0.5 0
lag0_diff_year_prior_v2meharjrn 0.2 0.5 0
lag0_diff_year_prior_v2x_EDcomp_thick 0.2 0.3 0
lag0_diff_year_prior_v2clacfree 0.2 0.5 0
lag0_diff_year_prior_v2elembaut 0.2 0.5 0
lag0_diff_year_prior_v2dlconslt 0.2 0.5 0
lag0_diff_year_prior_v2dlengage 0.2 0.5 0
lag0_diff_year_prior_v2mecrit 0.2 0.5 0
lag0_diff_year_prior_v2mecorrpt 0.2 0.5 0
lag0_diff_year_prior_v2csprtcpt 0.2 0.7 0
lag0_diff_year_prior_v2cltrnslw 0.2 0.3 0
lag0_diff_year_prior_v2x_clpriv 0.2 0.4 0
lag0_diff_year_prior_v2meslfcen 0.2 0.4 0
lag0_diff_year_prior_v2psnatpar 0.2 0.5 0
lag0_diff_year_prior_v2dlcountr 0.2 0.4 0
lag0_diff_year_prior_v2exrescon 0.2 0.3 0
lag0_diff_year_prior_v2x_partip 0.1 0.3 0
lag0_diff_year_prior_v2x_jucon 0.1 0.3 0
lag0_diff_year_prior_v2lgoppart 0.1 0.4 0
lag0_diff_year_prior_v2x_execorr 0.1 0.7 0
lag0_diff_year_prior_v2lginvstp 0.1 0.4 0
lag0_diff_year_prior_v2csantimv 0.1 0.2 0
lag0_diff_year_prior_v2xel_frefair 0.1 0.2 0
lag0_diff_year_prior_v2x_corr 0.1 0.7 0
lag0_diff_year_prior_v2x_neopat 0.1 0.3 0
lag0_diff_year_prior_v2xcl_dmove 0.1 0.3 0
lag0_diff_year_prior_v2clsocgrp 0.1 0.3 0
lag0_diff_year_prior_v2cscnsult 0.1 0.2 0
lag0_diff_year_prior_v2csrlgrep 0.1 0.2 0
lag0_diff_year_prior_v2xcl_acjst 0.1 0.2 0
lag0_diff_year_prior_v2lgotovst 0.1 0.3 0
lag0_diff_year_prior_v2cldmovem 0.1 0.3 0
lag0_diff_year_prior_v2juhcind 0.1 0.3 0
lag0_diff_year_prior_v2x_gencl 0.1 0.2 0
lag0_diff_year_prior_v2clrspct 0.1 0.2 0
lag0_diff_year_prior_v2clfmove 0.1 0.4 0
lag0_diff_year_prior_v2psplats 0.1 0.4 0
lag0_diff_year_prior_v2x_gencs 0.1 0.3 0
lag0_diff_year_prior_v2exembez 0.1 0.4 0
lag0_diff_year_prior_v2xlg_leginter 0.1 0.7 0
lag0_diff_year_prior_v2eldonate 0.1 0.2 0
lag0_diff_year_prior_v2clacjstw 0.1 0.2 0
lag0_diff_year_prior_v2xnp_client 0.1 0.2 0
lag0_diff_year_prior_v2elmulpar 0.1 0.2 0
lag0_diff_year_prior_v2jureform 0.1 0.2 0
lag0_diff_year_prior_v2clacjstm 0.1 0.1 0
lag0_diff_year_prior_v2juncind 0.1 0.2 0
lag0_diff_year_prior_v2xex_elecleg 0.1 0.2 0
lag0_diff_year_prior_v2jucomp 0.1 0.2 0
lag0_diff_year_prior_v2ellocons 0.1 0.6 0
lag0_diff_year_prior_v2xnp_regcorr 0.1 0.3 0
lag0_diff_year_prior_v2psorgs 0.1 0.2 0
lag0_diff_year_prior_v2psprlnks 0.1 0.3 0
lag0_diff_year_prior_v2juhccomp 0.1 0.1 0
lag0_diff_year_prior_v2lgfunds 0.1 0.2 0
lag0_diff_year_prior_v2exdfdshs 0.1 0.2 0
lag0_diff_year_prior_v2exremhsp 0.1 0.1 0
lag0_diff_year_prior_v2dlreason 0.1 0.2 0
lag0_diff_year_prior_v2lgqstexp 0.1 0.2 0
lag0_diff_year_prior_v2x_ex_military 0.1 0.1 0
lag0_diff_year_prior_v2exbribe 0.1 0.3 0
lag0_diff_year_prior_v2lgdsadlobin 0.1 0.3 0
lag0_diff_year_prior_v2x_elecoff 0.1 0.3 0
lag0_diff_year_prior_v2xcl_prpty 0.1 0.2 0
lag0_diff_year_prior_v2lgcrrpt 0.1 0.3 0
lag0_diff_year_prior_v2cldmovew 0.1 0.2 0
lag0_diff_year_prior_v2clrelig 0.1 0.1 0
lag0_diff_year_prior_v2clprptym 0.1 0.3 0
lag0_diff_year_prior_v2exthftps 0.1 0.2 0
lag0_diff_year_prior_v2lgbicam 0.1 0.5 0
lag0_diff_year_prior_v2psprbrch 0.1 0.1 0
lag0_diff_year_prior_v2svstterr 0.1 0.1 0
lag0_diff_year_prior_v2xcl_slave 0.1 0.2 0
lag0_diff_year_prior_v2pepwrses 0.1 0.1 0
lag0_diff_year_prior_v2elfrfair 0.1 0.1 0
lag0_diff_year_prior_v2lgcomslo 0.1 0.1 0
lag0_diff_year_prior_v2dlencmps 0.1 0.1 0
lag0_diff_year_prior_v2clslavem 0.1 0.2 0
lag0_diff_year_prior_v2jureview 0.1 0.2 0
lag0_diff_year_prior_v2dlcommon 0.1 0.2 0
lag0_diff_year_prior_v2excrptps 0.1 0.3 0
lag0_diff_year_prior_v2pscohesv 0.1 0.1 0
lag0_diff_year_prior_v2elfrcamp 0.1 0.1 0
lag0_diff_year_prior_v2exdfvths 0.1 0.2 0
lag0_diff_year_prior_v2elirreg 0.1 0.1 0
lag0_diff_year_prior_v2elpubfin 0.1 0.2 0
lag0_diff_year_prior_v2elembcap 0.1 0.1 0
lag0_diff_year_prior_v2expathhs 0.1 0.2 0
lag0_diff_year_prior_v2csgender 0.1 0.1 0
lag0_diff_year_prior_v2elpdcamp 0.1 0.1 0
lag0_diff_year_prior_v2elintim 0.0 0.2 0
lag0_diff_year_prior_v2jupurge 0.0 0.2 0
lag0_diff_year_prior_v2lglegplo 0.0 0.1 0
lag0_diff_year_prior_v2elpaidig 0.0 0.1 0
lag0_diff_year_prior_v2pscnslnl 0.0 0.1 0
lag0_diff_year_prior_v2csrlgcon 0.0 0.2 0
lag0_diff_year_prior_v2exdfpphs 0.0 0.1 0
lag0_diff_year_prior_v2xeg_eqdr 0.0 0.1 0
lag0_diff_year_prior_v2clprptyw 0.0 0.1 0
lag0_diff_year_prior_v2svdomaut 0.0 0.1 0
lag0_diff_year_prior_v2clstown 0.0 0.1 0
lag0_diff_year_prior_v2lgsrvlo 0.0 0.1 0
lag0_diff_year_prior_v2elaccept 0.0 0.1 0
lag0_diff_year_prior_v2x_egal 0.0 0.1 0
lag0_diff_year_prior_v2exdfcbhs 0.0 0.1 0
lag0_diff_year_prior_v2clslavef 0.0 0.1 0
lag0_diff_year_prior_v2xeg_eqprotec 0.0 0.1 0
lag0_diff_year_prior_v2exdfdmhs 0.0 0.1 0
lag0_diff_year_prior_v2elrgstry 0.0 0.1 0
lag0_diff_year_prior_v2svinlaut 0.0 0.1 0
lag0_diff_year_prior_v2pepwrort 0.0 0.1 0
lag0_diff_year_prior_v2clrgunev 0.0 0.1 0
lag0_diff_year_prior_v2jucorrdc 0.0 0.1 0
lag0_diff_year_prior_v2elboycot 0.0 0.1 0
lag0_diff_year_prior_v2xeg_eqaccess 0.0 0.1 0
lag0_diff_year_prior_v2elasmoff 0.0 0.1 0
lag0_diff_year_prior_v2x_hosinter 0.0 0.2 0
lag0_diff_year_prior_v2juaccnt 0.0 0.1 0
lag0_diff_year_prior_v2x_ex_party 0.0 0.1 0
lag0_diff_year_prior_v2pepwrsoc 0.0 0.1 0
lag0_diff_year_prior_v2jupoatck 0.0 0.1 0
lag0_diff_year_prior_v2clacjust 0.0 0.1 0
lag0_diff_year_prior_v2elprescons 0.0 0.1 0
lag0_diff_year_prior_v2elvotbuy 0.0 0.1 0
lag0_diff_year_prior_v2pscomprg 0.0 0.1 0
lag0_diff_year_prior_v2dlunivl 0.0 0.1 0
lag0_diff_year_prior_v2pepwrgen 0.0 0.1 0
lag0_diff_year_prior_v2pehealth 0.0 0.1 0
lag0_diff_year_prior_v2elpeace 0.0 0.1 0
lag0_diff_year_prior_v2x_elecreg 0.0 0.2 0
lag0_diff_year_prior_v2xlg_elecreg 0.0 0.1 0
lag0_diff_year_prior_v2peedueq 0.0 0.0 0
lag0_diff_year_prior_v2jupack 0.0 0.0 0
lag0_diff_year_prior_v2ex_elechos 0.0 0.0 0
lag0_diff_year_prior_v2exhoshog 0.0 0.1 0
lag0_diff_year_prior_v2xel_elecparl 0.0 0.1 0
lag0_diff_year_prior_v2x_ex_confidence 0.0 0.0 0
lag0_diff_year_prior_v2xex_elecreg 0.0 0.0 0
lag0_diff_year_prior_v2ex_hosw 0.0 0.0 0
lag0_diff_year_prior_v2ex_hogw 0.0 0.0 0
lag0_diff_year_prior_v2ellocumul 0.0 0.1 0
lag0_diff_year_prior_v2x_legabort 0.0 0.1 0
lag0_diff_year_prior_v2xel_elecpres 0.0 0.0 0
lag0_diff_year_prior_v2elprescumul 0.0 0.0 0
lag0_diff_year_prior_v2x_ex_direlect 0.0 0.0 0
lag0_diff_year_prior_v2ex_legconhos 0.0 0.0 0
lag0_diff_year_prior_v2ex_legconhog 0.0 0.0 0
lag0_diff_year_prior_v2x_ex_hereditary 0.0 0.0 0
lag0_diff_year_prior_v2x_hosabort 0.0 0.0 0
lag0_diff_year_prior_v2elrsthos 0.0 0.0 0
lag0_diff_year_prior_v2elmonden 0.0 0.0 0
lag0_diff_year_prior_v2elmonref 0.0 0.0 0
lag0_diff_year_prior_v2elrstrct 0.0 0.0 0
lag0_diff_year_prior_v2elreggov 0.0 0.0 0
lag0_diff_year_prior_v2ellocgov 0.0 0.0 0

Not useful. Drop all of these.