Skip to content

Latest commit

 

History

History
185 lines (146 loc) · 8.71 KB

README.md

File metadata and controls

185 lines (146 loc) · 8.71 KB

knockoffsr

An interface to Knockoffs.jl from the R programming language. knockoffsr provides unique high performance methods for sampling various model-X knockoffs and ships with built-in routines for variable selection. Much of the functionality are unique and allow for orders of magnitude speedup over conventional methods. 'knockoffsr' attaches an R interface onto the package, allowing seamless use of this tooling by R users.

Installation

knockoffsr is not on CRAN yet. One can install the package using devtools:

library(devtools)
install_github("biona001/knockoffsr")

Check installation is successful:

library(knockoffsr)
ko <- knockoffsr::knockoff_setup()

What if installation fails

  1. install_github does not work. One can download the source code directly and install knockoffsr from a local file.
  • wget -O knockoffsr.zip https://github.com/biona001/knockoffsr/archive/refs/heads/main.zip
  • Within R, install package locally
library(devtools)
devtools::install_local("knockoffsr.zip")
  1. Julia cannot be installed automatically. In this case, one should manually install Julia, then give knockoffsr the path to Julia's executable
library(knockoffsr)
julia_dir = "/Applications/Julia-1.8.app/Contents/Resources/julia/bin" # path to folder that containins the Julia executable
ko <- knockoffsr::knockoff_setup(julia_dir)
  1. Setup script for JuliaCall failed due to LoadError of from a Global method definition call Please install the development version of JuliaCall, see this issue and this issue
library(devtools)
devtools::install_github("Non-Contradiction/JuliaCall")

If you encounter other installation errors, please file an issue on Github and I'll take a look asap.

Usage

knockoffsr provides a direct wrapper over Knockoffs.jl. The namespace is setup so that the standard syntax of Julia translates directly over to the R environment. There are 3 things to keep in mind:

  • All Knockoffs.jl commands are prefaced by ko$
  • All commands with a ! are replaced with _bang, for example solve_s! becomes solve_s_bang.
  • All Knockoffs.jl functions that require a Symmetric matrix as inputs now accepts a regular matrix. However, for hc_partition_groups, one must set an extra argument isCovariance to be TRUE or FALSE.

The first 2 syntax follows the practice of diffeqr.

Updating package

It is recommended to update the internal Knockoffs.jl julia package, whenever it makes new releases. This can be achieved in R via

library(JuliaCall)
julia_update_package("Knockoffs")
ko <- knockoffsr::knockoff_setup()
# ... perform knockoff analysis

Documentation

Since knockoffsr is a wrapper of Knockoffs.jl, the documentation of Knockoffs.jl will be the main source of documentation for knockoffsr.

Example 1: Exact model-X group knockoffs

Lets simulate X ~ N(0, Sigma) where Sigma is a symmetric Toesplitz matrix. Here we assume every 5 variables form a group

n <- 1000          # number of samples
p <- 1000          # number of covariates
m <- 1             # number of knockoffs to generate per feature
groups = rep(1:200, each = 5)
Sigma <- toeplitz(0.7^(0:(p-1)))
mu <- rep(0, p)
X <- MASS::mvrnorm(n=n, mu=mu, Sigma=Sigma)

We generate model-X group knockoffs as follows

solver <- "maxent" # Maximum entropy solver, other choices include "mvr", "sdp", "equi"
result <- ko$modelX_gaussian_group_knockoffs(X, solver, groups, mu, Sigma, m=m, verbose=TRUE)

Maxent initial obj = -2087.692936466681
Iter 1 (PCA): obj = -1607.437164111949, δ = 0.48068405334794, t1 = 0.08, t2 = 0.51
Iter 2 (CCD): obj = -1589.9518385891724, δ = 0.046537146748585216, t1 = 0.2, t2 = 1.72, t3 = 0.0
Iter 3 (PCA): obj = -1570.4491015280207, δ = 0.32338244109035785, t1 = 0.27, t2 = 2.23
Iter 4 (CCD): obj = -1562.5471454507465, δ = 0.028462155141070353, t1 = 0.39, t2 = 3.4, t3 = 0.0
Iter 5 (PCA): obj = -1557.1393033537292, δ = 0.12456084458147469, t1 = 0.52, t2 = 3.9
Iter 6 (CCD): obj = -1552.4489159484517, δ = 0.020754156607443626, t1 = 0.63, t2 = 5.08, t3 = 0.0
Iter 7 (PCA): obj = -1549.8106156569438, δ = 0.07012194156366308, t1 = 0.7, t2 = 5.59
Iter 8 (CCD): obj = -1547.020766696056, δ = 0.015614065719369137, t1 = 0.81, t2 = 6.71, t3 = 0.01
Iter 9 (PCA): obj = -1545.5310885752167, δ = 0.051170185931354716, t1 = 0.87, t2 = 7.22
Iter 10 (CCD): obj = -1543.8817717502172, δ = 0.013019011861969437, t1 = 0.98, t2 = 8.35, t3 = 0.01
Iter 11 (PCA): obj = -1542.9960966431306, δ = 0.04029486159842444, t1 = 1.04, t2 = 8.86
Iter 12 (CCD): obj = -1542.0192637205878, δ = 0.011386766045753434, t1 = 1.15, t2 = 10.0, t3 = 0.01
Iter 13 (PCA): obj = -1541.4898664708526, δ = 0.033102222474122915, t1 = 1.21, t2 = 10.51
Iter 14 (CCD): obj = -1540.900570416842, δ = 0.01023411502927556, t1 = 1.31, t2 = 11.64, t3 = 0.01
Iter 15 (PCA): obj = -1540.578996100881, δ = 0.027573434751228434, t1 = 1.39, t2 = 12.14
Iter 16 (CCD): obj = -1540.2216962317439, δ = 0.009352305423346552, t1 = 1.49, t2 = 13.26, t3 = 0.01
Iter 17 (PCA): obj = -1540.0257796341727, δ = 0.023457719731933214, t1 = 1.55, t2 = 13.76
Iter 18 (CCD): obj = -1539.806801955504, δ = 0.008629474061312124, t1 = 1.65, t2 = 14.89, t3 = 0.01
Iter 19 (PCA): obj = -1539.6892344905818, δ = 0.020162878516815832, t1 = 1.71, t2 = 15.39
Iter 20 (CCD): obj = -1539.5543495056943, δ = 0.007951906306608823, t1 = 1.81, t2 = 16.53, t3 = 0.02

To extract the knockoffs, S matrix, and the final objective, one can do:

Xko <- result$Xko
S <- result$S
obj <- result$obj

Example 2: Defining group memberships

We provide useful utilities functions to define groups empirically, based on data matrix X or directly on the covariance matrix Sigma.

Using hierarchical clustering (input is design matrix X):

isCovariance = FALSE
groups = ko$hc_partition_groups(X, isCovariance, linkage="average", cutoff=0.5)
str(groups)

 int [1:1000] 1 1 2 2 3 3 4 4 5 5 ...

Using hierarchical clustering (input is covariance matrix Sigma):

isCovariance = TRUE
groups = ko$hc_partition_groups(Sigma, isCovariance, linkage="average", cutoff=0.5)
str(groups)

 int [1:1000] 1 1 1 1 2 2 2 2 3 3 ...

Example 3: Feature selection via Lasso

To run Lasso and apply knockoff-filter, one option is to feed in the knockoff variable result created in example 1.

# first simulate a response with k=10 causal variables drawn from N(0, 1)
k <- 10
beta <- sample(c(rnorm(k), rep(0, p-k)))
y <- X %*% beta + rnorm(n)

# fit lasso and apply knockoff filter, where default FDR = 0.01, 0.05, 0.1, 0.25, 0.5
ko_filter = ko$fit_lasso(y, result)

# get selected groups
selected1 <- ko_filter$selected[1] # selected groups with target FDR = 0.01
selected2 <- ko_filter$selected[2] # selected groups with target FDR = 0.05 
selected3 <- ko_filter$selected[3] # ...
selected4 <- ko_filter$selected[4]
selected5 <- ko_filter$selected[5]

Example 4: Ghost Knockoffs

Given a correlation matrix Sigma and a vector of Z-scores z, one can generate the knockoffs of z directly, so called ghost knockoffs as derived in this paper. To illustrate this, in the following, we will

  1. Simulate artificial correlation matrix and Z scores
  2. Define groups by average linkage hierarchical clustering, and choose group-key variables within groups by applying a best-subset seletion algorithm with default cutoff value c=0.5. This follows this paper.
  3. Sample ghost knockoffs
# 1. simulate data
p <- 1000 
Sigma <- toeplitz(0.7^(0:(p-1)))
z <- MASS::mvrnorm(mu=rep(0, p), Sigma=Sigma)

# 2. define groups and group-key variables
isCovariance = TRUE
groups <- ko$hc_partition_groups(Sigma, isCovariance, cutoff=0.5, linkage="average")
group_reps <- ko$choose_group_reps(Sigma, groups, threshold=0.5)

# 3. solve group knockoff optimization problem and generate ghostknockoffs
m <- 5
method <- "maxent"
result <- ko$solve_s_graphical_group(Sigma, groups, group_reps, method, m=m, verbose=TRUE)
zko <- ko$ghost_knockoffs(z, result[[2]], solve(Sigma), m=as.integer(m))

Citation and reproducibility

If you use this software in a research paper, please cite our paper. Scripts to reproduce the results featured in our paper can be found here.