---
title: rang - Reconstructing Reproducible R Computational Environments
format:
  html:
    embed-resources: true
  gfm: default
---
## Description
<!-- - Provide a brief and clear description of the method, its purpose, and what it aims to achieve. Add a link to a related paper from social science domain and show how your method can be applied to solve that research question. -->
Resolve the dependency graph of R packages at a specific time point based on the information from various 'R-hub' web services <https://blog.r-hub.io/>. The dependency graph can then be used to reconstruct the R computational environment with 'Rocker' <https://rocker-project.org>.
## Keywords
<!-- EDITME -->
* Computational Environment
* Computational Reproducibility
* Open Science
## Science Usecase(s)
<!-- - Include usecases from social sciences that would make this method applicable in a certain scenario. -->
<!-- The use cases or research questions mentioned should arise from the latest social science literature cited in the description. -->
<!-- This is an example -->
This package is designed to retrospectively reconstruct a fixed computational environment for running shared R scripts whose computational environment was **not** specified. Additional functions are provided for creating executable [research compendia](https://research-compendium.science/).
## Repository structure
This repository follows [the standard structure of an R package](https://cran.r-project.org/doc/FAQ/R-exts.html#Package-structure).
## Environment Setup
With R installed:
```r
install.packages("rang")
```
Installation of [Docker](https://www.docker.com/) or [Singularity](https://sylabs.io/singularity/) is strongly recommended.
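Whether a container runtime is already available can be checked from within R. The following is a minimal sketch using base R only; the executable names are assumptions (in particular `apptainer`, the name under which newer Singularity releases are distributed).

```r
# Look up container runtimes on the PATH; an empty string means "not found".
# The executable names are assumptions and may differ on your system.
runtimes <- Sys.which(c("docker", "singularity", "apptainer"))
available <- names(runtimes)[nzchar(runtimes)]

if (length(available) == 0) {
  message("No container runtime found on the PATH.")
} else {
  message("Found: ", paste(available, collapse = ", "))
}
```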
## Input Data
The main function `resolve()` accepts various input data. One example is a path to a directory of R scripts.
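A minimal sketch of the directory case follows. It assumes that `resolve()` scans the R scripts in the given directory for the packages they use. The directory, the script content and the helper `resolve_scripts()` are hypothetical and serve only as illustration.

```r
# Hypothetical project directory containing one shared R script.
script_dir <- file.path(tempdir(), "shared_scripts")
dir.create(script_dir, showWarnings = FALSE)
writeLines(c("library(quanteda)", "tokens('a short text')"),
           file.path(script_dir, "analysis.R"))

# Hypothetical helper: resolve the packages used by the scripts in a
# directory as of `snapshot_date`. Calling it needs the rang package and
# internet access, because resolve() queries the R-hub web services.
resolve_scripts <- function(dir, snapshot_date) {
  stopifnot(dir.exists(dir))
  rang::resolve(pkgs = dir, snapshot_date = snapshot_date)
}

# Usage (not run here):
# graph <- resolve_scripts(script_dir, "2018-10-06")
```

The snapshot date is passed explicitly so that the result does not depend on any default that `resolve()` may apply.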
## Output Data
The main function `resolve()` returns an S3 object representing the dependency graph. See @sec-touse for an example.
## How to Use {#sec-touse}
Suppose you would like to run the following code snippet from [this 2018 paper](https://joss.theoj.org/papers/10.21105/joss.00774) on `quanteda` (an R package for text analysis).
```r
library("quanteda")
# construct the feature co-occurrence matrix
examplefcm <-
  tokens(data_corpus_irishbudget2010, remove_punct = TRUE) %>%
  tokens_tolower() %>%
  tokens_remove(stopwords("english"), padding = FALSE) %>%
  fcm(context = "window", window = 5, tri = FALSE)
# choose the 30 most frequent features
topfeats <- names(topfeatures(examplefcm, 30))
# select the top 30 features only, plot the network
set.seed(100)
textplot_network(fcm_select(examplefcm, topfeats), min_freq = 0.8)
```
This code cannot be executed with a recent version of `quanteda`. Because it was written in 2018, one can instead resolve the dependency graph of `quanteda` as of 2018:
```{r}
library(rang)
graph <- resolve(pkgs = "quanteda",
                 snapshot_date = "2018-10-06",
                 os = "ubuntu-18.04")
graph
```
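Besides printing it, the resolved object can be inspected programmatically. The sketch below uses a hypothetical helper, `describe_graph()`. The field names it reads (`snapshot_date`, `r_version`, `os`, `ranglets`, `unresolved_pkgrefs`) are assumptions about the internal structure of the object and may differ between versions of `rang`.

```r
# Hypothetical helper: pull a few fields out of the object returned by
# resolve(). The field names are assumptions about rang's internal
# structure; check names(graph) for your installed version.
describe_graph <- function(graph) {
  list(
    snapshot_date = as.character(graph$snapshot_date),
    r_version     = graph$r_version,
    os            = graph$os,
    n_requested   = length(graph$ranglets),
    unresolved    = graph$unresolved_pkgrefs
  )
}

# Usage with the `graph` object created above:
# str(describe_graph(graph))
```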
This dependency graph can be used to create a dockerized computational environment (in the form of a `Dockerfile`) for running the code above. Suppose one would like to generate the `Dockerfile` in the directory `quanteda_docker`:
```r
dockerize(graph, "quanteda_docker", method = "evercran")
```
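To have the analysis script available inside the container, it can be placed in a directory that is handed to `dockerize()`. This is a sketch under assumptions: the directory and file names are hypothetical, and `materials_dir` is used as described in the `dockerize()` help page, which should be consulted for the installed version.

```r
# Hypothetical directory holding the script to be run inside the container.
materials <- file.path(tempdir(), "materials")
dir.create(materials, showWarnings = FALSE)
writeLines('library("quanteda")', file.path(materials, "analysis.R"))

# Runs only if `graph` from the resolve() call above is available.
# `materials_dir` is assumed to copy the directory into the output
# directory and, in turn, into the Docker image; verify with ?dockerize.
if (exists("graph") && inherits(graph, "rang")) {
  rang::dockerize(graph, "quanteda_docker", method = "evercran",
                  materials_dir = materials)
}
```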
A Docker container can then be built and launched, e.g. from the shell:
```sh
cd quanteda_docker
docker build -t rang .
docker run --rm --name "rangtest" -ti rang
```
The launched container is based on R 3.5.1 and `quanteda` 1.3.4 and is able to run the abovementioned code snippet.
Please refer to either the [publication of this package](https://doi.org/10.1371/journal.pone.0286761) or the [official website](https://gesistsa.github.io/rang/) for further information.
## Contact Details
Maintainer: Chung-hong Chan <chainsawtiney@gmail.com>
Issue Tracker: [https://github.com/gesistsa/rang/issues](https://github.com/gesistsa/rang/issues)
## Publication
Chan, C. H., & Schoch, D. (2023). rang: Reconstructing reproducible R computational environments. PLoS ONE, 18(6): e0286761. <https://doi.org/10.1371/journal.pone.0286761>.
<!-- ## Acknowledgements -->
<!-- - Acknowledgements if any -->
<!-- ## Disclaimer -->
<!-- - Add any disclaimers, legal notices, or usage restrictions for the method, if necessary. -->