
Add native support for Github packages #22

Closed
chainsawriot opened this issue Feb 7, 2023 · 28 comments

@chainsawriot
Collaborator

resolve(pkgs = gh("schochastics/netUtils@5e2f3ab534"), snapshot_date = '2020-08-26')

Or even better, find the closest commit to snapshot_date automatically. But gran needs to be able to emit a gran object for a GitHub package.
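For illustration, a minimal sketch of splitting such a spec into the GitHub handle and the pinned ref (the helper name is made up, not part of gran):

.parse_gh_spec <- function(spec) {
  # split "user/repo@sha" into the repo handle and an optional ref
  parts <- strsplit(spec, "@", fixed = TRUE)[[1]]
  list(handle = parts[1],
       ref = if (length(parts) > 1) parts[2] else NA_character_)
}
.parse_gh_spec("schochastics/netUtils@5e2f3ab534")
## returns list(handle = "schochastics/netUtils", ref = "5e2f3ab534")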

@schochastics
Collaborator

I think it is best to find the closest commit to snapshot_date automatically because not everybody will know what these random letters/numbers mean and where to get them. Here is a suggestion to obtain the closest commit sha via gh

get_sha <- function(repo, date){
  commits <- gh::gh(paste0("GET /repos/", repo, "/commits"), per_page = 100)
  dates <- sapply(commits, function(x) x$commit$committer$date)
  idx <- which(dates <= date)[1]
  k <- 2
  while (is.na(idx)) {
    # keep paging until a commit at or before `date` turns up
    commits <- gh::gh(paste0("GET /repos/", repo, "/commits"), per_page = 100, page = k)
    dates <- sapply(commits, function(x) x$commit$committer$date)
    idx <- which(dates <= date)[1]
    k <- k + 1
  }
  commits[[idx]]$sha
}
repo <- "schochastics/netUtils"
date <- as.Date("2020-08-26")
get_sha(repo,date)
#> [1] "5e2f3ab53452f140312689da02d871ad58a96867"

Created on 2023-02-07 with reprex v2.0.2

The paging is probably still error-prone.

@schochastics
Collaborator

Just found pkgdepends; it might be helpful for getting the dependencies of GitHub-only packages?

library(pkgdepends)
pd <- new_pkg_deps("schochastics/levelnet@775cf5e")
pd$solve()
#> ! Using bundled GitHub PAT. Please add your own PAT using `gitcreds::gitcreds_set()`.
#> ℹ Loading metadata database
#> ✔ Loading metadata database ... done
#> 
pd$draw()
#> schochastics/levelnet@775cf5e 0.5.0 [new][bld][cmp][dl] (unknown size)
#> ├─igraph 1.3.5 [new][bld][cmp][dl] (2.50 MB)
#> │ ├─magrittr 2.0.3 [new][bld][cmp][dl] (267.07 kB)
#> │ ├─Matrix 1.5-1 < 1.5-3 [old]
#> │ │ └─lattice 0.20-45 
#> │ ├─pkgconfig 2.0.3 [new][bld][dl] (6.08 kB)
#> │ └─rlang 1.0.6 [new][bld][cmp][dl] (742.51 kB)
#> ├─Matrix
#> └─Rcpp 1.0.10 [new][bld][cmp][dl] (2.94 MB)
#> 
#> Key:  [new] new | [old] outdated | [dl] download | [bld] build | [cmp] compile

Created on 2023-02-07 with reprex v2.0.2
Apologies if this is irrelevant but I am still not that familiar with the code base of gran :)
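If pkgdepends were adopted, the resolved packages and versions could presumably be pulled out of the solution object, something like this (sketch only; I am assuming the get_solution()$data accessor exposes package and version columns as in the pkgdepends documentation):

sol <- pd$get_solution()$data
sol[, c("package", "version")]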

@schochastics
Collaborator

A way without pkgdepends could be this one:

get_sha <- function(repo, date){
  commits <- gh::gh(paste0("GET /repos/", repo, "/commits"), per_page = 100)
  dates <- sapply(commits, function(x) x$commit$committer$date)
  idx <- which(dates <= date)[1]
  k <- 2
  while (is.na(idx)) {
    # keep paging until a commit at or before `date` turns up
    commits <- gh::gh(paste0("GET /repos/", repo, "/commits"), per_page = 100, page = k)
    dates <- sapply(commits, function(x) x$commit$committer$date)
    idx <- which(dates <= date)[1]
    k <- k + 1
  }
  list(sha = commits[[idx]]$sha, x_pubdate = dates[[idx]])
}

repo <- "schochastics/netUtils"
snapshot_date <- "2020-08-26"
snapshot_date <- anytime::anytime(snapshot_date, tz = "UTC", asUTC = TRUE)
sha <- get_sha(repo,snapshot_date)  
repo_descr <- gh::gh(paste0("GET /repos/",repo,"/contents/DESCRIPTION"),ref=sha$sha)
descr_df <- as.data.frame(read.dcf(url(repo_descr$download_url)))
descr_df
#>       Package                                      Title    Version
#> 1 igraphUtils A Collection of Network Analytic Functions 0.1.0.9000
#>                                                                                                        Authors@R
#> 1 person(given = "David",\nfamily = "Schoch",\nrole = c("aut", "cre"),\nemail = "david.schoch@manchester.ac.uk")
#>                                                                                        Description
#> 1 Provides a collection of network analytic functions that may not deserve a package on their own.
#>              License Encoding LazyData               Roxygen RoxygenNote
#> 1 MIT + file LICENSE    UTF-8     true list(markdown = TRUE)       7.1.0
#>              LinkingTo       Imports
#> 1 Rcpp,\nRcppArmadillo Rcpp,\nigraph

No additional dependencies except gh, which one probably needs anyway, but we need to parse the dependency fields of the DESCRIPTION ourselves (rough sketch below).
Created on 2023-02-08 with reprex v2.0.2
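A rough sketch of that parsing step (nothing here is from gran; the helper name is made up), splitting the comma-separated dependency fields of the read.dcf() output:

.parse_desc_deps <- function(descr_df, fields = c("Depends", "Imports", "LinkingTo", "Suggests")) {
  fields <- intersect(fields, colnames(descr_df))
  deps <- unlist(strsplit(unlist(descr_df[1, fields]), ","))
  deps <- gsub("\\s+|\\(.*\\)", "", deps)  # drop whitespace and version constraints
  setdiff(deps, c("R", ""))                # R itself is not a package dependency
}
.parse_desc_deps(descr_df)
## for the DESCRIPTION above: "Rcpp" "RcppArmadillo" "igraph"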

@chainsawriot
Collaborator Author

chainsawriot commented Feb 8, 2023

@schochastics So now you are a CTB.

I thought about using pkgdepends previously (#1), but decided against it because pkgsearch::cran_package_history provides enough information (for CRAN packages).

In the long run, I think we might be better off using pkgdepends (because it supports bioc etc.). Also, opening up gran to GitHub means opening up to DESCRIPTION fields such as Remotes, and pkgdepends supports these.

For now, I will take your get_sha and read.dcf approach.

@chainsawriot
Collaborator Author

Tagging as v0.1 for now. Dunno if it can be done by then.

@schochastics
Collaborator

schochastics commented Feb 8, 2023

@schochastics So now you are a CTB.

:)

For now, I will take your get_sha and read.dcf approach.

Do you want to take over integrating this into the package? Otherwise I'll give it a shot.

@chainsawriot
Collaborator Author

@schochastics Please give it a shot (and be AUT)!

@schochastics
Collaborator

schochastics commented Feb 8, 2023

I have a working version in my fork in the gh branch.
The problem is system requirements. Not sure we can get these reliably from the DESCRIPTION.
Example: the igraph DESCRIPTION:

SystemRequirements:
    gmp (optional),
    libxml2 (optional),
    glpk (>= 4.57, optional)
R> remotes::system_requirements(package = "igraph",os="ubuntu",os_release="20.04")
[1] "apt-get install -y libglpk-dev" "apt-get install -y libgmp3-dev"
[3] "apt-get install -y libxml2-dev"

@chainsawriot
Collaborator Author

@schochastics For now, an interim solution is to put the names of non-CRAN packages in a special slot inside the granlist object (e.g. output$noncran_pkgs; I don't want to call it gh_pkgs because we might need to include bioc or even local packages in the future). Probably those non-CRAN packages would only appear in output$grans[[x]]$original (but not in output$grans[[x]]$deps, if we don't support nonstandard DESCRIPTION fields for now). Those non-CRAN packages need special treatment for export_granlist anyway (they will probably need to be installed last, preferably cached).

When getting Sysreqs, the packages in noncran_pkgs need to be separated from CRAN packages. For CRAN packages, do the usual remotes::system_requirements thing.

For gh packages, we need to get their DESCRIPTION again (or if we can, cache the DESCRIPTION file from the previous step) and do this:

https://github.com/r-lib/remotes/blob/88fdc4eb6e64a02528d7289e1cdda6948027c301/R/system_requirements.R#L66-L88
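A rough sketch of that separation (the function and slot names here are placeholders, and the GitHub half is only outlined):

.query_sysreqs <- function(pkgs, os = "ubuntu", os_release = "20.04") {
  gh_pkgs <- grep("/", pkgs, value = TRUE)   # e.g. "schochastics/netUtils"
  cran_pkgs <- setdiff(pkgs, gh_pkgs)
  cran_reqs <- unlist(lapply(cran_pkgs, function(pkg) {
    remotes::system_requirements(package = pkg, os = os, os_release = os_release)
  }))
  ## for gh_pkgs: fetch the DESCRIPTION at the pinned sha and query the
  ## sysreqs API with it, as in the remotes code linked above (not reproduced here)
  unique(cran_reqs)
}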

@schochastics
Collaborator

Thanks I'll try to get this done.

Different question:
how would you indicate a gh package when providing a list of packages to resolve?
My current implementation is to interpret everything with a "/" as coming from GitHub:

resolve(c("rtoot","schochastics/rtoot"))

calls .get_snapshot_dependencies_cran() for rtoot and .get_snapshot_dependencies_gh() for schochastics/rtoot. Not sure if this is the best way, but certainly the simplest?
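In other words, something like this inside resolve() (sketch only; the exact signatures of the internal functions are assumptions):

lapply(pkgs, function(pkg) {
  if (grepl("/", pkg, fixed = TRUE)) {
    .get_snapshot_dependencies_gh(pkg, snapshot_date = snapshot_date)
  } else {
    .get_snapshot_dependencies_cran(pkg, snapshot_date = snapshot_date)
  }
})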

@chainsawriot
Collaborator Author

@schochastics

Slash is fine.

chainsawriot added a commit that referenced this issue Feb 9, 2023
adding support for github packages (#22)
@chainsawriot
Collaborator Author

devtools <= 1.5 should use repo and username separately (as older versions, e.g. version < 1, only support username). Versions > 1.5 (i.e. 1.6.1 onwards, so snapshot_date >= 2014-10-07) should use username/repo, because username is deprecated there.
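The generated installation line would then depend on the devtools version available at the snapshot date, roughly like this (sketch only; the helper name and exact quoting are made up):

.install_github_cmd <- function(handle, sha, devtools_version) {
  parts <- strsplit(handle, "/", fixed = TRUE)[[1]]
  if (utils::compareVersion(devtools_version, "1.5") <= 0) {
    ## old interface: repo and username as separate arguments
    sprintf('devtools::install_github("%s", username = "%s", ref = "%s")',
            parts[2], parts[1], sha)
  } else {
    ## 1.6.1 onwards: "username/repo", username argument deprecated
    sprintf('devtools::install_github("%s", ref = "%s")', handle, sha)
  }
}
.install_github_cmd("schochastics/rtoot", "50420ed", "1.4.1")
## devtools::install_github("rtoot", username = "schochastics", ref = "50420ed")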

@schochastics
Collaborator

ah I remember that change! I can fix that.

@chainsawriot
Collaborator Author

Archiving GH packages

https://api.github.com/repos/schochastics/rtoot/tarball/50420ed

And then R CMD build it?

@schochastics
Collaborator

I guess that way we could get around the devtools issue?

@schochastics
Collaborator

One can install that tarball directly?!?!

R> install.packages("~/Downloads/schochastics-rtoot-v0.2.0-11-g50420ed.tar.gz")
Installing package into ‘/home/david/R/x86_64-pc-linux-gnu-library/4.2’
(as ‘lib’ is unspecified)
inferring 'repos = NULL' from 'pkgs'
Warning in untar2(tarfile, files, list, exdir, restore_times) :
  skipping pax global extended headers
* installing *source* package ‘rtoot’ ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
*** copying figures
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (rtoot)

Something tells me that this is probably not a good idea

@chainsawriot
Collaborator Author

@schochastics Yes. The R CMD build step afterwards is simply for getting rid of unnecessary files as specified in .Rbuildignore, building vignettes, checking, and all those sundries. It is not really necessary for many (well-developed) packages such as rtoot.

My proposal above was mainly for dealing with the cache option of dockerize. But if it can be generalized and avoids the need for devtools/remotes just for install_github, that would be tremendously helpful.

And this is the super hacky version of devtools::install_github without any dependency for inside the container.

(R has a function for untarring, but it is super buggy before R 4. I have direct bad experience with it.)

(This also suggests that the R > 2.1 limitation (#14) can be eliminated by doing something stupid like: system(command = paste("R CMD INSTALL", tarball_path)). Again, system2 is nicer but is a more recent addition.)

pkg <- "schochastics/rtoot"
sha <- "50420ed"
x <- tempfile(fileext = ".tar.gz")
y <- tempdir(check = TRUE)
download.file(paste("https://api.github.com/repos/", pkg, "/tarball/", sha, sep = ""), destfile = x) ## one concern is that Woody can't do proper https authentication; but actually http works as well
system(command = paste("tar", "-zxf ", x, "-C", y))
system(command = paste("R", "CMD", "build", list.dirs(path = y, recursive = FALSE))) # There can be multiple directories if y is reused.
## TODO: Need a way to generate `tarball_path`
tarball_path <- "rtoot_0.2.0.9000.tar.gz"
install.packages(tarball_path, repos = NULL)
unlink(tarball_path)

It also brings us to another issue: should we store x and x_version as usual for GH packages, i.e. the package name and version as per DESCRIPTION, so that we can generate tarball_path as usual? That would also be beneficial for cases such as igraphUtils / netUtils.

We can store x, "schochastics/rtoot" and sha somewhere else, e.g. my suggestion: cranlist$noncran_pkgs as a vector/dataframe.

## `type` can be extended to "bioc", "local"
## handle can be github path, local path, or bioc package name
## local probably doesn't need ref, bioc might store version as ref.

data.frame(x = c("rtoot", "igraphUtils"), type = c("github", "github"), handle = c("schochastics/rtoot", "schochastics/netUtils"), ref = c("50420ed", "5e2f3ab"))

Another way is to stay as it is now and look at the DESCRIPTION once again in y to get the ACTUAL name and version at container building time.
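That alternative would look roughly like this (sketch only; it assumes y contains exactly one untarred package directory):

## read the package's own DESCRIPTION after untarring to get the ACTUAL
## name and version, which is also what R CMD build puts in the tarball name
pkg_dir <- list.dirs(path = y, recursive = FALSE)[1]
desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"))
tarball_path <- paste0(desc[1, "Package"], "_", desc[1, "Version"], ".tar.gz")
tarball_path
## e.g. "rtoot_0.2.0.9000.tar.gz"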

@schochastics
Collaborator

schochastics commented Feb 9, 2023

Just finished almost exactly the same hack-ish solution.
If we can avoid certain issues with system, why not use it? The "R CMD" stuff is probably the most stable thing we have?

One could get the created tar file like this, but no idea how stable this really is:

res <- system(command = paste("R", "CMD", "build", list.dirs(path = y, recursive = FALSE)), intern = TRUE)
tar_file_line <- res[grepl("\\.tar\\.gz", res)]
tar_file_line
# regex to extract the tar.gz file name from this line

I will work on this a bit more. devtools is a bit of a pain with its dependencies.
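For that elided regex, one possibility (sketch only, and only as stable as the R CMD build output format):

## pull "<pkg>_<version>.tar.gz" out of the "* building '...'" line
tarball_name <- regmatches(tar_file_line,
                           regexpr("[[:alnum:].]+_[0-9.-]+\\.tar\\.gz", tar_file_line))
tarball_name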

@schochastics
Collaborator

We can store x, "schochastics/rtoot" and sha somewhere else, e.g. my suggestion: cranlist$noncran_pkgs as a vector/dataframe.

I will toy around with this, but I noticed that it quickly becomes complicated to drag everything along. The sha is enough to recreate what we need, though maybe in a cumbersome way.

Creating that dataframe might, however, be helpful just as a reference.

@schochastics
Collaborator

This can deal with pkg renaming. Obviously it still needs some error handling:

pkg <- "schochastics/igraphUtils"
sha <- "1b601a3"

x <- tempfile(fileext = ".tar.gz")
y <- tempdir(check = TRUE)
download.file(paste("https://api.github.com/repos/", pkg, "/tarball/", sha, sep = ""), destfile = x)
system(command = paste("tar", "-zxf ", x, "-C", y))
dlist <- list.dirs(path = y, recursive = FALSE)
pkg_dir <- dlist[grepl(sha, dlist)] # the sha identifies the dir uniquely
res <- system(command = paste("cd ", y, " && R", "CMD", "build", pkg_dir), intern = TRUE)
tar_file_line <- res[grepl("\\.tar\\.gz", res)]
flist <- list.files(y, pattern = "\\.tar\\.gz$", recursive = FALSE)
tarball_path <- paste0(y, "/", flist[vapply(flist, function(x) any(grepl(x, res)), logical(1))])
install.packages(tarball_path, repos = NULL)
unlink(tarball_path)

@chainsawriot
Collaborator Author

So, let's make it like that in header.R for now.

We don't need a lot of error handling in the container building part. It's better to error out when things go wrong there.

chainsawriot added a commit that referenced this issue Feb 9, 2023
Github support without devtools (#22)
@chainsawriot
Collaborator Author

5291eae

@chainsawriot
Collaborator Author

This is the "for the sake of argument" test case:

x <- resolve("cran/sna", "2005-05-01")

It will generate the earliest supported version of R (2.1.0), but with a GitHub package.

@schochastics
Collaborator

schochastics commented Feb 13, 2023

This happens in the v0.1 branch when dockerizing the above:

FROM debian/eol:
ENV TZ UTC
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone && apt-get update -qq && apt-get install wget locales build-essential r-base-dev  -y
COPY rang.R ./rang.R
COPY compile_r.sh ./compile_r.sh
RUN apt-get update -qq && apt-get install -y libfreetype6-dev libgl1-mesa-dev libglu1-mesa-dev libicu-dev libpng-dev make pandoc zlib1g-dev
RUN bash compile_r.sh 2.1.0
CMD ["R"]
Sending build context to Docker daemon  10.75kB
Step 1/8 : FROM debian/eol:
invalid reference format

looks like debian_version is missing

@chainsawriot
Collaborator Author

Yes https://github.com/chainsawriot/rang/tree/fixrang

@chainsawriot
Collaborator Author

The GitHub download is not possible inside Woody. We need to warn users and ask them to cache instead.

@chainsawriot
Collaborator Author

And really old packages can't be built on modern R, e.g. cran/sna.

We need to download them now, transfer them into the container, and build them there instead. (So complicated...)
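The idea would be something like this at dockerize() time (sketch only; cache_dir is a placeholder and the COPY line is only indicative):

pkg <- "schochastics/rtoot"
sha <- "50420ed"
cache_dir <- "cache"  # a directory next to the generated Dockerfile
dir.create(cache_dir, showWarnings = FALSE)
## download the pinned tarball on the host...
download.file(paste0("https://api.github.com/repos/", pkg, "/tarball/", sha),
              destfile = file.path(cache_dir, paste0(gsub("/", "_", pkg), "_", sha, ".tar.gz")))
## ...and let the Dockerfile copy it into the image, e.g.
## COPY cache ./cache
## so that R CMD build / R CMD INSTALL run inside the (old-R) container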

@chainsawriot
Collaborator Author

I think this is done.
