[Feature Request] Automatic installation of packages detected via explicit package::function() calls in quarto and R projects #712

Open
fretwurst opened this issue Nov 8, 2024 · 1 comment

Comments

@fretwurst

Problem

In CI/CD workflows, particularly for Quarto-based projects, missing packages often cause rendering to fail. This is a common issue when:

  • Packages are used explicitly via package::function() in .qmd files and are not pre-installed in the CI/CD environment.
  • CI/CD pipelines (e.g., with Rocker Docker containers) halt after encountering the first missing package, leading to time-consuming debugging in large projects.

For Quarto projects, active chapters defined in _quarto.yml (under chapters or render) often determine the relevant .qmd files to render. Detecting and pre-installing the packages used in these files before rendering could significantly streamline the CI/CD workflow.
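As a rough illustration of how the active files could be read from _quarto.yml, here is a minimal sketch. It assumes the yaml package is available and that chapters sit under the book key and render under the project key, which is how Quarto book projects are usually laid out. The helper name active_qmd_files() is hypothetical and not part of pak or Quarto. Glob patterns and "!" exclusions in render are skipped rather than expanded.

```r
# Hypothetical helper (not part of pak): list the .qmd files named in _quarto.yml.
# Assumes chapters are under `book:` and render targets under `project:`.
active_qmd_files <- function(yml = "_quarto.yml") {
  cfg <- yaml::read_yaml(yml)

  # Chapters may be nested inside parts, so flatten everything first.
  entries <- c(unlist(cfg$book$chapters), unlist(cfg$project$render))
  entries <- as.character(entries)

  # Keep plain .qmd paths; drop part titles, other file types,
  # and glob or negation patterns, which this sketch does not expand.
  is_qmd <- grepl("\\.qmd$", entries)
  is_pattern <- grepl("[*?!]", entries)
  unique(entries[is_qmd & !is_pattern])
}

# Usage, from the project root:
# active_qmd_files("_quarto.yml")
```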


Proposed Solution

Enhance pak with a feature to:

  1. Scan project files (e.g., .qmd, .Rmd, .R) for all explicitly used packages:
    • Detect package::function() calls.
    • Optionally scan for pak::pkg_install() or pak::pkg() calls in file headers.
  2. Support Quarto project workflows:
    • Read _quarto.yml to identify active .qmd files (chapters or render keys).
    • Install all required packages before rendering begins.
  3. Install missing packages efficiently:
    • Use pak's parallelized installation and caching to minimize installation time in CI/CD pipelines.
    • Avoid breaking on the first missing package.
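The scanning and install steps above could be sketched roughly as follows, using only base R plus pak::pkg_install(). The helper names scan_pkg_refs(), missing_pkgs(), and install_missing_refs() are hypothetical and do not exist in pak. The regular expression is a simple heuristic: it also matches pkg::fun text inside comments and strings, and it cannot see packages loaded via library().

```r
# Hypothetical helper: collect package names referenced as pkg::fun or pkg:::fun.
scan_pkg_refs <- function(lines) {
  # A package name starts with a letter, followed by letters, digits, or dots,
  # and must be followed by :: or ::: and the start of an object name.
  pattern <- "(?<![A-Za-z0-9._])[A-Za-z][A-Za-z0-9.]*(?=:{2,3}[A-Za-z._`])"
  hits <- regmatches(lines, gregexpr(pattern, lines, perl = TRUE))
  sort(unique(as.character(unlist(hits))))
}

# Hypothetical helper: keep only the packages that are not installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(
    pkgs,
    function(p) nzchar(system.file(package = p)),
    logical(1)
  )
  pkgs[!installed]
}

# Hypothetical wrapper: scan a folder and install everything missing in one call,
# so the run does not stop at the first missing package.
install_missing_refs <- function(path = ".", pattern = "\\.(qmd|Rmd|R)$") {
  files <- list.files(path, pattern = pattern, recursive = TRUE, full.names = TRUE)
  code <- unlist(lapply(files, readLines, warn = FALSE))
  to_install <- missing_pkgs(scan_pkg_refs(code))
  if (length(to_install) > 0) {
    pak::pkg_install(to_install)
  }
  invisible(to_install)
}

# Usage, from the project root:
# install_missing_refs(".")
```

Passing the whole vector to a single pak::pkg_install() call lets pak resolve and install the packages together instead of one failure at a time.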

Best Practice Alignment

Modern R style guides, such as the Google R Style Guide and the RStudio Tidyverse Style Guide, recommend using explicit package::function() calls over loading packages globally. This approach improves:

  • Clarity: The source of each function is immediately clear.
  • Conflict avoidance: Prevents naming conflicts between functions in different packages.
  • Modularity: Ensures code runs independently of preloaded packages.

Given this trend, tools like pak should support workflows where packages are explicitly referenced, especially in CI/CD contexts where no preloaded environment exists.

Example Workflow

A new pak function, such as pak::install_quarto_deps(), could streamline this process:

```r
# Proposed function: automatically scan a Quarto project and install dependencies
pak::install_quarto_deps(yml = "_quarto.yml")
```

This function would:

  • Parse _quarto.yml to identify active .qmd files.
  • Extract all packages used via package::function() in these files.
  • Install any missing packages before rendering.

Alternatively, a more general function like pak::scan_and_install() could be used for non-Quarto workflows:

```r
# Proposed function: scan an arbitrary folder for used packages and install them
pak::scan_and_install(path = ".", pattern = "\\.qmd$")
```

Benefits

  1. Streamlined CI/CD Pipelines:
    Avoid pipeline failures due to missing packages by ensuring all dependencies are installed in advance.

  2. Efficiency for Large Projects:
    Automatically handle dependency management for Quarto projects with multiple .qmd files and dynamic dependencies.

  3. Modern Style Alignment:
    Supports best practices by enabling workflows where package::function() is preferred over global package loading.

  4. Broader Use Case:
    While the focus is on Quarto projects, this feature could benefit RMarkdown users or anyone working with R scripts in CI/CD environments.

  5. Optimized for Docker:
    By leveraging pak’s caching and parallelized installation, it minimizes time and resources in containerized environments.

@gaborcsardi
Member

This is already happening here: r-lib/pkgdepends#390
