Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[covr.record_tests] segfault when tracing R6 active bindings to parquet data #573

Open
dgkf opened this issue Jul 10, 2024 · 1 comment
Open

Comments

@dgkf
Copy link
Contributor

dgkf commented Jul 10, 2024

Note

This is in code I introduced and only affects very specific scenarios when options(covr.record_tests) is enabled.
I plan to fix this

Example at dgkf/covr-r6-active-binding-test

git clone https://github.com/dgkf/covr-r6-active-binding-test
cd covr-r6-active-binding-test
R -e 'options(covr.record_tests = TRUE)' -e 'covr::package_coverage()'

Details

From what I can tell, there are a few key features of this package that culminate in this bug:

  • An R6 active binding is included in the test call stack
    • For reasons that are still unclear to me, the sys.calls() captured in update_current_test(), when capturing an active binding, includes the R function, not just the call. Critically, this function brings its calling environment.
  • When the R6 object environment includes a parquet dataset (I'm assuming due to the way arrow works with the in-memory representation of the data), this environment segfaults when serialized to disk.

Proposed solution

  • Will first try to drill into why this is grabbing a function. There might be a more holistic way to avoiding this altogether.
  • If there's no better way to avoid functions in the call stack, then before recording the call stack, if a frame contains a function, set its environment to emptyenv() so it can be serialized.
@dgkf
Copy link
Contributor Author

dgkf commented Jul 11, 2024

Just following up to say that I have a safeguard implemented in dgkf/covr@573-record-tests-arrow-segfault.

I can open a PR whenever you'd like, but I'd prefer to test it in some more real-world applications first to make sure there aren't any severe performance implications and that no necessary data gets thrown out.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant