Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

assert_here() #27

Closed
jennybc opened this issue Aug 24, 2018 · 11 comments · Fixed by #55 or #59
Closed

assert_here() #27

jennybc opened this issue Aug 24, 2018 · 11 comments · Fixed by #55 or #59

Comments

@jennybc
Copy link
Member

jennybc commented Aug 24, 2018

It would be interesting to consider an assert_here() function.

Inspired by this Twitter conversation with Emile Latour (@emilelatour).

Problem: you use Projects and the here package and life is awesome. But your collaborator still clicks on an Rmd in the Finder and it just opens in RStudio, in whatever Project (or non-Project) they happen to have open. Now all the path-building code doesn't work as planned when they run chunks interactively (knitting the whole document still works fine for them).

If you want to make a .R or .Rmd file more resilient to this and you feel like the Project's name is pretty stable, you could put here::assert_here("PROJECT_NAME") at the top as a way of raising an alarm when the workflow violates expectations.

assert_here() would be something like assertthat::assert_that(identical(basename(here::here()), “PROJECT_NAME”)).

@nacnudus
Copy link

A collaborator just fell into this trap. See the RStudio behaviour when opening by file association:

When launched through a file association, RStudio automatically sets the working directory to the directory of the opened file. Note that RStudio can also open files via associations when it is already running—in this case RStudio simply opens the file and does not change the working directory.

In their case they wound up in their home directory, which seems a good candidate for a message: Did you mean to set the working directory to your home directory? Check here for common solutions [url]

@krlmlr
Copy link
Member

krlmlr commented Apr 17, 2019

We could also look for a specific file/directory in the project, and give a nicer error message. Need to mull over it.

@krlmlr
Copy link
Member

krlmlr commented Nov 9, 2020

I prefer solving this with documentation.

@krlmlr
Copy link
Member

krlmlr commented Nov 12, 2020

I added a section to the "Getting started" article, feedback welcome.

@nacnudus
Copy link

Thinking about this again, any solution appears to make library(here) redundant.

I'm not sure what to suggest. Perhaps if here("foo.csv") checks for the existence of foo.csv, and if it can't be found, prints a warning that includes the current working directory as a clue.

@krlmlr
Copy link
Member

krlmlr commented Nov 12, 2020

I'm not sure about the redundancy.

here("foo") works even if the file doesn't exists, it's easy for the user to check that the file doesn't exist (and any ingestion function will fail too).

I think this is about failing early, failing fast, failing clearly. Does the text at https://here.r-lib.org/articles/here.html#enforce-project-oriented-workflow-1 need more clarification?

@nacnudus
Copy link

Even if you always work inside a project, your collaborators may not be aware. here() falls back to the current working directory when no project can be found. This means that your code does not find the correct files even if you use here() everywhere.

I don't think this describes the situation in this thread, but I might not have fully understood it. I think the steps to reproduce this behaviour are as follows.

.
├── proj1
│   ├── data1.csv
│   ├── .here
│   └── script1.R
└── proj2
    ├── data2.csv
    ├── .here
    └── script2.R
# proj1/script1.R
getwd()
library(here)
file.exists(here("data1.csv"))
# proj1/script2.R
getwd()
library(here)
file.exists(here("data2.csv"))
  1. Close all instances of RStudio.
  2. Open proj1/script1.R in a new instance of RStudio by file association. RStudio automatically sets the working directory to proj1.
  3. Run the script proj1/script1.R. library(here) doesn't change the working directory, because it's already correct.
  4. Open proj2/script2.R by file association. It opens in the existing RStudio session, and RStudio doesn't change the working directory.
  5. Run the script proj2/script2.R. library(here) does nothing, because the package has already been loaded. file.exists("data2.csv") returns FALSE because the working directory is proj1, not proj2.

The advice in the documentation is to use a project-oriented workflow, along with some defensive code. I think this advice makes the use of the {here} package redundant, because:

  1. If you use projects, RStudio sets the working directory correctly without the need for library(here).
  2. If you don't use projects, library(here) isn't reliable (as demonstrated above), so defensive code has to be written anyway.

I do think a project-oriented workflow should be encouraged when all collaborators use RStudio, and {here} should be encouraged when no collaborators use RStudio. But I would find it difficult to persuade people to use {here} if they still have to write defensive code, hence my suggestion to make here() itself more defensive -- although even that doesn't seem like a great solution.

@krlmlr
Copy link
Member

krlmlr commented Nov 12, 2020

Thanks. Your described workflow is consistent with my understanding of the problem. Both the starting directory and here's notion of the project root could be anything. Scripts that rely on {here} will fail in varying degrees -- from failure to open a file for reading to silent failure after writing a file to an unexpected location. Ideally all scripts using {here} would fail loudly.

  1. The working directory may not be the root of the project, see https://here.r-lib.org/articles/rmarkdown.html . This was the motivation for implementing {rprojroot} and the {here} wrapper. I agree that if you can guarantee that the w.d. is the project root you don't need {here} -- in practice this is "sometimes" true but not always.
  2. Agree that defensive code makes sense. I decided to solve with documentation for now, I could be persuaded to implement a function that makes these checks simpler.

This coincides with the problem that there's no well-defined way for an R script to know its own path. The kimisc and rprojroot packages have functions that now moved to {whereami} (thanks @yonicd), and {this.path} seems to reimplement this (on CRAN since yesterday).

The suggested assert_here() (or maybe declare_here() or i_am_here() or ...) could fill that gap, and {whereami} and the other functions could use this info to find the active script more reliably if the info has been set?

@krlmlr krlmlr reopened this Nov 12, 2020
@nacnudus
Copy link

nacnudus commented Nov 12, 2020

Thanks for reconsidering this. Maybe another option would be for the contents of .onLoad() to be exported, so that every script can reassert its location, regardless of the prior state of the R session. Bad idea, sorry, because there's no way to know the path of the script being run.

@krlmlr krlmlr mentioned this issue Nov 13, 2020
krlmlr added a commit that referenced this issue Nov 15, 2020
- New `here::i_am()` offers a new recommended way to declare the project root. Instead of relying on special files or directories that indicate the project root, each script and report now can declare its own location relative to the project root (#27).
@jennybc
Copy link
Member Author

jennybc commented Dec 23, 2020

I was reading this, among other things, as part of digesting the recent changes to here. This thread was helpful for getting re-oriented to the safety problem, which I do acknowledge (and which motivated me to open the issue). The concrete demo @nacnudus gives really describes what I see as the heart of this matter.

However, I do want to push back on this:

The advice in the documentation is to use a project-oriented workflow, along with some defensive code. I think this advice makes the use of the {here} package redundant, because:

If you use projects, RStudio sets the working directory correctly without the need for library(here).

That only holds as long as you don't have anything you plan to knit in a subdirectory of the project. Once you've got an .Rmd or a .R file you want to spin in a subdirectory, you have a path-building predicament that the here package solves. RStudio sets the working directory to the top-level of the project, which I think is correct. But the "Knit" button will render in an R process where the working directory has been set to wherever the .Rmd or .R lives. This is one of the main motivations for the here package, as I see it.

@github-actions
Copy link

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 23, 2021
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants