
Error with drive_get() #281

Closed

bshor opened this issue Oct 14, 2019 · 10 comments

Comments

bshor commented Oct 14, 2019

I was using googledrive successfully before the upgrade to version 1.0 to access and download Google Sheets as Excel files. But now with version 1.0 (I'm running R 3.6.1 on Windows), I get a difficult-to-understand error.

title <- "D State Legislative Pres Endorsements 2020"
test <- drive_get(title)

And I get this error:

Error in add_id_path(nodes, root_id = root_id, leaf = leaf) : 
  !anyDuplicated(nodes$id) is not TRUE

Any ideas of what could be going on?

jennybc (Member) commented Oct 15, 2019

Sounds similar to #279 #277 #272

jsstanley commented Nov 30, 2019

I'm also getting the same error, albeit intermittently and seemingly at random, despite running the same code every time:

for (i in seq_len(nrow(statementsList))) {
  currentSheetName <- as.character(statementsList[i, 1, drop = TRUE])
  print(paste0('Deleting sheet: ', currentSheetName))
  drive_trash(currentSheetName)
}

The drive_trash() line seems to be the problem.

jwbenning commented Dec 4, 2019

I'm getting the same error, but there's some more info that's perhaps helpful. When I run:

herbMaster_gs <- drive_get("Herbivory_Individual")

I get the error:

Error in add_id_path(nodes, root_id = root_id, leaf = leaf) : 
  !anyDuplicated(nodes$id) is not TRUE

When I search for "Herbivory_Individual" in my Drive, only the Google Sheet (that I'm trying to access) is returned. However, in R, when I run:

drive_find(pattern = "Herbivory_Individual")

this finds lots of items:

Items so far: 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400

Any idea what's up, and what all these items could be?

bshor (Author) commented Dec 4, 2019

The solution I found was to delete a file with a nearly identical name that had originally been created as a duplicate of the Drive document I wanted. Once I did that, drive_get() worked with no problem.

Here is the SO answer that inspired me.

Searches with drive_find() take too long, as you discovered (I quit after a couple of thousand documents), so I don't use them.

jennybc (Member) commented Dec 5, 2019

Some background on what drive_find() is messaging about and how to make it fast:

  • The pattern = argument is implemented locally. So we recursively fetch data for your whole Drive, then filter on pattern =. I too learned during development that I have access to a shocking number of files on Drive. Fetching all of this is what "Items so far ..." refers to. And yes, it can be slow.
  • The fast way to filter is to use the q = clause, because that is done on the server side (see the sketch after the links below). The documentation for drive_find() does mention this and includes some examples.

https://googledrive.tidyverse.org/reference/drive_find.html#search-parameters

https://googledrive.tidyverse.org/articles/articles/file-identification.html

https://developers.google.com/drive/api/v3/search-files
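
A minimal sketch of the difference (the file name is borrowed from the original report; any name works):

library(googledrive)

# Server side: the q clause is evaluated by the Drive API, so only
# matching files ever come back (fast).
drive_find(q = "name = 'D State Legislative Pres Endorsements 2020'")

# Client side: pattern = filters locally, after listing everything
# you can see on Drive (slow on large Drives).
drive_find(pattern = "Endorsements")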

jwbenning commented

@bshor unfortunately that solution isn't working for me, so drive_get() is still failing when I supply the name of the spreadsheet. It does work when I supply the sheet URL:

kk <- "https://docs.google.com/spreadsheets/d/1mau6LUz8tWcgXTs6zwHo7Na5TyBkhPzzkfup7zSWioM/edit#gid=938724274"
herbMaster_gs <- drive_get(kk)

So it works, but it definitely would be nice to be able to refer to the spreadsheets by name instead of URL.
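
A small variant sketch: drive_get() also accepts a file id marked with as_id(), so the full URL isn't needed (the id below is just the one embedded in the URL above):

herbMaster_gs <- drive_get(as_id("1mau6LUz8tWcgXTs6zwHo7Na5TyBkhPzzkfup7zSWioM"))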

AllysonS commented Jan 9, 2020

Deleting the file with a similar name is not an option for me either - one file is a spreadsheet with original data and the other is a companion document containing metadata. They have similar names so that we can easily tell which files go together. And we have a lot of these files.

The drive_get() error started for me when I updated the googledrive package to v 1.0.0.

As @jennybc suggested, using the q = argument in drive_find() greatly speeds up the search, so I used that to work around the drive_get() issue.

### simple version
file <- drive_find(q = "name = 'R02_dieback_2019-10-28_raw'") # equals seems to return the file with that exact name

### alternate approach: multiple q clauses are combined with "and"
file <- drive_find(q = "name contains 'R02_dieback_2019-10-28_raw'",
                   q = "not name contains 'metadata'")

### version with a generic object to make the process repeatable with multiple files
fn <- "R02_dieback_2019-10-28_raw"
file <- drive_find(q = paste0("name = '", fn, "'"))

From here I can go ahead with drive_download() and just skip using drive_get().
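
For example, a minimal follow-up sketch (the local path is hypothetical, and it assumes file holds exactly one match):

drive_download(file, path = "R02_dieback_2019-10-28_raw.xlsx", type = "xlsx")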

There are more ways to search for other file characteristics using q listed here: https://developers.google.com/drive/api/v3/search-files

jennybc (Member) commented Jan 14, 2020

I still have yet to experience this phenomenon or get enough data to truly study it.

But I have formed an untestable hypothesis about the root cause and installed a fix 🤞

Needless to say, please open a new issue if you update to this dev version and still see the phenomenon.

bshor (Author) commented Feb 3, 2020

I thought I'd fixed it as described above, but I got the anyDuplicated error again (this is on googledrive 1.0.0). I then tried drive_find() with both a q and a pattern option, and it was much faster and worked without error.
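
Something along these lines (the exact q string and pattern are illustrative):

# q narrows the candidates on the server; pattern then filters the
# much smaller result locally.
test <- drive_find(q = "name contains 'Endorsements'", pattern = "2020")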

jennybc (Member) commented Feb 3, 2020

In the development version of googledrive, there is a fix for the anyDuplicated error (e56b3f5). But I now believe there is a general problem, from the Google side, re: exhaustively listing files (#288). One conclusion from all of these investigations is that when accuracy and performance become very important, you should maximize your use of the q clause for narrowing search on the server side.
