drive_find() does not always return exactly the same files #288
Comments
If the exhaustiveness of paginated results is not 100% guaranteed (I believe this may be true), here is one user behaviour that will at least help: make each request as specific as possible. If targeting 1 file, this means specifying the file by ID (or anything that contains the ID), as opposed to by name. If using `drive_find()`, it means narrowing the results with a `q` search clause rather than listing everything and filtering downstream. These measures dramatically reduce the reliance on combining a large number of paged results.
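A minimal sketch of that advice; the file ID and the query string below are placeholders, not values from this thread:

```r
library(googledrive)

# Most robust: address a single file directly by its ID
# (the ID string is a placeholder)
one_file <- drive_get(id = as_id("FILE_ID_GOES_HERE"))

# When listing, narrow results server-side with a `q` clause instead of
# pulling everything down and filtering locally afterwards
recent_reports <- drive_find(
  q = "name contains 'report'",  # hypothetical name filter, applied by the API
  type = "spreadsheet"           # restrict to Google Sheets
)
```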
I came across this issue while attempting to use Google Drive as a collaborative folder for a project I am working on. I am providing some details about my experience in the hope that they may help someone with more coding expertise than I possess to address the root cause of the issue. I am using `drive_find()` to create a list of files whose names fit a criterion, via `pattern = "XXX"`. Based on the above comment, I realize this is not ideal, but I have not had much luck with the `q` clause, since the part of the file name I am interested in is at the end of the file name. Using this approach to search through files that have been shared with me, I get a tibble of files that varies in length nearly every time I try it. However, when I use the same approach on files where I am the owner, I get consistent results. The biggest caveat to my results so far is that there were many more files to sift through in the shared-file search than in the owned-file search. I will be examining this issue further.
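For concreteness, the two searches described above would look roughly like this; the suffix pattern and the ownership clauses are placeholders, not the commenter's actual values:

```r
library(googledrive)

# Regex on the end of the file name, applied locally after listing.
# Searching files shared with me gave results that varied between runs.
shared_hits <- drive_find(pattern = "_2023\\.csv$", q = "sharedWithMe")

# The same pattern restricted to files I own returned stable results.
owned_hits <- drive_find(pattern = "_2023\\.csv$", q = "'me' in owners")
```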
@tpbarrette You can use […]
@jennybc Thanks for the tip. Unfortunately, the combination of […] did not work in my case. Regarding the latter part of my previous comment, have you noticed any difference in the […]?
I haven't studied that directly. But it's true that the files that went missing for me in `drive_find()` results […]
I used `q` to narrow files to Google Sheets, and then `pattern` to search among them. With ~500 files, it only took a few seconds to complete.
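A sketch of that two-step narrowing; the pattern string is made up:

```r
library(googledrive)

# `type = "spreadsheet"` becomes a MIME-type `q` clause handled by the API;
# `pattern` is then applied as a regex to the names that come back
sheets <- drive_find(type = "spreadsheet", pattern = "budget")
```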
Just FYI, googlesheets4 has a wrapper that does exactly this: https://googlesheets4.tidyverse.org/reference/sheets_find.html
I think my situation may be related to this issue. (If not, let me know and I'll open a separate issue.) I have a directory of files that I search with […], and the set of files returned is not consistent.

Here's the weird part: if I go to Google Drive and open up a file that is missing from the […] results, it gets found the next time I search. I wish I could create some kind of reprex here. I can reliably get files to become "found" by opening them and closing them. But I can't predict which files will be found on the first go in any given week.

UPDATE: Wait, I can get the number to change. The reason? Because the same set of files is getting found, but some files are found more than once!
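One way to check whether the same Drive file is being returned more than once within a single result, assuming the search result (a dribble) is in hand; the pattern here is made up:

```r
library(googledrive)
library(dplyr)

found <- drive_find(pattern = "assignment")  # hypothetical weekly search

# File IDs that show up in more than one row of the result
found %>%
  count(id, name) %>%
  filter(n > 1)

# Or, in base R:
sum(duplicated(found$id))
```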
I'm basically convinced that unfiltered file listing is always going to be vulnerable to the results not being exhaustive. I think this is just a property of the server, i.e. Google's back end. And yeah, it seems to have something to do with whether you own the file or it's shared with you, and how recently anyone has created/modified/viewed the file. So if it's very, very important to get an exact set of files, you really need a workflow based on file IDs or on a precise `q` query.

@VectorPosse if you want to find the files in a specific folder, I advise that you use `drive_ls()`.
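A folder-scoped listing along those lines might look like this; the folder ID and folder name are placeholders:

```r
library(googledrive)

# List the contents of one known folder rather than searching all of Drive
folder_files <- drive_ls(path = as_id("FOLDER_ID_GOES_HERE"))

# Or by path, if the folder name is unambiguous
folder_files <- drive_ls(path = "my-project-folder/")
```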
In the most recent version of googledrive (v1.0.1), we should be eliminating those duplicates before returning results, because of commit e56b3f5. So maybe you need to update googledrive? Otherwise, I would be very interested to see a case of duplicated file IDs in your result. Remember, it is possible to have multiple distinct files with the same name.
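A quick way to confirm the installed version includes that fix, taking 1.0.1 as the threshold mentioned above:

```r
# Update googledrive if it predates the de-duplication fix
if (packageVersion("googledrive") < "1.0.1") {
  install.packages("googledrive")
}
```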
Thanks, @jennybc. I ended up using `drive_ls()`.
I feel like we've ruminated on this as much as we can and there's really nothing to be done about it. The main advice is to make one's queries as specific as possible, in ways that will route through the `q` search clause and be handled server-side.

I'm also not hearing any complaints about this lately, so maybe the vexing behaviour has diminished or been eliminated on the Google side.
FYI, this issue still persists when using `drive_ls()` with `recursive = TRUE` on a shared drive with multiple layers of subfolders and/or files that are not owned by me. I could not get it to give consistent results even by using a `q` clause.
I have also experienced this with a large number of subfolders and the recursive option. My solution has been to run `drive_ls()` multiple times and take the superset of the results. This is very slow, however -- just running `drive_ls()` once in my case is extremely slow. I have not been able to speed things up by using `q` to specify queries, so I suspect it's the scope and recursiveness that make this a problem. The only thing that works is to eliminate the recursiveness. This is much, much faster, but would require adding a lot of complexity to my code and eliminates the ability to do something important in my particular use case (checking for true duplicates or duplicately named files that could be in other folders). It's frustrating that something I'd expect to be really fast in a normal file system is so slow with Google Drive!
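A sketch of that run-several-times-and-combine workaround; the folder ID and the number of repeats are placeholders, and results are merged by file ID:

```r
library(googledrive)
library(dplyr)

# Repeat the recursive listing a few times and keep every file ID seen
# in any run (slow, but less likely to miss files than a single run)
runs <- lapply(1:3, function(i) {
  drive_ls(path = as_id("FOLDER_ID_GOES_HERE"), recursive = TRUE)
})

superset <- bind_rows(runs) %>%
  distinct(id, .keep_all = TRUE)
```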
This is the more difficult inverse of the problem addressed by e56b3f5. In that commit, I check for and eliminate any results (file IDs) that appear in more than one page. But I fear the opposite can also happen: there can be files that appear in no page.
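Not the package's actual internals, just a sketch of the idea: once pages are combined, a file ID that appears on two pages collapses to a single row, but a file that appears on no page cannot be recovered after the fact.

```r
# `pages` stands in for a list of parsed files.list responses, each holding
# a `files` data frame with an `id` column
combine_pages <- function(pages) {
  all_files <- do.call(rbind, lapply(pages, `[[`, "files"))
  # keep only the first occurrence of each file ID across pages
  all_files[!duplicated(all_files$id), ]
}
```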
This is basically a re-opening of #277.
I have seen this myself now. It's hard to make a clean reprex, as the results are stochastic.
Summary: I can do `drive_find()` repeatedly and get a different number of files. Most recently, with my personal Google identity, I saw 972, 988, 988, and 988 files. (Note to self: in 3 out of 4 requests, I saw higher numbers reported in "Items so far", so the filter for duplicates is actually eliminating replicated IDs.) The "missing" 988 - 972 = 16 files are a subset of the 22 files in one specific folder.
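A rough way to reproduce the observation; the counts above came from one account and will differ elsewhere:

```r
library(googledrive)

# Run the same unfiltered listing several times and compare the row counts
counts <- vapply(1:4, function(i) nrow(drive_find()), integer(1))
counts
```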
I will attempt to create a clean example of this using curl alone (no R) and, if successful, report/ask upstream.