Using pydrive with user credentials for authenticated download #3

Open: wants to merge 2 commits into base: master

jeremyfix

Unfortunately, your code performs an anonymous download. I tried on several consecutive days and always got an exceeded-quota error, so I was unable to download the dataset.

This pull request, which uses code adapted from the FFHQ-Aging repo, downloads the dataset with user credentials instead.

The only requirement is to follow the pydrive quickstart to get the client_secrets.json file and place it in the same directory as download_ffhq.py. You can then request pydrive Google authentication by appending the --pydrive command line option.

So, for example, to download the 1024x1024 images, you simply run:

python3 download_ffhq.py -i --pydrive
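
For readers unfamiliar with pydrive, here is a minimal sketch of what an authenticated download of a single file can look like. It is not the code from this pull request: the function name `download_with_pydrive` and its parameters are hypothetical, and it assumes client_secrets.json is in the current working directory as the pydrive quickstart describes.

```python
def download_with_pydrive(file_id, dest_path, use_cmd_auth=False):
    """Download one Google Drive file using the user's own credentials.

    Hypothetical helper for illustration; assumes client_secrets.json
    is present in the current working directory.
    """
    # Imported lazily so this module can be loaded without pydrive installed.
    from pydrive.auth import GoogleAuth
    from pydrive.drive import GoogleDrive

    gauth = GoogleAuth()
    if use_cmd_auth:
        # Prints a URL and asks for a verification code in the terminal.
        gauth.CommandLineAuth()
    else:
        # Opens a browser window and listens locally for the OAuth redirect.
        gauth.LocalWebserverAuth()

    drive = GoogleDrive(gauth)
    drive_file = drive.CreateFile({'id': file_id})
    drive_file.GetContentFile(dest_path)  # streams the file content to disk
    return dest_path
```

Because the request is made on behalf of a signed-in user rather than anonymously, it is counted differently from the anonymous download that hit the quota error.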

The code makes several attempts to download each file. Without that retry logic, which is inspired by yours, I got some httplib2.error.ServerNotFoundError: Unable to find the server at www.googleapis.com exceptions. Apparently, when the download is retried a second time, the exception is not raised.
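
The retry idea can be sketched roughly as follows. This is an illustration under assumptions, not the code in this pull request: `download_with_retries`, `download_once`, and the parameter names are all hypothetical, and in practice `retry_on` would likely be narrowed to the transient error in question, for instance the httplib2 exception quoted above.

```python
import time


def download_with_retries(download_once, attempts=3, delay_seconds=1.0,
                          retry_on=(Exception,)):
    """Call download_once() until it succeeds or the attempts run out.

    download_once is any zero-argument callable that performs one
    download attempt and raises on failure (hypothetical interface).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return download_once()
        except retry_on as err:
            last_error = err
            # Wait before the next try, but not after the final failure.
            if attempt < attempts:
                time.sleep(delay_seconds)
    # Every attempt failed: surface the most recent error to the caller.
    raise last_error
```

Re-raising the last error keeps the original traceback information available when the failure is not transient after all.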

I only tested the download of the images (the command line above), but since the other downloads also go through the download_files function, I hope it works for them as well.

jeremyfix (Author) commented Apr 22, 2021

Note that, for some reason, after some time (on the order of hours), it may try to reauthenticate and fail. Relaunching the script makes it continue downloading.

I successfully downloaded the 90 GB of the 1024x1024 images this way.


mmazeika commented Apr 12, 2022

This was very helpful for me. I was able to download the 89GB of 1024x1024 images with a restart after a few hours. As an additional step, I had to replace

# Google Drive virus checker nag.
links = [html.unescape(link) for link in data_str.split('"') if 'export=download' in link]
if len(links) == 1:
    if attempts_left:
        file_url = requests.compat.urljoin(file_url, links[0])
        continue

with

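# Google Drive virus checker nag.
# Raw string so that \? is a regex escape rather than an invalid string escape.
file_id = re.findall(r'uc\?id=(.*)&amp', data_str)
if len(file_id) == 1:
    file_id = file_id[0]
    if attempts_left:
        # API_KEY is a placeholder for your own Drive API key.
        file_url = 'https://www.googleapis.com/drive/v3/files/{}/?key=API_KEY&alt=media'.format(file_id)
        continue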

This is because the virus checker page changed, so the code that handled it no longer works. To make this work, I had to follow the instructions in the pydrive quickstart link given above (i.e., use this PR and get a client_secrets.json from the Drive API). The new virus checker workaround uses an API key that you can create in a GCP API project, similar to how you get the client_secrets.json file. You can also use the OAuth key.
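
Pulled out of the download loop, the workaround amounts to something like the sketch below. The function name `api_url_from_nag_page` and the `api_key` parameter are hypothetical; the regular expression and URL template are the ones from the replacement snippet above, and the shape of the virus checker page is assumed to still match that expression.

```python
import re

# URL template from the workaround above; the key is supplied by the caller.
DRIVE_V3_TEMPLATE = 'https://www.googleapis.com/drive/v3/files/{}/?key={}&alt=media'


def api_url_from_nag_page(data_str, api_key):
    """Build a Drive API v3 download URL from the virus checker page.

    Returns None when the page does not contain exactly one match,
    mirroring the len(...) == 1 check in the original snippet.
    """
    file_ids = re.findall(r'uc\?id=(.*)&amp', data_str)
    if len(file_ids) != 1:
        return None
    return DRIVE_V3_TEMPLATE.format(file_ids[0], api_key)
```

Note that `(.*)` is greedy, so if a line of the page contained several `&amp` sequences after the ID, the capture would run up to the last one; a stricter pattern may be needed if the page changes again.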

I had to run the download script with the --cmd_auth flag and choose "Desktop" rather than "Web application" as the application setting in the Drive API to make it work. Here is a screenshot of my Drive API page.
[Screenshot from 2022-04-12 18-39-19]
