Bad error message when loading private dataset #3855

Closed
patrickvonplaten opened this issue Mar 8, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@patrickvonplaten
Contributor

Describe the bug

A common interaction between the Hub and datasets goes as follows:
an organization adds a dataset in private mode and then wants to load it.

from datasets import load_dataset

ds = load_dataset("NewT5/dummy_data", "dummy")

This command then fails with:

FileNotFoundError: Couldn't find a dataset script at /home/patrick/NewT5/dummy_data/dummy_data.py or any data file in the same directory. Couldn't find 'NewT5/dummy_data' on the Hugging Face Hub either: FileNotFoundError: Dataset 'NewT5/dummy_data' doesn't exist on the Hub

even though the user has access to the NewT5/dummy_data repository on the Hub, since they are a member of the org.

IMO, we should improve this error message, similar to what @sgugger, @LysandreJik and @julien-c did for transformers.

Steps to reproduce the bug

E.g. run the following code to compare the error messages of transformers and datasets.

  1. Transformers
from transformers import BertModel

BertModel.from_pretrained("NewT5/dummy_model")

The error message is clearer here. For a private repo it gives (this output was captured for patrickvonplaten/gpt2-xl):

OSError: patrickvonplaten/gpt2-xl is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.

Maybe we could do the same for datasets? The change was introduced to transformers in:
huggingface/transformers#15261
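Following the transformers approach, the datasets message could keep its current wording and append the same private-repository hint. A minimal sketch, assuming a hypothetical helper `format_dataset_not_found` (not part of the datasets API); the hint text is copied from the transformers message above:

```python
def format_dataset_not_found(path: str, local_script: str) -> str:
    """Return a hypothetical, transformers-style message for a dataset that can't be found.

    Illustrative wording only; this is not the message datasets actually raises.
    """
    return (
        f"Couldn't find a dataset script at {local_script} or any data file in the same directory. "
        f"Couldn't find '{path}' on the Hugging Face Hub either. "
        # Hint adapted from the transformers error message
        "If this is a private repository, make sure to pass a token having permission to this repo "
        "with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`."
    )


msg = format_dataset_not_found(
    "NewT5/dummy_data", "/home/patrick/NewT5/dummy_data/dummy_data.py"
)
print(msg)
```

The hint is appended unconditionally, so the message reads the same whether the repo is private or simply missing.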

Expected results

A clearer error message, e.g. one that mentions the repository may be private and explains how to authenticate.

Actual results

The FileNotFoundError quoted above, which doesn't mention that the dataset might be private.

Environment info

  • datasets version: 1.18.4.dev0
  • Platform: Linux-5.15.15-76051515-generic-x86_64-with-glibc2.34
  • Python version: 3.9.7
  • PyArrow version: 6.0.1
@patrickvonplaten patrickvonplaten added the bug Something isn't working label Mar 8, 2022
@patrickvonplaten patrickvonplaten changed the title Bad error message Bad error message when loading private dataset Mar 8, 2022
@lhoestq
Member

lhoestq commented Mar 8, 2022

We raise the error “FileNotFoundError: can’t find the dataset” mainly to follow security best practices (otherwise users could guess which private repositories other users/orgs have).

We can indeed reformulate this and add the "If this is a private repository,..." part!
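The trade-off can be sketched with a toy, in-memory lookup. `ToyHub` and `message_for` are hypothetical illustrations, not Hub or datasets APIs. A missing repo and a private repo the caller can't read raise the same `FileNotFoundError`, so the message can carry the private-repository hint without confirming that the repo exists:

```python
from typing import Dict, Optional, Set

# Hint text adapted from the transformers error message
HINT = (
    "If this is a private repository, make sure to pass a token having permission "
    "to this repo with `use_auth_token` or log in with `huggingface-cli login` "
    "and pass `use_auth_token=True`."
)


class ToyHub:
    """Hypothetical in-memory stand-in for the Hub, for illustration only."""

    def __init__(self) -> None:
        # repo_id -> set of users allowed to read it; None marks a public repo
        self._repos: Dict[str, Optional[Set[str]]] = {}

    def add(self, repo_id: str, members: Optional[Set[str]] = None) -> None:
        self._repos[repo_id] = members

    def resolve(self, repo_id: str, user: Optional[str] = None) -> str:
        if repo_id in self._repos:
            members = self._repos[repo_id]
            if members is None or (user is not None and user in members):
                return repo_id
        # Same exception type and wording whether the repo is missing or just
        # not readable by this user, so outsiders can't probe for private repos.
        raise FileNotFoundError(f"Dataset '{repo_id}' doesn't exist on the Hub. {HINT}")


hub = ToyHub()
hub.add("NewT5/dummy_data", members={"patrick"})  # private to one org member


def message_for(repo_id: str, user: Optional[str] = None) -> str:
    """Return the resolved repo id, or the error text if resolution fails."""
    try:
        return hub.resolve(repo_id, user)
    except FileNotFoundError as err:
        return str(err)


print(message_for("NewT5/dummy_data"))                  # private, no access
print(message_for("NewT5/does_not_exist"))              # missing
print(message_for("NewT5/dummy_data", user="patrick"))  # member, resolves
```

Because the first two calls differ only in the repo id they echo back, adding the hint does not weaken the non-disclosure property.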

@mariosasko
Collaborator

Resolved via #4536

4 participants