GET /datasets?myData=false for HTTPS items appears invalid #304

Open
bodom0015 opened this issue May 24, 2019 · 1 comment

Comments

@bodom0015
Member

bodom0015 commented May 24, 2019

This may be a nothing issue, as I don't expect HTTPS files/datasets to be a super common case. I noticed when working on the Data Catalog refactoring that the GET from /datasets?myData=true appears to return slightly more information than /datasets?myData=false for these items.

Perhaps I am just misunderstanding the behavior here as designed, but this definitely seems like a bug.

Steps to Reproduce

  1. Register the following file via URL: https://raw.githubusercontent.com/whole-tale/dashboard/master/.travis.yml
  2. Navigate to Swagger UI and execute GET /dataset?myData=false
    • You will get back a result similar to the following:
  {
    "_id": "5ce7154cdf11f7f5db93e7f4",
    "_modelType": "folder",
    "created": "2019-05-23T21:49:00.645000+00:00",
    "creatorId": "5cdde4ca84f03ea7329bab0d",
    "description": "",
    "identifier": "https://raw.githubusercontent.com",
    "name": "raw.githubusercontent.com",
    "provider": "HTTPS",
    "size": 0,
    "updated": "2019-05-23T21:49:00.648000+00:00"
  }
  3. Toggle myData and execute GET /dataset?myData=true
    • You will get back a result similar to the following:
  {
    "_id": "5ce7154cdf11f7f5db93e7f8",
    "_modelType": "item",
    "created": "2019-05-23T21:49:00.672000+00:00",
    "creatorId": "5cdde4ca84f03ea7329bab0d",
    "description": "",
    "identifier": "https://raw.githubusercontent.com/whole-tale/dashboard/master/.travis.yml",
    "name": ".travis.yml",
    "provider": "HTTPS",
    "size": 423,
    "updated": "2019-05-23T21:49:00.689000+00:00"
  }
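
To make the comparison concrete, here is a minimal sketch that diffs the two payloads pasted above. The helper name `diff_fields` and the variable names are illustrative only and are not part of the Whole Tale API.

```python
# Payloads copied verbatim from the two responses above.
my_data_false = {
    "_id": "5ce7154cdf11f7f5db93e7f4",
    "_modelType": "folder",
    "created": "2019-05-23T21:49:00.645000+00:00",
    "creatorId": "5cdde4ca84f03ea7329bab0d",
    "description": "",
    "identifier": "https://raw.githubusercontent.com",
    "name": "raw.githubusercontent.com",
    "provider": "HTTPS",
    "size": 0,
    "updated": "2019-05-23T21:49:00.648000+00:00",
}
my_data_true = {
    "_id": "5ce7154cdf11f7f5db93e7f8",
    "_modelType": "item",
    "created": "2019-05-23T21:49:00.672000+00:00",
    "creatorId": "5cdde4ca84f03ea7329bab0d",
    "description": "",
    "identifier": "https://raw.githubusercontent.com/whole-tale/dashboard/master/.travis.yml",
    "name": ".travis.yml",
    "provider": "HTTPS",
    "size": 423,
    "updated": "2019-05-23T21:49:00.689000+00:00",
}


def diff_fields(a, b):
    """Return {field: (a_value, b_value)} for every field whose value differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Hypothetical helper usage: check the key sets, then list differing fields.
same_keys = set(my_data_false) == set(my_data_true)
differences = diff_fields(my_data_false, my_data_true)

print("same set of keys:", same_keys)
for field, (false_value, true_value) in differences.items():
    print(f"{field}: {false_value!r} != {true_value!r}")
```

Both payloads carry the same set of keys; the values of `_modelType`, `identifier`, `name`, and `size` (along with the ids and timestamps) are what differ.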

Actual Results
Many of the fields of the dataset are either invalid or not populated at all if myData=false, but all fields look normal when myData=true. In fact, the endpoint appears to be returning entirely different models. It is therefore impossible to properly display any actual information about this dataset, since we don't know its name, size, or actual location.

Expected Results
If possible, field output should match for these two cases, since the datasets themselves are being filtered.

@Xarthisius
Collaborator

Xarthisius commented May 28, 2020

This is expected behavior. Some of the rationale behind how http(s) files are handled in the catalog can be found in #266. The difference between myData=True and myData=False stems from the fact that in the former case we are able to track which particular files were registered by the user and craft the response accordingly. In the latter case, /dataset?myData=False returns the content of the catalog's root, which, in the case of http resources, results in folders named after domains (per #266).
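
As a rough illustration only (this is not the catalog's actual code; see #266 for that), the relationship between a registered URL and the domain-named root folder seen in the myData=false response can be sketched as follows. The function name `domain_folder` is hypothetical.

```python
from urllib.parse import urlsplit


def domain_folder(url):
    """Derive the name and identifier of a domain-level folder from a file URL.

    Hypothetical helper: it only mirrors the shape of the values seen in the
    myData=false response above, not how the catalog builds them.
    """
    parts = urlsplit(url)
    return {
        "name": parts.netloc,
        "identifier": f"{parts.scheme}://{parts.netloc}",
    }


registered = "https://raw.githubusercontent.com/whole-tale/dashboard/master/.travis.yml"
folder = domain_folder(registered)
print(folder["name"], folder["identifier"])
```

For the URL registered in the report, this yields the same `name` and `identifier` values as the folder entry returned with myData=false.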

Many of the fields of the dataset are either invalid or not populated at all if myData=false, but all fields look normal when myData=true. In fact, the endpoint appears to be returning entirely different models. It is therefore impossible to properly display any actual information about this dataset, since we don't know its name, size, or actual location.

Not sure what you mean by "entirely different" models. Both responses above contain exactly the same fields. We know the name, since it's object.name, and we know the size, since it's object.size. (Or rather, we don't in your paste above, but that's OK: sometimes it's not possible to know the size of an entire collection, because that info may be unavailable or costly to compute. Globus datasets are the latter case, and AFAIR their size is set to -1 after registration. As long as it is an integer, it shouldn't be an issue for the UI.) I don't know what you mean by "location", but I guess you're looking for object.identifier?
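
On the UI side, a minimal sketch of reading both kinds of entries uniformly might look like the following. It assumes that a negative size (such as the -1 mentioned for Globus datasets) means "unknown"; the function name `describe_entry` and the output keys are hypothetical, and how to present a folder reporting size 0 is left to the caller.

```python
def describe_entry(entry):
    """Map a /dataset result (folder or item) to the fields a UI would display.

    Assumption: a negative size is a sentinel for "size not known".
    """
    size = entry.get("size")
    size_known = isinstance(size, int) and size >= 0
    return {
        "name": entry["name"],
        "location": entry["identifier"],
        "kind": entry["_modelType"],
        "size": size if size_known else None,  # None means "unknown"
    }


# Entries shaped like the two responses above, plus a made-up one with size -1.
folder_entry = {
    "_modelType": "folder",
    "name": "raw.githubusercontent.com",
    "identifier": "https://raw.githubusercontent.com",
    "size": 0,
}
item_entry = {
    "_modelType": "item",
    "name": ".travis.yml",
    "identifier": "https://raw.githubusercontent.com/whole-tale/dashboard/master/.travis.yml",
    "size": 423,
}
unknown_size_entry = {
    "_modelType": "folder",
    "name": "example-collection",   # made-up name for illustration
    "identifier": "example:collection",  # made-up identifier for illustration
    "size": -1,
}

for entry in (folder_entry, item_entry, unknown_size_entry):
    print(describe_entry(entry))
```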
