Report figshare data version in notebook output #44
@dhimmel and/or @cgreene, do you have any thoughts on the best way to handle versioning within the data loader? We currently use …
The data on figshare is versioned. Therefore, it'd be ideal to specify a version and then download everything we need for that version. This is what machine-learning currently does. The … What data is needed from GitHub? We should just upload that to figshare so it can use the same versioning system.
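A minimal sketch of pinning a figshare version, assuming the public figshare API v2 endpoint `/articles/{id}/versions/{version}` and its `files` list (each entry carrying `name` and `download_url`). The article ID is a hypothetical placeholder, and this is not taken from machine-learning's actual loader.

```python
import json
from urllib.request import urlopen

# Hypothetical placeholder: the real figshare article ID is not stated in this thread.
FIGSHARE_ARTICLE_ID = 1234567
API_ROOT = "https://api.figshare.com/v2"


def version_metadata_url(article_id: int, version: int) -> str:
    """figshare API v2 endpoint describing one specific version of an article."""
    return f"{API_ROOT}/articles/{article_id}/versions/{version}"


def fetch_version_metadata(article_id: int, version: int) -> dict:
    """Download the metadata JSON for a pinned version (makes a network request)."""
    with urlopen(version_metadata_url(article_id, version)) as response:
        return json.load(response)


def files_for_version(metadata: dict) -> dict:
    """Map each file name to its download URL, as listed in the version metadata."""
    return {f["name"]: f["download_url"] for f in metadata.get("files", [])}
```

Usage would look like `files_for_version(fetch_version_metadata(FIGSHARE_ARTICLE_ID, 5))`: every URL returned then belongs to version 5, even if newer versions are uploaded later.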
@dhimmel / @gwaygenomics: is this complete? The ml-workers appear to download whatever the latest figshare version is. Does that get reported to users?
I don't think it does. I am not sure whether core-service even stores which figshare version is loaded. The source code for downloading the data is in core-service/api/management/commands/acquiredata.py (lines 21 to 39 at commit b9b2e4f).
So it's using the latest from GitHub for all files besides … BTW, the figshare dataset has been downloaded 41,471 times. Either people are using it a lot or, more likely, we're requesting it an insane number of times 😸
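One way to address both the unrecorded version and the repeated downloads is to write a small manifest next to the data and only contact figshare when the requested version differs from what is on disk. This is a sketch under assumptions: the manifest name `data-version.json` and its fields are hypothetical, not something core-service currently does.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical file name: core-service's actual storage layout is not shown in this thread.
MANIFEST_NAME = "data-version.json"


def read_loaded_version(data_dir: Path) -> Optional[int]:
    """Return the figshare version recorded in data_dir, or None if none was recorded."""
    manifest = data_dir / MANIFEST_NAME
    if not manifest.exists():
        return None
    return json.loads(manifest.read_text())["figshare_version"]


def record_loaded_version(data_dir: Path, version: int, files: dict) -> None:
    """Store the version and the exact download URLs alongside the data."""
    data_dir.mkdir(parents=True, exist_ok=True)
    payload = {"figshare_version": version, "files": files}
    (data_dir / MANIFEST_NAME).write_text(json.dumps(payload, indent=2, sort_keys=True))


def needs_download(data_dir: Path, wanted_version: int) -> bool:
    """Only hit figshare when the requested version is not already on disk."""
    return read_loaded_version(data_dir) != wanted_version
```

Because the manifest also keeps the download URLs, the same record could later be surfaced to users instead of being reconstructed.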
If we could reconstruct those URLs and put them into the notebook template, that's probably the best way. We'd like users to be able to reproduce the analysis, and I think this key ingredient (the exact data used) is missing.
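A sketch of how the reconstructed URLs might be rendered for the notebook template. It assumes GitHub files are fetched through `raw.githubusercontent.com` URLs pinned to a commit SHA (any owner, repo, path, and SHA passed in are hypothetical here), and it produces plain Markdown text rather than assuming a particular template engine.

```python
def github_raw_url(owner: str, repo: str, sha: str, path: str) -> str:
    """raw.githubusercontent.com URL pinned to an exact commit rather than a branch."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{sha}/{path}"


def provenance_markdown(figshare_article_id: int, figshare_version: int,
                        github_urls: list) -> str:
    """Markdown text for a notebook cell listing the exact data inputs."""
    lines = [
        "## Data provenance",
        "",
        f"- figshare article {figshare_article_id}, version {figshare_version}: "
        f"https://api.figshare.com/v2/articles/{figshare_article_id}/versions/{figshare_version}",
    ]
    # One bullet per commit-pinned GitHub file.
    lines += [f"- {url}" for url in github_urls]
    return "\n".join(lines)
```

If the template is assembled with nbformat, the returned string could be placed in a Markdown cell via `nbformat.v4.new_markdown_cell`; pinning to a SHA instead of `master` is what makes the listed GitHub inputs reproducible.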
Track which version of the data (figshare version or cancer-data SHA) was used for a classifier