unable to get indi/fcon1000 data #33
I think it is due to NITRC starting to require login to get access to that file :-/
(git-annex)lena:/tmp/fcon1000/AnnArbor_a[master]sub04111
$> git annex whereis anat/mprage_anonymized.nii.gz
whereis anat/mprage_anonymized.nii.gz (2 copies)
ccec1cce-6820-4a71-8041-7abd4d6603ac -- yoh@smaug:/mnt/datasets/datalad/crawl/indi/fcon1000/AnnArbor_a
f998b99f-33bf-4631-bd02-f72fe3489d9e -- [datalad-archives]
datalad-archives: dl+archive:MD5E-s1602406400--2e30c496cecbc613c0f8e25bf8e16723.tar#path=sub04111/anat/mprage_anonymized.nii.gz&size=3914814
ok
(dev3) 1 39266.....................................:Tue 23 Jun 2020 12:01:46 PM EDT:.
(git-annex)lena:/tmp/fcon1000/AnnArbor_a[master]sub04111
$> git annex whereis --key MD5E-s1602406400--2e30c496cecbc613c0f8e25bf8e16723.tar#
whereis MD5E-s1602406400--2e30c496cecbc613c0f8e25bf8e16723.tar# (0 copies) failed
git-annex: whereis: 1 failed
(dev3) 1 39267 ->1.....................................:Tue 23 Jun 2020 12:02:00 PM EDT:.
(git-annex)lena:/tmp/fcon1000/AnnArbor_a[master]sub04111
$> git annex whereis --key MD5E-s1602406400--2e30c496cecbc613c0f8e25bf8e16723.tar
whereis MD5E-s1602406400--2e30c496cecbc613c0f8e25bf8e16723.tar (2 copies)
00000000-0000-0000-0000-000000000001 -- web
ccec1cce-6820-4a71-8041-7abd4d6603ac -- yoh@smaug:/mnt/datasets/datalad/crawl/indi/fcon1000/AnnArbor_a
web: http://www.nitrc.org/frs/downloadlink.php/1992
ok
(dev3) 1 39268.....................................:Tue 23 Jun 2020 12:02:02 PM EDT:.
(git-annex)lena:/tmp/fcon1000/AnnArbor_a[master]sub04111
$> git annex get --key MD5E-s1602406400--2e30c496cecbc613c0f8e25bf8e16723.tar
get MD5E-s1602406400--2e30c496cecbc613c0f8e25bf8e16723.tar (from web...)
verification of content failed
Unable to access these remotes: web
Try making some of these repositories available:
00000000-0000-0000-0000-000000000001 -- web
ccec1cce-6820-4a71-8041-7abd4d6603ac -- yoh@smaug:/mnt/datasets/datalad/crawl/indi/fcon1000/AnnArbor_a
failed
git-annex: get: 1 failed
(dev3) 1 39269 ->1.....................................:Tue 23 Jun 2020 12:02:25 PM EDT:.
(git-annex)lena:/tmp/fcon1000/AnnArbor_a[master]sub04111
$> git annex get --key MD5E-s1602406400--2e30c496cecbc613c0f8e25bf8e16723.tar --debug
[2020-06-23 12:02:34.293205307] read: git ["--git-dir=../.git","--work-tree=..","--literal-pathspecs","symbolic-ref","-q","HEAD"]
[2020-06-23 12:02:34.305624863] process done ExitSuccess
[2020-06-23 12:02:34.305860717] read: git ["--git-dir=../.git","--work-tree=..","--literal-pathspecs","show-ref","refs/heads/master"]
[2020-06-23 12:02:34.319716634] process done ExitSuccess
get MD5E-s1602406400--2e30c496cecbc613c0f8e25bf8e16723.tar [2020-06-23 12:02:34.321403394] read: git ["--git-dir=../.git","--work-tree=..","--literal-pathspecs","show-ref","git-annex"]
[2020-06-23 12:02:34.331076977] process done ExitSuccess
[2020-06-23 12:02:34.33168571] read: git ["--git-dir=../.git","--work-tree=..","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2020-06-23 12:02:34.346998595] process done ExitSuccess
[2020-06-23 12:02:34.347577569] read: git ["--git-dir=../.git","--work-tree=..","--literal-pathspecs","log","refs/heads/git-annex..88214397de46cdb2d9e0aae77ed89f995d80332f","--pretty=%H","-n1"]
[2020-06-23 12:02:34.357381935] process done ExitSuccess
[2020-06-23 12:02:34.37867209] chat: git ["--git-dir=../.git","--work-tree=..","--literal-pathspecs","cat-file","--batch"]
[2020-06-23 12:02:34.381210874] chat: git ["--git-dir=../.git","--work-tree=..","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
(from web...)
[2020-06-23 12:02:34.45923921] Request {
host = "www.nitrc.org"
port = 80
secure = False
requestHeaders = [("Accept-Encoding","identity"),("User-Agent","git-annex/8.20200501+git61-g64e081d58-1~ndall+1")]
path = "/frs/downloadlink.php/1992"
queryString = ""
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
}
verification of content failed
Unable to access these remotes: web
Try making some of these repositories available:
00000000-0000-0000-0000-000000000001 -- web
ccec1cce-6820-4a71-8041-7abd4d6603ac -- yoh@smaug:/mnt/datasets/datalad/crawl/indi/fcon1000/AnnArbor_a
failed
[2020-06-23 12:02:35.993770571] process done ExitSuccess
[2020-06-23 12:02:35.994441736] process done ExitSuccess
git-annex: get: 1 failed
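The "verification of content failed" message in the session above is consistent with how the key itself encodes the expected content: an `MD5E` key carries the byte size and MD5 digest, so anything else served at the URL (for example an HTML login page) cannot pass the check. A minimal sketch of that check follows; the helper names `parse_key` and `content_matches_key` are hypothetical, not git-annex or DataLad API, and the parser deliberately ignores optional key fields such as mtime or chunk size.

```python
import hashlib
import re
from typing import NamedTuple


class AnnexKey(NamedTuple):
    backend: str
    size: int
    digest: str
    ext: str


# Simplified layout: BACKEND-s<size>--<hexdigest><.ext>; other optional fields are not handled.
KEY_RE = re.compile(
    r"^(?P<backend>[A-Z0-9]+)-s(?P<size>\d+)--(?P<digest>[0-9a-f]+)(?P<ext>(?:\.[A-Za-z0-9]+)*)$"
)


def parse_key(key: str) -> AnnexKey:
    """Split a key such as 'MD5E-s5--<md5>.tar' into its parts."""
    m = KEY_RE.match(key)
    if m is None:
        raise ValueError(f"unrecognized key layout: {key!r}")
    return AnnexKey(m["backend"], int(m["size"]), m["digest"], m["ext"])


def content_matches_key(data: bytes, key: str) -> bool:
    """Return True only if both the size and the MD5 digest of data match the key."""
    k = parse_key(key)
    if k.backend not in ("MD5", "MD5E"):
        raise ValueError(f"this sketch only handles MD5 keys, got {k.backend}")
    return len(data) == k.size and hashlib.md5(data).hexdigest() == k.digest
```

For the key in this thread, `parse_key` reports an expected size of 1602406400 bytes, which a small login page would never match.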
So going to http://www.nitrc.org/frs/downloadlink.php/1992 requires login, and I guess that is what has changed; maybe @chaselgrove could confirm that? As a workaround solution, we could
As a somewhat more permanent/reliable solution, I guess we would need to adjust our downloaders to support the ad-hoc "you need to login" web page and make the datalad downloader "smarter": allow a first non-authenticated attempt, then parse the output (if not too large) to discover whether we got something other than what we expected (e.g. a login page), and then authenticate... I quickly tested that with this patch `datalad download-url` works, but I didn't wait long enough (home inet isn't fast enough to fetch GB atm)
$> git diff
diff --git a/.travis.yml b/.travis.yml
index 48d9c5a64..9517a6657 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -51,7 +51,7 @@ before_install:
# Install git-annex
- OLD_PATH="$PATH"
- eval source tools/ci/install-annex.sh ${_DL_ANNEX_INSTALL_SCENARIO}
- # if PATH was changed, we need to make it available in the login sessions
+ # if PATH was changed, we need to
- if [ "$PATH" != "$OLD_PATH" ]; then echo export PATH=$PATH >> ~/.bashrc; fi
# Optionally install the latest Git. Exit code 100 indicates that bundled is same as the latest.
- if [ ! -z "${_DL_UPSTREAM_GIT:-}" ]; then
diff --git a/datalad/downloaders/configs/nitrc.cfg b/datalad/downloaders/configs/nitrc.cfg
index a97e16bf8..ebd6e8601 100644
--- a/datalad/downloaders/configs/nitrc.cfg
+++ b/datalad/downloaders/configs/nitrc.cfg
@@ -10,6 +10,7 @@
# to accomplish the mission here
url_re = https?://fcon_1000\.projects\.nitrc\.org/indi/adhd200/index\.html
https?://www\.nitrc\.org/frs/downloadlink\.php/(7058|3075|3479|9108)
+ https?://www\.nitrc\.org/frs/downloadlink\.php/([0-9][0-9]*)
credential = nitrc
authentication_type = html_form
html_form_url = https://www.nitrc.org/account/#.php |
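To illustrate what the added `url_re` line in the diff changes, here is a small sketch that tests URLs against the patterns before and after the patch. It assumes the patterns are applied with Python's `re.match`; the function name `needs_nitrc_credential` is hypothetical and not part of DataLad.

```python
import re

# Patterns copied from the nitrc.cfg hunk quoted in this thread.
OLD_PATTERNS = [
    r"https?://fcon_1000\.projects\.nitrc\.org/indi/adhd200/index\.html",
    r"https?://www\.nitrc\.org/frs/downloadlink\.php/(7058|3075|3479|9108)",
]
# The patch adds a pattern that accepts any numeric download id.
NEW_PATTERNS = OLD_PATTERNS + [
    r"https?://www\.nitrc\.org/frs/downloadlink\.php/([0-9][0-9]*)",
]


def needs_nitrc_credential(url, patterns):
    """Return True if any pattern matches the start of the URL (assumed matching rule)."""
    return any(re.match(p, url) for p in patterns)


url = "http://www.nitrc.org/frs/downloadlink.php/1992"
print(needs_nitrc_credential(url, OLD_PATTERNS))
print(needs_nitrc_credential(url, NEW_PATTERNS))
```

Under that assumption, download id 1992 from this report is only covered once the generic numeric pattern is present.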
Yes, it appears that login is now required for that file (and others; compare https://www.nitrc.org/frs/?group_id=296 logged in to logged out). |
@chaselgrove what about
|
If you look at the link I sent and compare it logged in and logged out, you get what can best be described as "a list [of] which ones." :) Not "all downloads," but perhaps all that you're concerned with. Certainly everything in the fcon_1000 package (what appears to be all the site tarballs). |
My question was more generic -- by now I do not remember what other datasets from NITRC, beyond fcon_1000, we might have in datasets.datalad.org. So I wondered if there is some list of which datasets started to require authentication. But I guess it could be any project's admin who enables or disables requiring authentication for download, right? I.e., it might not have been you (NITRC) who decided to require it for data otherwise distributed under a license that does allow redistribution. Am I correct, @chaselgrove? |
My first response would be to say look at https://www.nitrc.org/ir/, but that doesn't match the fcon_1000 permissions problem we're seeing here. Didn't we set things up to get data from NITRC-IR? You are correct on the second point. It is in fact never NITRC that makes these decisions for data provided by others. |
I am also having trouble downloading. Here is the new error message - datalad was able to download about 1.8GB of files, but stalled on one of the submodules. I let it run overnight, and it just won't download anything. I added
|
I have now pushed those metadata files. Report back if you find some other files not downloadable... But note that in principle you don't need any of those for your analyses of any kind -- those are internal to (now somewhat deprecated) |
@yarikoptic, thanks for the update. The error message related to the issue is that the main data folder download seems to have stalled in the middle. Is there a flag I can turn on to print out the stalled URL? |
for a file you can run You can run |
@yarikoptic, I think the issue is to first identify which file(s) are hanging the download. After adding
After this point, the download simply hangs at 0%. From the DEBUG line immediately above the hang, it seems the folder it tries to download is
It is also strange that the progress bar showed it downloading a number of submodules ranging from 1GB to 8GB before getting to this 3.34GB repo that caused the hang, but when I check the downloaded folder size, it only reached 1.8GB. I don't know if the progress bar reported the size correctly. Anyhow, I was able to download the dataset from https://www.nitrc.org/ir/app/action/ProjectDownloadAction/project/fcon_1000 as guest, although its folder organization is less BIDS-like. |
re progress bar stall: I think we are experiencing an issue since the file(s) are to come from an archive:
❯ git annex whereis phenotypic.csv
whereis phenotypic.csv (2 copies)
978192a9-f540-4f5a-b6c5-ca57c0c9552f -- yoh@smaug:/mnt/datasets/datalad/crawl/indi/fcon1000/Cleveland CCF
c402c7ef-34d8-4f1f-a180-a63babc57733 -- [datalad-archives]
datalad-archives: dl+archive:MD5E-s3329424061--90df116f2252fbe5a545064f10cfc1f4.tar.gz#path=INDI_Lite_NIFTI/phenotypic.csv&size=489
ok
For me it doesn't hang for that single file, but relatively quickly complains multiple times with the same boring message:
❯ datalad get phenotypic.csv
get(error): phenotypic.csv (file) [Failed to fetch any archive containing MD5E-s489--2d2f2e702e4b40c2eb96a9beeafea6db.csv. Tried: ['MD5E-s3329424061--90df116f2252fbe5a545064f10cfc1f4.tar.gz']
Failed to fetch any archive containing MD5E-s489--2d2f2e702e4b40c2eb96a9beeafea6db.csv. Tried: ['MD5E-s3329424061--90df116f2252fbe5a545064f10cfc1f4.tar.gz']
Failed to fetch any archive containing MD5E-s489--2d2f2e702e4b40c2eb96a9beeafea6db.csv. Tried: ['MD5E-s3329424061--90df116f2252fbe5a545064f10cfc1f4.tar.gz']]
But if asked for more files, it indeed just keeps its problems to itself for quite a while, fetching some files once in a while as well. FWIW, to ease debugging etc., one can just invoke
❯ git annex get --key MD5E-s3329424061--90df116f2252fbe5a545064f10cfc1f4.tar.gz
get MD5E-s3329424061--90df116f2252fbe5a545064f10cfc1f4.tar.gz (from datalad...)
[INFO] Downloading 'http://www.nitrc.org/frs/downloadlink.php/3479' into '.git/annex/tmp/MD5E-s3329424061--90df116f2252fbe5a545064f10cfc1f4.tar.gz'
Verification of content failed
Unable to access these remotes: datalad
Maybe add some of these git remotes (git remote add ...):
978192a9-f540-4f5a-b6c5-ca57c0c9552f -- yoh@smaug:/mnt/datasets/datalad/crawl/indi/fcon1000/Cleveland CCF
failed
get: 1 failed
So it uses the datalad special remote to download it, but that one failed for me... knowing that we expose that also via
to see that the damn thing downloads just the login page :-/ -- since NITRC doesn't provide a proper interface for clients with corresponding 4xx codes, just a web UI, we are trying to figure out when it wants a login etc.; I guess that detection fails now. Its configuration is at Cookies must be enabled past this point. @chaselgrove could you guide me on how to download from NITRC nowadays in a scripted manner? |
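Since the server answers with a web page rather than a 4xx code, a client has to inspect what it actually received. Below is a minimal sketch of such a check on the first bytes of a response. It is not DataLad's implementation: the function name `looks_like_login_page` is hypothetical, and treating the "Cookies must be enabled past this point" text quoted above as a login-page signature is an assumption.

```python
import re
from typing import Optional

# Assumption: this text, quoted in the thread, appears on the NITRC login page.
LOGIN_MARKER = re.compile(rb"Cookies must be enabled past this point")
# An HTML document start is suspicious when an archive was requested.
HTML_START = re.compile(rb"^\s*(<!doctype html|<html)", re.IGNORECASE)


def looks_like_login_page(head: bytes, content_type: Optional[str] = None) -> bool:
    """Guess whether the first bytes of a response are a login page, not an archive."""
    if LOGIN_MARKER.search(head):
        return True
    if HTML_START.match(head):
        return True
    # Fall back to the declared content type, if the server sent one.
    return (content_type or "").lower().startswith("text/html")
```

A downloader could call this on the first few kilobytes of an unauthenticated attempt and only then go through the authentication step, instead of saving the page and failing later at content verification.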
What is the problem?
I am unable to get any data from the fcon1000 dataset.
What steps will reproduce the problem?
What version of DataLad are you using?
datalad wtf