db_create bug at the 22nd file #14
Comments
Hi Jan, Thanks for raising this issue. I shall investigate the problem. It is likely to do with the format of a record in Dom |
Hi Dom, Thanks for your quick answer. For the plant and fungi run it failed at file
But actually I just realised that another time the program stopped at the 24th position, so it is very likely that the problem has nothing to do with the 22nd position but rather with the format of some files, as you say.
If I remember correctly, the only differences between the two executions were the computer (office vs. home computer) and the value of the max_length argument. For the first one it was max_length = 10000 and for the second max_length = 6000. Maybe a record with bad format in Jan |
I just tried to re-run the DB creation after updating R (from version 3.4.3 to 3.5.1) to see if it would change anything. It stopped again at the same file as last time for the plants and fungi, but I don't get exactly the same error message. I don't know if this helps.
Jan |
Hi Jan, Thanks for your detailed reporting. It looks like the problem is a little more complicated than I first thought. The error is coming from the Possibilities:
To rule out the second option, you could try running Dom |
Hi Dom, I downloaded again the
Jan |
Sorry Jan, I meant to ask you to delete Thanks for your efforts, |
Hi Jan, On running your script, I was able to recreate your error on a Windows computer. It doesn't seem to be due to anything specific on your end. I will try and find out what is causing it in Windows. Dom |
Hi Dom, My bad, I read your response too quickly yesterday and I had no time to write back before now. This morning I have deleted the file If it works for you on Unix, I think I am going to try running it from an Ubuntu live session. I'll tell you if it works. Thanks a lot for your help, and let me know if you find a solution to run it on Windows! Jan |
Dear @DomBennett and @jeroen, thanks for this nice piece of code! Unfortunately, I ran into similar problems to those described above. More specifically, several seq.gz files appear to contain entries that cannot be parsed correctly. For example, I isolated the file "gbbct568.seq.gz" from the "Bacterial" GenBank database (v. 244). When creating the SQL database, I get the following error (console messages translated from German):
> library(restez)
> restez_path_set("/media/scratch/GenBank/faulty")
> db_create()
Adding 1 file(s) to the database ...
... 'gbbct568.seq.gz' (1/1)
Error in .local(conn, name, value, ...) :
Failed to insert data: SQLException:assert:M0M29!INSERT INTO: PRIMARY KEY constraint 'nucleotide.nucleotide_accession_pkey' violated
Error: callr subprocess failed: Failed to insert data: SQLException:assert:M0M29!INSERT INTO: PRIMARY KEY constraint 'nucleotide.nucleotide_accession_pkey' violated
Type .Last.error.trace to see where the error occurred
In addition, I receive the following error when creating the database for "gbpln193.seq.gz" from the "plant" database:
> library(restez)
> restez_path_set("/media/scratch/GenBank/faulty")
> db_create()
Adding 1 file(s) to the database ...
... 'gbpln193.seq.gz' (1/1)
Error in gsub(pattern = "([0-9]|\\s+|\n|/)", replacement = "", x = seqrecpart) :
'Calloc' could not allocate memory (18446744071562067968 of 1 bytes)
Error: callr subprocess failed: 'Calloc' could not allocate memory (18446744071562067968 of 1 bytes)
Type .Last.error.trace to see where the error occurred
Notably, the same error is raised when using the subsequent files (*94.seq.gz, *95.seq.gz, etc.) as well. I realized that all of the problematic files from the plant database consist of only a single entry. Could this explain the issue? I would very much appreciate your help with this. We need to do thousands of database searches and your approach would be an enormous time-saver! Thanks a lot, Martin |
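A note on the number in the Calloc error above: 18446744071562067968 is exactly 2^64 - 2^31. That is the value you would see if -2^31 (one below the smallest signed 32-bit integer R can represent) were reinterpreted as an unsigned 64-bit size. This hints at a 32-bit integer overflow somewhere in the allocation request, but it is only a reading of the digits, not a confirmed diagnosis. A minimal sketch to check the arithmetic in R (variable names are illustrative only):

```r
# Digits copied from the Calloc error message, kept as a string so that
# no precision is lost when parsing such a long literal.
reported <- "18446744071562067968"

# 2^64 - 2^31 is a multiple of 2^31, so it is exactly representable
# as a double-precision number.
candidate <- 2^64 - 2^31

# Print the candidate without scientific notation and compare the digits.
sprintf("%.0f", candidate)
identical(sprintf("%.0f", candidate), reported)

# For reference: the largest value an R integer can hold.
.Machine$integer.max
```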
I am having the same problem with plants DB. With this script:
I get this error (only showing end of output):
The error message is in a different language from @capoony above, but appears to be the same, down to the long string of digits after Session info:
I'm not an expert but it looks like it leaks memory. I have tried to fix the downloading mechanism. Could you try to install the new version from https://ropensci.r-universe.dev/ui#package:restez and try again? |
For info, we're looking for a new maintainer / a new maintainer team for this package, see #23. If you read this and are a restez user, feel free to volunteer, we'd be happy to help. |
@jeroen Thanks! I tried with the new version of restez (v1.0.3). Unrelated to this problem, the download took much longer (~3 days instead of overnight; I didn't log times but it was an obvious difference). I still get the same error, at the same file:
This time I captured the output of
It seems there is something in that file (
sessionInfo()
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS
Matrix products: default
locale:
attached base packages:
other attached packages:
loaded via a namespace (and not attached):
Should resolve ropensci#14
It turns out it was a very long DNA sequence in a single file causing the problem: gbpln210.seq.gz is a single sequence with 1198270666 characters. This causes the error in I've issued a PR with a fix: #24 |
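For anyone who wants to check their own downloads for similarly large records before running db_create(), here is a rough sketch. It assumes the standard GenBank flat-file layout, in which each record starts with a LOCUS line giving the record name and the sequence length followed by "bp". The helper name locus_lengths and the demo records are made up for illustration and are not part of restez.

```r
# Hypothetical helper (not part of restez): list record names and declared
# sequence lengths from the LOCUS lines of a GenBank flat file.
locus_lengths <- function(path, chunk_size = 100000L) {
  con <- gzfile(path, open = "r")  # gzfile() also reads uncompressed text
  on.exit(close(con))
  pattern <- "^LOCUS\\s+(\\S+)\\s+([0-9]+)\\s+bp.*$"
  name <- character(0)
  length_bp <- numeric(0)
  repeat {
    # Read in chunks so the whole file is never held in memory at once.
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    locus <- lines[grepl(pattern, lines)]
    name <- c(name, sub(pattern, "\\1", locus))
    # Store lengths as doubles so very large values cannot overflow.
    length_bp <- c(length_bp, as.numeric(sub(pattern, "\\2", locus)))
  }
  data.frame(name = name, length_bp = length_bp, stringsAsFactors = FALSE)
}

# Small self-contained demonstration with a made-up two-record file.
demo_path <- tempfile(fileext = ".seq.gz")
demo_con <- gzfile(demo_path, open = "w")
writeLines(c(
  "LOCUS       DEMO0001                  12 bp    DNA     linear   PLN 01-JAN-2000",
  "ORIGIN",
  "        1 acgtacgtac gt",
  "//",
  "LOCUS       DEMO0002          1198270666 bp    DNA     linear   PLN 01-JAN-2000",
  "//"
), demo_con)
close(demo_con)

# Largest records first.
sizes <- locus_lengths(demo_path)
sizes[order(-sizes$length_bp), ]
```

Pointing the same function at a real file such as gbpln210.seq.gz should show whether it holds one very long record. What counts as "too long" depends on the restez version in use, so no threshold is suggested here.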
Thanks! I think this looks OK. |
Hi,
I've got an issue with the db_create function and I can't figure out whether it comes from my computer or from the function itself. I want to create a database with all the animal data from GenBank, and a second database with the plants and fungi. I tried both, but each time I run db_create, the execution stops and I get an error message when the 22nd file is added to the database. Here is the code I ran for the animals database, and the end of the error message I got (I saved only the end of it the last time I ran it). I've run the function several times with different min_length and max_length values and it ended the same way each time.
I got the same message when I tried to create the plants and fungi database (program stops when adding the 22nd file), but I tried to create a DB for only the rodents or only the "other mammals" and it went well without any problem, and the resulting database was functional for both (even if there are more than 22 files in these parts of GenBank).
Does anyone have an idea where this problem could come from? I'm not very experienced, so the reason may be obvious, but I don't see it.
Thank you for your help!
Executed program
Error message
Session Info
- Session info -------------------------------------------------------------------------------------
setting value
version R version 3.4.3 (2017-11-30)
os Windows >= 8 x64
system x86_64, mingw32
ui RStudio
language (EN)
collate French_France.1252
ctype French_France.1252
tz Europe/Paris
date 2018-12-10
package * version date lib source
assertthat 0.2.0 2017-04-11 [1] CRAN (R 3.4.4)
backports 1.1.2 2017-12-13 [1] CRAN (R 3.4.4)
base64enc 0.1-3 2015-07-28 [1] CRAN (R 3.4.4)
bindr 0.1.1 2018-03-13 [1] CRAN (R 3.4.4)
bindrcpp 0.2.2 2018-03-29 [1] CRAN (R 3.4.4)
bitops 1.0-6 2013-08-17 [1] CRAN (R 3.4.4)
callr 3.0.0 2018-08-24 [1] CRAN (R 3.4.4)
cli 1.0.1 2018-09-25 [1] CRAN (R 3.4.4)
codetools 0.2-15 2016-10-05 [1] CRAN (R 3.4.4)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.4.4)
DBI 1.0.0 2018-05-02 [1] CRAN (R 3.4.4)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.4.4)
devtools 2.0.1 2018-10-26 [1] CRAN (R 3.4.4)
digest 0.6.18 2018-10-10 [1] CRAN (R 3.4.4)
dplyr 0.7.8 2018-11-10 [1] CRAN (R 3.4.4)
fs 1.2.6 2018-08-23 [1] CRAN (R 3.4.4)
glue 1.3.0 2018-07-17 [1] CRAN (R 3.4.4)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.4.4)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.4.4)
MonetDBLite 0.6.0 2018-07-27 [1] CRAN (R 3.4.4)
pillar 1.3.0 2018-07-14 [1] CRAN (R 3.4.4)
pkgbuild 1.0.2 2018-10-16 [1] CRAN (R 3.4.4)
pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.4.4)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.4.4)
prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.4.4)
processx 3.2.1 2018-12-05 [1] CRAN (R 3.4.4)
ps 1.2.1 2018-11-06 [1] CRAN (R 3.4.4)
purrr 0.2.5 2018-05-29 [1] CRAN (R 3.4.4)
R6 2.3.0 2018-10-04 [1] CRAN (R 3.4.4)
Rcpp 1.0.0 2018-11-07 [1] CRAN (R 3.4.4)
RCurl 1.95-4.11 2018-07-15 [1] CRAN (R 3.4.4)
remotes 2.0.2 2018-10-30 [1] CRAN (R 3.4.4)
restez * 1.0.0 2018-11-26 [1] CRAN (R 3.4.4)
rlang 0.3.0.1 2018-10-25 [1] CRAN (R 3.4.4)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.4.4)
rstudioapi 0.8 2018-10-02 [1] CRAN (R 3.4.4)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.4.4)
tibble 1.4.2 2018-01-22 [1] CRAN (R 3.4.4)
tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.4.4)
usethis 1.4.0 2018-08-14 [1] CRAN (R 3.4.4)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.4.4)
[1] C:/Users/Perret/Documents/R/win-library/3.4
[2] C:/Program Files/R/R-3.4.3/library