Error in `ggplot2::geom_bar()` #191

Open
pfeutry opened this issue Sep 13, 2024 · 8 comments

@pfeutry

pfeutry commented Sep 13, 2024

I was running filter_rad() on a new dataset and got the following error:

Filter genotyping threshold: 0.2
Number of individuals / strata / chrom / locus / SNP:
Before: 658 / 3 / 1 / 5478 / 5478
Blacklisted: 0 / 0 / 0 / 32 / 32
After: 658 / 3 / 1 / 5446 / 5446

Computation time, overall: 18 sec
######################### completed filter_genotyping ##########################
################################################################################
###################### radiator::filter_snp_position_read ######################
################################################################################
Execution date@time: 20240913@1606
Function call and arguments stored in: radiator_filter_snp_position_read_args_20240913@1606.tsv
2 steps to visualize and filter the data based on the number of SNP on the read/locus:
Step 1. Visualization (boxplot, distribution
Step 2. Threshold selection
Generating SNP position on read stats
Saving 17.5 x 10 cm image
Error in ggplot2::geom_bar():
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error in check_aesthetics():
! Aesthetics must be either length 1 or the same as the data
(5446).
✖ Fix the following mappings: x.
Run rlang::last_trace() to see where the error occurred.
Warning messages:
1: In ggplot2::scale_y_log10(labels = scales::number_format(), oob = scales::squish_infinite) :
log-10 transformation introduced infinite values.
2: There was 1 warning in dplyr::mutate().
ℹ In argument: WHITELISTED_MARKERS = purrr::map_int(...).
Caused by warning:
! Using one column matrices in filter() was deprecated in dplyr
1.1.0.
ℹ Please use one dimensional logical vectors instead.
ℹ The deprecated feature was likely used in the radiator package.
Please report the issue at
https://github.com/thierrygosselin/radiator/issues.
This warning is displayed once every 8 hours.
Call lifecycle::last_lifecycle_warnings() to see where this warning
was generated.

The command used was:

data <- radiator::filter_rad(data="Report_DWs24-9586_Counts.csv",
strata = "Whale_Shark_Strata.txt" )

devtools::session_info()
─ Session info ─────────────────────────────────────────────────────
setting value
version R version 4.3.2 (2023-10-31)
os Ubuntu 22.04.4 LTS
system x86_64, linux-gnu
ui RStudio
language (EN)
collate en_AU.UTF-8
ctype en_AU.UTF-8
tz Australia/Canberra
date 2024-09-13
rstudio 2023.06.1+524.pro1 Mountain Hydrangea (server)
pandoc 2.9.2.1 @ /usr/bin/pandoc

─ Packages ─────────────────────────────────────────────────────────
package * version date (UTC) lib source
ade4 1.7-22 2023-02-06 [2] CRAN (R 4.3.2)
adegenet 2.1.10 2023-01-26 [2] CRAN (R 4.3.2)
ape 5.7-1 2023-03-13 [2] CRAN (R 4.3.2)
backports 1.5.0 2024-05-23 [1] CRAN (R 4.3.2)
BiocGenerics 0.48.1 2023-11-01 [2] Bioconductor
BiocManager 1.30.25 2024-08-28 [1] CRAN (R 4.3.2)
Biostrings 2.70.1 2023-10-25 [2] Bioconductor
bit 4.0.5 2022-11-15 [2] CRAN (R 4.3.2)
bit64 4.0.5 2020-08-30 [2] CRAN (R 4.3.2)
bitops 1.0-8 2024-07-29 [1] CRAN (R 4.3.2)
boot 1.3-28.1 2022-11-22 [2] CRAN (R 4.3.2)
broom 1.0.6 2024-05-17 [1] CRAN (R 4.3.2)
cachem 1.0.8 2023-05-01 [2] CRAN (R 4.3.2)
callr 3.7.3 2022-11-02 [2] CRAN (R 4.3.2)
cli 3.6.3 2024-06-21 [1] CRAN (R 4.3.2)
cluster 2.1.4 2022-08-22 [2] CRAN (R 4.3.2)
codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.2)
colorspace 2.1-1 2024-07-26 [1] CRAN (R 4.3.2)
crayon 1.5.3 2024-06-20 [1] CRAN (R 4.3.2)
data.table 1.16.0 2024-08-27 [1] CRAN (R 4.3.2)
devtools 2.4.5 2022-10-11 [2] CRAN (R 4.3.2)
digest 0.6.37 2024-08-19 [1] CRAN (R 4.3.2)
dplyr 1.1.4 2023-11-17 [1] CRAN (R 4.3.2)
ellipsis 0.3.2 2021-04-29 [2] CRAN (R 4.3.2)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.2)
farver 2.1.2 2024-05-13 [1] CRAN (R 4.3.2)
fastmap 1.1.1 2023-02-24 [2] CRAN (R 4.3.2)
foreach 1.5.2 2022-02-02 [2] CRAN (R 4.3.2)
fs 1.6.3 2023-07-20 [2] CRAN (R 4.3.2)
fst 0.9.8 2022-02-08 [1] CRAN (R 4.3.2)
fstcore * 0.9.18 2023-12-02 [1] CRAN (R 4.3.2)
gdsfmt 1.38.0 2023-10-24 [1] Bioconductor
generics 0.1.3 2022-07-05 [2] CRAN (R 4.3.2)
GenomeInfoDb 1.38.1 2023-11-08 [2] Bioconductor
GenomeInfoDbData 1.2.11 2023-11-10 [2] Bioconductor
GenomicRanges 1.54.1 2023-10-29 [2] Bioconductor
ggplot2 3.5.1 2024-04-23 [1] CRAN (R 4.3.2)
glmnet 4.1-8 2023-08-22 [2] CRAN (R 4.3.2)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.2)
gridExtra 2.3 2017-09-09 [2] CRAN (R 4.3.2)
gtable 0.3.5 2024-04-22 [1] CRAN (R 4.3.2)
HardyWeinberg 1.7.8 2024-04-06 [1] CRAN (R 4.3.2)
hms 1.1.3 2023-03-21 [2] CRAN (R 4.3.2)
htmltools 0.5.7 2023-11-03 [2] CRAN (R 4.3.2)
htmlwidgets 1.6.2 2023-03-17 [2] CRAN (R 4.3.2)
httpuv 1.6.12 2023-10-23 [2] CRAN (R 4.3.2)
igraph 2.0.3 2024-03-13 [2] CRAN (R 4.3.2)
IRanges 2.36.0 2023-10-24 [2] Bioconductor
iterators 1.0.14 2022-02-05 [2] CRAN (R 4.3.2)
jomo 2.7-6 2023-04-15 [1] CRAN (R 4.3.2)
labeling 0.4.3 2023-08-29 [2] CRAN (R 4.3.2)
later 1.3.1 2023-05-02 [2] CRAN (R 4.3.2)
lattice 0.22-5 2023-10-24 [2] CRAN (R 4.3.2)
lifecycle 1.0.4 2023-11-07 [2] CRAN (R 4.3.2)
lme4 1.1-35.5 2024-07-03 [1] CRAN (R 4.3.2)
magrittr 2.0.3 2022-03-30 [2] CRAN (R 4.3.2)
MASS 7.3-60 2023-05-04 [2] CRAN (R 4.3.2)
Matrix 1.6-1.1 2023-09-18 [2] CRAN (R 4.3.2)
matrixStats 1.1.0 2023-11-07 [2] CRAN (R 4.3.2)
memoise 2.0.1 2021-11-26 [2] CRAN (R 4.3.2)
mgcv 1.9-0 2023-07-11 [2] CRAN (R 4.3.2)
mice 3.16.0 2023-06-05 [1] CRAN (R 4.3.2)
mime 0.12 2021-09-28 [2] CRAN (R 4.3.2)
miniUI 0.1.1.1 2018-05-18 [2] CRAN (R 4.3.2)
minqa 1.2.8 2024-08-17 [1] CRAN (R 4.3.2)
mitml 0.4-5 2023-03-08 [1] CRAN (R 4.3.2)
munsell 0.5.1 2024-04-01 [1] CRAN (R 4.3.2)
nlme 3.1-163 2023-08-09 [2] CRAN (R 4.3.2)
nloptr 2.1.1 2024-06-25 [1] CRAN (R 4.3.2)
nnet 7.3-19 2023-05-03 [2] CRAN (R 4.3.2)
pan 1.9 2023-12-07 [1] CRAN (R 4.3.2)
permute 0.9-7 2022-01-27 [2] CRAN (R 4.3.2)
pillar 1.9.0 2023-03-22 [2] CRAN (R 4.3.2)
pkgbuild 1.4.2 2023-06-26 [2] CRAN (R 4.3.2)
pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.3.2)
pkgload 1.3.3 2023-09-22 [2] CRAN (R 4.3.2)
plyr 1.8.9 2023-10-02 [2] CRAN (R 4.3.2)
prettyunits 1.2.0 2023-09-24 [2] CRAN (R 4.3.2)
processx 3.8.2 2023-06-30 [2] CRAN (R 4.3.2)
profvis 0.3.8 2023-05-02 [2] CRAN (R 4.3.2)
promises 1.2.1 2023-08-10 [2] CRAN (R 4.3.2)
ps 1.7.5 2023-04-18 [2] CRAN (R 4.3.2)
purrr 1.0.2 2023-08-10 [2] CRAN (R 4.3.2)
R6 2.5.1 2021-08-19 [2] CRAN (R 4.3.2)
radiator 1.3.4 2024-09-13 [1] Github (3a8cb4f)
ragg 1.2.6 2023-10-10 [2] CRAN (R 4.3.2)
Rcpp 1.0.13 2024-07-17 [1] CRAN (R 4.3.2)
RCurl 1.98-1.16 2024-07-11 [1] CRAN (R 4.3.2)
readr 2.1.5 2024-01-10 [1] CRAN (R 4.3.2)
remotes 2.5.0 2024-03-17 [1] CRAN (R 4.3.2)
reshape2 1.4.4 2020-04-09 [2] CRAN (R 4.3.2)
rlang 1.1.4 2024-06-04 [1] CRAN (R 4.3.2)
rpart 4.1.21 2023-10-09 [2] CRAN (R 4.3.2)
Rsolnp 1.16 2015-12-28 [1] CRAN (R 4.3.2)
rstudioapi 0.15.0 2023-07-07 [2] CRAN (R 4.3.2)
S4Vectors 0.40.1 2023-10-26 [2] Bioconductor
scales 1.3.0 2023-11-28 [1] CRAN (R 4.3.2)
SeqArray 1.42.4 2024-04-03 [1] Bioconductor 3.18 (R 4.3.2)
seqinr 4.2-30 2023-04-05 [2] CRAN (R 4.3.2)
sessioninfo 1.2.2 2021-12-06 [2] CRAN (R 4.3.2)
shape 1.4.6.1 2024-02-23 [1] CRAN (R 4.3.2)
shiny 1.7.5.1 2023-10-14 [2] CRAN (R 4.3.2)
SNPRelate 1.36.1 2024-02-26 [1] Bioconductor 3.18 (R 4.3.2)
stringi 1.8.4 2024-05-06 [1] CRAN (R 4.3.2)
stringr 1.5.1 2023-11-14 [1] CRAN (R 4.3.2)
survival 3.5-7 2023-08-14 [2] CRAN (R 4.3.2)
systemfonts 1.0.5 2023-10-09 [2] CRAN (R 4.3.2)
textshaping 0.3.7 2023-10-09 [2] CRAN (R 4.3.2)
tibble 3.2.1 2023-03-20 [2] CRAN (R 4.3.2)
tidyr 1.3.1 2024-01-24 [1] CRAN (R 4.3.2)
tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.3.2)
truncnorm 1.0-9 2023-03-20 [2] CRAN (R 4.3.2)
tzdb 0.4.0 2023-05-12 [2] CRAN (R 4.3.2)
UpSetR 1.4.0 2019-05-22 [1] CRAN (R 4.3.2)
urlchecker 1.0.1 2021-11-30 [2] CRAN (R 4.3.2)
usethis 2.2.2 2023-07-06 [2] CRAN (R 4.3.2)
utf8 1.2.4 2023-10-22 [2] CRAN (R 4.3.2)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.2)
vegan 2.6-4 2022-10-11 [2] CRAN (R 4.3.2)
vroom 1.6.5 2023-12-05 [1] CRAN (R 4.3.2)
withr 3.0.1 2024-07-31 [1] CRAN (R 4.3.2)
xtable 1.8-4 2019-04-21 [2] CRAN (R 4.3.2)
XVector 0.42.0 2023-10-24 [2] Bioconductor
zlibbioc 1.48.0 2023-10-24 [2] Bioconductor

[1] /home/feu003/R/x86_64-pc-linux-gnu-library/4.3
[2] /apps/R/4.3.2/lib/R/library

Happy to send the data if required.

@thierrygosselin
Owner

Surprised you even got that far! With your dataset I got an error very early while reading the DArT file:

################################################################################
############################# radiator::read_dart ##############################
################################################################################
Execution date@time: 20241107@2039
Folder created: read_dart_20241107@2039
File written: radiator_tidy_dart_args_20241107@2039.tsv             
Reading DArT file...
    Number of individuals: 658                                      
Analyzing strata file                                               
    Number of strata: 3                                             
    Number of individuals: 658
Using individuals in strata file to filter individuals in DArT file
Number of blacklisted samples: 0
Error in `import_dart()` at radiator/R/dart.R:371:3:
! 
Problem tidying DArT dataset: contact author
Run `rlang::last_trace()` to see where the error occurred.

Computation time, overall: 1 sec
############################# completed read_dart ##############################

@thierrygosselin
Owner

Using the other file you sent:

> data <- radiator::read_dart(data = "Report_DWs24-9586_Counts_mod.csv", strata = "Whale_Shark_Strata.txt", verbose = TRUE)
################################################################################
############################# radiator::read_dart ##############################
################################################################################
Execution date@time: 20241107@2042
Folder created: read_dart_20241107@2042
File written: radiator_tidy_dart_args_20241107@2042.tsv             
Reading DArT file...
    Number of individuals: 658                                      
Analyzing strata file                                               
    Number of strata: 3                                             
    Number of individuals: 658
Using individuals in strata file to filter individuals in DArT file
Number of blacklisted samples: 0

DArT characteristics:
DArT SNP format: alleles coverage in 2 Rows counts
fstcore package v0.9.18
(OpenMP detected, using 56 threads)
File written: radiator_tidy_dart_metadata_20241107@2042.rad
Generating genotypes and calibrating REF/ALT alleles...
Number of markers recalibrated based on counts of allele read depth: 1664
Generating GDS...
File written: radiator_20241107@2042.gds.rad                        
done!

Genotypes formats generated with 8057 SNPs: 
    GT_BIN (the dosage of ALT allele: 0, 1, 2 NA): TRUE
    GT_VCF (the genotype coding VCFs: 0/0, 0/1, 1/1, ./.): FALSE
    GT_VCF_NUC (the genotype coding in VCFs, but with nucleotides: A/C, ./.): FALSE
    GT (the genotype coding 'a la genepop': 001002, 001001, 000000): FALSE
################################### SUMMARY ####################################

Number of chrom: 1
Number of locus: 8057
Number of SNPs: 8057
Number of strata: 3
Number of individuals: 658

Number of ind/strata:
Madagascar = 50
South_Africa = 2
Ningaloo = 606

Number of duplicate id: 0

Computation time, overall: 27 sec
############################# completed read_dart ##############################

@thierrygosselin
Owner

I assume the first file is the one you received from DArT and the one with _mod is the one you modified so that radiator could read it?

Report_DWs24-9586_Counts.csv
Report_DWs24-9586_Counts_mod.csv

If that's the case, do you think it's an error on DArT's side, or are we likely to see this modified format from them in the future?

@thierrygosselin
Owner

I was able to reproduce the error, which is very specific to your dataset...

The reproducibility was very strange for a DArT dataset.
The coverage of the markers is also very weird, unlike anything I've seen with DArT so far...

################################################################################
###################### radiator::filter_snp_position_read ######################
################################################################################
Execution date@time: 20241107@2058
Function call and arguments stored in: radiator_filter_snp_position_read_args_20241107@2058.tsv
2 steps to visualize and filter the data based on the number of SNP on the read/locus:
Step 1. Visualization (boxplot, distribution
Step 2. Threshold selection
Generating SNP position on read stats
Saving 32.9 x 10 cm image
Error in `ggplot2::geom_bar()` at radiator/R/filter_snp_position_read.R:239:5:
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data
  (6162).
✖ Fix the following mappings: `x`.
Run `rlang::last_trace()` to see where the error occurred.
Warning message:
In ggplot2::scale_y_log10(labels = scales::number_format(), oob = scales::squish_infinite) :
  log-10 transformation introduced infinite values.

Computation time, overall: 1 sec
###################### completed filter_snp_position_read ######################

@thierrygosselin
Owner

thierrygosselin commented Nov 8, 2024

I've had a look at the problem above.

It's generated when using the file Report_DWs24-9586_Counts_mod.csv in filter_rad, more precisely in this part of the filtering pipeline: radiator::filter_snp_position_read

What I see so far is that radiator is generating a lot of NA values for the position of the SNP on the read sequence. So obviously something is not being read correctly...
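A quick way to quantify this kind of problem is to count the missing positions before any plotting is attempted. The sketch below uses a made-up marker table: the column names LOCUS and POS and all the values are assumptions for illustration, not radiator's internal objects, and it does not claim to show how the error arises inside radiator.

```r
# Hypothetical marker metadata; column names and values are illustrative only.
markers <- data.frame(
  LOCUS = c("100013135", "100013136", "100258", "100259"),
  POS = c(33L, 12L, NA, NA),
  stringsAsFactors = FALSE
)

# Count and proportion of markers without a usable SNP position.
n_missing <- sum(is.na(markers$POS))
prop_missing <- mean(is.na(markers$POS))

# Dropping the NA values gives a vector shorter than the table it came from.
# Mapping such a vector against the full table is one possible source of a
# "length 1 or the same as the data" complaint (an assumption, not a diagnosis).
pos_non_missing <- markers$POS[!is.na(markers$POS)]
length_mismatch <- length(pos_non_missing) != nrow(markers)
```

If prop_missing is far from zero on a real dataset, the position column was probably not parsed as intended and is worth inspecting before filtering on it.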

It's been more than 4 years since radiator has read a DArT file incorrectly. Consequently, I'm going to wait until you've answered the questions above...

@thierrygosselin
Owner

I can easily adapt the script to read it correctly, I just want to make sure it's not a one-time format...

@thierrygosselin
Owner

IMPORTANT
Don't use the datasets, modified or not.
It's really not compatible.

The new DArT format in Report_DWs24-9586_Counts.csv:

  • 2 new column names: MarkerName and Variant
  • MarkerName is similar to the old AlleleID
  • AlleleID column is missing
  • SnpPosition column is missing; the position remains embedded in MarkerName, as it was in AlleleID, but not always
  • MarkerName is not consistent. Up to row 116 of Report_DWs24-9586_Counts.csv it looks like this: 100013135|F|0--33:T>G, and the remaining rows look more like this: 100258_198. This prevents the extraction of useful DArT information and generates parsing problems, because not all rows are coded the same way.
  • Up until now, DArT made things complicated by not using the more useful and widely accepted VCF format, but it was at least consistent with its naming scheme...

Which raises the question: is this a legitimate, unmodified DArT file, or was it modified and/or combined by someone?
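The inconsistency described in the list above can be checked with a few lines of base R. This is a minimal sketch, not radiator's parser: the function names detect_dart_header() and parse_marker_name() are hypothetical, the regular expression only covers the literal form of the first example quoted above, and the meaning assigned to each field (position, REF, ALT) is an assumption.

```r
# The two MarkerName codings quoted in the thread.
marker_name <- c("100013135|F|0--33:T>G", "100258_198")

# Guess the header layout from the column names listed in the thread.
detect_dart_header <- function(col_names) {
  if (all(c("AlleleID", "SnpPosition") %in% col_names)) {
    "old"
  } else if (all(c("MarkerName", "Variant") %in% col_names)) {
    "new"
  } else {
    "unknown"
  }
}

# Pattern for the AlleleID-like coding: <id>|F|0--<position>:<REF>><ALT>
# (assumed field meanings; other strands or separators are not handled).
old_pattern <- "^([0-9]+)\\|F\\|0--([0-9]+):([ACGT])>([ACGT])$"

parse_marker_name <- function(x) {
  is_old <- grepl(old_pattern, x)
  pos <- rep(NA_integer_, length(x))
  ref <- rep(NA_character_, length(x))
  alt <- rep(NA_character_, length(x))
  # Extract fields only from rows that match, so other rows stay NA.
  pos[is_old] <- as.integer(sub(old_pattern, "\\2", x[is_old]))
  ref[is_old] <- sub(old_pattern, "\\3", x[is_old])
  alt[is_old] <- sub(old_pattern, "\\4", x[is_old])
  data.frame(
    MARKER_NAME = x,
    OLD_CODING = is_old,
    SNP_POSITION = pos,
    REF = ref,
    ALT = alt,
    stringsAsFactors = FALSE
  )
}

parsed <- parse_marker_name(marker_name)
header_type <- detect_dart_header(c("MarkerName", "Variant"))
```

Tabulating parsed$OLD_CODING on the full MarkerName column would show how many rows follow each coding, which helps tell a file in a new layout apart from one that was combined from two sources.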

@pfeutry
Author

pfeutry commented Nov 8, 2024 via email
