Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

bug in genlight filtering #363

Open
maxecoulter opened this issue Aug 30, 2024 · 1 comment · May be fixed by #364
Open

bug in genlight filtering #363

maxecoulter opened this issue Aug 30, 2024 · 1 comment · May be fixed by #364

Comments

@maxecoulter
Copy link

 #R version 3.6.3

  library(testthat)
  library(adegenet)
  
  #adegenet_2.1.10
  
  dat <- list(toto=c(1,2,0,0), titi=c(NA,1,2,0), tata=c(NA,0,2, NA))#3 individuals, 4 binary snps
  
  x <- new("genlight", gen = dat, ploidy = c(2,2,2,2), loc.names = c("m1", "m2", "m3", "m4"), loc.all = c("A/T", "A/T", "A/T", "A/T") )
  
  
  x2 <- x[1:3,] #should be the same, as there are 3 individuals
  
  expect_equal(x@ind.names, x2@ind.names) #fine, all individuals present
  expect_equal(x, x2) #uh oh
  
  length(x@gen[[1]]@snp[[1]]) == length(x2@gen[[1]]@snp[[1]]) #FALSE - x2 has an extra byte stuck on the end
  
  #However ...
  x3 <- x[1:3, 1:4] #should be the same, as there are 3 individuals
  expect_equal(x@ind.names, x3@ind.names) #fine, all individuals present
  expect_equal(x@loc.names, x3@loc.names) #fine, all loci present
  expect_equal(x, x3) #fine!
  

Hi,

It seems that when filtering a genlight by individual index without the loci indexes you get an extra byte at the end of each snpbin. This byte is probably ignored in all subsequent downstream applications, however, it is probably not intended behavior.

The problem seems to be (in method at glHandle.R line 110) something like this:

    j <- TRUE #we have all the loci indexes
    j <- rep(TRUE, nLoc(x)) #rep 4x
    
    test <- lapply(x@gen, function(e) e[j])
    expect_equal(x@gen, test) #extra byte added in
    
    j2 <- 1:4
    test2 <- lapply(x@gen, function(e) e[j2])
    expect_equal(x@gen, test2) #fine


    

In glHandle.R line 5 in .subsetbin

xint   <- as.integer(rawToBits(x)[i]) #if gl[1:3, 1:4], x  == 1:4, if gl[1:3,], x == c(T,T,T,T)
zeroes <- 8 - (length(xint) %% 8) #if xint is 8 bits, we get 8 to add. Surely that is not correct?
   
return(packBits(c(xint, rep(0L, zeroes)))) 

Because i at this point is T,T,T,T this gets expanded by R to rep(T, 8) and all 8 bits are taken, instead of just the first 4.
That shouldn't be an issue, except that in the second line, it doesn't take into account that if length(xint) %% 8 ==0, then zeroes should be 0, not 8. Note this will be an issue anytime that length(xint) %% 8 == 0, (not just this unusual scenario where we are filtering to less than a byte)
A fix might look something like:

xint   <- as.integer(rawToBits(x)[i]) #if gl[1:3, 1:4], x  == 1:4, if gl[1:3,], x == c(T,T,T,T) 
extra_bits <- length(xint) %% 8
if (extra_bits) zeroes <- 8 - extra_bits

But I'm not sure of the rest of the details.

Let me know your thoughts.

Max

@zkamvar
Copy link
Collaborator

zkamvar commented Sep 22, 2024

Hi @maxecoulter, Thank you for writing this up and thoroughly detailing your findings. I really appreciate the effort!

You are indeed correct that this is unintended behavior and I've opened #364 to fix this. My apologies for not getting to this sooner, things have been busy lately -_-

Given that I used some of the code you wrote to fix the bug and you did the analysis to find it, you should be credited as a contributor. Please comment on #364 if you have an ORCiD that you want to attach to your credit.

FWIW, I believe that I noticed this soon after I wrote these functions in #48 nearly a decade ago, but I was too deep in my dissertation work to effectively sit down and debug it.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
2 participants