as.treedata is not compatible with merge manipulation? #36

Closed
6 tasks done
ETaSky opened this issue Jul 1, 2020 · 4 comments

@ETaSky

ETaSky commented Jul 1, 2020

Prerequisites

  • Have you read Feedback and followed the guide?
    • make sure you are using the latest release version
    • read the documents
    • google your question/issue

Describe your issue

I am having trouble with "manipulating tree data using tidy interface". Briefly, I created a tree and converted it to a tibble using as_tibble, but after some manipulation this tibble cannot be converted back with as.treedata.

  • Make a reproducible example
  • your code should contain comments to describe the problem (e.g. what you expected and what actually happened)
# This is a tree I created from the taxonomy info of several genera; the branch lengths are made up
Tre = "(((((D_5_-G1:20)D_4_Enterobacteriaceae:20)D_3_Enterobacteriales:20)D_2_Gammaproteobacteria:20)D_1_Proteobacteria:20,((((D_5_Agathobacter-G2:20,D_5_CAG.56-G3:20)D_4_Lachnospiraceae:20,(D_5_Ruminococcaceae.UCG.010-G10:20)D_4_Ruminococcaceae:20)D_3_Clostridiales:20)D_2_Clostridia:20,(((D_5_Asteroleplasma-G4:20)D_4_Erysipelotrichaceae:20)D_3_Erysipelotrichales:20)D_2_Erysipelotrichia:20,(((D_5_Dialister-G9:20)D_4_Veillonellaceae:20)D_3_Selenomonadales:20)D_2_Negativicutes:20)D_1_Firmicutes:20,((((D_5_Bacteroides-G5:20)D_4_Bacteroidaceae:20)D_3_Bacteroidales:20)D_2_Bacteroidia:20)D_1_Bacteroidetes:20,((((D_5_Candidatus.Lumbricincola-G6:20)D_4_Mycoplasmataceae:20)D_3_Mycoplasmatales:20,((D_5_uncultured.bacterium-G7:20)D_4_uncultured.bacterium:20,(D_5_-G8:20)D_4_:20)D_3_Mollicutes.RF39:20)D_2_Mollicutes:20)D_1_Tenericutes:20)D_0_Bacteria:1;"

# load the packages used below (both are attached in the session info)
library(tidyverse)
library(treeio)

# convert to treeio tree object
Tre_td <- as.treedata(ape::read.tree(text = Tre))
# convert to tibble
Tre_tb <- as_tibble(Tre_td)
str(Tre_tb)
# test manipulation: merge the tibble with a copy of its fourth column plus a new column
Tre_tb_t <- merge(Tre_tb, Tre_tb %>% select(4) %>% mutate(Test = "AAA"), by.x = 4, by.y = 1) %>% as_tibble()
str(Tre_tb_t)
# converting back fails with:
# Error in check_edgelist(x) : Cannot find root. network is not a tree!
Tre_tb_t %>% as.treedata()

After some digging, I think the problem is that merge is not compatible with treeio. The output of str(Tre_tb) shows (S3: tbl_tree/tbl_df/tbl/data.frame), whereas the output of str(Tre_tb_t) shows (S3: tbl_df/tbl/data.frame), so the tbl_tree class is lost. This does not happen when the manipulation is done with dplyr. I guess this is where the problem arises.
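
The class loss described above can be reproduced without treeio. The following is a minimal base R sketch, not the treeio code: `toy_tbl_tree` is a made-up stand-in class for `tbl_tree`, and the small data frame is invented for illustration. It also shows that merge moves the key column to the front, which may matter if a later step reads the first two columns as the edge list.

```r
# Toy stand-in for a tree table; "toy_tbl_tree" is a hypothetical class name
tree_tbl <- data.frame(
  parent = c(5L, 5L, 4L, 4L, 4L),
  node   = 1:5,
  label  = c("A", "B", "C", "root", "inner"),
  stringsAsFactors = FALSE
)
class(tree_tbl) <- c("toy_tbl_tree", class(tree_tbl))

annot <- data.frame(label = tree_tbl$label, Test = "AAA",
                    stringsAsFactors = FALSE)

# merge() falls back to merge.data.frame, which returns a plain data.frame
merged <- merge(tree_tbl, annot, by = "label")
class(merged)     # the extra class is gone
names(merged)     # the key column "label" now comes first

# Re-assigning the class by hand restores it
class(merged) <- c("toy_tbl_tree", class(merged))
inherits(merged, "toy_tbl_tree")
```

Whether restoring the class alone is enough for as.treedata on a real tbl_tree is not shown here; the sketch only illustrates why the class disappears.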

As a side note, as_tibble(Tre_td) generates a warning message for some reason:

Warning message:
Unknown or uninitialised column: `node`.

I am not sure why.

Thank you!

Jincheng

Session Info
R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.5.0   stringr_1.4.0   dplyr_1.0.0     purrr_0.3.4     readr_1.3.1     tidyr_1.1.0     tibble_3.0.1    ggplot2_3.3.2   tidyverse_1.3.0
[10] treeio_1.12.0  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6     cellranger_1.1.0 pillar_1.4.4     compiler_4.0.0   dbplyr_1.4.4     tools_4.0.0      lubridate_1.7.9  jsonlite_1.7.0   tidytree_0.3.3  
[10] lifecycle_0.2.0  nlme_3.1-147     gtable_0.3.0     lattice_0.20-41  pkgconfig_2.0.3  rlang_0.4.6      reprex_0.3.0     cli_2.0.2        DBI_1.1.0       
[19] rstudioapi_0.11  parallel_4.0.0   haven_2.3.1      withr_2.2.0      xml2_1.3.2       httr_1.4.1       fs_1.4.2         hms_0.5.3        generics_0.0.2  
[28] vctrs_0.3.1      grid_4.0.0       tidyselect_1.1.0 glue_1.4.1       R6_2.4.1         fansi_0.4.1      readxl_1.3.1     modelr_0.1.8     blob_1.2.1      
[37] magrittr_1.5     backports_1.1.8  scales_1.1.1     ellipsis_0.3.1   rvest_0.3.5      assertthat_0.2.1 ape_5.4          colorspace_1.4-1 stringi_1.4.6   
[46] lazyeval_0.2.2   munsell_0.5.0    broom_0.5.6      crayon_1.3.4    
@ETaSky changed the title from "as.treedata not compatible with merge manipulation" to "as.treedata is not working after tbl manipulation" on Jul 1, 2020
@ETaSky changed the title from "as.treedata is not working after tbl manipulation" to "as.treedata is not compatible with merge manipulation?" on Jul 1, 2020
@GuangchuangYu
Member

GuangchuangYu commented Jul 2, 2020 via email

@ETaSky
Author

ETaSky commented Jul 2, 2020

@GuangchuangYu Thank you! I tried to make it work by using full_join and it is good to know that the class can be assigned.

It seems that without the "tbl_tree" class, as.treedata uses the as.treedata.tbl_df function, in which check_edgelist raises the error message.

check_edgelist <- function(edgelist) {
    if (dim(edgelist)[2] < 2)
        stop("input should be a matrix of edge list that holds the relationships in the first two columns")
    if (length(unique(edgelist[[1]])) > length(unique(edgelist[[2]]))) {
        children <- edgelist[[1]]
        parents <- edgelist[[2]]
    } else {
        children <- edgelist[[2]]
        parents <- edgelist[[1]]
    }
    root <- unique(parents[!(parents %in% children)])
    if (length(root) != 1)
        stop("Cannot find root. network is not a tree!")

    matrix(c(parents, children), ncol=2)
}

In my case, the line root <- unique(parents[!(parents %in% children)]) returns an empty vector because every parent also appears among the children. I think this is not uncommon: even for the root node, the table has a row in which parent and child have the same node number. So maybe this function should be updated?
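
The behavior of that line can be checked on a tiny edge list. This is a base R sketch on invented data (three tips, one internal node, root numbered 4), not a patch for treeio. The last step, which drops self-referencing rows before the root test, is only an illustrative assumption about one possible adjustment.

```r
# Toy tree table in the parent/node layout: node 4 is the root and, as in the
# tibble described above, its row lists itself as its own parent
parents  <- c(5L, 5L, 4L, 4L, 4L)
children <- c(1L, 2L, 3L, 4L, 5L)

# The root test from check_edgelist: parents that never appear as children
root <- unique(parents[!(parents %in% children)])
length(root)   # 0, so the "Cannot find root" branch would be taken

# Illustrative variant (an assumption, not the package's fix):
# drop self-referencing rows before looking for the root
keep <- parents != children
root_alt <- unique(parents[keep][!(parents[keep] %in% children[keep])])
root_alt       # 4
```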

Thanks!

@GuangchuangYu
Member

I noticed this too.

I think a better solution is to implement a merge method.

If you install the GitHub version of tidytree, the following code should work:

# This is a tree I created from the taxonomy info of several genera; the branch lengths are made up
Tre = "(((((D_5_-G1:20)D_4_Enterobacteriaceae:20)D_3_Enterobacteriales:20)D_2_Gammaproteobacteria:20)D_1_Proteobacteria:20,((((D_5_Agathobacter-G2:20,D_5_CAG.56-G3:20)D_4_Lachnospiraceae:20,(D_5_Ruminococcaceae.UCG.010-G10:20)D_4_Ruminococcaceae:20)D_3_Clostridiales:20)D_2_Clostridia:20,(((D_5_Asteroleplasma-G4:20)D_4_Erysipelotrichaceae:20)D_3_Erysipelotrichales:20)D_2_Erysipelotrichia:20,(((D_5_Dialister-G9:20)D_4_Veillonellaceae:20)D_3_Selenomonadales:20)D_2_Negativicutes:20)D_1_Firmicutes:20,((((D_5_Bacteroides-G5:20)D_4_Bacteroidaceae:20)D_3_Bacteroidales:20)D_2_Bacteroidia:20)D_1_Bacteroidetes:20,((((D_5_Candidatus.Lumbricincola-G6:20)D_4_Mycoplasmataceae:20)D_3_Mycoplasmatales:20,((D_5_uncultured.bacterium-G7:20)D_4_uncultured.bacterium:20,(D_5_-G8:20)D_4_:20)D_3_Mollicutes.RF39:20)D_2_Mollicutes:20)D_1_Tenericutes:20)D_0_Bacteria:1;"

# convert to treeio tree object
Tre_td <- as.treedata(ape::read.tree(text = Tre))
# convert to tibble
Tre_tb <- as_tibble(Tre_td)
str(Tre_tb)
# test manipulation

#############################
## merge(tbl_tree, ...) now outputs a tbl_tree object
#############################
Tre_tb_t <- merge(Tre_tb, Tre_tb %>% select(4) %>% mutate(Test = "AAA"), by.x = 4, by.y = 1)
str(Tre_tb_t)
Tre_tb_t %>% as.treedata() 

@ETaSky
Author

ETaSky commented Jul 10, 2020

Thanks for the quick fix.
