How to filter non-significant odd named taxa, and only keep the significant odd named taxa? #324

Open
catherineel opened this issue Oct 25, 2021 · 8 comments

Comments

@catherineel

Hi there!

I've been using

metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE)

to remove oddly named taxa, but some of them are significant and I would like those to be displayed on the tree.

Is there a way to only display the significant odd named taxa?

@zachary-foster
Contributor

What do you mean by significant? Can you give me an example? You can make a list of taxa you want to be displayed no matter what and do this:

metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$") | taxon_names %in% my_taxon_name_list, reassign_obs = FALSE)
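To see what the condition above selects, here is a minimal base R sketch of the same logic on a plain character vector. The taxon names and the contents of my_taxon_name_list are made up for illustration; inside filter_taxa the names would come from the object itself.

```r
# Hypothetical taxon names, standing in for taxon_names inside filter_taxa().
taxon_names <- c("Bacteroides", "UCG-002", "Lachnospiraceae_NK4A136_group", "Blautia")

# Assumed keep-list: odd names that should be displayed no matter what.
my_taxon_name_list <- c("UCG-002")

# TRUE when the name consists of letters only.
is_plain <- grepl("^[a-zA-Z]+$", taxon_names)

# Keep a taxon if its name is plain OR it is on the keep-list.
keep <- is_plain | taxon_names %in% my_taxon_name_list

print(data.frame(taxon_names, is_plain, keep))
```

Only the odd name that is missing from the keep-list ends up with keep = FALSE.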

@catherineel
Author

Statistical significance after correcting for multiple comparisons. This is what I did:

First, create a new column called wilcox_p_value_p.adjusted to correct for multiple comparisons:

obj$data$diff_table$wilcox_p_value_p.adjusted <- p.adjust(obj$data$diff_table$wilcox_p_value,
                                                          method = "fdr")

Next, create a new column in diff_table that starts out as an identical copy of log2_median_ratio, so that the non-significant values can be removed from the copy:
obj$data$diff_table$log2_median_ratio_wilcox.adjust <- obj$data$diff_table$log2_median_ratio

Then set this new column to 0 wherever the adjusted p-value is not significant:
obj$data$diff_table$log2_median_ratio_wilcox.adjust[obj$data$diff_table$wilcox_p_value_p.adjusted > 0.05] <- 0
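The three steps above can be tried on a toy table in base R. This is only a sketch: the data frame and its values are invented, the column names p_adj and ratio_masked are shortened stand-ins, and treating an NA p-value as non-significant is an assumption that the thread does not state.

```r
# Toy stand-in for obj$data$diff_table (values are made up).
diff_table <- data.frame(
  taxon_id          = c("a", "b", "c", "d"),
  wilcox_p_value    = c(0.001, 0.04, 0.5, NA),
  log2_median_ratio = c(2.5, -1.2, 0.3, 1.0)
)

# Step 1: FDR (Benjamini-Hochberg) correction; NA p-values stay NA.
diff_table$p_adj <- p.adjust(diff_table$wilcox_p_value, method = "fdr")

# Step 2: copy the ratio column so the original is preserved.
diff_table$ratio_masked <- diff_table$log2_median_ratio

# Step 3: zero the copy where the adjusted p-value is not significant.
# Assumption: a missing p-value is treated as not significant.
not_sig <- is.na(diff_table$p_adj) | diff_table$p_adj > 0.05
diff_table$ratio_masked[not_sig] <- 0

print(diff_table)
```

The table still has all four rows afterwards. Zeroing a value changes the colour a taxon is drawn with, but it does not remove the taxon.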

Then I created the tree to only display significant taxa after correcting for multiple comparisons at the genus level

set.seed(1)
obj %>% 
  metacoder::filter_taxa(taxon_ranks == "g", supertaxa = TRUE, reassign_obs = FALSE) %>%
  metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE) %>%  
  heat_tree_matrix(
                   data = "diff_table",
                   node_size = n_obs,
                   node_label = taxon_names,
                   node_color = log2_median_ratio_wilcox.adjust, 
                   node_color_range = diverging_palette(), 
                   node_color_trans = "linear", 
                   node_color_interval = c(-8, 8), 
                   edge_color_interval = c(-8, 8), 
                   node_size_axis_label = "Number of OTUs",
                   node_color_axis_label = "Log2 ratio median proportions",
                   layout = "davidson-harel", 
                   initial_layout = "reingold-tilford", 
                   output_file = "diff tree.pdf")

Let me know if I am doing anything wrong

@zachary-foster
Contributor

OK, I understand now. Thanks for the code! I see that you set the non-significant taxa to 0, but I don't see where you are filtering them out. Either way, if you want to remove any taxa with odd names that are not significant, you can do something like:

metacoder::filter_taxa(! (wilcox_p_value_p.adjusted > 0.05  & ! grepl(taxon_names, pattern = "^[a-zA-Z]+$")), reassign_obs = FALSE)
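The double negation in that condition can be hard to read. By De Morgan's law it is the same as "keep a taxon if it is significant or its name is plain". A small base R check on made-up values (all names and p-values here are hypothetical):

```r
# Hypothetical adjusted p-values and names, one entry per taxon.
p_adj       <- c(0.01, 0.20, 0.01, 0.20)
taxon_names <- c("Blautia", "Roseburia", "UCG-002", "UCG-005")

is_plain <- grepl("^[a-zA-Z]+$", taxon_names)

# Form used in the thread: drop taxa that are both non-significant and oddly named.
keep_a <- !(p_adj > 0.05 & !is_plain)

# Equivalent form: keep taxa that are significant or plainly named.
keep_b <- p_adj <= 0.05 | is_plain

print(data.frame(taxon_names, p_adj, is_plain, keep_a, keep_b))
```

Only the taxon that is both oddly named and non-significant is dropped, and both forms agree on every row.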

@catherineel
Author

Thanks for that, but unfortunately I get this error when I replace

metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE) %>%
with
metacoder::filter_taxa(! (wilcox_p_value_p.adjusted > 0.05 & ! grepl(taxon_names, pattern = "^[a-zA-Z]+$")), reassign_obs = FALSE)

Error: TRUE/FALSE vector (length = 1452) must be the same length as the number of taxa (242)

Oh, did I do something wrong? I thought I had filtered them out with this line:
obj$data$diff_table$log2_median_ratio_wilcox.adjust[obj$data$diff_table$wilcox_p_value_p.adjusted > 0.05] <- 0
since it sets the non-significant values to 0, and I then use that column for node_color?
It looked like they were filtered out of my tree when I ran this:

set.seed(1)
obj %>%
  metacoder::filter_taxa(taxon_ranks == "g", supertaxa = TRUE, reassign_obs = FALSE) %>%
  metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE) %>%
  heat_tree_matrix(
                   data = "diff_table",
                   node_size = n_obs,
                   node_label = taxon_names,
                   node_color = log2_median_ratio_wilcox.adjust,
                   node_color_range = diverging_palette(),
                   node_color_trans = "linear",
                   node_color_interval = c(-8, 8),
                   edge_color_interval = c(-8, 8),
                   node_size_axis_label = "Number of OTUs",
                   node_color_axis_label = "Log2 ratio median proportions",
                   layout = "davidson-harel",
                   initial_layout = "reingold-tilford",
                   output_file = "diff tree.pdf")
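A note on the error message above: 1452 is exactly 6 × 242, which would fit a diff_table that holds one row per taxon per pairwise comparison. That is an assumption, and the thread does not confirm it. If it holds, the per-row p-values need to be collapsed to one value per taxon before they can be combined with the name test. A base R sketch on invented data, where "significant in at least one comparison" is just one possible rule:

```r
# Hypothetical taxa: 3 taxa compared in 2 pairwise comparisons = 6 rows.
taxon_ids   <- c("ab", "ac", "ad")
taxon_names <- c("Blautia", "UCG-002", "UCG-005")

diff_table <- data.frame(
  taxon_id   = rep(taxon_ids, times = 2),
  comparison = rep(c("A_vs_B", "A_vs_C"), each = 3),
  p_adj      = c(0.20, 0.01, 0.30, 0.40, 0.60, 0.70)
)

# The per-row vector is longer than the number of taxa, hence the mismatch.
length(diff_table$p_adj)
length(taxon_ids)

# Collapse to one logical per taxon: significant in any comparison.
sig_any <- tapply(diff_table$p_adj <= 0.05, diff_table$taxon_id, any)

# Reorder to match the taxon order and drop the array attributes.
sig_per_taxon <- as.vector(sig_any[taxon_ids])

# One TRUE/FALSE per taxon: significant, or plainly named.
keep <- sig_per_taxon | grepl("^[a-zA-Z]+$", taxon_names)
print(data.frame(taxon_ids, taxon_names, sig_per_taxon, keep))
```

A vector like keep has one entry per taxon, which is the length the error message asks for. Whether its order matches the taxa inside the real object would still have to be checked.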

@zachary-foster
Contributor

Can you send me an example data set with associated code that reproduces the issue? It's hard for me to debug without reproducing the error.

@catherineel
Author

Sorry, dumb question, but how do I send example data?

My original data file is huge as it's a qza file from QIIME2 analysis and I'm not sure what I need to do to it.

@zachary-foster
Contributor

No problem, it's a common question.

If you can reproduce the error with a subset of the data, you can attach the files to this issue. You can save the needed R objects to a file with saveRDS at the point before the example code starts (readRDS loads them back). You can also email the original data to zacharyfoster1989@gmail.com if you don't want it public and it's small enough to email.
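A minimal sketch of saving objects for a reproducible example with base R. The object contents and the file name are placeholders, and a temporary directory is used here only so the sketch runs anywhere:

```r
# Placeholder for the objects the example code needs (contents are made up).
example_data <- list(
  diff_table = data.frame(
    taxon_id       = c("ab", "ac"),
    wilcox_p_value = c(0.01, 0.5)
  )
)

# Write the object to a single .rds file; this is the file to attach or email.
rds_path <- file.path(tempdir(), "example_data.rds")
saveRDS(example_data, rds_path)

# The recipient loads it back with readRDS and gets the same object.
restored <- readRDS(rds_path)
print(identical(example_data, restored))
```

Saving right before the failing code starts means the recipient can load the file and run the remaining lines unchanged.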

@catherineel
Author

Thanks, I just emailed it to you!
I'm not sure if I did it correctly


2 participants