Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Reversibility of SPR #164

Open
teng-gao opened this issue Dec 6, 2024 · 8 comments · Fixed by #165
Open

Reversibility of SPR #164

teng-gao opened this issue Dec 6, 2024 · 8 comments · Fixed by #165

Comments

@teng-gao
Copy link

teng-gao commented Dec 6, 2024

Is the probability of moving from tree A to tree B via SPR the same as moving from tree B to A? Seems like for NNI it is the same but not for SPR. Asking because the regular SPR doesn't work with my tree MCMC (seems like there's a bias towards moving to certain tree topologies) ..

@ms609
Copy link
Owner

ms609 commented Dec 9, 2024

Interesting question; the function wasn't designed with MCMC in mind so I haven't thought about this from a probabilistic direction. I'll look into this further when I get the chance.

@teng-gao teng-gao changed the title RootIrrelevantSPR Reversibility of SPR Dec 9, 2024
@teng-gao
Copy link
Author

teng-gao commented Dec 9, 2024

Interestingly, while testing the frequency of trees in a chain with uniform transition probability I noticed that TreeSearch::SPR throws an error stochastically:

run_chain = function(tree_init, max_iter = 100, propose_tree = TreeSearch::NNI, seed = 0) {
    set.seed(seed)
    trees = list()
    trees[[1]] = tree_init
    for (i in 1:max_iter) {
        message(i)
        trees[[i+1]] = propose_tree(trees[[i]])
    }
    class(trees) = 'multiPhylo'
    return(trees)
}

set.seed(4)
ntips = 4
tree_init = rtree(ntips)

trees = run_chain(tree_init, max_iter = 2, propose_tree = TreeSearch::SPR)

output:

1
2
Error in child[brokenEdgeSister] <- child[mergeEdge]: replacement has length zero
Traceback:

1. propose_tree(tree)
2. SPRSwap(parent, edge[, 2], edgeToBreak = edgeToBreak, mergeEdge = mergeEdge)
3. .handleSimpleError(function (cnd) 
 . {
 .     watcher$capture_plot_and_output()
 .     cnd <- sanitize_call(cnd)
 .     watcher$push(cnd)
 .     switch(on_error, continue = invokeRestart("eval_continue"), 
 .         stop = invokeRestart("eval_stop"), error = invokeRestart("eval_error", 
 .             cnd))
 . }, "replacement has length zero", base::quote(child[brokenEdgeSister] <- child[mergeEdge]))

ms609 added a commit that referenced this issue Dec 10, 2024
@ms609
Copy link
Owner

ms609 commented Dec 10, 2024

Thanks for the report. I've identified the error, which pertains to how moves close to the root node were handled. This potentially exacerbated your initial observation, but was probably unrelated.

An SPR move is selected by first selecting an edge, then selecting a location to which the edge should be moved. Intuitively I feel that this does not correspond to a uniform sample of all possible SPR moves: edges are no more likely to be selected if there are lots of places to which they could be grafted, than if there is only one place they could be grafted; hence trees that can be reached by pruning a large clade will be chosen with a higher probability than those that can be reached by pruning a small clade. It feels plausible but unlikely that this always would result in symmetrical move probabilities, such that P(propose B given currently at A) = P(propose A given currently at B).

@ms609 ms609 linked a pull request Dec 10, 2024 that will close this issue
@ms609 ms609 reopened this Dec 10, 2024
@ms609
Copy link
Owner

ms609 commented Dec 10, 2024

PS you might see a modest speedup if you preallocate your list by replacing trees = list() with trees = vector("list", max_iter + 1) – as extending an existing list can cause the entire list to be copied, and the corresponding accesses to memory can be slow.

@teng-gao
Copy link
Author

Thanks for the quick fix and helpful explanation 😃

Indeed I did some testing and it seems that in a uniform chain, NNI traverses the tree space in equal frequencies, whereas SPR does not.
image

image
This behavior makes sense for SPR as per your explanation, so a hastings correction factor is probably needed.

@ms609
Copy link
Owner

ms609 commented Dec 11, 2024

You're welcome; thanks for drawing my attention to this, which is less of a concern in the context of parsimony optimisation; it might nonetheless be valuable to draw uniformly from all trees at 1 SPR distance, so I'm considering rewriting the function to do so.

I also wondered whether your analysis might be suited to RevBayes, which is built with phylogenetic MCMC in mind – it might be faster and more powerful than R, if it has the functionality you require.

@teng-gao
Copy link
Author

Thanks! I'm working on a custom phylogenetic method that does not fall within the scope of RevBayes. Both TreeDist and TreeSearch have been very useful in my project. It would indeed be useful to have a SPR function that has the uniform property; This paper seems to say that the hasting ratio should be 1 for rSPR; maybe their tree move function is implemented differently.

@ms609
Copy link
Owner

ms609 commented Dec 12, 2024

Sounds exciting! Glad to see an unanticipated use for these packages, and I'll look forwards to seeing what comes of your work.

Thanks to the link to Lakner et al.; a very interesting read. I'll make a start on implementing a symmetric version of the rSPR move (with Hastings Ratio 1), which is a special case of the eSPR – which I've not seen discussed in a parsimony context and would be useful to incorporate into the package. I'll reopen this issue as a placeholder for progress.

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants