Likelihood derivatives
Optimization methods like L-BFGS-B or Newton-Raphson (NR) require the computation of the likelihood function's derivatives. In particular, the NR method requires us to evaluate the function f(x), its first derivative f'(x), and its second derivative f''(x) at an arbitrary point x. Differentiating the likelihood function with respect to certain parameters is not easy at all, so in methods like L-BFGS-B, where only the first derivative is required, we use an approximation instead of the actual derivative:
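One standard such approximation is the finite-difference quotient for a small step size ε (a generic sketch; the exact scheme is implementation-dependent):

```math
f'(x) \approx \frac{f(x + \epsilon) - f(x)}{\epsilon}
```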
Nevertheless, for branch length optimization we can compute the derivatives analytically, and hence the optimization procedure is more accurate. In particular, the NR method for optimization (or, in other words, NR for approximating the root of the first derivative) is as follows:
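In its standard form, the NR update for finding a root of f'(x) iterates:

```math
x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}
```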
In our case, f(x) is the likelihood function, so the NR formula is then:
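Substituting the likelihood for f, with the branch length t as the free variable, the update becomes (the derivatives are evaluated through the transition matrix defined next):

```math
t_{n+1} = t_n - \frac{L'(t_n)}{L''(t_n)}
```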
where,
P_r(t) is the P matrix computed for rate r and branch length t, and U, λ are the eigendecomposition of the Q matrix (eigenvectors and eigenvalues). Although the NR algorithm is not covered in this section, it is the main motivation for computing the first and second likelihood derivatives. P(t) is the only element in the likelihood formula that depends on t. When differentiating the likelihood function and iterating over the branch lengths, everything else but the P matrix remains constant, so we can focus on the P matrix derivatives:
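Since P_r(t) = U e^{λrt} U⁻¹, each derivative with respect to t simply multiplies the exponentiated eigenvalues by another factor of rλ. The sketch below illustrates this with a Jukes-Cantor rate matrix (an illustrative choice, not from the original text):

```python
import numpy as np

# Jukes-Cantor rate matrix (symmetric, so eigh applies); alpha is an
# illustrative substitution rate, not a value from the original text.
alpha = 1.0 / 3.0
Q = np.full((4, 4), alpha)
np.fill_diagonal(Q, -3.0 * alpha)

# Eigendecomposition Q = U diag(lam) U^{-1}; for symmetric Q, U^{-1} = U.T.
lam, U = np.linalg.eigh(Q)
U_inv = U.T

def p_derivatives(t, rate=1.0):
    """Return P_r(t), dP/dt and d2P/dt2 via P_r(t) = U e^{lam * r * t} U^{-1}."""
    e = np.exp(lam * rate * t)
    P = U @ np.diag(e) @ U_inv
    dP = U @ np.diag(rate * lam * e) @ U_inv          # first derivative
    d2P = U @ np.diag((rate * lam) ** 2 * e) @ U_inv  # second derivative
    return P, dP, d2P

P, dP, d2P = p_derivatives(0.1)
# Sanity checks: rows of a P matrix sum to 1, and dP/dt equals Q @ P(t).
assert np.allclose(P.sum(axis=1), 1.0)
assert np.allclose(dP, Q @ P)
```

Note that no matrix exponentiation is repeated per iteration: U and λ are computed once, and each new t only changes the diagonal of exponentials, which is what makes NR on branch lengths cheap.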
Finally, in order to compute the likelihood derivatives we only need to combine equations 6 and 7 with equation 4. However, we also need to take into account that instead of the product of the per-site likelihoods, we work with the sum of the per-site log-likelihoods. Therefore,
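Writing L_i for the per-site likelihood, the chain rule gives the derivatives of each per-site log-likelihood term:

```math
\frac{d\,\ln L_i}{dt} = \frac{L_i'}{L_i}, \qquad
\frac{d^2\,\ln L_i}{dt^2} = \frac{L_i''}{L_i} - \left(\frac{L_i'}{L_i}\right)^2
```

Summing these terms over sites yields the first and second derivatives of the total log-likelihood used in the NR update.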