Frequently asked questions about BEAST.

Introduction

BEAST is a cross-platform program for Bayesian analysis of molecular sequences using MCMC. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST uses MCMC to average over tree space, so that each tree is weighted proportional to its posterior probability. We include a simple to use user-interface program, called BEAUti, for setting up standard analyses and a suit of programs for analysing the results.
BEAST is a software package to perform Bayesian analysis of molecular sequences using MCMC and focuses on estimating phylogenies using a wide range of models.

Installing and running BEAST

BEAGLE is a high-performance computational library that can perform the core calculations at the heart of most Bayesian and Maximum Likelihood phylogenetics packages. It can make use of highly-parallel processors such as those in 3D graphics boards (referred to as Graphics Processing Units or GPUs) found in many PCs. In general using it (even if not using a GPU) will improve the performance of BEAST. However, it is not necessarily going to benefit all data sets. In particular, for the use of a GPU to be efficient, long partitions are required (perhaps >500 unique site patterns). Only high-end GPUs such as the GTX590 or Tesla boards will provide sufficient benefits.

See this page for more information about BEAGLE.

Interpreting the results

Accurate model comparison in Bayesian phylogenetics is typically performed using path sampling, stepping-stone sampling or generalized stepping-stone sampling.

The tree produced by TreeAnnotator (denoted the maximum clade credibility or MCC tree) is not a consensus tree such as that produced by the 'sumt' command in MrBayes. Instead, TreeAnnotator picks one of the trees actually present in the sample produced by BEAST - thus it is a tree that was actually visited by the MCMC sampler. The tree it picks is the one that has the highest product of all the clade credibilities (posterior probabilities) for all the clades in the tree. The motivation with this is to find a single 'point estimate' tree that is in some way central to the distribution of trees. This tree is then given (annotated with) summary information for the full set of trees from the sample. For example, it is given average node heights, credible intervals for the node heights, rates, etc.

See this page for an explanation of various approaches to summarizing trees.

MCC trees produced by TreeAnnotator can have a descendent node that is older than its direct ancestor (a negative branch length). This may seem like an error but is actually the correct behaviour. The MCC tree is, by default, generated with average node heights across all trees in the sample which contain that clade. The negative branch lengths result when a clade is at low frequency and tends not to occur in those trees that have the MCC tree's ancestral clade (or vice versa). This means the average heights are for the adjacent nodes are derived from different sets of trees and may not have any direct ancestor-descendent relationship.

Effective Sample Size (ESS) of parameters

The Effective Sample Size (ESS) of a parameter sampled from an MCMC (such as BEAST) is the number of effectively independent draws from the posterior distribution that the Markov chain is equivalent to.
The simplest way is to load your BEAST log files into Tracer.
If the ESS of a parameter is small then the estimate of the posterior distribution of that parameter will be poor. In Tracer you can calculate the standard deviation of the estimated mean of a parameter. If the ESS is small then the standard deviation will be large. This is exactly the same as the sample size of an experiment consisting of measurements.
The larger the better. Tracer flags up ESSs < 100 but this may be liberal and > 200 would be better. On the other hand chasing ESSs > 10000 may be a waste of computational resources.
Possibly not. Really low ESSs may be indicative of poor mixing but if a couple of parameters that you are not interested in are a little low it probably doesn't matter. The likelihoods (both of the tree and coalescent model) should have decent ESSs.
Definitely. At the moment we don't have anyway of directly examining the ESS of the tree or the clade frequencies. Therefore, it is important that the continuous parameters and likelihoods have adequate ESS to demonstrate good mixing of the MCMC.
No. Tuning the operators will only increase the efficiency of the sampling - resulting in better ESSs for the same chain length. If you are already getting suitable ESSs then that is fine. See this tutorial for more details about this subject.
The size value in the <subtreeSlide> operator should be proportional to the height of your tree (say about 10% initially). If the tree is uncalibrated then the height of the tree is given in substitutions per site which can be very small.