Frequently Asked Questions | BEAST Documentation

Frequently asked questions about BEAST.

Introduction

BEAST is a cross-platform program for Bayesian analysis of molecular sequences using MCMC. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST uses MCMC to average over tree space, so that each tree is weighted in proportion to its posterior probability. We include a simple-to-use user-interface program, called BEAUti, for setting up standard analyses and a suite of programs for analysing the results.

What can BEAST do?

BEAST is a software package to perform Bayesian analysis of molecular sequences using MCMC and focuses on estimating phylogenies using a wide range of models.

Here is a list of some of BEAST's features and models.

Introductory, advanced and workshop tutorials can be found on this website.

Why is it called BEAST X?

We have renamed the release that would have been called BEAST v1.10.5 as BEAST X v10.5.0. This is to distinguish it from the independent project called BEAST 2 – see: https://www.beast2.org. These projects have been running in parallel for many years which has restricted this BEAST project to using the v1 major release version and only incrementing the minor release version. We have renamed this project BEAST X to denote its orthogonal axis in project-space and thus call this v10.#.# as the new major version. The next major version of BEAST X will be v11.0.0, and so on.

Installing and running BEAST

How do I install and run BEAST?

Look at this page for instructions about installing BEAST on different operating systems.

What is BEAGLE and should I use it?

BEAGLE is a high-performance computational library that can perform the core calculations at the heart of most Bayesian and Maximum Likelihood phylogenetics packages. It can make use of highly-parallel processors such as those in 3D graphics boards (referred to as Graphics Processing Units or GPUs) found in many PCs. In general using it (even if not using a GPU) will improve the performance of BEAST. However, it is not necessarily going to benefit all data sets. In particular, for the use of a GPU to be efficient, long partitions are required (perhaps >500 unique site patterns). Only high-end GPUs such as the GTX590 or Tesla boards will provide sufficient benefits.

See this page for more information about BEAGLE.

How do I achieve optimal performance when using BEAGLE with BEAST?

See this page on how to optimize performance when using BEAGLE with BEAST.

Interpreting the results

How do I do model comparison?

Accurate model comparison in Bayesian phylogenetics is typically performed using path sampling, stepping-stone sampling or generalized stepping-stone sampling.

How do I summarize the posterior distribution of trees?

See this page for an explanation of various approaches to summarizing trees.

What is the maximum clade credibility (MCC) tree produced by TreeAnnotator?

The tree produced by TreeAnnotator (denoted the maximum clade credibility or MCC tree) is not a consensus tree such as that produced by the 'sumt' command in MrBayes. Instead, TreeAnnotator picks one of the trees actually present in the sample produced by BEAST; thus it is a tree that was actually visited by the MCMC sampler. The tree it picks is the one that has the highest product of the clade credibilities (posterior probabilities) of all the clades in the tree. The motivation for this is to find a single 'point estimate' tree that is in some way central to the distribution of trees. This tree is then annotated with summary information from the full set of trees in the sample. For example, it is given average node heights, credible intervals for the node heights, rates, etc.
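The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not TreeAnnotator's code: the nested-tuple tree representation, the function names `clades` and `mcc_index`, and the toy sample are all assumptions made for the example.

```python
import math
from collections import Counter


def clades(tree):
    """Return the set of clades (as frozensets of taxa) in a nested-tuple tree."""
    found = set()

    def visit(node):
        if isinstance(node, str):          # a tip
            return frozenset([node])
        taxa = frozenset().union(*(visit(child) for child in node))
        found.add(taxa)                    # record each internal node's clade
        return taxa

    visit(tree)
    return found


def mcc_index(trees):
    """Index of the sampled tree with the highest product of clade credibilities."""
    clade_sets = [clades(t) for t in trees]
    counts = Counter(c for cs in clade_sets for c in cs)
    n = len(trees)
    # Summing logs is equivalent to taking the product but avoids underflow.
    scores = [sum(math.log(counts[c] / n) for c in cs) for cs in clade_sets]
    best = max(range(n), key=scores.__getitem__)
    return best, scores


# Hypothetical posterior sample of five rooted trees on taxa A-D.
balanced = (("A", "B"), ("C", "D"))
ladder_abc = ((("A", "B"), "C"), "D")
ladder_acb = ((("A", "C"), "B"), "D")
trees = [balanced, balanced, balanced, ladder_abc, ladder_acb]

best, scores = mcc_index(trees)
print(best, [round(math.exp(s), 2) for s in scores])
```

In this toy sample the balanced topology is chosen because its clades (A,B) and (C,D) have credibilities 0.8 and 0.6, giving the largest product. Note that the result is always one of the sampled trees, never a tree assembled from clades.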

See this page for an explanation of various approaches to summarizing trees.

Why does my tree produced by TreeAnnotator have negative branch lengths?

MCC trees produced by TreeAnnotator can have a descendant node that is older than its direct ancestor (a negative branch length). This may seem like an error but is actually the correct behaviour. The MCC tree is, by default, generated with node heights averaged across all trees in the sample which contain that clade. Negative branch lengths result when a clade is at low frequency and tends not to occur in those trees that have the MCC tree's ancestral clade (or vice versa). This means the average heights for the adjacent nodes are derived from different sets of trees and may not have any direct ancestor-descendant relationship.
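A small numerical sketch may make this clearer. The sample below is invented for illustration: each tree is reduced to a dictionary of clade heights, the clade labels are hypothetical, and `mean_height` is a helper written for this example rather than anything in TreeAnnotator.

```python
from statistics import mean

# Hypothetical sample: clade label -> node height in each sampled tree.
# Tree 0 plays the role of the MCC tree and contains AB nested inside ABC.
sample = [
    {"AB": 0.8, "ABC": 1.0, "ABCD": 5.0},   # (((A,B),C),D)
    {"AC": 0.5, "ABC": 1.0, "ABCD": 5.0},   # (((A,C),B),D)
    {"AC": 0.5, "ABC": 1.2, "ABCD": 5.0},   # (((A,C),B),D)
    {"AB": 3.0, "CD": 1.0, "ABCD": 5.0},    # ((A,B),(C,D))
    {"AB": 3.2, "CD": 1.0, "ABCD": 5.0},    # ((A,B),(C,D))
]


def mean_height(clade, trees):
    """Average height of a clade over only those trees that contain it."""
    return mean(t[clade] for t in trees if clade in t)


# Within the MCC tree itself the branch above AB is positive.
branch_in_tree = sample[0]["ABC"] - sample[0]["AB"]

# After annotating with averages, parent and child draw on different trees.
branch_annotated = mean_height("ABC", sample) - mean_height("AB", sample)

print(branch_in_tree, branch_annotated)
```

Here the average height of AB is dominated by the two trees in which ABC does not exist, so the annotated child ends up older than its annotated parent even though every individual tree is perfectly valid.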

Effective Sample Size (ESS) of parameters

What is an ESS?

The Effective Sample Size (ESS) of a parameter sampled from an MCMC (such as BEAST) is the number of effectively independent draws from the posterior distribution that the Markov chain is equivalent to.
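As a rough sketch of the idea, the ESS can be estimated as the chain length divided by the autocorrelation time. The estimator below is a simple textbook version that stops summing at the first non-positive autocorrelation; it is an assumption for illustration and is not claimed to be the algorithm Tracer uses. The two synthetic traces are likewise made up for the example.

```python
import random


def effective_sample_size(x):
    """Estimate ESS as n / (1 + 2 * sum of positive-lag autocorrelations)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    var = sum(v * v for v in d) / n
    if var == 0.0:
        raise ValueError("constant trace: ESS is undefined")
    tau = 1.0                       # autocorrelation time
    for lag in range(1, n):
        rho = sum(d[i] * d[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0.0:              # truncate once correlation is lost in noise
            break
        tau += 2.0 * rho
    return n / tau


random.seed(1)
n = 2000

# Independent draws: ESS should be close to the number of samples.
independent = [random.gauss(0.0, 1.0) for _ in range(n)]

# Strongly autocorrelated chain (AR(1) with coefficient 0.9): far lower ESS.
correlated = [0.0]
for _ in range(n - 1):
    correlated.append(0.9 * correlated[-1] + random.gauss(0.0, 1.0))

ess_independent = effective_sample_size(independent)
ess_correlated = effective_sample_size(correlated)
print(round(ess_independent), round(ess_correlated))
```

Both traces contain the same number of samples, but the autocorrelated one carries the information of far fewer independent draws.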

How do I calculate an ESS?

The simplest way is to load your BEAST log files into Tracer.

Why do I need to increase it?

If the ESS of a parameter is small then the estimate of the posterior distribution of that parameter will be poor. In Tracer you can calculate the standard deviation of the estimated mean of a parameter. If the ESS is small then this standard deviation will be large. The ESS plays exactly the same role as the sample size of an experiment consisting of independent measurements.
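The analogy with sample size can be made concrete with the usual standard-error formula, using the ESS in place of the number of independent measurements. This is a minimal sketch under that assumption; the function name and the numbers are illustrative only.

```python
import math


def standard_error_of_mean(sample_sd, ess):
    """Standard deviation of the estimated mean, treating ESS as the sample size."""
    return sample_sd / math.sqrt(ess)


# Same posterior standard deviation, increasing ESS.
for ess in (25, 100, 400):
    print(ess, standard_error_of_mean(2.0, ess))
```

Quadrupling the ESS halves the uncertainty in the estimated mean, which is why a long but poorly mixing chain can still give an imprecise estimate.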

What size ESS is adequate?

The larger the better. Tracer flags up ESSs < 100 but this may be liberal and > 200 would be better. On the other hand chasing ESSs > 10000 may be a waste of computational resources.

Do I need adequate ESSs for all my parameters?

Possibly not. Really low ESSs may be indicative of poor mixing but if a couple of parameters that you are not interested in are a little low it probably doesn't matter. The likelihoods (both of the tree and coalescent model) should have decent ESSs.

Is the ESS important if I am interested in the sample of trees?

Definitely. At the moment we don't have any way of directly examining the ESS of the tree or the clade frequencies. Therefore, it is important that the continuous parameters and likelihoods have adequate ESSs to demonstrate good mixing of the MCMC.

Do I need to worry about optimizing operators if my ESSs are okay?

No. Tuning the operators will only increase the efficiency of the sampling - resulting in better ESSs for the same chain length. If you are already getting suitable ESSs then that is fine. See this tutorial for more details about this subject.

How do I increase the ESS of a parameter?

Take a look at this brief tutorial for ways of increasing the ESS.

Why does the operator analysis continue to suggest that I decrease my <subtreeSlide>'s size attribute in order to improve my acceptance probability?

The size value in the <subtreeSlide> operator should be proportional to the height of your tree (say about 10% initially). If the tree is uncalibrated then the height of the tree is given in substitutions per site, which can be very small, so a size that suits a time-calibrated tree will be far too large.
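The arithmetic behind the 10% rule of thumb is sketched below. The helper name and the two example tree heights are hypothetical; the point is only that the appropriate size depends on the units of the tree.

```python
def initial_slide_size(root_height, fraction=0.1):
    """Starting value for the subtreeSlide size: a fraction of the tree height."""
    return fraction * root_height


# Time-calibrated tree with a root height of 100 years (hypothetical).
print(initial_slide_size(100.0))

# Uncalibrated tree with a root height of 0.05 substitutions per site (hypothetical).
print(initial_slide_size(0.05))
```

A size of 1.0 would be modest for the first tree but is twenty times the total height of the second, which is the situation in which the operator analysis keeps recommending a smaller value.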

Setting up models

How do I run BEAST without data to sample the Prior?

In BEAUti, on the MCMC tab, click the checkbox 'Sample from prior only - create empty alignment', save the XML and run with BEAST. Alternatively, in the XML file, remove (or comment out) the entries in the <likelihood> block.

How do I tell BEAST to use an outgroup?

The simple answer is that you may not want to: BEAST will sample the root position along with the rest of the nodes in the tree. If you then calculate the proportion of trees that have a particular root, you obtain a posterior probability for this root position. However, if you have a strong prior for an outgroup then you can constrain the ingroup to be monophyletic.
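The proportion mentioned above can be computed directly from a sample of rooted trees. The sketch below assumes trees are held as nested tuples with string tip names, and identifies each root position by the pair of taxon sets on either side of the root; the function names and the four-tree sample are invented for the example.

```python
from collections import Counter


def leaves(node):
    """Set of tip names below a node in a nested-tuple tree."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def root_split(tree):
    """The root position, identified by the taxon sets on each side of the root."""
    return frozenset(leaves(child) for child in tree)


def root_posterior(trees):
    """Proportion of sampled trees showing each root position."""
    counts = Counter(root_split(t) for t in trees)
    return {split: c / len(trees) for split, c in counts.items()}


# Hypothetical sample: three trees root between O and the rest, one does not.
sampled = [
    ("O", (("A", "B"), "C")),
    ((("A", "C"), "B"), "O"),
    ("O", ("A", ("B", "C"))),
    ("A", (("B", "C"), "O")),
]

probs = root_posterior(sampled)
outgroup_root = frozenset([frozenset(["O"]), frozenset(["A", "B", "C"])])
print(probs[outgroup_root])
```

In this toy sample the root falls between O and the ingroup in three of four trees, so that root position has a posterior probability of 0.75 regardless of how the ingroup is resolved.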

Which non-parametric demographic model should I use to analyse a single gene?

A description of the coalescent tree priors (or demographic priors) can be found here. There are essentially three different non-parametric demographic priors available: the Bayesian Skyline, the (GMRF) Bayesian Skyride and the Bayesian Skygrid. We suggest using Skyride over Skyline as it is a straightforward development of the Skyline with fewer user-specified options.

The question of Skygrid vs Skyride is more complex. Skyride, like the original Skyline, scales its demographic curve to the height/age of the tree, with the changes in population sizes being concordant with the nodes in the tree. That is, as the tree grows and shrinks over the course of the MCMC, the Skyride timescale does too. Skygrid, on the other hand, requires you to define a fixed timeline and grid points where the population size changes. The tree scales within this grid (i.e., the grid doesn't change even though the tree does). This is useful when you know what the timeline of the process is. Skygrid also allows multi-locus analysis but this is probably not useful for viruses (unless they are independent epidemics being controlled by the same process). Finally, Skygrid allows covariates of the population dynamics to be incorporated and tested.

Does it matter what order the Priors & Likelihoods come in the XML?

Yes. BEAST will evaluate each component in order starting with the priors. If any of these are zero, then the rest of the posterior is not calculated. Thus it is particularly important that constraints, like <booleanLikelihood> and <uniformPrior> which may give zero probabilities, are put at the beginning of the <prior> section:

<mcmc id="mcmc" chainLength="1000000" autoOptimize="true">
    <posterior id="posterior">
        <prior id="prior">
            <uniformPrior id="constraint" lower="90.0" upper="100.0">
                <parameter idref="rootHeight"/>
            </uniformPrior>
            <coalescentLikelihood idref="coalescent"/>
        </prior>
        <likelihood id="likelihood">
            <treeLikelihood idref="treeLikelihood"/>
        </likelihood>
    </posterior>
    <!-- operators and log elements omitted for brevity -->
</mcmc>
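The benefit of putting constraints first can be illustrated with a short Python sketch of in-order evaluation with an early exit. It works in log space, where a probability of zero is negative infinity. This is a conceptual illustration only, not BEAST's implementation; the function names and the stand-in likelihood are hypothetical.

```python
import math


def root_height_constraint(state):
    """Log density of a uniform prior on [90, 100]; -inf outside the bounds."""
    if 90.0 <= state["rootHeight"] <= 100.0:
        return -math.log(100.0 - 90.0)
    return -math.inf


def expensive_likelihood(state):
    """Hypothetical stand-in for a costly tree likelihood calculation."""
    return -0.5 * (state["rootHeight"] - 95.0) ** 2


def log_posterior(components, state):
    """Evaluate components in order, stopping at the first zero probability."""
    total = 0.0
    evaluated = []
    for name, log_density in components:
        evaluated.append(name)
        value = log_density(state)
        if value == -math.inf:      # posterior is zero; skip everything else
            return value, evaluated
        total += value
    return total, evaluated


components = [("constraint", root_height_constraint),
              ("treeLikelihood", expensive_likelihood)]

print(log_posterior(components, {"rootHeight": 120.0}))
print(log_posterior(components, {"rootHeight": 95.0}))
```

With the constraint listed first, a proposed state that violates it is rejected without the costly component ever being evaluated. If the order were reversed, the expensive calculation would be done and then thrown away.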

Starting tree and fixing trees

Can you specify a user-defined starting tree?

Yes, you can insert a starting tree in Newick format into the BEAST XML using a text editor:

<newick id="startingTree">
    insert your starting tree here in Newick format
</newick>

The <treeModel> XML element then needs to contain a reference to this starting tree XML element:

<newick idref="startingTree"/>

Alternatively, you can provide a starting tree using BEAUti, in the Trees panel. You can load a file containing one or more trees (in Newick format) into BEAUti using the "Import Data" menu option.

How can you keep this topology constant while estimating other parameters, e.g. node height?

In short, you need to remove all the operators that change the topology of the treeModel. In BEAUti, you can do this by deselecting the following operators: narrow exchange, wide exchange, Wilson Balding and subtree slide. Alternatively, you can remove these operators from the XML using a text editor (but don't remove the operators acting upon treeModel.rootHeight or treeModel.internalNodeHeights, which are still needed to estimate node heights). Without the topology operators, the actual topology of the tree will not be altered.

How can I see the initial tree of my analysis?

If you want the initial tree for your analysis printed to screen, you can add the following to the BEAST XML file using a text editor:

<report>
    <treeModel idref="treeModel"/>
</report>
