## Introduction

## Installing and running BEAST

BEAGLE is a high-performance computational library that performs the core calculations at the heart of most Bayesian and maximum likelihood phylogenetics packages. It can make use of highly parallel processors such as those in 3D graphics boards (Graphics Processing Units, or GPUs) found in many PCs. In general, using BEAGLE will improve the performance of BEAST even without a GPU, but not every data set will benefit. In particular, for a GPU to be used efficiently, long partitions are required (perhaps >500 unique site patterns), and only high-end GPUs such as the GTX590 or Tesla boards will provide sufficient benefits.
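As a rough way to judge whether a partition is long enough for a GPU to pay off, the number of unique site patterns can be counted directly from the alignment. The sketch below is illustrative only: the function name and the toy alignment are hypothetical, and it treats every distinct column as a pattern, which may differ from how BEAST itself compresses patterns (for example in its handling of ambiguity codes).

```python
from collections import Counter


def count_unique_site_patterns(sequences):
    """Return the number of distinct alignment columns (site patterns)."""
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    # zip(*sequences) yields one tuple per alignment column.
    patterns = Counter(zip(*sequences))
    return len(patterns)


# Toy alignment (hypothetical data): 4 taxa, 8 sites.
alignment = [
    "ACGTACGT",
    "ACGTACGA",
    "ACGAACGT",
    "ACGTACGT",
]
n_patterns = count_unique_site_patterns(alignment)
print(n_patterns)
```

A count far below the rough figure of 500 given above suggests the partition is unlikely to benefit from a GPU.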

## Interpreting the results

Accurate model comparison in Bayesian phylogenetics is typically performed using path sampling, stepping-stone sampling or generalized stepping-stone sampling.

The tree produced by TreeAnnotator (denoted the maximum clade credibility or MCC tree) is not a consensus tree such as that produced by the 'sumt' command in MrBayes. Instead, TreeAnnotator picks one of the trees actually present in the sample produced by BEAST, so it is a tree that was actually visited by the MCMC sampler. The tree it picks is the one with the highest product of the clade credibilities (posterior probabilities) over all the clades in the tree. The motivation for this is to find a single 'point estimate' tree that is in some way central to the distribution of trees. This tree is then annotated with summary information from the full set of trees in the sample. For example, it is given average node heights, credible intervals for the node heights, rates, etc.
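The selection rule described above can be sketched in a few lines. This is a minimal illustration, not TreeAnnotator's actual code: each sampled tree is represented simply as a set of clades (each clade a `frozenset` of taxon names), the function name and toy sample are hypothetical, and the product is computed as a sum of logs to avoid numerical underflow.

```python
import math
from collections import Counter


def mcc_tree_index(tree_clades):
    """Return (index of the MCC tree, log scores of all trees).

    tree_clades: list with one set of clades per sampled tree.
    """
    n = len(tree_clades)
    # Clade credibility = fraction of sampled trees containing the clade.
    counts = Counter(clade for clades in tree_clades for clade in clades)

    def log_score(clades):
        # Log of the product of clade credibilities for one tree.
        return sum(math.log(counts[c] / n) for c in clades)

    scores = [log_score(clades) for clades in tree_clades]
    best = max(range(n), key=scores.__getitem__)
    return best, scores


# Toy sample (hypothetical): non-trivial clades of four sampled trees.
AB, AC, CD = frozenset("AB"), frozenset("AC"), frozenset("CD")
ABC = frozenset("ABC")
sample = [{AB, ABC}, {AB, ABC}, {AB, CD}, {AC, ABC}]

best, scores = mcc_tree_index(sample)
print(best, math.exp(scores[best]))
```

Because the chosen tree is always a member of the sample, its topology was visited by the sampler, unlike a consensus tree assembled clade by clade.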

See this page for an explanation of various approaches to summarizing trees.

## Effective Sample Size (ESS) of parameters

## Setting up models

A description of the coalescent tree priors (or demographic priors) can be found here. There are essentially three different non-parametric demographic priors available: the Bayesian Skyline, the (GMRF) Bayesian Skyride and the Bayesian Skygrid. We suggest using Skyride over Skyline, as it is a straightforward development of the Skyline with fewer user-specified options.

The question of Skygrid vs Skyride is more complex. Skyride, like the original Skyline, scales its demographic curve to the height/age of the tree, with the changes in population size being concordant with the nodes in the tree. That is, as the tree grows and shrinks over the course of the MCMC, the Skyride timescale does too. Skygrid, on the other hand, requires you to define a fixed timeline and grid points where the population size changes. The tree scales within this grid (i.e., the grid doesn't change even though the tree does). This is useful when you know what the timeline of the process is. Skygrid also allows multi-locus analysis, but this is probably not useful for viruses (unless they are independent epidemics being controlled by the same process). Finally, Skygrid allows covariates of the population dynamics to be incorporated and tested.
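The difference between the two timescales can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the grid is taken to be equally spaced up to a user-chosen cutoff, and the tree is reduced to a list of internal node heights.

```python
def skygrid_points(cutoff, num_grid_points):
    """Fixed, equally spaced change points (assumed spacing); ignores the tree."""
    return [cutoff * k / num_grid_points for k in range(1, num_grid_points + 1)]


def tree_based_change_points(node_heights):
    """Change points that coincide with the node heights of the current tree."""
    return sorted(node_heights)


# Hypothetical node heights for two states of the MCMC chain.
tree_a = [10.0, 35.0, 80.0]
tree_b = [h * 1.5 for h in tree_a]  # the same tree after growing by 50%

grid = skygrid_points(cutoff=100.0, num_grid_points=4)

print(grid)                              # the same for both trees
print(tree_based_change_points(tree_a))  # moves with the tree
print(tree_based_change_points(tree_b))
```

The grid stays put while the tree-based change points stretch with the tree, which is why the fixed grid is most useful when the timeline of the process is known in advance.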

    <mcmc id="mcmc" chainLength="1000000" autoOptimize="true">
        <posterior id="posterior">
            <prior id="prior">
                <uniformPrior id="constraint" lower="90.0" upper="100.0">
                    <parameter idref="rootHeight"/>
                </uniformPrior>
                <coalescentLikelihood idref="coalescent"/>
            </prior>
            <likelihood id="likelihood">
                <treeLikelihood idref="treeLikelihood"/>
            </likelihood>
        </posterior>

## Starting tree and fixing trees

    <newick id="startingTree">
        insert your starting tree here in Newick format
    </newick>

The `<treeModel>` XML element then needs to contain a reference to this starting tree XML element:

    <newick idref="startingTree"/>

Alternatively, you can provide a starting tree using BEAUti, in the Trees panel. You can load a file containing one or more trees (in Newick format) into BEAUti using the "Import Data" menu option.
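When editing the XML by hand for many analyses, the two `newick` elements described above (the definition carrying the `id` and the reference carrying the `idref`) can be generated with a short script. This is only a sketch using Python's standard `xml.etree.ElementTree`; the function name and the example Newick string are hypothetical, and the output still has to be pasted into the appropriate places in the BEAST XML file.

```python
import xml.etree.ElementTree as ET


def starting_tree_elements(newick, tree_id="startingTree"):
    """Return (definition, reference) XML strings for a starting tree."""
    # Element that holds the Newick string itself.
    definition = ET.Element("newick", {"id": tree_id})
    definition.text = newick
    # Element that refers back to the definition by its id.
    reference = ET.Element("newick", {"idref": tree_id})
    return (
        ET.tostring(definition, encoding="unicode"),
        ET.tostring(reference, encoding="unicode"),
    )


# Hypothetical three-taxon starting tree.
definition_xml, reference_xml = starting_tree_elements("((A:1.0,B:1.0):1.0,C:2.0);")
print(definition_xml)
print(reference_xml)
```

Keeping the `id` and `idref` values in one place avoids the mismatch that occurs when one of them is renamed by hand.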

    <report>
        <treeModel idref="treeModel"/>
    </report>