This tutorial provides a step-by-step explanation on how to reconstruct the evolutionary dynamics of seasonal influenza based on a set of H3N2 virus sequences which have been isolated over several seasons in New York State. The data is a subset of a comprehensive data set spanning several epidemic seasons in the New York state, which has been used to unravel the genomic and epidemiological dynamics of this virus (Rambaut et al., 2008). We will examine how H3N2 diversity fluctuates through time.

In this tutorial, we will reconstruct a Bayesian skygrid of human influenza A, H3N2 subtype, over three northern hemisphere epidemic seasons. The data set contains 165 Hemagglutinin gene sequences and can take a few hours to run so this tutorial will discuss how to set up this analysis and then how to summarize the results based on runs, already performed, that can be downloaded from this site.

Running BEAUti

Run BEAUti, load the nexus file (NewYork.HA.2000-2003.nex) and set the Tips panel’s Parse Dates to the last numerical field in the sequence names as previously. Set the same evolutionary model (including gamma distributed rate variation) and clock model as in the previous exercise. In the Trees tab, select a Coalescent: Bayesian SkyGrid as the Tree Prior. We will construct a grid of 50 intervals over 5 years (Time at last point: 5) before the most recent sampling date (2003.98 in our case, so going back to about 1999) thus estimating 10 population sizes per year. The panel should look like this:

At this point we would usually generate the BEAST XML file, load it into BEAST and run it. However, this data set is a bit bigger than before and the model is a bit more computationally intensive so rather than waiting around for it to finish, you can go straight on an analyse some results files that have already been run.

Analyzing the BEAST output

Using Tracer, we can analyze the run based on the output files provided (load the file called NewYork.HA.2000-2003.log). This has been run with a chain length of 50,000,000 sampling every 5,000 steps so a total of 10,000 samples:

To reconstruct the Bayesian skygrid plot, select SkyGrid reconstruction... under the Analysis menu. The following window should appear:

Set the manual bin range from 1999 to 2004 and specify 2003.98 as the Age of the youngest tip at the bottom. Press OK and after some time, the following Bayesian skyGrid reconstruction should appear (with solid interval selected):

By default the y-axis is in a log scale. Press the Axes button and turn off Log axis for the Y axis. You will also need to set the Manual range because the upper HPD bounds will be very large. Set this to 0 to 100 and you get the following:

Here you can see that the seasonal peaks very strong (but that the uncertainty denoted by the credible intervale is also very large).

References

  • Gill MS, Lemey P, Faria NR, Rambaut A, Shapiro B, and Suchard MA (2013) Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci. Mol Biol Evol 30, 713-724.
  • Minin VN, Bloomquist EW and Suchard MA (2008) Smooth Skyride through a Rough Skyline: Bayesian Coalescent-Based Inference of Population Dynamics. Molecular Biology and Evolution 25:1459-1471; doi:10.1093/molbev/msn090.
  • Rambaut A, Pybus OG, Nelson MI, Viboud C, Taubenberger JK, Holmes EC (2008) The genomic and epidemiological dynamics of human influenza A virus. Nature, 453: 615-9.

Help and documentation

The BEAST website: http://beast.community

Tutorials: http://beast.community/tutorials

Frequently asked questions: http://beast.community/faq

Tags: tutorial