This tutorial presents an outbreak scenario that starts with a small number of cases caused by an unknown infectious agent. We will first identify the pathogen and perform some initial characterisation of the cases. Once the pathogen is known, we will curate an appropriate background set to place the cases in the context of the known diversity of the pathogen. The second timepoint brings us to later in the outbreak, with more human cases reported and a concurrent outbreak identified in pigs. The tutorial covers the use of various tools for the initial characterisation of a pathogen, such as BLAST searching, MAFFT alignment, maximum likelihood tree estimation using IQTREE, and assessment of temporal signal with TempEst.

Introduction

Genomic epidemiology combines classical epidemiological methods with genome sequence data to track and monitor both endemic pathogens and the emergence and spread of novel pathogens. In this tutorial, we will follow an outbreak through different phases and use phylogenetic methods to answer questions pertinent to understanding and controlling the outbreak.

Disclaimer: this outbreak is a fictional scenario based on simulated data. It is not intended to be predictive or relate to any real-world outbreak, and any similarities are coincidental. For details on how this outbreak was simulated see Simulation Notes.

To undertake this tutorial, you will need to download the following software packages in a format that is compatible with your computer system (both are available for Mac OS X, Windows and Linux/UNIX operating systems):

FigTree - this is an application for displaying and printing molecular phylogenies, in particular those obtained using BEAST. At the time of writing, the current version is v1.4.4. It is available for download from https://github.com/rambaut/figtree/.
TempEst - this is an interactive graphical application for examining the temporal signal in a tree of time-stamped sequences by plotting the divergence of each tip from the root against the date of sampling (a root-to-tip plot). Run TempEst by double clicking on its icon.

You can download TempEst from here: https://github.com/beast-dev/Tempest/releases/latest

EXERCISE: A cluster of cases of unknown aetiology

Quick identification of causative pathogen

The sequence below is a contig of ~18,000 bases. A similar contig was found in each of the assemblies for the initial cases. We will perform a quick BLAST search to identify a likely causative agent.

>contig1|pathogenX
ACCAAACAAGGGAGAATATGGATACGTTACAATATATAACGTATTTTTAAAACTTAGGAACCAAGACAAACACTTTTTGTCTTGGTATTGGATCCTCAAGAAATATATCATCATGAGTGATATCTTTGAAAAGGCGGCGAGTTTTAGGAGTTATCAATCTAAGTTAGGGACAGATGGGAGGGCTAGTGCAGCAACTGCTACTTAGACAACCAAGATAAGGGTATTTGTACCAGCTACTAATAGTCCAGAGCTCAGATGGGAACTAACATTGTTTGCACTTGATGTGATTAGATCTCCGAGTGCTGCCGAGTCAATGAAAGTTGGAGCTGCTTTTACACGCATCTCTATGTATTCAGAGAGACCCGGGGCTCTCATTAGAAGTCTCCTCAATGACCCAGACATTGCAGCTGTAATAATTGATGTTGGATCAATGGTCAACGGAATACCAGTGATGGAGAGGAGAGGAGACAAGGCTCAGGAGGAGATGGAAGGCTTGATGAGAATCCTCAAAACTGCTCGGGACTGCAGCAAGGGAAAGACACCTTTTGTTGACAGGCGAACTTACGGCCTACGGATAACAGACATGAGCACCCTGGTCTCTGCACTTATCACCATCGAGGCCCAGATCTGGATACTGATCGCTAAAGCAGTTACAGCTCCCGACACTCCCGAGGAAAGTGAAACTAGAAGATTGGCTAAATACGTCCAAGAAAAGAGAGTCAATCCGTTCTTTGCTCTAACTCAGCAATGGCTAACAGAAATGAGGAATCTGCTCTCCCAGAGTCTATCAGTAAGGAAGTTTATGGTTGAGATCCTCATAGAAGTCAAGAAAGGAGGATCTGCTAAAGGCAGAGCAGTAGAAATAATCTCAGACATTGGAAACTATGTCGAGGAAACTGGTATGGCAGGATTCTTCGCAACCATCAGATTCGGGTTGGAGACAAGGTATCCAGCCCTTGCACTCAACGAATTCCTGAGTGACCTCAACACCATCACAAGCTTGATGCTACTCTACAGCGAAATTGGCCCAAGGGTCCCTGATATGGTGCTTCTTGAAGAATCAATTCAGACTAAGTTTGCCCCTGGAGGTTACCCATTATTGTGGAGCTTTGCCATGGGTGTGGCTACTACTATTGACAGGTCTATGGGGGCATTAAATATCAATCGTGGTTATCTTGAGCCTATGTATTTCAGACTTGGCCAAAAATCAGCACGTCACCATGCTGGAGGAATTGCTCAAAACATGGCACATAGACTGGGACTAAGTTCAGATCAAGTTGCAGAACTCGCTGCTGCAGTTCAGGAAACATCAGCAGGAAGGCAAGAGAGTAACGTTCAGGCTAGAGAGGCAAAATTTGCTGCAGGAGGTGTGCTCATTGGAGGCAGTGATCAAGATGTCGATGAAGAGGAAGAACCTATAGAATAGAGTGGCAGACAGTCAGTTACCTTCAAAAGGGAGATGAGTATTGCATCCCTTGCTGACAGTGTACCGAGCAGTTCTGCGAGCACATCCGGTGGGACCAGATTGACTAATTCATTACTAAACCTCAGATCAAGACTGGCTGCTAAAGCAGCAAAAGAAGCCGCCTCATCCAATGCAACAGATGATCCAGCAATCAGCAACAGAACTCAAGGGGAATCAGAGAAGAAGAATAATCAAGACCTCAAACCTACTCAAAATGACCTTGATTTCGTCAGAGCTGATGTGTGACGTCTATTTCCAATATTCTACAATATCCAAAAATCTTTTTATGGTACACTATCATAATACGACACTAAGGGATCAACCACCTCAAAGTTGCGACTCGTTTTAATTATATTAATCAAATGATACTCTTTTATGGGCAAACCGAAGAACCAATGTCTACACGTAAATTGAGCTTTGGTATTGCAATCTAATACTTGCTCAAAATCTTGAACTATTTGTGTAATTTCTATCATCATAGAGTTATAAAGTTTTTATTATATAAGTTGGTGCAGATCTTTGGACATGAATTACA
CACTACACTCTAATGAAGATAAAATTTTCATTACATATTTAAGGACTATTTCCTATCCTTTCAATGGTGCTTGGTTATGAAGGTTTCTTAATTTAAACAAGCTACTATCTTCGCACTGGAATATACAATACCTCTTACCTCATTTCTTACTTTAATATCATGTTATTTTTTTGATAAGTCACTTAACTTGACCAAGGTCTACCAGGCAATTCTCACACAAGTGAACTGCAGTCCCAACTTAGATTAAACATAATCATGCAAAATCACTATTTTGCACTACTAACTTATTAAGAAAGACTTAGGATCCAAGATATTTACTCTAGGATATCCTATTAGACTTAGCAGTCATTGGTTGAGGGTTCCCCTTACAAAACTCTAACCTTCACTCTAATAACAATTCATCCAATGGATAAATTGGAACTAGTCAATGATGGCCTCAATATTATTGACTTTATTCAGAAGAACCAAAAAGAAATACAGAAGACATACGGACGATCAAGCATTCAACAACCCAGCATCAAAGCCCGAACAAAAGCCTGGGAAGATTTGCTGCAGTGCACCAGTGGAGAATCTGAACAAGTTGAGGGGGGAATGTCTATGGATGATGGAGATGTTGAAAGAAGAAACGTGGAGGATCGATCCAGTACTTCTCCCACAGATGGAACTATTGGAAAGAGAGTGTCGAACACCCGTGACTGGGCAGAAGGTTCAGATGACATACAACTGGACCCAGTGGTTACAGACTTTGTATACCATGATCATGGAGGAGAATGTACCGGATATGGATTTACTTCAAGCCCTGAGAGAGGGTGGAGTAATTACACATCAGGAGCAAACAATGGGAATGTATGTCTTGTATCTGATGCAGAGATGCTGTCCTATGCTCTCGAAATTGCAGTTTCTAAAGAAGATCGGGAAACTGATCTAGTTCATCTTGAGAATAAACTATCTACTAAAGGACTGAAGCCCACAGCAGTACCGTTCACTCCGAGAAACCTGTCTGATCCTGCAAAAGACCCTCCTGTGATTGCTGAACACTACTACGGACTAGGAGTTAAAGAGCAAAACGTTGGCCCCCAGACTAGCAAAAATGTCAATTTGGACAGCATCAAATTGTACACATCAGATGACGAAGAGGCAGATCCGCTTGAATTCGAAGATGAGTTAGCAGGGAGCTCAAGTGAAGTGATACCCGGCAATACTCCTGAAGATGAAGAGCCTTCAAGTGTTGGCGGAAAACCCAATGAATCGATTGGACATACAATCGAAGGCCAATCAACCCGAGACAGCCTCCAAGCCAAGGGCAACAAACCAGCAGATGAACCAGGAGCAGGACCGAAAGATTAGGCCGTGAAGGAAGAACAACCCCAGAAGAGGCTACCTATGTTAGCTGAAGAATTTGAGTGCTCTGGATCGGAAGACCCAATCCTTCGGGAGCTGCTGAAGGAGAACTCACTCATAAATTGTCAACAAGGGAAAGATGCTCAGCCTCCATATCATTGGAGCATCGAGAGGTCAATAAGCCCGGATAAGACTGTGATCGCCAACGGTGCTGTGCAAACTGCTGACAGGCAAAGACCAGGAACTCCGATGCCAAAGTACCGAGGCATTCCTATTAAAAAGGGCACAGACGCGAAATATCCATCTGCTGGGACGGAAAACGTGCCTGGGTCGAAGAGTGGTGCAACCCGGCATGTTCGAGGATCACCCCCCTACCAAGAAGGCAAGAGTGTCAATGCGGAGAATGTCCAATTGAATGCTTCCACTGCGGTGAAGGAAACTGATAAGGCAGAAGTAAACCCCGCAGACGACAACGACTCACTCGATGATAAATACATCATGCCTTCAGATGATTTCTCAAACACTTTCTTCCCGCACGACACTGATCGCTTGAATTATCACGCAGATCATTTAGTTGATTATTACCTCGAAACCCTGTGTGAAGAGTCAGTTCTGATGGGAGTGATCAACTCTATAAAATTAATTAATCTGGACATGCG
CTTAAATCACATTGAAGAACAAGTTAAAGAGATCCCAAAGATCATCAATAAGCTTGAGTCCATTGACAGAGTTTTGGCCAAGACCAATACCGCACTCTCAACCATTGAAGGACACCTGGTTTCCATGATGATAATGCTACCAGGGAAAGGGAAAGGAGAAAGAAAGGGCAAAAGTAATCCTGAGCTTAAACCAGTGATAGGAAGAGACATTCTAGAGCAGCAATCTCTTATCTCTTTTGACAATTTCAAGAATTTCAGAGATGGATCGTTGACAAACGAACCGTATGGGGCAGCTGTACAATTGAGAGAAGATCTTATTCTTCCTGAACTTAAGTTTGAGGAGACAAATGCATCTCAATTTGTTCCTATGGTAGATGGTTCATCCAGAGATGTTGTCAAGACATTGATAAGGACTCACATTAAAGATAGAGAGTTGAGATCAGAACTGATTGGTTACCTGAATAAAGCGGAAAATGATGAGGAAATTCAGGAGATAGCCAACACTGTCAATGACATCATTGACGGTAACATTTGATCACTGAATTGTCAGCAGAAATACAATGATCTATCAACAATCTCCCACAAGTAGACAATGGTTTCAGGTCAATAATAACAACCTCAATACTAATCTTTCACATAAGCATTACCCATTCTAGCCCTCAGACGATAACACAGTACTTGATACATGTTTATTGAAGTGTATGTAGCATGATTGAACTATTCAATCACTGTATTTCTCACTGTTGCTCTTAGTTAGTCATTGTATCTAATAATTATTATTACAGTACAAGGTATTATGAATTCAAAGATACGCAATAAATCTGATATCAGCATAGAGTAGAAAATTGTTGTTTTTGTCATGATCATTAGAAGATTTAACAATGATGTCAACTATCGTACCTAAACACAATAACATAAAATGGTCGATTTGTATTGTAGATCTCTCACGCATTTTAGTCTTATGAATTAGTGTTTCAAATCAGTTGCATATCAATTAAGAAAAACTTAGGAGACAGGTATAGAACCTCTCTTTCAGATAACTCGTCAATTAAGGACAGAAATTCTGTTTCTCAAATCCGCTAGCCTTTGCCAGAGAGGACACAAGCAATGTAGCCGGACATCAAGAGTATTTCAAGTGAGTCAATGGAAGGAGTATCTGATTTCAGCCCTAGTTCTTGGGAGCATGGTGGGTATCTTGATAAGGTTGAACCAGAAATTGATTAAAATGGCAGTATGATTTCAAAATACAAGACCTAAACCCCAGGAGCTAACGAGAGGAAATACAACAACTACATGTACCTTATATGTTACGGCTTTGTTGAAGATGTTGAGAGAACCCCAGAGACAGGGAAACGCAAGAAGATCAGGACAATTGCTGCCTCCCCTCTGGGTGTTGGTAAGAGTGCCTCTCATCCCCTAGATCTTCTGGAGGAACTCAGTTCCCTCAAAGTTACTGTGAGGAGAACAGCTGGATCAACTGAGAAAATTGTGTTTGGATCATCTGGCCCTCTAAATCACCTCGTTCCCTGGAAGAAAGTACTGACTGGTGGTTCAATTATTAATGCAGTCAAGGATTGTCGGAACGTTGATCAGATACAGCTCGACAAGCATCAAGCTCTGAGAATATTTATTCACAGTATCACAAAGCTCAATGATTCTGGAATCTACATGATTCCACGAACCATGCTAGAGTTCAGGAGAAACAATGCCATTGCCTTCAATCTTCTAGTGTACTTGAAGATTGATGCTGATTTATCCAAAATGGGGATCCGGGGAAGCCTCGATAAAGATGGCTTCAAGGTTGCCTCCTTCATGCTACACTTGGGGAAATTTGTCCGTCGTGCAGGGAAGTATTACTCTGTTGATTATTGAAGGAGAAAGATTGATAGGCTGAAATTGCAGTTTTCACTGGGTTCCATAGGCGGACTAAGTCTCCACATTAAGATCAATGGTGTAATCAGCAAACGGCTGTTTGCTCTAATGGGATTCCAAAAATACCT
TTGTTTCTCCTTGATGGACATCAATCCTTGGCTCAACAGATTGACCTGGAACAACAGTTGTGAGATCAGCCGAGTAGCAGCTGTCTATCAGCCTTCTGTTCCAAGAGAGTTCATGATCTATGATGATGTCTTCATTGACAATACAGGGAGACTTCTAAAGGGCTAAACAGAATTCTTCTAAAATTTAATCAGTCATGAGTTTAGTAATCATACCTAGTCTTAATACATCACACAGGACTATTTACAAAAGACAATTAAAAAATAGAATAATCATGTAGTAGTAATTGAGAACATTATTAGAATGGTATAGCTAAAATGTAGTTTTTTTGAGTATTTGGTTTAAAATTAGATAACTATTACAAAAAACTTAGGAGCCAAGCTCTTGCCTCGTTCAGAAGTTTAAACAAGCATTCTTACCATTGGATAAACAAAAGGATTGGTTTTATCGTCTAAGAAATTTCTTGAAAGGCAAAGAGATTCCAGGTTTTATGTTGAATGAGGTCTATCAAACCAAGGAGACCCTCTAACAGCCAGGTCATAGGAATATAAATAAAAATAAGAATAAATATAAAATTGATTCCATCGAAAGATTCATTTCAAAAAGTGACCAAATAAAAGCGGTTGGCAGACCTACCAATCATATACCACAAGACTCGACAATGGTAGTTATACTTGACAAGAGATGTTAGTCTAATCTTTTAATACTGACTTTGATGATCTCGGAGTGTAGTGTTGGGCTTCTACATTATGAGAACTTGAGTAAAATTGGAATTGTCAAAGGAATAACAAGAAAATACAAGATTAAAAGCAATCCTCTCACAAAAGCCATTGTTATAAAAATGATTCCGAATGTGTCGAACATGTCTCAGTGCACAGGGAGTGTCATGGAAAATTATAAAACACGATTAAACGGTATCTTAACACCTATAAAGGGAGCGTTAGAGATCTACAAAAACAACACTCATGACCTTTTCGGTGATGTGAGATTAGCCGGAGTTTTAATGGCAGGAGTTGCTATTGGGATTGCAACCGCAGCTCAAATCACTGCAGGTGTAGCATTATATGAGGCAATGAAAAATGCTGACAACATCAACAAACTCTAAAGCAGCATTGAATCAACTAATGAAGCTGTCGTTAAACTTCAAGAGACTGCGGAAAAGACAGTCTATGTGCTGACTGCTCTACAGGATTACATTAATACTAACGTGGTACCGACAATTGACAAGATTAGCTGCAAACAGACAGAACTCTCACTTGATCTGGCATTATCAAAGTACCTTTCTGATTTGCTTTTTGTATTTGGCCCCAACCTTCAAGACCCAGTCTTTAAATCAATGACTATACAGGCTATATCTCAGGCATTCGGTGGAAATTATGAAACACTGCTAAGAACATTGGGTAACGCTACAGAAGACTTTGATGATGTTCTAGAAAGTGACAGCATAACGGGTCAAATCATCTATCTTGATCTAAGTAGTTAATATATAATTGTCAGGGTTTATTTTCCTATTCTCACTGAAATTCAACAGGCCTATATCCAAGAGTTGTTACCAGTGAGCTTCAACAATGATATTTCAGAATGGATCAGTATTGTCCCAAATTTCACATTGGTAAGGAATACATTAATATCAAATATAGAGATTGGATTTTGCCTAATTTCAAAGAGGAGCGTGATCTGCAACCAAGATTATGCCACACCTATGACCAACAACATGAGAGAATGTTCGACGGGATCGACTGAGAAGTGTCCTCGAGAGCTGGTTGTTTCATCACATGTTCCCAGATTTTCACTATCTAACGGGGTTCTGTTTGCCAATTGCATAAGTGTCACATGCCAGTGTCAAACAACAGGCAGGGCAATCTCACGGTCAGGAGAACAAACTCTGCTGATGATTAACAACACCACCTGTCCTACAGCCGTACTCGGTAATGTGATTATCAGCTTAGGGAAATATCTGGGGTCAATAAATTATAATTCTGAAGGCATTGCTTTCCGTCCT
GCAGTCTTTACAGATAAAGTTGATATATCAAGTCAGATATCCAGCATGAATCAGTCCTTACAACAGTCTAAGGACTATCTCAAAGAGGCTCAACGACTCCTTGATACTGTTAATCCATCATTAATAAGCACGTTGTCTATGATCATAGTGTATGTACTATCGATCGCATCGTTGTGTATAGGGTTGATTACATTTATCAGTTATATCATTGTTAAGAAAAATAGAAACACCTACAGCAGATTAGAGGATAGAAGAGTCAGACCTACAAGCAGTGGGGATCTCTATTACATTGGGACATAGTGTATTCAGATTGATGAAATTATGTCAAAGAAATCAGAGAACTTCTGACTTTCAGAAATGGATTGTAGACAATTAGTTAGATCATCCTGAACAATCGAGGTGAAAACATTGCAACTTTAGAATCAGATCATGTAAATAGTTGTAAAAAATTATAAGCTTCTTTTAATTTGTTTGAACAATAATTTGATTAATATATAACATATTCGCTCACACGAGCGCTAACCTATACACTCTTTACTAATATGTTATACTCATAATTAATGATATAATGACAAATAAGGATTCAAATTGAATTATGATATAGTTTCACACTACAATAGCATTTCGACCCAGAAAATATCCTTACAATTATACAATGTACTTAACCGTGAATATGTAGTTGATAATTTCCCTTTAAAAATTTAATAAAAAACTTAGGACCCAGGTCCATAACTCATTGGATACTTAACTGTATACTTCTAAGCTATCACATATCAAAGGAGAGATTGAATGTTTTTTTAGAGATCTGGATCATTACTATATGTGTCTCCTATAATCACATCATAGGAGTCAGCCATAATACACATCTTTGGGTAAGGAAAGGAAAGTATTGTTGACGTACAGATTGATCTGCTTGAATCAAATAATCAGTCATAACAATTCAAGAGAATGCCGGCAGAAAGCAAGAAAGTTAGATCGGAAAATACTACTTCAGACAAAGGGAAATTTCCTAGTAAAATTATTAAGAGCTACTACGGTACCATGGACATTAACAAAATAAATGAAGGATTATTGGACAGCAAAATATTAAGTGCTTTCAACACAGTAATAGCATTGCTTGGATGTATCGTGATCATAGTGATGAATATAATGATCATCCAAAATTACACAAGATCAACAGACAATCAGGCCGTGATCAAACATGCGTTGCAGTGTATCCAACAGCAGATCAAAGGGCTTGCTGACAAAATCGGCACAGAGATAGGGCCCAAAGTATCACTGATTGACACATCCAGTACTATTACTATCCCAGCTAACATTGGGCTGTTAGGTTCAAAAATCAGCCAGTCGACTGCAAGTAGAAATGAGAATGTGAATGAAACATGCAAATTCACACTGCCTCCCTTGAAAATCCACGAATGTAACATTTCTTGTCGTAACCCACTCCCTTTTAGAGAGTATAGGCCGCAGACAGAAGGAGTGAGCAATCTGGTAGGATTACCTAATAATATTTGCCTGGAAAAGACATCTAATCAGATACTGAAGCCAAAGCTGATTTCATACACTTTACCCGTAGTCGGTCAAAGTGGTACCTGTATCCCAGACCCATTGCTGGCTATGGACGAGGGCTATTTTGCATATAGCCACCTGGAAAGAATCGGATCATGTTCAAGAGGGGTCTCCAAACAAAGAATAATAGGAGTTGGAGAGGTACTAGACAGAGGTGATGAAGTTCCTTCTTTATTTATGACCAATGTCTGGACCCCACCAAATCCAAACACCGTTTACCACTGTAGTCCTGTATACAACAATGAATTCTATTATGGGCTTTGTGCAGTGTCAACTGTTGGAGACCCTATTCTGAATAGCACTTACTGGTCCGGATCTCTAATGATGACCCGTCTAGCTGTAAAACCCAAGAGTAATGATGGGGGTTACAATCAACATCAACTTGCCCTACGAAGTATCGAGAAAGGGAGGTATGATAAAGTTATGCCGTA
TGGACCTTCAGGCATCAAACAGGGTGACACCCTGTATTTTCCTGCTGTAGGATTTTTGGTCAGGACAGAGTTCAAATACAATGATTCAAATTGTCCCATCACGAAGTGTCAATACAGTAAACCTTAAAATTACAGGCTTTCTATCGGGATTAGACCAAACAGCCATTATATCCTTCGATCTGGACTATTAAGATACAATCTATCAGATGGGGAGAACCCTAAAATTGTATTCATTGAAATATCTGATCAAAGATTATCTATTGGATCTCCTAGCAAAATCTCTGATTCTTTGGGTCAACCTGTTTTCTACCAAGCGTCATTTTCATGGGATACTATGATTAAATTTGGAGATGGTCAAACCGTCAACCCTATGGTTGTATATTGGCGTGATAAGACGGTAATATCAAGACCTGGGCAATCACAATGCCCTAGATTCAATACATGTCCAGAGATCTGCTGGGAAGGAGTTTATAATGATGCATTCCTAATTGACAGAATCAATTGGATAAGCGCGGGTGTACTCCTAGACAGCAATCAGACCGCAGAAAATCCTGTTTTTACTGTATTCAAAGATAATGAAATACTTAATAGAGCACAACTGGCTCCTAAGGACACCAATGCACAAAAAACAATAACTAATTGCTTTTTCTTGAAGAATAAGATTTGGTGCATATCATTGGTTGAGATATATGACACAGGAGACAATGTCATAAGACCCAAACTATTCGCAGCTACGATACCAGAGCAATGTACATAAAAATCAACCTCATAATTTAATGGATTGATCTAATATAATGATAATAACCGTACAAAGACATGTGATGTAAACAAAATTGTCGTAATTAAATAAGTCCTCAGCTGAATATTTTTTTAAGATTAGCAATAGCATGTTTATCCAGTTATTGGATAGTTGATAATTTAATTCTGAAACTGGGTTAATAAATAATCTTGATCGATGATCTTTGAGAACAATGATATCATATAGTTCATCAAGTGATAATCAATTCTTTATATGTACACTTTAGAGTATATTTTGAGACTTAGTATTTTCGGCCCGAATGTTAAAGTTAATAGTTCATACATAACCTAATCTCAAGTTCTAAGCATAATGATAACTATTAATGCGAACATGTCTTGATGTAAGGAAGATTTGATATCAACTGAGACTCCACTTGATATAGTAGAGCTGAATCTTGTAAATAAATTATAATGAATAGTTTATTCAAAGATTATCATTCATATCAGTATAATTTAAGAAAAACTTAGGACCCAGGTCCTTGATTGTGCCAATTTTCTTGAGAAATCATTCAATTGTCCTTAGACTGAAAGCGTTGTTACCTAGTTTTTCAGAAGAGATCTTATTAGAATTGATTTATATGATCTAATTCCCTTAAAAATTGAATACCAAAAAACAAAAATGGCCGATGAATTATCAATATCCGACATCATCTACCCTGATTGTCATTTGGATAGTCCTATAGTCTCTGGTAAACTAATATCAGCTATTGAATATGCTCAATTGAGACACAATCAACCCAGTGATGATAAAAGACTGTCTGAGAATATTAGGTTAAACCTTCACGGGAAAAGAAAGAGTCTATACATATTAAGACAATCCAAACAGGGTGATTACATTAGAAACAACATTAAAAACCTAAAGGAATTCCTGCATATTGCGTACCCTGAATGCGATTACACTTTATTCTCCATCACATCCCAAGGCATGACTAGCAAACTTGATAACATCATGCAAAAGTCATTCAAAGCATACAATATCATTAGTAAGAAAGTAATTGGGATGCTGCAAAATATCACTAGAAATCTCATAACTCAAGATAGAAGAGATGAAATAATTAATATACATGAGTGTAGGCGATTAGGGGATTTAGGGAAGAATATGAGTCAATCTAAATGGTATGAGTGTTTTTTGTTTTGGTTTACTATCAAAACAGAGATGCGAGCAGTGATCAAGAATTCGCCAAAGCCGAAATTCCGTTCAG
ATTCATGCATAATACACATGCGAGATAAAAGTACTCAAATAATCCTAAATCCAAATCTTATCTGCATTTTCAAATCAGACAAAACTGGATAGAAGTGTTATTATCTTACAACCGAAATGGTTCTAATGTATTGTGACGTCCTAGAGGGAAGGATGATGATGGAGACAACAGTCAAATCGGATATCAAGTACCAGCCTCTAATCTCGAGATCCAATGCCCTCTGGGGGCTAATGGATCCCTTGTTCCCTGTCATGGGAAACAGGATTTACAATATAGTGTCTATGATAGAGCCTTTAGTTCTTGCACTACTCCAACTCACGGATGAGGCGAGGATCCTGAGGGGTGCATTCCTGCATCACTGCATAAAGGAAATTCATCAAGAATTGAGTGAGTGTGGTTTTACAGATCAGAAGATTCGGTCTATGTTTATTGATGATCTTTTATCCATTCTAAATATCGATAATATACATCTGTTGGCAGAGTTCTTTTCTTTCTTTCGTACGTTTGGCCATCCTATTCTTGAGGCTAAAGTTGCTGCAGAAAAAGTGAGTGAACATATGTTGGCAGATAAAGTTCTTGAATATGCCCCTATAATGAAAGCACATGCTATATTCTGCGGGACTATAATAAATGGGTATAGGGATAGACACGGAGGAGCCTGACCTCCTCTTTACCTCCCCGCACATGCATCTAAGCATATAATCCGTTTGAAAAATTCTGGGGAATCTTTGACCATTGATGACTGTGTCAAGAATTGGGAATCATTCTGTGGGATTCAATTTGATTGTTTCATGGAGCTGAAATTGGACAGAGATCTGAGTATGTATATGAAAGCTAAAGCTTTATTTCCAATCAAAGACGAATGGGACAGTGTATACCCACGTGAAGTGTTGAGCTATACCCCACCGAGGTCAACCCAGCCAAGAAGATTGGTTGACGTTTTTGTAAATGATGAAATCTTTGATCCATACAACATGCTAGAATATGTCTTATCCGGTGCTTATCTCGAGGATGAACAATTCAATGTTTCTTATAGCTTGAAGGAGAAAGAGACGAAGCAAGCTGGACGATTGTTCGCAAAGATGACCTACAAAATGCGTACATGTCAAGTCATAGCAGAGGCCCTGATAGCCTCAGGTGTCGGTAAATATTTTAAGGAGAACGGGATGGTTAAGGATGAGCACGAACTTTTGAAGACACTCTTCCAATTGTCTATTTCCTCAGTTCCTCGAGGGAACAGTCAGGGTAATGATCCTCAATCCATCAATAATATAGAAAGAGATTGCCAATACTTTAAAGGGGTCACCACCACTGTGAAAGACAATAAGAATAACTCTTTTAATAAGGTTAAATCTGCTCTCAATAATCCGTGCCAAGCTGACGGAGTCCATCATAACATGTCACCCAATACACGAGATCGTTATAAGTGTAGTAATACAAGTAAGTCTTTTCTCGATTATCATACCGAGTTTAATCCTCACAATCACTACAAATCAGACAATACAGAGGCGGCCGTACTGTCCAGGTATGAAGACAACACCGGGACAAAATTTGATACAGTAAGTGCATTTCTTACAACTGATCTTAAGAAATTCTGTCTCAATTGGAGATACGAATCAATGGCTATATTTGCTGAACGTCTGGATGAGATATACGGTTTACCTGGATTATTTAATTGGATGCACAAACGACTAGAAAGATCTGTTATCTATGTTGCAGACCCTAATTGCCCCCGTAATATTGACAAACATATGGAACTAGAAGAAACTCCTGAAGATGATATATTCATTCATTATCCTAAATGCGGTATTGAAGGATATAGCCAAAAAACATGGACTATAGCAACTATCCCCTTTTTATTCTTGAGTGCCTGTGCGACGAACACGAGGATTGCTGCAATTGTCCAAGGAGACAATGAATCAATTGCTATCACTCAAAAAGTTCATCCTAATCTTCCCTACAAGGTAAAGATAGAGATCTGTGCAAAGCAAGCTCAGCTG
TATTTTGAAAGGTTGAGGATGAACTTAAGAGCCCTCGGCCACAATCTTAAAGCTACAGAATCTATCATCAGTACACATCTTTTTGTTTATTCGAAGAAAATTCATTATGATGGAGCTGTGCTGTCTCAGGCACTCAAATCAATGTCCAGATGTTGCTTTTGGTCAGAGACCCTGGTGGATGAAACTAGATCAGCTTGTAGTAACATCAGCACTACGATAGCTAAAGCTATAGAAAATGGGTTGTCAAGAAATGACGGCTATTGTATCAATATTTTGAAAGTAATTCAGCAGCTTCTCATATCAACTGAGTTTAGTATTAACGAGACATTCACACTGGATGTGACATCTCCCATTTCAAAGAATTTAGATTGGCTTATAACAGCTGCATTAATCCCGGCACCTATTGGAGGATTCAATTACCTTAATTTGTCTAGAATTTTTGTTAGAAATATAGGTGATCCGGTTACAGCATCTTTGGCTGATCTTAAAAGAATGATTGACCACAGTATTATGACTGAAAGCGTATTACAAAAAGTTATGAATCGAGAACCAGGTGATGCGAGTTTCTTGGACTGGGCCAGTGATCCATACTCGGGCAACTTGCCTGACTCACAAAGCATCACTAAAACAATTAAAAATATCACAGCTAGGACTATACTGAGGAACTCACCGAATCCAATGCTAAAAGTTTTATTTTATGACAAATCTTTTGATGAAGATCTTGAACTAGCTAGCTTGTTAATGGACAGGAGGGTTATATTACCTAGAGCCGCTCATGAGATACTGGATAATTCATTGACAGGCGCCAGAGAGGAAATTGCTGGTTTATTAGATACAACTAAAGGCTTGATCAGATCAGGGCTAAGAAAGAGTCGAATTCAGCCAAAGTTAGTTTCTAGACTATCTCATCATGATTATAATCAATTTTTAAGACTGTATAAACTTCTATCAAACAGAAGACAAAATGACTTGATATCATCAAATACTTGCTCAGTTGACTTGGCACGAGCATTGAGATCTCACATGTGGAGCGAATTAGCTTTAGGTAGAGTAATATACGGACTTGAGGTTCCAGATGCACTTGAGGCTATGGTGGGAAGGTACATAACAGGGAGCTTAGAGTGCCAAATTTGTGAGCAGGGAAACACGATGCATGGGTGGTTCTTTGTACCTAGGGATTCCCAATTAGATCAGGTAGATAGAGAGCTCTCATCAATAAGAGTACCTTATGTATGATCAAGTACGGATGAAAGATCGGATATCAAACTAGGCAATGTCAAAAGACCAACTAAGGCCTTGCGTTCTGCTATCAGAATTGCGACAGTATATACTTGGGCCTATGGAGATAATGAAGAGTGTTGGTATGAAGCTTGGTACCTAGCGTCTCAGAGGGTAAACATAGACTTAGATGTATTGAAAGCTATAACCCCAGTATCCACTTCAAACAATTTATCCCATAGATTGAGAGATAAATCCACACAATTTAAGTTTGCAGGGAGTGAACTCAACAGGGTTTCTAGATATGTTAACATAAGCAATCATAGTCTAGATTTCAGAATTGAGGGGGAAAAGGTAGATACGAATCTTATTTATCAACAAGCAATGCTATTAGGGTTATCGCTATTGGAAGGTAAATTCAGATTGAGATTAGAAACTGATGATTACAACGGGATATATCACTTACATGTAAAGGATAATTGTTGTGTCAAAGAAGTGGCTGATGTAGGCCAAGTGGACGCTGAGTAGCCTATCCCAGAATATACTGAAGTGGATAACAATCATCTTATATATGATCCAGACCCCGTTACAGAAATTGATTGCAGCCGTCTTTCTAATCAGGAGTCCAAATCAAGAGAATTAGACTTTCCTTTATGGTCAACTGAGGAACTTCATGATGTCCTAGCTAATACTGTTGCTTAGACCGTTCATGAGATTATAACAAAGGCTGACAAGGATGTTTTAAAGCAACACCTTGCAATAGACTCTGACGAGAACATCAA
TAGCTTAATCACAGAATTTCTAATACTTGATCCTGAACTGTTTGCACTTTATCTAGGACAATCTATATCAATAAAATGGGCCTTTGAAATTCATCATAGGCGTCCTAGAGGAAGACATACTATGGTCGACCTATTGTCAGATCTTGTATCAAATACATCAAAACACACTTACAAAGTGTTGTCAAATGCCTTGTCACATCCTAGAGTATTCAAGAGATTTGTAAACTGTGGCTTACTATTGCCTACACAGGGTCCTTACCTTCATCAACAAGATTTTGAAAAGTTGTCTCAAAACCTCCTTGTAACATCTTATATGATTTATCTAATGAACTGGTGTGACTTCAAGAAATTCCCCTTTTTAATCGCCGAACAGGATGAAACTGTGATAAGTCTACGAGAGGATATAATAACATCCAAACATCTCTGTGTTATAATTGACTTATACGCAAATCACCATAAACCTCCTTGGATAATAGATCTAAACCCACAAGAAAAAACATGTGTACTGCATGACTTTATTTCTAAATCTAGGCAGGTGGACACGTCCTCCAGATCATGGAATACTTCTGACCTGGATTTGGTAATATTCTATGCATCTTTGACTTATTTGAGAAGAGGTATAATAAAACAATTAAGGATAAGACAAGTTACTGAGGTTATAGATACCACATCAATGTTAAGGGATAATATAATTGTAGAAAATCCTCCTAGTAAAACAGGAGTGTTAGACATCAGAGGTTGTATAATATACAATTTAGAGGAAATCCTGTCTATTAACACAAAATCAGCGTCAAAAAAGATCTTTAATCTTAATAGTAGGCCGTCAGTGGAGAATCATAAATATAGAAGGATAGGTCTCAACTCAACATCGTGTTCCAAGGCATTAAATCTATCACCTCTAATTCAAAGGTATCTGCCGTCAGGAGCTCAAAGGTTGTTTATAGGAGCAGGTTCTGGGAGAATGCTGTTATTATATCAGTCTACATTGGGGCAATCAATTTCTTTTTACAATTCAGGTATAGATGGAGATTATATACCAGGTCAAAGAGAACTGAAACTATTTCCCTCTGAATACTCAATTGCTGTGGAAGACCCATCTCTGACGGGGAAATTGAAAGGACTAGTGGTGCTCTTATTCAATGGGATACCAGAAACAACATGGATCGGGGATATAGTCTCCTGCGAGTATATCTGAAATAGGACAGCGAGGCGAAGTATAGGTCTTGTCCATTCTGACATGGAGTCTGGGATTGACAAAAATGTAGAGGAGATACTAGTAGGACATTCCCATCTAATATCTATGGCGATAAATGTTATGATGGAGGACGGACTATTAGTATCCAAGAGAGCATACACCCATGGATTCCCAATCTCAAGATTATTTAACATGTACAGATCATATTTTGGACTAGTACTGGTGTGTTTCCCAGTGTATAGTAATCCAGATTCTACTGAGGTATATCTTCTTTGCTTAGAAAAGACGGTCAAGACTATTGTTCCCCCGCAAAAAGTCCTTGAGCACTCTAATTTGCACGATGAAGTCAATGACCAGGGGATAACATCAGTGATTTTTAAAATCAAGAATTCACAGTCTAAGCAGTTCCACGATGATCTAAAGAAGTACTATCAGATTGACCAACCTTTTTTTGTACCAACTAAATTCACTAGTGATGAACAAGTACTTGTCCAAGCAGGGCTGAAACTCAATGGGCCAGAAATTCTTAAGAGTGAAATCAGTTATGATATCGGTTCAGATATCAATACATTAAGAGACACCATCATAATTATGTTAAATGAGGCTATGAATTATTTTGATGACAACAGATCACCTTCACACCACCTAGAACCCTATCCAGTTTTGGAGAGAACTAGAATTAAAACAATAATGAATCGTGTGACTAAGCAAGTGATTGTCTACTCACTTATCAAGTTCAAGGCCACCAAAAGTTCAGAACTCTACCACATTAAAAATAACATCAGAAGAAAAGTTCTAATCTTAG
ATTTCAGATCAAAGCTCATGACAAAGACTCTACCTAAAGCAATGCAAGAGAGAAGAGAAAAAAGCGGTTTCTAAGAGGTTTGGATAGTAGATTTATCGAATCGAGAAGTTAAAATCTGGTGGAAGATATTCGGATACATATCCCTTATCTGATTTAACCTTCCAAATCCAAGTCCCACTGATAACTTATGTTGATCTAAGGTTCAGTTATTAAGAAAAACTTAATAACGATTCTTCGTTACCCTTG
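If you save the contig above to a file, a few lines of Python can confirm its length (and GC content) before you paste it into BLAST. This is a minimal standard-library sketch; the function names and the file name contig1.fasta used in the usage note are hypothetical.

```python
from pathlib import Path


def parse_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def read_fasta(path):
    """Read a FASTA file from disk and parse it."""
    return parse_fasta(Path(path).read_text())


def summarise(seq):
    """Return (length, GC fraction); the GC fraction ignores N and other ambiguity codes."""
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return len(seq), (gc / acgt if acgt else 0.0)
```

For example, `for name, seq in read_fasta("contig1.fasta").items(): print(name, *summarise(seq))` prints the header, the length and the GC fraction of each record.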

Navigate in a web browser to NCBI BLAST. You should see the following screen:



Click on the button for “Nucleotide BLAST”. This will redirect you to the webpage below.



To input the query, copy and paste the contig sequence into the box (this can include the header, but does not need to).

Default settings should be fine in this first instance (this will search all non-redundant records on NCBI). Click BLAST to run the search.

You may need to wait a few minutes, but results should show up automatically.
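The results page also offers download options. If you save the hits as a tab-separated hit table, a short script can rank the matching records. The sketch below assumes the table follows the standard 12-column BLAST tabular layout (query, subject, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score) with comment lines starting with `#`; check the column order of your own download before relying on it.

```python
import csv
import io

# Assumed standard 12-column BLAST tabular layout; verify against your file.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(text, n=5):
    """Return (subject, % identity, bit score) for the n best-scoring subjects.

    A subject can have several local alignments (HSPs); only its
    highest-scoring one is kept.
    """
    data_lines = (line for line in io.StringIO(text)
                  if line.strip() and not line.startswith("#"))
    best = {}
    for row in csv.reader(data_lines, delimiter="\t"):
        hit = dict(zip(COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["bitscore"] = float(hit["bitscore"])
        subject = hit["sseqid"]
        if subject not in best or hit["bitscore"] > best[subject]["bitscore"]:
            best[subject] = hit
    ranked = sorted(best.values(), key=lambda h: h["bitscore"], reverse=True)
    return [(h["sseqid"], h["pident"], h["bitscore"]) for h in ranked[:n]]
```

Ranking by bit score rather than e-value avoids ties: many near-identical full-length hits share an e-value of 0.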



Characterising and contextualising the initial cases

We now have virus genome sequences for each of the first 20 cases compiled into a single FASTA file.

Given the BLAST hits, it seems likely that the sequences are Henipavirus nipahense (NiV) genomes. To confirm this, and to investigate how the samples fit into the context of known NiV diversity, we will acquire a background dataset of the most closely related viruses. For this we will use NCBI Virus, a community portal for viral sequence data archived in GenBank.

Navigate to NCBI Virus and click on Search by virus name.



You can begin typing in Nipah and the species name should appear.



Click on the species name (Henipavirus nipahense) and this will redirect you to the NiV records.

These records can be filtered by length, completeness and a variety of other factors as shown below.



For the purposes of this exercise, we have already prepared all the complete NiV records that had a date and location of collection. This background dataset has been curated and the headers annotated with consistent fields:


>Accession|Virus|Country|Host|CollectionDate
For example:
>MH523641|NiV|India|human|2018-05-21


Ordinarily, however, you would download the dataset from the NCBI Virus portal by clicking the Download button indicated below:


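Because every header follows the same Accession|Virus|Country|Host|CollectionDate pattern, the fields can be split out programmatically, for example to tabulate hosts or countries. Root-to-tip tools generally work with dates as decimal years, so the sketch also converts a collection date. It assumes full YYYY-MM-DD dates; the function names are our own, and the decimal-year convention shown (day of year divided by days in the year) may differ slightly from the one TempEst uses.

```python
from datetime import date

FIELDS = ("accession", "virus", "country", "host", "collection_date")


def parse_header(header):
    """Split a header like '>MH523641|NiV|India|human|2018-05-21' into named fields."""
    values = header.lstrip(">").split("|")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}: {header!r}")
    return dict(zip(FIELDS, values))


def decimal_year(iso_date):
    """Convert a YYYY-MM-DD date to a decimal year (1 January maps to year + 0)."""
    d = date.fromisoformat(iso_date)
    start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - start).days
    return d.year + (d - start).days / days_in_year
```

Records with only a year or month of collection would need extra handling before `date.fromisoformat` accepts them.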

Aligning the sequences

We now have our combined dataset; however, before any trees can be built, the sequences need to be aligned. Feel free to align them with your preferred method; this tutorial will use the online version of MAFFT. MAFFT offers a variety of alignment algorithms that trade off speed against accuracy. Today we’ll just use the AUTO option, which selects an appropriate strategy for your input dataset. For full details about MAFFT and its models, see Katoh et al 2002.

Navigate to the MAFFT web server.

We have provided the combined case genome sequences and NiV background set. Once you have decompressed the outbreak.seq_run_1.NiV_background.fasta file, select Choose file on the MAFFT web server and upload it. We will run with the default AUTO mode (changing the Title length field to 50 may be useful, but is not necessary). Select Submit.

When ready, return to the results page and click on Fasta format to download the alignment as a FASTA file.

The downloaded file will probably have a long and uninformative name such as: _out.2502241929466CMs0BqwdZEKASUk69JNo3lsfnormal.fasta – rename this file to outbreak_NiV_aligned.fasta
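Before building a tree, it can be worth checking that the renamed alignment is well-formed: every sequence in an aligned FASTA file should have the same length once gaps (`-`) are counted. The minimal sketch below (function names are our own) reads aligned FASTA text, raises an error if lengths differ, and reports the fraction of gap characters in each sequence, which can flag poorly covered genomes.

```python
def read_alignment(fasta_text):
    """Parse aligned FASTA text into {header: sequence}, joining wrapped lines."""
    seqs, header = {}, None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            header = line[1:]
            seqs[header] = []
        elif line and header is not None:
            seqs[header].append(line)
    return {h: "".join(parts) for h, parts in seqs.items()}


def alignment_report(seqs):
    """Return (alignment length, {header: gap fraction}); raise if lengths differ."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences have differing lengths: {sorted(lengths)}")
    (ncol,) = lengths
    gaps = {h: s.count("-") / ncol for h, s in seqs.items()}
    return ncol, gaps
```

To run it on the file from this step, pass the contents of outbreak_NiV_aligned.fasta (read with `open(...).read()`) to `read_alignment` and the result to `alignment_report`.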

Estimate a Maximum Likelihood tree

For the initial exploratory analysis, we want to quickly estimate a phylogeny to place our sequences into the known diversity of NiV. To do this, we will use IQTREE (Minh et al. 2020). IQTREE2 can be run on the command line with more flexibility, but in this tutorial we will use the web server (Trifinopoulos et al. 2016).

To navigate to the web server, click the W-IQTREE link.

Input data

Upload your alignment file to the server by clicking Browse and selecting the file outbreak_NiV_aligned.fasta.

Substitution model options

We will bypass model selection for now, and just select the simple HKY substitution model. This is just a quick first look at the data.
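For reference, HKY extends the simplest nucleotide models in two ways: transitions (A↔G, C↔T) occur at κ times the rate of transversions, and base frequencies π may be unequal. The sketch below builds the normalised HKY rate matrix Q for given κ and π. The values in the tests are arbitrary; IQTREE estimates these parameters from the alignment.

```python
BASES = "ACGT"
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def hky_matrix(kappa, freqs):
    """Build the normalised HKY instantaneous rate matrix Q (rows/cols in ACGT order).

    kappa: transition/transversion rate ratio.
    freqs: equilibrium base frequencies (pi_A, pi_C, pi_G, pi_T), summing to 1.
    Off-diagonal Q[i][j] is pi_j for a transversion and kappa * pi_j for a
    transition; each diagonal entry makes its row sum to zero. Q is then
    scaled so the mean substitution rate at equilibrium is 1, meaning branch
    lengths are in expected substitutions per site.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("base frequencies must sum to 1")
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                is_transition = frozenset((a, b)) in TRANSITIONS
                q[i][j] = freqs[j] * (kappa if is_transition else 1.0)
        q[i][i] = -sum(q[i])  # q[i][i] is still 0.0 here, so this is minus the off-diagonal sum
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[x / mean_rate for x in row] for row in q]
```

With κ = 1 and equal frequencies this reduces to the Jukes-Cantor model, where every off-diagonal rate is 1/3.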

Running IQTREE

All other parameters can be left as default. Scroll down and select Submit job.

Click QUERY STATUS to check on the run. When the job has finished the status bar on the left will say Success.


You can examine an ASCII version of the phylogeny in the Full Result tab, but to download the results, click DOWNLOAD SELECTED JOBS in the bottom left. Decompress the downloaded file; it contains the IQTREE log file, the result file and the treefile.



The file containing the maximum likelihood tree will be the one called outbreak_NiV_aligned.fasta.treefile.
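A quick sanity check is that the treefile contains one tip per sequence in the alignment. The regular-expression sketch below pulls tip labels out of a Newick string. It assumes unquoted labels that contain no commas, colons, semicolons or parentheses, which holds for pipe-delimited headers like those used here but not for arbitrary Newick files.

```python
import re

# A tip label follows "(" or "," and runs until the next ":", ",", ")" or ";".
# Internal-node labels and support values follow ")" and are therefore skipped.
TIP_LABEL = re.compile(r"[(,]\s*([^(),:;\s][^(),:;]*)")


def tip_labels(newick):
    """Return the tip labels of a Newick tree string, in the order they appear."""
    return [label.strip() for label in TIP_LABEL.findall(newick)]
```

Comparing `len(tip_labels(...))` for the contents of outbreak_NiV_aligned.fasta.treefile with the number of sequences in the alignment confirms that no sequence was dropped.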


We will use FigTree to look at the treefile (the file ending in .treefile).

Open the FigTree application and open the treefile. The tree will be displayed with an arbitrary root because we did not specify an outgroup during the quick run. Click on the Tree menu in the top bar and select Midpoint Root.

This re-roots the phylogeny at the midpoint of the longest path between any two tips, providing a more balanced view of the tree in the absence of a known outgroup.
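FigTree does this for you, but the idea behind midpoint rooting is simple: find the two tips that are furthest apart (summing branch lengths along the path between them) and place the root halfway along that path. The sketch below finds that point on an unrooted tree stored as an adjacency dictionary, a representation chosen here for illustration rather than anything FigTree or IQTREE uses internally. It relies on the standard double-sweep trick: the node furthest from any tip is one end of the longest path.

```python
def _farthest(adj, start):
    """Return (node, distance, parent map) for the node furthest from start."""
    dist, parent, stack = {start: 0.0}, {start: None}, [start]
    while stack:
        node = stack.pop()
        for nbr, length in adj[node].items():
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    far = max(dist, key=dist.get)
    return far, dist[far], parent


def midpoint(adj):
    """Locate the midpoint root of an unrooted tree.

    adj: {node: {neighbour: branch_length}} with symmetric entries.
    Returns (a, b, offset): the root lies on edge a-b, `offset` branch-length
    units from a.
    """
    tips = [n for n, nbrs in adj.items() if len(nbrs) == 1]
    u, _, _ = _farthest(adj, tips[0])
    v, diameter, parent = _farthest(adj, u)
    path = [v]
    while path[-1] != u:  # walk back from v to u via the parent map
        path.append(parent[path[-1]])
    path.reverse()  # path now runs u ... v
    half, travelled = diameter / 2, 0.0
    for a, b in zip(path, path[1:]):
        if travelled + adj[a][b] >= half:
            return a, b, half - travelled
        travelled += adj[a][b]
    raise ValueError("tree has no path of positive length")
```

Because the root depends only on the single longest path, one unusually long branch (for example, a divergent background sequence) can pull the midpoint root towards it.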

When did this outbreak arise?

Even with just these few sequences it may be possible to ascertain some key information about this outbreak (such as when it likely began and how quickly it is spreading).

Using the TempEst tutorial as a guide, load the tree file into TempEst and assess whether there is temporal signal in the data. Parse the tip dates by selecting Parse dates in the Sample Dates tab, and select Best fitting root at the top left of the application.

Take a look at the root-to-tip and residual tabs.
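The root-to-tip plot is a linear regression of each tip's divergence from the root against its sampling date. TempEst performs this for you; the sketch below shows the underlying calculation, assuming you already have sampling dates as decimal years and root-to-tip distances for each tip. A positive slope with a reasonably high R² suggests clock-like evolution, the slope is a crude rate estimate, and the x-intercept is a crude estimate of the date of the root. Large residuals can flag tips with problematic dates or sequences.

```python
def root_to_tip_regression(dates, divergences):
    """Least-squares fit of root-to-tip divergence (y) on sampling date (x).

    Returns the slope (substitutions/site/year when dates are decimal years),
    the x-intercept (date at which the fitted divergence is zero), R^2, and
    the residual for each tip.
    """
    n = len(dates)
    if n != len(divergences) or n < 3:
        raise ValueError("need matching dates and divergences for at least 3 tips")
    mean_x = sum(dates) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in divergences)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, divergences))
    if sxx == 0:
        raise ValueError("all tips share one date, so there is no temporal spread")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {
        "rate": slope,
        "root_date": -intercept / slope if slope != 0 else float("nan"),
        "r_squared": sxy ** 2 / (sxx * syy) if syy > 0 else float("nan"),
        "residuals": [y - (intercept + slope * x) for x, y in zip(dates, divergences)],
    }
```

Tips from the same outbreak are not independent (they share ancestry), so these numbers are a diagnostic of temporal signal rather than formal estimates.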

Additional human cases spur an investigation in the animal population

Download the provided FASTA alignment file and decompress it. This file contains aligned NiV genome sequences from both the initial sequencing run and a second sequencing run that included both human and pig NiV samples. We will attempt to determine whether the human and animal outbreaks are linked and what the source of these cases is.

Generate a maximum likelihood tree from the alignment

Using IQTREE as described above, estimate a maximum likelihood phylogeny with the outbreak_animals.fasta file.

Inspect the phylogeny using TempEst (parse tip dates and select best fitting root). Note that the tips are labelled with host species.
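One rough way to ask whether human and pig cases are interleaved on the tree is to count the minimum number of host changes needed to explain the tip labels, using Fitch parsimony. This is a simple technique swapped in for illustration only: it ignores branch lengths and uncertainty, unlike a model-based discrete trait analysis. The tree format (a rooted binary tree written as nested 2-tuples of host labels) is an assumption chosen to keep the sketch short.

```python
def fitch_host_changes(tree):
    """Minimum number of host changes on a rooted binary tree (Fitch parsimony).

    tree: nested 2-tuples whose leaves are host labels,
          e.g. (("pig", "pig"), ("human", "human")).
    Returns (candidate host states at the root, minimum number of changes).
    """
    if isinstance(tree, str):  # a tip: its observed host
        return {tree}, 0
    left, right = tree
    left_states, left_cost = fitch_host_changes(left)
    right_states, right_cost = fitch_host_changes(right)
    shared = left_states & right_states
    if shared:  # children can agree, so no extra change is needed here
        return shared, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1
```

A single change is consistent with one cross-species jump into a monophyletic group of cases, whereas several changes point towards repeated transmission between hosts.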

Simulation Notes

An outbreak structure was simulated using JT McCrone’s transmission simulator. The simulator generated an outbreak with a total of 517 cases and produced a transmission tree, a time tree and a line list with sampling times in days.

NiV was selected as the example pathogen because it is fast evolving and has a suitable background dataset available on GenBank that is small enough not to require downsampling. All complete NiV genomes were downloaded from NCBI Virus and aligned using MAFFT, and an ML tree was computed using IQTREE2.

Ancestral reconstruction was also carried out using IQTREE to infer the genome sequence of the common ancestor node of MK801755 and the 1999 porcine outbreak in Malaysia. From this ancestral node, seq-gen was used to simulate a branch length of 0.018 substitutions per site under the HKY substitution model, which represents ~40 years of NiV evolution assuming a rate of 4.5 × 10^-4 substitutions per site per year (as published for NiV in Cortes-Azuero et al 2023). This provided the initial infection case, from which we simulated genome sequences corresponding to the simulated outbreak time tree. This evolution was also simulated with seq-gen under an HKY substitution model, at a rate of 7 × 10^-4. The time tree was manually labelled in FigTree with host annotations, creating a scenario in which pigs represented the majority of cases, including the earlier cases, and human clusters represented spillover events from the pig population. In total, 60 of the 513 tips were labelled as human cases and the remainder as pig. The start date of the outbreak was set to 2023-08-11.
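The ~40-year figure follows directly from dividing the simulated branch length d by the assumed substitution rate r:

```latex
t \approx \frac{d}{r}
  = \frac{0.018\ \text{substitutions/site}}{4.5 \times 10^{-4}\ \text{substitutions/site/year}}
  = 40\ \text{years}
```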

Because this outbreak has been simulated, we know the underlying ground truth and can see the impact that a lack of data and sampling bias can have on inference. Below is the BEAST discrete trait analysis (DTA) inference with all of the simulated outbreak cases (so 100% of the population has been sequenced and sampled in this case).



References & Further Reading

  • Drummond AJ, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7: 214.
  • Drummond AJ, Ho SYW, Phillips MJ, Rambaut A (2006) PLoS Biol 4: e88.
  • Drummond AJ, Rambaut A, Shapiro B, Pybus OG (2005) Mol Biol Evol 22: 1185-1192.
  • Drummond AJ, Nicholls GK, Rodrigo AG, Solomon W (2002) Genetics 161: 1307-1320.
  • Dudas G, Carvalho LM, Rambaut A, Bedford T (2018) MERS-CoV spillover at the camel-human interface. eLife 7: e31257.
  • Ferreira MAR, Suchard MA (2008) Bayesian analysis of elapsed times in continuous-time Markov chains. Can J Statistics 36: 355-368. doi: 10.1002/cjs.5550360302
  • Gill MS, Lemey P, Faria NR, Rambaut A, Shapiro B, Suchard MA (2013) Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci. Mol Biol Evol 30: 713-724.
  • Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30: 772-780. doi: 10.1093/molbev/mst010
  • Minin VN, Bloomquist EW, Suchard MA (2008) Smooth Skyride through a Rough Skyline: Bayesian Coalescent-Based Inference of Population Dynamics. Mol Biol Evol 25: 1459-1471. doi: 10.1093/molbev/msn090
  • Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ (2015) IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32: 268-274. doi: 10.1093/molbev/msu300
  • Rambaut A, Pybus OG, Nelson MI, Viboud C, Taubenberger JK, Holmes EC (2008) The genomic and epidemiological dynamics of human influenza A virus. Nature 453: 615-619.
  • Smith GJD, Vijaykrishna D, Bahl J, Lycett SJ, Worobey M, Pybus OG, Ma SK, Cheung CL, Raghwani J, Bhatt S, Peiris JSM, Guan Y, Rambaut A (2009) Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature 459: 1122-1125.
  • Trifinopoulos J, Nguyen LT, von Haeseler A, Minh BQ (2016) W-IQ-TREE: a fast online phylogenetic tool for maximum likelihood analysis. Nucleic Acids Res 44: W232-W235. doi: 10.1093/nar/gkw256

Help and documentation

The BEAST website: http://beast.community

Tutorials: http://beast.community/tutorials

Frequently asked questions: http://beast.community/faq