AmtDB

FAQ

How to cite the AmtDB?

If you use this database, please consider citing our paper: AmtDB: a database of ancient human mitochondrial genomes

How often is the database updated?

As of 2018, the database is in its start-up phase, so updates are frequent as we keep adding samples. After this initial phase (2019 onwards), we plan two major updates per year, with additional smaller updates if needed. You can always check the database log.

What kind of samples can I find in the database, where do they come from?

Our database provides full mitochondrial sequences and descriptive metadata for samples from prehistoric and early historic populations. Generally speaking, our samples date to the first millennium CE (Common Era, AD) or older. The vast majority of our samples come from prehistoric populations of the Neolithic and Bronze Age. Most samples originate in Europe, but we also have samples from central Asia, the Middle East, the Near East, and Africa. In the future, we will add more non-European samples as more of them are analyzed and published.

I have found some bugs/errors/wrong or dubious values!

Good job! Please report any bugs or errors using one of the e-mail addresses on the contact page. Thank you!

What can I do with the data you provide?

We provide the complete mitochondrial genome sequences in FASTA format, together with sample description metadata. Our data are provided solely for research into the life histories of ancient human populations and their relations and connections with modern populations. Please note that we are not responsible for any use of the provided data.

Can you add my samples into AmtDB?

Sure, if you have full ancient human mtDNA sequences! Please contact us by e-mail.

I did not find the sample(s) I was looking for?!

We are constantly adding new samples to the database, but if you want specific samples added, please let us know! Ideally, those samples have already been published and their sequences deposited in GenBank, ENA, SRA, or a similar database.

What technologies does this website use?

What bioinformatics pipeline do you use to get the consensus mtDNA sequences?

This information can be found in the Docs section.

AmtDB documentation

Here you can find the documentation for the provided metadata and for the mitochondrial genome sequences stored in the database.

Metadata variables description

  • id - primary identifier of the sample
  • id_alt - secondary identifier(s) of the sample (in case it was also published under other name(s))
  • country - country of origin
  • continent - continent of origin
  • geo_group - more general geographic group of the sample. So far we use these geo_groups: Altai, Anatolia, Balkans, Baltic, British Isles, Caucasus, Iberia, Middle East, Near East, Pannonia, Pontic steppe, Scandinavia, Siberia, central Europe, eastern Africa, southern Europe, western Europe.
  • culture - archaeological classification of the sample
  • epoch - rough time/culture identifier. Categories used: Aurignacian, Bronze Age, Copper Age, Epigravettian, Epipaleolithic, Gravettian, Iron Age, Mesolithic, Neolithic, and Upper Paleolithic.
  • group - we use these acronyms to classify samples into different groups. They are generally based on culture, ethnicity, epoch, geo_group, or any combination thereof. If the acronym has 3 letters, then it is based predominantly on culture. If the acronym has 4+ letters, then it contains information about the epoch and location of the sample, and its meaning can be decoded as follows:
    XXYY, where XX = epoch (BA - Bronze Age, CA - Copper Age, EBA - Early Bronze Age, IA - Iron Age, LNE - Late Neolithic, MA - Middle Ages, ME - Mesolithic, NE - Neolithic), and YY = location, if used in capital letters - based on geo_location (AF - Africa, BA - Balkans, BI - British Isles, CA - Caucasus, IB - Iberia, ME - Middle East, NE - Near East, SC - Scandinavia); or if ending with lower case letter - based on the country of origin (Cz - Czech Republic, Ge - Germany, Gr - Greece, Hu - Hungary, It - Italy, Nl - Netherlands, Pl - Poland, Ru - Russia, Uk - Ukraine)
  • comment - place for general comments, can contain uncertainties, additional sample info, relationship between samples, etc.
  • latitude - geographical coordinates, decimal degrees
  • longitude - geographical coordinates, decimal degrees
  • sex - male (M)/female (F)/unknown (U), ideally based on the amount of Y chromosomal DNA in the sample; however, we do not distinguish between sex determined from genetic, bone anthropology, or archaeological evidence.
  • site - archaeological site where the sample was excavated
  • site_detail - more details about the site, where available
  • mt_hg - mitochondrial haplogroup
  • ychr_hg - Y chromosomal haplogroup, where available
  • ychr_snps - Y chromosomal SNPs, where available
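The decoding rule for the group acronyms described above can be sketched in a few lines of Python. This is only an illustration of the documented rule, not code used by AmtDB: the function name decode_group and the returned dictionary layout are assumptions, and the acronyms in the usage note are made-up examples that may not occur in the database.

```python
# Lookup tables copied from the description of the "group" variable.
EPOCHS = {
    "BA": "Bronze Age", "CA": "Copper Age", "EBA": "Early Bronze Age",
    "IA": "Iron Age", "LNE": "Late Neolithic", "MA": "Middle Ages",
    "ME": "Mesolithic", "NE": "Neolithic",
}
REGIONS = {  # location codes written in capital letters
    "AF": "Africa", "BA": "Balkans", "BI": "British Isles", "CA": "Caucasus",
    "IB": "Iberia", "ME": "Middle East", "NE": "Near East", "SC": "Scandinavia",
}
COUNTRIES = {  # location codes ending with a lower case letter
    "Cz": "Czech Republic", "Ge": "Germany", "Gr": "Greece", "Hu": "Hungary",
    "It": "Italy", "Nl": "Netherlands", "Pl": "Poland", "Ru": "Russia",
    "Uk": "Ukraine",
}


def decode_group(acronym):
    """Decode a group acronym (hypothetical helper, not part of AmtDB)."""
    if len(acronym) < 4:
        # 3-letter acronyms are predominantly culture-based; nothing to decode.
        return {"basis": "culture", "epoch": None, "location": None}
    # Assumption: the location code is always the last two characters.
    epoch_code, location_code = acronym[:-2], acronym[-2:]
    if location_code[-1].islower():
        basis, location = "country", COUNTRIES.get(location_code)
    else:
        basis, location = "region", REGIONS.get(location_code)
    return {"basis": basis, "epoch": EPOCHS.get(epoch_code), "location": location}
```

For example, decode_group("NEHu") yields the epoch "Neolithic" and the country "Hungary", while decode_group("BABI") yields "Bronze Age" and "British Isles". Unknown codes are returned as None rather than raising an error.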

The following variables all concern the dating of the sample. This kind of information varies considerably between samples in its completeness and available form. We strive to keep as much information in the database as possible and to provide parsed data for quick and comfortable use. In general, we use calibrated BCE or CE values wherever possible. If the sample has radiocarbon dating (14C), we provide this information as well.

  • year_from - lower limit of sample age (of 95.4% probability interval for 14C samples), if starting with '-', then it refers to BCE, positive values refer to CE
  • year_to - upper limit of sample age (of 95.4% probability interval for 14C samples), if starting with '-', then it refers to BCE, positive values refer to CE
  • date_detail - string with information about the sample age, taken directly from the source publication
  • bp - uncalibrated age for 14C samples, numbers of years before 1950
  • c14_lab_code - for 14C samples, their radiocarbon laboratory code
  • reference_name - source reference of the sample
  • reference_link - DOI based link of the reference
  • data_link - GenBank, ENA, SRA, or other depository link, where the sequence is available
  • c14_sample_tag - 1 if sample was radiocarbon dated, 0 otherwise
  • c14_layer_tag - 1 if another sample in the same stratum was radiocarbon dated, 0 otherwise. The sample can't have '1's in both c14_sample_tag AND c14_layer_tag. If both tags are equal to 0, then the sample was dated by other methods only, most probably according to the archaeological culture
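As an illustration of how the dating variables above might be interpreted in a script, here is a minimal Python sketch. The function names format_year and dating_basis and the returned wording are assumptions made for this example; only the sign convention and the tag rules come from the descriptions above.

```python
def format_year(year):
    """Render a year_from / year_to value as a BCE or CE label.

    Negative values refer to BCE, positive values to CE. A value of 0 is
    not covered by the documentation; this sketch labels it as CE.
    """
    return f"{-year} BCE" if year < 0 else f"{year} CE"


def dating_basis(c14_sample_tag, c14_layer_tag):
    """Interpret the two radiocarbon tags of one sample."""
    if c14_sample_tag == 1 and c14_layer_tag == 1:
        # The documentation rules this combination out.
        raise ValueError("c14_sample_tag and c14_layer_tag cannot both be 1")
    if c14_sample_tag == 1:
        return "sample radiocarbon dated"
    if c14_layer_tag == 1:
        return "another sample in the same stratum radiocarbon dated"
    return "dated by other methods only"
```

For instance, a record with year_from = -5300 and year_to = -5000 would be shown as "5300 BCE" to "5000 BCE".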

mtDNA sequence reconstruction

Whenever it is possible to download the full mtDNA sequence from GenBank or another repository, we prefer this option. When only BAM/SAM files are provided (usually in the ENA or SRA archives), we use the following pipeline:

BWA software package version 0.7.8 is used to map merged reads as single-end reads against the revised Cambridge Reference Sequence (rCRS)[1, 2] (GenBank: NC_012920), with the non-default parameters -l 16500 -n 0.01 -o 2 -t 2. The ratio of reads mapping to Y and X chromosomes (Ry) (with mapping quality greater than 30) is calculated to assign molecular sex of individuals sequenced on the Illumina platform[3].
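The Ry statistic mentioned above can be illustrated with a short Python sketch. It assumes the usual definition, Ry = reads mapped to Y / (reads mapped to X + reads mapped to Y), with a normal-approximation 95% confidence interval. The function name ry_sex and the default cut-offs (0.016 and 0.075) are assumptions: the cut-offs are the values commonly quoted from [3] and should be checked against that paper before use.

```python
import math


def ry_sex(n_x, n_y, female_max=0.016, male_min=0.075):
    """Estimate Ry and assign sex from X and Y read counts (illustrative).

    n_x, n_y: numbers of reads (mapping quality > 30) on chrX and chrY.
    female_max, male_min: assumed cut-offs, to be verified against [3].
    Returns (ry, (ci_low, ci_high), sex) with sex in {"M", "F", "U"}.
    """
    total = n_x + n_y
    if total == 0:
        raise ValueError("no reads mapped to the X or Y chromosome")
    ry = n_y / total
    # Standard error of a proportion and its 95% confidence interval.
    se = math.sqrt(ry * (1.0 - ry) / total)
    ci_low, ci_high = ry - 1.96 * se, ry + 1.96 * se
    if ci_high < female_max:
        sex = "F"
    elif ci_low > male_min:
        sex = "M"
    else:
        sex = "U"  # not enough evidence either way
    return ry, (ci_low, ci_high), sex
```

With few reads the confidence interval becomes wide and the sketch returns "U", which matches the unknown category of the sex variable.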

For sequences generated on the Ion Torrent PGM, the scripts fastx_barcode_splitter.pl and fastx_trimmer from the FASTX-Toolkit are used to demultiplex the reads by barcode, allowing one mismatch. Cutadapt v1.8.1 is then used to remove long (-M 110), short (-m 35), and low-quality (-q 20) sequences. The filtered reads are analyzed with FastQC v0.11.3 using the options described previously in [4]. The sequences are mapped against the rCRS using TMAP v3.4.1. To collapse duplicate reads with identical start and end coordinates (for both PGM and Illumina data), we use the FilterUniqueSAMCons.py script[5]. Misincorporation patterns are assessed with mapDamage v2.0.5. Consensus sequences are built using ANGSD v0.910. We accept only reads with a mapping score of 30, a minimum base quality of 20, and a minimum coverage of 3, as in [4]. Where necessary, mitochondrial haplogroups (mt hgs) are assigned to each individual using HAPLOFIND[6], PhyloTree build 17[7], and Mitomaster[8].
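To make the acceptance thresholds above concrete, the following Python sketch applies them to the bases observed at a single position. It is not how ANGSD builds its consensus; it is a simplified majority-call illustration. The function name call_consensus, the input layout, and the reading of the mapping score threshold as "at least 30" are assumptions.

```python
from collections import Counter


def call_consensus(observations, min_mapq=30, min_baseq=20, min_depth=3):
    """Call one consensus base from per-read observations (illustrative).

    observations: iterable of (base, base_quality, mapping_quality) tuples
    for the reads covering one reference position.
    Returns the most frequent accepted base, or "N" if fewer than
    min_depth observations pass the quality filters.
    """
    accepted = [
        base
        for base, baseq, mapq in observations
        if mapq >= min_mapq and baseq >= min_baseq
    ]
    if len(accepted) < min_depth:
        return "N"  # coverage too low for a confident call
    # most_common(1) returns the base with the highest count.
    return Counter(accepted).most_common(1)[0][0]
```

A position covered by three good reads all showing "A" is called "A", whereas a position where only two reads pass the filters is reported as "N".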

References
  1. Anderson, S. et al. Sequence and organization of the human mitochondrial genome. Nature 290, 457–465 (1981).
  2. Andrews, R. M. et al. Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat. Genet. 23, 147 (1999).
  3. Skoglund, P. et al. Accurate sex identification of ancient human remains using DNA shotgun sequencing. J. Archaeol. Sci. 40, 4477–4482 (2013).
  4. Chyleński, M. et al. Late Danubian mitochondrial genomes shed light into the Neolithisation of Central Europe in the 5th millennium BC. BMC Evol. Biol. 17, 80 (2017).
  5. Meyer, M. and Kircher, M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb. Protoc. 2010, pdb.prot5448 (2010).
  6. Vianello, D. et al. HAPLOFIND: a new method for high-throughput mtDNA haplogroup assignment. Hum. Mutat. 34, 1189–1194 (2013).
  7. van Oven, M. and Kayser, M. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 30, E386–E394 (2009).
  8. Lott, M. T. et al. mtDNA Variation and Analysis Using Mitomap and Mitomaster. Curr. Protoc. Bioinforma. 44, 1.23.1-26 (2013).

References

Here we provide links to other resources and tools for mtDNA and ancient DNA studies.

Articles

Krause et al. 2010. The complete mitochondrial DNA genome of an unknown hominin from southern Siberia. Nature, 464(7290), 894-897.
Keller et al. 2012. New insights into the Tyrolean Iceman's origin and phenotype as inferred by whole-genome sequencing. Nature Communications, 3(1), ?.
Raghavan et al. 2014. Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature, 505(7481), 87-91.
Olalde et al. 2014. Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European. Nature, 507(7491), 225-228.
Skoglund et al. 2014. Genomic Diversity and Admixture Differs for Stone-Age Scandinavian Foragers and Farmers. Science, 344(6185), 747-750.
Lazaridis et al. 2014. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature, 513(7518), 409-413.
Gamba et al. 2014. Genome flux and stasis in a five millennium transect of European prehistory. Nature Communications, 5(1), ?.
Fu et al. 2014. Genome sequence of a 45,000-year-old modern human from western Siberia. Nature, 514(7523), 445-449.
Haak et al. 2015. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature, 522(7555), 207-211.
Allentoft et al. 2015. Population genomics of Bronze Age Eurasia. Nature, 522(7555), 167-172.
Olalde et al. 2015. A Common Genetic Origin for Early Farmers from Mediterranean Cardial and Central European LBK Cultures. Molecular Biology and Evolution, ?(?), msv181.
Günther et al. 2015. Ancient genomes link early farmers from Atapuerca in Spain to modern-day Basques. Proceedings of the National Academy of Sciences, 112(38), 11917-11922.
Llorente et al. 2015. Ancient Ethiopian genome reveals extensive Eurasian admixture in Eastern Africa. Science, 350(6262), 820-822.
Jones et al. 2015. Upper Palaeolithic genomes reveal deep roots of modern Eurasians. Nature Communications, 6(1), ?.
Mathieson et al. 2015. Genome-wide patterns of selection in 230 ancient Eurasians. Nature, 528(7583), 499-503.
Cassidy et al. 2016. Neolithic and Bronze Age migration to Ireland and establishment of the insular Atlantic genome. Proceedings of the National Academy of Sciences, 113(2), 368-373.
Omrak et al. 2016. Genomic Evidence Establishes Anatolia as the Source of the European Neolithic Gene Pool. Current Biology, 26(2), 270-275.
Fu et al. 2016. The genetic history of Ice Age Europe. Nature, 534(7606), 200-205.
Hofmanova et al. 2016. Early farmers from across Europe directly descended from Neolithic Aegeans. Proceedings of the National Academy of Sciences, 113(25), 6886-6891.
Broushaki et al. 2016. Early Neolithic genomes from the eastern Fertile Crescent. Science, 353(6298), 499-503.
Lazaridis et al. 2016. Genomic insights into the origin of farming in the ancient Near East. Nature, 536(7617), 419-424.
Kilinc et al. 2016. The Demographic Development of the First Farmers in Anatolia. Current Biology, 26(19), 2659-2666.
Juras et al. 2017a. Investigating kinship of Neolithic post-LBK human remains from Krusza Zamkowa, Poland using ancient DNA. Forensic Science International: Genetics, 26(?), 30-39.
Jones et al. 2017. The Neolithic Transition in the Baltic Was Not Driven by Admixture with Early European Farmers. Current Biology, 27(4), 576-582.
Juras et al. 2017b. Diverse origin of mitochondrial lineages in Iron Age Black Sea Scythians. Scientific Reports, 7(1), ?.
Chyleński et al. 2017. Late Danubian mitochondrial genomes shed light into the Neolithisation of Central Europe in the 5th millennium BC. BMC Evolutionary Biology, 17(1), ?.
Lazaridis et al. 2017. Genetic origins of the Minoans and Mycenaeans. Nature, ?(?), ?.
Lipson et al. 2017. Parallel palaeogenomic transects reveal complex genetic history of early European farmers. Nature, 551(7680), 368-372.
Stolarek et al. 2018. A mosaic genetic structure of the human population living in the South Baltic region during the Iron Age. Scientific Reports, 8(1), ?.
Mathieson et al. 2018. The genomic history of southeastern Europe. Nature, 555(7695), 197-203.
Olalde et al. 2018. The Beaker phenomenon and the genomic transformation of northwest Europe. Nature, 555(7695), 190-196.

Links

Tools

Phylotree - A comprehensive phylogenetic tree of worldwide human mitochondrial DNA variation
Mitomap - A human mitochondrial genome database, a compendium of polymorphisms and mutations in human mitochondrial DNA, includes MITOMASTER module for mtDNA sequence analysis
Haplofind - Fast and easy mitochondrial DNA haplogroup assignment
Haplogrep - Tool for haplogroup assignment and mutation report
Haplosearch - Fast transformation between haplotype format and nucleotide sequence data

Databases

Online Ancient Genome Repository - Ancient DNA repository for the Australian Centre for Ancient DNA, University of Adelaide
Ancestral Journeys - Ancient DNA database (currently offline)
GenBank - Complete ancient mitochondrial genomes in NCBI GenBank database

AmtDB log