How to cite the AmtDB?
How often is the database updated?
What kind of samples can I find in the database, where do they come from?
I have found some bugs/errors/wrong or dubious values!
What can I do with the data you provide?
Can you add my samples into AmtDB?
I did not find the sample(s) I was looking for?!
What technologies does this website use?
- Backend is written in Python 3:
- Server: nginx, Gunicorn
- Database: MariaDB
- Styling, UI: custom, Bootstrap 4, jQueryUI, Select2, DataTables, noUIslider, tippy.js, CSS-Element-Queries, PACE page load indicator
- Icons: FontAwesome, Ionicons
- Maps: Leaflet and some plugins: Leaflet.Control.FullScreen, Leaflet.EdgeBuffer, Leaflet.Basemaps, Leaflet.markercluster, leaflet-providers, Leaflet.EasyButton
What bioinformatics pipeline do you use to get the consensus mtDNA sequences?
Here you can find the documentation for metadata provided and the mitochondrial genome sequences stored in the database.
Metadata variables description
- id - primary identifier of the sample
- id_alt - secondary identifier(s) of the sample (in case it was also published under other name(s))
- country - country of origin
- continent - continent of origin
- geo_group - more general geographic group of the sample. So far we use these geo_groups: Altai, Anatolia, Balkans, Baltic, British Isles, Caucasus, Iberia, Middle East, Near East, Pannonia, Pontic steppe, Scandinavia, Siberia, central Europe, eastern Africa, southern Europe, western Europe.
- culture - archaeological classification of the sample
- epoch - rough time/culture identifier. Categories used: Aurignacian, Bronze Age, Copper Age, Epigravettian, Epipaleolithic, Gravettian, Iron Age, Mesolithic, Neolithic, and Upper Paleolithic.
- group - we use these acronyms to classify samples into different groups. They are generally based on culture, ethnicity, epoch, geo_group or any combination thereof.
If the acronym has 3 letters, then it is based predominantly on culture.
If the acronym has 4+ letters, then it contains information about the epoch and location of the sample, and its meaning can be decoded as follows:
XXYY, where XX = epoch (BA - Bronze Age, CA - Copper Age, EBA - Early Bronze Age, IA - Iron Age, LNE - Late Neolithic, MA - Middle Ages, ME - Mesolithic, NE - Neolithic) and YY = location. If YY is written in capital letters, it is based on geographic location (AF - Africa, BA - Balkans, BI - British Isles, CA - Caucasus, IB - Iberia, ME - Middle East, NE - Near East, SC - Scandinavia); if it ends with a lower-case letter, it is based on the country of origin (Cz - Czech Republic, Ge - Germany, Gr - Greece, Hu - Hungary, It - Italy, Nl - Netherlands, Pl - Poland, Ru - Russia, Uk - Ukraine).
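The decoding rule for 4+ letter acronyms can be sketched in a few lines of Python. This is only an illustration, not code used by AmtDB: the function name decode_group is hypothetical, the sample acronyms in the usage lines are made up for the example, and the sketch assumes that the location code is always the last two characters of the acronym.

```python
# Lookup tables copied from the description of the "group" variable.
EPOCHS = {
    "BA": "Bronze Age", "CA": "Copper Age", "EBA": "Early Bronze Age",
    "IA": "Iron Age", "LNE": "Late Neolithic", "MA": "Middle Ages",
    "ME": "Mesolithic", "NE": "Neolithic",
}
GEO_LOCATIONS = {
    "AF": "Africa", "BA": "Balkans", "BI": "British Isles", "CA": "Caucasus",
    "IB": "Iberia", "ME": "Middle East", "NE": "Near East", "SC": "Scandinavia",
}
COUNTRIES = {
    "Cz": "Czech Republic", "Ge": "Germany", "Gr": "Greece", "Hu": "Hungary",
    "It": "Italy", "Nl": "Netherlands", "Pl": "Poland", "Ru": "Russia",
    "Uk": "Ukraine",
}


def decode_group(acronym):
    """Return (epoch, location) for a 4+ letter acronym, else None.

    Assumption: the last two characters are the location code and
    everything before them is the epoch code.
    """
    if len(acronym) < 4:
        return None  # 3-letter acronyms are culture-based, nothing to decode
    epoch_code, location_code = acronym[:-2], acronym[-2:]
    # All capitals -> geographic location; trailing lower case -> country.
    table = GEO_LOCATIONS if location_code.isupper() else COUNTRIES
    if epoch_code not in EPOCHS or location_code not in table:
        return None
    return EPOCHS[epoch_code], table[location_code]


# Made-up acronyms, used only to exercise the rule.
print(decode_group("BAHu"))   # epoch BA + country Hu
print(decode_group("EBABA"))  # epoch EBA + geographic location BA
print(decode_group("CWC"))    # 3 letters: culture-based
```

Splitting from the right keeps the two-letter and three-letter epoch codes unambiguous, because only the epoch part varies in length.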
- comment - place for general comments, can contain uncertainties, additional sample info, relationship between samples, etc.
- latitude - geographical coordinates, decimal degrees
- longitude - geographical coordinates, decimal degrees
- sex - male (M)/female (F)/unknown (U), ideally based on the amount of Y chromosomal DNA in the sample. Note that we do not distinguish between sex assignments based on genetic, bone anthropology, or archaeological evidence.
- site - archaeological site where the sample was excavated
- site_detail - more details about the site, where available
- mt_hg - mitochondrial haplogroup
- ychr_hg - Y chromosomal haplogroup, where available
- ychr_snps - Y chromosomal SNPs, where available
The following variables all concern the dating of the sample. This kind of information varies considerably between samples in its completeness and available form. We strive to keep as much information in the database as possible and to provide parsed data for quick and comfortable use. In general, we use calibrated BCE or CE values wherever possible. If the sample has radiocarbon dating (14C), we provide this information as well.
- year_from - lower limit of sample age (of 95.4% probability interval for 14C samples), if starting with '-', then it refers to BCE, positive values refer to CE
- year_to - upper limit of sample age (of 95.4% probability interval for 14C samples), if starting with '-', then it refers to BCE, positive values refer to CE
- date_detail - string with information about the sample age, taken directly from the source publication
- bp - uncalibrated age for 14C samples, number of years before 1950
- c14_lab_code - for 14C samples, their radiocarbon laboratory code
- reference_name - source reference of the sample
- reference_link - DOI based link of the reference
- data_link - GenBank, ENA, SRA, or other depository link, where the sequence is available
- c14_sample_tag - 1 if sample was radiocarbon dated, 0 otherwise
- c14_layer_tag - 1 if another sample in the same stratum was radiocarbon dated, 0 otherwise. A sample can't have '1' in both c14_sample_tag AND c14_layer_tag. If both tags are equal to 0, then the sample was dated by other methods only, most probably according to the archaeological culture
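As a minimal sketch of how the dating variables above could be interpreted in a script, the following Python snippet turns signed years into BCE/CE labels and derives a dating category from the two 14C tags. The function names and the example row are hypothetical, and the sketch assumes the values arrive as strings, as they would from a CSV export.

```python
def format_year(year):
    """Render a signed year: negative values are BCE, others are CE."""
    return f"{-year} BCE" if year < 0 else f"{year} CE"


def dating_method(c14_sample_tag, c14_layer_tag):
    """Classify how a sample was dated from its two 14C tags."""
    if c14_sample_tag == 1 and c14_layer_tag == 1:
        # The documentation states that both tags cannot be 1 at once.
        raise ValueError("c14_sample_tag and c14_layer_tag cannot both be 1")
    if c14_sample_tag == 1:
        return "sample radiocarbon dated"
    if c14_layer_tag == 1:
        return "another sample in the same stratum radiocarbon dated"
    return "dated by other methods only"


def describe_dating(row):
    """Summarise the age range and dating method of one metadata row."""
    year_from = int(row["year_from"])
    year_to = int(row["year_to"])
    method = dating_method(int(row["c14_sample_tag"]), int(row["c14_layer_tag"]))
    return f"{format_year(year_from)} to {format_year(year_to)} ({method})"


# Hypothetical row, not taken from the database.
row = {"year_from": "-2880", "year_to": "-2630",
       "c14_sample_tag": "1", "c14_layer_tag": "0"}
print(describe_dating(row))
```

Raising an error for the forbidden tag combination makes inconsistent rows visible instead of silently assigning them to one category.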
mtDNA sequence reconstruction
Whenever it is possible to download the full mtDNA sequence from GenBank or other repository, we prefer this option. When only BAM/SAM files are provided (usually in ENA or SRA archives), we will use the following pipeline:
BWA software package version 0.7.8 is used to map merged reads as single-end reads against the revised Cambridge Reference Sequence (rCRS) [1, 2] (GenBank: NC_012920), with the non-default parameters -l 16500 -n 0.01 -o 2 -t 2. The ratio of reads mapping to Y and X chromosomes (Ry) (with mapping quality greater than 30) is calculated to assign molecular sex of individuals sequenced on the Illumina platform.
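The Ry calculation can be illustrated with a short Python sketch. It is not the script used in the pipeline: the function name is hypothetical, and the default thresholds (0.016 and 0.075) and the 95% confidence interval are the values commonly associated with the method of Skoglund et al. (2013); treat them as assumptions and check them against that publication before relying on them.

```python
import math


def ry_sex(n_x, n_y, female_max=0.016, male_min=0.075, z=1.96):
    """Estimate Ry = n_y / (n_x + n_y) and assign M, F or U.

    n_x, n_y: counts of reads (already filtered for mapping quality)
    that map to the X and Y chromosomes.
    female_max, male_min, z: assumed thresholds and confidence level.
    """
    total = n_x + n_y
    if total == 0:
        raise ValueError("no reads mapped to the X or Y chromosome")
    ry = n_y / total
    # Normal-approximation confidence interval around the proportion.
    se = math.sqrt(ry * (1.0 - ry) / total)
    lower, upper = ry - z * se, ry + z * se
    if upper < female_max:
        sex = "F"
    elif lower > male_min:
        sex = "M"
    else:
        sex = "U"  # interval overlaps the undecided range
    return ry, sex


# Invented read counts, for illustration only.
for n_x, n_y in [(9950, 50), (9000, 1000), (96, 4)]:
    ry, sex = ry_sex(n_x, n_y)
    print(f"X={n_x} Y={n_y} Ry={ry:.3f} sex={sex}")
```

Using the interval bounds rather than the point estimate means that samples with very few sex-chromosome reads fall back to U instead of receiving an overconfident assignment.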
For sequences generated on the PGM Ion Torrent platform, the scripts fastx_barcode_splitter.pl and fastx_trimmer (from the FASTX-Toolkit) are used to demultiplex the reads by barcode, using a one-mismatch threshold. Cutadapt v1.8.1 is then used to remove long (-M 110), short (-m 35), and low-quality (-q 20) sequences. The filtered reads are analyzed with FastQC v0.11.3 using previously described options. The sequences are mapped against the rCRS using TMAP v3.4.1. To collapse duplicate sequence reads with identical start and end coordinates (for both PGM and Illumina sequence data), we use the FilterUniqueSAMCons.py script. Misincorporation patterns are assessed with mapDamage v2.0.5. Consensus sequences are built using ANGSD v0.910. We accept only reads with a minimum mapping quality of 30, a minimum base quality of 20, and a minimum coverage of 3. Where necessary, mitochondrial haplogroups (mt hgs) are assigned for each individual with HAPLOFIND, PhyloTree (phylogenetic tree build 17), and Mitomaster.
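To make the acceptance thresholds concrete, here is a simplified Python sketch of a majority-vote consensus call that applies a minimum mapping quality of 30, a minimum base quality of 20, and a minimum coverage of 3. It is not ANGSD and does not reproduce its algorithm; the pileup data structure and the function name are assumptions made purely for illustration.

```python
from collections import Counter


def call_consensus(pileup, min_mapping_quality=30,
                   min_base_quality=20, min_coverage=3):
    """Call one consensus base per position by majority vote.

    pileup: list of positions; each position is a list of
    (base, base_quality, mapping_quality) tuples, one per covering read.
    Positions with too few accepted bases are reported as 'N'.
    """
    consensus = []
    for column in pileup:
        accepted = [
            base for base, base_q, map_q in column
            if map_q >= min_mapping_quality and base_q >= min_base_quality
        ]
        if len(accepted) < min_coverage:
            consensus.append("N")  # not enough reliable evidence
        else:
            consensus.append(Counter(accepted).most_common(1)[0][0])
    return "".join(consensus)


# Three invented positions.
pileup = [
    [("A", 30, 60), ("A", 35, 60), ("A", 25, 37)],                # all accepted
    [("C", 30, 60), ("C", 10, 60), ("C", 30, 60)],                # one low base quality
    [("G", 30, 60), ("G", 30, 60), ("G", 30, 20), ("T", 30, 60)],  # one low mapping quality
]
print(call_consensus(pileup))
```

In the second position only two bases survive the base-quality filter, so the sketch emits N rather than a call supported by fewer than three reads.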
- Anderson, S. et al. Sequence and organization of the human mitochondrial genome. Nature 290, 457–465 (1981).
- Andrews, R. M. et al. Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat. Genet. 23, 147 (1999).
- Skoglund, P. et al. Accurate sex identification of ancient human remains using DNA shotgun sequencing. J. Archaeol. Sci. 40, 4477–4482 (2013).
- Chyleński, M. et al. Late Danubian mitochondrial genomes shed light into the Neolithisation of Central Europe in the 5th millennium BC. BMC Evol. Biol. 17, 80 (2017).
- Meyer, M. and Kircher, M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb. Protoc. 2010, t5448 (2010).
- Vianello, D. et al. HAPLOFIND: a new method for high-throughput mtDNA haplogroup assignment. Hum. Mutat. 34, 1189–1194 (2013).
- van Oven, M. and Kayser, M. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 30, E386–E394 (2009).
- Lott, M. T. et al. mtDNA Variation and Analysis Using Mitomap and Mitomaster. Curr. Protoc. Bioinforma. 44, 1.23.1-26 (2013).
Here we provide links to other resources and tools for mtDNA and ancient DNA studies.
- Krause et al. 2010. The complete mitochondrial DNA genome of an unknown hominin from southern Siberia. Nature, 464(7290), 894-897. (link, data)
- Keller et al. 2012. New insights into the Tyrolean Iceman's origin and phenotype as inferred by whole-genome sequencing. Nature Communications, 3(1), ?. (link, data)
- Raghavan et al. 2014. Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature, 505(7481), 87-91. (link, data)
- Olalde et al. 2014. Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European. Nature, 507(7491), 225-228. (link, data)
- Skoglund et al. 2014. Genomic Diversity and Admixture Differs for Stone-Age Scandinavian Foragers and Farmers. Science, 344(6185), 747-750. (link, data)
- Lazaridis et al. 2014. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature, 513(7518), 409-413. (link, data)
- Gamba et al. 2014. Genome flux and stasis in a five millennium transect of European prehistory. Nature Communications, 5(1), ?. (link, data)
- Fu et al. 2014. Genome sequence of a 45,000-year-old modern human from western Siberia. Nature, 514(7523), 445-449. (link, data)
- Haak et al. 2015. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature, 522(7555), 207-211. (link, data)
- Allentoft et al. 2015. Population genomics of Bronze Age Eurasia. Nature, 522(7555), 167-172. (link, data)
- Olalde et al. 2015. A Common Genetic Origin for Early Farmers from Mediterranean Cardial and Central European LBK Cultures. Molecular Biology and Evolution, ?(?), msv181. (link, data)
- Günther et al. 2015. Ancient genomes link early farmers from Atapuerca in Spain to modern-day Basques. Proceedings of the National Academy of Sciences, 112(38), 11917-11922. (link, data)
- Llorente et al. 2015. Ancient Ethiopian genome reveals extensive Eurasian admixture in Eastern Africa. Science, 350(6262), 820-822. (link, data)
- Jones et al. 2015. Upper Palaeolithic genomes reveal deep roots of modern Eurasians. Nature Communications, 6(1), ?. (link, data)
- Mathieson et al. 2015. Genome-wide patterns of selection in 230 ancient Eurasians. Nature, 528(7583), 499-503. (link, data)
- Cassidy et al. 2016. Neolithic and Bronze Age migration to Ireland and establishment of the insular Atlantic genome. Proceedings of the National Academy of Sciences, 113(2), 368-373. (link, data)
- Omrak et al. 2016. Genomic Evidence Establishes Anatolia as the Source of the European Neolithic Gene Pool. Current Biology, 26(2), 270-275. (link, data)
- Fu et al. 2016. The genetic history of Ice Age Europe. Nature, 534(7606), 200-205. (link, data)
- Hofmanova et al. 2016. Early farmers from across Europe directly descended from Neolithic Aegeans. Proceedings of the National Academy of Sciences, 113(25), 6886-6891. (link, data)
- Broushaki et al. 2016. Early Neolithic genomes from the eastern Fertile Crescent. Science, 353(6298), 499-503. (link, data)
- Lazaridis et al. 2016. Genomic insights into the origin of farming in the ancient Near East. Nature, 536(7617), 419-424. (link, data)
- Kilinc et al. 2016. The Demographic Development of the First Farmers in Anatolia. Current Biology, 26(19), 2659-2666. (link, data)
- Juras et al. 2017a. Investigating kinship of Neolithic post-LBK human remains from Krusza Zamkowa, Poland using ancient DNA. Forensic Science International: Genetics, 26(?), 30-39. (link, data)
- Jones et al. 2017. The Neolithic Transition in the Baltic Was Not Driven by Admixture with Early European Farmers. Current Biology, 27(4), 576-582. (link, data)
- Juras et al. 2017b. Diverse origin of mitochondrial lineages in Iron Age Black Sea Scythians. Scientific Reports, 7(1), ?. (link, data)
- Chyleński et al. 2017. Late Danubian mitochondrial genomes shed light into the Neolithisation of Central Europe in the 5th millennium BC. BMC Evolutionary Biology, 17(1), ?. (link, data)
- Lazaridis et al. 2017. Genetic origins of the Minoans and Mycenaeans. Nature, ?(?), ?. (link, data)
- Lipson et al. 2017. Parallel palaeogenomic transects reveal complex genetic history of early European farmers. Nature, 551(7680), 368-372. (link, data)
- Stolarek et al. 2018. A mosaic genetic structure of the human population living in the South Baltic region during the Iron Age. Scientific Reports, 8(1), ?. (link, data)
- Mathieson et al. 2018. The genomic history of southeastern Europe. Nature, 555(7695), 197-203. (link, data)
- Olalde et al. 2018. The Beaker phenomenon and the genomic transformation of northwest Europe. Nature, 555(7695), 190-196. (link, data)
Phylotree - A comprehensive phylogenetic tree of worldwide human mitochondrial DNA variation
Mitomap - A human mitochondrial genome database, a compendium of polymorphisms and mutations in human mitochondrial DNA, includes MITOMASTER module for mtDNA sequence analysis
Haplofind - Fast and easy mitochondrial DNA haplogroup assignment
Haplogrep - Tool for haplogroup assignment and mutation report
Haplosearch - Fast transformation between haplotype format and nucleotide sequence data
Online Ancient Genome Repository - Ancient DNA repository for the Australian Centre for Ancient DNA, University of Adelaide
Ancestral Journeys - Ancient DNA database (currently offline)
GenBank - Complete ancient mitochondrial genomes in NCBI GenBank database
New samples and minor tweaks
- added the rest of the samples from Allentoft et al. 2015 (29 samples)
- all Corded Ware Culture (CWC) samples have their epoch changed to Neolithic. This may not be the most conservative way of presenting the CWC samples, but it is at least consistent.
Added a filter based on sequence quality of the samples
- in this release we have added the possibility to select/filter samples based on sequence_source and average coverage (sequencing depth)
- please note that more options to filter samples based on mtDNA sequence quality are planned for future releases
- variable sequence_source added, which can have one of these three labels:
- fasta - for mt sequences downloaded directly from other databases (most likely from GenBank)
- bam - for mt sequences created from raw data (BAM/SAM files, most likely from ENA or SRA archives) using our pipeline described in the Docs
- reconstructed - for mt sequences reconstructed from published haplotypes
- variable avg_coverage added, a basic sequence quality indicator, computed for bam samples (see above) as (total length of reads) / (length of consensus sequence)
- advanced search options updated to reflect the new variables
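The avg_coverage definition and the new filter can be sketched as follows. This is a hedged illustration rather than the site's implementation: the function names and the three sample records are invented, and the rCRS length is used as a stand-in for the consensus length.

```python
RCRS_LENGTH = 16569  # length of the rCRS reference (NC_012920) in base pairs


def avg_coverage(read_lengths, consensus_length):
    """(total length of reads) / (length of consensus sequence)."""
    if consensus_length <= 0:
        raise ValueError("consensus_length must be positive")
    return sum(read_lengths) / consensus_length


def filter_samples(samples, sources, min_avg_coverage=None):
    """Keep samples with an allowed sequence_source and enough coverage.

    Samples without an avg_coverage value (for example fasta samples)
    are dropped whenever a coverage threshold is requested.
    """
    kept = []
    for sample in samples:
        if sample["sequence_source"] not in sources:
            continue
        if min_avg_coverage is not None:
            coverage = sample.get("avg_coverage")
            if coverage is None or coverage < min_avg_coverage:
                continue
        kept.append(sample)
    return kept


# Invented records: S1 has 5000 reads of 100 bp each.
samples = [
    {"id": "S1", "sequence_source": "bam",
     "avg_coverage": avg_coverage([100] * 5000, RCRS_LENGTH)},
    {"id": "S2", "sequence_source": "bam", "avg_coverage": 2.5},
    {"id": "S3", "sequence_source": "fasta", "avg_coverage": None},
]
selected = filter_samples(samples, sources={"bam"}, min_avg_coverage=10.0)
print([sample["id"] for sample in selected])
```

Because avg_coverage is only computed for bam samples, any coverage threshold implicitly restricts the result to that source, which the sketch makes explicit by skipping records with a missing value.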
AmtDB opened for public access!
- 1107 samples from 3 continents and 36 countries
- 887 full mitochondrial genomes in FASTA format
- sequences (FASTA) and metadata (CSV) ready for download
- maps and displaying search results
- database documentation
- AmtDB changes log - ready in basic form