- Backend is written in Python 3:
- Server: nginx, Gunicorn
- Database: PostgreSQL
- Frontend:
- jQuery
- Styling, UI: custom, Bootstrap 4, jQueryUI, Select2, DataTables, noUIslider, tippy.js, CSS-Element-Queries, PACE page load indicator
- Icons: FontAwesome, Ionicons
- Maps: Leaflet and some plugins: Leaflet.Control.FullScreen, Leaflet.EdgeBuffer, Leaflet.Basemaps, Leaflet.markercluster, leaflet-providers, Leaflet.EasyButton
AmtDB documentation
Here you can find the documentation for metadata provided and the mitochondrial genome sequences stored in the database.
- id - primary identifier of the sample
- id_alt - secondary identifier(s) of the sample (in case it was also published under other name(s))
- country - country of origin
- continent - continent of origin
- geo_group - more general geographic group of the sample. So far we use these geo_groups: Altai, Anatolia, Balkans, Baltic, British Isles, Caucasus, Iberia, Middle East, Near East, Pannonia, Pontic steppe, Scandinavia, Siberia, central Europe, eastern Africa, southern Europe, western Europe.
- culture - archaeological classification of the sample
- epoch - rough time/culture identifier. Categories used: Aurignacian, Bronze Age, Copper Age, Epigravettian, Epipaleolithic, Gravettian, Iron Age, Mesolithic, Neolithic, and Upper Paleolithic.
- group - we use these acronyms to classify samples into different groups. They are generally based on culture, ethnicity, epoch, geo_group or any combination thereof.
If the acronym has 3 letters, then it is based predominantly on culture.
If the acronym has 4+ letters, then it contains information about the epoch and location of the sample, and its meaning can be decoded as follows:
XXYY, where XX = epoch (BA - Bronze Age, CA - Copper Age, EBA - Early Bronze Age, IA - Iron Age, LNE - Late Neolithic, MA - Middle Ages, ME - Mesolithic, NE - Neolithic), and YY = location. If YY is written in capital letters, it is based on geo_group (AF - Africa, BA - Balkans, BI - British Isles, CA - Caucasus, IB - Iberia, ME - Middle East, NE - Near East, SC - Scandinavia); if it ends with a lower-case letter, it is based on the country of origin (Cz - Czech Republic, Ge - Germany, Gr - Greece, Hu - Hungary, It - Italy, Nl - Netherlands, Pl - Poland, Ru - Russia, Uk - Ukraine).
- comment - place for general comments, can contain uncertainties, additional sample info, relationship between samples, etc.
- latitude - geographical coordinates, decimal degrees
- longitude - geographical coordinates, decimal degrees
- sex - male (M)/female (F)/unknown (U). Ideally, sex is determined from the amount of Y chromosomal DNA in the sample, but we do not distinguish between sex assignments based on genetic, bone anthropology, or archaeological evidence.
- site - archaeological site where the sample was excavated
- site_detail - more details about the site, where available
- mt_hg - mitochondrial haplogroup
- ychr_hg - Y chromosomal haplogroup, where available
- ychr_snps - Y chromosomal SNPs, where available
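The decoding rule for 4+ letter group acronyms can be sketched in a few lines of Python. This is a minimal illustration, not the code AmtDB itself runs: the function name `decode_group` is hypothetical, the lookup tables are copied from the lists above, and the sketch assumes the location code is always the last two characters of the acronym.

```python
# Lookup tables copied from the documentation of the "group" field.
EPOCHS = {
    "BA": "Bronze Age", "CA": "Copper Age", "EBA": "Early Bronze Age",
    "IA": "Iron Age", "LNE": "Late Neolithic", "MA": "Middle Ages",
    "ME": "Mesolithic", "NE": "Neolithic",
}
GEO_GROUPS = {
    "AF": "Africa", "BA": "Balkans", "BI": "British Isles", "CA": "Caucasus",
    "IB": "Iberia", "ME": "Middle East", "NE": "Near East", "SC": "Scandinavia",
}
COUNTRIES = {
    "Cz": "Czech Republic", "Ge": "Germany", "Gr": "Greece", "Hu": "Hungary",
    "It": "Italy", "Nl": "Netherlands", "Pl": "Poland", "Ru": "Russia",
    "Uk": "Ukraine",
}


def decode_group(code):
    """Return (epoch, location) for a 4+ letter acronym, else None.

    Assumption: the location is always the last two characters, so the
    epoch is whatever precedes it.
    """
    if len(code) < 4:
        return None  # 3-letter acronyms are culture-based, nothing to decode
    epoch_code, location_code = code[:-2], code[-2:]
    epoch = EPOCHS.get(epoch_code)
    if location_code.isupper():
        location = GEO_GROUPS.get(location_code)  # capitals: geographic group
    else:
        location = COUNTRIES.get(location_code)   # lower-case ending: country
    if epoch is None or location is None:
        return None  # unknown code, leave undecoded
    return epoch, location
```

Because the position decides the meaning, codes that appear in both tables (BA, CA, ME, NE) are not ambiguous: an illustrative acronym such as NEBA would decode to Neolithic, Balkans.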
The following variables all concern the dating of the sample. This kind of information varies considerably between samples in its completeness and available form. We strive to keep as much information in the database as possible and to provide parsed data for quick and comfortable use. In general, we use calibrated BCE or CE values wherever possible. If the sample has radiocarbon dating (14C), we provide this information as well.
- year_from - lower limit of sample age (of the 95.4% probability interval for 14C samples); negative values (starting with '-') refer to BCE, positive values refer to CE
- year_to - upper limit of sample age (of the 95.4% probability interval for 14C samples); negative values (starting with '-') refer to BCE, positive values refer to CE
- date_detail - string with information about the sample age, taken directly from the source publication
- bp - uncalibrated age for 14C samples, number of years before 1950
- c14_lab_code - for 14C samples, their radiocarbon laboratory code
- reference_name - source reference of the sample
- reference_link - DOI based link of the reference
- data_link - GenBank, ENA, SRA, or other depository link, where the sequence is available
- c14_sample_tag - 1 if sample was radiocarbon dated, 0 otherwise
- c14_layer_tag - 1 if another sample in the same stratum was radiocarbon dated, 0 otherwise. A sample cannot have 1 in both c14_sample_tag and c14_layer_tag. If both tags equal 0, then the sample was dated by other methods only, most probably according to the archaeological culture.
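When working with the downloaded metadata, the sign convention of year_from/year_to and the two radiocarbon tags can be interpreted roughly as follows. This is a small sketch under stated assumptions: the helper names `format_year` and `dating_method` and the returned labels are hypothetical, not part of AmtDB.

```python
def format_year(year):
    """Render a signed year_from/year_to value as a BCE/CE label.

    Negative values are BCE, positive values are CE (as documented).
    """
    return f"{-year} BCE" if year < 0 else f"{year} CE"


def dating_method(c14_sample_tag, c14_layer_tag):
    """Classify how a sample was dated from its two 0/1 tags."""
    if c14_sample_tag == 1 and c14_layer_tag == 1:
        # The documentation rules this combination out.
        raise ValueError("c14_sample_tag and c14_layer_tag cannot both be 1")
    if c14_sample_tag == 1:
        return "radiocarbon (sample)"
    if c14_layer_tag == 1:
        return "radiocarbon (same stratum)"
    return "other methods"
```

For example, `format_year(-5300)` gives `5300 BCE`, and a sample with both tags set to 0 is reported as dated by other methods.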
In our MitoPathoTool and database entries, we use the data and descriptions of mitochondrial pathological mutations from the MITOMAP project, slightly adjusted for our needs.
- mitopatho_alleles - Allele value is composed of the position and the change relative to the revised Cambridge Reference Sequence (Andrews et al., 1999). Generally, it has the shape XXXY, where XXX is a number (the position of the allele on the mtDNA sequence) and Y is the changed sequence. It is usually an SNP (e.g. 5587C) or a small indel, where the letter d marks a deletion (7472CA, 15944d, or 16021_16022dCT, a dinucleotide deletion).
- mitopatho_positions - Position of the pathological allele on mtDNA sequence, values between 1 and 16 569.
- mitopatho_locus - Location of the pathological mutation according to functional composition of mtDNA.
It can have these values: MT-CR, MT-ND1, MT-ND2, MT-CO1, MT-CO2, MT-ATP8, MT-ATP8/6, MT-ATP6, MT-CO3, MT-ND3, MT-ND4L,
MT-ND4, MT-ND5, MT-ND6, MT-CYB, MT-TF, MT-RNR1, MT-TV, MT-RNR2, MT-TL1, MT-TI, MT-TQ, MT-NC2, MT-TM, MT-TW, MT-TA,
MT-TN, MT-TC, MT-TY, MT-TS1 precursor, MT-TS1, MT-TD, MT-TK, MT-TG, MT-TR, MT-TH, MT-TS2, MT-TL2, MT-TE, MT-TT, MT-TP.
MT-CR is the mitochondrial Control Region, the other values mark the two rRNA genes (MT-R*), 22 tRNA genes (MT-T*), and 13 protein-coding genes (rest of the codes).
- mitopatho_diseases - Short description or characterization of the pathological phenotype/disease/disorders associated with the allele.
- mitopatho_statuses - Status of the pathological allele. We keep the MITOMAP description here, so the values of this field can be: Cfrm, Reported, Conflicting reports, P.M.-possibly synergistic, Unclear, Possibly synergistic, ... The most important (and serious) result is the Cfrm (confirmed) status: it indicates that at least two independent laboratories have published reports on the pathogenicity of a specific mutation. These mutations are generally accepted by the mitochondrial research community as being pathogenic. The Reported status indicates that one or more publications have considered the mutation as possibly pathologic. Please note that several Reported statuses exist, with more details about the nature of the reported allele. The P.M. (point mutation/polymorphism) status indicates that some published reports have determined the mutation to be a non-pathogenic polymorphism.
- mitopatho_homoplasms - Homoplasmy, pure mutant mtDNAs. Possible values:
- + = True
- - = False
- nr = Not Reported
- nan = Missing data, status unknown
- mitopatho_heteroplasms - Heteroplasmy, mixture of mutant and normal mtDNAs. Possible values:
- + = True
- - = False
- nr = Not Reported
- nan = Missing data, status unknown
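The allele notation used in mitopatho_alleles can be split into its parts with a regular expression. The sketch below covers only the shapes shown in the examples above (5587C, 7472CA, 15944d, 16021_16022dCT); the function name `parse_allele` and the returned dictionary keys are hypothetical, and other MITOMAP notations may need additional rules.

```python
import re

MTDNA_LENGTH = 16569  # valid positions run from 1 to 16569

# position, optional "_end", optional "d" for a deletion, optional bases
ALLELE_RE = re.compile(r"(\d+)(?:_(\d+))?(d?)([ACGT]*)")


def parse_allele(allele):
    """Split an allele string such as '5587C' or '16021_16022dCT'."""
    match = ALLELE_RE.fullmatch(allele)
    if match is None:
        raise ValueError(f"unrecognised allele notation: {allele!r}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    is_deletion = match.group(3) == "d"
    bases = match.group(4)
    if not is_deletion and not bases:
        # A bare position carries no change and is not a valid allele here.
        raise ValueError(f"allele has no change: {allele!r}")
    if not 1 <= start <= end <= MTDNA_LENGTH:
        raise ValueError(f"position out of range: {allele!r}")
    return {"start": start, "end": end, "deletion": is_deletion, "bases": bases}
```

With this sketch, `parse_allele("15944d")` reports a deletion at position 15944 with no listed bases, while `parse_allele("16021_16022dCT")` reports a deletion spanning positions 16021 to 16022.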
Whenever it is possible to download the full mtDNA sequence from GenBank or other repository, we prefer this option. When only BAM/SAM files are provided (usually in ENA or SRA archives), we will use the following pipeline:
BWA software package version 0.7.8 is used to map merged reads as single-end reads against the revised Cambridge Reference Sequence (rCRS)[1, 2] (GenBank: NC_012920), with the non-default parameters -l 16500 -n 0.01 -o 2 -t 2. The ratio of reads mapping to the Y and X chromosomes (Ry), counting only reads with mapping quality greater than 30, is calculated to assign the molecular sex of individuals sequenced on the Illumina platform[3].
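The Ry calculation can be illustrated with a short Python sketch. It assumes that Ry is the fraction of Y-mapped reads among all reads mapped to X or Y, that a normal-approximation 95% confidence interval is used, and that the cut-offs 0.016 (female) and 0.075 (male) apply. These definitions and thresholds are assumptions that should be checked against the cited method[3], and the function names are hypothetical.

```python
import math


def ry_ratio(n_x, n_y):
    """Return (Ry, lower, upper) with a normal-approximation 95% CI.

    n_x, n_y: counts of reads (mapping quality > 30) on chrX and chrY.
    """
    total = n_x + n_y
    if total == 0:
        raise ValueError("no reads mapped to X or Y")
    ry = n_y / total
    se = math.sqrt(ry * (1 - ry) / total)
    return ry, ry - 1.96 * se, ry + 1.96 * se


def assign_sex(n_x, n_y, female_max=0.016, male_min=0.075):
    """Assign M/F/U; the default thresholds are assumptions, see lead-in."""
    _, lower, upper = ry_ratio(n_x, n_y)
    if upper < female_max:
        return "F"
    if lower > male_min:
        return "M"
    return "U"  # interval overlaps the undetermined zone
```

With few reads the confidence interval becomes wide, so low-coverage samples tend to end up as U rather than being forced into M or F.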
For sequences generated on the PGM Ion Torrent platform, the scripts fastx_barcode_splitter.pl and fastx_trimmer (from the FASTX-Toolkit) are used to demultiplex the reads by barcode, using a one-mismatch threshold. Cutadapt v1.8.1 is then used to remove long (-M 110), short (-m 35), and low-quality (-q 20) sequences. The filtered reads are analyzed with FastQC v0.11.3 using the options described previously in [4]. The sequences are mapped against the rCRS using TMAP v3.4.1. To collapse duplicate sequence reads with identical start and end coordinates (for both PGM and Illumina sequence data), we use the FilterUniqueSAMCons.py script[5]. Misincorporation patterns are assessed with mapDamage v2.0.5. Consensus sequences are built using ANGSD v0.910. We accept only reads with a mapping score of 30, a minimum base quality of 20, and a minimum coverage of 3, as in [4]. Where necessary, mitochondrial haplogroups (mt hgs) are assigned for each individual with the use of HAPLOFIND[6], the PhyloTree phylogenetic tree build 17[7] and Mitomaster[8].
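To show how the three acceptance thresholds interact, here is a simplified per-position base caller in Python. It is not ANGSD's algorithm: it is a plain majority vote, it assumes that "mapping score of 30" means at least 30, and the function name `call_base` and the input layout are hypothetical.

```python
from collections import Counter

MIN_MAPQ = 30      # mapping score threshold (assumed to mean "at least 30")
MIN_BASEQ = 20     # minimum base quality
MIN_COVERAGE = 3   # minimum number of accepted bases at a position


def call_base(observations):
    """Call a consensus base for one position.

    observations: iterable of (base, base_quality, mapping_quality) tuples,
    one per read covering the position.
    """
    kept = [
        base
        for base, base_quality, mapping_quality in observations
        if base_quality >= MIN_BASEQ and mapping_quality >= MIN_MAPQ
    ]
    if len(kept) < MIN_COVERAGE:
        return "N"  # not enough reliable evidence at this position
    # Majority vote; on a tie the base seen first among the kept reads wins.
    return Counter(kept).most_common(1)[0][0]
```

Note that the coverage threshold is applied after the quality filters, so a position covered by many low-quality reads is still reported as N.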
References
1. Anderson, S. et al. Sequence and organization of the human mitochondrial genome. Nature 290, 457–465 (1981).
2. Andrews, R. M. et al. Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat. Genet. 23, 147 (1999).
3. Skoglund, P. et al. Accurate sex identification of ancient human remains using DNA shotgun sequencing. J. Archaeol. Sci. 40, 4477–4482 (2013).
4. Chyleński, M. et al. Late Danubian mitochondrial genomes shed light into the Neolithisation of Central Europe in the 5th millennium BC. BMC Evol. Biol. 17, 80 (2017).
5. Meyer, M. and Kircher, M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb. Protoc. 2010, t5448 (2010).
6. Vianello, D. et al. HAPLOFIND: a new method for high-throughput mtDNA haplogroup assignment. Hum. Mutat. 34, 1189–1194 (2013).
7. van Oven, M. and Kayser, M. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 30, E386–E394 (2009).
8. Lott, M. T. et al. mtDNA Variation and Analysis Using Mitomap and Mitomaster. Curr. Protoc. Bioinforma. 44, 1.23.1-26 (2013).
AmtDB tools
MitoPathoTool
The MitoPathoTool is the result of our ongoing efforts to include more information about mitochondrial DNA functions, structure and uses in our database. Today, pathological phenotypes caused by mutations in mtDNA genes and the control region form a group of diseases and syndromes often called mitochondrial diseases. The AmtDB content has been updated to include information about pathological alleles in our samples' mtDNA sequences (you can check the sample details). MitoPathoTool is suited for analyzing your own samples and annotating their pathological alleles. To achieve these goals, we use the list of mitochondrial pathological mutations published openly on the MITOMAP webpage (currently version r386 for the coding and control region mutations, and r382 for the rRNA/tRNA mutations). The list of alleles from the MITOMAP database has been formatted for our needs, but the information content is unchanged.
How to use the MitoPathoTool
- Prepare your samples in HSD format. The easiest way is to run the online version of the Haplogrep 2 software, where you can upload the mtDNA sequences of your samples and get an HSD file. Another possibility is to install Haplogrep locally (e.g. from its GitHub repository) and run your HSD conversion offline.
- Copy-paste the file contents directly or upload the file into our online version of the MitoPathoTool. Alternatively, you can use the offline version, which can be downloaded here.
- The results (one line per found pathological locus) can be saved in several ways (Excel, PDF, CSV, copy to clipboard) or printed.
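The steps above amount to reading variants from an HSD file and looking them up in a list of pathological alleles. The Python sketch below illustrates the idea and is not the MitoPathoTool implementation. It assumes the usual tab-separated HSD layout (sample ID, range, haplogroup, then one variant per column), matches alleles by exact string only, and uses hypothetical function names and a placeholder allele table.

```python
def parse_hsd_line(line):
    """Return (sample_id, variants) from one tab-separated HSD line.

    Assumed layout: SampleID, Range, Haplogroup, then the polymorphisms.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected at least 3 tab-separated fields")
    sample_id = fields[0]
    variants = [variant for variant in fields[3:] if variant]
    return sample_id, variants


def annotate(variants, pathological_alleles):
    """Keep only variants present in the allele table (exact match)."""
    return [
        (variant, pathological_alleles[variant])
        for variant in variants
        if variant in pathological_alleles
    ]


# Placeholder table: a real one would be derived from the MITOMAP list.
EXAMPLE_ALLELES = {"5587C": "placeholder description"}

sample_id, variants = parse_hsd_line("S1\t1-16569\t?\t73G\t263G\t5587C\n")
hits = annotate(variants, EXAMPLE_ALLELES)  # one entry per matching allele
```

Exact string matching is the main simplification here: notations for insertions and deletions can differ between tools, so a real comparison would likely need to normalise both sides first.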
Certificate of maternal lineage
Our app helps you search our database for your maternal lineage and see where and in which (pre-)historical epochs carriers of the same lineage lived. To use this tool, you need to know your mtDNA haplogroup.
Several commercial genealogical services/companies can be found online that, after receiving your biological sample, will sequence your mtDNA molecule and estimate your mitochondrial haplogroup. Mitochondrial haplogroups are classified into groups and sub-groups on several levels. The first level of classification is marked with a capital letter, e.g. haplogroup A, B, C, ... In Europe, common major haplogroups are H, U, T, I, J. Sub-haplogroups are designated with subsequent numbers and lower-case letters. So haplogroup U can have a sub-haplogroup U3, which again can have a sub-haplogroup U3a, and then U3a1 -> U3a1a and so on.
You need to consider how detailed a sub-haplogroup to use when creating the certificate. Let's say your haplogroup is U3a1a3. For this detailed lineage, we don't have a direct hit in our database yet, so you will need to use a less detailed lineage. There are 29 U3 samples from all over Europe, north Africa and western Asia. If you choose U3a, you will find 12 samples ranging from Spain to Iran. For U3a1, there are 10 samples from south-western and central Europe. For haplogroup U3a1a, the certificate app will find just 3 samples from Middle Ages Poland. As you can see, the level of detail can be adjusted to your needs, which should also help with the interpretation of the results.
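The list of progressively less detailed lineages can be generated from the haplogroup name itself. The following Python sketch relies only on the naming pattern (capital letters, then alternating numbers and lower-case letters); the function name `haplogroup_levels` is hypothetical, and names that do not follow this simple pattern are rejected rather than guessed at.

```python
import re


def haplogroup_levels(haplogroup):
    """List a haplogroup and its ancestors by name, most detailed first.

    Assumption: the name alternates runs of capital letters, digits and
    lower-case letters, and each run adds one level of detail.
    """
    tokens = re.findall(r"[A-Z]+|[0-9]+|[a-z]+", haplogroup)
    if not tokens or "".join(tokens) != haplogroup or not tokens[0].isupper():
        raise ValueError(f"unsupported haplogroup name: {haplogroup!r}")
    return ["".join(tokens[:n]) for n in range(len(tokens), 0, -1)]
```

For U3a1a3 this yields U3a1a3, U3a1a, U3a1, U3a, U3 and U, which are the candidate levels to try in the certificate app, from most to least detailed.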
Please note that this search and certificate app will not find your direct ancestors/family. The results should rather be interpreted as places and times where people from the same (or a related, depending on the haplogroup level used) maternal lineage lived and came from.
Here we provide links to other resources and tools for mtDNA and ancient DNA studies.
Tools
- Phylotree - a comprehensive phylogenetic tree of worldwide human mitochondrial DNA variation.
- Mitomap - a human mitochondrial genome database, a compendium of polymorphisms and mutations in human mitochondrial DNA, includes MITOMASTER module for mtDNA sequence analysis.
- Haplofind - fast and easy mitochondrial DNA haplogroup assignment.
- Haplogrep - tool for haplogroup assignment and mutation report.
- Haplosearch - fast transformation between haplotype format and nucleotide sequence data.
Databases
- Online Ancient Genome Repository - ancient DNA repository for the Australian Centre for Ancient DNA, University of Adelaide.
- Ancestral Journeys - ancient DNA database (currently offline).
- GenBank - complete ancient mitochondrial genomes in NCBI GenBank database.
Chyleński, M., Ehler, E., Somel, M., Yaka, R., Krzewińska, M., Dabert, M., Juras, A. and Marciniak, A., 2019. Ancient mitochondrial genomes reveal the absence of maternal kinship in the burials of Çatalhöyük people and their genetic affinities. Genes, 10(3), p.207.
The article maps the important Anatolian archaeological site of Çatalhöyük in the Neolithic period and its connection to the Eastern Mediterranean on the one hand and the Levant on the other. Of interest is the great variability of local mtDNA haplogroups.
Juras, A., Makarowicz, P., Chyleński, M., Ehler, E., Malmström, H., Krzewińska, M., Pospieszny, Ł., Górski, J., Taras, H., Szczepanek, A. and Polańska, M., 2020. Mitochondrial genomes from Bronze Age Poland reveal genetic continuity from the Late Neolithic and additional genetic affinities with the steppe populations. American Journal of Physical Anthropology, 172(2), pp.176-188.
In this article we examined three archaeological cultures from Poland (Mierzanowice, Trzciniec, Strzyżow) and their mtDNA relationship to the original Neolithic population of central and eastern Poland and the whole of central Europe and to the migration wave from the Pontic Steppe, which came at the beginning of the Bronze Age. The populations associated with Mierzanowice culture and Trzciniec culture show a higher proportion of local (Neolithic) traits and are very similar, for example, to the populations of Corded Ware culture. People of Strzyżow culture, on the other hand, are more similar to eastern populations, such as the typically steppe culture of Yamnaya, which is now considered to be the main source of the steppe migration to the west.
v1.009 , 28/02/2024
Added 406 FASTA files to older studies
v1.008 , 13/10/2021
New samples and metadata update
v1.007 , 08/04/2021
Certificate of ancient maternal genetic lineage, MitoPathoTool and frontend updates
- New tools:
- Certificate of ancient maternal genetic lineage.
- MitoPathoTool: search your samples (not only ancient mtDNA sequences) against a database of known mitochondrial diseases imported from MITOMAP . Also, samples are now linked to this database - you can inspect pathologies in a new table column and in sample details.
- Improvements of frontend:
- Consistent usage of Bootstrap 4.
- Improved FAQ & Help section, prettier references table.
- Improved sample detail page (example).
- Major changes in backend.
v1.006 , 21/01/2021
Happy New Year! 🥳
- Samples added:
- Haber et al. 2017 (5 samples)
- Harney et al. 2018 (22 samples)
- Jeong et al. 2016 (8 samples)
- Jeong et al. 2018 (22 samples)
- Neparáczki et al. 2018 (102 samples)
- Olalde et al. 2019 (271 samples)
- Ozga et al. 2016 (6 samples)
- Schuenemann et al. 2017 (90 samples)
- Unterländer et al. 2017 (97 samples)
- Zalloua et al. 2018 (8 samples)
v1.005 , 02/01/2020
Happy New Year! 🥳
- Samples added and updated:
v1.004 , 12/12/2019
GUI updates
- The graphical user interface has been updated with more modern design.
- Added a dedicated funding acknowledgement page.
v1.003 , 10/10/2019
New samples and metadata updates
- New samples added:
- All metadata have been reviewed and categories normalized and pooled, spelling was unified. Especially in variables geo_group, culture, epoch, and group.
v1.002 , 09/05/2019
New samples and back-end technical changes
- New samples added:
- Tassi et al. 2017 (3 samples)
- Hofmanova et al. 2016 (2 samples we were still missing)
- Schroeder et al. 2019 (38 samples)
- Vai et al. 2019 (135 samples)
- We changed the server provider earlier this summer, and the database subsequently encountered some downtime and restarts. We believe the new services of MetaCentrum Cloud (provided by CESNET [LM2015042], the national e-infrastructure for science, development, and education in the Czech Republic) will be stable and fast enough to fulfill our common needs.
v1.001b , 09/03/2018
New samples and minor tweaks
- added the rest of the samples from Allentoft et al. 2015 (29 samples)
- all Corded Ware Culture (CWC) samples have their epoch changed to Neolithic. This is arguably the most conservative, but at least consistent, way of presenting the CWC samples...
v1.001 , 09/02/2018
Added a filter based on sequence quality of the samples
- in this release we have added the possibility to select/filter samples based on sequence_source and average coverage (sequencing depth)
- please note that more options to filter samples based on mtDNA sequence quality are planned for future releases
- variable sequence_source added, which can be one of these 3 labels:
- fasta - for mt sequences downloaded directly from other databases (most likely from GenBank)
- bam - for mt sequences created from raw data using our pipeline described in the Docs (BAM/SAM files, most likely from ENA or SRA archives)
- reconstructed - for mt sequences reconstructed from published haplotypes
- variable avg_coverage added, a basic sequence quality indicator, computed for bam samples (see above) as (total length of reads) / (length of consensus sequence)
- advanced search options updated to reflect the new variables
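The avg_coverage formula given above is simple enough to state as code. In this minimal Python sketch the function name and arguments are hypothetical; read lengths would in practice come from the mapped reads of a BAM file.

```python
def avg_coverage(read_lengths, consensus_length):
    """(total length of reads) / (length of consensus sequence)."""
    if consensus_length <= 0:
        raise ValueError("consensus length must be positive")
    return sum(read_lengths) / consensus_length
```

For example, three reads of 50, 70 and 80 bases over a 100-base consensus give an average coverage of 2.0.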
v1.000 , 05/09/2018
AmtDB opened for public access!
- 1107 samples from 3 continents and 36 countries
- 887 full mitochondrial genomes in FASTA format
- sequences (FASTA) and metadata (CSV) ready for download
- maps and displaying search results
- database documentation
- AmtDB changes log - ready in basic form
The content of the AmtDB is the sole responsibility of researchers at the Institute of Molecular Genetics of the Czech Academy of Sciences. Authors make no representations or warranties of any kind, express or implied, concerning the content of the database, including, without limitation, warranties of merchantability, fitness for a particular purpose, non-infringement, validity of any intellectual property rights or claims, whether issued or pending, and the absence of latent or other defects, whether or not discoverable.