AnnoTEPDB is a transposable element (TE) database that compiles preprocessed genomes from a diverse set of plant species, including angiosperms, gymnosperms, and algae, spanning a wide range of genome sizes. The data were generated using EDTA-GUI in AnnoTEP mode (Ou et al., 2019; PMID: 31843001), a tool developed for annotating transposable elements in plants. Within AnnoTEPDB, users can access:

  • Annotation of all Class I and Class II elements.
  • Data visualization in graphic format and phylogenetic trees.
  • Masked genome versions, suitable for use in other pipelines.

Obtaining the tool

EDTA-GUI is available on the official GitHub repository, accompanied by a user guide, user manual, and containerized versions.

Mutation rate table

The table contains recommended mutation-rate values for use in the annotation workflow. These estimates derive from statistical analyses and literature review and aim to provide a reliable reference for mutation-rate calculations across different plant taxa.

These values should be used as preliminary guidance; users must validate and, if necessary, adjust them based on empirical data specific to their study system. More details are available under "Help → General Recommendations for Using Mutation Rates to Calculate LTR Ages" or "General Mutation Rates by Ecological Category" via the side menu bar.

AnnoTEPDB documentation

Introduction

Welcome to the documentation for AnnoTEPDB, a database specialising in the storage of preprocessed data generated by the annotation of transposable elements (TEs) in plants using EDTA-GUI in AnnoTEP mode (Ou et al., 2019; PMID: 31843001).

Preprocessed Genomes

Here, you will find a list of genomes that have been analysed using EDTA-GUI in AnnoTEP mode. These genomes have been carefully tested and processed, and the results are available for consultation. To access these results, simply click on the plant image, and you will be redirected to a new page.

What will you find in the preprocessed genomes?

Each preprocessed genome contains the following results and analyses:

  1. Annotated data:
    • This section provides a selection of the data produced during TE annotation, such as FASTA and GFF3 files, masked files, and .sum and .LAI outputs. It also includes visual representations (graphs) and the underlying data required to generate them.
  2. TE classification table and genomic distribution:
    • This section presents a table that categorises transposable elements hierarchically by order, superfamily and autonomy, along with base-pair, size and percentage metrics. These data are illustrated using bar charts and bubble charts.
  3. RepeatLandscape graphic:
    • The repeat landscape graph shows the inferred relative ages of the repetitive elements identified in a genome. The analysis is based on the genetic distance proposed by Kimura, which serves as a proxy for the time elapsed since the duplication or insertion of each element.
      Using the Kimura distance, the graph distinguishes older elements (with greater accumulated divergence) from more recent ones (with lower divergence), offering insight into the evolutionary dynamics and genomic history of the organism under study.
    • This section also features graphs generated by the original EDTA pipeline (Ou et al., 2019; PMID: 31843001), adapted to reflect the organisation and classification of transposable elements (TEs). The plot shows the genomic proportion of each TE category across divergence levels, allowing the visualisation of the age distribution and relative abundance of different TE superfamilies.
  4. LTR Age Graph:
    • The histogram displays the age distribution of LTR elements identified in the genome. The dashed vertical lines indicate the median age, while the horizontal line represents the mean, both expressed in millions of years ago (Mya). This visualisation shows the dispersion of LTR ages, highlighting the central tendency and temporal variability of these elements.
  5. Phylogenetic Tree and Density Graph:
    • This section presents a phylogeny built from alignments of the lineages within LTR superfamilies, providing a visualisation of their evolutionary relationships. The tree illustrates how different LTR-RT domains are related to each other based on their sequences.
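
The Kimura distance behind the RepeatLandscape graphic can be illustrated with a minimal sketch of the standard Kimura two-parameter formula, d = -1/2 · ln((1 - 2P - Q) · sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions between two aligned sequences. The function name and the toy sequences are hypothetical, and the computation used by the annotation pipeline may differ (for example in how CpG sites or gaps are handled).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration only. Positions where either
    sequence has a gap or an ambiguous base are skipped.
    """
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions in the alignment")
    p = transitions / compared
    q = transversions / compared
    term = (1 - 2 * p - q) * math.sqrt(1 - 2 * q) if q < 0.5 else 0.0
    if term <= 0:
        raise ValueError("sequences too divergent for the Kimura model")
    return -0.5 * math.log(term)


# Toy alignment: one transition (A -> G) across ten compared sites.
distance = kimura_2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
print(f"Kimura distance: {distance:.4f}")
```

Larger distances correspond to older copies, which is why elements on the right-hand side of a landscape plot are read as more ancient.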

General Recommendations for Using Mutation Rates to Calculate LTR Ages

  1. Understanding LTR Age Calculation:
    • The age of an LTR retrotransposon can be estimated by comparing the divergence between the 5' and 3' LTR sequences of the same retrotransposon. The assumption here is that these sequences were identical at the time of insertion and have diverged due to mutations over time.
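    • The formula commonly used is: Age = Divergence / (2 × Mutation Rate), where Divergence is the genetic distance between the two LTR sequences (substitutions per site) and Mutation Rate is expressed in substitutions per site per year. The factor of 2 accounts for both LTRs accumulating mutations independently since insertion.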
  2. Accurate Divergence Estimation:
    • Use reliable bioinformatics tools to accurately measure the sequence divergence between the LTRs. Tools like LTR_retriever provide mechanisms to identify LTRs and calculate divergence.
    • Ensure that the alignment and comparison of LTR sequences are accurately performed to avoid underestimation or overestimation of divergence.
  3. Appropriate Mutation Rate:
    • Use species-specific mutation rates when available. The choice of rate is critical because the estimated age is inversely proportional to it: halving the rate doubles the age.
    • If species-specific mutation rates are not available, use rates from closely related species or general rates for the plant family as a proxy, acknowledging the potential for error this introduces.
  4. Literature Review for Validation:
    • Review recent literature to validate the mutation rates and the methodologies used for similar studies in the same or related species. This can help confirm that your approach is aligned with current scientific standards.
    • Especially look for studies that have used LTR_retriever or similar tools in the same species for comparisons.
  5. Consideration of Evolutionary and Environmental Factors:
    • Remember that mutation rates can be influenced by various factors including environmental stress, life history traits, and population dynamics. These factors might cause the actual mutation rate in certain environments or periods to deviate from the average.
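
As a worked illustration of the formula in item 1, the sketch below computes an insertion age from an LTR divergence and a mutation rate. The function name and the input values are assumptions chosen for illustration; they are not recommended values for any species.

```python
def ltr_insertion_age(divergence: float, mutation_rate: float) -> float:
    """Estimate LTR insertion age in years: divergence / (2 * mutation_rate).

    divergence     substitutions per site between the 5' and 3' LTRs
    mutation_rate  substitutions per site per year
    (Hypothetical helper for illustration only.)
    """
    if divergence < 0:
        raise ValueError("divergence must be non-negative")
    if mutation_rate <= 0:
        raise ValueError("mutation rate must be positive")
    return divergence / (2.0 * mutation_rate)


# Illustrative inputs only: 2.6% divergence, 1.3e-8 substitutions/site/year.
age_years = ltr_insertion_age(divergence=0.026, mutation_rate=1.3e-8)
print(f"Estimated insertion age: {age_years / 1e6:.2f} million years")
```

With these inputs the estimate is about one million years. Replacing the rate with a species-specific value rescales the result in inverse proportion.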

The mutation rate list provided in AnnoTEPDB can be a valuable resource for calculating the ages of LTR retrotransposons. However, this list should be used with caution due to several important considerations:

  1. Species-Specific Variability:
    • Mutation rates can vary significantly even within a single species due to environmental factors, genetic background, and historical population dynamics. The rates provided are averages and may not capture this intra-species variability.
  2. Generalization Risks:
    • Using mutation rates from closely related species or generalized rates for an entire plant family can introduce errors. Such rates might not accurately reflect the specific evolutionary pressures and genetic history of the species of interest.
  3. Methodological Differences:
    • The methods used to estimate these mutation rates might differ, affecting their accuracy. Some rates might be derived from lab observations under controlled conditions, which may not perfectly mimic natural environments.
  4. Evolutionary and Environmental Influences:
    • Mutation rates are influenced by numerous factors including climate, soil conditions, and exposure to mutagens, which can fluctuate over time and across geographies. This context-dependent nature of mutation rates can lead to underestimations or overestimations of LTR ages.
  5. Technological and Analytical Limitations:
    • The precision of mutation rate calculations and the subsequent age estimations of LTR retrotransposons rely heavily on the technology and algorithms used in their determination. Advances in sequencing technology or bioinformatics tools may refine these rates, potentially altering previous calculations.
  6. Literature Support:
    • It is crucial to consult the latest peer-reviewed studies for the most recent and robust mutation rates and to understand the context in which they were measured. Research publications often provide more nuanced insights into the conditions and accuracy of reported mutation rates.

Recommendations

When using this list to calculate LTR ages, clearly state any assumptions made about mutation rates and the potential sources of error in your methods and results. Consider validating your findings with multiple approaches and seek peer feedback or additional data where possible. Always stay updated with the latest research and methodological advances that may impact the interpretation of these rates.
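
One way to make the mutation-rate assumption explicit is to report an age interval rather than a single value. The sketch below, using the relation Age = Divergence / (2 × Mutation Rate), evaluates the age at a lower and an upper plausible rate. The function name and the numbers are hypothetical placeholders, not values taken from the AnnoTEPDB table.

```python
def ltr_age_range_myr(divergence: float, rate_low: float, rate_high: float):
    """Return (youngest, oldest) LTR age in million years for a rate interval.

    Hypothetical helper for illustration only. Rates are in substitutions
    per site per year; a higher rate yields a younger age estimate.
    """
    if not 0 < rate_low <= rate_high:
        raise ValueError("expected 0 < rate_low <= rate_high")
    youngest = divergence / (2.0 * rate_high) / 1e6
    oldest = divergence / (2.0 * rate_low) / 1e6
    return youngest, oldest


# Illustrative inputs only: 2% divergence, rates between 5e-9 and 2e-8.
youngest, oldest = ltr_age_range_myr(0.02, rate_low=5e-9, rate_high=2e-8)
print(f"Age between {youngest:.2f} and {oldest:.2f} million years")
```

In this example a fourfold uncertainty in the rate produces a fourfold spread in the age estimate, which is the kind of sensitivity worth stating alongside the results.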

General Mutation Rates by Ecological Category

  1. Tropical Plants:
    • Tropical plants often have higher rates of growth and reproduction, which could lead to higher mutation rates. However, the rich biodiversity and complex interactions in tropical ecosystems might also promote genetic stability to some extent.
  2. Aquatic Plants:
    • Aquatic environments provide a relatively stable thermal environment but can expose plants to varying levels of UV radiation and other mutagenic factors depending on water clarity and depth. The estimate for this category assumes a moderate mutation rate reflecting these mixed conditions.
  3. Desert and Arid-Environment Plants:
    • Plants in arid or desert environments are exposed to extreme conditions that can increase oxidative stress and potential DNA damage, possibly leading to slightly higher mutation rates.
  4. Arctic and Alpine Plants:
    • The harsh, cold environments can slow metabolic processes and potentially reduce mutation rates. These plants also have longer life spans and slower growth rates, which might contribute to a lower rate of mutation accumulation.
  5. Temperate Forest Plants:
    • The estimate for this category assumes that temperate plants experience seasonal variations that might affect their metabolic rates and, consequently, their mutation rates. It is a mid-range estimate reflecting moderate environmental stresses.

Notes on the General Mutation Rate Estimates by Ecological Category

  • These estimates are highly speculative and should be used with caution in scientific contexts. They are based on ecological reasoning rather than on direct experimental evidence, which remains the ideal basis for determining such rates.
  • Mutation rates can vary widely even within a single ecological category due to species-specific factors, including life cycle length, reproductive strategy, and exposure to environmental mutagens.

Suggested Use

These general rates can be useful for preliminary models or simulations in ecological genetics and evolutionary studies. They provide a starting point for discussions about how different environments might influence genetic variability in plants. However, for rigorous scientific research, specific studies and data are always recommended.

Please contact us if you have any questions or need to report an issue with the platform, whether related to GitHub, the database, or the containers.

marcosnandosc@gmail.com (Support)
alessandro.varani@unesp.br (Advisor)
vabreu@ufpa.br (Advisor)