Welcome to EDTA-GUI. This version provides a graphical interface that streamlines your work and research in annotating transposable elements. The annotation runs on your own machine, so its speed depends on the machine's specifications; keep the machine powered on until the run finishes. For questions or further information, click the help icon.

Data input

Email Address

Entering an email address is optional. If you provide a valid address, it is used to send notifications about the status of the analysis. The annotation itself runs locally, so keep the system running until the process finishes.

You can also monitor the progress of the annotation by accessing the "Results" tab.

Genome Data

Upload the input file containing the complete genomic sequence in FASTA format. Use the "Browse" button to select the file from the local system.
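Before uploading, it can help to check that the genome file is well-formed FASTA. The sketch below is a hypothetical helper (the names `fasta_problems` and `check_fasta_file` are illustrative and not part of EDTA-GUI); it only checks basic structure such as headers, empty records and duplicate IDs, and does not reproduce any validation EDTA performs itself.

```python
from collections import Counter


def fasta_problems(lines):
    """Return (sequence_ids, problems) for FASTA text given as an iterable of lines.

    Hypothetical pre-upload check, not part of EDTA-GUI.
    """
    ids, problems = [], []
    seq_len = None  # residues seen in the current record; None before the first header
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are tolerated
        if line.startswith(">"):
            if seq_len == 0:
                problems.append(f"record '{ids[-1]}' has no sequence")
            fields = line[1:].split()
            if not fields:
                problems.append(f"line {line_no}: header without a sequence ID")
                ids.append("")
            else:
                ids.append(fields[0])  # the ID is the first word of the header
            seq_len = 0
        elif seq_len is None:
            problems.append(f"line {line_no}: sequence data before the first header")
        else:
            seq_len += len(line)
    if seq_len is None:
        problems.append("no FASTA records found")
    elif seq_len == 0:
        problems.append(f"record '{ids[-1]}' has no sequence")
    duplicates = sorted(i for i, n in Counter(ids).items() if n > 1 and i)
    if duplicates:
        problems.append(f"duplicate sequence IDs: {', '.join(duplicates)}")
    return ids, problems


def check_fasta_file(path):
    """Convenience wrapper that streams a FASTA file from disk."""
    with open(path) as handle:
        return fasta_problems(handle)
```

An empty `problems` list only means the file is structurally sound; it says nothing about assembly quality.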

Additional features

Mode

Specifies which annotation tool you wish to use. The available options are:

  • EDTA-GUI: Provides annotation of transposable elements in eukaryotic genomes, performing de novo detection and general classification of the different TE classes. This is the default option and is suitable for genomes of any eukaryotic organism.
  • AnnoTEP: Provides specialised annotation of transposable elements in plant genomes. A version of EDTA adapted for high accuracy in identifying all classes, including autonomous and non-autonomous elements, as well as LTR lineages in plant genomes. Produces charts, detailed reports, and phylogenetic trees from the results.

Species Specification to Identify TIR Candidates

Select the species setting used to identify TIR candidates. The options are:

  • Others: For species other than rice or maize (default).
  • Rice: For rice genomes.
  • Maize: For maize genomes.

Steps to Be Executed

Select which parts of the pipeline will be executed:

  • All: Runs the entire pipeline (default).
  • Filter: Starts from the raw TEs and runs to the end.
  • Final: Starts from the filtered TEs and runs to completion.
  • Anno: Performs genome-wide annotation after the TE library has been built.

If you select "Filter", "Final" or "Anno", a field appears where you choose the folder in which a previous annotation step was processed.

This folder must be located in one of the following directories:

  • Repository installation: /EDTA/gui/results
  • Docker installation: /usr/local/AnnoTEP/gui/results, or the volume mapped to this directory.
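The folder rule above can be sketched as a small check. This is a hypothetical helper (`previous_run_folder` and `RESULTS_DIRS` are illustrative names, not EDTA-GUI code); it assumes a resumed run is a subfolder of the results directory and lets you pass the mapped volume path explicitly when Docker is used.

```python
from pathlib import Path

# Results directories listed in this guide. With Docker, the results may live in
# a mapped volume instead, so pass that path as results_dir (assumption).
RESULTS_DIRS = {
    "repository": Path("/EDTA/gui/results"),
    "docker": Path("/usr/local/AnnoTEP/gui/results"),
}
STEPS_NEEDING_PREVIOUS_RUN = {"filter", "final", "anno"}


def previous_run_folder(step, folder, installation="repository", results_dir=None):
    """Return the folder a resumed step should read from, or None for step 'all'.

    Hypothetical helper mirroring the rule that 'filter', 'final' and 'anno'
    need a previous annotation folder inside the results directory.
    """
    step = step.lower()
    if step == "all":
        return None
    if step not in STEPS_NEEDING_PREVIOUS_RUN:
        raise ValueError(f"unknown step: {step}")
    if not folder:
        raise ValueError(f"step '{step}' needs a previous annotation folder")
    base = Path(results_dir) if results_dir else RESULTS_DIRS[installation]
    candidate = (base / folder).resolve()
    # Reject paths such as "../elsewhere" that escape the results directory.
    if base.resolve() not in candidate.parents:
        raise ValueError(f"{folder} is not inside {base}")
    return candidate
```

For example, `previous_run_folder("Anno", "run_1")` would point at `/EDTA/gui/results/run_1` for a repository installation.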

TE Analysis Parameters

EDTA-GUI mode

  • Overwrite: Decide whether existing output data should be overwritten.
  • Sensitivity: Enable RepeatModeler to identify additional TEs.
  • Annotation: Specify whether the genome-wide annotation of TEs should proceed after building the TE library.
  • Evaluate: Check if the classification of annotated TEs is consistent. The "Annotation" field must be enabled to use this feature.
  • Force: If no reliable TE candidates are identified, enabling this option allows the script to continue using a backup rice TE library.
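EDTA.pl itself has on/off options with similar names (`--overwrite`, `--sensitive`, `--anno`, `--evaluate`, `--force`, each taking 0 or 1). Assuming EDTA-GUI forwards its switches to those options, which this guide does not state, the sketch below shows how the switches could translate into command-line values while enforcing the rule that Evaluate requires Annotation. The mapping and the function name are assumptions; check `perl EDTA.pl -h` for your version.

```python
# Assumed mapping from GUI switch names to EDTA.pl options (verify with `perl EDTA.pl -h`).
SWITCH_TO_OPTION = {
    "overwrite": "--overwrite",
    "sensitivity": "--sensitive",
    "annotation": "--anno",
    "evaluate": "--evaluate",
    "force": "--force",
}


def switch_arguments(switches):
    """Turn GUI on/off switches into '0'/'1' option values (hypothetical helper).

    Enforces the rule from this guide that Evaluate requires Annotation.
    """
    unknown = set(switches) - set(SWITCH_TO_OPTION)
    if unknown:
        raise ValueError(f"unknown switches: {sorted(unknown)}")
    if switches.get("evaluate") and not switches.get("annotation"):
        raise ValueError("Evaluate requires Annotation to be enabled")
    args = []
    for name, option in SWITCH_TO_OPTION.items():
        # Switches that were not set are passed as disabled.
        args += [option, "1" if switches.get(name, False) else "0"]
    return args
```

Passing every switch explicitly, rather than only the enabled ones, keeps the resulting command independent of whatever defaults a given EDTA version uses.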

AnnoTEP mode (Additional feature)

  • TIR filter: Filter TIRs without annotated domains. Enabling this filter can substantially reduce false positives, but may also result in the loss of some true positives (false negatives).
  • Annot. type: Specify whether to annotate the genome using a RepeatMasker-based library. Enabling this option may negatively affect the filtering step and compromise benchmark results.
  • Run LAI: Calculate the LTR Assembly Index (LAI). LAI should only be calculated on haploid assemblies, so we recommend disabling this option for diploid or polyploid genomes.

  • Neutral Mutation Rate: Set the neutral mutation rate used to calculate the age of intact LTR elements (default: 1.3e-8 substitutions per bp per year, based on rice).
  • Maximum Divergence: Define the maximum acceptable divergence for TE fragments. For highly repetitive genomes, users are encouraged to adjust the parameter (default: 40).
  • Threads: Determine the number of threads to be used in running the pipeline (default: 10).
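The neutral mutation rate enters the usual molecular-clock estimate for intact LTR elements: the divergence K between the two LTRs of one copy gives an insertion age T = K / (2μ). The sketch below applies that standard formula with a Jukes-Cantor correction of the raw LTR divergence; it is an illustration under these assumptions, and the identity values and correction used inside the pipeline may differ.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance K from the raw proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def ltr_insertion_age(ltr_identity, mutation_rate=1.3e-8):
    """Estimate the insertion age (years) of an intact LTR element: T = K / (2 * mu).

    ltr_identity is the identity between the 5' and 3' LTRs (0-1);
    mutation_rate is the neutral rate per bp per year (default: rice, 1.3e-8).
    """
    k = jukes_cantor_distance(1 - ltr_identity)
    return k / (2 * mutation_rate)
```

With the default rate, two LTRs that are 99% identical give roughly 387,000 years; because T is inversely proportional to μ, a species with a faster neutral rate yields proportionally younger ages for the same divergence.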

Additional Input File

This subsection allows for the addition of optional files to customise the analysis:

  • Coding DNA Sequence: Select a FASTA file containing the coding sequence (without introns, UTRs, or TEs) of the genome or a close relative. This helps exclude gene sequences from the TE library.
  • Curate library: Upload a curated library to maintain consistent TE naming and classification. Only manually validated TEs should be provided. This file is optional.
  • Exclusion of masked regions: Define regions to be ignored during TE masking. The "Annotation" field must be enabled to use this option.
  • RepeatModeler library: Upload a classified RepeatModeler library to enhance analysis sensitivity, particularly for LINEs. If not provided, one will be generated automatically.
  • RepeatMasker library: Provide your own homology-based TE annotation in RepeatMasker .out format. This file will be merged with the structure-based annotation. The "Annotation" field must be enabled to use this option.
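The form labels these fields cds, curatedlib, exclude, rmlib and rmout, which match EDTA.pl options of the same names. Assuming each field becomes a `--name path` pair (an assumption, not something this guide documents), the sketch below builds those pairs and enforces the rule that Exclusion of masked regions and the RepeatMasker library both require Annotation. `file_arguments` is a hypothetical name.

```python
from pathlib import Path

# Assumed EDTA.pl option names, taken from the field names shown on the form.
FILE_OPTIONS = ["cds", "curatedlib", "exclude", "rmlib", "rmout"]
# Per this guide, these two files are only used when Annotation is enabled.
NEEDS_ANNOTATION = {"exclude", "rmout"}


def file_arguments(files, annotation_enabled):
    """Build '--option path' pairs for the optional files that were supplied."""
    unknown = set(files) - set(FILE_OPTIONS)
    if unknown:
        raise ValueError(f"unknown file fields: {sorted(unknown)}")
    args = []
    for name in FILE_OPTIONS:
        path = files.get(name)
        if not path:
            continue  # every file in this subsection is optional
        if name in NEEDS_ANNOTATION and not annotation_enabled:
            raise ValueError(f"'{name}' requires the Annotation option to be enabled")
        args += [f"--{name}", str(Path(path))]
    return args
```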

panEDTA-GUI

This is the serial version of panEDTA: each genome is annotated sequentially, and the results are then combined using the panEDTA functionality. Existing EDTA annotations of the genomes (EDTA runs with --anno 1) are recognized and reused. To accelerate pan-genome annotation, you can run EDTA-GUI on each genome separately and in parallel, then run panEDTA-GUI to complete the remaining steps. To help filter out gene-related sequences, at least one CDS file is required; see the wiki for the CDS requirements. You may also want to try the toy example in the ./test folder to familiarize yourself with the workflow.

Data Input

Email Address

Entering an email address is optional. If you provide a valid address, it is used to send notifications about the status of the analysis. The annotation itself runs locally, so keep the system running until the process finishes.

You can also monitor the progress of the annotation by accessing the "Results" tab.

Genome Data

Upload the input file containing the complete genome sequence in FASTA format, optionally with a corresponding CDS file. You must provide at least one genome file. A CDS file is only considered if it is paired with a genome; otherwise it is ignored. Use the "Browse" button to select files from your local system. Click the "+" icon to add fields for additional genome and CDS files, and the "-" icon to remove an added field.
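The pairing rule above can be sketched as follows. Each row is modelled as a hypothetical `(genome, cds)` tuple standing in for one pair of upload fields; rows without a genome are dropped, so their CDS files are ignored, and at least one genome must remain. `genome_rows` is an illustrative name, not EDTA-GUI code.

```python
def genome_rows(rows):
    """Keep (genome, cds) rows that have a genome; CDS-only rows are ignored.

    rows: list of (genome_path_or_None, cds_path_or_None) tuples, a hypothetical
    stand-in for the fields added with the "+" icon.
    """
    # Normalise empty CDS fields to None so that "" and None mean the same thing.
    kept = [(genome, cds or None) for genome, cds in rows if genome]
    if not kept:
        raise ValueError("at least one genome file is required")
    return kept
```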

Additional Features

This subsection provides advanced configuration options for the analysis.

  • Coding DNA Sequence: Required. A coding sequence file in FASTA format. The CDS file provided in this field will be used to fill in any missing CDS files from the Genome Data list. If no CDS files are specified in the genome list, this CDS file will be applied to all genomes.
  • Non-redundant library: Optional. A manually curated, non-redundant library following the RepeatMasker naming convention.
  • Threads: Specify the number of threads to be used when running panEDTA-GUI (default: 10).
  • Minimum number of full-length TE copies in individual genomes: Defines the minimum number of full-length TE copies required in individual genomes for them to be considered as candidate TEs for the pangenome. Lower values are more inclusive, resulting in a larger library, higher sensitivity, but increased inconsistency. Higher values are more stringent, leading to a smaller library, reduced sensitivity, and greater consistency. (default: 3).
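The CDS fallback described for the Coding DNA Sequence field can be sketched as below: a genome keeps its own CDS when one was listed, and otherwise receives the required CDS file. The tab-separated, two-column list written by `genome_list_text` is an assumed layout for illustration only; check the panEDTA documentation for the exact genome-list format. Both function names are hypothetical.

```python
def fill_cds(rows, default_cds):
    """Give every genome a CDS file: its own if listed, otherwise the required CDS field."""
    if not default_cds:
        raise ValueError("the Coding DNA Sequence field is required")
    return [(genome, cds or default_cds) for genome, cds in rows]


def genome_list_text(rows):
    """Format (genome, cds) rows as a two-column, tab-separated list (assumed layout)."""
    return "".join(f"{genome}\t{cds}\n" for genome, cds in rows)
```

If no genome has its own CDS, every row receives the shared file, which matches the rule that the CDS field is then applied to all genomes.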

Latest Activities

List of the 10 most recent genomic data annotations

This section displays key information such as:

  • The name of the generated output file;
  • Start and end timestamps (when available);
  • The current status of the annotation (e.g., in progress, completed, failed);
  • The last 20 lines of the annotation log.
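If you want to inspect a run outside the GUI, the same kind of summary can be reproduced with a few lines of Python. This is a sketch under assumptions: it supposes each run is a folder inside the results directory and that you know the path of its log file; `tail_lines`, `last_log_lines` and `recent_runs` are hypothetical names. A bounded `deque` keeps only the last lines, so large logs are streamed rather than loaded whole.

```python
from collections import deque
from pathlib import Path


def tail_lines(lines, n=20):
    """Keep only the last n items of an iterable of lines (bounded deque)."""
    return [line.rstrip("\n") for line in deque(lines, maxlen=n)]


def last_log_lines(log_path, n=20):
    """Read the last n lines of a log file, streaming it line by line."""
    with open(log_path, errors="replace") as handle:
        return tail_lines(handle, n)


def recent_runs(results_dir, limit=10):
    """Hypothetical: the `limit` most recently modified run folders in results_dir."""
    folders = [p for p in Path(results_dir).iterdir() if p.is_dir()]
    return sorted(folders, key=lambda p: p.stat().st_mtime, reverse=True)[:limit]
```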

Note 1

All results and output files will be stored in the Docker volume you specified as the output directory. Ensure this path is correctly mounted to access the generated data.

Note 2

Errors may occasionally occur during the annotation process, so pay attention to the following:

  • Look out for any long error messages in the log.
  • Using EDTA-GUI: Confirm that the message "Evaluation of TE annotation finished! ..." or "panEDTA annotation of genome_${date}.cds.list is finished!" appears. This message indicates that the annotation has completed successfully.
  • Using AnnoTEP: Confirm that the message "The generation of charts and reports has been completed" appears. This message indicates that the annotation has completed successfully.
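The completion check can be automated by searching a log for the success messages quoted in this guide. In this sketch the patterns match only the quoted prefixes, and `genome_\S+` stands in for the date-dependent part of the panEDTA file name, whose exact format is an assumption; `run_finished` is a hypothetical helper.

```python
import re

# Success messages quoted in this guide; only their known prefixes are matched.
SUCCESS_PATTERNS = {
    "EDTA-GUI": re.compile(r"Evaluation of TE annotation finished!"),
    "panEDTA-GUI": re.compile(r"panEDTA annotation of genome_\S+\.cds\.list is finished!"),
    "AnnoTEP": re.compile(r"The generation of charts and reports has been completed"),
}


def run_finished(log_text, tool):
    """Return True if the log text contains the success message for the given tool."""
    return SUCCESS_PATTERNS[tool].search(log_text) is not None
```

A missing message does not prove failure while a run is still in progress; combine this check with the error messages in the log.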