Transposable elements (TEs) constitute a considerable fraction of eukaryotic genomes and play an important role in genome evolution. miRNAs, known for their regulatory activity at the post-transcriptional level, can be structurally and functionally related to these repetitive elements. This relationship remains an important question in the study of gene expression and its effects on several traits.
Here, we used CENSOR with Repbase Update libraries for TE annotation. Plant genomes were selected according to miRBase v21 annotation files. Pre-miRNA annotations from miRBase were manually checked to avoid errors where a compatible genome assembly appeared to differ slightly from the information provided. Overlaps between TEs and miRNAs were identified with BEDTools and manually curated. These results were used to build this database. The workflow is presented below:
Figure 1. Workflow diagram for the identification of TE-MIRs. The CENSOR and BLAST programs were used to map TEs and pre-miRNAs. A Bash script was used to filter, parse and organize the results in GFF3 format. Using Artemis, pre-miRNA files were checked to confirm names and positions according to miRBase. Positional intersection analysis between TEs and pre-miRNAs was run using BEDTools and manually checked with Artemis. These results were modelled to build the PlanTE-MIR DB.
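As a rough illustration of the positional intersection step, the sketch below reports pairs of features that share at least one base on the same sequence. It is not the BEDTools implementation, nor the pipeline used to build this database. The `Feature` type, the function names and the example coordinates are all hypothetical. Coordinates are assumed to be 1-based and inclusive, as in GFF3, and strand is ignored.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    """A minimal annotated interval (hypothetical type for this sketch)."""
    seqid: str   # chromosome or scaffold name
    start: int   # 1-based, inclusive (GFF3 convention)
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"; kept for reference, not used in the comparison
    name: str


def overlap_length(a: Feature, b: Feature) -> int:
    """Return the number of bases shared by a and b (0 if none)."""
    if a.seqid != b.seqid:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def find_te_mir_overlaps(
    tes: List[Feature], mirnas: List[Feature]
) -> List[Tuple[str, str, int]]:
    """Return (TE name, pre-miRNA name, shared bases) for every overlapping pair.

    Uses a simple all-against-all comparison, which is adequate for small
    annotation sets; it is not optimized for whole-genome inputs.
    """
    hits = []
    for mir in mirnas:
        for te in tes:
            shared = overlap_length(te, mir)
            if shared > 0:
                hits.append((te.name, mir.name, shared))
    return hits


# Made-up example features, not taken from the database.
example_tes = [
    Feature("Chr1", 100, 500, "+", "TE_A"),
    Feature("Chr1", 900, 1200, "-", "TE_B"),
    Feature("Chr2", 100, 500, "+", "TE_C"),
]
example_mirnas = [
    Feature("Chr1", 450, 540, "+", "mir_X"),
    Feature("Chr2", 600, 700, "+", "mir_Y"),
]

for hit in find_te_mir_overlaps(example_tes, example_mirnas):
    print(hit)
```

In this made-up example only `TE_A` and `mir_X` overlap. Any hit reported by a script like this would still need the manual check described in the workflow.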
PlanTE-MIR DB contains information about 152 pre-miRNAs overlapping TEs among ten plant species (Figure 2).
Figure 2. Database composition by species and TE classification.
In the Search section, users have a web interface to search for TE-MIR relationships. The page has an intuitive step-by-step form to select options by organism name and by TE or pre-miRNA attributes. TEs can be found by reference name (the same as supplied by Repbase Update), by TE Name according to the nomenclature of Wicker and co-workers (2007)¹, by TE Position in the assembly, or by TE class. The TE class option provides a hierarchical filter that lets the user choose among TE classes, orders and superfamilies. In the same way, pre-miRNAs may be found by typing a miRBase ID, by selecting a miRBase Name, or by position in the assembly.
After selection, a list of hits is shown, and users can download the search results as a Table, GFF3 or FASTA format file. Clicking on Details opens a detailed page containing information about the organism, the annotation and the cross-references obtained for each result. The description table displays Species Name, Common Name, Assembly Version, TE Name, Classification, Repbase Name, TE annotation details (such as Repbase version, CENSOR coverage, CENSOR similarity, start position, end position and strand), overlapping pre-miRNA, pre-miRNA ID and pre-miRNA annotation details.
The Download section provides bulk data for each species. All data are available in three formats: table, GFF3 and FASTA. GFF3 annotation files for TEs and pre-miRNAs can be loaded directly onto publicly available assemblies using a genome browser tool (e.g. Artemis).
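For users who prefer to process the GFF3 files programmatically rather than in a genome browser, the sketch below shows one way to read a single GFF3 feature line into a dictionary. It relies only on the general GFF3 layout of nine tab-separated columns. The function name and the example line, including its attribute keys, are hypothetical and may not match the attributes actually used in the files distributed here.

```python
from typing import Dict, Optional
from urllib.parse import unquote


def parse_gff3_line(line: str) -> Optional[Dict[str, object]]:
    """Parse one GFF3 feature line; return None for blank or comment lines."""
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols

    # Column 9 holds semicolon-separated key=value pairs; GFF3 percent-encodes
    # reserved characters, so both keys and values are decoded.
    attributes = {}
    for pair in attrs.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[unquote(key)] = unquote(value)

    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),  # 1-based, inclusive
        "end": int(end),      # 1-based, inclusive
        "score": score,
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }


# Made-up example line; the attribute keys are assumptions for illustration.
example_line = "Chr1\tCENSOR\ttransposable_element\t100\t500\t.\t+\t.\tID=TE_A;Name=example%20TE\n"
record = parse_gff3_line(example_line)
print(record["seqid"], record["start"], record["end"], record["attributes"])
```

Records parsed this way can be filtered by sequence name, position or attribute before being compared with other annotations.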
For more information about attributes and file formats, see the supplementary information here.
[1] WICKER, Thomas et al. A unified classification system for eukaryotic transposable elements. Nature Reviews Genetics, v. 8, n. 12, p. 973-982, 2007.