Glossary

TIS: Translation Initiation Site

annotated translation initiation site (of the main ORF) from the UCSC database (ref)

CDS: coding sequence

annotated coding sequences adopted from the UCSC database (ref)

UTR: untranslated regions

annotated untranslated regions adopted from the UCSC database (ref)

Conservation Score

Conservation scores adopted from phastCons reflect the degree of conservation between the human genome and 19 other mammalian genomes. (ref)
We used the rank-sum test to test whether the median conservation of the IRES is larger than the median conservation of the transcript.
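
As an illustration of the comparison described above, the following is a minimal sketch of a one-sided rank-sum test using the normal approximation with tie-averaged ranks. The function name and the toy score lists are hypothetical, and the source does not state which implementation was used; in practice a library routine such as `scipy.stats.mannwhitneyu(x, y, alternative="greater")` would typically be preferred.

```python
import math

def rank_sum_greater(x, y):
    """One-sided rank-sum test (normal approximation).

    Returns (U, p) for the alternative that values in x tend to be
    larger than values in y. Hypothetical helper, for illustration only.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering each value's original index.
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values identical: no evidence either way
    z = (u1 - mean_u - 0.5) / math.sqrt(var_u)  # continuity correction
    p = 0.5 * math.erfc(z / math.sqrt(2))       # upper-tail probability
    return u1, p

# Toy per-position conservation scores (made-up numbers).
ires_scores = [0.9, 0.8, 0.95, 0.85]
transcript_scores = [0.1, 0.2, 0.3, 0.4, 0.5]
u, p = rank_sum_greater(ires_scores, transcript_scores)
```

A small p-value here would indicate that the IRES positions tend to be more conserved than the rest of the transcript.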

For the three IRES functionality tests, a q-value threshold of 1e-6 is suggested, which is expected to yield at most two false positives among all putative IRESs in this database. Note that the thresholds of the functional tests should be treated as statistical type I error control for the functional aspects. They should not be misinterpreted as prediction confidence, since some human IRESs are reported not to be conserved in mammals or to lack complete structural forms.
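
The source does not state how the q-values were derived from the test p-values. As one common possibility, the sketch below computes Benjamini-Hochberg q-values and applies the suggested 1e-6 threshold. The choice of Benjamini-Hochberg, the function name, and the toy p-values are all assumptions made for illustration.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values (an assumed correction method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q

# Toy p-values (made-up numbers), thresholded at the suggested q < 1e-6.
pvals = [1e-9, 0.2, 3e-8, 0.04]
qvals = bh_qvalues(pvals)
passed = [q < 1e-6 for q in qvals]
```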

Structural RPI Score

The structural RPI score estimates a given IRES's tendency to be involved in the RNA-protein interaction (RPI) network through its structural form.

The RPI score is calculated by the following steps:
1. BLAST the human ncRNAs having verified RNA-protein interactions curated in the RNAInter database to the BRAlibaseII structure benchmark dataset and obtain a reference set with known structures and the corresponding interacting protein set.
2. Use 12 different RNA structure prediction tools to produce candidate structures for a given IRES sequence.
3. For each candidate structure, extract an ncRNA set consisting of the top 10 structurally similar ncRNAs from the constructed reference set.
4. List the proteins that interact with these structurally similar ncRNAs, based on the data from the RNAInter database.
5. Perform the protein interaction prevalence test and the ontology enrichment test on the listed protein set. The protein interaction prevalence test evaluates whether the protein set forms an enriched protein complex, and the ontology enrichment test checks whether the protein set is statistically significantly enriched for annotated functional ontology terms.
6. The IRES structural form is considered functionally interpretable in RNA-protein complex formation if its structurally similar ncRNAs selected from the reference RNA collection bear a functionally enriched interacting-protein set.
7. Select the candidate structure with the lowest enrichment-test p-value as the most RPI-interpretable representative IRES structure for the given IRES sequence. This p-value is the RPI score.
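
Steps 3, 4, and 7 above can be sketched as follows. This is only an illustration of the selection logic: the function names, the data layout (dictionaries keyed by ncRNA identifier, tuples of tool name, structure, and p-value), and the placeholder identifiers are hypothetical, and the similarity measure and enrichment tests themselves are not implemented here.

```python
def top_similar(similarities, k=10):
    """Step 3: the k reference ncRNAs most similar to a candidate structure.

    similarities: dict mapping ncRNA id -> similarity (higher = more similar).
    """
    return sorted(similarities, key=similarities.get, reverse=True)[:k]

def interacting_proteins(ncrna_ids, interactions):
    """Step 4: union of proteins known to interact with the given ncRNAs.

    interactions: dict mapping ncRNA id -> set of protein ids.
    """
    proteins = set()
    for rid in ncrna_ids:
        proteins |= interactions.get(rid, set())
    return proteins

def select_representative(candidates):
    """Step 7: the candidate with the lowest enrichment p-value.

    candidates: list of (tool_name, structure, enrichment_p_value).
    """
    return min(candidates, key=lambda c: c[2])

# Toy usage with placeholder identifiers and made-up p-values.
sims = {"r1": 0.9, "r2": 0.5, "r3": 0.7}
inter = {"r1": {"P1", "P2"}, "r3": {"P2", "P3"}}
neighbours = top_similar(sims, k=2)
proteins = interacting_proteins(neighbours, inter)
best = select_representative([("toolA", "((..))", 1e-3),
                              ("toolB", "(....)", 1e-8)])
rpi_score = best[2]
```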

Conditional Translation Efficiency

The translation efficiency of a given sequence under the stressed and the normal conditions is measured by ribosome profiling (ribo-seq). (ref)
By performing the rank-sum test on the RPM values between the stressed and normal conditions, users can assess whether the identified IRES is stress-induced.

The ribo-seq data is processed by the following steps:
1. The adaptor linker sequences or poly-(A) tails of the short reads are trimmed by Cutadapt.
2. The trimmed reads are then mapped to the hg38 reference transcriptome using RSEM with the Bowtie engine.
3. The RPM (reads per million mapped reads) values are calculated on all transcript nucleotide positions.
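
Step 3 amounts to a simple normalization. The sketch below assumes the per-position read counts and the total number of mapped reads are already available from the mapping step; the function and variable names are hypothetical.

```python
def rpm_per_position(read_counts, total_mapped_reads):
    """Convert raw per-position read counts to reads per million mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    scale = 1_000_000 / total_mapped_reads
    return [count * scale for count in read_counts]

# Toy example: three transcript positions in a library of 2 million mapped reads.
rpm = rpm_per_position([0, 5, 20], 2_000_000)
```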

eTIS: Experimentally-identified TIS

eTIS represents the TIS triplets probed by GTI-seq (global translation initiation sequencing). (ref)
The number of eTISs of an identified IRES element is the number of eTISs that fall within the IRES sequence or within 100 bp upstream or downstream of it.
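
The counting rule can be sketched as below. It assumes that each TIS is represented by a single coordinate, and that the TIS and IRES coordinates are 1-based, inclusive, and on the same transcript; these conventions and the names used are assumptions, not taken from the source.

```python
def count_tis_near_ires(tis_positions, ires_start, ires_end, flank=100):
    """Count TIS positions inside the IRES or within `flank` bp on either side."""
    lo = ires_start - flank
    hi = ires_end + flank
    return sum(1 for pos in tis_positions if lo <= pos <= hi)

# Toy example: an IRES spanning positions 200-400, so the window is 100-500.
n_etis = count_tis_near_ires([50, 150, 400, 501, 700], 200, 400)
```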

The GTI-seq data are processed by the following steps:
1. The GTI-seq data were first mapped to the hg38 genome and transcriptome by TopHat.
2. The 13th nucleotide of a uniquely mapped read was taken as the position of the start codon recognized in translation initiation.
3. By using the zero-truncated negative binomial model, the initiation sites with statistically significant numbers of read counts were determined.
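
Steps 2 and 3 can be illustrated with the sketch below. It assumes forward-strand transcript coordinates for the read offset, and it takes the negative binomial parameters `r` (size) and `p` (success probability) as given; how these parameters were fitted is not described in the source, so they and all function names are hypothetical.

```python
import math

def initiation_position(read_start, offset=12):
    """Coordinate of the 13th nucleotide of a read whose 1st nucleotide is read_start."""
    return read_start + offset

def nb_pmf(k, r, p):
    """Negative binomial probability of observing k reads (size r, probability p)."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1 - p))
    return math.exp(log_pmf)

def ztnb_upper_tail(k, r, p):
    """P(X >= k | X > 0) under a zero-truncated negative binomial, for k >= 1."""
    p0 = nb_pmf(0, r, p)                       # mass removed by the truncation
    below = sum(nb_pmf(j, r, p) for j in range(1, k))
    return (1 - p0 - below) / (1 - p0)

# Toy usage: a read starting at position 100 and a site with 3 reads.
site = initiation_position(100)
p_value = ztnb_upper_tail(3, r=1.0, p=0.5)
```

The direct summation is adequate for small counts; for very small tail probabilities a numerically more careful formulation would be needed.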

nTIS: (predicted) Noncanonical TIS

nTIS collects the non-canonical TIS triplets predicted by the PreTIS software. (ref)
The number of nTISs of an identified IRES element is the number of nTISs that fall within the IRES sequence or within 100 bp upstream or downstream of it.

IRES Activity

The IRES activity of an oligo sequence is the measured eGFP expression in the bicistronic fluorescence-activated cell sorting (FACS) experiment. (ref)
The IRES activity value of an identified IRES is adopted from the best-matched oligo sequence, that is, the one with the highest query cover.

In the bicistronic FACS experiment, activity is designated as background expression when an oligo was not detected in adjacent eGFP bins during GFP sorting but had more than 100 reads in the full mRFP-oligos-eGFP library, indicating that these background oligos did not convey cap-independent translation signals in the experiments. Oligos with fewer than 100 reads in the full mRFP-oligos-eGFP library were assigned 'NaN', representing a lack of identification confidence. All other real-valued entries represent the extent of IRES activity of the regulatory sequences.

RPM

The RPM metric for evaluating sequencing results is defined as the average read count per million mapped reads at a given genome location.

BLASTN Query Cover

The BLASTN Query Cover is the percentage of the query sequence covered by the result sequence.

BLASTN Percent Identity

The BLASTN Percent Identity is the percentage of the aligned bases in the result sequence that are identical to the target sequence.
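
Both BLASTN metrics are simple ratios. The sketch below assumes a single alignment with 1-based, inclusive query coordinates; when a hit consists of several alignment segments, the covered query positions would need to be merged first. The function names and toy numbers are hypothetical.

```python
def query_cover(query_len, q_start, q_end):
    """Percentage of the query sequence spanned by the alignment."""
    return 100.0 * (q_end - q_start + 1) / query_len

def percent_identity(identical, alignment_len):
    """Percentage of aligned positions that are identical."""
    return 100.0 * identical / alignment_len

# Toy example: a 200-nt query aligned over positions 11-190,
# with 171 identical bases in a 180-column alignment.
cover = query_cover(200, 11, 190)
identity = percent_identity(171, 180)
```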