CPAT

coding potential assessment (Galaxy Version 3.0.5+galaxy1)

Tool Parameters

Query nucleotide sequences * required

It is recommended to use short and unique sequence identifiers (--gene)

Reference genome source

Reference genome from History * required

Reference genome sequences in FASTA format (-r)

Coding sequences file * required

Coding sequences (must be CDS without UTR, i.e. from start codon to stop codon) in FASTA format (-c)

Non coding sequences file * required

Noncoding sequences in FASTA format (-n)

Start codon *

(--start)

Stop codons *

(--stop)

Minimum ORF length *

Minimum ORF length in nucleotides (--min-orf)

Search for ORFs from the anti-sense strand

(--antisense)

Number of ORF candidates reported *

An RNA may have dozens of putative ORFs. In most cases, the real ORF is among the top few when candidates are ranked by size (--top-orf)

Criteria to select the best ORF *

(--best-orf)

Additional Options

Email notification

Send an email notification when the job completes.

Help

Purpose

CPAT is a bioinformatics tool that predicts the coding probability of an RNA from its sequence characteristics. To do so, CPAT calculates scores for the following four linguistic features from a set of known protein-coding genes and a set of non-coding genes:

ORF size
ORF coverage
Fickett TESTCODE
Hexamer usage bias
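
The first two features can be illustrated with a minimal sketch. This is not CPAT's own code: the function names longest_orf and orf_features are hypothetical, only the sense strand is scanned, an ORF is required to end in a stop codon, the stop codon is counted in the ORF length, and the default start codon, stop codons, and minimum length are assumptions chosen for illustration.

```python
def longest_orf(seq, start="ATG", stops=("TAG", "TAA", "TGA"), min_orf=75):
    """Return (begin, end) of the longest sense-strand ORF, or None.

    Assumptions of this sketch (not taken from CPAT): an ORF runs from the
    first start codon in a frame to the next in-frame stop codon, and the
    stop codon is included in the reported length.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        orf_begin = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_begin is None:
                if codon == start:
                    orf_begin = i
            elif codon in stops:
                end = i + 3
                length = end - orf_begin
                if length >= min_orf and (best is None or length > best[1] - best[0]):
                    best = (orf_begin, end)
                orf_begin = None
    return best


def orf_features(seq, **kwargs):
    """Return (ORF size in nucleotides, ORF coverage = ORF size / RNA length)."""
    orf = longest_orf(seq, **kwargs)
    if orf is None:
        return 0, 0.0
    size = orf[1] - orf[0]
    return size, size / len(seq)


# Toy sequence, for illustration only.
toy_rna = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"
size, coverage = orf_features(toy_rna, min_orf=9)
```

Fickett TESTCODE and hexamer usage bias need reference statistics derived from the coding and non-coding training sets, so they are left out of this sketch.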

CPAT then builds a logistic regression model using these four features as predictor variables and the “protein-coding status” as the response variable. After the model's performance has been evaluated and a probability cutoff determined, it can be used to predict the coding status of new RNA sequences.
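
The modelling step can be sketched in plain Python as follows. This is an illustration of logistic regression in general, not CPAT's implementation: the helper names, the gradient-descent fitting, the toy feature values, and the 0.5 cutoff are all assumptions made up for the example. In practice the cutoff would be chosen by evaluating the model on data with known coding status.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def coding_probability(features, weights, intercept):
    """Predicted probability that an RNA is protein-coding."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    n = len(X)
    weights = [0.0] * len(X[0])
    intercept = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = coding_probability(xi, weights, intercept) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        intercept -= lr * grad_b / n
    return weights, intercept


# Made-up toy training set, for illustration only. Columns:
# [ORF size (kb), ORF coverage, Fickett score, hexamer score]
X_train = [
    [1.2, 0.80, 1.1, 0.4],    # labelled coding
    [0.9, 0.70, 1.0, 0.3],    # labelled coding
    [0.1, 0.10, 0.6, -0.2],   # labelled non-coding
    [0.2, 0.15, 0.7, -0.1],   # labelled non-coding
]
y_train = [1, 1, 0, 0]        # response variable: protein-coding status

weights, intercept = fit_logistic(X_train, y_train)

CUTOFF = 0.5  # placeholder cutoff, not a recommended value
predictions = [
    coding_probability(x, weights, intercept) >= CUTOFF for x in X_train
]
```

A new RNA would be scored by computing its four feature values and passing them to coding_probability with the fitted weights and intercept.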
