Tool Parameters

* required
It is recommended to use short and unique sequence identifiers (--gene)
Reference genome source
* required
Reference genome sequences in FASTA format (-r)
* required
Coding sequences (must be CDS without UTR, i.e. from start codon to stop codon) in FASTA format (-c)
* required
Noncoding sequences in FASTA format (-n)
*
(--start)
*
(--stop)
*
Minimum ORF length in nucleotides (--min-orf)
(--antisense)
*
An RNA may have dozens of putative ORFs; in most cases, the real ORF ranks among the top few by size (--top-orf)
*
(--best-orf)

Help

Purpose

CPAT is a bioinformatics tool that predicts the coding probability of an RNA from its sequence characteristics. To do so, CPAT calculates scores for the following four linguistic features from a set of known protein-coding genes and a set of non-coding genes:

  • ORF size
  • ORF coverage
  • Fickett TESTCODE
  • Hexamer usage bias
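
To make the first two features concrete, here is a minimal sketch of how ORF size and ORF coverage could be computed for one sequence. It is not CPAT's own code. The function names are hypothetical, the arguments only mirror the --start, --stop and --min-orf options of the form, and the default values, sense-strand-only search, and counting the stop codon as part of the ORF are all assumptions made for illustration.

```python
def longest_orf(seq, start="ATG", stops=("TAG", "TAA", "TGA"), min_len=75):
    """Return (begin, end) of the longest sense-strand ORF, or None.

    Hypothetical helper: scans the three forward reading frames and keeps
    the longest start-to-stop stretch of at least min_len nucleotides.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        orf_begin = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_begin is None and codon == start:
                orf_begin = i                      # first start codon opens an ORF
            elif orf_begin is not None and codon in stops:
                end = i + 3                        # assumption: stop codon is counted
                length = end - orf_begin
                if length >= min_len and (best is None or length > best[1] - best[0]):
                    best = (orf_begin, end)
                orf_begin = None                   # look for the next ORF in this frame
    return best


def orf_features(seq, **kwargs):
    """Return (ORF size in nucleotides, ORF coverage = ORF size / RNA length)."""
    orf = longest_orf(seq, **kwargs)
    if orf is None:
        return 0, 0.0
    size = orf[1] - orf[0]
    return size, size / len(seq)


if __name__ == "__main__":
    toy_rna = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"   # made-up sequence
    print(orf_features(toy_rna, min_len=9))
```

ORF coverage is the fraction of the transcript occupied by the ORF, so a long ORF in a short transcript scores close to 1.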

CPAT then builds a logistic regression model using these four features as predictor variables and the “protein-coding status” as the response variable. After the model's performance has been evaluated and a probability cutoff determined, it can be used to predict the coding probability of new RNA sequences.
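
The prediction step can be sketched as follows: a fitted logistic regression model turns the four feature values into a probability, which is then compared with the chosen cutoff. This is an illustrative sketch only. The function names are hypothetical, and the coefficients, intercept, feature values and cutoff in the usage example are made-up placeholders, not values from any CPAT model.

```python
import math


def coding_probability(features, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bn*xn)))."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    # Numerically stable evaluation of the logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify(probability, cutoff):
    """Label a sequence by comparing its coding probability with the cutoff."""
    return "coding" if probability >= cutoff else "noncoding"


if __name__ == "__main__":
    # Feature order: ORF size, ORF coverage, Fickett score, hexamer score.
    features = [450, 0.60, 1.10, 0.25]          # placeholder values
    coefficients = [0.004, 2.0, 1.5, 3.0]       # placeholder, not a trained model
    intercept = -5.0                            # placeholder
    p = coding_probability(features, coefficients, intercept)
    print(p, classify(p, cutoff=0.5))           # cutoff is also a placeholder
```

In practice the coefficients come from fitting the model on the known coding and non-coding training sets, and the cutoff is chosen from the performance evaluation rather than fixed in advance.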
