Smoothed PhyloCSF++ Tracks
 

PhyloCSF++ +1: Smoothed PhyloCSF++ Strand + Frame 1
PhyloCSF++ +2: Smoothed PhyloCSF++ Strand + Frame 2
PhyloCSF++ +3: Smoothed PhyloCSF++ Strand + Frame 3
PhyloCSF++ -1: Smoothed PhyloCSF++ Strand - Frame 1
PhyloCSF++ -2: Smoothed PhyloCSF++ Strand - Frame 2
PhyloCSF++ -3: Smoothed PhyloCSF++ Strand - Frame 3

PhyloCSF++ tracks

Introduction

PhyloCSF++ tracks score the coding potential of genomic regions based on a whole-genome multiple sequence alignment (MSA). The scores were computed with PhyloCSF++ [1], a fast and easy-to-use implementation of the PhyloCSF method [2, 3]. A more detailed description of the underlying method is available here.

Description

PhyloCSF++ raw tracks

The raw tracks (one for each of the six frames) assign a score to each codon. Green tracks show the frames on the positive strand, red tracks the frames on the negative strand. A positive score indicates that the codon is coding, a negative score that it is non-coding. The raw scores are unbounded, and each codon is scored independently of the other codons in the region. Hence, we generally recommend using the smoothed tracks (named "PhyloCSF++ +1", etc.).

PhyloCSF++ (smoothed) tracks

The scores in the smoothed tracks are derived from the raw tracks by smoothing them with a hidden Markov model (HMM) and are based on posterior probabilities. They are normalized to the interval [-15, +15]: positive scores indicate codons in coding regions, negative scores indicate codons in non-coding regions.
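To make the smoothing step more concrete, here is a minimal Python sketch of posterior decoding (the forward-backward algorithm) in a generic two-state HMM with one non-coding and one coding state. It is not the PhyloCSF++ model: the function name `smooth_posteriors`, the transition parameter `p_stay`, the start probability `p_coding_start`, and the assumption that each raw score is a natural-log likelihood ratio of coding vs. non-coding are all hypothetical choices for illustration. The sketch returns the posterior probability that each codon is coding and does not reproduce the normalization to [-15, +15].

```python
import math


def _logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def smooth_posteriors(raw_scores, p_stay=0.99, p_coding_start=0.5):
    """Posterior P(coding) for each codon in a hypothetical two-state HMM.

    State 0 = non-coding, state 1 = coding. Assumption (not from PhyloCSF++):
    each raw score is log(P(codon | coding) / P(codon | non-coding)), so the
    emission log-likelihoods can be taken as 0 (non-coding) and the score (coding).
    """
    n = len(raw_scores)
    if n == 0:
        return []
    stay, switch = math.log(p_stay), math.log(1.0 - p_stay)
    trans = [[stay, switch], [switch, stay]]  # trans[from][to], in log space
    emit = [(0.0, float(s)) for s in raw_scores]
    start = [math.log(1.0 - p_coding_start), math.log(p_coding_start)]

    # Forward pass: fwd[t][k] = log P(scores[0..t], state_t = k)
    fwd = [[start[k] + emit[0][k] for k in range(2)]]
    for t in range(1, n):
        prev = fwd[-1]
        fwd.append([
            emit[t][k] + _logaddexp(prev[0] + trans[0][k], prev[1] + trans[1][k])
            for k in range(2)
        ])

    # Backward pass: bwd[t][j] = log P(scores[t+1..n-1] | state_t = j)
    bwd = [[0.0, 0.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        bwd[t] = [
            _logaddexp(trans[j][0] + emit[t + 1][0] + bwd[t + 1][0],
                       trans[j][1] + emit[t + 1][1] + bwd[t + 1][1])
            for j in range(2)
        ]

    # Combine both passes and normalize per codon: P(state_t = coding | all scores).
    posteriors = []
    for t in range(n):
        log_nc = fwd[t][0] + bwd[t][0]
        log_c = fwd[t][1] + bwd[t][1]
        posteriors.append(math.exp(log_c - _logaddexp(log_nc, log_c)))
    return posteriors


# A weakly negative codon surrounded by strongly positive ones.
print(smooth_posteriors([5, 5, 5, -1, 5, 5, 5])[3])
```

In this toy example the codon with a raw score of -1 receives a posterior close to 1, because leaving and re-entering the coding state costs two unlikely transitions. This is the kind of neighborhood context that the raw per-codon scores do not capture.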

PhyloCSF++ power

The power track indicates how much confidence to place in the PhyloCSF scores; it is based on the branch length sum. For each position in the genome, it gives a confidence score in the interval [0, 1] that reflects how many species were aligned at that position in the MSA, taking the phylogenetic distances between these species into account. In other words, if only a few closely related species are aligned at a position, its confidence score is low.
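The branch length sum can be illustrated with a short Python sketch. It assumes that the power is the total length of the branches in the model's phylogenetic tree that connect the species aligned at a position, divided by the total length of the tree. The function name `branch_length_power`, the dictionary-based tree encoding, and the toy tree are hypothetical, and PhyloCSF++ may compute or normalize the value differently.

```python
from collections import defaultdict


def branch_length_power(parent, branch_length, aligned):
    """Fraction of the tree's total branch length spanned by the aligned species.

    parent:        node -> parent node (the root maps to None)
    branch_length: node -> length of the branch from the node to its parent
    aligned:       set of leaf names with aligned sequence at a position
    """
    # Count, for every node, how many aligned species lie in its subtree.
    below = defaultdict(int)
    for leaf in aligned:
        node = leaf
        while node is not None:
            below[node] += 1
            node = parent[node]

    k = len(aligned)
    # A branch lies on the minimal subtree connecting the aligned species
    # iff aligned species occur both below it and elsewhere in the tree.
    covered = sum(length for node, length in branch_length.items()
                  if 0 < below[node] < k)
    total = sum(branch_length.values())
    return covered / total if total > 0 else 0.0


# Toy tree ((A:1, B:1)X:2, C:4): A and B are close relatives, C is distant.
parent = {"A": "X", "B": "X", "X": "root", "C": "root", "root": None}
branch_length = {"A": 1.0, "B": 1.0, "X": 2.0, "C": 4.0}
print(branch_length_power(parent, branch_length, {"A", "B"}))  # 0.25
print(branch_length_power(parent, branch_length, {"A", "C"}))  # 0.875
```

Two closely related species cover only a small part of the tree (0.25), whereas two distantly related ones cover most of it (0.875), and a single species covers none of it, which matches the intuition that few, closely related aligned species give low confidence.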

Overview of tracks

The tracks can be downloaded here.

| Species | Assembly | Model | MSA | Last updated | Species subset (intersection of model and MSA) |
| --- | --- | --- | --- | --- | --- |
| Rat (Rattus norvegicus) | rn6 | 100vertebrates | 20way-multiz | 2021-03-07 | rn6, mm10, ailMel1, ornAna2, galGal5, melGal5, xenTro7, danRer10, micOch1, hg38, panTro5, rheMac8, cavPor3, felCat8, bosTau8, oryCun2, canFam3, monDom5 |
| Fugu (Takifugu rubripes) | fr3 / fugu5 | 100vertebrates | 8way-multiz | 2021-03-07 | fr3, tetNig2, oreNil1, oryLat2, danRer7, gasAcu1, latCha1, gadMor1 |
| Stickleback (Gasterosteus aculeatus) | gasAcu1 | 100vertebrates | 8way-multiz | 2021-03-07 | gasAcu1, danRer4, fr2, oryLat1, tetNig1, galGal3, mm8, hg18 |
| Tarsier (Tarsius syrichta) | tarSyr2 | 29mammals | 20way-multiz | 2021-03-07 | tarSyr2, micMur1, tupBel1, otoGar3, hg38, panTro4, rheMac3, mm10, canFam3 |
| Yeast (Saccharomyces cerevisiae) | sacCer3 | 7yeast | 7way-multiz | 2021-03-07 | sacCer3, sacPar, sacMik, sacKud, sacBay, sacCas, sacKlu |

PhyloCSF++ vs. PhyloCSF

You might wonder what the difference between these two tools is. Technically speaking, they produce exactly the same scores, apart from very minor differences in the smoothed scores caused by the random initialization of the HMM.

PhyloCSF++ was developed to make tracks available for more species. Unfortunately, the original implementation of PhyloCSF cannot create tracks without additional coding. Furthermore, PhyloCSF++ is faster, supports multi-threading, and is available as static binaries, on Bioconda, and as C++ source code, which hopefully makes it easier for users to compile and run. It also comes with additional tools that use these tracks to annotate the transcripts in your GFF/GTF files with PhyloCSF and confidence scores.

PhyloCSF++ was developed by a different group than the original PhyloCSF; the two tools share only the underlying method.

Citation

If you use the tracks or the software in your work, please consider citing the PhyloCSF++ paper [1]. For citing the original method, see [2].

References

  1. Pockrandt C et al. PhyloCSF++: A fast and user-friendly implementation of PhyloCSF with annotation tools. bioRxiv, 2021.
  2. Lin MF et al. PhyloCSF: a comparative genomics method to distinguish protein-coding and non-coding regions. Bioinformatics, 2011.
  3. Mudge JM et al. Discovery of high-confidence human protein-coding genes and exons by whole-genome PhyloCSF helps elucidate 118 GWAS loci. Genome Research, 2019.