bigPsl Track Format

The bigPsl format stores alignments between two sequences, as PSL files do, but compressed and indexed as bigBed files. bigPsl files are created using the bedToBigBed program with a special AutoSQL file that defines the bigPsl fields. The resulting files are in an indexed binary format. The main advantage of bigPsl is that only the portions of a file needed to display a particular region are transferred to UCSC, so for large data sets bigPsl is considerably faster than regular PSL files. The bigPsl file remains on your web-accessible server (http, https, or ftp), not on the UCSC server. Only the portion needed for the chromosomal position you are currently viewing is locally cached as a "sparse file".

Big PSL

The following AutoSQL definition is used for bigPsl pairwise alignment files. This is the bigPsl.as file specified with the -as option when running bedToBigBed.

table bigPsl
"bigPsl pairwise alignment"  
    ( 
    string chrom;       "Reference sequence chromosome or scaffold"
    uint   chromStart;  "Start position in chromosome"
    uint   chromEnd;    "End position in chromosome"
    string name;        "Name or ID of item, ideally both human readable and unique"
    uint score;         "Score (0-1000)"
    char[1] strand;     "+ or - indicates whether the query aligns to the + or - strand on the reference"
    uint thickStart;    "Start of where display should be thick (start codon)"
    uint thickEnd;      "End of where display should be thick (stop codon)"
    uint reserved;       "RGB value (use R,G,B string in input file)"
    int blockCount;     "Number of blocks"
    int[blockCount] blockSizes; "Comma separated list of block sizes"
    int[blockCount] chromStarts; "Start positions relative to chromStart"

    uint    oChromStart;"Start position in other chromosome"
    uint    oChromEnd;  "End position in other chromosome"
    char[1] oStrand;    "+ or -, - means that psl was reversed into BED-compatible coordinates" 
    uint    oChromSize; "Size of other chromosome."
    int[blockCount] oChromStarts; "Start positions relative to oChromStart or from oChromStart+oChromSize depending on strand"
    
    lstring  oSequence;  "Sequence on other chrom (or empty)"
    string   oCDS;       "CDS in NCBI format"
  
    uint    chromSize;"Size of target chromosome"
  
    uint match;        "Number of bases matched."
    uint misMatch; " Number of bases that don't match "
    uint repMatch; " Number of bases that match but are part of repeats "
    uint nCount;   " Number of 'N' bases "
    uint seqType;   " 0=empty, 1=nucleotide, 2=amino_acid"
    ) 

Note that the oStrand field indicates whether the stored PSL needs to be reverse complemented before output or display. This is necessary because the BED format requires the reference coordinates in a bigPsl file to be on the positive strand. The strand field indicates whether the positions in oChromStarts are listed from the beginning (+) or the end (-) of the chromosome.

Note that the bedToBigBed utility uses a substantial amount of memory; somewhere on the order of 1.25 times more RAM than the uncompressed BED input file.
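
As a rough planning aid, the rule of thumb above can be turned into a quick estimate before running bedToBigBed. The sketch below is illustrative only: the function name estimate_ram_bytes is hypothetical (it is not a UCSC utility), and it assumes the rule means roughly 1.25 times the size of the uncompressed input file.

```shell
# Hypothetical helper, not a UCSC tool: estimate bedToBigBed memory use
# from the size of the uncompressed input file.
# Assumption: the rule of thumb means roughly 1.25 x the input size.
estimate_ram_bytes() {
    size=$(wc -c < "$1")        # input size in bytes
    echo $(( size * 5 / 4 ))    # 1.25 x, using integer arithmetic
}

# Example usage (bigPsl.txt is the sorted bed12+13 input file):
#   estimate_ram_bytes bigPsl.txt
```

Actual memory use depends on the data, so treat the result as an order-of-magnitude figure.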

To create a bigPsl track, follow these steps:

  1. If you already have a PSL file, perhaps from using BLAT or other tools, skip to step 2. Otherwise, download the example PSL file for the Human GRCh38/hg38 assembly.
    • You may also want to download the bigPsl.fa and bigPsl.cds files if you would like to use the alternate options in step 4.
  2. Download the bedToBigBed and pslToBigPsl programs from the directory of binary utilities.
  3. Use the fetchChromSizes script from the same directory to create a chrom.sizes file for the UCSC database you are working with (e.g., hg38). Alternatively, you can download the chrom.sizes file for any assembly hosted at UCSC from our downloads page (click on "Full data set" for any assembly). For example, for the hg38 database, the hg38.chrom.sizes are located at http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes.
  4. Create a bed12+13 bigPsl input file containing the 25 fields defined in the AutoSQL file above (the 12 standard BED fields plus 13 extra fields).
    • Your bigPsl file must have the thirteen extra fields described in the AutoSQL file above: oChromStart, oChromEnd, oStrand, oChromSize, oChromStarts, oSequence, oCDS, chromSize, match, misMatch, repMatch, nCount, seqType.
    • Use the pslToBigPsl utility to create a correctly formatted bed12+13 file like so:
      pslToBigPsl bigPsl.psl stdout | sort -k1,1 -k2,2n > bigPsl.txt
    • Note that if you have created your own PSL file you may have corresponding FASTA and CDS files that accompany it. You may provide these files as input to pslToBigPsl to generate a more informative final bigPsl file:
      pslToBigPsl bigPsl.psl -cds=bigPsl.cds -fa=bigPsl.fa stdout | sort -k1,1 -k2,2n > bigPsl.txt
  5. Create the binary indexed bigPsl file from your sorted bigPsl input file using the bedToBigBed utility like so:
    bedToBigBed -as=bigPsl.as -type=bed12+13 -tab bigPsl.txt chrom.sizes bigPsl.bb
  6. Move the newly created bigPsl file (bigPsl.bb) to an http, https, or ftp location.
  7. Construct a custom track using a single track line. Note that any of the track attributes listed here are applicable to tracks of type bigBed. The most basic version of the "track" line will look something like this:
    track type=bigPsl name="My Big Psl" description="Some mRNAs Discovered from Data from My Lab" bigDataUrl=http://myorg.edu/mylab/myBigPsl.bb
  8. Paste this custom track line into the text box on the custom track management page.
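
Before running bedToBigBed in step 5, it can help to confirm that the output of step 4 really has 25 tab-separated columns and is sorted by chromosome and start position. The following is a minimal sketch of such a check using only awk and sort; the function name check_bigpsl_txt is hypothetical and is not part of the UCSC utilities.

```shell
# Hypothetical pre-flight check for the pslToBigPsl output (not a UCSC tool).
# $1 = sorted bed12+13 text file, e.g. bigPsl.txt
check_bigpsl_txt() {
    # Every row must have exactly 25 tab-separated fields (bed12+13).
    # Empty fields (such as an empty oSequence) still count as fields.
    awk -F'\t' 'NF != 25 { printf "line %d has %d fields\n", NR, NF; bad = 1 }
                END { exit bad + 0 }' "$1" || return 1
    # Rows must be in the order produced by: sort -k1,1 -k2,2n
    sort -c -k1,1 -k2,2n "$1"
}

# Example usage:
#   check_bigpsl_txt bigPsl.txt && echo "bigPsl.txt looks well formed"
```

The check only inspects column count and sort order. bedToBigBed still validates the field contents against bigPsl.as.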

The bedToBigBed program can also be run with several additional options. Run bedToBigBed with no arguments to view a full list of the available options.

Example #1

In this example, you will use an existing bigPsl file to create a bigPsl custom track. A bigPsl file that contains data on the hg38 assembly has been placed on our http server. You can create a custom track using this bigPsl file by constructing a "track" line that references this file like so:

track type=bigPsl name="bigPsl Example One" description="A bigPsl file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigPsl.bb

Paste the above "track" line into the custom track management page for the human assembly hg38, then press the "submit" button.

Custom tracks can also be loaded via a single URL. The link below loads the same bigPsl track but includes the track parameters in the URL itself:

http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr1:10000-200000&hgct_customText=track%20type=bigPsl%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigPsl.bb%20visibility=pack
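
The URL above is the track line with its spaces replaced by %20 and appended to the hgct_customText parameter. A minimal sketch of building such a URL from a track line follows. It encodes spaces only, as the example URL does; a track line containing quotes or other special characters would likely need fuller percent-encoding.

```shell
# Track line to embed in the URL (same settings as the example link).
track='track type=bigPsl name=Example bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigPsl.bb visibility=pack'

# Replace each space with %20; other characters are left untouched.
encoded=$(printf '%s' "$track" | sed 's/ /%20/g')

# Assemble the hgTracks URL with the database, position, and track line.
url="http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr1:10000-200000&hgct_customText=$encoded"
echo "$url"
```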

With this example bigPsl loaded, click on an item in the track. Note how the details page displays information about the alignment, similar to PSL tracks, as well as links to display the browser position of the alignment and more detailed information about the alignment.

Example #2

In this example, you will create your own bigPsl file from an existing bigPsl input file.

Sharing your data with others

If you would like to share your bigPsl data track with a colleague, learn how to create a URL by looking at Example 11 on this page.

Extracting data from the bigPsl format

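Because bigPsl files are an extension of bigBed files, which are indexed binary files, it can be difficult to extract data from them. We have developed the following programs, all of which are available from the directory of binary utilities.

  • bigBedToBed: converts a bigBed file to ASCII BED format.
  • bigBedSummary: extracts summary information from a bigBed file.
  • bigBedInfo: prints out information about a bigBed file.
  • bigPslToPsl: converts a bigPsl file to a PSL file.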

As with all UCSC Genome Browser programs, simply type the program name at the command line with no parameters to see the usage statement.

Troubleshooting

If you encounter an error when you run the bedToBigBed program, it may be because your input bigPsl file has data that extends past the end of a chromosome. In this case, run the bedClip program on the input file before running bedToBigBed. It will remove the row(s) in your input BED file that extend past the end of a chromosome.
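
To see which rows are affected before clipping, you can compare each row's chromEnd against the chrom.sizes file. The awk sketch below is one way to do this; the function name find_off_end_rows is hypothetical, and the check only reports rows on chromosomes listed in chrom.sizes. It is not a reimplementation of bedClip and may not match its behavior exactly.

```shell
# Hypothetical helper, not a UCSC tool: print rows whose chromEnd (column 3)
# is larger than the chromosome size listed in chrom.sizes.
# $1 = chrom.sizes (name<TAB>size), $2 = tab-separated BED-style input
find_off_end_rows() {
    awk -F'\t' 'NR == FNR { size[$1] = $2; next }      # first file: load sizes
                ($1 in size) && $3 + 0 > size[$1] + 0  # second file: report rows
               ' "$1" "$2"
}

# Example usage:
#   find_off_end_rows chrom.sizes bigPsl.txt
```

If the function prints nothing, no row on a known chromosome runs past its end, and the bedToBigBed error probably has another cause.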