Author Archives: Matthew Speir

How portable is your track hub? Use hubCheck to find out!

Track and assembly hubs are collections of data that are hosted on your servers and can be displayed using the UCSC Genome Browser and other genome browsers supporting the UCSC track hub format. Track hubs allow for the visualization of data on assemblies that we already host (such as the human or mouse genomes), while assembly hubs can be used to create genome browsers for any genome assembly of your choosing.

Hubs depend on a number of different plain text configuration files. The most important are the trackDb.txt files for each assembly in your hub. These files contain the track configuration settings, also known as “trackDb settings”, that control how the track displays in the Genome Browser as well as the display of the item detail pages. You can see the trackDb settings available for hubs on the Hub Track Database Definition page.

As the track hub format has grown in popularity, other genome browsers, including Ensembl, Biodalliance, and the WashU Epigenome Browser, have implemented support for the UCSC track hub format. The Ensembl genome browser currently boasts fairly comprehensive support of the UCSC track hub format. In addition to supporting track hubs on their site, the Ensembl team has also created a Track Hub Registry that pulls hubs listed on our Public Hubs page into a centralized database alongside those hubs submitted to their registry. In an attempt to make the adoption of our track hub format easier, we talked to the other genome browsers about what settings were core to a track hub being, well, a track hub. We sort the list of hundreds of settings into various support levels, which include:

  • Required – needed to display a hub across the different browsers.
  • Base – non-required settings that are likely to be supported by other genome browsers
  • Full – all other trackDb settings fully supported in the UCSC Genome Browser
  • New – settings introduced since the last versioned release, may change between now and the next versioned release
  • Deprecated – settings that may currently work, but could cease to work in the future as they are not being actively developed

We periodically increment the trackDb version number as major updates and changes are made to the settings. The latest change — version 2 — included settings related to the release of several “big*” file types, such as bigGenePred, bigPsl, bigChain, bigMaf, and CRAM. It also included moving several settings (html, priority, colorByStrand, autoScale, spectrum) from the “full” level to the “base” level to indicate that they are supported at other genome browsers (at this time primarily Ensembl).

During the initial versioning process, we improved the “hubCheck” utility to check the support of the trackDb settings used in your hub against our master list of trackDb settings, by version and support level. The hubCheck tool can be utilised in a variety of different ways; principally it checks if your hub works, but it can also list your hub’s settings and their support levels (required, deprecated, base, full and new) as well as check the support of your settings against any genome browser.  For example, to test compatibility with Ensembl (which supports the ‘base’ level of hub settings), use the command:

hubCheck -checkSettings -level=base

You can see more examples of how you might use hubCheck to check the compatibility of your hub with other genome browsers in our help documentation. To acquire hubCheck, you can click Downloads from the top blue menu bar and then select Utilities and navigate to the utilities directory.

If you have questions about creating or validating your track and assembly hubs, please feel free to contact us!

For more information on hubs in the UCSC Genome Browser, please see the following pages:

For more information on hubs in other genome browsers, see their help pages here:

Questions about other genome browsers support for hubs should be directed to their mailing lists.

The new NCBI RefSeq tracks and You

The release of the new NCBI RefSeq track marks a major shift in how we include annotations from NCBI’s Reference Sequence Database (RefSeq) in the UCSC Genome Browser. This new track is a composite track that contains the combined set of curated and predicted annotations from the RefSeq database for hg38/GRCh38. It also contains tracks that break up the annotation set into a few subsets. These subsets include only the curated transcripts (NM, NR, or YP transcripts), only the predicted transcripts (XM or XR transcripts), all of the other annotations from RefSeq that don’t fit into the curated or predicted subsets, and the alignments of the curated and predicted transcripts to the genome. All of the coordinates and alignments in these tracks are provided by the RefSeq group.

This new NCBI RefSeq composite also includes a “UCSC RefSeq” track that is based on our original method of producing the “RefSeq Genes” track. This “UCSC RefSeq” track is built by aligning RNAs obtained from the RefSeq Database to the genome. In the early days of the UCSC Genome Browser, only RNA sequences were provided by RefSeq, so we used BLAT to align them to the genome. This was a good solution in the past, but over time this method has led to some issues with transcripts matching to multiple places and our alignments of small exons or other regions differing slightly from those found in the RefSeq database. This type of minor alignment difference can be seen in the following session, where you can see that the RefSeq Curated (top) and UCSC RefSeq (bottom) tracks place the small fifth exon in transcript NM_001130970 at different locations due to the fact that there are multiple matches to this exon sequence in that region.

The new set of RefSeq tracks differs from the “UCSC RefSeq” track in a few key ways. First, as mentioned previously, the new tracks are based entirely on positions and alignments provided by RefSeq. Second, this track is currently only available for the hg38/GRCh38 assembly. This means that if you obtain the hg38 coordinates for a RefSeq transcript from the UCSC Genome Browser, these coordinates should be the same as those from the entry found at NCBI’s RefSeq Database. Lastly, these new NCBI RefSeq tracks include predicted transcripts, which were absent from our original RefSeq track.

This has been a long and exciting collaboration between the UCSC Genome Browser staff and NCBI’s RefSeq group. We trust that this full complement of tracks from the Reference Sequence Database will be helpful to you, our Browser users. We hope to bring these tracks to more genome assemblies in the future.