Monthly Archives: August 2021

Sharing Data with Sessions and URLs

In this blog post, I’ll give an overview of ways to share Genome Browser data views with others.

Visualizing and sharing custom data is one of the most useful features of the UCSC Genome Browser. An independent review evaluating various genome browsers (http://tinyurl.com/genome-browsers) emphasized that “the local and global exports for sharing sessions” are one of the site’s most “attractive functionalities,” and the report concluded that the UCSC Genome Browser “is the best tool of our evaluation from that point of view.”

Many veteran users are not aware of how easy it is to create and share browser views, called sessions, especially with the more recent Public Sessions feature. Fewer still know that URLs can be modified to share custom data, and even to build links on top of data or sessions created by others. This blog post gives a broad overview of the many ways to share data on the Browser.

  • TIP: You can watch a great introductory video to Saving and Sharing Sessions, which walks users through the steps to build a session and illustrates the new Public Sessions tool: http://bit.ly/sessionVid

SESSIONS AND PUBLIC SESSIONS

To access Sessions, open the “My Data” menu at the top of the page and choose “My Sessions”. This leads to the Sessions Management page, where you can create a URL snapshot of the view you are looking at in the Genome Browser. Once a user has created an account, they can save a snapshot on this page by giving the current view a “sessionName”. A link built from the userName and the sessionName is then created, and it can be shared with others: https://genome.ucsc.edu/s/userName/sessionName

Once a session is created, users can click the “details” button on the Sessions Management page to reach an additional screen where they can enter a description. Newly created sessions are shareable by default, but they can be made private (so that an account login is required to access them), or they can be published to the Public Sessions page, where a search on the userName (https://genome.ucsc.edu/cgi-bin/hgPublicSessions?search=userName) brings up all sessions that author has published.

Public Sessions with descriptions are even more discoverable, since searches also match words in the description. Public Sessions can be accessed under the “My Data” menu, where a search term can be entered in the box on the right, or you can build a URL that searches for specific terms, as illustrated above for userName. For instance, searching for “protein” finds all the sessions that mention protein in their description: http://genome.ucsc.edu/cgi-bin/hgPublicSessions?search=protein

  • TIP: When sessions are created with uploaded custom data, that data becomes “immortalized.” Uploaded text-based custom tracks are usually deleted after a few days, but saving them in a session marks them as belonging to the associated userName account, and attempts are made to preserve them. Please keep a local backup of your session's contents, however, as the Browser is not a data storage service.

BUILDING URLS TO SET TRACK VISIBILITIES

Sometimes users want to hide all the tracks and only display certain data, and this can be done even without creating sessions. You can control the visibility of tracks from the URL with some of the following parameters:

  • hideTracks=1 – hides all tracks
  • <trackName>=hide|dense|pack|full – sets the specified track or subtrack to the chosen visibility
  • <trackName>.heightPer=<###> – sets a bigWig track’s height to a particular number of pixels (between 20 and 100)

For example, you can use the following URL to hide every track (hideTracks=1), set the genome database to hg38 (db=hg38), set the mappability track to full visibility (mappability=full), and set the umap track height to 100 pixels (umap24Quantitative.heightPer=100): http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hideTracks=1&mappability=full&umap24Quantitative.heightPer=100
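The link above is just a base URL followed by name=value pairs, with a single ? before the first pair and & before every other one. As a minimal sketch, assuming a hypothetical bash helper named build_hgtracks_url (not a UCSC tool), such a link could be assembled like this:

```bash
#!/usr/bin/env bash
# Hypothetical helper: join a base CGI URL and name=value parameters,
# using '?' before the first parameter and '&' before every later one.
# Values are appended as-is; they are NOT URL-encoded here.
build_hgtracks_url() {
  local url="$1"
  shift
  local sep='?'
  local p
  for p in "$@"; do
    url+="${sep}${p}"
    sep='&'
  done
  printf '%s\n' "$url"
}

# Rebuilds the visibility example from the text.
build_hgtracks_url "http://genome.ucsc.edu/cgi-bin/hgTracks" \
  db=hg38 hideTracks=1 mappability=full umap24Quantitative.heightPer=100
```

Building the string in one place makes it harder to end up with a stray second ? when parameters are added later.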

BUILDING URLS TO CUSTOM TRACKS

Users can also share data with links without first creating a session by adding a “hgct_customText=” parameter to their base URL. For instance, if a group has data for the human hg38 database in a web-accessible location that meets the criteria for loading as a custom track, they can build URL links in this fashion: https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hgct_customText=http://location.online/dataFile

That online dataFile can be the track data itself, or a file listing more URLs that load additional custom tracks. For instance, a recent blog post about building bigBed tracks (https://bit.ly/UCSC_blog_bigBed) included an example of bigBed data hosted at CyVerse. Since that data only covers the range 1,405,000-1,448,000 on chromosome 5, the URL below loads the hg19 genome (db=hg19), goes to that position (position=chr5:1405000-1448000), and then attaches the remote file (hgct_customText=): https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr5:1405000-1448000&hgct_customText=https://data.cyverse.org/dav-anon/iplant/home/brianlee/Lab_Primers.bigBed

  • TIP: One advantage of not using sessions is that a user’s existing track display preferences are not affected. For instance, if they have a collection of clinical tracks displayed, using the hgct_customText= or hubUrl= parameter adds the new remote data on top of their existing clinical track configuration. Loading a session, on the other hand, disconnects existing remote data and changes the position and track configuration to match everything saved when the session was created.

BUILDING URLS TO TRACK HUBS

Once a user has taken the step to build binary-indexed files such as bigBeds or bigWigs, they can go a step further and put their collection of tracks into a Track Hub. Track Hubs provide much more power for loading external data in more complex ways, such as enabling search indexes on uniquely named items in the remote data, or coloring tracks or individual elements.

Track Hubs are similar to a text file that points to a collection of remotely hosted custom tracks. To make sharing easy, the Browser is given just one URL, called a hubUrl, to load the Track Hub. All the remotely hosted data, which must be in a binary-indexed format, is then attached so that only the data in the current view is transferred over the Internet. Here is a generic example of a link that would load track hub data on hg38: https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hubUrl=http://location/hub.txt

Here is a working example that loads onto the hg19 assembly (db=hg19) around a position (position=chr21:33,030,000-33,043,000) an example hub (hubUrl=): https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr21:33,030,000-33,043,000&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubDirectory/hub.txt

  • TIP: Once you start using URLs to share data instead of sessions, make sure that only the first parameter is preceded by a question mark (?) and that every other parameter is preceded by an ampersand (&): “?parameter1=value&parameter2=value&parameter3=value”. If you are having trouble, check that you have not mixed up the ? and & separators.
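One way to catch this mix-up is to check a hand-built link before sharing it. This bash sketch uses a hypothetical function, check_url_separators, that rejects a link unless it has exactly one ? and no & before that ?. It assumes the link carries parameters and that no parameter value (such as a hubUrl) contains its own ?, which would have to be percent-encoded first.

```bash
#!/usr/bin/env bash
# Hypothetical check for a hand-built Browser URL with parameters:
# it must contain exactly one '?', and no '&' may appear before it.
check_url_separators() {
  local url="$1"
  local only_q="${url//[^?]/}"        # keep only the '?' characters
  if [ "${#only_q}" -ne 1 ]; then
    echo "expected exactly one '?', found ${#only_q}" >&2
    return 1
  fi
  local before_q="${url%%\?*}"        # everything before the '?'
  case "$before_q" in
    *'&'*) echo "found '&' before '?'" >&2; return 1 ;;
  esac
  return 0
}
```

For example, a link containing "db=hg38?hubUrl=" fails the check because it has two question marks.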

BUILDING URLS TO TRACK HUBS ON ASSEMBLY HUBS

The UCSC Genome Browser also provides a way to attach Track Hubs that display novel genomes not hosted within the Browser. These are called assembly hubs. If a new assembly is hosted remotely as an assembly hub, additional hubs can also be attached on top of it. In the URL, the db= parameter is then replaced by a genome= parameter, as defined in the external assembly hub’s genomes.txt file (or in its genome stanza when useOneFile is applied; see below).

In the following conceptual link, a genomeName is defined in an external assemblyHub.txt file that provides the Browser with the underlying sequence for that genomeName. Another collection of data, hub.txt, is then attached to that assembly hub, where hub.txt uses the same genomeName in its genomes.txt file (or genome stanza). The very first parameter in the URL (genome=genomeName) tells the Browser that one of these hubs defines a matching genome, so that it can display the correct underlying sequence: https://genome.ucsc.edu/cgi-bin/hgTracks?genome=genomeName&hubUrl=http://location/assemblyHub.txt&hubUrl=http://location/hub.txt

  • TIP: Note that hubUrl= can be used multiple times to attach multiple hubs, but only genome=genomeName tells the Browser which genome to display. The second hub.txt in this example can rely entirely on the first assemblyHub.txt to provide all of the novel underlying genomeName sequence data.

To illustrate how complex the system can get, a further step could add a custom track to the assembly hub while a track hub is attached at the same time: https://genome.ucsc.edu/cgi-bin/hgTracks?genome=genomeName&hubUrl=http://location/assemblyHub.txt&hubUrl=http://location/hub.txt&hgct_customText=http://location.online/dataFile

ASSEMBLY HUB EXAMPLES WITH GenArk HUBS

The new GenArk assemblies come with short links that load hubs from that collection. For example, https://genome.ucsc.edu/h/GCF_001984765.1 loads the American beaver assembly (GCF_001984765.1). This short link is equivalent to pointing the hgTracks CGI (the main Browser display) at hubUrl=https://hgdownload.soe.ucsc.edu/hubs/GCF/001/984/765/GCF_001984765.1/hub.txt and setting genome=GCF_001984765.1 in the URL. By condensing all of this into the short link format, we hope to make loading GenArk hubs easier.
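The long form of the link follows a path that looks predictable. As a sketch, assuming every GenArk accession uses the same directory layout as this beaver example (the prefix, then the nine digits split into three-digit directories, then the full accession), hypothetical bash helpers could rebuild the hub.txt URL and the equivalent hgTracks link:

```bash
#!/usr/bin/env bash
# Hypothetical helper. ASSUMPTION: the layout seen in the beaver example,
#   hubs/<prefix>/<3 digits>/<3 digits>/<3 digits>/<accession>/hub.txt,
# applies to other GenArk accessions as well.
genark_hub_url() {
  local acc="$1"                      # e.g. GCF_001984765.1
  local prefix="${acc%%_*}"           # GCF
  local digits="${acc#*_}"            # 001984765.1
  digits="${digits%%.*}"              # 001984765
  printf 'https://hgdownload.soe.ucsc.edu/hubs/%s/%s/%s/%s/%s/hub.txt\n' \
    "$prefix" "${digits:0:3}" "${digits:3:3}" "${digits:6:3}" "$acc"
}

# Long-form hgTracks link intended to match the /h/<accession> short link.
genark_long_link() {
  local acc="$1"
  printf 'https://genome.ucsc.edu/cgi-bin/hgTracks?hubUrl=%s&genome=%s\n' \
    "$(genark_hub_url "$acc")" "$acc"
}
```

In practice the short link is simpler; the long form is mainly useful when you need to combine the hub with other hubUrl= parameters.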

  • TIP: Once you start using URLs to define the Browser view, you will likely wish to reset the view occasionally. You can do this by going to the “Reset All User Settings” under the top “Genome Browser” menu. Another option is to directly point the browser to the cartReset CGI: https://genome.ucsc.edu/cgi-bin/cartReset

These https://genome.ucsc.edu/h/GCF_### short links to GenArk assembly hubs can take additional parameters, as in the following link, which loads a custom track onto the GCF_001984765.1 assembly hub. The remote custom track in this example is a single bigBed hosted at CyVerse, and the URL simultaneously sets the position to NW_017869957v1:1,437,578-1,648,889: https://genome.ucsc.edu/h/GCF_001984765.1?position=NW_017869957v1:1,437,578-1,648,889&hgct_customText=https://data.cyverse.org/dav-anon/iplant/home/brianlee/examples/GCF_001984765.1_C.can_genome_v1.0.cpgIslandExt.bb

A Track Hub can also be attached to the assembly hub, as in this version, where the GCF_001984765.1 assembly hub is redirected from the default position to NW_017869957v1:1,285,000-1,793,000 and hubUrl= points to a hub.txt hosted at CyVerse: https://genome.ucsc.edu/h/GCF_001984765.1?position=NW_017869957v1:1,285,000-1,793,000&hubUrl=https://data.cyverse.org/dav-anon/iplant/home/brianlee/examples/hub.txt

  • TIP: Take a moment to look at this example hub.txt (https://data.cyverse.org/dav-anon/iplant/home/brianlee/examples/hub.txt). Note that its only genome stanza is “genome GCF_001984765.1” (since it uses useOneFile on and expects to find a GenArk hub). It relies entirely on the GenArk assembly hub for the underlying assembly information.
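For a rough idea of the shape of such a file, here is a bash sketch that writes a minimal single-file (useOneFile on) hub.txt. It is not the contents of the CyVerse hub.txt: the hub name, labels, email address, track name, and "type bigBed 4 +" are placeholders and assumptions; only the genome line and the bigDataUrl come from the examples in this post.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: write a minimal useOneFile hub.txt for the GenArk
# beaver assembly. Names, labels, email, and track type are placeholders.
write_example_hub() {
  cat > "$1" <<'EOF'
hub exampleBeaverHub
shortLabel Example beaver hub
longLabel Example track hub on the GenArk GCF_001984765.1 assembly
useOneFile on
email someone@example.org

genome GCF_001984765.1

track cpgIslandsExample
bigDataUrl https://data.cyverse.org/dav-anon/iplant/home/brianlee/examples/GCF_001984765.1_C.can_genome_v1.0.cpgIslandExt.bb
shortLabel CpG islands
longLabel CpG islands (example bigBed hosted at CyVerse)
type bigBed 4 +
visibility pack
EOF
}

# Usage: write_example_hub hub.txt
```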

Track Hubs loaded on assembly hubs are not limited to GenArk hubs; GenArk hubs simply have special privileges because they have short links. If you attach any hub that declares something like “genome GCF_###”, the Genome Browser will make an effort to find a match in the existing GenArk collection and attach it automatically.

To illustrate how hubs would be attached to assembly hubs outside of GenArk, here is the longer version of the above link. In this case, the first hubUrl= gives the location of the assembly hub, a second hubUrl= loads the track hub, and finally hgct_customText= loads a custom track:

https://genome.ucsc.edu/cgi-bin/hgTracks?hubUrl=https://hgdownload.soe.ucsc.edu/hubs/GCF/001/984/765/GCF_001984765.1/hub.txt&genome=GCF_001984765.1&position=NW_017869957v1:1,285,000-1,793,000&hubUrl=https://data.cyverse.org/dav-anon/iplant/home/brianlee/examples/hub.txt&hgct_customText=https://data.cyverse.org/dav-anon/iplant/home/brianlee/examples/GCF_001984765.1_C.can_genome_v1.0.cpgIslandExt.bb

The point of these rather tortuous examples is that multiple groups can own the sources of the data. Everything after the base URL, https://genome.ucsc.edu/cgi-bin/hgTracks, can point to other places on the Internet with either the hubUrl= or the hgct_customText= parameter. This means lab_X might host the assembly data, lab_Y can build a hub to view on that assembly, and lab_Z can attach even more custom data on top of both. And all this sharing and interoperability can happen without ever creating session links.

BUILDING URLS ATTACHING TRACK HUBS AND CUSTOM TRACKS TO SESSIONS

Using sessions is powerful since it lets you customize your view of the Genome Browser. Users can create a session (or borrow another from the Public Session page) and use that session’s userName and sessionName to attach their own custom data.

  • Here is a model link for attaching custom tracks: https://genome.ucsc.edu/s/userName/sessionName?hgct_customText=http://location.online/dataFile
  • Here is a model link for attaching track hubs: https://genome.ucsc.edu/s/userName/sessionName?hubUrl=http://location.online/hub.txt

This approach can produce shorter links, and it can also preconfigure the Browser to a certain position or display. We recently added the ability to customize the font on the Browser, so a session can even serve as a stylistically different view of the same data, for instance one that makes the display easier for you to read.

Here are some real-world examples borrowing from real Public Sessions. To load a Public Session, go to the “My Data” menu, choose “Public Sessions”, and click on the image of any session. You can build your own URL from an existing Public Session by noting its Author field (the session’s source userName) and its Session Name field, like so: https://genome.ucsc.edu/s/userName/sessionName

  • TIP: Whitespace and other special characters in session names are URL-encoded, so each space becomes %20 (My%20session%20name). This is one reason that using underscores (or camelCase) instead of spaces in your sessionNames makes for cleaner links.
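To preview how a session name will appear in a link, this bash sketch uses a hypothetical urlencode function that percent-encodes every character except letters, digits, and . _ ~ -. It is meant for ASCII names only; the Browser's own encoding may treat some characters differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: percent-encode an ASCII string for use in a URL path.
# Letters, digits, '.', '_', '~', and '-' are kept; everything else is %XX.
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      # printf "'c" yields the character code of c; %02X prints it as hex.
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}
```

For example, urlencode "My session name" prints My%20session%20name, while a name such as AvantG_Font passes through unchanged.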

Here’s a session on hg19 that will load and also attach the earlier CyVerse custom track: https://genome.ucsc.edu/s/brianlee/AvantG_Font?position=chr5:1405000-1448000&hgct_customText=https://data.cyverse.org/dav-anon/iplant/home/brianlee/Lab_Primers.bigBed

Here’s one that loads a few hubs on a session that points to hg38 and also opens the display to the SIRT1 gene using the &singleSearch=knownCanonical&position=SIRT1 parameters: https://genome.ucsc.edu/s/brianlee/Times_Font?hubUrl=http://fantom.gsc.riken.jp/5/datahub/hub.txt&hubUrl=http://expdata.cmmt.ubc.ca/JASPAR/UCSC_tracks/hub.txt&hubUrl=http://remap.univ-amu.fr/storage/public/hubReMap2020UCSC/hub.txt&singleSearch=knownCanonical&position=SIRT1

Again, these complex links illustrate that there are multiple ways to view data from multiple groups around the world in the Genome Browser. You can reach the data through clicks and searches on the website, or by building Sessions, Public Sessions, and URL links to remotely hosted data. This blog post could not cover every topic, but it gives a good introduction to the ways to share data with sessions and complex URLs. To learn more about links, see these documentation pages:

  • TIP: If you love modifying URLs, click on the “example links” in the second #optParams section above to see how you can even add parameters like highlight= to define multiple colored vertical highlights.

Links to guides for Sessions, Track Hubs, Custom Tracks, and videos can be found on our training page:

If after reading this blog post you have any questions, please email genome@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible forum. If your question includes sensitive data, you may send it instead to genome-www@soe.ucsc.edu.

How to make a bigBed file – Part 1

In this blog post, I’ll walk through a scenario in which a user has an existing text-based custom track that could be made into a more shareable bigBed version.

Let’s say the original track is in the bedDetail format, which allows BED12+ columns, using tabs to define the additional columns. This original track can be made into a bigBed track to be put in a Track Hub, or hosted on its own and shared across multiple sessions, where the bigBed acts as a universal custom track. If the bigBed were updated at its hosted location, every session that references that remotely hosted bigBed would show the updated data as well.

Let’s begin with the idea that Jerry’s Lab would like to host a primers track and share it between sessions for their lab group. The lab has already created a primers custom track in text files that can be updated and uploaded successfully.

The steps below take Jerry’s lab from this uploading approach to putting the data in a shared online location, using a binary-indexed custom track format called bigBed. The bigBed is hosted at an online location defined by a bigDataUrl, which lets Jerry’s entire lab see the updated data as new primers are added. This way, each member of Jerry’s lab can keep using their earlier sessions but still get new data in their views, provided the bigBed at the URL shared by all the sessions is updated with the new information.

For this example, imagine Jerry’s lab is already using a tab-separated bedDetail custom track text file that might look like this:

track name=Primers type=bedDetail description=Primers visibility=2 color=221,55,118
browser position chr5:1405000-1448000
chr5    1413367    1413387    hDAT32061R    0    .    1413367    1413387    221,55,118    1    20    0    catggagtgggccctttcag
chr5    1414322    1414343    hDAT31086F    0    .    1414322    1414343    221,55,118    1    21    0    cctcaagcccaaatgcagctg
...

The type=bedDetail setting lets you upload a text file that displays BED12 items (http://genome.ucsc.edu/FAQ/FAQformat.html#format1) with an additional 13th column containing sequence (making it the bedDetail format: http://genome.ucsc.edu/FAQ/FAQformat.html#format1.7). With bedDetail, a user has either the first 4 or 12 columns of data in BED format and can extend the format with additional fields, such as the sequence data here, to enhance the track details pages.

On the Custom Tracks page, under the My Data menu, the above text can be pasted in and will work, provided there are tabs between the columns (some copy-and-paste interfaces remove tabs).
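Because lost tabs are a common reason a pasted track fails, it can help to check the file first. This bash sketch, using a hypothetical check_columns function built on awk, skips the track and browser lines and blank lines, then reports any data line that does not have the expected number of tab-separated fields (13 for the example above).

```bash
#!/usr/bin/env bash
# Hypothetical check: report data lines that do not have exactly WANT
# tab-separated fields. Returns non-zero if any such line is found.
check_columns() {
  local file="$1" want="$2"
  awk -F'\t' -v want="$want" '
    /^(track|browser)/ { next }          # skip custom-track header lines
    NF == 0 { next }                     # skip blank lines
    NF != want { printf "line %d: %d fields\n", NR, NF; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$file"
}

# Usage: check_columns primers.txt 13
```

A line whose tabs were turned into spaces shows up as a single field, which makes the problem easy to spot.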

When this custom track is added to a session as a text file, it is uploaded once and does not update unless there is a new upload. If Jerry’s lab wanted to update the Primers track in their sessions, a new upload of the text-based track would be required in each individual session; once created, sessions with uploaded text data are static. To solve this for Jerry’s lab, one option is to host the data at a URL and convert it into the binary-indexed bigBed format. That URL-hosted bigBed can then act as a universal custom track across many sessions.

Here are the steps to do that.

1. First, edit the file and remove the top track and browser lines; they will be used again in a later step, after the bigBed is created.

chr5    1413367    1413387    hDAT32061R    0    .    1413367    1413387    221,55,118    1    20    0    catggagtgggccctttcag
chr5    1414322    1414343    hDAT31086F    0    .    1414322    1414343    221,55,118    1    21    0    cctcaagcccaaatgcagctg
...

The following command downloads an example of that file for those who want to follow along with the next steps:

curl -O https://data.cyverse.org/dav-anon/iplant/home/brianlee/Lab_Primers.txt
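If you are starting from your own custom track file instead of this download, the header lines can be removed on the command line. A minimal sketch, assuming the header lines begin with the words track and browser (the function name and file names are hypothetical):

```bash
#!/usr/bin/env bash
# Hypothetical helper: print only the data rows of a custom track file,
# dropping lines that start with "track" or "browser".
strip_header_lines() {
  grep -v -E '^(track|browser)([[:space:]]|$)' "$1"
}

# Usage: strip_header_lines Primers_customTrack.txt > Lab_Primers.txt
```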

2. Next, in a command-line environment, use the UNIX sort command to sort the data in your file (here called Lab_Primers.txt):

sort -k1,1 -k2,2n Lab_Primers.txt > Lab_Primers_sorted.txt

The command creates a new file, Lab_Primers_sorted.txt, in which the entries are sorted by chromosome and then numerically by start position.
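To confirm a file is already in this order without rewriting it, sort's -c option only checks the order and exits non-zero at the first out-of-order line. In this sketch (is_bed_sorted is a hypothetical name), -s is added so that sort does not fall back to comparing whole lines when the chromosome and start keys are equal.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed only if FILE is ordered by chromosome
# (column 1) and then numerically by start (column 2).
is_bed_sorted() {
  sort -c -s -k1,1 -k2,2n "$1" 2>/dev/null
}

# Usage: is_bed_sorted Lab_Primers_sorted.txt && echo "sorted"
```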

3. Next, download the bedToBigBed utility (this URL assumes you are using a Mac):

curl -O http://hgdownload.soe.ucsc.edu/admin/exe/macOSX.x86_64/bedToBigBed

4. Then we will make the bedToBigBed utility executable:

chmod 700 bedToBigBed
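The download directory depends on your operating system. This bash sketch maps the output of uname -s to one of the two directories named macOSX.x86_64 and linux.x86_64 on hgdownload; the helper name is hypothetical, and other platforms are left unsupported here, so check the admin/exe listing for those.

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose the hgdownload admin/exe directory from the
# operating system name reported by `uname -s`.
ucsc_exe_dir() {
  case "$1" in
    Darwin) echo "macOSX.x86_64" ;;
    Linux)  echo "linux.x86_64" ;;
    *)      echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   dir=$(ucsc_exe_dir "$(uname -s)") &&
#     curl -O "http://hgdownload.soe.ucsc.edu/admin/exe/${dir}/bedToBigBed" &&
#     chmod 700 bedToBigBed
```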

5. This utility needs a definitions file (an autoSql .as file) that explains what each column means. We will download an example that works with these 13 columns, though you could edit this file or write your own.

curl -O https://genome-source.gi.ucsc.edu/gitlist/kent.git/raw/master/src/hg/lib/bed12Source.as
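If you would rather describe the 13th column as a sequence, you could write your own definitions file. This bash sketch writes a hypothetical labPrimers.as: the first 12 fields follow the standard BED12 autoSql layout, while the 13th field name, sequence, is an assumption for the primer column. You would pass the resulting file to bedToBigBed with -as= in place of bed12Source.as.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: write an autoSql definition for BED12 plus one
# extra column holding the primer sequence (field name is an assumption).
write_lab_primers_as() {
  cat > "$1" <<'EOF'
table labPrimers
"BED12 plus a primer sequence column (hypothetical example)"
    (
    string chrom;        "Reference sequence chromosome or scaffold"
    uint   chromStart;   "Start position in chromosome"
    uint   chromEnd;     "End position in chromosome"
    string name;         "Name of item"
    uint   score;        "Score from 0-1000"
    char[1] strand;      "+ or - or ."
    uint   thickStart;   "Start of where display should be thick"
    uint   thickEnd;     "End of where display should be thick"
    uint   reserved;     "Used as itemRgb as of 2004-11-22"
    int    blockCount;   "Number of blocks"
    int[blockCount] blockSizes;  "Comma separated list of block sizes"
    int[blockCount] chromStarts; "Start positions relative to chromStart"
    lstring sequence;    "Primer sequence"
    )
EOF
}

# Usage: write_lab_primers_as labPrimers.as
```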

6. With the bedToBigBed utility and the bed12Source.as file, we can now build a new Lab_Primers.bigBed file for the hg19 genome from Lab_Primers_sorted.txt, using a URL to supply the chromosome sizes for the hg19 assembly:

./bedToBigBed -type=bed12+ -as=bed12Source.as -tab Lab_Primers_sorted.txt http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes Lab_Primers.bigBed

With the following three optional commands, we can download another tool, bigBedToBed, and use it to check that data can be extracted from the file:

curl -O http://hgdownload.soe.ucsc.edu/admin/exe/macOSX.x86_64/bigBedToBed
chmod 700 bigBedToBed
./bigBedToBed -chrom=chr5 -start=1419444 -end=1445682 Lab_Primers.bigBed stdout

7. Now we need to host this data somewhere online so that the Browser can find it. One option is CyVerse; you can read more about hosting options here: http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html#Hosting

8. Once you have an online location for the bigBed (for example, https://data.cyverse.org/dav-anon/iplant/home/brianlee/Lab_Primers.bigBed), you can add it to your sessions. Go to the Custom Tracks page and enter a track like the following, reusing your track and browser lines but changing type=bedDetail to type=bigBed and adding a bigDataUrl:

browser position chr5:1405000-1448000
track name=Primers type=bigBed description=Primers visibility=2 color=221,55,118 bigDataUrl=https://data.cyverse.org/dav-anon/iplant/home/brianlee/Lab_Primers.bigBed

9. Save a session with this bigBed as a custom track. Example: https://www.genome.ucsc.edu/s/brianlee/Primers

10. Now, whenever the file is updated, any session that references this bigDataUrl location should also update. If CyVerse is used to host the bigBed online, you may need to delete and replace your file and force a browser reload (Control-Shift-R) so that cached copies expire. Contact CyVerse directly for help.

Hosting your data at your own institution is often the best solution, since you can then work with your system administrators for the best experience.

Once you have the bigBed, it is not much more work to take the next step and put it inside a Track Hub. In a Track Hub, many additional features become possible, such as the searchIndex feature, which lets you find uniquely named items in your track from the search bar, or ultimately creating a Public Hub to share your data with the wider community.


If after reading this blog post you have any public questions, please email genome@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible forum. If your question includes sensitive data, you may send it instead to genome-www@soe.ucsc.edu.