Author Archives: Cath Tyner

The UCSC Genome Browser Coordinate Counting Systems

If you think dogs can’t count, try putting three dog biscuits in your pocket and then giving Fido only two of them.  

~Phil Pastoret

“Counting is easy. Right?”

I say this with my hand out, my thumb and 4 fingers spread out. With my other hand’s pointer finger, I simply count each digit, “one, two, three, four, five.” Easy.

But what happens when you start counting at 0 instead of 1? You can see that you have 5 digits (4 fingers and a thumb), but how do you calculate the size of your range?

With your hand in mind as an example, let’s look at counting conventions as they relate to bioinformatics and the UCSC Genome Browser genomic coordinate systems.

The UCSC Genome Browser uses two different systems:

“1-start, fully-closed” = coordinates positioned within the web-based UCSC Genome Browser. “0-start, half-open” = coordinates stored in database tables.
Table 1. UCSC Genome Browser coordinate systems summary
0-start, half-open (0-based) 1-start, fully-closed (1-based)
“BED” format (Browser Extensible Data):
chr1 127140000 127140001
Note: Spaces, not punctuation
When using BED format, browser & utilities
assume coords are 0-start, half-open.
“Position” format:
Note: Punctuation used, no spaces
When using “position” format, browser & utilities
assume coords are 1-start, fully-closed.
Stored in UCSC Genome Browser tables Positioned in UCSC Genome Browser web interface
To convert to 1-start, fully-closed:
add 1 to start, end = same
To convert to 0-start, half-open:
subtract 1 from start, end = same

Section 1: Interval types

0-start vs. 1-start : Does counting start at 0 or 1?
Sometimes referred to as “0-based” vs “1-based” or 
“0-relative vs “1-relative.”

Interval Types
For a counted range, is the specified interval fully-open, fully-closed, or a hybrid-interval (e.g., half-open)?

Ok, time to flashback to math class!
You might recall that specifying an interval type as open, closed (or a combination, e.g., “half-open”) refers to whether or not the endpoints of the interval are included in the set. For further explanation, see the
interval math terminology wiki article. Figure 1 below describes various interval types.


Figure 1. (To enlarge, click image.) Description of interval types.

Section 2: Interval types in the UCSC Genome Browser

UCSC Genome Browser web interface = “1-start, fully-closed”

A common counting convention is a system that we all used when we first learned to count the fingers on our hands; this is referred to as the “one-based, fully-closed” system (Figure 2, below). Note that an extra step is needed to calculate the range total (5).

The “1-start, fully-closed” system is what you SEE when using the UCSC Genome Browser web interface. However, all positional data that are stored in database tables use a different system.


Figure 2. (To enlarge, click image.) 1-start, fully-closed interval. Most common counting convention. Used within the UCSC Genome Browser web interface (but not used in UCSC Genome Browser databases/tables). We calculate that we have 5 digits because 5 (pinky finger, range end) – 1 (the thumb, range start) = 4. We then need to add one to calculate the correct range; 4+1= 5.

UCSC Genome Browser tables = “0-start, half-open”

While the commonly-used “one-start, fully-closed” system is more intuitive, it is not always the most efficient method for performing calculations in bioinformatic systems, because an additional step is required to calculate the size of the base-pair (bp) range.

To increase efficiency, the UCSC Genome Browser uses a “hybrid-interval” coordinate system for storing coordinates in databases/tables that is referred to as “0-start, half-open” (see Figure 3, below).

Although coordinates in the web browser are converted to the more human-readable “1-start, fully-closed” system, coordinates are stored in database tables as “0-start, half-open.” You may have heard various terms to express this 0-start system:

Synonyms for “0-start, half-open”

  • 0-based, half-open
  • 0-based start, 1-based end
    • Note: This is not technically accurate, but conceptually helpful. A “1-based end” refers to the end of the range being included, as in the common “1-based, fully-closed” system.
  • 0-start, hybrid-interval (interval type is: start-included, end-excluded)


Figure 3. (To enlarge, click image.) The UCSC Genome Browser coordinate system for databases/tables (not the web interface) is “0-start, half-open” where start is included (closed-interval), and stop is excluded (open-interval). We calculate that we have 5 digits because 5 (range end after pinky finger) – 0 (the thumb, range start)  = 5.

Another example which compares 0-start and 1-start systems is seen below, in Figure 4. This figure describes the differences in defining and calculating the range for a specified sequence highlighted in yellow, “T, C, G, A.”


Figure 4. (To enlarge, click image.)  Calculation of genomic range for comparing “1-start, fully-closed” vs. “0-start, half-open” counting systems.

Section 3: Formatting

Coordinate formatting indicates interval type

The UCSC Genome Browser and many of its related command-line utilities distinguish two types of formatted coordinates and make assumptions of each type.

The “Position” format (referring to the “1-start, fully-closed” system as coordinates are “positioned” in the browser)

  • Written as: chr1:127140001-127140001
  • No spaces.
  • Includes punctuation: a colon after the chromosome, and a dash between the start and end coordinates.
  • When in this format, the assumption is that the coordinate is 1-start, fully-closed.

The “BED” format (referring to the “0-start, half-open” system)

  • Written as: chr1 127140000 127140001
  • Spaces between chromosome, start coordinate, and end coordinate.
  • No punctuation.
  • When in this format, the assumption is that the coordinates are 0-start, half-open.

Section 4: Examples

SNP example

What we SEE in the Genome Browser interface itself is the “1-start, fully-closed” system. However, these data are not STORED in the UCSC Genome Browser databases and tables in the same way. The UCSC Genome Browser databases store coordinates in the “0-start, half-open” coordinate system.

Table 2. SNP coordinates in web browser (1-start) vs table (0-start)
rs782519173 (hg38) Start End
Positioned in web browser: 1-start, fully-closed  133255708  133255708
Stored in table: 0-start, half-open  133255707  133255708

LiftOver examples and coordinate formatting

Let’s take a look at the two types of coordinate formatting (“BED” and “position”) when using the UCSC Genome Browser web-based and command-line utility liftOver tools.

1) Web-based LiftOver example

Below is an example from the UCSC Genome Browser’s web-based LiftOver tool (Home > Tools > LiftOver). Depending on how input coordinates are formatted, web-based LiftOver will assume the associated coordinate system and output the results in the same format.

Table 3. UCSC Genome Browser web-based LiftOver and “position” coordinate formatting
Input: Assembly = panTro3
Output: Lifts to this position in hg19:
Notes: If your input is entered with the “position” formatted coords (1-start, fully-closed),
the browser will also output the same “position” format. (Note positional format
includes “:” and “-” and no spaces.)
Table 4. UCSC Genome Browser web-based LiftOver and “BED” coordinate formatting
Input: Assembly = panTro3
chr1 127140000 127140001
Output: Lifts to this position in hg19:
chr1 110255312 110255313
Notes: If your input is entered with the “BED” formatted coords (0-start, half-open), the
browser will also output the same “BED” format. (Note BED format contains no
punctuation and includes spaces.)
 * Note that the web-based output file extension is misleading in this case; while titled “*.bed” the positional output is not actually in “0-start, half-open” BED format, because the 1-start, fully-closed “positional” format was used for input. 

 2) Command-line liftOver utility example

When using the command-line utility of liftOver, understanding coordinate formatting is also important. Just like the web-based tool, coordinate formatting specifies either the “0-start half-open” or the “1-start fully-closed” convention. For example, if you have a list of 1-start “position” formatted coordinates, and you want to use the command-line liftOver utility, you will need to specify in your command that you are using “position” formatted coordinates to the liftOver utility.

To view the liftOver utility usage statement and options, enter “liftOver” on your command-line (with no other arguments, and without the quotes).

Table 5. UCSC Genome Browser command-line liftOver and “position” coordinate formatting
Command: liftOver -positions panTro3.txt liftOver/panTro3ToHg19.over.chain.gz mapped unMapped
Output: chr1:110255313110255313
via “mapped” file for hg19
Notes: Note: Must specify “-positions” for 1-start “position” format in command-line liftOver
Table 6. UCSC Genome Browser command-line liftOver and “BED” coordinate formatting
chr1 127140000 127140001
Command: liftOver panTro3.bed liftOver/panTro3ToHg19.over.chain.gz mapped unMapped
Output: chr1 110255312 110255313
via “mapped” file for hg19
Notes: Note: No special argument needed, 0-start “BED” formatted coordinates are default. 

Wiggle Files

The wiggle (WIG) format is used for dense, continuous data where graphing is represented in the browser. Wiggle files of variableStep or fixedStep data use “1-start, fully-closed” coordinates. Like all other UCSC Genome Browser data, these coordinates are positioned in the browser as “1-start, fully-closed.”

Note: Many other formats outside of the UCSC Genome Browser use 1-start coordinate systems, such as GTF/GFF.

Table 7. UCSC Genome Browser wiggle files & coordinate systems
File Type Wiggle file Coordinate system as positioned
in UCSC Genome Browser
bedGraph -> bigWig 0-start, half-open 1-start, fully-closed
wiggle variableStep -> bigWig 1-start, fully-closed 1-start, fully-closed
wiggle fixedStep -> bigWig 1-start, fully-closed 1-start, fully-closed

 Section 5: Resources

The new Genome Browser gateway

New UCSC Genome Browser gateway page design

New UCSC Genome Browser gateway page design

The opinions expressed here are those of the author, Cath Tyner, and do not necessarily reflect those of the University of California Santa Cruz or any of its units.

Maybe it’s just me, but I can clearly remember the excitement of getting brand new sparkly shoes as a young kid.  Half the excitement was picking out the shoes – my siblings and I would try on every potential new-shoe option. There were high standards, of course; rigorous criteria that had to be thoroughly discussed and tested. Could they make you jump SUPER high? Low tops or high tops? Classic laces or cutting edge velcro? After finally picking out the perfect pair and racing to put them on with maniacal laughter, the reality set in. New shoes. Ahhhh. The excitement of showing off my new kicks at school was only one night away.

That’s a little bit how we all felt when we unveiled the brand new sparkly Genome Browser gateway page  earlier this week. This was a project that had been “in the works” for quite a long time, starting from ideas and drawings, moving into design phases, and finally maturing into many iterations of testable versions as the development process gained its own momentum. This project soon had a life of its own – we all became shepherds as we guided it into what we finally knew was a final product.

The things we are most excited about? We’ve already received feedback that the new human-centric phylogenetically ordered tree menu is downright awesome (and we think so too).  For me, the graphics and colors pull me in, inviting me to visually scroll through our entire genome species collection. With a flick of the scroll handle on the tree menu, I can zip from “us humans” all the way down to sea hare or Ebola virus; within two seconds, I’ve just traveled through millions and millions of evolutionary years. Based on NCBI’s taxonomy database, the “tree menu” provides an interactive way to explore our genome species collection. Little known fact: Try hovering over one of the “branches” of the tree (the horizontal and vertical lines connecting all species) and see what you find!

Example of mouse hover on tree menu branch

Example of mouse hover on tree menu branch

Another exciting new feature that makes our eyes light up is the autocomplete search function and “popular species” button shortcuts:

Button shortcuts & autocomplete search

Button shortcuts & autocomplete search

We know that over 95% of you will benefit from our “popular species” buttons as quick access shortcuts to the genomes that you use most. We also believe that just about everyone will benefit from the autocomplete search function. For example, you can enter “fish” to see genomes from our aquatic friends, or you can enter something as specific as “hg38” to load a particular assembly version. With a whopping 276 genomes and counting, autocomplete search is a celebrated new feature! The same autocomplete function works great for our public genome hubs; try typing “plant” to see related hubs.

Want to jump to your favorite gene in the genome browser? The “position/search term” functionality remains just as efficient – just enter a genomic position, gene symbol, or search term, lean back in your comfy chair, and press “GO.” You’re there.

To see more details, including a few menu option changes, visit the gateway announcement on our news page and watch the short gateway video tour.

We sincerely hope you enjoy the new gateway page as much as we do – and as always, we invite you to contact us with questions, concerns, and compliments. 😉