TwoBit Sequence Archives

A twoBit file is a highly efficient way to store genomic sequence. The format is defined here. Note that lower-case nucleotides are treated as masked in the twoBit file, so gfServer may ignore that sequence when run with the -mask option; you may wish to upper-case the sequence when preparing the FASTA file. To complete the steps below you will need the faToTwoBit, twoBitInfo, and twoBitToFa utilities. For more information on downloading our command line utilities, please see these instructions.
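
If upper-casing is wanted, one possible way to do it is with awk before running faToTwoBit. The following is a minimal sketch, not part of the utilities above; the file names example.fa and example.upper.fa and the demonstration sequence are made up for illustration. Note that upper-casing discards whatever masking the lower-case letters encoded.

```shell
# Create a tiny demonstration FASTA (hypothetical file name and content).
printf '>chr1 demo\nACGTacgtNNnn\n>chr2\nggCCttAA\n' > example.fa

# Upper-case sequence lines only; header lines (starting with ">")
# are copied through unchanged.
awk '/^>/ { print; next } { print toupper($0) }' example.fa > example.upper.fa

cat example.upper.fa
```

The resulting file can then be used in place of the original FASTA in the steps below.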

To create a twoBit file, follow these steps:

  1. Prepare the sequence for your twoBit file in a FASTA formatted file (e.g. genome.fa).
  2. Run the faToTwoBit program on your FASTA file.
    faToTwoBit genome.fa genome.2bit
  3. Use twoBitInfo to verify the sequences in the assembly and to create a chrom.sizes file, which is used in later processing to construct the big* files:
    twoBitInfo genome.2bit stdout | sort -k2rn > genome.chrom.sizes
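
As an optional cross-check of step 3, sequence lengths can also be computed directly from the FASTA file and compared with the chrom.sizes output. The sketch below is an illustration only: the file names example.fa and example.fa.sizes are hypothetical, and it assumes the sequence name is the first word of each header line and that the file has no trailing carriage returns.

```shell
# Tiny demonstration FASTA (hypothetical); a sequence may span several lines.
printf '>chr1 demo\nACGTACGT\nACGT\n>chr2\nGGCC\n' > example.fa

# For each header, sum the lengths of the sequence lines that follow it.
# The name is the first word of the header with the leading ">" removed.
# Output is name<TAB>size, sorted by size, largest first.
awk '/^>/ { if (name != "") print name "\t" len
            name = substr($1, 2); len = 0; next }
     { len += length($0) }
     END { if (name != "") print name "\t" len }' example.fa |
  sort -k2,2nr > example.fa.sizes

cat example.fa.sizes
```

Run against the real genome.fa, the output should list the same names and sizes as genome.chrom.sizes; a difference would suggest a problem in the FASTA file or in the conversion.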

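The twoBit commands can also read a .2bit file at a URL, given in place of the local file name; the -udcDir option sets the directory used to cache the remote data: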
twoBitInfo -udcDir=. | sort -k2nr > genome.chrom.sizes

Sequence can be extracted from the .2bit file with the twoBitToFa command, for example:
twoBitToFa -seq=chr1 -udcDir=. stdout > genome.chr1.fa