Bertone Yale TAR Track Settings
Yale Transcriptionally Active Regions (TARs) (Bertone data)

Data last updated at UCSC: 2006-10-13


This track shows the locations of transcriptionally active regions (TARs), also called transcribed fragments (transfrags), identified by hybridization to an oligonucleotide microarray whose design was based on human assembly hg13 (NCBI Build 31) (Bertone et al., 2004).


Microarrays were designed using sequence from the human hg13 assembly. The genome sequence was screened for repetitive elements and low-complexity DNA using RepeatMasker in sensitive mode. Additional low-complexity filtering was performed with the NSEG (segment sequence(s) by local complexity) program, using a minimum segment length of 21 nucleotides to determine the low-complexity segments of lowest probability. After filtering, 1.5 Gb of nonrepetitive DNA remained, and microarray probes were chosen using the NASA Oligonucleotide Probe Selection Algorithm (NOPSA).

NOPSA is designed to find the optimal probes for hybridization. A database of the frequency of every 18-mer in the genome is created using a hash algorithm, with chaining used to resolve collisions. The average frequency of each 36-mer in the genome is then determined from the frequencies of each 18-mer subsequence in the 36-mer and its reverse complement. Those 36-mer oligonucleotides with a frequency equal to one are selected as potential probes for the microarray (from the supporting online material for Stolc et al., 2004).

Probes were selected based on several criteria:

  • Each selected 36-mer occurs only once in the genome.
  • Sequences that could form a loop with a stem of > 7 bp were excluded.
  • Factors such as sequence length, extent of complementarity and base composition were also considered.
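
The 18-mer frequency scoring described for NOPSA above can be illustrated with a minimal Python sketch. It is not the NOPSA implementation: the function names are hypothetical, the k-mer database is assumed to count both strands of the genome, and Python's Counter (a dict, which resolves collisions by open addressing rather than chaining) stands in for the hash table.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_kmers(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def count_kmers_both_strands(genome, k):
    """Assumption: the frequency database covers both strands."""
    return count_kmers(genome, k) + count_kmers(reverse_complement(genome), k)


def mean_kmer_frequency(probe, counts, k):
    """Average the database counts of all k-mers in the probe and in its
    reverse complement."""
    kmers = []
    for strand in (probe, reverse_complement(probe)):
        kmers.extend(strand[i:i + k] for i in range(len(strand) - k + 1))
    return sum(counts[kmer] for kmer in kmers) / len(kmers)


def is_candidate_probe(probe, counts, k):
    """A probe is kept as a candidate when its mean frequency is exactly one."""
    return mean_kmer_frequency(probe, counts, k) == 1.0
```

For the arrays described here, k would be 18 and the probe length 36, so each probe contributes 19 18-mers per strand. Averaging short k-mer counts avoids storing a count for every 36-mer in the genome.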

A total of 51,874,388 36-mer oligonucleotide probes were selected from both the sense and antisense strands at an average resolution of 46 bp to cover the non-repetitive sequence from the whole genome. Probes were spaced every 10 nucleotides on average. The probes were synthesized via maskless photolithography at a feature density of approximately 390,000 probes per slide.
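
As a quick check on the scale of the experiment, the probe count and the per-slide feature density quoted above imply a minimum number of slides. The sketch below assumes every slide is filled to the approximate density of 390,000 features, which is a simplification; the function name is hypothetical.

```python
def min_slides(total_probes, probes_per_slide):
    """Smallest number of slides that can hold all probes (ceiling division)."""
    return -(-total_probes // probes_per_slide)


# Figures taken from the text; the per-slide density is approximate.
slides = min_slides(51_874_388, 390_000)
print(slides)  # 134
```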

Biological samples that were hybridized to the arrays consisted of triple-selected human liver poly(A)+ RNA pooled from several individuals (supplied by Ambion). One biological replicate was carried out.

See this NCBI GEO accession for details of experimental protocols.

The TARs identified for hg13 (NCBI Build 31) were mapped to this assembly using Blat. The program pslCDnaFilter was used to filter alignments using the parameters -minId=0.96, -minCover=0.25, -localNearBest=0.001, -minQSize=20, -minNonRepSize=16, -ignoreNs, -bestOverlap.
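
To make the identity, coverage, and query-size thresholds concrete, here is a rough Python sketch that applies similar cutoffs to one line of PSL output. It is not a reimplementation of pslCDnaFilter: the identity and coverage formulas are simplified assumptions, the function names are hypothetical, and the -localNearBest, -minNonRepSize, -ignoreNs, and -bestOverlap options are not modeled.

```python
def parse_psl_line(line):
    """Extract the fields needed here from one tab-separated PSL line.
    Column order follows the PSL format: matches, misMatches, repMatches,
    ..., with qSize as the eleventh column."""
    fields = line.rstrip("\n").split("\t")
    return {
        "matches": int(fields[0]),
        "mismatches": int(fields[1]),
        "rep_matches": int(fields[2]),
        "q_size": int(fields[10]),
    }


def passes_filter(aln, min_id=0.96, min_cover=0.25, min_q_size=20):
    """Simplified stand-ins for the -minId, -minCover and -minQSize cutoffs."""
    aligned = aln["matches"] + aln["rep_matches"] + aln["mismatches"]
    if aligned == 0 or aln["q_size"] < min_q_size:
        return False
    # Assumption: identity is the matching fraction of aligned bases,
    # and coverage is the aligned fraction of the query.
    identity = (aln["matches"] + aln["rep_matches"]) / aligned
    coverage = aligned / aln["q_size"]
    return identity >= min_id and coverage >= min_cover
```

A TAR whose best alignment fails these cutoffs would simply be absent from the track on this assembly.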

Display Conventions

TARs are represented by blocks in the graphical display. The numeric part of the ID displayed when the track has pack or full visibility is the ID used by the Yale Database for Active Regions with Tools (DART). A link to this database is provided on the details page for each TAR.

Data Analysis

Two groups of TARs were identified: Normal and Poly(A)-associated.

Normal TARs:

Clusters of transcription units were identified that consisted of at least five consecutive probes with fluorescence intensities in the top 90th intensity percentile and with genomic coordinates within a 250-nt window. After collecting these regions genome-wide, their locations were compared to those of annotated components of genes. As a result, a total of 13,889 transcription units, ranging in size from 209 to 3,438 nucleotides, were identified. Under the null hypothesis of zero transcription, only 400 were expected to be found. Of those regions identified, one-third (4,931) correspond to previously annotated exons while the other 8,958 are new transcribed sequences that are referred to as TARs.
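
The run-finding rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the published pipeline: the intensity threshold is passed in rather than derived from a percentile, the 250-nt window is measured from the start of the first probe to the end of the last, overlapping qualifying runs are merged, and the function name is hypothetical.

```python
def call_transcription_units(positions, intensities, threshold,
                             min_probes=5, window=250, probe_len=36):
    """Return merged (start, end) regions covered by runs of min_probes
    consecutive probes that all meet the threshold and fit in the window.

    positions   -- sorted probe start coordinates on one strand
    intensities -- fluorescence intensity for each probe, same order
    """
    bright = [value >= threshold for value in intensities]
    regions = []
    for i in range(len(positions) - min_probes + 1):
        j = i + min_probes - 1
        start, end = positions[i], positions[j] + probe_len
        if all(bright[i:j + 1]) and end - start <= window:
            if regions and start <= regions[-1][1]:
                # Overlaps the previous region, so extend it.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions
```

Each returned region would then be compared against gene annotations to decide whether it overlaps a known exon or counts as a new TAR.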

Poly(A)-associated TARs:

Another set of criteria was used to find TARs in which the probe hybridization intensities were correlated with the presence of a polyadenylation signal 3' to the TAR. Transcription units were defined as five consecutive probes with fluorescence intensities in the top 80th intensity percentile and within a window of 250 nucleotides. The 3' region also must contain or be close to a polyadenylation signal. Transcription units with an associated polyadenylation signal of "AATAAA" were assigned to a type I group, while those with "ATTAAA" were type II. Only 100 of these should occur at random in the genome under the null hypothesis of zero transcription. The majority (1,991) were found to be within annotated exons, and 952 were located more than 10 kb from an annotated gene. A total of 1,371 type I and 674 type II poly(A) sequences were identified within exons of known genes. Of these, 1,289 (94%) of type I and 607 (90%) of type II instances were found to be in the 3' exon of the gene.
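
The type I / type II assignment can be sketched as a simple hexamer search near the 3' end of a transcription unit. The text does not state how close the signal must be, so the flank parameter below is a hypothetical value; the sketch also handles the plus strand only, and the function name is an assumption.

```python
POLYA_SIGNALS = {"AATAAA": "type I", "ATTAAA": "type II"}


def classify_polya(genome, unit_end, flank=50):
    """Return 'type I', 'type II', or None for a plus-strand unit.

    genome   -- uppercase DNA string
    unit_end -- coordinate of the 3' end of the transcription unit
    flank    -- hypothetical search distance on either side of unit_end
    """
    region = genome[max(0, unit_end - flank):unit_end + flank]
    hits = [(region.find(signal), label)
            for signal, label in POLYA_SIGNALS.items()
            if signal in region]
    # If both hexamers occur, report the one found first in the region.
    return min(hits)[1] if hits else None
```

A minus-strand unit would need the same search applied to the reverse complement around its 3' end.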


The TARs were validated using RT-PCR on human liver poly(A)+ RNA. Forty-eight poly(A)-associated and 48 non-poly(A)-associated TARs were investigated. In 94% (90/96) of cases, the PCR products were found to be of the expected size in a single-pass assay.


These data were generated and analyzed by a collaboration between the labs of Michael Snyder, Mark Gerstein, and Sherman Weissman at Yale University, together with NASA Ames Research Center (Moffett Field, California) and Eloret Corporation (Sunnyvale, California).


Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta M, Weissman S et al. Global identification of human transcribed sequences with genome tiling arrays. Science. 2004 Dec 24;306(5705):2242-6.

Stolc V, Gauhar Z, Mason C, Halasz G, van Batenburg MF, Rifkin SA, Hua S, Herreman T, Tongprasit W, Barbano PE et al. A gene expression map for the euchromatic genome of Drosophila melanogaster. Science. 2004 Oct 22;306(5696):655-60.