BU First Exon Track Settings
 
Boston University First Exon Activity   (All Pilot ENCODE Transcription tracks)


Show only items with score at or above:   (range: 0 to 1000)

Subtracks:
  • BU Cere. Cortex: Boston University First Exon Activity in Cerebral Cortex
  • BU Colon: Boston University First Exon Activity in Colon
  • BU Heart: Boston University First Exon Activity in Heart
  • BU Kidney: Boston University First Exon Activity in Kidney
  • BU Liver: Boston University First Exon Activity in Liver
  • BU Lung: Boston University First Exon Activity in Lung
  • BU Skel. Muscle: Boston University First Exon Activity in Skeletal Muscle
  • BU Spleen: Boston University First Exon Activity in Spleen
  • BU Stomach: Boston University First Exon Activity in Stomach
  • BU Testis: Boston University First Exon Activity in Testis
    
Source data version: ENCODE June 2005 Freeze
Assembly: Human Mar. 2006 (NCBI36/hg18)
Data coordinates converted via liftOver from: July 2003 (NCBI34/hg16)

Description

This track displays expression levels of computationally identified first exons and a constitutive exon of genes in ENCODE regions, measured with the real competitive Polymerase Chain Reaction (rcPCR) technique described in Ding and Cantor (2003). Expression levels are indicated by color, ranging from black (no expression) to red (high expression).

Experiments were performed on total RNA samples of ten normal human tissues purchased from Clontech (Palo Alto, CA): cerebral cortex, colon, heart, kidney, liver, lung, skeletal muscle, spleen, stomach, and testis.

The name for each alternative transcript starts with the gene name, followed by an identifier for the alternative first exon or the constitutive exon. For example, for gene CAV1, there are three alternative first exons (CAV1-E1A, CAV1-E1B, and CAV1-E1C) and the third exon is chosen as the constitutively expressed exon (CAV1-E3).
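The naming convention above can be split mechanically. The following is a minimal sketch, assuming that every item name ends in a hyphen followed by the exon identifier and that alternative first exons are labeled "E1" plus an optional letter, as in the CAV1 example. The function name `parse_exon_name` is hypothetical and not part of the track data.

```python
import re

def parse_exon_name(name):
    """Split an item name such as 'CAV1-E1A' into (gene, exon_id, is_first_exon).

    Assumption: the exon identifier follows the last hyphen, and
    alternative first exons look like E1, E1A, E1B, ... (as for CAV1).
    """
    # rsplit keeps any hyphens that belong to the gene name itself
    gene, exon_id = name.rsplit("-", 1)
    is_first_exon = re.fullmatch(r"E1[A-Z]*", exon_id) is not None
    return gene, exon_id, is_first_exon
```

For example, `parse_exon_name("CAV1-E3")` yields the gene `CAV1`, the exon identifier `E3`, and `False` for the first-exon flag.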

Methods

Alternative transcription start sites (TSSs) for 20 ENCODE genes were predicted using PromoSer, an in-house computational tool. PromoSer identifies TSSs by considering alignments of a large number of partial and full-length mRNA sequences and ESTs to genomic DNA, with provision for alternative promoters. PromoSer treats alternative first exons (and the resulting TSSs) as follows:

  • all transcripts (mRNA, full-length mRNA and EST) from the same gene cluster are examined
  • individual ESTs are not considered for alternative TSSs; only the 5'-most positions from all ESTs in the cluster are considered a potential TSS
  • if multiple 5'-end positions are more than 20 bp apart, they are reported as alternative TSSs
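The 20 bp rule in the last bullet can be illustrated with a short sketch. This is not PromoSer's code. It assumes plus-strand coordinates and compares each candidate 5' end with the most recently reported TSS, which is only one possible reading of "more than 20 bp apart". The function name `alternative_tss` and the `min_gap` parameter are hypothetical.

```python
def alternative_tss(five_prime_ends, min_gap=20):
    """Return the 5'-end positions that would be reported as distinct TSSs.

    Assumption: plus-strand coordinates; a position is reported only if it
    lies more than min_gap bp downstream of the last reported TSS.
    """
    reported = []
    for pos in sorted(set(five_prime_ends)):
        if not reported or pos - reported[-1] > min_gap:
            reported.append(pos)
    return reported
```

With the made-up positions `[100, 105, 118, 150, 400]`, the sketch reports `[100, 150, 400]`: positions 105 and 118 fall within 20 bp of 100 and are folded into it.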

For each gene, all alternative first exons were identified by manual selection from PromoSer predictions. An exon shared by all transcripts (the constitutive exon) was also selected. The selection process involved visually examining the structure of the cluster, preferably using the latest data available at UCSC, to identify distinct first exons that were well formed (supported by multiple sequences) and showed no evidence, especially from newer sequences, of additional sequence that would make them internal exons. After a first exon was identified, a subsequence of 100 to 300 bases was selected for use in the experiment. The selection avoided repeat sequences as much as possible, and if two first exons partially overlapped, the non-overlapping region was selected. If those conditions left the remaining sequence too short (or the first exon itself was too short), a junction with the second exon was used. The constitutive exon was chosen to be included in all (or most) of the alternative transcripts, and suitable sequences were extracted as above, except that no exon junctions were used.
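The selection described above was manual, but the overlap and length rules can be sketched in code. The example below is only an illustration under stated assumptions: exons are half-open `(start, end)` intervals on one strand, repeat masking is not modeled, and the function name `probe_region` and its parameters are hypothetical. Returning `None` stands for the case where a junction with the second exon would be used instead.

```python
def probe_region(exon, other_first_exons, min_len=100, max_len=300):
    """Pick a candidate subsequence of `exon` that avoids other first exons.

    Returns a (start, end) interval of at most max_len bases, or None when
    the longest non-overlapping piece is shorter than min_len.
    """
    pieces = [exon]
    for o_start, o_end in other_first_exons:
        remaining = []
        for start, end in pieces:
            if o_end <= start or o_start >= end:
                remaining.append((start, end))      # no overlap, keep whole piece
                continue
            if o_start > start:
                remaining.append((start, o_start))  # part left of the overlap
            if o_end < end:
                remaining.append((o_end, end))      # part right of the overlap
        pieces = remaining
    if not pieces:
        return None
    start, end = max(pieces, key=lambda p: p[1] - p[0])
    if end - start < min_len:
        return None
    return (start, min(end, start + max_len))
```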

The absolute expression levels of all exons were individually quantified by rcPCR by designing four assays with PCR amplicons corresponding to each exon. Amplicons were designed according to transcript sequences and can span a large distance on the genomic sequence. In addition, some amplicons were designed across the junctions between first exons and the constitutive second exons, and thus these amplicons may overlap with the amplicons that correspond to the constitutive second exons.

The rcPCR technique combined competitive PCR and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) for gene expression analysis. To measure the expression level of a gene, an oligonucleotide standard (60-80 bases) of known concentration, complementary to the target sequence with a single base mismatch in the middle, was added as the competitor for PCR. The gene of interest and the oligonucleotide standard resembled two alleles of a heterozygous locus in an allele frequency analysis experiment, and thus could be quantified by the high-throughput MALDI-TOF MS based MassARRAY system (Sequenom Inc.).

After PCR, a base extension reaction was carried out with an extension primer, ThermoSequenase, and a mixture of ddNTPs and a dNTP (for example, ddA, ddC, ddT, and dG). The extension primer annealed to the sequence immediately 5' upstream of the mismatch position. Depending on the nature of the mismatch and the composition of the ddNTP/dNTP mixture, one or two bases were added to the extension primer, producing two extension products that differed in length by one base. These two extension products were then detected and quantified by MALDI-TOF MS.
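Because the target and the competitor are amplified together and read out like two alleles, the target amount follows from the ratio of the two extension-product signals and the known competitor concentration. The sketch below shows that arithmetic only. It assumes equal amplification efficiency for both templates and uses peak areas as the signal. The function name and argument names are hypothetical and do not come from the MassARRAY software.

```python
def target_concentration(target_peak_area, competitor_peak_area, competitor_conc):
    """Estimate the target concentration from one competitive PCR measurement.

    Assumption: both templates amplify with the same efficiency, so the
    ratio of peak areas equals the ratio of starting concentrations.
    """
    if competitor_peak_area <= 0:
        raise ValueError("competitor peak area must be positive")
    return competitor_conc * target_peak_area / competitor_peak_area
```

In practice the estimate is most reliable when the two peaks are of similar size, which is one reason to run several competitor concentrations per exon.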

Expression ratios (e.g., CAV1-E1A/CAV1-E3, CAV1-E1B/CAV1-E3, CAV1-E1C/CAV1-E3) indicate the relative abundance of alternative first exons. 18S rRNA was used to normalize absolute exon expression levels across tissues.
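As a small worked example of these ratios, the sketch below divides each first-exon level by the level of the constitutive exon of the same gene. The numeric values in the usage note are made up for illustration, and the function name `first_exon_ratios` is hypothetical.

```python
def first_exon_ratios(levels, constitutive):
    """Return each exon's expression level divided by the constitutive exon's.

    `levels` maps exon names to absolute expression levels in one tissue;
    `constitutive` is the name of the constitutive exon (e.g. 'CAV1-E3').
    """
    reference = levels[constitutive]
    if reference <= 0:
        raise ValueError("constitutive exon level must be positive")
    return {name: value / reference
            for name, value in levels.items()
            if name != constitutive}
```

With invented levels of 50, 25, and 0 for CAV1-E1A, CAV1-E1B, and CAV1-E1C and 100 for CAV1-E3, the ratios are 0.5, 0.25, and 0.0.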

Values shown on this track represent the relative abundance of the alternative first exons with respect to the 18S rRNA. The raw values have been log10 transformed and scaled to show graded colors on the browser.
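The exact scaling used for this track is not stated here. The following sketch shows one plausible way to turn a raw abundance into a browser score between 0 and 1000: take log10, then map a chosen log range linearly onto the score range and clamp. The bounds `log_min` and `log_max`, the linear mapping, and the function name are all assumptions.

```python
import math

def to_browser_score(value, log_min, log_max):
    """Map a raw abundance to an integer score in 0..1000.

    Assumption: log10(value) is scaled linearly between log_min (score 0)
    and log_max (score 1000); non-positive values get score 0.
    """
    if value <= 0:
        return 0
    fraction = (math.log10(value) - log_min) / (log_max - log_min)
    fraction = min(1.0, max(0.0, fraction))  # clamp to the score range
    return round(1000 * fraction)
```

Under these assumptions, a value halfway between the bounds on the log scale receives a score of 500 and is drawn in a color midway between black and red.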

Verification

One biological replicate was performed for each gene. Two to four competitor concentrations were used to detect the expression level of each exon. Two to six technical replicates were performed for each competitor concentration. One more biological replicate will be performed in the future.

Credits

Data generation and analysis for this track were performed by ZLAB at Boston University. The following people contributed: Shengnan Jin, Anason Halees, Heather Burden, Yutao Fu, Ulas Karaoz, Yong Yu, Chunming Ding, Charles R. Cantor, and Zhiping Weng.

References

Ding, C. and Cantor, C.R. A high-throughput gene expression analysis technique using competitive PCR and matrix-assisted laser desorption ionization time-of-flight MS. Proc Natl Acad Sci U S A 100(6), 3059-64 (2003).

Ding, C. and Cantor, C.R. Direct molecular haplotyping of long-range genomic DNA with M1-PCR. Proc Natl Acad Sci U S A 100(13), 7449-53 (2003).

Halees, A.S., Leyfer, D. and Weng, Z. PromoSer: A large-scale mammalian promoter and transcription start site identification service. Nucleic Acids Res. 31(13), 3554-9 (2003).

Halees, A.S. and Weng, Z. PromoSer: improvements to the algorithm, visualization and accessibility. Nucleic Acids Res. 32, W191-W194 (2004).