GIS PET RNA Track Settings
Gene Identification Signature Paired-End Tags of PolyA+ RNA

GIS RNA MCF7  Gene Identification Signature Paired-End Tags of PolyA+ RNA (log phase MCF7)
GIS RNA MCF7 Est  Gene Identification Signature Paired-End Tags of PolyA+ RNA (estrogen-stim MCF7)
GIS RNA HCT116  Gene Identification Signature Paired-End Tags of PolyA+ RNA (5FU-stim HCT116)
GIS RNA hES3  Gene Identification Signature Paired-End Tags of PolyA+ RNA (embryonic stem cell hES3)
Source data version: ENCODE Oct 2005 Freeze
Data coordinates converted via liftOver from: May 2004 (NCBI35/hg17)


This track shows the starts and ends of mRNA transcripts determined by paired-end ditag (PET) sequencing. Each PET is composed of 18 bases from each end of a cDNA; the resulting 36 bp PETs from many clones were concatenated and cloned into pZero-1 for efficient sequencing. See the Methods and References sections below for more details on PET sequencing.
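As a rough illustration of the PET layout described above, the following Python sketch splits a 36-base PET into its two 18-base end tags. The function name and the assumption that the 5' tag comes first in the string are hypothetical and are not taken from the GIS pipeline.

```python
TAG_LENGTH = 18  # bases taken from each end of the cDNA, per the track description


def split_pet(pet):
    """Split a 36-base PET string into two 18-base end tags.

    Assumption (not from the source): the 5' tag occupies the first
    18 characters and the 3' tag the last 18.
    """
    if len(pet) != 2 * TAG_LENGTH:
        raise ValueError("expected a %d-base PET, got %d bases" % (2 * TAG_LENGTH, len(pet)))
    return pet[:TAG_LENGTH], pet[TAG_LENGTH:]


# Usage with a made-up sequence (not real data)
five_prime, three_prime = split_pet("ACGTACGTACGTACGTAC" + "GGCCTTAAGGCCTTAAAA")
```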

The PET sequences in this track were derived from full-length transcripts of three cell lines under four conditions:

  • MCF7 cells in log phase
  • MCF7 cells treated with estrogen (10 nM beta-estradiol) for 12 hours
  • HCT116 cells treated with 5FU (5-fluorouracil) for 6 hours
  • hES3 embryonic stem cells in log phase under feeder-free culture conditions
In total, 584,624 PETs were generated for the log phase MCF7 cells, 153,179 for the estrogen-treated MCF7 cells, 280,340 for the HCT116 cells, and 1,799,970 for the hES3 cells.

More than 80% of the PETs in the HCT116 and log phase MCF7 cells were mapped to the genome. The 474,278 log phase MCF7 PETs and 223,261 HCT116 PETs that mapped with single or multiple (up to ten) matches in the genome are shown in their respective subtracks. For the estrogen-treated MCF7 cells, only those PETs mapped to the ENCODE regions with the above match criteria (4,881 total) are displayed.

Human embryonic stem cell line hES3 (46XX, Chinese) was obtained from ES Cell International. These cells were serially cultured according to previously established protocols (Choo et al., 2006). In brief, feeder-free cultures of hES3 were maintained at 37 °C and 5% CO2 on Matrigel-coated organ culture dishes supplemented with conditioned media from mouse feeders, DE-MEF.

In the graphical display, the ends are represented by blocks connected by a horizontal line. In full and packed display modes, the arrowheads on the horizontal line represent the direction of transcription, and an ID of the format XXXXX-N-M is shown to the left of each PET, where XXXXX is the unique ID for the PET, N is the number of mapping locations in the genome (1 for a single mapping location, 2 for two mapping locations, and so forth), and M is the number of PET sequences at this location. The total count of PET sequences mapped to the same locus but with slight nucleotide differences may reflect the expression level of the transcripts. PETs that mapped to multiple locations may represent low complexity or repetitive sequences.
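The XXXXX-N-M label format described above can be unpacked with a short Python sketch. The names PetLabel and parse_pet_label are hypothetical, and the example label is invented for illustration.

```python
from typing import NamedTuple


class PetLabel(NamedTuple):
    pet_id: str             # XXXXX: unique ID of the PET
    mapping_locations: int  # N: number of mapping locations in the genome
    pets_at_location: int   # M: number of PET sequences at this location


def parse_pet_label(label):
    """Parse a label of the form XXXXX-N-M into its three fields."""
    # Split from the right so a hyphen inside the ID part, if any, is preserved.
    pet_id, n, m = label.rsplit("-", 2)
    return PetLabel(pet_id, int(n), int(m))


# Usage with an invented label: one mapping location, seven PETs observed there
example = parse_pet_label("12345-1-7")
```

Splitting from the right is a defensive choice here; the source does not say whether the ID part can itself contain hyphens.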

The graphical display also uses color coding to reflect the uniqueness and expression level of each PET:

Color          Mapping    PETs observed at location
dark blue      unique     2 or more
light blue     unique     1
medium brown   multiple   2 or more
light brown    multiple   1
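The color scheme can be expressed as a small lookup on the N and M values from the PET label. This Python sketch is only illustrative: the function name pet_color is hypothetical, and it assumes that "unique" means exactly one mapping location.

```python
def pet_color(mapping_locations, pets_at_location):
    """Return the display color name for a PET.

    mapping_locations: N from the PET label (1 is treated as "unique").
    pets_at_location:  M from the PET label.
    """
    if mapping_locations < 1 or pets_at_location < 1:
        raise ValueError("both counts must be at least 1")
    unique = mapping_locations == 1
    repeated = pets_at_location >= 2
    if unique:
        return "dark blue" if repeated else "light blue"
    return "medium brown" if repeated else "light brown"


# Usage: a uniquely mapped PET observed seven times at its location
color = pet_color(1, 7)
```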


PolyA+ RNA was isolated from the cells. A full-length cDNA library was constructed and converted into a PET library for Gene Identification Signature analysis (Ng et al., 2005). Generation of PET sequences involved cloning of cDNA sequences into the plasmid vector, pGIS3. pGIS3 contains two MmeI recognition sites that flank the cloning site, which were used to produce a 36 bp PET. Each 36 bp PET sequence contains 18 bp from each of the 5' and 3' ends of the original full-length cDNA clone. The 18 bp 3' signature contains 16 bp 3'-specific nucleotides and an AA residual of the polyA tail to indicate the sequence orientation. PET sequences were mapped to the genome using the following specific criteria:

  • a minimal continuous 16 bp match must exist for the 5' signature; the 3' signature must have a minimal continuous 14 bp match
  • both 5' and 3' signatures must be present on the same chromosome
  • their 5' to 3' orientation must be correct
  • the maximal genomic span of a PET genomic alignment must be less than one million bp
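The four criteria above can be sketched as a single filter in Python. This is a minimal illustration, not the GIS mapping pipeline: the TagHit record, its field names, and the coordinate convention are hypothetical, and "correct orientation" is interpreted here as both tags lying on the same strand with the 5' tag upstream of the 3' tag.

```python
from dataclasses import dataclass

MIN_5P_MATCH = 16      # minimal continuous match for the 5' signature (bp)
MIN_3P_MATCH = 14      # minimal continuous match for the 3' signature (bp)
MAX_SPAN = 1_000_000   # genomic span must be less than this (bp)


@dataclass
class TagHit:
    """Hypothetical record for one signature aligned to the genome."""
    chrom: str
    strand: str      # "+" or "-"
    start: int       # 0-based, inclusive
    end: int         # exclusive
    match_len: int   # longest continuous match, in bp


def is_valid_pet_alignment(five, three):
    """Apply the four mapping criteria to a 5'/3' signature pair."""
    if five.match_len < MIN_5P_MATCH or three.match_len < MIN_3P_MATCH:
        return False
    if five.chrom != three.chrom:
        return False
    # Assumed meaning of "correct orientation": same strand, 5' tag upstream.
    if five.strand != three.strand or five.strand not in ("+", "-"):
        return False
    if five.strand == "+":
        ordered = five.start <= three.start
        span = three.end - five.start
    else:
        ordered = three.start <= five.start
        span = five.end - three.start
    return ordered and 0 < span < MAX_SPAN


# Usage with invented coordinates
ok = is_valid_pet_alignment(
    TagHit("chr1", "+", 100, 118, 18),
    TagHit("chr1", "+", 5000, 5018, 16),
)
```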

Most of the PET sequences (more than 90%) were mapped to specific locations (single mapping loci). PETs mapping to 2 to 10 locations are also included and may represent duplicated genes or pseudogenes in the genome.


To assess overall PET quality and mapping specificity, the top ten most abundant PET clusters that mapped to well-characterized known genes were examined. Over 99% of the PETs represented full-length transcripts, and the majority fell within ten bp of the known 5' and 3' boundaries of these transcripts. The PET mapping was further verified by confirming the existence of physical cDNA clones represented by the ditags. PCR primers were designed based on the PET sequences and used to amplify the corresponding cDNA inserts from the parental GIS flcDNA library for sequencing analysis. In a set of 86 arbitrarily selected PETs representing a wide range of annotation categories, including known genes (38 PETs), predicted genes (2 PETs), and novel transcripts (46 PETs), 84 (97.7%) confirmed the existence of bona fide transcripts.


The GIS-PET libraries and sequence data for transcriptome analysis were produced at the Genome Institute of Singapore. The data were mapped and analyzed by scientists from the Genome Institute of Singapore and the Bioinformatics Institute of Singapore.


Choo A, Padmanabhan J, Chin A, Fong WJ, Oh SKW. Immortalized feeders for the scale-up of human embryonic stem cells in feeder and feeder-free conditions. J Biotechnol. 2006 Mar 9;122(1):130-41.

Ng P, Wei CL, Sung WK, Chiu KP, Lipovich L, Ang CC, Gupta S, Shahab A, Ridwan A, Wong CH, et al. Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nat Methods. 2005 Feb;2(2):105-11.