Schema for HAIB Methyl-seq - ENCODE HudsonAlpha Methyl-seq
  Database: hg18    Primary Table: wgEncodeHudsonalphaMethylSeqRegionsRep1Hct116    Row Count: 90,612   Data last updated: 2009-02-10
Format description: Browser extensible data
On download server: MariaDB table dump directory
field        example    SQL type               info    description
bin          590        smallint(5) unsigned   range   Indexing field to speed chromosome range queries.
chrom        chr1       varchar(255)           values  Reference sequence chromosome or scaffold
chromStart   744000     int(10) unsigned       range   Start position in chromosome
chromEnd     744037     int(10) unsigned       range   End position in chromosome
name         chr1.1     varchar(255)           values  Name of item
score        1000       int(10) unsigned       range   Optional score, nominal range 0-1000
strand       .          char(1)                values  + or -
thickStart   0          int(10) unsigned       range   Start of where display should be thick (start codon)
thickEnd     0          int(10) unsigned       range   End of where display should be thick (stop codon)
itemRgb      16744192   int(10) unsigned       range   Used as itemRgb as of 2004-11-22
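
The bin field is described only as an index for range queries. As a minimal sketch, assuming the table uses the standard UCSC binning scheme (128 kb bins at the finest level, each coarser level 8 times larger), the bin for an item could be computed as below. The function name and constants are illustrative and not taken from this page.

```python
# Sketch of the standard UCSC binning scheme (an assumption for this table).
BIN_OFFSETS = (585, 73, 9, 1, 0)  # offsets per level, finest level first
FIRST_SHIFT = 17                  # 2**17 = 131,072 bp bins at the finest level
NEXT_SHIFT = 3                    # each coarser level covers 8x more sequence


def bin_for_range(chrom_start: int, chrom_end: int) -> int:
    """Return the smallest bin that fully contains [chrom_start, chrom_end)."""
    start_bin = chrom_start >> FIRST_SHIFT
    end_bin = (chrom_end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise ValueError("range does not fit in the standard binning scheme")


print(bin_for_range(744000, 744037))  # 590, matching the example in the schema
```

A range query can then restrict its search to the bins that could overlap the requested interval instead of scanning every row for the chromosome.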

Sample Rows
 
bin  chrom  chromStart  chromEnd  name     score  strand  thickStart  thickEnd  itemRgb
590  chr1   744000      744037    chr1.1   1000   .       0           0         255,127,0
590  chr1   752582      752656    chr1.2   0      .       0           0         0,0,255
591  chr1   795274      795330    chr1.3   0      .       0           0         0,0,255
591  chr1   799061      799101    chr1.4   1000   .       0           0         255,127,0
591  chr1   830102      830182    chr1.5   0      .       0           0         0,0,255
591  chr1   830679      830716    chr1.6   1000   .       0           0         255,127,0
591  chr1   831917      832072    chr1.7   0      .       0           0         0,0,255
591  chr1   835110      835293    chr1.8   1000   .       0           0         255,127,0
591  chr1   835600      835710    chr1.9   1000   .       0           0         255,127,0
591  chr1   836081      836146    chr1.10  1000   .       0           0         255,127,0
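
The schema lists itemRgb as the integer 16744192, while the sample rows show it as 255,127,0. The two agree if the integer packs red, green and blue into one 24-bit value. The sketch below assumes that packing (red in the high byte); the function name is illustrative.

```python
def unpack_item_rgb(packed: int) -> tuple[int, int, int]:
    """Split a packed 24-bit colour into (red, green, blue).

    Assumes red occupies bits 16-23, green bits 8-15 and blue bits 0-7.
    """
    red = (packed >> 16) & 0xFF
    green = (packed >> 8) & 0xFF
    blue = packed & 0xFF
    return red, green, blue


print(unpack_item_rgb(16744192))  # (255, 127, 0)
```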

Note: all start coordinates in our database are 0-based, not 1-based. See explanation here.
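
As an illustration of the 0-based start convention, the sketch below converts a row to the 1-based position string commonly used for display and computes the item length. It assumes the usual BED convention that chromEnd is exclusive; the function names are illustrative.

```python
def to_one_based(chrom: str, chrom_start: int, chrom_end: int) -> str:
    """Format a 0-based, end-exclusive interval as a 1-based position string."""
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"


def item_length(chrom_start: int, chrom_end: int) -> int:
    """With a 0-based start and exclusive end, length is a plain difference."""
    return chrom_end - chrom_start


print(to_one_based("chr1", 744000, 744037))  # chr1:744001-744037
print(item_length(744000, 744037))           # 37
```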

HAIB Methyl-seq (wgEncodeHudsonalphaMethylSeq) Track Description
 

Description

This track shows average methylation status in CpG islands. In general, methylation of CpG sites within a promoter causes silencing of the gene associated with that promoter.

Release Notes

This is release 2 of this track. Release 2 adds tables for several new cell types: GM12891, GM12892, H1-hESC, HeLa-S3, and HepG2.

Track Conventions

Methylation status is color-coded as:

  • orange = methylated (bed score = 1000)
  • blue = non-methylated (bed score = 0)
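
The colour convention above can be expressed as a small lookup. This is a sketch only; the dictionary and function names are illustrative, and any score other than 0 or 1000 is treated as unexpected because the track description lists only those two values.

```python
# Mapping from BED score to (status, display colour), per the track conventions.
STATUS_BY_SCORE = {
    1000: ("methylated", "orange"),
    0: ("non-methylated", "blue"),
}


def describe_score(score: int) -> tuple[str, str]:
    """Return (methylation status, colour) for a BED score of 0 or 1000."""
    try:
        return STATUS_BY_SCORE[score]
    except KeyError:
        raise ValueError(f"unexpected score for this track: {score}") from None


print(describe_score(1000))  # ('methylated', 'orange')
print(describe_score(0))     # ('non-methylated', 'blue')
```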

Methods

CpG regions were assayed via Methyl-seq, a method developed in the Myers laboratory to measure the methylation status at CpGs throughout the genome. It combines DNA digestion by the methyl-sensitive enzyme HpaII and its methyl-insensitive isoschizomer MspI with the Illumina DNA sequencing platform. The method was first applied in a collaboration with the laboratory of Dr. Julie Baker at Stanford University to study methylation and gene expression changes that occur in human embryonic stem cells before and after differentiation to definitive endoderm. A paper describing the results and the method has been published [1].

This study profiled genomic DNA and mRNA samples derived from two human embryonic stem cell lines: H9 and BG02. These cells were differentiated into definitive endoderm, embryoid bodies, embryoid body-derived cells, and AFP+ (alpha-fetoprotein positive) hepatocytes. These in vitro samples were profiled with Methyl-seq and compared with normal tissue samples from 11-week and 24-week fetal liver and adult liver.

Methyl-seq assays more than 250,000 methyl-sensitive restriction enzyme cleavage sites, representing more than 90,000 genomic regions. These regions include 35,528 annotated CpG islands, while the remaining 55,084 non-CpG island regions are distributed across the genome in promoters, genes, and intergenic regions. Sequence tags present in MspI libraries but not in HpaII libraries are derived from methylated regions. Conversely, sequence tags that occur in HpaII libraries come from at least partially unmethylated regions.

In vitro differentiation

Definitive endoderm precursor cells were generated from H9 hES cells by treating them with activin A. Embryoid bodies (EBs) were generated by growing undifferentiated H9 and BG02 hESCs in suspension. EB-derived cells were obtained by plating clumps of the cells from the EBs. AFP+ fetal hepatocytes were derived from EBs by plating EB cells with FgF, followed by fluorescence activated cell sorting (FACS) to isolate cells expressing the green fluorescent protein (GFP) reporter gene driven from the AFP promoter.

Isolation of genomic DNA

Genomic DNA was isolated from biological replicates of each cell line using the QIAGEN DNeasy Blood & Tissue Kit according to the manufacturer's instructions. The DNA concentration and quality of each preparation were determined by UV absorbance.

HpaII and MspI digestions

Cleavage of DNA by restriction endonuclease HpaII is prevented by the presence of a 5-methyl group at the internal C residue of its recognition sequence CCGG. MspI, an isoschizomer of HpaII, cleaves DNA irrespective of the presence of a methyl group at this position.
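
The difference between the two enzymes can be sketched in silico. The code below finds CCGG recognition sites in a sequence and reports which of them each enzyme would cleave, given a set of methylated sites. Representing methylation as a set of site offsets is a simplification chosen for this example, and the function names are illustrative.

```python
import re


def ccgg_sites(sequence: str) -> list[int]:
    """Return the 0-based start offset of every CCGG site in the sequence."""
    return [match.start() for match in re.finditer("CCGG", sequence.upper())]


def cleaved_sites(sequence: str, methylated: set[int], enzyme: str) -> list[int]:
    """Return offsets of the CCGG sites the given enzyme would cleave.

    `methylated` holds offsets of sites whose internal C is methylated
    (a hypothetical representation used only for this sketch).
    """
    sites = ccgg_sites(sequence)
    if enzyme == "MspI":
        return sites  # cleaves regardless of methylation
    if enzyme == "HpaII":
        return [site for site in sites if site not in methylated]
    raise ValueError(f"unknown enzyme: {enzyme}")


seq = "AACCGGTTCCGGA"
print(cleaved_sites(seq, {8}, "MspI"))   # [2, 8]
print(cleaved_sites(seq, {8}, "HpaII"))  # [2]
```

A site that appears in the MspI result but not in the HpaII result corresponds to a methylated CCGG, which is the contrast the assay relies on.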

For the MspI library, 5 μg genomic DNA was digested in a 100 μl reaction with 1X NEB Buffer2 and 20 units MspI restriction enzyme and incubated for 18 hr at 37°C. For the HpaII library, 5 μg genomic DNA was digested in a 100 μl reaction with 1X NEB Buffer1 and 20 units HpaII restriction enzyme and incubated for 18 hr at 37°C.

Note that in subsequent versions of the Methyl-seq protocol, which will be described later, much lower amounts of genomic DNA were used (1 μg and potentially lower).

DNA library construction and sequencing

High-throughput sequencing libraries were generated from DNA fragments of the HpaII or MspI digested genomic DNA according to the protocol posted at the Myers lab protocols page. This approach was recently modified by removing the first PCR amplification step, which came just prior to the gel electrophoresis size-selection step; removing it was found to reduce a fragment-size bias in the sequencing libraries. These libraries were sequenced with an Illumina Genome Analyzer (GA2) according to the manufacturer's recommendations.

Data analysis

For this analysis, reads that align to human genome sequence version hg18 and contain the 5'-CGG-3' HpaII-cut signature on their 5' end were used. These aligned sequence reads were mapped to CCGG sites predicted in silico on hg18. Sites with four or more MspI tags occurring in either the forward or reverse direction were retained for analysis. These "assayable" sites were then grouped with neighboring sites that are within 35-75 bp of each other. Thus, a "region" can comprise between 2 and 18 digestion sites that are each within 35-75 bp of another site. Methylated and non-methylated calls were made by using HpaII tag data from all assayable cut sites. For each site across each region, the larger of either the forward read count or reverse read count was used. Regions that have an average of 0 or 1 read per cut site are called methylated, and regions with more than one sequence read per site are called unmethylated.
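
The filtering, grouping and calling steps described above can be sketched as follows. This is one reading of the text, not the authors' code: sites are chained into a region when consecutive assayable sites are 35-75 bp apart, a site counts as assayable when either strand has at least four MspI tags, and a region is called methylated when the mean per-site HpaII count is at most 1. The `Site` class and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    position: int   # coordinate of the CCGG site
    mspi_fwd: int   # MspI tags, forward strand
    mspi_rev: int   # MspI tags, reverse strand
    hpaii_fwd: int  # HpaII tags, forward strand
    hpaii_rev: int  # HpaII tags, reverse strand


def assayable(sites, min_tags=4):
    """Keep sites with at least min_tags MspI tags on either strand."""
    return [s for s in sites if max(s.mspi_fwd, s.mspi_rev) >= min_tags]


def group_regions(sites, min_gap=35, max_gap=75):
    """Chain consecutive sites whose spacing is within [min_gap, max_gap]."""
    regions, current = [], []
    for site in sorted(sites, key=lambda s: s.position):
        if current and min_gap <= site.position - current[-1].position <= max_gap:
            current.append(site)
        else:
            if len(current) >= 2:  # a region needs at least two sites
                regions.append(current)
            current = [site]
    if len(current) >= 2:
        regions.append(current)
    return regions


def call_region(region):
    """Call a region from the larger per-strand HpaII count at each site."""
    per_site = [max(s.hpaii_fwd, s.hpaii_rev) for s in region]
    mean_reads = sum(per_site) / len(per_site)
    return "methylated" if mean_reads <= 1 else "unmethylated"


sites = [
    Site(100, 5, 0, 0, 1),
    Site(140, 0, 4, 1, 0),
    Site(200, 6, 6, 0, 0),
    Site(500, 9, 9, 9, 9),  # too far from any neighbor, so no region forms
    Site(560, 1, 2, 9, 9),  # fewer than four MspI tags, so not assayable
]
for region in group_regions(assayable(sites)):
    print([s.position for s in region], call_region(region))
# [100, 140, 200] methylated
```

The text does not say how averages between 1 and 2 reads per site are handled beyond "more than one", so the threshold used here (mean greater than 1 is unmethylated) is an assumption.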

Credits

Dr. Richard M. Myers
Mr. Yuya Kobayashi: yuyak@stanford.edu
Dr. Devin M. Absher: dabsher@hudsonalpha.org
Dr. Rebekka O. Sprouse: rsprouse@hudsonalpha.org

Contact: Flo Pauli.

References

1. Brunner AL, Johnson DS, Kim SW, Valouev A, Reddy TE, Neff NF, Anton E, Medina C, Nguyen L, Chiao E et al. Distinct DNA methylation patterns characterize differentiated human embryonic stem cells and developing human fetal liver. Genome Research. 2009 Jun;19(6):1044-56. Epub 2009 Mar 9.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here.