Description
These tracks represent data from a CRISPR-activation (CRISPRa) assay performed on human
HEK293 cells as described in Field, et al., 2018. Briefly, lncRNAs found to be transiently
expressed (TrEx) during cortical organoid differentiation from human pluripotent stem cells were
activated in HEK293FT cells by co-transfecting plasmids carrying dCas9-VP64 and either 5 small guide
RNAs (sgRNAs) targeting sites 50 to 450 bp upstream of the observed transcription start site of one TrEx
lncRNA target or 5 random-sequence non-targeting controls ("scrambled"). Transfected cells
were selected with puromycin at 24 hours, and the remaining cells were harvested for RNA at 48
hours post-transfection. RNA-seq libraries were prepared with the NEXTflex Rapid Directional
qRNA-Seq Library Prep Kit (PerkinElmer). All data files associated with these experiments can
be found on GEO (GSE120702).
Each track indicates the genomic coverage from strand-specific RNA-seq data (on either the
plus or minus strand) in the human hg19 genome assembly for one CRISPRa assay. The
coverage has been normalized between samples as described in Methods below. The names of
the samples are the same as those used in the GEO accession: the name prefix
indicates the cell type (HEK293) plus the CRISPRa target gene, and the name suffix is a
unique identifier that distinguishes among biological replicates:
- hek_scramxxx -- scrambled guide RNA controls
- hek_trex8168 -- CRISPRa of TrEx8168
- hek_trex0108 -- CRISPRa of TrEx108
- hek_trex4039 -- CRISPRa of TrEx4039
- hek_trex5008 -- CRISPRa of TrEx5008
Display Conventions and Configuration
The minus strand coverage tracks use negative values so that they descend from
the zero line. The plus strand coverage tracks use positive values. The colors
have been chosen to be colorblind-friendly:
- Blue - plus strand coverage
- Red - minus strand coverage
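The sign convention above can be illustrated with a minimal sketch. It assumes coverage is held as bedGraph-style rows of (chrom, start, end, value); the function name `sign_by_strand` and the example rows are hypothetical and are not taken from the pipeline that produced these tracks.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical row layout: chrom, start, end, coverage value.
BedGraphRow = Tuple[str, int, int, float]


def sign_by_strand(rows: Iterable[BedGraphRow], strand: str) -> Iterator[BedGraphRow]:
    """Yield rows whose coverage is positive for '+' and negative for '-'."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    sign = -1.0 if strand == "-" else 1.0
    for chrom, start, end, value in rows:
        # abs() keeps the result consistent even if the input is already signed.
        yield chrom, start, end, sign * abs(value)


# Made-up example intervals, one per strand.
plus_rows = [("chr1", 100, 150, 12.0)]
minus_rows = [("chr1", 200, 260, 7.5)]

signed_plus = list(sign_by_strand(plus_rows, "+"))
signed_minus = list(sign_by_strand(minus_rows, "-"))
```

With this convention, minus strand coverage descends from the zero line while plus strand coverage rises above it.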
Because the coverage values have been normalized between all the samples, the visual
display indicates the relative expression between samples at a locus as long as all the
individual tracks use the same scale (adjustable with "Vertical viewing range" limits).
Since there is wide variation in coverage between genes with different levels of expression, you
should adjust the "Vertical viewing range" control at the composite track level in order
to vertically zoom in and out at a given locus. In general, keep the plus and
minus sets of tracks at the same scale, although different plus and minus scales can help
when examining cases of anti-sense transcription more closely. You can also
adjust each sample's scale separately, but this distorts the relative expression of that sample and
should be avoided. If you do this inadvertently, the "Reset to defaults" function
restores all the individual track settings.
Because the plus and minus strand are aggregated into separate composite tracks, the default
browser display groups them separately. Be aware that you can drag the tracks individually
to reorder them. For example, you might want to place each sample's plus and minus strands
together, with plus above minus for a more natural display.
Methods
The full description of the RNA-seq data processing can be found in Field, et al., 2018.
Here is a brief synopsis.
- Trimming and Filtering
The raw paired-end reads were trimmed to eliminate low quality bases. The trimmed reads
were mapped with Bowtie2 (Langmead et al., 2012) to a set of repeat-elements for
the appropriate species. Reads mapping to these elements were removed from further
processing.
- Alignment and Duplicate Removal
The filtered reads were aligned to the appropriate genome assembly with STAR (Dobin et al.,
2013), keeping only the primary mapping for multiply-mapped paired-end reads. Duplicate
mappings were removed with Samtools.
- Coverage and DESeq2 Normalization
The duplicate-removed alignments were converted to coverage using bedTools. The total
coverage at all the exonic positions of a gene was divided by the read length (sum of the length
of the two paired-end reads) for input to DESeq2. As part of its differential expression analysis,
DESeq2 performs a normalization across all samples using the expression of all genes (Love
et al., 2014). This normalization compensates for differences in sequencing depth between the
samples. It consists of one "sizeFactor" value per sample. The un-normalized values are divided
by the sizeFactors before the rest of the DESeq2 algorithm is performed. In the same way, the
coverage values from bedTools have been divided by the sizeFactor values to create the tracks
presented here.
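The arithmetic of this step can be sketched in plain Python. This is an illustration under stated assumptions, not the code used to build the tracks: it reimplements the median-of-ratios size factor estimate described by Love et al. (2014) rather than calling DESeq2 itself, and the function names, sample names, gene names, coverage numbers, and the combined read length of 200 are all hypothetical.

```python
import math
from statistics import median
from typing import Dict, List


def gene_counts(exonic_coverage: Dict[str, float], read_length_sum: int) -> Dict[str, float]:
    """Convert total exonic coverage per gene into a fragment-count estimate."""
    return {gene: cov / read_length_sum for gene, cov in exonic_coverage.items()}


def size_factors(counts: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Median-of-ratios size factors; counts is indexed as counts[sample][gene]."""
    samples = list(counts)
    genes = list(counts[samples[0]])
    # Log geometric mean per gene, using only genes with nonzero counts in every sample.
    log_geo_means = {}
    for gene in genes:
        values = [counts[s][gene] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means[gene] = sum(math.log(v) for v in values) / len(values)
    if not log_geo_means:
        raise ValueError("no gene has nonzero counts in every sample")
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - lgm for g, lgm in log_geo_means.items()]
        factors[s] = math.exp(median(log_ratios))
    return factors


def normalize_coverage(values: List[float], size_factor: float) -> List[float]:
    """Divide per-position coverage by the sample's size factor."""
    return [v / size_factor for v in values]


# Hypothetical inputs: sample_b was sequenced twice as deeply as sample_a.
READ_LENGTH_SUM = 200  # assumed sum of the two paired-end read lengths
coverage = {
    "sample_a": {"gene1": 2000.0, "gene2": 20000.0, "gene3": 1000.0},
    "sample_b": {"gene1": 4000.0, "gene2": 40000.0, "gene3": 2000.0},
}
counts = {s: gene_counts(cov, READ_LENGTH_SUM) for s, cov in coverage.items()}
factors = size_factors(counts)

# The same locus at two depths ends up on a common scale after normalization.
norm_a = normalize_coverage([7.0, 0.0], factors["sample_a"])
norm_b = normalize_coverage([14.0, 0.0], factors["sample_b"])
```

In this toy example the size factor of sample_b is twice that of sample_a, so the doubled raw coverage in sample_b maps to the same normalized value as in sample_a.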
Data were generated and processed at the UC Santa Cruz Genomics Institute. For inquiries,
please contact us at the following address:
ssalama@ucsc.edu
References
Field AR, Jacobs FMJ, Fiddes IT, Phillips APR, Reyes-Ortiz AM, LaMontagne E, Whitehead L,
Meng V, Rosenkrantz JL, Olsen M, Hauessler M, Katzman S, Salama SR, Haussler D.
Structurally conserved primate lncRNAs are transiently expressed during human cortical
differentiation and influence cell type specific genes. Stem Cell Reports. 2018. (In Press)
Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson,
M., and Gingeras, T.R. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1),
15-21.
Langmead, B., and Salzberg, S. (2012). Fast gapped-read alignment with Bowtie 2. Nature
Methods 9, 357-359.
Love, M.I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and
dispersion for RNA-seq data with DESeq2. Genome Biology, 15, 550.