MAPPING PUTATIVE REGULATORY REGIONS USING
HISTONE H3 LYSINE 4 MONOMETHYLATION MARKS
IN BREAST CANCER CELL LINES
by
Denil Wickrama
B.Sc., McMaster University, 2005
a Thesis submitted in partial fulfillment
of the requirements for the degree of
Master of Science
in the Department
of
Molecular Biology and Biochemistry
© Denil Wickrama 2011
SIMON FRASER UNIVERSITY
Summer 2011
All rights reserved. However, in accordance with the Copyright Act of
Canada, this work may be reproduced without authorization under the
conditions for Fair Dealing. Therefore, limited reproduction of this
work for the purposes of private study, research, criticism, review and
news reporting is likely to be in accordance with the law, particularly
if cited appropriately.
Abstract
Breast cancer is the most frequently diagnosed cancer in women. In cancer, tumour cells
accumulate changes over time that allow them to replicate indefinitely. These changes can
be mutations to DNA and also epigenetic modifications. This study looks at a histone
modification, H3K4me1, in multiple breast cancer cell lines. It has been found that the
regions between flanking H3K4me1 peaks, referred to as valleys, are enriched for bound
transcription factors. Multiple cell lines were used to form functional groups (luminal vs.
basal cell lines and tumourigenic vs. a non-tumourigenic match control) in which to look for
concordance of valleys. In addition, overexpressed genes in a functional group, as determined
by RNA-seq, were correlated with associated uniquely marked valleys. A motif analysis was
done on the valley sequences using MEME and STAMP to yield putative transcription
factor binding sites. This analysis yielded some known and putative tumour suppressors and
oncogenic factors.
This thesis is dedicated to my parents for their love, endless support, and encouragement.
Acknowledgments
I am very grateful to my supervisor Dr. Steven Jones for the opportunity to do this research
and for the support, suggestions, and encouragement given throughout my thesis work.
Thanks also to current and former members of Dr. Steven Jones' lab for help with research,
thesis corrections, or presentation feedback, notably Anthony Fejes, Mikhail Bilenky, Gordon
Robertson, Timothée Cezard, Elizabeth Chun, and Shing Zhan.
Thanks as well to the other members of my committee, Dr. Frederic Pio and Dr. Fiona
Brinkman, and also my SFU examiner, Dr. Jack Chen, who provided valuable suggestions
to improve this thesis.
Thanks to the CIHR/MSFHR Bioinformatics Training Program and the supervisors and
members of labs that hosted me for a rotation as part of this program. It has been an
amazing learning experience.
Contents
Approval
Abstract
Dedication
Acknowledgments
Contents
List of Tables
List of Figures
Nomenclature
1 Introduction
1.1 Breast cancer
1.1.1 Cancer development
1.1.1.1 Oncogenes
1.1.1.2 Tumour suppressors
1.1.2 Breast Cancer Subtypes
1.1.2.1 Luminal
1.1.2.2 Basal-like
1.1.2.3 HER2+
1.1.2.4 Normal breast-like
1.2 Cell lines
1.2.1 Advantages of cell lines over primary culture
1.2.2 Fidelity of cell lines to primary breast tumours
1.2.2.1 Large scale genomic fidelity
1.2.2.2 Immunohistochemical Fidelity
1.2.2.3 Therapeutic Fidelity
1.3 Cancer genomics
1.3.1 Watson genome
1.3.2 Venter Genome
1.3.3 Exomes and transcriptomes
1.3.4 Whole cancer genome
1.3.5 Genomic Landscape of Cancer
1.3.6 Breast Cancer Genomics Sequencing
1.4 Next-generation sequencing
1.4.1 First generation
1.4.2 Second generation
1.4.3 Illumina Genome Analyzer
1.4.3.1 Roche 454 Genome Sequencer
1.4.3.2 Life Technologies SOLiD System
1.4.3.3 Single molecule sequencing
1.4.4 Third generation
1.5 RNA-seq
1.5.1 Comparison to other methods
1.6 Alignment
1.6.1 Hash based methods
1.6.1.1 Software
1.6.1.2 MAQ
1.6.2 Burrows-Wheeler Transformation Methods
1.6.2.1 Software
1.6.2.2 Bowtie
1.7 Epigenetics
1.7.1 What is epigenetics?
1.7.2 How important is epigenetics in normal development?
1.7.3 What role does epigenetics play in cancer?
1.7.4 How do epigenetic factors exert phenotypic change?
1.7.5 How permanent are the changes?
1.7.6 What role does the nucleosome play?
1.7.7 What are the types of histone modifications?
1.7.7.1 Histone acetylation
1.7.7.2 Histone phosphorylation
1.7.7.3 Histone ubiquitination
1.7.7.4 Histone methylation
1.7.8 H3K4me1
1.7.8.1 Mono-, di- and tri-methylation
1.7.8.2 Bimodal loci
1.7.9 Histone methyltransferases and histone demethylases
1.7.9.1 LSD1
1.7.9.2 MLL1
1.7.10 Smyd
1.7.11 Whistle
1.7.12 JHDM
1.8 Transcription Regulation
1.8.1 Popular TF binding sites programs
1.8.2 Mismatch representation
1.8.3 Probabilistic
1.8.4 Expectation Maximization
1.8.4.1 MEME
1.8.5 TF binding databases
1.8.5.1 OregAnno
1.8.5.2 JASPER
1.8.5.3 TRANSFAC
1.8.6 Interpretation of motif-finder output
1.8.6.1 STAMP
1.9 Functional analysis
1.9.1 DAVID
1.9.2 g:Profiler
1.10 Summary of research
2 Materials and Methods
2.1 Cell lines
2.1.1 Fragmentation methods
2.1.2 Immunohistochemical properties
2.1.3 Cell lines used
2.1.3.1 MCF7
2.1.3.2 T47D
2.1.3.3 BT549
2.1.3.4 MDA-MB-231
2.1.3.5 HS578T
2.1.3.6 HS578Bst
2.2 Aligning sequence reads to reference genome
2.3 Filtering reads
2.4 Identifying enriched regions
2.4.1 Vancouver Short Read (Find Peaks 4)
2.4.2 Saturation
2.5 Valley regions
2.6 Concordance
2.7 Expression
2.8 Motifs
2.8.1 Association of valley marked genes with breast cancer tumourigenesis
2.8.2 Functional analysis
3 Results
3.1 Note regarding contributions
3.2 Chip sequencing Quality
3.2.1 Tally of Reads and Peaks
3.2.2 Saturation curves
3.3 Enrichment of TF binding sites in H3K4me1 marked motifs
3.4 Correlation of Valleys with Downstream Genes
3.4.1 Association of valley marked genes with breast cancer tumourigenesis
3.5 Concordance of valleys between cell lines
3.5.1 Concordance between breast cancer cell line and a matched control
3.5.2 Concordance among various luminal and basal breast cancer cell lines
3.5.2.1 Breast cancer subtypes
3.5.2.2 Concordance with the same subtype
3.5.2.3 Valleys shared by all cell lines
3.5.3 Concordance between a set of luminal and a set of basal breast cancer cell lines
3.6 Unique valleys in promoter regions of overexpressed genes
3.6.1 Defining marked overexpressed categories
3.6.2 Tally of unique valleys in promoter region of overexpressed genes
3.6.2.1 Breast cancer subtype specific valleys
3.6.2.2 Tumourigenics valleys
3.6.3 Tally of uniquely marked overexpressed genes
3.7 Functional analysis
3.7.1 Functional analysis of basal and luminal cell lines
3.7.1.1 Functional analysis of basal marked basal overexpressed genes
3.7.1.2 Functional analysis of basal marked luminal overexpressed genes
3.7.1.3 Functional analysis of luminal marked basal overexpressed genes
3.7.1.4 Functional analysis of luminal marked luminal overexpressed genes
3.7.2 Functional analysis of cancer and control cell lines
3.7.2.1 Functional analysis of control marked cancer overexpressed genes
3.7.2.2 Functional analysis of cancer marked control overexpressed genes
3.7.2.3 Functional analysis of cancer marked cancer overexpressed genes
3.7.2.4 Functional analysis of control marked control overexpressed genes
3.8 Marked overexpressed genes
3.8.1 Motifs
3.8.1.1 ESR1
3.8.1.2 ESR2
3.9 Genes downstream of ESR1 motifs in Valleys
4 Discussion & Conclusions
4.1 Valley concordance
4.1.1 Match control
4.1.2 Breast cancer subtype
4.1.2.1 Core shared marks
4.2 Association of valley marked genes with breast cancer tumourigenesis
4.3 Marked genes with corresponding expression modulation
4.3.1 Functions of H3K4me1 Marked genes with corresponding expression modulation
4.3.1.1 Cell cycle checkpoints
4.3.1.2 Metastasis
4.3.1.3 Cellular adhesion
4.3.2 Angiogenesis
4.3.3 MicroRNAs
4.4 Putative regulatory regions
4.4.1 Relevance of marked overexpressed categories
4.4.1.1 Putative activatory region
4.4.1.2 Putative repressive region
4.5 Experimentally determined functions of TFs potentially regulated by valley regions
4.5.1 ESR1 and ESR2
4.5.2 Egr1
4.5.3 Che-1
4.5.4 EWSR1/Fli-1
4.5.5 Ixr1
4.5.6 Tlx1_NFIC
4.5.7 Tin
4.5.8 Bcd, oc, and gsc
4.5.9 IRF1
4.5.10 MEF2A
4.5.11 Sna
4.5.12 Stat3
4.5.13 REST
4.6 Experimental validation
4.7 Uncorroborated experimental results
4.7.1 Post-transcriptional regulation
4.7.2 Co-regulators
4.8 Progressive methylation
4.8.1 Binding strengths of effectors
4.8.2 H3K4me3 unobserved in these studies
4.8.2.1 Expected case
4.8.2.2 Methylation states
4.8.2.3 Reasons for unexpected case
4.9 Epigenetic crosstalk
4.10 Conclusions
Bibliography
List of Tables
3.1 Tally of Reads and Peaks
3.2 Enrichment of TF binding sites in valleys
3.3 Proportion of breast cancer genes of the set of genes marked with H3K4me1 valleys
3.4 Concordance of valleys in match controlled cell lines
3.5 Cell lines by breast cancer subtype
3.6 Overlap of valleys in promoter regions of luminal and basal cell lines
3.7 Overlap of valleys in promoter regions of luminal and basal cell lines
3.8 Valleys shared between breast cancer subtypes
3.9 Categories correlating expression with H3K4me1 mark in tumourigenic and non-tumourigenic cell lines
3.10 Categories correlating expression with H3K4me1 mark in luminal and basal cell lines
3.11 Number of valleys in the promoter region marking overexpressed genes in breast cancer by subtype
3.12 Valleys in promoters of genes correlated with overexpression in match-controlled cell lines
3.13 Uniquely marked genes correlated with overexpression by breast cancer subtype
3.14 Uniquely marked genes correlated with overexpression in match-controlled cell lines
3.15 Control marked cancer overexpressed genes
3.16 Cancer marked control overexpressed genes
3.17 Cancer marked cancer overexpressed genes
3.18 Control marked control overexpressed genes
3.19 Basal marked basal overexpressed genes
3.20 Basal marked luminal overexpressed genes
3.21 Luminal marked basal overexpressed genes
3.22 Luminal marked luminal overexpressed genes
3.23 Uniquely Marked in Control and Overexpressed in Control
3.24 Uniquely Marked in Cancer and Overexpressed in Control
3.25 Uniquely Marked in Cancer and Overexpressed in Cancer
3.26 Uniquely Marked in Control and Overexpressed in Cancer
List of Figures
3.1 Combined Saturation plots. This figure was generated using Find Peaks 2 and a modified MatLab script, saturation.m, both created by Mikhail Bilenky.
3.2 Overlap of valley regions in tumourigenic cell line vs. control
3.3 Overlap of valley regions by breast cancer subtype
3.4 ESR1 motifs found in valleys upstream of genes that were uniquely marked by H3K4me1 mono-methylation in the control cell line and overexpressed in the control cell line, cont.
4.1 Snail1 complex [44]
4.2 Various REST isoforms [76]
4.3 Low H3K4me1 could indicate higher H3K4me3
Nomenclature
Acronym
BM Basement membrane
bp Base pair
DAVID Database for Annotation, Visualization and Integrated Discovery
DNA Deoxyribonucleic acid
ECM Extracellular Matrix
EGFR Epidermal growth factor receptor
ER Estrogen Receptor
GO Gene Ontology
HAT Histone AcetylTransferases
HDAC Histone DeACetylases
HER2 Human Epidermal growth factor Receptor 2
HKMT Histone Lysine MethylTransferases
KDM Lysine DeMethylase
KEGG Kyoto Encyclopedia of Genes and Genomes
LSD1 Lysine-Specific Demethylase 1
MEME Multiple EM for Motif Elicitation
PCR Polymerase Chain Reaction
PFM Position frequency matrix
PR Progesterone Receptor
PRMT Protein aRginine MethylTransferases
PSSM Position Specific Score Matrix
SHRiMP The SHort Read Mapping Package
SMS Single molecule sequencing
SOLiD Sequencing by Oligonucleotide Ligation and Detection
TF Transcription factor
TRANSFAC The Transcription Factor Database
TSS Transcriptional start site
Glossary
Carcinogenesis Carcinogenesis or oncogenesis is literally the creation of cancer
ChIP-Seq Chromatin immunoprecipitation combined with massively parallel DNA sequencing to identify the DNA-associated proteins
Epigenetics Heritable changes in gene expression and chromatin organisation that are not
encoded in the genomic DNA itself
H3K4me1 Histone H3 mono methyl K4
Histones Histones are the proteins closely associated with DNA molecules
Nucleosomes Nucleosomes are the basic unit of DNA packaging in eukaryotes (cells with a
nucleus), consisting of a segment of DNA wound around a histone protein core.
Oncogenes An oncogene is a gene that has the potential to cause cancer
RNA-seq Deep high-throughput transcriptome sequencing. Also known as Whole Transcriptome Shotgun Sequencing.
Somatic mutation Alterations in DNA that occur after conception.
Tumour suppressor gene A tumour suppressor gene, or anti-oncogene, is a gene that protects
a cell from one step on the path to cancer.
Valley The region between flanking H3K4me1 monomethylation peaks, possibly marking a transcription factor
binding site
Chapter 1
Introduction
1.1 Breast cancer
Breast cancer is heterogeneous, arising from varied genetic and epigenetic abnormalities [286].
In general, tumours progress by accumulating modifications that allow them to behave differently
from normal cells. These include self-sufficiency in growth signals, insensitivity to
anti-growth signals, tissue invasion, metastasis, and sustained angiogenesis [106]. Usually,
these steps occur through the activation of an oncogene, such as Ras, or the inactivation of a
tumour suppressor gene, such as p53 [23]. Different tumour types have different molecular
characteristics, and determining the tumour type allows prediction of the prognosis along with the
best treatment [138].
1.1.1 Cancer development
The path by which cancer progresses is also important to developing treatments. Carcinogenesis
literally means the production of cancer [50]. It occurs in multiple steps in the form
of genetic or epigenetic alterations that influence key cellular pathways [71, 129]. These
steps include deregulation of multiple cellular processes, including genome stability,
proliferation, apoptosis, motility, and angiogenesis [6, 106]. With the breakdown of these
barriers, a normal, finite-life-span somatic epithelial cell can transform into an immortalized,
metastatic cell.
1.1.1.1 Oncogenes
A group of genes that are major players in carcinogenesis are called oncogenes. An oncogene
is any gene that encodes a protein able to transform cells to induce cancer [198]. Types of
oncogenes may include growth factors, growth factor receptors, signal-transduction proteins,
transcription factors, pro- or anti-apoptotic proteins, cell cycle control proteins, and DNA
repair proteins [198].
An example of a growth factor receptor oncogene that plays a role in breast cancer is EGF-R/ErbB2.
Epidermal growth factor receptor (EGFR) and ErbB2 are members of the ErbB family of
receptor tyrosine kinases. ErbB2 interacts with EGFR in order to achieve its full oncogenic
potential. ErbB2 amplification and overexpression are associated with a poor prognosis in
breast cancer patients [45]. In addition, BRCA1 is a transcriptional regulator
whose mutation has been linked to the development of breast and ovarian cancer [347]. Other
oncogenes that researchers have found to be related to breast cancer include the tyrosine
kinase family of growth factor receptors, the c-myc oncogene, cyclin D-1, and the cyclin
regulator, CDK-1 [46].
Oncogenes arise by activation of a proto-oncogene. These proto-oncogenes may undergo
mutations altering their regulation or function, which make them capable of turning normal
cells into cancer cells. Proto-oncogenes are cellular genes with important functions in
normal cell growth or differentiation [198]. Different oncogenes are mutated in different
tumours, contributing to differences in histopathology, hormone receptor expression, and
clinical course [67].
As mentioned earlier, activation is necessary to convert a proto-oncogene into an oncogene.
Generally, this involves a gain-of-function mutation [198]. These genes are altered due to
mutations such as amplification, deletion and insertion mutations, increased transcription,
and point mutations [198]. A mutation within a proto-oncogene can change a protein's
structure, causing an increase in enzyme activity or a loss of regulation [234].
One method of proto-oncogene activation, gene amplification, increases the level of the protein
encoded by a gene. This could occur in various ways. For example, an increase in
protein concentration due to misregulation would provide a gain of function. Also, increasing
mRNA stability prolongs its existence, causing more translation, and thus increased activity
in the cell. This results in enhanced function of the gene. An example of such a mode of
oncogene activation is that of HER2, which is seen in about 20% of primary breast cancer
cases [245].
A point mutation that enhances the function of the oncoprotein is another mode of activation.
An example is point mutations in the ras oncogene, seen commonly in lung, colorectal, and
pancreatic (but not breast) cancer [245].
Chromosomal translocation is a method of oncogenic transformation in which a fusion gene is
expressed as a protein with enhanced function. Chromosomal translocations can also cause
increased gene expression in the incorrect cell type or under incorrect cellular conditions. This could
also result in the expression of a constitutively active hybrid protein.
1.1.1.2 Tumour suppressors
Proto-oncogenes are typically genes that assist cell growth and differentiation and that
induce cancer when mutated [54]. Tumour suppressors, on the other hand, slow down cell
division, repair DNA mistakes, and promote apoptosis. The loss of function of these genes
promotes malignancy [250]. Tumour suppressor gene mutations can be haploinsufficient or
dominant negative in addition to recessive [250].
Usually, mutated tumour suppressors are recessive alleles, as they contain loss-of-function
mutations [107]. These mutations can follow a two-hit hypothesis, where both alleles that
code for a particular gene must be affected before an effect is manifested [155]. Typically, a
mutation limited to one oncogene would be suppressed by normal mitotic control and tumour
suppressor genes [156]. An inherited loss of a tumour suppressor allele leads to accelerated
tumourigenesis, due to the need to inactivate only one remaining allele [250].
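The reasoning behind the two-hit hypothesis can be illustrated with a minimal sketch. It assumes that each allele is inactivated independently with the same small probability p per cell over some time window. The function name and the value of p are hypothetical, chosen only for illustration, and are not taken from this work or from the cited literature.

```python
# Illustrative sketch of the two-hit hypothesis (not a model from this thesis).
# Assumption: each allele of a tumour suppressor gene is inactivated
# independently with the same probability p per cell over some time window.

def prob_both_alleles_lost(p: float, inherited_hit: bool) -> float:
    """Probability that a cell has no functional allele left."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    # With an inherited (germline) hit, one somatic hit suffices;
    # otherwise two independent somatic hits are required.
    return p if inherited_hit else p * p

p = 1e-6  # hypothetical per-allele inactivation probability
sporadic = prob_both_alleles_lost(p, inherited_hit=False)
familial = prob_both_alleles_lost(p, inherited_hit=True)
fold_increase = familial / sporadic  # equals 1 / p under these assumptions
print(f"sporadic: {sporadic:.1e}, familial: {familial:.1e}, fold: {fold_increase:.1e}")
```

Under these simplifying assumptions, a cell carrying an inherited hit is 1/p times more likely to lose all function of the gene, which is one way to read the accelerated tumourigenesis described above.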
In some cases, inactivation of one allele of a tumour suppressor gene is sufficient to cause
tumours. Haploinsufficiency occurs when one allele is insufficient to confer the full functionality
produced from two wild-type alleles [250].
In the case of a dominant negative mutation, the wild-type allele does not need to be
inactivated, because the dominant negative mutation serves that function [250]. This phenomenon
is called the dominant negative effect. These mutations are also thought to be more frequent
than null mutations such as complete gene deletions, premature nonsense mutations
or regulatory alterations abolishing allelic expression [94]. Also, they appear frequently in
transcription factors (TFs) [317].
An example of a key tumour suppressor gene is the p53 gene [322]. Mutation of the p53
gene is a common genetic change in breast cancer and is found in about 50% of human
cancers overall [294]. One function of this gene is to keep cells with damaged DNA from entering
the cell cycle. The p53 gene can tell a normal cell with DNA damage to stop proliferating
and repair the damage [46]. When the damage is too severe to repair, p53 signals the cell
to undergo apoptosis. If the p53 gene is damaged and loses its function, cells with damaged
DNA continue to reproduce when normally they would have been removed through apoptosis.
A small proportion of breast cancer cases (5%) are related to the inheritance of susceptibility
genes [46]. Examples of breast cancer susceptibility genes involved in some inherited cases
of breast cancer are BRCA1 and BRCA2. If inactivated, these tumour suppressor genes can
act indirectly in the cell by disrupting DNA repair [46]. This allows the cell to accumulate
DNA damage, including mutations that can encourage cancer development. Other tumour
suppressor genes that researchers have found may be related to breast cancer include the
Retinoblastoma, Brush-1, Maspin, nm23, and TSG101 genes.
1.1.2
Breast Cancer Subtypes
For many years, the conventional way to diagnose the pathology of breast tumours was microscopic subtyping and grading [268]. However, patients with the same pathologic subtype
and grade can have different outcomes. Long-term follow-up of patients with breast cancer
shows that a particular subtype of carcinoma or a specific grade as determined by the
Nottingham Prognostic Index (NPI) has little impact on prognosis and does not provide any
insights into the best therapeutic strategy [268].
Patients with breast cancer can be stratified based on their gene expression profile and
immunohistochemical expression of cytokeratins, estrogen receptors, EGFR,
and HER2 [24]. This classification has an impact on therapeutic strategies, as the different
molecular subtypes respond differently to chemotherapy. Five distinct molecular subclasses
have been identified: Luminal A and B, HER2, basal-like, and normal-like [255, 237, 137].
1.1.2.1 Luminal
Luminal-like breast carcinoma is characterized by the expression of Estrogen Receptor (ER),
Progesterone Receptor (PR), Bcl-2 and CK8/18 [268]. Luminal tumours originate in the
inner cells that line the mammary ducts [232]. They are characterized by high levels of ER
expression and are associated with good prognosis, high survival rates and low recurrence.
Luminal A is the most prevalent subtype, occurring in 42-59% of cases [37]. These tumours
are characteristically ER+ and/or PR+ and tend to be Human Epidermal
growth factor Receptor 2 negative (HER2-). Only about 15% of luminal A tumours have
p53 mutations, a factor linked with a poorer prognosis [37].
Luminal B tumours occur in 9-16% of cases and are a more aggressive phenotype than luminal A, but
they still have fairly high survival rates [255]. They are more likely to have p53 mutations, poorer
tumour grade, and larger tumour size. Luminal B tumours tend to be HER2+, ER+ and/or
PR+, and most express EGFR-1 and cyclin E1 [308].
1.1.2.2 Basal-like
Many basal-like tumours are triple-negative (ER-, PR-, HER2-) and this category comprises
about 8-20% of breast cancers [37, 221]. This subtype expresses CK5/6 and/or EGFR [269].
These tumours are often associated with aggressive histological features and BRCA mutations,
and have a poorer prognosis compared to luminal subtypes [338]. Basal-like tumours are
usually treated with some combination of surgery, radiation therapy and chemotherapy.
These tumours cannot be treated with trastuzumab or hormone therapies because they are
HER2- and hormone receptor-negative [20].
1.1.2.3 HER2+
This category of tumour typically has the molecular signature (ER-, PR-, HER2+). HER2+
breast cancers tend to be more aggressive than other types of breast cancer [309]. In the
majority of these tumours p53 is not expressed. HER2+ tumours are also prone to early
and frequent relapse and distant metastases. This tumour type has an occurrence of 7-12%
[151]. HER2+ tumours can be treated with the drug trastuzumab.
1.1.2.4 Normal breast-like
About 6-10% of breast cancers fall into an unclassified/normal breast-like category [37]. These tumours do
not fit the profiles of the other four subtypes. They are negative for all five markers (ER-,
PR-, HER2-, CK5- and EGFR-) [268]. These tumours are most often small and tend to have
a good prognosis [73].
1.2
Cell lines
When studying tumours, cell lines are often used. A cell line is a homogeneous population
of cells on which experiments can be performed. These cell lines can be derived from breast
cancer patients and be immortalized for study. They are useful in vitro models for cancer
research.
1.2.1
Advantages of cell lines over primary culture
Cell lines do not fully represent the tumours from which they derive. They do, however,
represent tangible and tractable experimental resources, and there are advantages to their
use in a genome-wide sequencing study. For example, they are readily available in large
quantities. When tumour tissue is used, the quantity of tumour material is limited
and therefore difficult to share. Directly sequencing patient-derived tumour tissue would
provide no experimental resource to test whether the modifications are causative or merely
correlative with disease. Some other advantages of using cell lines over primary culture are
faster population doubling times, and the lack of a finite set lifespan before senescence [32].
Cell lines are also heavily relied upon for compound and RNAi screening [103].
1.2.2
Fidelity of cell lines to primary breast tumours
Interpreting the results of a cell line experiment in the context of breast cancer pathophysiology requires an understanding of the extent to which cell lines mirror aberrations that are
present in primary tumours. Studies have concluded that the cell line collection mirrors most
of the important genomic and resulting transcriptional abnormalities found in primary breast
tumours. They show that analysis of the functions of these genes in the ensemble of cell lines will
accurately reflect how they contribute to breast cancer pathophysiologies [236].
1.2.2.1 Large scale genomic fidelity
Cell lines display the same heterogeneity in copy number and expression abnormalities as
the primary tumours, and they carry almost all of the recurrent genomic abnormalities
associated with clinical outcome in primary tumours [236].
1.2.2.2 Immunohistochemical Fidelity
Breast cancer cell lines can also be used to study subtype-specific changes in breast cancer.
This is because the breast cancer cell lines cluster into basal-like and luminal expression
subsets in a similar way to their tissue counterparts. A study on the cell lines T47D,
HS578T, MCF7, and MDA-MB-231 shows that luminal cells appear more differentiated and
form tight cell-cell junctions, while the Basal B cells appear less differentiated and have a
more mesenchymal-like appearance [236].
1.2.2.3 Therapeutic Fidelity
Given the immunohistochemical and large scale genomic fidelity, we would expect cell lines
to respond to therapeutic agents in a manner similar to the breast tumours they represent.
Indeed, studies have found the cell lines exhibit heterogeneous responses to targeted therapeutics paralleling clinical observations [236].
1.3
Cancer genomics
1.3.1
Watson genome
Second-generation DNA sequencing technologies have transformed investigation of cancer
genomes. James Watson's genome was the first personal genome to be sequenced using
next-generation sequencing (NGS) technologies [319]. This achievement was the first proof of
principle that these rapid-sequencing machines can decipher large, complex genomes [243].
Watson's genome was sequenced to 7.4× coverage on the 454 GS (Roche) platform [331], and
included 3.3 million single nucleotide polymorphisms. It took just four months, a handful of
scientists and less than US$1.5 million to sequence the 6 billion base pairs of DNA pioneer
James Watson [319].
1.3.2
Venter Genome
The genome of J. Craig Venter was sequenced at a cost of $100 million [319]. The approach
was based on whole-genome shotgun sequencing, and generated an assembled genome over
half of which is represented in large diploid segments (>200 kilobases). Essentially, in this
method, the sequence was broken into large parts. Then the large parts were broken into
smaller parts, sequenced and put back together [261]. The difference between Venter's
genome and Watson's, besides the cost, is that in Venter's genome it was possible to figure
out how the smaller parts fit into the larger parts, and to reconstruct contiguous pieces. Also,
unlike Watson's data, Venter's data allow a much closer look at the difference
between the two sets of chromosomes: the maternal and paternal sets are
quite different, and 44% of the genes are heterozygous [261]. Comparison with previous
reference human genome sequences, which were composites comprising multiple humans,
revealed that the majority of genomic alterations are the well-studied class of variants based
on single nucleotides (SNPs). However, the results also reveal that lesser-studied genomic
variants, insertions and deletions, while comprising a minority (22%) of genomic variation
events, actually account for almost 74% of variant nucleotides [182].
1.3.3
Exomes and transcriptomes
Most of the currently known driver mutations change the coding sequences of protein-coding
genes. Because protein-coding exons account for only about 1% of the human genome, sequencing is often economically targeted at these regions [315]. Use of technologies that extract subsets of DNA sequences from the whole genome [208], in combination with second-generation
sequencing, has allowed sequencing of the protein-coding exons of roughly 2000 individual
cancers worldwide [304]. This strategy will find base substitutions and indels in coding exons, but it will miss these types of mutation in noncoding regions and requires other analyses of
the same genomes to report most rearrangements. Similarly, after extraction of RNA, the
transcriptomes of many hundreds of cancers have been sequenced [304].
1.3.4
Whole cancer genome
While sequencing exomes and transcriptomes yields useful information, it does not tell us
the whole story. Technology shifts allowed further insight by sequencing the whole cancer
genome [291, 185, 178]. This strategy, in which genomic DNA from a cancer and DNA
isolated from normal tissue of the same person are both sequenced, can reveal all classes of somatic change
(base substitutions, indels, rearrangements, copy number changes, and even potentially epigenetic alterations) in all sectors of the genome (exons, introns, and intergenic regions) [304].
This allows exploration of the genome without any preconceptions of where the important
mutations are.
1.3.5
Genomic Landscape of Cancer
Somatic mutations found in cancer are either drivers or passengers [286, 98]. Passenger mutations confer no selective advantage or disadvantage, whereas driver mutations are causal in
the neoplastic process and positively selected for in tumourigenesis [335]. There are usually
between 1000 and 10,000 somatic substitutions in the genomes of most adult cancers, including breast, ovary, colorectal, pancreas, and glioma [98]. Within a particular cancer type,
individual tumours often display wide variation in the prevalence of base substitutions [304].
Cancer genome exploration has identified approximately 400 somatically mutated cancer
genes or 2% of the protein-coding genes in the human genome that contribute to neoplastic
change in one or more types of cancer [89, 304].
Most inherited cancers show a dominant pattern of inheritance and arise from inactivation of tumour
suppressor genes rather than from activating mutations in oncogenes [77]. Most of the known
cancer genes were found through primary cytogenetic analyses, with the wave of ever higher
resolution copy number studies bringing a further substantial yield [304]. The advent of studies systematically sequencing cancer genomes has identified cancer genes directly through
an elevated prevalence of base substitutions and small indels. These include several dominant cancer genes, such as BRAF, EGFR, ERBB2, PIK3CA, IDH1, IDH2, EZH2, FOXL2,
PPP2R1A, and JAK2 [304].
Fewer recessive cancer genes are known, but some have been identified.
Examples that have emerged through systematic sequencing include SETD2, KDM6A,
KDM5C, PBRM1, BAP1, ARID1A, DNMT3A, GATA3, DAXX, ATRX, and MLL2 [304].
Epigenetics plays a part in carcinogenesis, and these sequencing studies find evidence of this
as well. Some of the genes found in these studies are involved in chromatin modification and
remodelling. For example, SETD2, EZH2, and MLL2 are histone H3 methylases, whereas
KDM6A and KDM5C are histone H3 demethylases [304].
1.3.6
Breast Cancer Genomics Sequencing
Many studies have investigated the genomic landscape of cancer, and one
study in particular investigated the breast cancer genome. This study used advances in
sequencing technologies to characterize all somatic coding mutations that occur during the
development and progression of an individual cancer. The authors achieved over 43-fold coverage
using Illumina sequencing technology to study the genome of metastatic tissue from a breast
cancer patient [291]. This depth of coverage helped ensure that every part of the genome was
sequenced and allowed them to identify somatic mutations where the tumour genome differed from the
patient's normal genome.
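The relationship between read count, read length and fold coverage can be made concrete with a short calculation. The sketch below assumes the idealized Lander-Waterman model, in which reads land uniformly at random, so that the expected fraction of bases receiving no reads at mean coverage c is approximately e^(-c). The genome size and read length used here are illustrative assumptions, not values taken from the study cited above.

```python
import math

GENOME_SIZE_BP = 3_000_000_000  # assumed haploid human genome size
READ_LENGTH_BP = 50             # assumed short-read length


def fold_coverage(n_reads, read_length, genome_size):
    """Mean fold coverage: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_needed(coverage, read_length, genome_size):
    """Number of reads required to reach a target mean coverage."""
    return math.ceil(coverage * genome_size / read_length)


def expected_uncovered_fraction(coverage):
    """Poisson estimate of the fraction of bases covered by zero reads."""
    return math.exp(-coverage)


if __name__ == "__main__":
    n = reads_needed(43, READ_LENGTH_BP, GENOME_SIZE_BP)
    print("reads needed for 43-fold coverage:", n)
    print("expected uncovered fraction at 7.4-fold:", expected_uncovered_fraction(7.4))
    print("expected uncovered fraction at 43-fold:", expected_uncovered_fraction(43))
```

Under these assumptions, the expected uncovered fraction at 43-fold coverage is vanishingly small, whereas real data fall short of this ideal because of repeats and sequence-dependent biases.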
When comparing noncancerous and metastatic tissue, they found 32 mutations present in
the metastatic tumour. Overall the number of mutations they found in the cancerous tissue
was greater than expected, making it challenging to determine which mutations were drivers
that enhance a cancer's ability to spread, and which were passenger mutations that have
no effect [295]. Of the 32 mutations found in the metastatic tumour, five were prevalent in
the primary tumour, and six were found at lower frequencies in the primary tumour [291].
This kind of analysis sheds light on questions such as whether tumours start out with the
ability to spread or evolve that capacity with time.
1.4
Next-generation sequencing
The previous section discussed some of the contributions that high-throughput
genome sequencing can make to cancer genomics. When we examine a genome in an unbiased
way and use tumourigenic samples with non-tumourigenic controls, we can draw many useful
conclusions about the role of a particular gene in tumourigenesis. Using this technology
for these kinds of studies is now feasible, but only as a result of various improvements. A
discussion of the progression and future of high-throughput sequencing follows.
1.4.1
First generation
This technology started with Sanger sequencing machines. Modern Sanger sequencing machines started a shift in the way we think about sequencing: they introduced high-throughput sequencing, in
which a single lab could sequence millions of base pairs rather than the thousands that could
be done prior to their introduction [131]. These machines are called the first generation of
sequencing technology, as they are the first of many improvements and variations on high-throughput sequencing technology.
They used automated capillary sequencing. The underlying method was first developed by
Frederick Sanger, using Sanger chemistry [285]. Early implementations consumed much time and
reagents and used radioactive isotope labelling. They required
four separate chain termination reactions and slab-gel based separation on four individual
lanes. Eventually, this was improved to capillary electrophoresis, with multiple sequencing runs performed in parallel. This generation of sequencers was used in production of the Human Genome
Project. The method can achieve read lengths of up to 1000 bp, with raw
accuracy as high as 99.999%, at a cost as low as $0.50/kilobase and a throughput close to
600000 bp/day [346]. Though this method is still in use, it is not fast enough or sufficiently
economical to be used in present-day large scale genomic analysis.
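A rough calculation illustrates why these figures rule out large scale use. The sketch below takes the throughput and cost quoted above and applies them to a single pass over a genome; the genome size of 3 billion bp and the restriction to one instrument are assumptions made only for illustration.

```python
GENOME_SIZE_BP = 3_000_000_000   # assumed haploid human genome size
THROUGHPUT_BP_PER_DAY = 600_000  # throughput quoted in the text
COST_PER_KB_USD = 0.50           # cost quoted in the text


def days_for_coverage(coverage, genome_size=GENOME_SIZE_BP,
                      throughput=THROUGHPUT_BP_PER_DAY):
    """Instrument-days needed to sequence a genome to the given fold coverage."""
    return coverage * genome_size / throughput


def cost_for_coverage(coverage, genome_size=GENOME_SIZE_BP,
                      cost_per_kb=COST_PER_KB_USD):
    """Sequencing cost in USD at the given fold coverage."""
    return coverage * genome_size / 1000 * cost_per_kb


if __name__ == "__main__":
    print("instrument-days for 1-fold coverage:", days_for_coverage(1))
    print("cost in USD for 1-fold coverage:", cost_for_coverage(1))
```

With these inputs, a single 1-fold pass takes about 5,000 instrument-days and costs about US$1.5 million, before any of the redundancy that accurate assembly or variant calling requires.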
1.4.2
Second generation
A new generation of sequencing technologies was needed for massively parallel genomic
studies. There are three widely used commercial second generation sequencing platforms:
the Illumina Genome Analyzer, the Roche 454 Genome Sequencer and the Life Technologies SOLiD
System.
1.4.3
Illumina Genome Analyzer
Illumina's workflow uses reversible fluorescently-labelled terminators as each dNTP is added.
This system uses a flow cell with eight lanes that allows bridge amplification [78] of fragments
on its surface. Each cycle, four distinctly labelled nucleotides are added simultaneously to
the flow cell channel, DNA polymerase adds a single base, and further extension is prevented because the 3′-OH is blocked. The Illumina Genome Analyzer produces sequence reads of 32-50 bp [210, 346]. Its main drawbacks
are the short read length and the decay of the fluorescent signal if any of the DNA
strands extend out of sync [346].
All of the second generation technologies follow a similar workflow, which is
described here for the Illumina Genome Analyzer. First, DNA fragments are prepared from
the genomic DNA sample, either as randomly sheared fragments tens to hundreds of bp
in size or as paired-end fragments with a controlled distance distribution. Fragmentation can be done by either
sonication or micrococcal nuclease digestion [253]. The advantages and
disadvantages of each of these strategies are discussed further in the materials and methods
section. Adapters are ligated to both ends of the fragments [346]. The fragments are then denatured and attached to
a planar surface as single strands, creating a single-stranded template library
immobilized on a solid surface. These fragments are then clonally amplified
by bridge amplification [78], resulting in double stranded fragments. These fragments are
denatured and cycles of bridge amplification are repeated [346]. The resulting DNA clusters form an
array on a slide. Sequencing then begins with the addition of all four
fluorescently labelled reversible terminators, primers, and DNA polymerase [18]. After each incorporation, the 3′
end is unblocked and the cycle is repeated for the subsequent bases. Optical events generated
by the cyclic chain extension process are monitored by a microscopic detection system, and
images are recorded with a CCD camera. One hundred of these regions of clustered DNA, or tiles, are
imaged per lane [333]. Some bioinformatics challenges involved in this step are background
subtraction, image correlation to account for flow cell repositioning, and intensity extraction
for each cluster [333]. Next, a post-image analysis signal correction must be done to get
accurate base calling. Bioinformatics challenges here involve correction for crosstalk caused by
overlapping dye emission frequencies, correction for phasing caused by failed incorporation of a
nucleotide, and chastity filtering of mixed clusters [333]. The sequence reads are then aligned to
the reference genome in processes which are described later.
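Chastity filtering can be illustrated with a small sketch. It assumes the definition commonly given for Illumina pipelines, in which the chastity of a base call is the brightest channel intensity divided by the sum of the brightest and second-brightest intensities, and a cluster is retained if at most one of its first 25 cycles falls below a threshold of 0.6. These parameter values, the function names, and the assumption of non-negative intensities after background subtraction are illustrative and are not taken from [333].

```python
def chastity(intensities):
    """Chastity of one base call from four non-negative channel intensities."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    return top / total if total > 0 else 0.0


def passes_filter(cycles, threshold=0.6, n_cycles=25, max_failures=1):
    """Keep a cluster if few of its early cycles have low chastity.

    cycles: one (A, C, G, T) intensity tuple per sequencing cycle.
    """
    failures = sum(1 for c in cycles[:n_cycles] if chastity(c) < threshold)
    return failures <= max_failures


if __name__ == "__main__":
    clean_cluster = [(100, 10, 5, 5)] * 25  # one dominant channel per cycle
    mixed_cluster = [(50, 45, 2, 1)] * 25   # two templates of similar brightness
    print("clean cluster kept:", passes_filter(clean_cluster))
    print("mixed cluster kept:", passes_filter(mixed_cluster))
```

A mixed cluster, in which two different templates have seeded the same spot, produces two channels of similar brightness and is therefore discarded by this kind of filter.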
1.4.3.1 Roche 454 Genome Sequencer
The Illumina technology described above has fairly low error rates but short reads. 454
technology results in long reads, but with considerable homopolymer problems. In this
workflow, amplicons are made by emulsion PCR using paramagnetic beads coated with DNA
primers [212]. The beads, which carry no more than one ssDNA molecule, are amplified
through rounds of thermocycling, transferred to picotiter plates and further enriched.
Sequencing-by-synthesis is done with pyrophosphate chemistry to produce optical signals
[281]. Advantages of this technique are its speed and its read length of up to 500 bp [346]. This
is due to the lack of extra chemical steps such as removing a label moiety or deblocking
a terminator. Costs of reagents and errors in homopolymer regions are drawbacks of this
method.
1.4.3.2 Life Technologies SOLiD System
Illumina produces longer reads than SOLiD, but it is more expensive to run and yields fewer
reads. In addition, SOLiD is more suited to SNP calling. SOLiD also uses emulsion PCR
with paramagnetic beads, and then fixes those beads in a disordered array on a flat glass
substrate [346]. The sequencing-by-synthesis method used in this technology is driven by
ligation [203]. Seven rounds of ligation are used, with fluorescently labelled octamer probes
at the 8th position. Since the first two bases of each probe determine its fluorescent colour, each
base is measured twice, which allows identification of miscalls. Studies have shown that SOLiD
sequencing can characterize an entire genome with only 18× haploid coverage [217].
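The benefit of measuring each base twice can be shown with a small sketch of two-base (colour-space) encoding. It assumes the commonly described scheme in which the sixteen possible dinucleotides share four colours, which is equivalent to taking the exclusive-or of 2-bit base codes with A=0, C=1, G=2 and T=3. The function names are hypothetical, and the sketch ignores the ligation chemistry itself.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode(seq):
    """Return the first base and one colour (0-3) per adjacent base pair."""
    colours = [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colours


def decode(first_base, colours):
    """Rebuild the base sequence from a known first base and the colours."""
    bases = [first_base]
    for colour in colours:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ colour])
    return "".join(bases)


def colour_differences(seq_a, seq_b):
    """Count positions at which two equal-length sequences differ in colour space."""
    _, colours_a = encode(seq_a)
    _, colours_b = encode(seq_b)
    return sum(1 for x, y in zip(colours_a, colours_b) if x != y)


if __name__ == "__main__":
    first, colours = encode("ATGGC")
    print(first, colours)
    print(decode(first, colours))
    # An internal single-base substitution changes two adjacent colours.
    print(colour_differences("ATGGC", "ATCGC"))
```

Because a true internal substitution changes two adjacent colours, an isolated single-colour difference from the reference is more likely to be a measurement error than a SNP, which is one reason the encoding suits SNP calling.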
1.4.3.3 Single molecule sequencing
Single molecule sequencing (SMS) platforms address some of the major drawbacks of other second generation sequencing
platforms. SMS increases read length and the number of DNA fragments that can be independently analyzed on a given surface area, and involves no costly cluster amplification step
[346]. The major challenge in this technology is the optical signal detection of a
single-molecule event. Some companies that have addressed or are trying to address this issue are
Helicos HeliScope, VisiGen, Pacific Biosciences, and Mobious Nexus I.
1.4.4
Third generation
Third generation sequencing involves sequencing single DNA molecules without the need
to halt between read steps (whether enzymatic or otherwise) [287]. This is in contrast to
second generation sequencing, which works by indirectly determining the base incorporated
by either DNA polymerase or DNA ligase through fluorescent or chemiluminescent optical
events. Working with large numbers of optical images is complex and costly. Consumables
for biochemical reactions in sequence interrogation are also a major expense. Several
attempts are being made to create the next generation of sequencing technology. Non-optical
microscopic imaging is one strategy, which attempts to take a high-resolution picture of a DNA
strand at the atomic level [310]. Nanopore sequencing is another technology, which threads a DNA strand
through a pore electrophoretically and then reads the bases as they pass through the pore
opening [26]. Graphene [262] and carbon nanotubes [5] are other techniques in
development that use electrophysical properties to sequence DNA.
1.5
RNA-seq
RNA-Seq, or whole transcriptome shotgun sequencing, uses deep-sequencing technologies to profile the transcriptome, the complete set of transcripts in a cell [325]. First, a library of cDNA
fragments is generated from a population of RNA. Adaptors are attached to one or both ends
of the cDNA fragments. Each molecule, with or without amplification, is then sequenced in a
high-throughput manner to obtain short sequences from one end or both ends. The reads are
typically 30-400 bp, depending on the DNA-sequencing technology used [325]. The sequence
reads are aligned to the reference genome as either junction reads, exonic reads or poly(A)
end reads.
Once high-quality reads have been obtained, the first task of data analysis is to map the
short reads from RNA-Seq to the reference genome, or to assemble them into contigs before
aligning them to the genomic sequence to reveal transcription structure. There are several
programs for mapping reads to the genome, including ELAND, SOAP, MAQ, and RMAP
[325]. Exon-exon junctions can be identified by the presence of a specific sequence context
and confirmed by the low expression of intronic sequences, which are removed during
splicing [325]. For complex transcriptomes it is more difficult to map reads that span splice
junctions, due to extensive alternative splicing and trans-splicing. One partial solution is to
compile a junction library that contains all the known and predicted junction sequences and
map reads to this library [334, 226]. For large transcriptomes, alignment is also complicated
because reads match multiple locations in the genome. One solution is to assign these multi-matched reads proportionally, based on the number of reads mapped to
their neighbouring unique sequences [226, 48].
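The proportional assignment of multi-matched reads can be sketched as follows. This is a minimal illustration of the idea rather than the procedure of [226, 48]: each multi-matched read is split among its candidate loci in proportion to the uniquely mapped read counts of those loci. The locus names, the input layout, and the equal split used when no candidate has unique reads are assumptions.

```python
def allocate_multireads(unique_counts, multireads):
    """Distribute multi-matched reads in proportion to unique read counts.

    unique_counts: dict mapping locus name -> number of uniquely mapped reads.
    multireads: list of tuples, each holding the candidate loci of one read.
    Returns a dict mapping locus name -> unique count plus fractional shares.
    """
    totals = {locus: float(n) for locus, n in unique_counts.items()}
    for loci in multireads:
        weights = [unique_counts.get(locus, 0) for locus in loci]
        weight_sum = sum(weights)
        for locus, weight in zip(loci, weights):
            # Assumed fallback: split equally if no candidate has unique reads.
            share = weight / weight_sum if weight_sum > 0 else 1 / len(loci)
            totals[locus] = totals.get(locus, 0.0) + share
    return totals


if __name__ == "__main__":
    unique = {"geneA": 30, "geneB": 10}
    multi = [("geneA", "geneB")] * 4
    print(allocate_multireads(unique, multi))
```

In this example, each of the four ambiguous reads contributes three quarters of a count to geneA and one quarter to geneB, mirroring the 3:1 ratio of their unique reads.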
1.5.1
Comparison to other methods
Various technologies have been developed to deduce and quantify the transcriptome, including hybridization- or sequence-based approaches. In contrast to microarray methods,
sequence-based approaches directly determine the cDNA sequence [325]. RNA-Seq has very
low, if any, background signal because DNA sequences can be unambiguously mapped to
unique regions of the genome [325]. In addition, RNA-Seq does not have an upper limit
for quantification, which correlates with the number of sequences obtained. Consequently,
it has a large dynamic range of expression levels over which transcripts can be detected: a
greater than 9,000-fold range was estimated in one study [229], and a range spanning five orders of magnitude was estimated in another [226]. RNA-Seq also provides a far more precise
measurement of levels of transcripts and their isoforms than other methods [325]. RNA-Seq
has also been shown to be highly accurate for quantifying expression levels, as determined
using quantitative PCR (qPCR) and spike-in RNA controls of known concentration [229, 226].
Finally, RNA-Seq also shows high levels of reproducibility, for both technical and biological
replicates [325].
Sanger sequencing of cDNA or EST libraries is relatively low throughput, expensive and
generally not quantitative [174]. Tag-based methods such as Serial Analysis of Gene Expression (SAGE), Cap Analysis of Gene Expression (CAGE) and Massively Parallel Signature
Sequencing (MPSS) are high throughput and can provide precise, `digital' gene expression
levels [325]. However, they are also based on expensive Sanger sequencing technology, and
a significant portion of the short tags cannot be uniquely mapped to the reference genome.
Also, only a portion of the transcript is analyzed and isoforms are generally indistinguishable
from each other [325]. DNA microarrays lack sensitivity for genes expressed either at low
or very high levels and therefore have a much smaller dynamic range (one-hundredfold to a
few-hundredfold) [325].
In general, RNA-Seq avoids limitations of other methods such as reliance upon existing
knowledge about genome sequence, high background levels owing to cross-hybridization,
and a limited dynamic range of detection owing to both background and saturation of signals [325].
1.6
Alignment
Next generation sequencing involves an alignment step. Alignment is the process of determining the most likely source within the genome sequence for an observed DNA sequencing
read [82]. It is one of the first steps taken in a sequencing-based project in which a reference
genome assembly already exists. As sequence capacity grows, algorithmic speed may become
a more important bottleneck. Running accurate alignment algorithms as a full search
of all possible places where the sequence may map is computationally infeasible. In general,
alignment programs use heuristic techniques in the first step to quickly identify a small set
of places in the reference sequence where the best mapping is most likely to
be found. Then, slower and more accurate alignment algorithms such as Smith-Waterman
are run on the limited subset. There are two fundamental technologies used in alignment:
hash table-based implementations and Burrows-Wheeler Transform-based (BWT-based)
methods.
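The slower, more accurate second stage can be illustrated with a minimal Smith-Waterman local alignment. The sketch below computes only the best local score and the cell where it ends; the match, mismatch and linear gap scores are illustrative assumptions, and production aligners typically use affine gap penalties and restrict the computation to the candidate regions found in the first stage.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score and its end cell (i, j).

    i and j are 1-based end positions in a and b respectively.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            score[i][j] = max(0, diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    return best, best_pos


if __name__ == "__main__":
    print(smith_waterman("TTACGTTT", "GGACGTGG"))
```

The running time is proportional to the product of the two sequence lengths, which is why this step is applied only to a limited subset of the reference.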
1.6.1
Hash based methods
DNA sequencing reads are extremely unlikely to contain every possible combination of nucleotides and very likely to contain duplicates. This type of dataset lends itself well to hash
tables. Hash tables are a common data structure able to index complex and non-sequential data in a way that facilitates rapid searching. The first wave of alignment programs
specifically designed for short-read alignment from next-generation sequencing machines was
based on a hash-table data structure to index and scan the sequence data. Hash-based algorithms build their hash table either on the set of input reads or on the reference genome.
There are advantages and disadvantages to each method. For example, hash tables of the
reference genome have a constant memory requirement for a given parameter set, regardless
of the size of the input set of reads; this requirement may be large, depending on the size and complexity
of the reference genome. Hash tables based on the set of input reads typically have smaller
and variable memory requirements, which depend on the number and diversity of the input reads,
but they may use more processing time to scan the entire reference genome when there are
relatively few reads in the input set.
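As a minimal sketch of the reference-hashing variant, the code below indexes every k-mer of a toy reference in a Python dictionary and uses exact k-mer matches from a read to propose candidate alignment start positions. The function names and the toy sequences are assumptions for illustration; real aligners use compact encodings and spaced or multiple seeds rather than plain string keys.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_positions(read, index, k):
    """Candidate start positions for the read, from exact matches of its k-mers."""
    candidates = set()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if start >= 0:
                candidates.add(start)
    return sorted(candidates)


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"
    index = build_kmer_index(reference, 4)
    print(candidate_positions("CGTT", index, 4))
    # The last base of this read is a mismatch, yet earlier k-mers still seed it.
    print(candidate_positions("GTACGA", index, 4))
```

The memory used by the index depends only on the reference and on k, which corresponds to the constant memory requirement described above.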
1.6.1.1 Software
Examples of tools using this approach, building a hash table of the input read sequences,
include MAQ [189], ELAND, SHRiMP [282], and ZOOM [193]. SOAP [190] is another
example, which hashes the reference genome assembly [82].
The idea of a hash table can be traced back to BLAST [8]. This method follows a seed-and-extend
paradigm, with each k-mer subsequence stored in a hash table. An improvement
to this method was the discovery that seeding with non-consecutive matches improves sensitivity [202]. A seed allowing internal mismatches is called a spaced seed. ELAND was the first
to use spaced seeds, as does SOAP. They allow a two-mismatch hit. MAQ extends this
to allow k mismatches. ZOOM uses manually constructed spaced seeds to enable detection of
up to 4 mismatches in 50-bp reads [202].
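A spaced seed can be sketched as a template of required ("1") and ignored ("0") positions. In the toy example below, the template, sequences and function names are assumptions chosen for illustration and do not correspond to the seeds used by ELAND, SOAP, MAQ or ZOOM. A read with a mismatch at an ignored position still produces a hit, whereas a contiguous seed of the same span does not.

```python
from collections import defaultdict


def spaced_key(seq, pos, template):
    """Concatenate the bases of seq at the '1' positions of the template."""
    return "".join(seq[pos + i] for i, t in enumerate(template) if t == "1")


def build_spaced_index(reference, template):
    """Index every reference window by its spaced-seed key."""
    index = defaultdict(list)
    for pos in range(len(reference) - len(template) + 1):
        index[spaced_key(reference, pos, template)].append(pos)
    return index


def seed_hits(read, index, template):
    """Reference positions whose spaced key matches the start of the read."""
    return index.get(spaced_key(read, 0, template), [])


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"
    read = "GTCAG"  # reference window GTTAG with a mismatch at its third base
    spaced = build_spaced_index(reference, "11011")
    contiguous = build_spaced_index(reference, "11111")
    print(seed_hits(read, spaced, "11011"))
    print(seed_hits(read, contiguous, "11111"))
```

Tolerating more mismatches generally requires several complementary templates, so that every allowed mismatch pattern is missed by at least one template's required positions.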
A potential problem with consecutive seeds and spaced seeds is that they disallow gaps within the
seed [188]. A q-gram approach [270] requires that multiple spaced seeds per read match if a
region is to be considered a possible alignment. This provides a possible solution to building
an index natively allowing gaps. The q-gram filter is based on the observation that at the
occurrence of a w-long query string with at most k differences (mismatches and gaps), the
query and the w-long database substring share at least (w + 1) − (k + 1)q common substrings
of length q [35]. The former category initiates seed extension from one long seed match, while
the q-gram approach usually initiates extension with multiple relatively short seed matches.
An example usage of this method is SHRiMP [282]. BLAT [146] and SSAHA2 [240], which
are used as capillary read aligners, also use this method [157].
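The q-gram bound can be checked numerically. The sketch below computes the threshold (w + 1) − (k + 1)q and counts the q-grams shared by two strings as a multiset intersection; the example sequences, which differ by two substitutions, and the function names are assumptions for illustration.

```python
from collections import Counter


def qgram_threshold(w, k, q):
    """Minimum number of shared q-grams for a w-long match with at most k differences."""
    return (w + 1) - (k + 1) * q


def shared_qgrams(a, b, q):
    """Number of q-grams common to a and b, counted with multiplicity."""
    grams_a = Counter(a[i:i + q] for i in range(len(a) - q + 1))
    grams_b = Counter(b[i:i + q] for i in range(len(b) - q + 1))
    return sum((grams_a & grams_b).values())


if __name__ == "__main__":
    query = "ACGTTGCAAGCTTAGGCATC"   # w = 20
    target = "ACGTTACAAGCTTATGCATC"  # the same string with k = 2 substitutions
    q = 4
    threshold = qgram_threshold(len(query), 2, q)
    print("threshold:", threshold)
    print("shared q-grams:", shared_qgrams(query, target, q))
```

A w-long string contains w − q + 1 q-grams and each difference can destroy at most q of them, which gives the bound; a region sharing fewer q-grams than the threshold can be discarded without a full alignment.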
1.6.1.2 MAQ
The Mapping and Alignment with Qualities algorithm (MAQ) was one of the first methods to work with short-read lengths [189].
MAQ is a popular aligner that is among the
fastest competing open source tools for aligning millions of Illumina reads to the human
genome [168].
First, MAQ considers base quality scores during sequence alignment, which helps to address the
variable quality of sequence across a read [157]. Second, it assigns a mapping quality score
to quantify the algorithm's confidence that a read was correctly placed. MAQ also makes
use of read pairing information in paired-end libraries to improve mapping accuracy and
identify aberrantly-mapped pairs.
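The role of quality scores can be sketched with Phred-scaled values, where a quality Q corresponds to an error probability of 10^(-Q/10). The code below is a simplified illustration and not MAQ's actual formula: each candidate placement is scored by the sum of base qualities at its mismatches, and a mapping quality is derived from how much better the best candidate is than the alternatives. The cap of 60 and all function names are assumptions.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)


def mismatch_quality_sum(read, quals, ref_segment):
    """Sum of base qualities at positions where the read disagrees with the reference."""
    return sum(q for r, q, g in zip(read, quals, ref_segment) if r != g)


def mapping_quality(candidate_scores, cap=60):
    """Phred-scaled probability that the best-scoring candidate is wrong (simplified)."""
    likelihoods = [10 ** (-s / 10) for s in candidate_scores]
    p_wrong = 1 - max(likelihoods) / sum(likelihoods)
    if p_wrong <= 0:
        return cap
    return min(cap, round(-10 * math.log10(p_wrong)))


if __name__ == "__main__":
    print(mismatch_quality_sum("ACGT", [30, 30, 20, 30], "ACTT"))
    print(mapping_quality([0, 0]))   # two equally good placements
    print(mapping_quality([0, 30]))  # a clearly better first placement
```

A read that fits two locations equally well receives a low mapping quality, while a read whose alternative placement requires a high-quality mismatch receives a high one.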
1.6.2
Burrows-Wheeler Transformation Methods
The inexact matching problem can be reduced to identifying exact matches and building
inexact alignments supported by those exact matches [188]. These methods typically use the Full-text Minute-space (FM) index data structure, which introduced the concept that a suffix
array is much more efficient if it is created from the Burrows-Wheeler Transform (BWT)
sequence, rather than from the original sequence [81]. The FM index retains the suffix array's
ability for rapid subsequence search and, for mammalian genomes, is often the same size or
smaller than the input genome size [101]. Creating the underlying data structure requires
two steps. In the first step, the sequence order of the reference genome is modified using the
BWT, a reversible process that reorders the genome such that sequences that exist multiple
CHAPTER 1. INTRODUCTION
19
times appear together in the data structure. Next, the nal index is created; it is then used
for rapid read placement on the genome. The creation of the nal index may be a memoryintensive step, although methods exist to create the index in relatively little memory at the
cost of more processing time [139]. The BWT has been commonly used in which rst create
an ecient index of the reference genome assembly in a way that facilitates rapid searching
in a low-memory footprint.
[82]
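The two ideas described above, a reversible reordering and exact search on the transformed sequence, can be sketched on a toy string. This is a naive illustration under stated assumptions, not how Bowtie, BWA or SOAP2 build their indexes: it sorts all rotations in memory, uses "$" as a sentinel that is assumed to be absent from the text, and recomputes occurrence counts by scanning rather than from a precomputed table.

```python
from collections import Counter


def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform: last column of the sorted rotations."""
    text = text + sentinel
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last, sentinel="$"):
    """Recover the original text, showing that the transform is reversible."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(c + row for c, row in zip(last, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


def count_occurrences(last, pattern):
    """Backward search: count exact occurrences of pattern using only the BWT."""
    counts = Counter(last)
    first_row = {}  # first_row[c]: number of characters in the text smaller than c
    total = 0
    for c in sorted(counts):
        first_row[c] = total
        total += counts[c]
    lo, hi = 0, len(last)
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + last[:lo].count(c)
        hi = first_row[c] + last[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("banana")
print(transformed, inverse_bwt(transformed), count_occurrences(transformed, "ana"))
```

Note how the repeated "an" substrings cause identical characters to cluster in the transformed string, which is the property the text refers to.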
1.6.2.1 Software
At least three aligners, Bowtie [168], BWA [187] and SOAP2 [191], have
leveraged the BWT algorithm, which dramatically decreases alignment
time. They are capable of mapping a single lane of Illumina data (20 million reads) in a
matter of hours, compared to the several days required by MAQ [331].
1.6.2.2 Bowtie
Bowtie uses a different and novel indexing strategy to create an ultrafast, memory-efficient
short read aligner, geared toward mammalian re-sequencing [168]. It employs a BWT index
based on the FM index, which has a memory footprint of only about 1.3 gigabytes (GB)
for the human genome [168]. Bowtie can align reads as short as four bases and as long as
1,024 bases [168]. The input to a single run of Bowtie may comprise a mixture of reads with
different lengths. Bowtie has been used to align 14.3× coverage worth of human Illumina
reads from the 1,000 Genomes project in about 14 hours on a single desktop computer with
four processor cores [168]. Bowtie aligns Illumina reads to the human genome at a rate of
over 25 million reads per hour [168].
Bowtie makes a number of compromises to achieve this speed. If one or more exact matches
exist for a read, then Bowtie is guaranteed to report one, but if the best match is an inexact
one then Bowtie is not guaranteed in all cases to find the highest quality alignment. With
its highest performance settings, Bowtie may fail to align a small number of reads with valid
alignments, if those reads have multiple mismatches. If stronger guarantees are desired,
Bowtie supports options that increase accuracy at the cost of some performance [168].
With its default options, Bowtie's sensitivity, measured in terms of reads aligned, is equal to
SOAP's and somewhat less than MAQ's. There are options to allow increased sensitivity
at the cost of greater running time, and to enable Bowtie to report multiple hits for a read.
Bowtie has been found to align 35-bp reads at a rate of more than 25 million reads per
CPU-hour, which is more than 35 times faster than MAQ and 300 times faster than SOAP under
the same conditions [82]. Also, unlike SOAP, Bowtie's 1.3 GB memory footprint allows it to
run on a typical PC with 2 GB of RAM [168].
1.7 Epigenetics
1.7.1 What is epigenetics?
Epigenetics is the study of heritable changes in genome function that occur without changing
the underlying DNA sequence. Just as the key signatures, phrasing and dynamics on a score of
sheet music [266] show how the keys in a melody should be played, so too do epigenetic
changes add multidimensional layers to the readout of DNA.
1.7.2 How important is epigenetics in normal development?
Epigenetics plays a role in normal development [58]. It is involved when cells specialize
in complex multicellular organisms that develop from a fertilized egg. Interesting studies on
epigenetics include those of twins. Identical twins share the same DNA sequence and have
similar phenotypes, but they do not have complete phenotypic identity. These phenotypic
differences are likely imparted by epigenetic modifications that occur over a lifetime. In
a study of 80 pairs of identical twins of various ages, epigenetic differences were hardly
detectable in the youngest twins, but increased with age. The number of genes that differed
in activity between 50-year-old twins was more than three times that in pairs of three-year-old
twins [86]. Also, epigenetic changes explain how simply altering the diet of a pregnant
mouse can change the coat colour of her pups [327], or even alter their response to stress
[328].
1.7.3 What role does epigenetics play in cancer?
Epigenetic modification can play an important role in the steps of tumourigenesis [123].
Some epigenetic processes silence key regulatory genes. When this silencing becomes
dysregulated, it can result in diseased states. Epigenetic abnormalities in cancer comprise
aberrations in virtually every component of chromatin involved in packaging the human
genome [129]. These epigenetic modifications are mitotically heritable and can thus play the
same roles and undergo the same selective processes as genetic alterations. In fact, epigenetic
events can occur at a much higher rate than mutations in somatic cells.
1.7.4 How do epigenetic factors exert phenotypic change?
One example is the methylation of CpG islands in the promoter regions of genes [279]. This
condenses the DNA to heterochromatin and can hide transcription factor binding sites or
influence polymerase progression, thus silencing those genes. DNA is not naked in eukaryotes;
it is bound by a complex of proteins to form chromatin. DNA is spooled around nucleosomal
units consisting of eight histones (two each of H2A, H2B, H3 and H4) around which 147
base pairs of DNA are wrapped in 1.75 superhelical turns [200]. This close proximity of the
histones to the DNA allows changes in the histones to affect how the DNA is accessed
and/or processed. Such changes include posttranslational histone modifications, energy-dependent
chromatin remodelling, exchange of histones with variants, and targeting by small
noncoding RNAs [260].
1.7.5 How permanent are the changes?
There are many modifications and chromatin changes that are reversible. These transitory
changes are unlikely to be passed along to the germline. These marks change the chromatin
template in response to various stimuli [127]. Other epigenetic modifications can be stable
through several cell divisions. These include methylated DNA regions, altered nucleosome
structures, and some histone modifications.
1.7.6 What role does the nucleosome play?
The core histone proteins that make up the nucleosome are highly basic. They have a
globular domain from which flexible histone tails protrude. Histone proteins, including
their tails, are highly conserved from yeast to humans, which indicates they have critical
functions [144].
1.7.7 What are the types of histone modifications?
Many types of histone modifications have been identified. They include histone acetylation,
phosphorylation, ubiquitination, sumoylation, ADP-ribosylation, biotinylation, proline
isomerization, and histone methylation [314]. In addition, variant proteins of H2A and H3
can be substituted. The arrangement of these nucleosomes on the DNA is altered either by
cis-effects or trans-effects. Cis-effects occur due to changes in the physical properties of
covalently modified histone tails. Trans-effects occur via recruitment of modification-binding
partners to the chromatin. This allows for context-dependent reading of a particular covalent
histone mark.
1.7.7.1 Histone acetylation
Histone acetylation neutralizes the positive charge on the histones and decreases the interaction
of the N termini of histones with the negatively charged phosphate groups of DNA. This
generates an expansion of the chromatin fiber, allowing better access for the transcriptional
machinery. Histone acetyltransferases (HATs) and histone deacetylases (HDACs) serve to
regulate these histone marks. There is evidence that histone H3 acetylation and H3 lysine 4
methylation are functionally linked [239].
1.7.7.2 Histone phosphorylation
The four core histones, histone variants, and H1 histones are phosphorylated on both the
amino-terminal and carboxy-terminal portions of the histones [116]. In general, histone
phosphorylation may disrupt chromatin structure and allow for the recruitment or occlusion
of non-histone chromosomal proteins to chromatin [265]. Linker histone H1 proteins are
believed to promote the higher-order packaging of DNA by shielding the negative charge of
linker DNA between adjacent nucleosomes. Histone H1 phosphorylation affects chromatin
condensation and function. Phosphorylation of H1 increases the protein's mobility in the
nucleus and weakens its interaction with chromatin [181]. It is thought that site-specific
interphase H1 phosphorylation facilitates transcription by RNA polymerases I and II [344].
There is evidence that phosphorylation of histone H3 at threonine 6 by protein kinase C
beta I prevents LSD1 from demethylating H3K4 [220].
1.7.7.3 Histone ubiquitination
H2A, H2B, H3 and their variant forms are ubiquitinated [56]. Histone ubiquitination is
a reversible modification. Attachment of a chain of ubiquitin monomers is a prerequisite
for the selective degradation of intracellular proteins by the ubiquitin-dependent proteolytic
pathway. H2B ubiquitination may disrupt chromatin structure, exposing H3K4 to Set1 [306].
1.7.7.4 Histone methylation
Histone methylation does not alter the charge of the histone tail but instead influences the
basicity, hydrophobicity, and the affinity of certain molecules such as transcription factors
toward DNA [343]. There are two general classes of methylating enzymes, protein arginine
methyltransferases (PRMTs) and histone lysine methyltransferases (HKMTs). Methylation
of histones was previously thought to be a permanent mark on chromatin [161]. This was
based partly on 30-year-old reports that methylated lysines seemed to have the same
half-life as histones [15]. It was previously thought that a histone swap for a variant would be the
only way methylated lysines could be removed. The variant histone H3.3 could replace H3,
essentially replacing the canonical histone H3 with one that had different epigenetic
modifications [111]. While these marks are stable, it is now known that they are enzymatically reversible.
Arginine methylation can be removed by deiminases, which convert methyl-arginine to
citrulline. Methylated lysine residues appear to be more stable but are still removable. Lysine
methylation can be present in mono-, di-, or tri-methylated states.
1.7.8 H3K4me1
1.7.8.1 Mono-, di- and tri-methylation
All three H3K4 methylation states are found at elevated levels surrounding the TSSs of
known genes and are correlated with gene activation [16]. The monomethylation peaks are
more dispersed, though, on average. H3K4me1 peaks are found 900 kb upstream of the TSS,
as opposed to 500 kb for H3K4me2, and 300 kb for H3K4me3 [16]. All three states of H3K4
methylation are also highly enriched at insulators [16]. High levels of H3K4me1 with low
levels of H3K4me3 were found to be a signature predicting enhancers in HeLa [110]. Though
there are many epigenetic modifications that act together to affect transcription, a study
claims that H3K4me1 may be at the top of the causal relationship chain [339]. Active genes were
previously found to be associated with the mono- and tri-methylation of H3K4 [324].
1.7.8.2 Bimodal loci
Robertson et al. studied the spatial distribution of H3K4me1 around
TFBSs. Bimodal H3K4me1 profiles were found, with peaks of H3K4me1 enrichment on either
side of the indicated sites, such as transcription factor binding sites [278]. Genes with
associated bimodal loci were found to have significantly higher expression than genes
with associated monomodal or low H3K4me1 loci [115].
1.7.9 Histone methyltransferases and histone demethylases
Early studies of histones and methylated lysine residues demonstrated similar half-lives,
which was interpreted as evidence that histone lysine methylation is an irreversible event.
Evidence for the turnover of methyl groups later emerged. The putative mechanisms included
demethylases, histone replacement and clipping [321]. There are multiple histone
methyltransferases (HMTs) and histone demethylases (HDMs) involved in H3K4 methylation [122].
Enzymes that methylate H3K4 include Mll1-4, Set1a/b, Ash2L (H3K4me2/3 only), Set7/9,
Meisetz, Smyd1/Bop1, Smyd3, and Whistle. Enzymes that demethylate H3K4 include Lsd1,
Jhdm1a/b, Jarid1a/Rbp2, and Jarid1b/c/d.
1.7.9.1 LSD1
LSD1 is a gene that encodes a flavin-dependent monoamine oxidase [312]. It catalyses
demethylation at distinct lysine residues in histone H3 (H3K4me1/2), but cannot act on H3K4me3
because the trimethylated lysine lacks a protonated nitrogen [293]. As a component of co-repressor complexes, LSD1
contributes to target gene repression by removing mono- and dimethyl marks from lysine 4
of histone H3 (H3K4) [220]. LSD1 has been functionally linked to both HDAC and histone lysine
demethylase activity [7], and HDAC inhibitors diminish H3K4
demethylation by LSD1 in vitro [177].
The transcriptional activation complex that LSD1 is part of includes MLL1. This suggests that
the balance between methylated and unmethylated H3K4 is important to transcriptional
regulation [231]. In addition, CoREST enhances the ability of LSD1 to reverse methylation
and protects LSD1 from proteasomal degradation in vivo [175]. A possible mechanism is
that CoREST binds to LSD1 and tethers it to the nucleosome, bringing the amine oxidase
domain close to the H3 tail [175]. A study has proposed a mechanism by which DNA binding
of CoREST facilitates the histone demethylation of nucleosomes by LSD1 [337]. CoREST
is necessary for LSD1 to act on intact nucleosomal particles, and CoREST-bound
LSD1 exhibits a 2-fold increase in the rate of catalysis [85].
1.7.9.2 MLL1
The mixed lineage leukemia protein-1 (MLL1) is a member of the SET1 family of H3K4
methyltransferases. The MLL1 methyltransferase was found in a transcriptional activation complex
that includes LSD1. This may indicate that a functional interplay between histone
methyltransferases and histone demethylases is what ultimately defines the transcriptional
states of the targeted genes. MLL1 has been shown to interact with RNAPII [222].
1.7.10 Smyd
The SMYD (SET- and MYND-containing) protein family consists of five proteins, SMYD1 to SMYD5.
Smyd1, Smyd2, and Smyd3 have activity on H3K4 methylation [122, 43]. The
SET domain in Smyd2 is required for the methylation at H3K4 [2]. Also, an
interaction of SMYD2 with HSP90α was found to enhance SMYD2 histone methyltransferase activity
and specificity for H3K4 in vitro [2].
1.7.11 Whistle
WHISTLE (WHSC1-like 1 isoform 9 with methyltransferase activity to lysine) methylates
histone H3K4 and H3K27 residues [152]. Studies have shown that WHISTLE
can induce apoptotic cell death through caspase-3 activation and that HMTase activity is
important for the apoptosis induction [151]. WHISTLE interacts with HDAC1 in vitro
and in vivo, and the recruitment of HDAC1 is involved in WHISTLE-mediated
transcriptional repression [151].
1.7.12 JHDM
The JmjC domain of the JHDM (JmjC domain-containing histone demethylase) proteins [153] is conserved in various
organisms and is predicted to be a metalloenzyme catalytic motif [47]. There are multiple
members of this family. JHDM1 demethylates H3K36, JHDM2 demethylates H3K9, JHDM3
demethylates H3K9 and H3K36, and JARID1 demethylates H3K4 [122].
This class of enzymes catalyzes the removal of methylation by using a hydroxylation reaction
and requires iron and α-ketoglutarate as cofactors. JARID1B is one of the four members
of the JARID1 protein family. All four members of this family have recently been shown
to possess H3K4 demethylase activity [176, 126, 43, 154]. Overexpression of JARID1B
resulted in loss of tri-, di-, and monomethyl H3K4 but did not affect other histone lysine
methylations [122]. JARID1B can catalyze the removal of all three methyl groups from the
H3K4 lysine residue. JARID1B, also known as PLU-1, was shown to be up-regulated in
breast cancer and is probably involved in breast cancer development [122, 336].
1.8 Transcription Regulation
Unravelling the mechanisms that regulate gene expression is a major challenge in biology.
Eukaryotic protein-coding genes are transcribed by RNA polymerase II; however, basal
transcription is tightly regulated by complex processes involving chromatin-modifying
proteins, transcription factors (TFs), co-factors and RNA polymerase [326]. The rate at which
TFBSs are predicted varies for each TF binding model and is influenced by model parameters,
but the application of most models with standard settings will report TFBSs in the range of
1/500 to 1/5000 bp [326].
An important task in this challenge is to identify regulatory elements and the conserved
regions of DNA called motifs. Recent advances in genome sequence availability and in
high-throughput gene expression analysis technologies have allowed for the development of
computational methods for motif finding [55]. TFs have distinct preferences towards specific
target sequences. Given a set of known binding sites, it is possible to construct a model to
describe the target sequence properties that can be used to predict potential binding sites
in genomic sequences [326].
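As a minimal sketch of how such a model can be built and applied, the code below converts a set of aligned sites into a log-odds position weight matrix and scans a sequence with it. The binding sites, the pseudocount of 1, the uniform background of 0.25 and the score threshold are illustrative assumptions and do not come from any database or tool discussed here.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from equal-length aligned binding sites."""
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for j in range(width):
        column = [site[j] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits


# Hypothetical aligned sites for a single factor.
known_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]
pwm = build_pwm(known_sites)
print(scan("CCCTGACTCAGGG", pwm, threshold=5.0))
```

Lowering the threshold increases the number of reported sites, which is one way the model parameters influence the prediction rate.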
These DNA motifs are of considerable biological significance. Normally, the pattern is fairly
short (5 to 20 bp long) and is known to recur in different genes or several times within
a gene [55]. Sequences could have zero, one, or multiple copies of a motif. They can form
patterns such as palindromic motifs or spaced dyad motifs. Spaced dyads are motifs consisting
of two short conserved boxes separated by a region of fixed size and variable content.
1.8.1 Popular TF binding site programs
Defining the transcription factor binding site can help elucidate the transcriptional
machinery of the cell. The goal of motif finding is to detect novel, over-represented unknown
signals in a set of sequences [272]. Existing motif finding approaches can be classified into
two main categories for representing the consensus DNA pattern, probabilistic or mismatch
representation [70].
1.8.2 Mismatch representation
Patterns can be used to define a signal as a consensus pattern, allowing up to a certain
number of mismatches to occur in each instance of the pattern [55]. This is called
mismatch representation. The goal of these algorithms is to recover the consensus pattern
with the most significant number of instances, given a certain background model. These
methods view the representation of the signals as discrete and rely on exhaustive
enumeration [55]. The advantage of these algorithms is that they guarantee that the highest
scoring pattern will be the global optimum for any scoring function; however, consensus
patterns are not as expressive of the DNA signal as profile representations. Recent approaches within this framework
include Projection methods [31], string-based methods [257], Pattern-Branching [263], and
MULTIPROFILER [145].
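The mismatch representation can be sketched with a small exhaustive enumeration. The code below is an illustration only and is not the algorithm of any tool cited above: it scores every possible pattern of a given width by the number of sequences that contain an instance within the allowed number of mismatches, breaking ties by total mismatch count (an assumed scoring rule). The sequences are made up.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_distance(sequence, pattern):
    """Smallest mismatch count between the pattern and any window of the sequence."""
    width = len(pattern)
    return min(hamming(sequence[i:i + width], pattern)
               for i in range(len(sequence) - width + 1))


def best_consensus(sequences, width, max_mismatches):
    """Enumerate all 4**width patterns and return (pattern, supporting sequences)."""
    best_key, best_pattern = None, None
    for letters in product("ACGT", repeat=width):
        pattern = "".join(letters)
        distances = [best_distance(seq, pattern) for seq in sequences]
        support = sum(d <= max_mismatches for d in distances)
        key = (support, -sum(distances))
        if best_key is None or key > best_key:
            best_key, best_pattern = key, pattern
    return best_pattern, best_key[0]


# Two exact copies of TGACA and one copy with a single mismatch (TGTCA).
sequences = ["AATGACAAA", "CCTGACACC", "GGTGTCAGG"]
print(best_consensus(sequences, width=5, max_mismatches=1))
```

Because every pattern is visited, the result is the global optimum for this scoring rule, at a cost that grows as 4 to the power of the pattern width.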
1.8.3 Probabilistic
A generative probabilistic representation of the nucleotide positions can be used to discover a
consensus DNA pattern that maximizes the information content score [272]. In this method,
finding the best consensus pattern is done by finding the global maximum of a continuous
non-convex function. Algorithms in this category perform stochastic optimization or greedy
searches [70]. The main advantage of this approach is that the generated profiles are highly
representative of the signals being determined [272]. The disadvantage, however, is that
finding the global maximum of any continuous non-convex function is a challenging problem, and
thus the motif found may not be the best one but the nearest local optimum instead [64].
Gibbs sampling [172], MEME [13], Weeder [249], the greedy CONSENSUS algorithm [113] and
HMM-based methods [65] use this approach.
1.8.4 Expectation Maximization
Expectation-Maximization is an iterative procedure to maximize the likelihood of a
probabilistic model with regard to given data. The algorithm starts with an initial guess as to
the location and size of the site of interest in each of the sequences [228]. These parts of the
sequence are aligned and this provides an estimate of the base or amino acid composition
of each column in the site. The binding sites are modelled as a Position Frequency Matrix
(PFM).
There is a background genomic sequence and the embedded binding site, which have different
statistical properties [228]. Through multiple iterations involving calculating the probability
of each sequence for all possible choices of the binding site, the binding site is refined [228].
Convergence is achieved when the values of the predicted binding site probabilities no longer
change [164].
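The iteration described above can be sketched for the simplest case of exactly one site per sequence. This is a toy illustration rather than MEME's implementation, and it makes several assumptions that are not in the text: the initial guess is supplied as a seed site string instead of per-sequence locations, the background is uniform, and a small pseudocount keeps probabilities non-zero. The sequences are made up, with TGACTCA planted in each.

```python
def em_motif(sequences, seed, iterations=50, pseudocount=0.1, tol=1e-6):
    """Refine a position frequency matrix (PFM) by expectation maximization."""
    bases = "ACGT"
    width = len(seed)
    background = 0.25  # assumed uniform background
    # Initial PFM: the seed base gets 0.7 in each column, the others 0.1 each.
    pfm = [{b: (0.7 if b == s else 0.1) for b in bases} for s in seed]
    posteriors = []
    for _ in range(iterations):
        # E-step: posterior probability of each start position in each sequence.
        posteriors = []
        for seq in sequences:
            weights = []
            for start in range(len(seq) - width + 1):
                ratio = 1.0
                for j in range(width):
                    ratio *= pfm[j][seq[start + j]] / background
                weights.append(ratio)
            total = sum(weights)
            posteriors.append([w / total for w in weights])
        # M-step: re-estimate each column from the expected base counts.
        new_pfm = []
        for j in range(width):
            counts = {b: pseudocount for b in bases}
            for seq, post in zip(sequences, posteriors):
                for start, p in enumerate(post):
                    counts[seq[start + j]] += p
            norm = sum(counts.values())
            new_pfm.append({b: counts[b] / norm for b in bases})
        change = max(abs(new_pfm[j][b] - pfm[j][b])
                     for j in range(width) for b in bases)
        pfm = new_pfm
        if change < tol:  # convergence: probabilities no longer change
            break
    starts = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
    return pfm, starts


seqs = ["AAAATGACTCAAAAA", "CCTGACTCACCCCCC", "GGGGGGGTGACTCAG", "TTTGACTCATTTTTT"]
pfm, starts = em_motif(seqs, seed="TGACTCA")
print(starts, "".join(max(col, key=col.get) for col in pfm))
```

With a poor seed the same procedure can settle on a different local optimum, which is the weakness of probabilistic methods noted in the previous section.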
1.8.4.1 MEME
MEME is an example of the expectation maximization algorithm and can be used to search
for novel transcription factor binding sites in sets of biological sequences. MEME
searches for repeated sequence patterns that occur in the DNA [173], including sites that
may include gaps [36].
MEME is widely used; however, there are newer programs that surpass MEME in certain
aspects. For example, it was suggested that MEME is too conservative and could miss
discovering motifs [230]. Also, in a study of 13 motif finding tools, Weeder outperformed the
other tools [313]. This may be due to the 'cautious mode' Weeder was run in, allowing only
the strongest motifs to be reported. This mode would be most useful if a search was done
with the knowledge that there was at most one motif of interest in the sequence.
1.8.5 TF binding databases
There are various databases that catalogue these transcription factors to be used in further
studies. These databases can be used to correlate regions in the genome with transcription
factor binding sites.
1.8.5.1 ORegAnno
ORegAnno is an open-source, open-access database and literature curation system for
community-based annotation of experimentally identified DNA regulatory regions, transcription factor
binding sites and regulatory variants [99]. A regular user can add individual annotations
of promoters, transcription factor binding sites and regulatory mutations to the database.
These data are validated by cross-referencing against PubMed [332], Entrez Gene [206],
dbSNP [292], the NCBI Taxonomy database [332] and EnsEMBL [121]. Once submitted, an
XML representation is scored by validators who confirm the reliability of the annotation from
the literature.
Each annotation specifies an evidence type, subtype and class describing the biological
technique cited to discover the regulatory sequence. Evidence classes are broken into two
categories: the 'regulator' classes describe evidence for the specific protein that binds a site.
The 'regulatory site' classes describe evidence for the function of a regulatory sequence
itself. These two categories are further divided into three levels of regulation (transcription,
transcript stability, and translation). The experimental evidence is optionally associated
with a specific cell type using the eVOC cell type ontology [185]. Each transcription factor
binding site or regulatory mutation must specify a target transcription factor which is either
user-defined, in Entrez Gene or in EnsEMBL, or classified as 'unknown'.
1.8.5.2 JASPAR
Position-specific scoring matrices are the preferred models for representation of transcription
factor binding specificity. JASPAR is an open-access database of annotated,
high-quality, matrix-based transcription factor binding site profiles for multicellular
eukaryotes. These profiles were derived exclusively from sets of nucleotide sequences experimentally
demonstrated to bind transcription factors [283].
1.8.5.3 TRANSFAC
The TRANScription FACtor database (TRANSFAC) models the interaction of eukaryotic
transcription factors with their DNA-binding sites and how this affects gene expression. At
its core are three tables: Factor, Site, and Gene.
A link between the Factor table and the Site table indicates the binding interaction.
Experimental evidence for this interaction and the cell from which the factor was derived is given
in the site entry. On the basis of the method and cell, a quality value is given to describe the
confidence with which a binding activity could be assigned to a specific factor [215]. When
a number of binding sites have been collected for a factor, the site sequences are aligned
to create nucleotide distribution matrices. These matrices are used by the tool Match to
find potential binding sites in uncharacterized sequences, while Patch, another tool, uses the
single sites stored in the Site table.
The Gene table connects information from TRANSFAC, TRANSCompel, HumanPSD™,
S/MARtDB™, and TRANSPATH. Gene entries serve as a major linking source to a growing
number of external databases. Public versions of TRANSFAC and the above-mentioned
programs are freely accessible for research groups from non-profit organizations at
http://www.gene-regulation.com. The professional version of TRANSFAC is available at
http://www.biobase-international.com [215].
1.8.6 Interpretation of motif-finder output
Motif discovery is often one of the first steps performed during computational analysis of
gene regulation. For instance, researchers often wish to discover over-represented motifs that
are common to sets of genes with similar expression patterns. Interpretation of the output
from motif-finders is a challenge. Many distinct motifs may be reported with little or no
indication as to whether each may possess regulatory function. A tool that
can assess similarity between novel, computationally identified motifs and the known motifs
stored in the databases is therefore necessary for interpretation [207].
1.8.6.1 STAMP
STAMP is a web server that is designed to support the study of DNA-binding motifs. It is
used to query motifs against databases of known motifs. The software aligns input motifs
against the chosen database, and lists of the highest-scoring matches are returned. Such
similarity-search functionality is expected to facilitate the identification of transcription
factors that potentially interact with newly discovered motifs [207].
This resource is flexible in the data formats it accepts. Motifs may be input as frequency
matrices, consensus sequences, or alignments of known binding sites. STAMP also directly
accepts the output files from 12 supported motif-finders, enabling quick interpretation of
motif-discovery analyses [207].
STAMP automatically builds multiple alignments, familial binding profiles and similarity
trees when more than one motif is input. These functions are expected to enable evolutionary
studies on sets of related motifs and fixed-order regulatory modules, as well as illustrating
similarities and redundancies within the input motif collection [207].
STAMP's functionality is essentially pairwise comparison of motifs. In general, two motifs
can be aligned using Needleman-Wunsch [235] (global) or Smith-Waterman [298] (local)
alignment methods [207]. Alignment algorithms require a distance metric. There are five
supported distance metrics: (i) Pearson's correlation coefficient [258], (ii) Kullback-Leibler
information content [280], (iii) sum of squared distances [284], (iv) average log-likelihood ratio
(ALLR) [323] and (v) ALLR with a lower limit of -2 imposed on the score [207]. This
algorithm avoids length biases when comparing motifs of different lengths, using the method
of Sandelin and Wasserman for the calculation of empirical p-values based on simulated
PSSM models [284].
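Two of the column-level metrics listed above can be sketched as follows. These are generic textbook formulas applied to columns of base frequencies in A, C, G, T order; they are not taken from STAMP's source code, and the ungapped whole-motif score is a simplification that stands in for a proper alignment.

```python
import math


def pearson(col_a, col_b):
    """Pearson correlation between two frequency columns (0.0 if either is flat)."""
    mean_a = sum(col_a) / len(col_a)
    mean_b = sum(col_b) / len(col_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(col_a, col_b))
    var_a = sum((a - mean_a) ** 2 for a in col_a)
    var_b = sum((b - mean_b) ** 2 for b in col_b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def sum_squared_distance(col_a, col_b):
    """Sum of squared differences between two frequency columns."""
    return sum((a - b) ** 2 for a, b in zip(col_a, col_b))


def ungapped_score(motif_a, motif_b, column_score=pearson):
    """Score two equal-length motifs column by column, without gaps."""
    return sum(column_score(a, b) for a, b in zip(motif_a, motif_b))


strong_a = [0.7, 0.1, 0.1, 0.1]  # column favouring A
strong_c = [0.1, 0.7, 0.1, 0.1]  # column favouring C
print(pearson(strong_a, strong_a), pearson(strong_a, strong_c))
print(sum_squared_distance(strong_a, strong_c))
```

In a full comparison, a column score of this kind would fill the scoring matrix of the global or local alignment.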
1.9 Functional analysis
When using a high-throughput technique that monitors the expression of tens of
thousands of genes, an automated method is needed to extract meaningful information from
the large amount of data that results [150]. This section describes the common challenge
of translating such lists of differentially regulated genes into a better understanding of the
underlying biological phenomenon.
The output of an RNA-seq experiment is often a list of differentially expressed genes. An
automatic ontological analysis approach can help with the biological interpretation of such
results. Currently, this approach is the de facto standard for the secondary analysis of
high-throughput experiments and a large number of tools have been developed for this
purpose [150].
This type of analysis may have drawbacks. For instance, genes in experimentally derived lists
often have substantially more annotation associated with them, as they have been researched
for a longer period of time. This annotation bias, a result of patterns of research activity
within the biomedical community, is a major problem for classical hypergeometric test-based
over-representation analysis (ORA) approaches, which cannot account for such bias [180].
The need to formalize this interpretation process has led to the development of a range of
tools, of which a family of statistical methods collectively known as over-representation
analysis is becoming increasingly popular among researchers undertaking microarray analysis.
The fundamental question asked by ORA is: what biological terms or functional categories
are represented in the gene list more often than expected by chance [180]?
Multiple databases are useful for functional analysis. GO is the primary resource for
annotating gene groups with three types of knowledge: cell components, molecular functions,
and biological processes [9].
The KEGG database provides functional annotations for metabolic and information
processing pathways, cellular processes, human diseases and drug development data [134].
Reactome is a mammalian-specific pathway database with thorough annotations of numerous
well-studied biological processes, ranging from intermediary metabolism to signal
transduction to cell cycle and apoptosis [62].
1.9.1 DAVID
There are also web-based tools that amalgamate the output of such resources. DAVID, the
Database for Annotation, Visualization and Integrated Discovery, is one such tool. It
provides mainly batch annotation and Gene Ontology (GO) term enrichment analysis. Other
resources provided include protein-protein interactions, protein functional domains, disease
associations, bio-pathways, sequence general features, homologies, gene functional summaries,
and gene tissue expressions [120].
Functional enrichment tests are used to interpret the biological meaning of a gene list. Such
statistical tests are performed on the functional categories of the gene lists. A hypergeometric
test is used to test the enrichment of genes belonging to a given category in the identified
gene list versus the genome [72].
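As a minimal sketch of this test, the function below computes the upper-tail hypergeometric probability of seeing at least the observed number of category genes in the list. The parameter names and the example counts are illustrative assumptions; this is the generic test and not DAVID's EASE score.

```python
from math import comb


def hypergeometric_enrichment(population, category, sample, hits):
    """P(X >= hits) when `sample` genes are drawn from `population` genes,
    of which `category` belong to the functional category of interest."""
    upper = min(category, sample)
    total = comb(population, sample)
    tail = sum(comb(category, i) * comb(population - category, sample - i)
               for i in range(hits, upper + 1))
    return tail / total


# Hypothetical counts: 20,000 genes in the genome, 200 in the category,
# 100 genes in the list, 8 of which are annotated to the category.
print(hypergeometric_enrichment(20000, 200, 100, 8))
```

A small probability indicates that the category is represented in the list more often than expected by chance.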
DAVID uses various multiple testing correction techniques, including Bonferroni,
Benjamini, and FDR. In addition, DAVID gives the option of using an EASE score
(Expression Analysis Systematic Explorer) to quantify overall enrichment of gene groups. The
EASE score is a modified Fisher's exact test. It globally corrects enrichment p-values to
control family-wide false discovery rate [150].
1.9.2 g:Profiler
g:Profiler is a web-based toolset for functional profiling of gene lists from large-scale
experiments [276]. Primary input can be a list of genes, proteins, or probe identifiers. It supports
many ID types and even mixing of arbitrary ID types [276]. The purpose of g:Profiler
is to find common high-level knowledge such as pathways, biological processes, molecular
functions, subcellular localizations, or shared TFBSs for the list of input genes. The data
used in g:Profiler is derived from the Gene Ontology [9], KEGG [134], Reactome [62] and
TRANSFAC [215] databases [276].
GO is a structured vocabulary in the form of a directed acyclic graph. The results from GO and other relevant biological databases are presented in either tree-like top-down order, grouped by domains, or ranked by statistical significance. The GO-structure-preserving visualization captures the hierarchical relationships between significantly enriched categories. Hierarchical relations hold within GO. Vocabulary terms are related to one or several more general `parent' terms. Any term automatically involves all terms below via all relational paths. Therefore, genes annotated to a specific term in g:Profiler are also added to all associated `parents', and the profiling is performed at all hierarchical levels simultaneously. g:Profiler strips out GO annotations that apply the `NOT' qualifier. A visualization technique called gene-to-term mapping shows a coloured box if there is an association with a term in question. Furthermore, the colour coding used correlates to different types of evidence in heatmap style.
g:Profiler uses cumulative hypergeometric p-values to identify the most significant terms corresponding to the input set of genes. Unlike most of the common profiling tools, g:Profiler supports annotations of descendants according to the ``True Path Rule'' [53].
A crucial factor in functional profiling is the estimation of statistical significance due to multiple testing against many categories if the specific functional category was not selected a priori [150]. Multiple testing corrections can broadly be split into two groups. Family-Wise Error Rate (FWER) corrections, such as Bonferroni or Sidak, measure the chance of at least one false-positive match. Functional profiling provides testing against hundreds to thousands of terms, and such approaches become rather conservative, especially as tests are not independent due to the hierarchical structure of GO. These tests do not apply for heavily overlapping functional classifications from GO.
A more liberal group of corrections, false discovery rates (FDR), measure the proportion of false discoveries in a multi-test experiment and gain a test-wide threshold by ranking observed p-values and comparing their relative rank to individual test thresholds [17]. FDR approaches are more promising, since some versions also allow partial dependencies in input data [17].
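The difference between the two groups of corrections can be sketched as follows. This is a minimal illustration, assuming a plain list of raw p-values; the function names are hypothetical, and the FDR variant shown is the standard Benjamini-Hochberg step-up adjustment rather than the exact procedure of any particular profiling tool.

```python
def bonferroni(p_values):
    """FWER control: multiply each p-value by the number of tests (capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """FDR control: scale each p-value by m / rank, keeping the result monotone."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase
    # as the rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four functional categories.
raw = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:", bonferroni(raw))
print("Benjamini-Hochberg:", benjamini_hochberg(raw))
```

The Bonferroni values are never smaller than the Benjamini-Hochberg values, which is the sense in which the FWER approach is the more conservative of the two.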
g:Profiler also offers g:SCS (Set Counts and Sizes), which is used by default. This is a novel method to estimate thresholds in complex and structured functional profiling data such as GO, pathways and TFBS, where statistical significance is determined from set intersections in 2 × 2 contingency tables. g:SCS has been claimed to be superior to standard multiple testing methods, since it takes into account the actual structure behind functional annotations [276].
1.10 Summary of research
The research done in this thesis aimed to elucidate the effects of an epigenetic modification, H3K4me1. This histone modification was studied in multiple breast cancer cell lines. Functional groups were formed based on a comparison of breast cancer subtypes, or of tumourigenic vs. non-tumourigenic matched controls. This was to look for the involvement of this mark in cancer gene regulation. We formed this hypothesis based on previous evidence [115] where regions formed by flanking H3K4me1 sites were found to be enriched for TF binding sites. RNA-seq was used to determine the expression levels of genes downstream of these valley regions. The functional groups were used to correlate uniquely marked valleys with overexpression. A motif analysis was done on the valley sequences using MEME and STAMP to yield putative transcription factor binding sites. The purpose of this experiment was to look for known and putative tumour suppressors and oncogenic factors.
Chapter 2
Materials and Methods
2.1 Cell lines
2.1.1 Fragmentation methods
These cell lines were prepared either with sonication or using micrococcal nuclease to fragment the DNA [253]. Sonication is generally believed to create randomly sized DNA fragments, with no section of the genome being preferentially cleaved. The fragments created by sonication, on average 500 to 700 base pairs, are typically larger than those created via enzymatic cleavage [61]. Sonication tends to break DNA segments across the fault lines which define nucleosome boundaries [197].
Enzymatic cleavage, in contrast, will not produce random sections of chromatin. Micrococcal nuclease favors certain areas of genome sequence over others and will not digest DNA evenly or equally [74]. Micrococcal nuclease (MNase) is the enzyme that catalyzes the endonucleolytic cleavage of DNA. In contrast to sonication, MNase-treated chromatin preparations show highly homogeneous lengths [179]. Also, enzymatic digestion of chromatin is milder than sonication and better preserves the integrity of the chromatin and antibody epitopes, which means increased IP efficiency [87].
2.1.2 Immunohistochemical properties
These cell lines represent different breast cancer subtypes, which results in differing immunohistochemical properties. Steroid receptors are useful for predicting outcome and response to therapy in breast cancer. They also help predict the relevance of cell line experiments to breast tissues of different types. Immunohistochemical markers with clinical importance include amplification of HER2 [296] and changes in EGFR, a tyrosine kinase receptor that is expressed in normal breast [68].
2.1.3 Cell lines used
Cell lines were used as experimental resources in this study. The cell lines used were MCF-7, BT-549, T-47D, MDA-MB-231, and Hs578T. These cell lines are widely studied and retain DNA mismatch repair activity. Defects in this process would result in an approximate 20-fold increase in obfuscating background mutations.
2.1.3.1 MCF7
MCF-7 is a luminal cell line that was derived from a pleural effusion from a 69-year-old woman who underwent two mastectomies in a five-year span [302]. These cells show low motility and are not metastatic [49]. They express E-cadherin, epidermal growth factor receptor, estrogen receptor, and progesterone receptor. MCF-7 cells express full-length functional BRCA1 [49]. The media used for this cell line was RPMI 1640 + 10% FBS + 1% L-Gln + 1% Pen/Strep. Sonication was used to break up the DNA.
2.1.3.2 T47D
T47D was a luminal cell line obtained from the pleural effusion of a 54-year-old woman with infiltrating ductal carcinoma [148]. T47D cells carry receptors for a variety of steroids and calcitonin. They express mutant tumour suppressor protein p53. The progesterone receptor (PR) is expressed constitutively and these cells are responsive to estrogen. They are able to lose the ER during long-term estrogen deprivation in vitro [132]. As a result, these cells are sometimes used as a model for studies of drug resistance to tamoxifen in patients with mutant p53 breast tumours. The cells are also HER2 positive. There is no evidence of BRCA1 mutations in this cell line [162]. The media used for this cell line was RPMI 1640 + 10% FBS + 1% L-Gln + 1% Pen/Strep. Sonication was used to break up the DNA.
2.1.3.3 BT549
BT-549 is a basal breast cancer cell line that was derived from a papillary, invasive ductal tumour of a 72-year-old woman that had metastasized to 3 of 7 regional lymph nodes [170]. BT-549 is ER, PR, and HER2 negative [142]. There is no evidence of BRCA1 mutations in this cell line [162]. The media used for this cell line was RPMI 1640 + 10% FBS + 1% L-Gln + 1% Pen/Strep. Sonication was used to break up the DNA.
2.1.3.4 MDA-MB-231
MDA-MB-231 is a basal cell line that was obtained from a pleural effusion of a 51-year-old female [34]. MDA-MB-231 expresses very low levels of both ER and PR and is categorized as HR-negative; HER-2/neu did not produce a statistically significant change in HR levels [88]. There is no evidence of BRCA1 mutations in this cell line [162]. The media used for this cell line was RPMI 1640 + 10% FBS + 1% L-Gln + 1% Pen/Strep. Sonication was used to break up the DNA.
2.1.3.5 HS578T
Hs578T was derived from a carcinosarcoma and is epithelial and aneuploid, and lacks estrogen receptor protein. It is a basal cell line that was taken from a 74-year-old woman with invasive ductal carcinoma [105]. The breast tissue it was derived from was excised at surgery and showed an infiltrating ductal carcinoma. Hs578T cells are ER and PR negative, lack E-cadherin, and have low HER2/neu expression. There is no evidence of BRCA1 mutations in this cell line [162]. The media used for this cell line was RPMI 1640 + 10% FBS + 1% L-Glutamine + 1% Penicillin/Streptomycin. Sonication was used to break up the DNA.
2.1.3.6 HS578Bst
Hs578Bst is diploid and possibly of myoepithelial origin. It is a basal cell line that was taken from normal tissue distal to the region from which Hs578T was removed (in the same patient, in the same breast), and no tumour cells were identified in it. This made it a good control for Hs578T [105]. These cells are ER, PR, and HER2 negative. The media used for this cell line was ATCC Hybri-Care Medium (Catalog No. 46-X). This was supplied as a powder and was reconstituted in 1 L cell-culture-grade water and supplemented with 1.5 g/L sodium bicarbonate. To make the complete growth medium, the following components were added: 30 ng/ml mouse Epidermal Growth Factor (EGF) and fetal bovine serum to a final concentration of 10%. Enzymatic digestion with micrococcal nuclease (MNase) was used to cleave DNA into smaller fragments.
2.2 Aligning sequence reads to reference genome
Sequence reads of 27 bp or 32 bp derived from Illumina 1G sequencers were aligned to the NCBI reference human genome (hg18) using MAQ. MAQ was used successfully in previous large-scale experiments [18, 185] and was a good choice for alignment at the time it was used. Today, we would perhaps use an aligner such as Bowtie, which exhibits a large performance advantage over MAQ at a slight cost in accuracy [82]. Only sequence reads that aligned to unique genomic locations were retained. The alignment was done by Richard Varhol.
2.3 Filtering reads
Any reads whose sequences were similar to sequences for gel size selection ladders or sequencing adapters were removed from the alignment output. All sets of multiple reads that
corresponded to a single DNA fragment start were collapsed into a single read.
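The collapsing step can be sketched as follows. This is a minimal illustration and not the filtering code used in this work; it assumes each aligned read is represented as a hypothetical (chromosome, start, strand) tuple and that reads sharing all three values come from the same fragment start.

```python
def collapse_duplicate_reads(reads):
    """Keep the first read seen for each (chromosome, start, strand) key."""
    seen = set()
    unique = []
    for chrom, start, strand in reads:
        key = (chrom, start, strand)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Hypothetical aligned reads; three of them share the same fragment start.
aligned = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),
    ("chr1", 100, "-"),
    ("chr2", 100, "+"),
    ("chr1", 250, "+"),
    ("chr1", 100, "+"),
]
print(collapse_duplicate_reads(aligned))
```

Including the strand in the key is an assumption of this sketch; without it, reads starting at the same coordinate on opposite strands would also be merged.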
2.4 Identifying enriched regions
2.4.1 Vancouver Short Read (Find Peaks 4)
Enrichment profiles were generated with Find Peaks v.4.0.15, which is available at http://sourceforge.net/projects/vancouvershortr/. A MAQ read size of 128 was used. A triangle distribution was used to weight the contribution of bases in the reads. A mappable genome fraction of 0.7 was used based on previous estimates done at the Genome Sciences Centre. Five iterations were used for the FindPeaks runs. A subpeak value of 0.2 and a trim value of 0.2 were used to separate subpeaks and trim the sides of peaks. To reduce the amount of noise after running Find Peaks 4, a peak height threshold was used based on an FDR value of 0.01.
2.4.2 Saturation
To assist in generating the saturation plot in Figure 3.1 on page 45, Find Peaks v.2 by Mikhail Bilenky was used.
2.5 Valley regions
Flanking H3K4me1 peaks were searched for genome-wide in the promoter regions of genes, 2.5 kb upstream of the transcription start site. The two flanking peaks were separated by no more than 1000 bp, and the centre 80% of the peak-to-peak region is defined as the valley.
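The valley definition above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this work: each peak is reduced to a single hypothetical summit coordinate, the gene is taken to lie on the plus strand, and only adjacent peaks inside the promoter window are paired.

```python
def valley_between(left_peak, right_peak, max_gap=1000, keep_fraction=0.8):
    """Return the centre keep_fraction of the peak-to-peak region, or None."""
    gap = right_peak - left_peak
    if gap <= 0 or gap > max_gap:
        return None
    # Trim an equal amount from both sides so the valley stays centred.
    trim = gap * (1 - keep_fraction) / 2
    return (round(left_peak + trim), round(right_peak - trim))


def promoter_valleys(peak_summits, tss, upstream=2500):
    """Find valleys between adjacent peaks within `upstream` bp of the TSS."""
    window = sorted(p for p in peak_summits if tss - upstream <= p <= tss)
    valleys = []
    for left, right in zip(window, window[1:]):
        valley = valley_between(left, right)
        if valley is not None:
            valleys.append(valley)
    return valleys


# Hypothetical peak summits around a TSS at position 10,000.
print(promoter_valleys([500, 8000, 8600, 9900, 12000], tss=10000))
```

In this example only the peaks at 8000 and 8600 are close enough to form a valley; the pair at 8600 and 9900 is separated by more than 1000 bp and is skipped.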
2.6 Concordance
To determine the enrichment of transcription factor binding sites in Table 3.2, on page 46, and extract the sequences for Tables 3.23-3.26, on page 84, two packages were used: the SequenceExtractor package by Mikhail Bilenky and the BedTools package [267], available at http://code.google.com/p/bedtools/.
2.7 Expression
RNA-Sequencing (RNA-seq) data was obtained to further characterize the effects of the epigenetic changes. RNA-Seq provides a far more precise measurement of levels of transcripts and their isoforms than other methods [325]. RNA-Seq also shows high levels of reproducibility, for both technical and biological replicates [325]. The Genomic Alignment Analysis package of Find Peaks 4 was used to get the number of reads per gene isoform. When comparisons between cell lines or groups of cell lines were done, there may be several splice isoforms per gene. To simplify the comparison, the splice isoform with the highest expression in either group was chosen and then used for the comparison in both groups. Expression data was expressed in terms of reads per kilobase of exon model per million mapped reads (RPKM). Only expression changes of at least two-fold were considered, and genes with pairs of low expression values were eliminated. This threshold for expression change should filter all but the most significant results [10].
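The expression measure and the two-fold filter can be sketched as follows. This is a minimal illustration, not the Find Peaks implementation: the function names are hypothetical, and the low-expression cutoff of 1.0 RPKM is a placeholder assumption, since the exact cutoff is not a fixed constant of the method.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def is_overexpressed(rpkm_a, rpkm_b, min_fold=2.0, low_expression=1.0):
    """True when A is at least min_fold higher than B.

    Pairs in which both values fall below the (assumed) low_expression
    cutoff are rejected, mirroring the removal of low-expression pairs.
    """
    if rpkm_a < low_expression and rpkm_b < low_expression:
        return False
    if rpkm_b == 0:
        return rpkm_a > 0
    return rpkm_a / rpkm_b >= min_fold


# Hypothetical isoform: 500 reads on a 2,000 bp transcript in a library of
# 10 million mapped reads.
value = rpkm(500, 2000, 10_000_000)
print(value, is_overexpressed(value, 10.0))
```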
2.8 Motifs
Motifs were searched for in the valleys where a unique valley coincided with overexpression in one of the cell lines. MEME [12] was used to search for conserved regions between 6 and 15 bp. This range was chosen based on previous research reporting that motifs are typically fairly short (5 to 20 bp long) [55] or typically about 10 bp long [313]. A site of conservation needed to occur in 5 promoter regions or more to be considered in this analysis. Twenty such sites were retrieved per category. A search was then performed to check whether any of the conserved regions matched known motifs. STAMP [207] was used to identify known transcription factors with the JASPAR v2010 motif set. Bonferroni correction was used to address the problem of multiple comparisons in these data [254]. This is a conservative test [22]. Matches with low complexity or with p > 1 × 10^-3 were discarded. Figures of valley regions in the promoter regions were obtained using the UCSC genome browser [147], http://genome.ucsc.edu/.
2.8.1 Association of valley marked genes with breast cancer tumourigenesis
Table 3.3 on page 47 used the Genes-to-Systems database, available at http://www.itb.cnr.it/breastcancer/, to associate valley marked genes with breast cancer.
2.8.2 Functional analysis
g:Profiler, available at http://biit.cs.ut.ee/gprofiler/, was used to obtain these functional data. Data in this resource is derived from several sources. The Gene Ontology database [9], http://www.geneontology.org/, is used to obtain the gene ontology categories. KEGG [134], http://www.genome.jp/kegg/, describes the pathways these genes share. MiRBase [159], http://www.mirbase.org/, is a searchable database of published miRNA sequences and annotation. Bonferroni correction was used for multiple testing correction in these cell lines.
Chapter 3
Results
3.1 Note regarding contributions
In this study, Yongjun Zhao did all of the preparation of the libraries for ChIP-Sequencing. Richard Varhol did the alignment of the sequencing reads to the reference genome. Mikhail Bilenky wrote Find Peaks 2 and Anthony Fejes wrote Find Peaks 4. I did the bioinformatics analyses defining valley regions. I found a control for the cell line HS-578T and added it to enable match-controlled analysis. I found the overlap of these regions with various databases containing breast cancer genes or transcription factors. I found concordance of valleys amongst cell lines. I calculated expression levels in RPKM from the RNA-sequencing data and chose an appropriate transcript to use for each gene. I performed all of the motif analysis. Dr. Steven Jones conceived the study.
3.2 ChIP sequencing Quality
3.2.1 Tally of Reads and Peaks
Several cell lines were analyzed with ChIP-sequencing. Reads were generated and aligned to the Human Mar. 2006 (NCBI36/hg18) genome assembly. The reads were overlapped to create islands. The Vancouver Short Read Analysis Package [79] created peaks from these islands. Peaks that did not pass an FDR threshold of 0.01 were discarded to reduce noise. Table 3.1 shows the cell lines used, their reads, and enriched islands of reads, or peaks, generated by the Vancouver Short Read Analysis Package.
Table 3.1: Tally of Reads and Peaks
Cell lines      Reads       Peaks
MDA-MB-231      6774327     20791
BT-549          4384352     522727
HS-578T         4747582     501543
T-47D           7065557     770301
MCF-7           5972111     641704
Sum-149         2868543     670431
PC9             5308285     534586
HS-578Bst       10182518    751500
3.2.2 Saturation curves
A saturated library refers to a library with enough reads such that almost all of the peaks have been discovered. Depending on the library, the initial peaks allow new areas of enrichment to be discovered. With the addition of more reads, the library nears saturation. Then, rather than new peaks being discovered, deeper sequencing of known peaks occurs [149]. Simulation is used to estimate binding saturation. By running the peak-calling algorithm on smaller random subsets of the set of sequence reads, the number of detected regions (on the y axis) can be plotted against the number of reads (on the x axis). This will often result in a curve that rises rapidly in the beginning but then starts to saturate. The curve can be extrapolated to estimate the number of sequenced reads at which saturation will start to appear [195].
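The subsampling procedure can be sketched as follows. This is a minimal illustration and not the Find Peaks 2 implementation: the peak caller is replaced by a hypothetical toy function that counts fixed-size bins containing a minimum number of reads, and the bin size, read threshold, and sampling fractions are all assumptions.

```python
import random


def count_enriched_bins(read_positions, bin_size=200, min_reads=3):
    """Toy stand-in for a peak caller: count bins with at least min_reads reads."""
    counts = {}
    for pos in read_positions:
        bin_index = pos // bin_size
        counts[bin_index] = counts.get(bin_index, 0) + 1
    return sum(1 for c in counts.values() if c >= min_reads)


def saturation_curve(read_positions, fractions=(0.25, 0.5, 0.75, 1.0), seed=0):
    """Return (number of reads, number of detected regions) for random subsets."""
    rng = random.Random(seed)
    curve = []
    for fraction in fractions:
        n = int(len(read_positions) * fraction)
        subset = rng.sample(read_positions, n)
        curve.append((n, count_enriched_bins(subset)))
    return curve


# Hypothetical library: 10 enriched bins with 10 reads each.
library = [b * 200 + i for b in range(10) for i in range(10)]
for n_reads, n_regions in saturation_curve(library):
    print(n_reads, n_regions)
```

Plotting the second value against the first gives the saturation curve; a plateau at the larger subsets suggests that additional reads mostly deepen known regions.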
In Figure 3.1 we see that MCF-7 is an example of a fully saturated library. It starts saturating at approximately 2.5 million reads. There, the number of regions per read levels off to a plateau. This indicates that we would find no new regions with deeper sequencing. A library such as HS-578Bst starts to saturate but has not yet quite reached saturation, even with a large number of reads. Thus the library is deeply sequenced but noisy.
Figure 3.1: Combined Saturation plots. This figure was generated using Find Peaks 2 and a modified MatLab script, saturation.m, both created by Mikhail Bilenky.
3.3 Enrichment of TF binding sites in H3K4me1 marked motifs
The ORegAnno database (Open REGulatory ANNOtation) [223] contains known regulatory elements curated from scientific literature. Table 3.2 correlates valley regions in promoters of genes with regulatory regions found in ORegAnno. To control for chance overlap with valley regions, the prevalence of ORegAnno regions in the entire genome was calculated. To do this, the ORegAnno regions were shuffled randomly in the genome and overlap with valleys was again calculated. In a thousand repetitions, the percentage overlap of true ORegAnno sites was always greater than that of regions of the same length randomly placed in the genome. Table 3.2 shows these results; valleys are significantly enriched for ORegAnno regulatory regions (p < 1 × 10^-3).
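The shuffling control can be sketched as follows. This is a minimal illustration and not the BedTools-based procedure used here: it assumes a single hypothetical chromosome, half-open (start, end) intervals, and an add-one empirical p-value, under which zero exceedances in a thousand repetitions gives p = 1/1001, just below 1 × 10^-3.

```python
import random


def overlaps(a, b):
    """True when two half-open intervals (start, end) share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def count_overlapping(valleys, regions):
    """Number of valleys that overlap at least one region."""
    return sum(1 for v in valleys if any(overlaps(v, r) for r in regions))


def shuffle_regions(regions, genome_length, rng):
    """Place each region at a random position, preserving its length."""
    shuffled = []
    for start, end in regions:
        length = end - start
        new_start = rng.randrange(0, genome_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


def empirical_p_value(valleys, regions, genome_length, repetitions=1000, seed=0):
    """Compare the observed overlap with overlaps of randomly placed regions."""
    rng = random.Random(seed)
    observed = count_overlapping(valleys, regions)
    at_least = sum(
        1
        for _ in range(repetitions)
        if count_overlapping(valleys, shuffle_regions(regions, genome_length, rng))
        >= observed
    )
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (at_least + 1) / (repetitions + 1)


# Hypothetical valleys and regulatory regions on a 1 Mb chromosome.
valleys = [(100, 200), (1000, 1100), (5000, 5100)]
regions = [(150, 160), (1050, 1060), (5050, 5060)]
print(empirical_p_value(valleys, regions, genome_length=1_000_000, repetitions=200))
```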
Table 3.2: Enrichment of TF binding sites in valleys
Cell line     Group      ORegAnno overlap   Valleys   Overlap %   Randomized overlap %
MDA-MB-231    Basal      6                  138       4.35        0.18
BT549         Basal      289                3825      7.56        0.35
HS578T        Basal      293                2543      11.52       0.37
MCF7          Luminal    293                3543      8.27        0.36
T47D          Luminal    245                2417      10.14       0.38
HS578T        Cancer     297                2553      11.63       0.38
HS578BST      Control    197                2256      8.73        0.33
3.4 Correlation of Valleys with Downstream Genes
A study by Hoffman et al. found that flanking H3K4 monomethylation peaks mark sites of putative transcription factor binding [115]. To look for evidence of this binding, a search was done to find correlation of valleys in promoter regions 2.5 kb upstream of the TSS with downstream genes. This search looked for functional relevance of genes downstream of marked promoters.
3.4.1 Association of valley marked genes with breast cancer tumourigenesis
Next, we examine the set of genes whose promoters contain a valley region, or more simply, valley marked genes. To look for evidence of association of valley marked genes with breast cancer tumourigenesis and progression, we compare the genes downstream of the valleys with genes found in the G2SBC (Genes to Systems Breast Cancer) [227] database. G2SBC is an integration of many sources, such as NCBI, Breast Cancer Database, Uniprot, InterPro, KEGG, BioGRID, and Gene Ontology. In Table 3.3, there is an enrichment (p = 3.0 × 10^-15) of breast cancer-related genes in the set marked by H3K4 monomethylation.
Table 3.3: Proportion of breast cancer genes of the set of genes marked with H3K4me1 valleys
                       Total    Marked
Ensembl Transcripts    63281    12466
Breast Cancer Genes    2180     1322
Percent (%)            3.4      10.6
3.5 Concordance of valleys between cell lines
Previous studies on the fidelity of cell lines to primary breast tumours showed that cell lines tend to mirror the modifications of the tumours from which they are derived [236]. These analyses used multiple different cell lines, which gave us the opportunity of grouping the cell lines and looking for differences in those groups. The two groupings done were a comparison of a cancer cell line vs. a matched control cell line, and luminal cell lines vs. basal cell lines.
3.5.1 Concordance between breast cancer cell line and a matched control
We used two cell lines, HS578T and HS578Bst, that were derived from the same breast in the same patient [105]. HS578T was the tumourigenic cell line, and HS578Bst was a non-tumourigenic cell line taken from a distal location with no tumour cells identified in it. In Table 3.4, we can see the concordance of valley regions in the promoter regions 2.5 kb upstream of the TSS in a matched pair of control and cancer cell lines. The lack of shared valleys is consistent with a hypothesis that these H3K4me1 flanking peaks mark transcription factor binding sites, whose binding affects the genes downstream. Since we expect many genes to be regulated in opposite ways in the cancer vs. the control cell line, it is not surprising that more H3K4me1 marked genes are unique to a cell line than the number of genes that are shared between the two cell lines.
Figure 3.2: Overlap of valley regions in tumourigenic cell line vs. control
[Venn diagram of the Cancer and Control valley sets; region counts 1710, 373 and 2180, as tabulated in Table 3.4]
Table 3.4: Concordance of valleys in match controlled cell lines
Shared    Unique to Cancer cell line    Unique to Control cell line
373       1710                          2180
3.5.2 Concordance among various luminal and basal breast cancer cell lines
3.5.2.1 Breast cancer subtypes
There are multiple cell lines used that represent different breast cancer subtypes. The cell lines used are representative of the two subtypes of breast cancer and are shown in Table 3.5. The cell lines BT549 and HS578T were chosen to represent the basal subtype due to an abnormally low number of valleys from the cell line MDA-MB-231 in Table 3.2. The overlap in the 4 cell lines representing basal and luminal breast cancer subtypes is shown in Table 3.6a.
Table 3.5: Cell lines by breast cancer subtype
Basal         Luminal
MDA-MB-231    MCF7
BT549         T47D
HS578T
3.5.2.2 Concordance with the same subtype
One would expect the two basal cell lines and the two luminal cell lines to share more monomethylation marks than marks in luminal cell lines compared with marks in basal cell lines. When we examine the pairwise overlap of valleys in Table 3.6a, we see that this does not hold true for all of these cell lines. Instead, all the cell lines have the most overlap with BT549, the cell line with the largest total number of valleys. Other than overlaps with BT549, the next highest overlap is between MCF7 and T47D, two luminal cell lines. In Table 3.6b, we see these pairwise values expressed as a fraction of either of the two cell lines that are being compared. Here, the largest fractions are still found when cell lines overlap with BT549; however, the next highest value is an overlap between MCF7 and T47D.
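The relationship between the two views of the pairwise overlap can be sketched as follows, using the counts reported in Table 3.6a. This is a minimal illustration; the assumption that each fraction is the shared count divided by the total number of valleys in the row cell line is inferred from the tabulated values, and the function name is hypothetical.

```python
# Valley totals and pairwise shared counts as reported in Table 3.6a.
totals = {"BT549": 3983, "HS578T": 2631, "MCF7": 3645, "T47D": 2495}
shared = {
    ("BT549", "HS578T"): 621,
    ("BT549", "MCF7"): 634,
    ("BT549", "T47D"): 717,
    ("HS578T", "MCF7"): 465,
    ("HS578T", "T47D"): 543,
    ("MCF7", "T47D"): 599,
}


def overlap_fraction(row, col):
    """Shared valleys as a fraction of the valleys in the row cell line."""
    if row == col:
        return 1.0
    # The shared count is symmetric, so look the pair up in either order.
    count = shared.get((row, col), shared.get((col, row)))
    return count / totals[row]


for row in totals:
    print(row, [round(overlap_fraction(row, col), 3) for col in totals])
```

Because the denominator depends on the row, the matrix of fractions is not symmetric, which is why the same pair of cell lines yields two different values.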
3.5.2.3 Valleys shared by all cell lines
Table 3.7 shows the genes in which all cell lines are marked in the promoter with H3K4me1. Of these genes, four, CTDSPL, BLCAP, CITED1, and PCDH8, are found in the Genes-to-Systems Breast Cancer Database, which is a database of genes having a role in breast cancer that have a molecular alteration such as DNA amplification, deletion, insertion, altered protein isoform, altered RNA expression or an RNA splice variant [227].
CTDSPL is the CTD (carboxy-terminal domain, RNA polymerase II, polypeptide A) small phosphatase-like protein. This gene is a tumour suppressor, and in previous studies missense and nonsense mutations in this gene were found in tumours [140].
BLCAP, bladder cancer associated protein, is a tumour suppressor gene originally identified from human bladder carcinoma. Previous studies have found editing events that alter the highly conserved amino terminus of the protein [91].
Table 3.6: Overlap of valleys in promoter regions of luminal and basal cell lines
(a) Pairwise overlap of valleys in promoter regions of luminal and basal cell lines
                     Basal    Basal     Luminal   Luminal
                     BT549    HS578T    MCF7      T47D
Basal     BT549      3983     621       634       717
Basal     HS578T              2631      465       543
Luminal   MCF7                          3645      599
Luminal   T47D                                    2495
(b) Pairwise overlap of fraction of valleys in promoter regions of luminal and basal cell lines
           Basal    Basal     Luminal   Luminal
           BT549    HS578T    MCF7      T47D
BT549      1.000    0.156     0.159     0.180
HS578T     0.236    1.000     0.177     0.206
MCF7       0.174    0.128     1.000     0.164
T47D       0.287    0.218     0.240     1.000
(c) Overlapping valleys in 3 luminal and basal cell lines
Overlapping cell lines     Valleys
BT549, HS578T, MCF7        166
BT549, HS578T, T47D        164
BT549, MCF7, T47D          161
HS578T, MCF7, T47D         120
(d) Overlapping valleys in all 4 luminal and basal cell lines
Overlapping cell lines         Valleys
BT549, HS578T, MCF7, T47D      48
CITED1 is the Cbp/p300-interacting transactivator, with Glu/Asp-rich carboxy-terminal domain, 1. A study of CITED1 knockout mice identified a subset of estrogen-responsive genes displaying altered expression in the absence of CITED1 [216]. Maintenance of the ERalpha-CITED1 co-regulated signalling pathway in breast tumours can indicate good prognosis.
PCDH8, protocadherin 8, is a candidate tumour suppressor of breast cancer. Loss of PCDH8 expression is associated with loss of heterozygosity, partial promoter methylation, and increased proliferation. It is thought that loss of PCDH8 promotes oncogenesis in epithelial human cancers by disrupting cell-cell communication dedicated to tissue organization and repression of mitogenic signaling [341].
3.5.3 Concordance between a set of luminal and a set of basal breast cancer cell lines
Table 3.8 indicates the overlap in genes marked in the promoter region of two basal or two luminal cell lines. Overlapping valleys in different cell lines with the same breast cancer subtype were merged and counted as one region. Valleys in Table 3.8 seem to be split between those that have effects in all breast cancer, and those that have subtype-specific effects. In contrast to Table 3.4, Table 3.8 seems to indicate that about as many valleys are shared between subtypes as are unique to them. This may indicate that H3K4me1 marks have a stronger effect in tumourigenesis and tumour progression in general than in breast cancer subtype-specific functions.
Figure 3.3: Overlap of valley regions by breast cancer subtype
[Venn diagram of the Basal and Luminal valley sets; region counts 1990, 2006 and 2210, as tabulated in Table 3.8]
3.6 Unique valleys in promoter regions of overexpressed genes
3.6.1 Defining marked overexpressed categories
RNA-seq experiments were done on all of the breast cancer cell lines used. To further elucidate the functions of the H3K4me1 valleys, we correlate them with expression data (Tables 3.9 and 3.10). The expression changes are at least two-fold, with gene pairs in the lowest 20% of expression values eliminated. This threshold for expression change should filter all but the most significant results [10]. First, we use the tumourigenic cell line HS578T and its matched control HS578Bst, shown in Table 3.9. By correlating expression we generate the following four gene categories:
1. Cancer marked, cancer overexpressed: Marked in the cancer cell line with a H3K4me1
valley and overexpressed in the cancer cell line.
2. Cancer marked, control overexpressed: Marked in the cancer cell line with a H3K4me1
valley and overexpressed in the control cell line.
3. Control marked, cancer overexpressed: Marked in the control cell line with a H3K4me1
valley and overexpressed in the cancer cell line.
4. Control marked, control overexpressed: Marked in the control cell line with a H3K4me1
valley and overexpressed in the control cell line.
Corresponding gene categories were also generated with the luminal and basal cell lines
shown in Table 3.10:
1. Luminal marked, luminal overexpressed: Marked in the luminal cell line with a H3K4me1
valley and overexpressed in the luminal cell line.
2. Luminal marked, basal overexpressed: Marked in the luminal cell line with a H3K4me1
valley and overexpressed in the basal cell line.
3. Basal marked, luminal overexpressed: Marked in the basal cell line with a H3K4me1
valley and overexpressed in the luminal cell line.
4. Basal marked, basal overexpressed: Marked in the basal cell line with a H3K4me1
valley and overexpressed in the basal cell line.
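The four categories listed above can be sketched with set operations. This is a minimal illustration under stated assumptions, not the analysis code used here: genes are represented by hypothetical identifiers, a gene counts as uniquely marked when it is marked in one group but not the other, and the overexpressed sets are assumed to have been computed beforehand.

```python
def marked_overexpressed_categories(marked_a, marked_b, over_a, over_b,
                                    label_a="cancer", label_b="control"):
    """Split uniquely marked genes into the four marked/overexpressed categories."""
    # Only marks unique to one group are considered; shared marks are dropped.
    unique_a = marked_a - marked_b
    unique_b = marked_b - marked_a
    return {
        f"{label_a} marked, {label_a} overexpressed": unique_a & over_a,
        f"{label_a} marked, {label_b} overexpressed": unique_a & over_b,
        f"{label_b} marked, {label_a} overexpressed": unique_b & over_a,
        f"{label_b} marked, {label_b} overexpressed": unique_b & over_b,
    }


# Hypothetical gene identifiers; G3 is marked in both groups and is excluded.
categories = marked_overexpressed_categories(
    marked_a={"G1", "G2", "G3"},
    marked_b={"G3", "G4", "G5"},
    over_a={"G1", "G4"},
    over_b={"G2", "G3", "G5"},
)
for name, genes in categories.items():
    print(name, sorted(genes))
```

The same function covers the subtype comparison by passing label_a="luminal" and label_b="basal".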
Table 3.7: Overlap of valleys in promoter regions of luminal and basal cell lines
Hugo Genes    Description
TPR           translocated promoter region
CCDC30        coiled-coil domain containing 30
CCDC18        coiled-coil domain containing 18
CGREF1        cell growth regulator with EF-hand domain 1
POLQ          polymerase, theta
CTDSPL        CTD small phosphatase-like
CGGBP1        CGG triplet repeat binding protein 1
POLK          polymerase kappa
REEP2         receptor accessory protein 2
BOD1          biorientation of chromosomes in cell division 1
KIF6          kinesin family member 6
C6orf138      Patched domain-containing protein C6orf138
PRIM2         primase, DNA, polypeptide 2
OLFML2A       olfactomedin-like 2A
ZFAND5        zinc finger, AN1-type domain 5
LHX6          LIM homeobox 6
CITED1        Cbp/p300-interacting transactivator 1
CCNY          cyclin Y
CCDC6         coiled-coil domain containing 6
WDR74         WD repeat domain 74
AGAP2         ArfGAP with GTPase domain, ankyrin repeat and PH domain 2
PCDH8         protocadherin 8
NLRC5         NLR family, CARD domain containing 5
CRHR1         Corticotropin-releasing factor receptor 1 Precursor
ADAMTSL5      ADAMTS-like 5
HAUS5         HAUS augmin-like complex, subunit 5
GGN           gametogenetin
SPRED3        sprouty-related, EVH1 domain containing 3
ZNF283        zinc finger protein 230
BLCAP         bladder cancer associated protein
SEPT5         septin 5
SMC1B         structural maintenance of chromosomes 1B
EMID          N/A
Table 3.8: Valleys shared between breast cancer subtypes
          Basal    Luminal
Total     4216     3996
Unique    2210     1990
Shared    2006     2006
3.6.2 Tally of unique valleys in promoter region of overexpressed genes
3.6.2.1 Breast cancer subtype specific valleys
Table 3.11 is a tally of the four categories of subtype-specific valleys that were in the promoter regions of overexpressed genes. Since there were five cell lines to use for this analysis, promoters with valleys in at least two cell lines of the same subtype were used.
3.6.2.2 Tumourigenic valleys
A similar analysis was done on HS578T and its matched control HS578Bst (results shown in Table 3.12). We see an average of 60 valley regions found in each category.
3.6.3 Tally of uniquely marked overexpressed genes
The number of overexpressed genes that are marked uniquely in a promoter in at least two cell lines is shown in Table 3.13. Similarly, the number of genes that are uniquely marked with at least two valley regions in matched control cell lines is shown in Table 3.14.
Table 3.9: Categories correlating expression with H3K4me1 mark in tumourigenic and non-tumourigenic cell lines
H3K4me1 Valley          Expression
Cancer    Control       in Cancer
×                       ↑
×                       ↓
          ×             ↑
          ×             ↓
Table 3.10: Categories correlating expression with H3K4me1 mark in luminal and basal cell lines
H3K4me1 Valley          Expression
Luminal   Basal         Luminal   Basal
×                       ↑
×                                 ↑
          ×             ↑
          ×                       ↑
Table 3.11: Number of valleys in the promoter region marking overexpressed genes in breast
cancer by subtype
Over-expression
Basal
Luminal
Marked
Basal
131
116
Cell line
Luminal
100
104
Table 3.12: Valleys in promoters of genes correlated with overexpression in match-controlled cell lines
                        Over-expression
Marked cell line        Cancer   Control
Cancer                  55       81
Control                 47       62
Table 3.13: Uniquely marked genes correlated with overexpression by breast cancer subtype
                        Over-expression
Marked cell line        Basal    Luminal
Basal                   53       44
Luminal                 42       46
Table 3.14: Uniquely marked genes correlated with overexpression in match-controlled cell lines
                        Over-expression
Marked cell line        Cancer   Control
Cancer                  45       61
Control                 42       52
3.7 Functional analysis
3.7.1 Functional analysis of basal and luminal cell lines
A functional analysis was done using gProfiler [276]. This database includes data from Gene
Ontology, KEGG, and miRBase. This analysis was done for all four categories of marked
overexpression. The p-values listed are multiple-testing corrected. The individual genes and
select significantly enriched functions are listed in Tables 3.15-3.22.
3.7.1.1 Functional analysis of basal marked basal overexpressed genes
In Table 3.19 we see that there are many gene ontology categories enriched in basal marked
basal overexpressed genes. One of the functions observed in this analysis is metastasis.
Metastasis involves the spread of cancer from its primary site to other places in the body.
Some metastatic functions seem to be revealed by this analysis. For example, five of the
genes are associated with the focal adhesion (p = 1.1 × 10^-2) KEGG [135] pathway. Also,
the GO terms integrin-mediated signaling pathway (p = 7.8 × 10^-4) and integrin binding
(p = 9.5 × 10^-4) provide evidence that these genes may be involved in a breakdown of
adhesion.
Some of the functional categories may indicate involvement in angiogenesis. As a tumour gets
bigger, it is less able to sufficiently access the blood vessels. The generation of vascular stroma
is thus essential for solid tumour growth [29]. Vascular stroma formation is evident in two
GO categories, vasculature development (p = 2.8 × 10^-5) and blood vessel morphogenesis
(p = 8.1 × 10^-5). Heparin binding was also enriched in these genes (p = 6.5 × 10^-4).
MicroRNAs are regulatory, non-coding RNAs about 22 nucleotides in length. They control
gene expression by targeting mRNAs and triggering either translation repression or RNA
degradation. In previous studies, miRNAs were identified whose expression was correlated
with specific breast cancer biopathologic features, such as estrogen and progesterone receptor
expression, tumour stage, vascular invasion, or proliferation index [124]. The microRNA
miR-586 (p = 6.4 × 10^-3) is enriched in this analysis [248].
3.7.1.2 Functional analysis of basal marked luminal overexpressed genes
Table 3.20 shows enriched miRNAs. The miRNA miR-351 is reported to be specific to
mice and rats [159]. It belongs to the miR-125 family, shown to perform varied roles in
development, cancer and inflammation. The miRNA miR-351 regulates genes involved in
the TNF-α signaling pathway [225].
3.7.1.3 Functional analysis of luminal marked basal overexpressed genes
In Table 3.21, we see only actin binding (p = 2.2 × 10^-4) as an enriched GO category.
3.7.1.4 Functional analysis of luminal marked luminal overexpressed genes
In Table 3.22, we see that two microRNAs were found at multiple genes, miR-486
(p = 4.2 × 10^-4) and miR-542-5p (p = 4.2 × 10^-4). MiR-486 was found in a breast study to be
upregulated in grade 3 vs. grade 1/2 tumours and upregulated in IBC vs. non-IBC
tumours [59]. The miRNA miR-542-5p was thought to be a putative tumour suppressor
discovered in neuroblastoma [290].
3.7.2 Functional analysis of cancer and control cell lines
3.7.2.1 Functional analysis of control marked cancer overexpressed genes
Table 3.15 shows an enrichment in the KEGG term cell cycle (p = 5.9 × 10^-3). In addition,
there is an enrichment of the reactome term cell cycle, mitotic (p = 1.6 × 10^-3). Similarly,
enriched GO terms include cell cycle process (p = 2.6 × 10^-9), mitosis (p = 2.1 × 10^-4),
regulation of cell cycle checkpoint (p = 2.0 × 10^-4) and cell cycle checkpoint (p = 1.4 × 10^-3).
One could describe cancer as a disease of mitosis. A breakdown in normal
checkpoints results in unregulated growth.
DNA packaging (p = 8.3 × 10^-4) was another GO term that was enriched. DNA is
associated with many proteins that organize and package it. The proteins and complexes
can affect accessibility of DNA or modulate transcription factor binding. Genes involved in
DNA packaging may thus also be involved in cancer progression.
3.7.2.2 Functional analysis of cancer marked control overexpressed genes
The miRNA miR-650 (p = 3.2 × 10^-3) is enriched in this analysis. In other studies miR-650
was found to be downregulated in colon cancer [297]. miR-491-5p was also enriched
(p = 4.9 × 10^-3). A study found miR-491-5p expression was induced by TGF-β1 through the
MEK/p38 MAPK pathway [345]. This microRNA down-regulated the expression of Par-3
through a binding site in the 3′ UTR, and thus disrupts cell junction integrity.
3.7.2.3 Functional analysis of cancer marked cancer overexpressed genes
The miRNA miR-7-1 (p = 7.0 × 10^-3) is enriched in this analysis. Previous studies have
shown miR-7 to be correlated with genes that had predicted chromosomal instability [84].
Also, miR-7 was linked to cell cycle deregulation in breast cancer [84]. Alternatively, miR-7
inhibited expression of p21-activated kinase 1, an invasion-promoting kinase up-regulated in
multiple cancer types [273]. Transfection of miR-7 was found in previous studies to inhibit the
motility, invasiveness, anchorage-independent growth, and tumourigenic potential of highly
invasive breast cancer cells [273].
3.7.2.4 Functional analysis of control marked control overexpressed genes
The ability of tumour cells to invade tissue requires that the tumour cell be able to traverse
the basement membrane and extracellular matrix [301]. Three GO terms, Extracellular
region (p = 1.5 × 10^-6), Extracellular matrix part (p = 4.4 × 10^-5), and Basement membrane
(p = 1.0 × 10^-4), possibly indicate that invasion is occurring.
The microRNA miR-130b (p = 1.0 × 10^-2) is enriched in this analysis. MiR-130b is a
tumour-suppressing microRNA and there is a down-regulation of miR-130b in metastatic
breast cancers. TAp63 suppresses tumourigenesis and metastasis, and coordinately regulates
Dicer and miR-130b to suppress metastasis [305]. A conflicting study found miR-130b
upregulated in breast cancer with metastasis and in grade 3 vs. grade 1/2 tumours [59].
BUB1
C13ORF3
IL8
RSPO1
×
×
×
×
×
×
×
p = 2.0 × 10−4
p = 1.4 × 10−3
p = 8.3 × 10−4
Regulation of cell cycle
Cell cycle checkpoint
DNA packaging
ZNF238
×
PCDH7
×
×
GPATCH4
C8ORF38
PDLIM3
INO80C
Cell cycle, mitotic
Reactome
p = 1.6 × 10−3
p = 2.1 × 10−4
Mitosis
miRNA
p = 9.7 × 10−5
p = 2.6 × 10−9
Cell cycle process
GO
miR-144
p = 5.9 × 10−3
Cell cycle
Table 3.15: Control marked cancer overexpressed genes
PAGE2
C4orf46
HSPA1A
FAM36A
MT1X
INSIG1
CADM1
×
Cell cycle, mitotic
miR-144
×
DNA packaging
Mitosis
×
Cell cycle checkpoint
Cell cycle process
×
Regulation of cell cycle
Cell cycle
Control marked cancer overexpressed genes (cont.)
PAGE2
NCAPG2
×
×
PSPH
FANCD2
×
×
×
ANP32E
SERBP1
TCF19
CDCA8
×
×
×
×
×
HAT1
×
PKMYT1
×
DLGAP5
×
×
×
×
HNRNPR
×
RPP40
×
ZNF706
CENPA
×
×
SMC4
×
×
TTK
×
×
KRT18
×
×
×
×
×
×
×
×
RPA3
×
H2AFV
RBBP8
×
×
×
×
×
×
×
CENPM
TGFB2
×
Cell cycle, mitotic
miR-144
DNA packaging
Cell cycle checkpoint
×
Regulation of cell cycle
Cell cycle process
×
Mitosis
Cell cycle
Control marked cancer overexpressed genes (cont.)
PAGE2
GTSE1
×
HMMR
SNAI2
Table 3.16: Cancer marked control overexpressed genes
p = 4.9 × 10−3
miR-491-5p
miR-650
p = 3.2 × 10−3
miRNA
×
FER1L4
PSG4
HOXC6
PDCD1LG2
C15ORF52
×
ADAMTSL1
IRX3
SEZ6L2
×
miR-491-5p
miR-650
Cancer marked control overexpressed genes (cont.)
×
FER1L4
MRGPRF
×
GAA
TMEM129
IL7R
GDNF
ANGPTL4
×
MAP1A
CITED2
ESM1
PBXIP1
NCSTN
×
NMRAL1
COL8A1
×
ECM1
LHX9
CRABP2
×
IFITM3
SLC16A3
NAAA
ITM2B
ITGA7
×
NT5E
MYH11
×
STARD13
NES
×
miR-491-5p
miR-650
Cancer marked control overexpressed genes (cont.)
×
FER1L4
REEP2
DLG4
SLC44A2
CD68
ISLR
TUBA1
PSG2
XG
CPXM2
PLAGL1
PCTK3
×
H2AFJ
WSB1
HPS1
ENG
×
ARSA
TIMP3
HSD3B7
TBC1D2
×
×
SLC22A17
×
×
GPR137B
P4HA2
×
DGKA
HAGH
DCN
miR-650
miR-491-5p
miR-650
miR-491-5p
Cancer marked control overexpressed genes (cont.)
×
FER1L4
RHBDF1
TMEM98
KIAA1539
×
×
Table 3.17: Cancer marked cancer overexpressed genes
p = 7.0 × 10−3
p = 2.1 × 10−3
miR-543
miR-7-1
p = 2.5 × 10−3
miR-606
miRNA
SLC12A8
CSNK2B
TIMM23B
CBWD1
×
×
CK17
TUT1
C1ORF110
×
AP1S2
MLF1
RPS7
POLR2J
PLK1
×
×
miR-543
miR-7-1
miR-606
Cancer marked cancer overexpressed genes (cont.)
×
×
SLC12A8
KIAA0101
×
PSIP1
GEM
ACTG2
NRG1
EEF1A1
×
×
UBC
CEP55
TROAP
×
POSTN
NASP
EMILIN2
×
CDKN2C
ZWINT
TJP2
×
×
×
×
×
×
LPHN2
SIPA1L2
×
LOXL3
ECT2
LMNB1
×
RAD51AP1
FOXM1
CNTNAP3
×
MEST
PBEF1
TNNT1
×
miR-7-1
miR-543
miR-606
Cancer marked cancer overexpressed genes (cont.)
SLC12A8
CDC45L
MTMR2
WDR62
×
TMEM48
×
CENPQ
IL32
POLR2J4
Table 3.18: Control marked control overexpressed genes
p = 1.0 × 10−4
Basement membrane
ITGBL1
×
PSAP
×
LAMB3
×
p = 1.0 × 10−2
p = 4.4 × 10−5
Extracellular matrix part
×
miR-130b
p = 1.5 × 10−6
FAM19A5
miRNA
Extracellular region
ECM organization
p = 7.3 × 10−4
GO
×
×
×
Basement membrane
×
×
miR-130b
Extracellular matrix part
Extracellular region
ECM organization
Control marked control overexpressed genes (cont.)
×
FAM19A5
S100A4
GAS6
×
RNASE4
×
FUCA1
P4HTM
PNPLA2
×
A2M
COL8A2
×
×
NUDT6
ITFG3
×
TPP1
×
AP000926.2
×
IGSF8
MEGF6
×
C5ORF45
GPSM1
JAM2
LYPD6B
×
SIDT2
×
HERC4
IGFBP3
×
FAM19A5
×
SCUBE3
×
PAM
×
miR-130b
Basement membrane
Extracellular matrix part
Extracellular region
ECM organization
Control marked control overexpressed genes (cont.)
×
TBCK
×
FGF2
CYP1B1
FAM129B
COL4A2
×
×
×
×
×
LOXL2
PODXL
HSPA2
RECK
×
×
F3
×
RGS4
NID1
PTGDS
×
×
×
×
×
×
×
PIK3IP1
SEL1L3
NID2
×
KCNK2
MOXD1
TNS1
×
×
FAM19A5
EVC
PPAP2A
LIMCH1
COL11A1
DKK3
ALDH3B1
×
×
×
miR-130b
Basement membrane
Extracellular matrix part
Extracellular region
ECM organization
Control marked control overexpressed genes (cont.)
ALCAM
DLC1
×
DCBLD1
×
CTHRC1
C8ORF84
p = 5.1 × 10−2
Melanoma
KEGG
NUDT6
p = 6.4 × 10−3
p = 1.1 × 10−2
Focal adhesion
GO
miR-586
p = 9.5 × 10−4
Integrin binding
×
p = 6.5 × 10−4
×
Heparin binding
×
p = 1.2 × 10−6
×
Extracellular matrix
p = 1.2 × 10−4
p = 2.8 × 10−5
p = 9.9 × 10−8
Adherens junction
BGN
p = 7.8 × 10−4
MXRA7
Integrin-mediated signaling pathway
SPATS2L
p = 8.1 × 10−5
TGM2
Blood vessel morphogenesis
C4orf46
Vasculature development
Cell adhesion
Table 3.19: Basal marked basal overexpressed genes
miRNA
×
HYI
×
×
C6ORF145
IKBIP
×
×
×
×
miR-586
Melanoma
Focal adhesion
Integrin binding
Heparin binding
Extracellular matrix
Adherens junction
Integrin-mediated signaling pathway
Blood vessel morphogenesis
Vasculature development
Cell adhesion
Basal marked basal overexpressed genes (cont.)
C4orf46
DTX3L
×
ANTXR2
SNX7
SYNC
FBLIM1
×
×
ADAMTS1
×
×
×
×
×
×
×
×
PDE1C
DST
×
×
ITGB1
×
×
×
×
×
ZEB1
COL8A1
×
×
×
×
×
×
×
×
×
CDCA7
FGF2
THBS1
×
LMO7
×
COL5A1
×
×
×
×
×
×
×
×
×
×
×
×
×
MAP7D3
×
FLNC
SGCE
×
×
Melanoma
×
×
C4orf46
CALD1
LPHN2
AKT3
GPR177
DOCK7
LOXL3
NCOA7
LPXN
×
AC005562
ARHGEF10
SLC39A14
PLS3
MYH9
×
×
×
×
×
TIMP3
×
FBLN1
×
×
FGFR1
×
×
FOSL2
PLEKHC1
CAPG
×
×
×
miR-586
Focal adhesion
Integrin binding
Heparin binding
Extracellular matrix
Adherens junction
Integrin-mediated signaling pathway
Blood vessel morphogenesis
Vasculature development
Cell adhesion
Basal marked basal overexpressed genes (cont.)
miR-351
p = 6.7 × 10−3
MRPL38
S100A14
ZNF552
miRNA
×
SLC25A29
×
BCAM
SIGIRR
×
HDDC3
×
SNAI2
Table 3.20: Basal marked luminal overexpressed genes
miR-586
Melanoma
Focal adhesion
Integrin binding
Heparin binding
Extracellular matrix
Adherens junction
Integrin-mediated signaling pathway
Blood vessel morphogenesis
Vasculature development
Cell adhesion
Basal marked basal overexpressed genes (cont.)
C4orf46
×
miR-351
Basal marked luminal overexpressed genes (cont.)
MRPL38
×
EFCAB4A
×
SEZ6L2
PPM1D
ZNF444
TMC4
CRYL1
TOR2A
C21ORF33
DUSP23
HIST1H2BD
NMRAL1
FAM128B
DAK
PGAP2
CRABP2
×
SYTL1
×
BCKDHA
KREMEN2
KIAA0182
CBFA2T3
BCAS4
C14ORF179
TP53I3
EPB41L5
H2AFJ
COQ5
miR-351
Basal marked luminal overexpressed genes (cont.)
MRPL38
×
CPT1A
TRIM37
TJP3
×
TRPS1
DECR2
EEF1A2
SULT2B1
CELSR1
CA12
HAGH
SH3YL1
Table 3.21: Luminal marked basal overexpressed genes
Actin binding
p = 2.2 × 10−4
GO
NCRNA00152
TUBB3
S100A2
Actin binding
Luminal marked basal overexpressed genes (cont.)
NCRNA00152
AFAP1
×
IL1RAP
FAM92A1
SEPT10
CDCA2
MCFD2
PTRF
ANTXR1
×
TPM4
×
EDIL3
FBLN2
TGFBR2
DDR2
MEGF6
MX1
DAB2
SACS
LHFPL2
NFKBIZ
CEP170
COL6A2
MFGE8
DUSP6
NT5E
FST
Actin binding
Luminal marked basal overexpressed genes (cont.)
NCRNA00152
PLAU
F3
CCDC88A
×
NR3C1
FAM46A
QKI
GPR162
DDX58
MET
FXYD5
×
RAGE
MYLK
×
LIMCH1
×
ITGA3
Table 3.22: Luminal marked luminal overexpressed genes
p = 4.2 × 10−4
miR-542
p = 4.2 × 10−4
miR-486
miR-448
p = 7.4 × 10−3
miRNA
NPEPL1
SNURF
DDR1
MB
ZP3
×
C19ORF46
FAM128A
×
PFKFB3
×
BOLA2B
ABCA3
×
C10ORF32
RBM47
DMKN
×
FGFR4
×
ZDHHC12
×
×
×
×
ACSS1
PDCD4
DOC2A
×
SOX13
FAM63A
×
SUOX
DDB2
×
miR-542
miR-486
miR-448
Luminal marked luminal overexpressed genes (cont.)
NPEPL1
INADL
EPS8L1
×
×
×
RAB17
PEX16
×
EPCAM
HDHD3
BSPRY
PCTK3
EFHD1
MANSC1
×
FOLR1
TUBD1
×
NPDC1
AGR2
ISYNA1
×
GSTZ1
×
×
ESR1
FXYD3
×
LPHN1
PRKCZ
×
ERBB3
PTGER3
×
KIAA1370
×
DBNDD1
×
×
3.8 Marked overexpressed genes
3.8.1 Motifs
Motifs were searched for in the valleys where a unique valley coincided with an overexpression
in one of the cell lines. MEME [13] was used to search for conserved regions between 6 and
15 bp. A site of conservation needed to occur in 5 promoter regions or more. Twenty such
sites were retrieved. A search was then performed to check whether any of the conserved
regions matched known motifs. STAMP [207] was used with the JASPAR v2010 motif set.
Matches with low complexity or with p > ×10−3 were discarded. If more than one motif
matched a site well, they are all listed.
Motifs that match at different sites are separated by lines in Tables 3.23-3.26 on pages 84-85.
The motifs that were found were ESR1, ESR2, REST, Egr1, sna, che-1, stat3, cup2,
EWSR1-FLI1, Ixr1, Tlx1_NFIC, tinman, bcd, oc, gsc, IRF1, MEF2A, and NFκB.
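The MEME search parameters described above can be written down as a command line. The sketch below only assembles the argument list and does not run anything; the file and directory names are hypothetical, and mapping "5 promoter regions or more" to `-minsites 5` and "twenty sites" to `-nmotifs 20` is an assumption about how the search might have been configured, not a record of the command actually used.

```python
import shlex


def build_meme_command(fasta_path, out_dir, min_width=6, max_width=15,
                       min_sites=5, n_motifs=20):
    """Assemble a MEME command line for a DNA motif search (not executed here)."""
    return [
        "meme", fasta_path,
        "-dna",                       # nucleotide alphabet
        "-oc", out_dir,               # output directory, overwritten if it exists
        "-minw", str(min_width),      # minimum motif width in bp
        "-maxw", str(max_width),      # maximum motif width in bp
        "-minsites", str(min_sites),  # minimum number of sites per motif
        "-nmotifs", str(n_motifs),    # maximum number of motifs to report
    ]


# Hypothetical input and output names, for illustration only.
cmd = build_meme_command("valley_sequences.fasta", "meme_out")
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run` on a system where MEME is installed; keeping the widths and site count as named parameters makes the search criteria explicit and easy to vary.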
3.8.1.1 ESR1
ESR1 is found in Table 3.23 and Table 3.24. The presence of ESR1 in the control marked
control overexpressed category can be explained by the H3K4me1 mark aiding the activatory
role of ER. ESR1, as a tumour suppressor [19], is likely activating genes fighting
tumourigenesis such as apoptotic genes. The presence of ESR1 in the cancer marked control category
is not initially expected and possible explanations are discussed in Section 4.7.
Estrogen Receptor 1 (ESR1) is the gene that encodes estrogen receptor alpha (ER-α). ESR1
is activated by the ligand estrogen and affects physiological processes such as growth,
differentiation, and homeostasis in eukaryotic cells [93].
3.8.1.2 ESR2
ESR2 is found in Table 3.23 and Table 3.24. The presence of ESR2 in the control marked
control overexpressed category can be explained by the H3K4me1 mark aiding the activatory
role of ER. ESR2, as a tumour suppressor [136], is likely activating genes fighting
tumourigenesis such as apoptotic genes. The presence of ESR2 in the cancer marked control category
is not initially expected and possible explanations are discussed in Section 4.7.
Estrogen Receptor 2 (ESR2) is the gene that encodes estrogen receptor beta (ER-β). Like
ESR1, ESR2 is activated by the ligand estrogen and affects physiological processes such as
growth, differentiation, and homeostasis in eukaryotic cells [93].
Table 3.23: Uniquely Marked in Control and Overexpressed in Control
Motif
TF
p-value
MA0450
hkb
2.8×10
13
8
−4
MA0055
Myf
−5
1.4×10
MA0402
SWI5
2.7×10
MA0193
Lag1
4.9×10
MA0247
tin
1.7×10
MA0086
sna
1.2×10
MA0112
ESR1
7.1×10
MA0258
ESR2
1.2×10
MA0149
EWSR1-FLI1
1.7×10
MA0393
STE12
8.0×10
achi
−4
4.7×10
MA0207
Sites
−4
−5
33
−7
8
−6
−5
10
−4
−7
18
−5
13
Table 3.24: Uniquely Marked in Cancer and Overexpressed in Control
Motif
TF
p-value
MA0149
EWSR1-FLI1
6.5×10
12
MA0162
Egr1
−4
1.5×10
12
MA0260
che-1
7.2×10
−11
−6
MA0023
dl_2
−5
4.8×10
MA0304
GCR1
8.1×10
MA0212
bcd
3.1×10
MA0234
oc
−7
3.1×10
MA0190
Gsc
6.0×10
MA0112
ESR1
6.1×10
MA0258
ESR2
−4
6.8×10
MA0105
NFKB1
2.3×10
MA0061
NF-kappaB
−4
7.3×10
MA0023
dl_2
7.9×10
MA0287
CUP2
1.6×10
MA0144
Stat3
MA0430
MA0087
Sites
6
−5
−7
23
−7
−4
−5
5
7
−4
−6
5
5.1×10
−14
5
YLR278C
−4
3.7×10
5
Sox5
2.6×10
−4
9
Table 3.25: Uniquely Marked in Cancer and Overexpressed in Cancer
Motif
TF
p-value
MA0162
Egr1
7.9×10
−4
12
MA0162
Egr1
−6
6.0×10
12
MA0323
IXR1
8.2×10
MA0138
REST
−5
1.8×10
MA0373
RPN4
3.9×10
−5
7
MA0260
che-1
−6
2.0×10
5
MA0393
STE12
3.2×10
MA0050
IRF1
5.8×10
MA0212
bcd
2.4×10
MA0234
oc
2.4×10
MA0190
Gsc
4.6×10
MA0052
MEF2A
1.2×10
−6
Sites
9
−6
−6
−7
5
−7
−7
−6
5
Table 3.26: Uniquely Marked in Control and Overexpressed in Cancer
Motif
TF
p-value
Sites
MA0234
oc
−7
9.3×10
21
MA0212
bcd
9.8×10
MA0190
Gsc
1.4×10
MA0190
Gsc
2.5×10
MA0212
bcd
5.8×10
MA0234
oc
6.3×10
MA0218
ct
4.9×10
6
MA0373
RPN4
−5
6.2×10
10
MA0344
NHP10
2.0×10
5
MA0344
NHP10
−4
1.5×10
18
MA0016
usp
2.0×10
MA0323
IXR1
1.0×10
−7
−6
−7
7
−7
−7
−5
−5
−5
5
−8
8
MA0119
TLX1_NFIC
−6
8.5×10
MA0373
RPN4
3.3×10
MA0162
Egr1
4.3×10
MA0393
STE12
3.7×10
MA0260
che-1
−5
7.9×10
MA0430
YLR278C
4.8×10
−5
−5
9
−5
13
−5
10
3.9 Genes downstream of ESR1 motifs in Valleys
The ESR1 gene encodes an estrogen receptor which is important for hormone binding, DNA
binding, and activation of transcription. The ESR1 gene is amplified in 21% of breast
carcinomas [119]. UCSC [147] was used to plot the H3K4me1 mark in the
tumourigenic and control cell lines in Figure 3.4. Below the H3K4me1 data, the tracks show where
valleys were identified and the location of the ESR1 motif. Many of these genes are known
to have functions in tumourigenesis.
Figure 3.4: ESR1 motifs found in valleys upstream of genes that were uniquely marked by H3K4me1 mono-methylation in
the control cell line and overexpressed in the control cell line. UCSC [147] was used to plot the data in these figures.
Tracks in each panel: H3K4me1 signal in HS578T and HS578-Bst, HS578T Valley, HS578-Bst Valley, ESR1 motif,
RefSeq Genes, Ensembl Gene Predictions.
(a) ENST00000391895, KCNK2, HS578T:15.2132 rpkm, HS578-Bst:31.4393 rpkm
(b) ENST00000374476, AC092162.1, HS578T:3.97307 rpkm, HS578-Bst:21.8525 rpkm
(c) ENST00000368716, S100A4, HS578T:221.861 rpkm, HS578-Bst:506.054 rpkm
(d) ENST00000407341, CYP1B1, HS578T:9.10138 rpkm, HS578-Bst:39.3691 rpkm
(e) ENST00000383729, P4HTM, HS578T:9.88339 rpkm, HS578-Bst:23.777 rpkm
(f) ENST00000376931, C5ORF45, HS578T:21.0684 rpkm, HS578-Bst:42.5321 rpkm
(g) ENST00000381086, IGFBP3, HS578T:398.96 rpkm, HS578-Bst:10702.8 rpkm
(h) ENST00000301679, P4HTM, HS578T:50.9351 rpkm, HS578-Bst:102.124 rpkm
(i) ENST00000402249, PIK3IP1, HS578T:0.092294 rpkm, HS578-Bst:12.3945 rpkm
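The expression values printed in the panel labels of Figure 3.4 allow a quick check of how strongly each gene is overexpressed in the control cell line. The sketch below computes the HS578-Bst to HS578T RPKM ratio for each panel. The two-fold cut-off is the threshold mentioned in the discussion of expression modulation; dividing the raw RPKM values directly, without pseudocounts or replicate handling, is a simplifying assumption.

```python
# (HS578T RPKM, HS578-Bst RPKM) as printed in the panel labels of Figure 3.4.
panel_rpkm = {
    "(a) KCNK2": (15.2132, 31.4393),
    "(b) AC092162.1": (3.97307, 21.8525),
    "(c) S100A4": (221.861, 506.054),
    "(d) CYP1B1": (9.10138, 39.3691),
    "(e) P4HTM": (9.88339, 23.777),
    "(f) C5ORF45": (21.0684, 42.5321),
    "(g) IGFBP3": (398.96, 10702.8),
    "(h) P4HTM": (50.9351, 102.124),
    "(i) PIK3IP1": (0.092294, 12.3945),
}

FOLD_THRESHOLD = 2.0  # assumed two-fold cut-off for calling overexpression

# Ratio of control (HS578-Bst) to cancer (HS578T) expression per panel.
fold_change = {
    panel: control / cancer for panel, (cancer, control) in panel_rpkm.items()
}
overexpressed_in_control = {
    panel for panel, fc in fold_change.items() if fc >= FOLD_THRESHOLD
}

for panel, fc in sorted(fold_change.items()):
    print(f"{panel}: {fc:.2f}-fold higher in HS578-Bst")
```

Under these assumptions every panel passes the two-fold cut-off, although panels (f) and (h) do so only narrowly.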
Chapter 4
Discussion & Conclusions
Advances in sequencing technologies have allowed for the unbiased examination of global
histone modifications within a cell at tenable timeframes and cost. This study took advantage
of the advances by examining genome-wide Histone H3K4me1 modifications in several breast
cancer cell lines.
Transcription Factors (TFs) promote or block the recruitment of RNA polymerase. This
influence on gene transcription can be modulated by either enhancing or inhibiting the
accessibility of site-specific transcription factors to target loci. A central problem in TF
biology is how binding sites are selected given the near ubiquity of short and degenerate
recognition motifs and the small fraction of high-affinity sites that are actually bound [130].
In these studies, we discovered novel putative activatory and repressive regions. We saw
that valleys were significantly enriched for ORegAnno regulatory regions. Thus, we saw that
the bimodal H3K4me1 peaks seem to mark areas of putative transcription factor binding.
These results were consistent with studies by Hoffman et al. that find bimodal loci are more
highly occupied than loci with low H3K4me1 [115]. Studies by Robertson et al. have found
that the spatial distribution of H3K4me1 around TF binding sites forms symmetric
flanking pairs of enrichment [278]. We found transcription factor binding site enrichment
in such flanking pairs or valleys. This enrichment may be due to modulated accessibility
of chromatin [158], or interactions with molecular effectors involved in recognition of
H3K4me1 [186].
We found that genes marked with H3K4me1 were more likely to be involved in breast cancer
(p = 3.0 × 10^-15). This is consistent with studies that have found that monomethylation of
histone H3K4 is associated with active transcription of a promoter [324]. This is
evidence that these novel putative activatory and repressive regions have an effect in the
progression of the tumour.
4.1 Valley concordance
To further analyze these novel putative activatory and repressive regions we looked for their
concordance in multiple different cell lines. This gave us the opportunity of grouping the cell
lines and looking for differences in those groups. The section below discusses the findings
when comparing tumourigenic to non-tumourigenic cell lines and also different breast cancer
subtypes.
4.1.1 Matched control
When comparing a cancer cell line vs. a matched control cell line, these results indicate the
majority of valleys are unshared between the two. This would be expected if H3K4me1 was
an epigenetic modification that directs the transcriptional program of a cancer cell.
4.1.2 Breast cancer subtype
When comparing a pair of basal cell lines with a pair of luminal cell lines, their valleys seem
to be largely unshared. There does not seem to be a distinct relationship between valley
concordance and breast cancer subtype. This may indicate that H3K4me1 plays less of a role in
breast cancer subtype specific functions than it does in tumourigenesis in general.
4.1.2.1 Core shared marks
When DNA is conserved across many organisms, that indicates the level of importance of
a gene's functionality. Similarly, it could be hypothesized that H3K4me1 marks shared between
cell lines of different breast cancer subtypes are putative activatory or repressive regions
important in tumourigenesis. There were 48 genes marked in two basal cell lines and two
luminal cell lines. Four of those genes, CTDSPL, BLCAP, CITED1, and PCDH8, were listed
in the Genes-to-Systems Breast Cancer Database. CCDC18 and KIF6 are listed as having
mutations in breast cancer in the COSMIC database. Some of the others seem to have
roles in cancer as well. Tpr was found to be a fusion partner with the MET oncogene and
was involved in gastric tumourigenesis [299]. CGREF1 was found in a study predicting
epigenetically regulated genes in breast cancer cell lines [199]. Overexpression of POLQ is
known to be correlated with poor prognosis in early breast cancer patients [114]. OLFML2A
is listed in a patent developing a signature to predict and reduce the risk of metastasis of
breast cancer to lung [213]. AGAP2 is overexpressed in human cancers, including breast
cancer, and prevents apoptosis by up-regulating Akt [33, 3]. There is also evidence that
corticotropin-releasing hormone exerts antiproliferative activity on growth of human breast
cancer cells via the activation of CRH-R1 [97]. Bladder cancer-associated protein is a novel
candidate tumour suppressor gene originally identified from human bladder carcinoma [91].
4.2 Association of valley marked genes with breast cancer tumourigenesis
Using multiple different breast cancer cell lines allowed us to examine functional groups. We
find that genes marked with H3K4me1 valleys in their promoters are enriched for breast
cancer related genes found in the G2SBC (Genes to Systems Breast Cancer database) [227]
(Table 3.3). This is further evidence that the valleys represent novel putative activatory or
repressive regions that could be binding sites for cancer related TFs.
4.3 Marked genes with corresponding expression modulation
There are multiple valley regions found in promoters of genes where a two-fold expression
modulation correlates with the H3K4me1 mark. The H3K4me1 mark would serve to aid the
binding and function of the transcription factor. Activators would be expected to bind to
valleys in the promoter regions of two categories of genes, cancer marked cancer overexpressed
and control marked control overexpressed. There were 117 such valleys. Repressors would
be expected to bind to valleys in the promoter regions of two categories of genes, cancer
marked control overexpressed and control marked cancer overexpressed. There were 99 such
valleys.
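The reasoning in this section reduces to a simple rule: a valley whose mark and overexpression occur in the same cell line is consistent with an activator, and one whose mark and overexpression occur in opposite cell lines is consistent with a repressor. The sketch below encodes that rule. The function name and labels are illustrative, and the two counts are the activator-consistent cells read from Table 3.12.

```python
def putative_role(marked_in, overexpressed_in):
    """Classify a promoter valley by where it is marked and where the gene is overexpressed."""
    valid = {"cancer", "control"}
    if marked_in not in valid or overexpressed_in not in valid:
        raise ValueError("cell line must be 'cancer' or 'control'")
    # Same cell line: the mark coincides with higher expression (activator-like).
    # Opposite cell lines: the mark coincides with lower expression (repressor-like).
    return "activator" if marked_in == overexpressed_in else "repressor"


# (marked in, overexpressed in) -> number of valleys, from Table 3.12.
activator_consistent = {("cancer", "cancer"): 55, ("control", "control"): 62}

total_activator_valleys = sum(
    count
    for (marked_in, overexpressed_in), count in activator_consistent.items()
    if putative_role(marked_in, overexpressed_in) == "activator"
)
print(total_activator_valleys)  # 117
```

This is a classification by correlation only; it does not establish that a transcription factor binds the valley or that the mark causes the expression change.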
4.3.1 Functions of H3K4me1 Marked genes with corresponding expression modulation
The analysis of the marked overexpressed genes yielded many functional annotations which
may be related to cancer progression. Gene Ontology, KEGG and miRBase annotations
were included.
4.3.1.1 Cell cycle checkpoints
Cell cycle machinery controls cell proliferation, and cancer is a disease of inappropriate cell
proliferation. Reduced sensitivity to the signals that tell a cell to adhere, differentiate, or
die leads to a cycle of increasing cell number [52]. Cell cycle
checkpoints sense flaws in DNA replication and chromosome segregation [66]. When
checkpoints are activated, signals are relayed to the cell cycle-progression machinery causing a
delay in cycle progression, until the danger of mutation has been averted [52]. In addition to
directly repairing DNA breaks or adducts, cells can respond to DNA damage by undergoing
programmed cell death. Cells with an intact DNA-damage response frequently arrest or
die in response to DNA damage, thus reducing the likelihood of progression to malignancy.
Mutations in mitotic-checkpoint pathways can thus permit the survival or the continued
growth of cells with genomic abnormalities [141]. An enrichment of the KEGG term "cell
cycle", the reactome term "cell cycle, mitotic", and GO terms such as "cell cycle process",
"mitosis", "cell cycle checkpoint", and "regulation of cell cycle checkpoint" points to a breakdown
in the cell cycle checkpoints, possibly contributing to the unregulated growth.
4.3.1.2 Metastasis
The metastatic process involves multiple steps, including cell detachment from the primary
tumour, degradation of the basement membrane and ECM, migration into surrounding
connective tissue, entry into the vascular or lymphatic circulation, attachment to the endothelial
cells in suitable organs, extravasation from the circulation, and colony formation in the
secondary sites [205]. Cellular adhesion molecules are involved in these steps.
4.3.1.3 Cellular adhesion
Proteins involved in focal adhesion are macromolecules through which the cytoskeleton of a
cell connects to the ECM and mediates its regulatory effects through ECM-receptor
interaction pathways [256, 134]. Focal adhesion kinase (FAK) is a protein tyrosine kinase expressed
in invasive breast cancer and effects antiapoptotic signaling [169]. FAK might have roles
both in the later stages of tumour progression, such as invasion and metastasis, where it
promotes the adhesion of invading cells at distant sites, and in early
stage functions in cancer progression that precede invasion and metastasis [192].
Integrin proteins are major cell surface receptors for extracellular matrix molecules. FAK is a
key component of the signal transduction pathways triggered by integrins [102]. Alterations
to integrin function within human breast cancer may be linked to metastasis [80]. The GO
terms "integrin-mediated signaling pathway" and "integrin binding" provide evidence that
these genes may be involved in a breakdown of adhesion.
The mammary gland consists of a ductal epithelial network. These ducts contain two major
layers, a luminal layer of secretory epithelial cells and an outer, basal layer of myoepithelial
cells. The basal surface of the epithelium is a basement membrane (BM) that interacts with
an ECM. The BM is a layer separating basal cells from the extracellular matrix. During
tumour progression, changes arise that perturb interactions of epithelium and ECM [108].
The degradation of both the myoepithelial cell layer and the basement membrane is a
prerequisite for breast cancer invasion and metastasis [209]. The GO terms "Extracellular region",
"Extracellular matrix part", and "Basement membrane" possibly indicate that this invasion
is occurring.
CHAPTER 4. DISCUSSION & CONCLUSIONS
97
Tumour cell migration and adhesion are important features during the switch to the
metastatic state. The actin cytoskeleton is important in these processes and is involved in many
aspects of cancer and cancer progression [166]. In normal tissue, fibroblasts and epithelial
cells locally migrate during wound repair, and white blood cells cross vessel walls. Myoepithelial cells are contractile and arranged in a similar manner to smooth muscle cells [271].
Their cytoplasm contains the contractile protein actin. These kinds of processes can be
dysregulated to allow malignant cancer cells to move out of the primary tumour and beyond
the boundaries of the tissue or organ where the tumour initially developed [211]. The GO
term Actin binding may indicate such processes are occurring in this cell line.
4.3.2 Angiogenesis
As a tumour gets bigger, it is less able to sufficiently access the blood vessels. The generation
of vascular stroma is thus essential for solid tumour growth [29]. Vascular stroma formation
is evident in two GO categories, vasculature development and blood vessel morphogenesis.
Studies using the MDA-MB-231 breast cancer cell line concluded that heparin-binding
growth-associated molecule functions as a tumour growth factor [329]. In
another study, the expression of integrins, strongly expressing epidermal growth factor (EGF)
receptors, was increased by addition of the heparin-binding EGF-like growth factor [233].
Heparin-binding proteins can promote angiogenesis in endothelial cells [318].
4.3.3 MicroRNAs
MicroRNAs control gene expression by targeting mRNAs. In previous studies, miRNAs were
identified whose expression was correlated with specific breast cancer biopathologic features,
such as estrogen and progesterone receptor expression, tumour stage, vascular invasion, or
proliferation index [124]. The effect of microRNAs is post-transcriptional and as such is not
affected by H3K4me1, but some miRNAs mark genes that are cancer related. There were
several genes that had enrichment of cancer-related miRNAs.
4.4 Putative regulatory regions
4.4.1 Relevance of marked overexpressed categories
The categories generated by combining comparisons of monomethylation data and expression
data in different functional groups in Tables 3.9 and 3.10 allow us to define novel putative
regulatory regions. The correlation of a unique valley region with a significant change in
expression leads us to the hypothesis that these are activatory or repressive regions.
4.4.1.1 Putative activatory region
In a case where an H3K4me1 mark is correlated with overexpression of the downstream gene
in the same cell line, we could expect an activatory transcription factor to be binding within
the valley region of the gene's promoter. The mark would aid the binding and effect of the
activatory transcription factor, contributing to the overexpression of the downstream gene.
4.4.1.2 Putative repressive region
On the other hand, in a case where an H3K4me1 mark is correlated with overexpression
of the downstream gene in a different cell line, we could expect a repressive transcription
factor to be binding within the valley region of the gene's promoter. The mark would aid the
binding and effect of the repressive transcription factor, contributing to decreased expression
of the downstream gene.
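The two cases above amount to a simple decision rule, which can be sketched in code. This is only a minimal illustration under stated assumptions, not the analysis pipeline used in this work: the function name classify_valley and the labels "cancer" and "control" are hypothetical and introduced here for clarity.

```python
from typing import Literal

# Hypothetical labels for the two cell lines being compared.
CellLine = Literal["cancer", "control"]


def classify_valley(marked_in: CellLine, overexpressed_in: CellLine) -> str:
    """Label a uniquely marked valley region as putatively activatory or repressive.

    marked_in: cell line that carries the unique H3K4me1 mark.
    overexpressed_in: cell line in which the downstream gene is overexpressed.
    """
    for value in (marked_in, overexpressed_in):
        if value not in ("cancer", "control"):
            raise ValueError(f"unknown cell line label: {value!r}")
    # Mark and overexpression in the same cell line: putative activatory region.
    if marked_in == overexpressed_in:
        return "putative activatory region"
    # Mark in one cell line, overexpression in the other: putative repressive region.
    return "putative repressive region"


if __name__ == "__main__":
    for marked in ("cancer", "control"):
        for over in ("cancer", "control"):
            label = classify_valley(marked, over)
            print(f"{marked} marked, {over} overexpressed: {label}")
```

Under this rule, a motif that appears in both a same-line and a cross-line category receives both labels, so the label describes the region in a given comparison rather than a fixed property of the transcription factor.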
4.5 Experimentally determined functions of TFs potentially regulated by valley regions
To test whether these valleys have regulatory functions, they were correlated with motifs.
This was done by comparing tumourigenic and non-tumourigenic cell lines in cases where there
were uniquely marked genes that were overexpressed in one of the cell lines. This analysis
finds putative activatory and repressive regions where the H3K4me1 appears to modulate
the effect of the transcription factor.
4.5.1 ESR1 and ESR2
There are many examples of the regulatory role of the valleys being supported by what is
known in the literature about the TFs binding the motifs. For example, ESR1 and ESR2 are
found in Table 3.23, the control marked control overexpressed category. This would indicate
that the H3K4me1 mark is aiding the activatory role of ER, which is corroborated in the ER
literature. ER is known to have three activation domains, AF-1, AF-2, and AF-2a [241], and
in vitro studies show that TATA-binding protein-associated factor interacts with the
AF-2a domain to enhance ER-mediated transcription [28]. The presence of ESR1 and ESR2
in Table 3.24, the cancer marked control overexpressed category, indicates a repressive role
that has not been described in the literature. Possible interpretations of this inconsistency
are discussed in Section 4.7.
Estrogen Receptor 2 (ESR2) is the gene that encodes estrogen receptor beta (ER-β), and Estrogen Receptor 1 (ESR1) encodes estrogen receptor alpha (ER-α). The ESRs are activated
by the ligand estrogen and affect physiological processes such as growth, differentiation, and
homeostasis in eukaryotic cells [93]. These TFs are tumour suppressors [136], and are likely
activating genes that fight tumourigenesis, such as apoptotic genes.
Breast cancers whose cell growth rate is not affected by the presence of estrogen are estrogen
receptor-negative (ER-). The cell lines used in these studies, HS-578T and HS578Bst, are
known to be ER- [105]. The ER- status may appear to conflict with the result indicating that
ER binds to promoters of genes, causing changes in downstream expression. However, these
results are consistent with the presence of ERRs. ER-related receptors (ERRs) are nuclear
orphan receptors with significant homology to ERs, which do not bind estrogen. Either they
have unknown physiological ligands that take over for estrogen, or they are constitutively active.
ERRs are known to be able to bind to classic estrogen response elements (EREs), on which
they exert a constitutive transcriptional activity [100, 118]. These studies were done in
ER- cell lines, which may indicate that ERRs are involved in tumourigenesis in this case.
The presence of ESR1 in the results of this breast cancer study is consistent with this
TF's major role in the disease, which is well documented in previous literature. Studies on breast
cancer samples showed ESR1 amplification in 20.6% of breast cancers [117]. The loss of
ER expression causes tumour growth that is no longer under estrogen control and cannot
be stopped by endocrine therapy. This results in higher tumour aggressiveness and poor
prognosis. Therefore, ER is a critical growth regulatory gene in breast cancer, and its
expression in breast cancer cells is critical for tumour progression [93].
4.5.2 Egr1
Furthermore, the presence of Egr1 (Early Growth Response Protein 1) in Table 3.25, the
cancer marked cancer overexpressed category, is also corroborated in other studies. Our
results would indicate that the H3K4me1 mark aids the effect of this TF, which we would expect
to be an activator. Indeed, Egr1 does have an activation domain: a serine/threonine/proline-rich region between amino acids 174 and 270 [38].
The presence of Egr1 in Table 3.24, the cancer marked control overexpressed category, and
Table 3.26, the control marked cancer overexpressed category, indicates a repressive role for
Egr1. Again, the literature confirms that Egr1 has both activatory and repressive domains.
The repressive domain is between amino acids 281 and 314, to the 5' of its zinc fingers [92]. In
addition, Swirnoff et al. found evidence that Nab1, a corepressor of Egr-1, is an active
(non-quenching) repressor that appears to work via a direct mechanism. Thus, it interferes
with the function of the general transcription apparatus (GTA) but not that of specific
activating TFs [307].
Egr1 was also shown in other studies to have a role in cancer. It has been found to be
a convergence point for many signaling cascades and to be involved in
proliferation, stress responses and apoptosis [57, 196]. This complex TF is known to act as
both a tumour suppressor and a tumour promoter [160]. Its dual roles as tumour suppressor
and tumour promoter, activator and repressor, appear consistent with our finding this TF
in three different categories.
There has been previous evidence of Egr1 being specifically involved in breast cancer as well. It
has been linked to apoptosis and shown to be activated by extracellular signal-regulated kinase [11]. However, EGR1 was previously shown to be needed for TBX2 to repress NDRG1
and drive cell proliferation in breast cancer [274]. In normal mammary tissue, Egr-1 expression is low, suggesting a possible relation between the low levels of Egr-1 and the development
of mammary neoplasias [247]. Analyses of the expression of Egr-1 in breast carcinoma cells,
such as MCF-7, demonstrated a relatively high expression of endogenous Egr-1 in these
cells [247]. Other results in the literature suggest that siRNA-Egr-1 is a potent antineoplastic
agent in suppressing the growth of breast tumours, despite the known role of Egr-1 as a
tumour suppressor in several other types of human cancers [247].
4.5.3 Che-1
In addition, Che-1 is found in Table 3.25, the cancer marked cancer overexpressed category.
This indicates that Che-1 is an activator. The literature matches this observation, and
previous studies have found that Che-1 contains an activation domain [60]. The presence of
Che-1 in Table 3.26, the control marked cancer overexpressed category, and Table 3.24, the
cancer marked control overexpressed category, indicates a repressive role that has not been
described in the literature. Possible interpretations of this inconsistency are discussed in
Section 4.7.
Che-1 was previously shown to have a proproliferative role, interacting with the retinoblastoma protein (Rb) and inhibiting its ability to suppress expression of E2F [75]. Furthermore,
Che-1 appears to counteract Par-4- or β-amyloid-induced apoptosis [83]. In contrast, Che-1
was also shown to have antiproliferative activity by inducing expression of p21Waf1 [246].
4.5.4 EWSR1/Fli-1
Furthermore, EWSR1/Fli-1 is found in Table 3.23, the control marked control overexpressed
category. This is evidence that EWSR1/Fli-1 is an activator. EWSR1/Fli-1 is a chimeric
protein fusing Ewing sarcoma breakpoint region 1 and Friend Leukemia Integration 1 protein.
This chimera fuses a 5' part of EWS to the 3' half of FLI-1 encoding the DNA binding
domain. EWS-FLI1 can recognize in vitro the same sequences as FLI-1, but is a more
potent transactivator than the wild type FLI-1 [14]. The activatory role for EWSR1/Fli-1
is consistent with our findings.
Our studies also find EWSR1/Fli-1 in Table 3.24, the cancer marked control overexpressed
category. This is inconsistent with its activatory role discussed above, but there is
evidence in the literature that could corroborate this effect. EWS/FLI-1 has been shown
to bind the IGFBP-3 promoter in vitro and in vivo and can repress its activity [264]. This
bivalent role is discussed in a study that characterized eight transcripts that are
dependent on EWS/FLI for expression and two transcripts that are repressed in response to
EWS/FLI [27].
4.5.5 Ixr1
Ixr1 is a homeobox gene that encodes the iroquois homeobox 1 protein. Homeobox genes
encode transcription factors that play key roles in the determination and maintenance of
cell fate and cell identity [40]. Ixr1 is found in Table 3.26, the control marked cancer
overexpressed category, indicating a role as a repressor. The specific role of Ixr1 does not
appear to be known decisively, but one study has found that it has a possible role as a repressor.
In this study, mutations in IXR1 caused de-repression of COX5B [165].
It also appears in Table 3.25, indicating an activatory role. This could be a novel
function of Ixr1, or there may be other factors involved. For example, homeobox genes do
not generally act alone to determine cell identity. There is a combinatorial, spatially and
temporally regulated pattern of homeobox genes functioning in a given cell that determines
the cell's identity [183]. Acting together, the genes can be considered a homeobox code
programming cellular outcome. The binding of this one TF alone may not determine the outcome
of downstream genes.
There is some evidence homeobox genes could be involved in breast cancer.
IRX-2, for
example, is expressed in discrete epithelial cell lineages being found in ductal and lobular
epithelium [184]. IRX-2 expression is maintained in human mammary neoplasias [184].
4.5.6 Tlx1_NFIC
TLX1 is the gene encoding the T-cell leukemia homeobox protein and NFIC encodes the Nuclear
factor I/C-type protein. Tlx1_NFIC is found in Table 3.26, the control marked cancer
overexpressed category. This appears to be a case where H3K4me1 marks a valley for
Tlx1_NFIC to bind and repress the downstream genes.
This is consistent with literature that reports TLX1 functions as a bifunctional transcriptional regulator, being capable of activation or repression depending on cell type [277].
TLX1 is a homeoprotein that is known to interact with the CCAAT binding transcription factor NFIC [342]. There is evidence for this complex's involvement in previous
cancer studies. TLX1 is essential to spleen organogenesis and oncogenic when aberrantly
expressed in immature T cells [277]. NFIC is upregulated in breast cancer [201].
4.5.7 Tin
Tin is found in Table 3.23, the control marked control overexpressed category. This would
indicate a role for Tin as an activator, where the H3K4me1 mark in the control is aiding the
binding of Tin, resulting in activation of the downstream genes. This is consistent with
previous studies. The human homologs are NKX2-5 and NKX2-6 (NK2 transcription factor
related), which are members of the NK homeobox family [311]. NKX2-5 has been found to
act either as a specific transcriptional activator or repressor [4]. In addition, apoptosis and
reduced proliferation were observed in Nkx2.5 and Nkx2.6 double-mutant mice [311].
4.5.8 Bcd, oc, and gsc
Bcd encodes bicoid, a homeodomain-containing transcription factor. Gsc encodes the
homeobox protein Goosecoid. Oc encodes the homeobox protein ocelliless [39]. Bcd, oc,
and gsc are found in Table 3.24, the cancer marked control overexpressed category, and
Table 3.26, the control marked cancer overexpressed category. This would indicate that H3K4me1 is
promoting binding of repressors. In the literature, it is found that goosecoid and bicoid act as
translational repressors [63, 171]. The presence of Bcd, oc, and gsc in Table 3.26, the control
marked cancer overexpressed category, indicates a repressive role that has not been described
in the literature. Possible interpretations of this inconsistency are discussed in Section 4.7.
These genes are also known to have a role in breast cancer, consistent with our findings.
Goosecoid promotes tumour metastasis and is overexpressed in
a majority of human breast tumours. Moreover, Goosecoid significantly enhanced the ability
of breast cancer cells to form pulmonary metastases in mice [109].
4.5.9 IRF1
IRF1 encodes the interferon regulatory factor 1. IRF1 is found in Table 3.25, the cancer
marked cancer overexpressed category. This could be interpreted as H3K4me1 marking the
binding site and facilitating the role of the activatory protein IRF-1. Indeed, previous studies
confirm that IRF-1 is an activator of transcription [214].
Other studies have also confirmed its involvement in breast cancer. IRF1 behaves as a tumour
suppressor gene in breast cancer through caspase activation and induction of apoptosis [25].
Other studies have shown that ectopic expression of IRF1 using an adenovirus delivery
system led to a decrease in survivin expression and an increase in cell death in breast cancer
cell lines [259].
4.5.10 MEF2A
MEF2A encodes the Myocyte-specific enhancer factor 2A protein. MEF2A is found in Table
3.25, the cancer marked cancer overexpressed category, which indicates that
the H3K4me1 is aiding the activatory function of MEF2A in the cancer cell line. This
is consistent with findings in the literature that MEF2 can act either as an activator or
repressor of transcription under different circumstances [218].
4.5.11 Sna
Sna encodes the Snail protein [143]. Sna is found in Table 3.23, the control marked control
overexpressed category. This would indicate that Snail has activatory capabilities. This is
inconsistent with previous reports of Snail's repressive SNAG domain [238]. It is consistent
with one study in which a Snail-type TF, CES-1, which also binds to E-boxes, was found
to activate transcription in vivo [275].
Figure 4.1: Snail1 complex [44]
In other cancer studies, Snail was also observed. Its expression has been detected in a number of different human carcinoma and melanoma cell lines [252]. Snail is sufficient to promote
mammary tumour recurrence in vivo [224]. High levels of Snail predict decreased relapse-free
survival in women with breast cancer [224]. Snail has been associated with the lymph
node status and/or invasiveness of ductal breast carcinomas [21]. Snail expression has been
shown to confer resistance to cell death mediated by several factors and chemotherapeutic
agents [133, 316].
Snail was found to be upregulated in recurrent tumours. This recurrence is accompanied
by epithelial-to-mesenchymal transition (EMT). Snail has been shown to be sufficient to
induce EMT in primary tumour cells [224]. However, silencing of Snail by stable RNA interference has been shown to induce a complete mesenchymal-to-epithelial transition (MET),
associated with the upregulation of E-cadherin, downregulation of mesenchymal markers, and
inhibition of invasion [242].
Other studies have also verified cross-talk between Snail and epigenetic factors. As shown
in Figure 4.1, Snail physically interacts with, and recruits, the histone demethylase LSD1 to
epithelial gene promoters. The SNAG domain of Snai1 is sufficient for interaction with the
LSD1 complex [194]. LSD1 removes dimethylation of lysine 4 on histone H3 (H3K4me2 →
H3K4me1/H3K4me0), and in the absence of LSD1, Snai1 fails to repress E-cadherin [194].
LSD1 associates with co-repressors including HDAC1/2 and CoREST to form a core ternary
complex. This is recruited to chromatin and can efficiently bind and modify nucleosomal
substrates to repress transcription [167]. Previous studies have shown that Snail induces
repressive histone modifications at the E-cadherin promoter through recruitment of histone
deacetylases (HDACs) and a H3K27 methyltransferase [251, 112].
4.5.12 Stat3
Stats are a family of latent transcription factors that mediate signalling from cytokines and
growth factors. Signal Transducer and Activator of Transcription 3 (Stat3) regulates the transcriptional activation of VEGF (vascular endothelial growth factor) [96]. Stat3
is found in Table 3.24, the cancer marked control overexpressed category. This is inconsistent
with literature that finds STAT family members to be transcription activators [300].
Stat3 was also found in other cancer studies. It is a point of convergence for numerous
oncogenic signalling pathways. It is constitutively activated both in tumour cells and in
immune cells in the tumour microenvironment through constitutive phosphorylation on tyrosine [340]. Stat3 plays a key role in many cellular processes such as cell proliferation,
survival, invasion, and tumour angiogenesis [1]. In lung cancer, Stat3 transduces survival
signals downstream of tyrosine kinases such as Src, EGF-R, and c-Met, as well as cytokines
such as IL-6 [300]. Stat3 has been found to be essential for the early phase of mammary
gland involution [1]. Involution is characterized by extensive apoptosis of the epithelial cells
and a dramatic switch from survival to death signalling.
4.5.13 REST
REST encodes the RE1-Silencing Transcription factor, a transcriptional repressor. REST was initially
proposed to silence the transcription of neuronal genes in non-neuronal cells. It is now
known to have essential roles in both neuronal and non-neuronal cells. REST is found
in Table 3.25, the cancer marked cancer overexpressed category. This is inconsistent with
previous literature, in which REST is thought to repress genes by binding to a 17-33 base
pair neuron-restrictive silencer element [288, 41, 289].
This inconsistency may be due to post-transcriptional regulation of REST. Previous studies
have found this occurs during oncogenic transformation [320]. Protein levels can be significantly reduced in the absence of altered mRNA levels [330]. REST therefore cannot be
directly measured by its mRNA levels in breast tumours, as was done in our studies. The level of
RNA-Seq coverage we have allows us to observe changes in expression, but not to observe SNPs
conclusively at a base pair level. Thus there may be an isoform due to a mutation to a stop
codon that we do not observe.
Figure 4.2: Various REST isoforms [76]
The various splice isoforms for REST are shown in Figure 4.2. Isoform 1 consists of two
repression domains (RD1 and RD2) and a DNA-binding domain (DBD). Expression of a
dominant negative form of REST derepresses a promoter [42]. REST4, or Isoform 3 in Figure
4.2, lacks RD2 and has a truncated DNA-binding domain, but retains zinc fingers 1-5 and
nuclear localization [76]. Re-expression of functional REST in REST4-expressing cells has
been shown to induce apoptosis, suggesting that suppression of REST function is key to
survival of these cells [104].
Other studies confirm REST's involvement in breast cancer. Loss of REST has been found
to result in a highly aggressive breast cancer disease course [320]. Also, RESTless tumours
have significantly increased tumour size and lymph node involvement [320]. Furthermore,
patients with RESTless breast cancer undergo significantly more early disease recurrence
than those with fully functional REST, regardless of estrogen receptor or HER2 status [320].
Also, other studies have confirmed cross-talk of REST with epigenetic factors. REST has
been found to act as a hub for the recruitment of multiple chromatin-modifying enzymes such
as histone deacetylases (HDACs) and histone methyltransferases (HMTs) [244]. CoREST has
been found to enhance the ability of LSD1, a known histone H3K4 demethylase, to
reverse methylation and to protect LSD1 from proteasomal degradation in vivo [175]. Figure
4.1 shows such an example complex containing REST.
4.6 Experimental validation
These experiments give us a better understanding of key molecular targets that underlie
the pathways that are associated with disease development.
By inhibiting a gene with
proproliferative roles we might slow the progression of breast cancer in an individual. We
could do further research to validate such genes as potential targets.
RNA interference
(RNAi) approaches are an effective means of target validation [125].
This method would
allow us to model the pharmacological inhibition of a target protein.
RNAi is a valuable
laboratory research tool, both in cells, and in whole animal models [125].
RNAi is a naturally occurring mechanism that controls gene expression at the post-transcriptional
level [125]. In eukaryotes, double-stranded interfering RNAs target complementary mRNAs
for degradation. RNAi can be effected in mammalian cells by the use of small-interfering
RNA (siRNA) duplexes that silence gene expression without inducing an inhibitory interferon
response [219, 30, 125]. siRNAs can either be directly introduced into cells by transfection or can be generated within the cell by introducing plasmids that express short-hairpin
RNA (shRNA) precursors of siRNAs [125]. shRNAs are processed by the DICER enzyme
into siRNAs, which, in turn, enable transcript degradation by binding to a complementary
mRNA in the context of the RNA-induced silencing complex (RISC) [125].
Once the inhibition of the gene has been modelled, assays can be run to test the effect. Some
of the assays described in the literature have included the luminescent measurement of cell
viability [204], a wound-healing assay modelling cell motility [51], the use of a fluorescence-based plasmid reporter system measuring proteasome function, and microscopic image analysis as a measure of mitotic progression [125].
4.7 Uncorroborated experimental results
There are other cases where, while our finding that a TF is involved in cancer appears
to be supported by previous literature, the activatory or repressive role we predict is not
corroborated in the literature. This section discusses some of the possible reasons for those
cases.
4.7.1 Post-transcriptional regulation
Under post-transcriptional regulation, protein levels can be significantly reduced in the absence of altered mRNA levels [320]. An example of this is the post-transcriptional regulation of REST, which occurs during oncogenic transformation [320]. REST is regulated by ubiquitin-mediated proteolysis.
β-TRCP is the specific E3 ubiquitin ligase responsible for REST degradation.
β-TRCP overexpression causes oncogenic transformation of human mammary
epithelial cells, and this pathogenic function requires REST degradation [330]. This kind of
post-transcriptional regulation would not be evident in a ChIP-seq experiment.
4.7.2 Co-regulators
Transcription coregulators interact with transcription factors to either activate or repress the
transcription of specific genes [95]. Nuclear co-regulators act cooperatively with transcription factors to establish patterns of gene expression and thus provide functional flexibility
in specifying transcriptional regulation [130]. An example of this is NKX2-5, the human
Tin homolog, whose transcriptional activity is modulated positively and negatively by its
respective binding partner, Tbx-5 or Tbx-2, in a region-specific manner [4].
4.8 Progressive methylation
4.8.1 Binding strengths of effectors
It is known that both H3K4me1 and H3K4me3 can be present in sites proximal to the TSS.
Furthermore, there has been work done to characterize effectors of the H3K4me1 mark,
proteins that recognize methylation at H3K4 and effect change. NURF is one such
protein, with a BPTF PHD finger that recognizes H3K4 methylation [186]. It was found that
binding was tightest to H3K4me3, but binding to other methylation states was possible, though
weaker [186]. There was a gradient of binding affinity: H3K4me3 > H3K4me2 > H3K4me1 >
H3K4me0 [186]. Both H3K4me1 and H3K4me3 are activatory, but we can expect H3K4me3
to have a stronger effect if all of the effectors are like NURF.
4.8.2 H3K4me3 unobserved in these studies
4.8.2.1 Expected case
There are different ways to interpret an H3K4me1 mark. It could represent an increase in
activatory marking relative to H3K4me0. In this case we would expect an activator to bind a valley where
a unique H3K4me1 mark correlates with increased expression.
4.8.2.2 Methylation states
In these studies, we are only examining H3K4me1 and not other methylation states. There is
evidence that distinct methylation states could reflect the stability of histone lysine methylation. Stability would gradually increase from mono-, to di-, and finally to trimethylation of
the various histone lysine positions [163]. Furthermore, there is evidence that the efficiency of
readout of the different methylation states increases from mono-, to di-, and finally to trimethylation [186].
[Figure 4.3 schematic; labels: H3K4me3, TF, H3K4me1, Promoter, TSS]
Figure 4.3: Low H3K4me1 could indicate higher H3K4me3
4.8.2.3 Reasons for unexpected case
Thus, the presence of H3K4me1 could indicate a decreased activatory mark from an H3K4me3.
Furthermore, H3K4me1 and H3K4me3 are competitive, and a low H3K4me1 could indicate
a higher unseen H3K4me3. In this case, H3K4me1 would decrease as these sites were
progressively methylated to H3K4me2 and/or H3K4me3 [186]. Thus H3K4me1 can be used
to assess genes that have modulated expression in oncogenesis, but it is difficult to use this
mark alone to determine if a particular gene is activated or repressed.
4.9 Epigenetic crosstalk
In addition to the various methylation states on histone H3 lysine 4, there are other
residues that can be methylated and many types of histone modifications. They include histone acetylation, phosphorylation, ubiquitination, sumoylation, ADP-ribosylation, biotinylation, proline isomerization, and histone methylation [314].
These modifications act together, creating a histone code [303]. According to the histone
code, distinct combinations of histone modifications are related to specific chromatin-related
functions and processes [128]. Multiple modifications can help tip the balance from one chromatin state to another, making the underlying DNA more or less accessible to the protein
machinery. These histone modifications generate a language that is interpreted through the
ability to recruit the proteins that modulate chromatin. While our experiments are unique
in that they find novel regulatory regions in a genome-wide comparison of multiple breast
cancer cell lines, there have been other studies of histone modifications. Previous studies
have also found them to have roles in gene transcription, DNA repair, mitosis, meiosis,
development and apoptosis [90].
Many different epigenetic modifications have been described in the human genome and have
been previously shown to play diverse roles in gene regulation, cellular differentiation and
the onset of disease [69]. Studying individual modifications can help us find links to the
activity levels of various genetic functional elements. However, to better understand the
complete effect, the combinatorial patterns of many epigenetic factors must be considered.
This study suggests that H3K4me1 contributes an effect to gene expression, but this
mark alone does not explain all gene expression effects.
4.10 Conclusions
In conclusion, bimodal H3K4me1 peaks form valleys, which are putative regulatory regions.
In these regions, transcription factor binding is enriched, and some H3K4me1 marked genes
are involved in cancer progression or anti-cancer functions. Genes marked in multiple breast
cancer cell lines may be important for tumour progression. Finally, correlating motifs found
in valleys with overexpression data has yielded genes with important functions in breast
cancer.
Bibliography
[1] Kathrine Abell, Antonio Bilancio, Richard W E Clarkson, Paul G Tiffen, Anton I Altaparmakov, Thomas G Burdon, Tomoichiro Asano, Bart Vanhaesebroeck, and Christine J Watson. Stat3-induced apoptosis requires a molecular switch in pi(3)k subunit
composition. Nat Cell Biol, 7(4):392-398, Apr 2005.
[2] Mohamed Abu-Farha, Jean-Philippe Lambert, Ashraf S Al-Madhoun, Fred Elisma,
Ilona S Skerjanc, and Daniel Figeys. The tale of two domains: proteomics and genomics
analysis of smyd2, a new histone methyltransferase. Mol Cell Proteomics, 7(3):560-572,
Mar 2008.
[3] Jee-Yin Ahn, Yuanxin Hu, Todd G Kroll, Paulette Allard, and Keqiang Ye. Pike-a is
amplified in human cancers and prevents apoptosis by up-regulating akt. Proc Natl
Acad Sci U S A, 101(18):6993-6998, May 2004.
[4] Hiroshi Akazawa and Issei Komuro. Cardiac transcription factor csx/nkx2-5: Its role
in cardiac development and diseases. Pharmacol Ther, 107(2):252-268, Aug 2005.
[5] F. Albertorio, M. E. Hughes, J. A. Golovchenko, and D. Branton. Base dependent
dna-carbon nanotube interactions: Activation enthalpies and assembly-disassembly
control. Nanotechnology, 20(39):395101, 2009.
[6] Donna G Albertson, Colin Collins, Frank McCormick, and Joe W Gray. Chromosome
aberrations in solid tumors. Nat Genet, 34(4):369-376, Aug 2003.
[7] C. David Allis, Shelley L Berger, Jacques Cote, Sharon Dent, Thomas Jenuwien, Tony
Kouzarides, Lorraine Pillus, Danny Reinberg, Yang Shi, Ramin Shiekhattar, Ali Shilatifard, Jerry Workman, and Yi Zhang. New nomenclature for chromatin-modifying
enzymes. Cell, 131(4):633-636, Nov 2007.
113
BIBLIOGRAPHY
114
[8] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman.
alignment search tool.
Basic local
J Mol Biol, 215(3):403410, Oct 1990.
[9] M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P.
Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, M. A. Harris, D. P. Hill, L. Issel-Tarver,
A. Kasarskis, S. Lewis, J. C. Matese, J. E. Richardson, M. Ringwald, G. M. Rubin,
and G. Sherlock. Gene ontology: tool for the unication of biology. the gene ontology
consortium.
Nat Genet, 25(1):2529, May 2000.
[10] S. Audic and J. M. Claverie.
The signicance of digital gene expression proles.
Genome Res, 7(10):986995, Oct 1997.
[11] S.J. Baek, L.C. Wilson, L.C. Hsi, and T.E. Eling.
Troglitazone, a peroxisome
proliferator-activated receptor gamma (ppar gamma ) ligand, selectively induces the
early growth response-1 gene independently of ppar gamma. a novel mechanism for its
anti-tumorigenic activity.
J Biol Chem, 278:584553, 2003.
[12] T. L. Bailey and C. Elkan. Fitting a mixture model by expectation maximization to
discover motifs in biopolymers.
Proc Int Conf Intell Syst Mol Biol, 2:2836, 1994.
[13] Timothy L Bailey, Nadya Williams, Chris Misleh, and Wilfred W Li. Meme: discovering and analyzing dna and protein sequence motifs.
Nucleic Acids Res, 34(Web Server
issue):W369W373, Jul 2006.
[14] R. A. Bailly, R. Bosselut, J. Zucman, F. Cormier, O. Delattre, M. Roussel, G. Thomas,
and J. Ghysdael. Dna-binding and transcriptional activation properties of the ews-i-1
fusion protein resulting from the t(11;22) translocation in ewing sarcoma.
Mol Cell
Biol, 14(5):32303241, May 1994.
[15] Andrew J Bannister and Tony Kouzarides.
Reversing histone methylation.
Nature,
436(7054):11031106, Aug 2005.
[16] Artem Barski, Suresh Cuddapah, Kairong Cui, Tae-Young Roh, Dustin E Schones,
Zhibin Wang, Gang Wei, Iouri Chepelev, and Keji Zhao. High-resolution proling of
histone methylations in the human genome.
Cell, 129(4):82337, May 2007.
[17] Yoav Benjamini and Daniel Yekutieli. The control of the false discovery rate in multiple
testing under dependency.
Annals of Statistics, 29:11651188, 2001.
BIBLIOGRAPHY
115
[18] David R Bentley, Shankar Balasubramanian, Harold P Swerdlow, Georey P Smith,
John Milton, Clive G Brown, Kevin P Hall, Dirk J Evers, Colin L Barnes, Helen R
Bignell, Jonathan M Boutell, Jason Bryant, Richard J Carter, R. Keira Cheetham, Anthony J Cox, Darren J Ellis, Michael R Flatbush, Niall A Gormley, Sean J Humphray,
Leslie J Irving, Mirian S Karbelashvili, Scott M Kirk, Heng Li, Xiaohai Liu, Klaus S
Maisinger, Lisa J Murray, Bojan Obradovic, Tobias Ost, Michael L Parkinson, Mark R
Pratt, Isabelle M J Rasolonjatovo, Mark T Reed, Roberto Rigatti, Chiara Rodighiero,
Mark T Ross, Andrea Sabot, Subramanian V Sankar, Aylwyn Scally, Gary P Schroth,
Mark E Smith, Vincent P Smith, Anastassia Spiridou, Peta E Torrance, Svilen S
Tzonev, Eric H Vermaas, Klaudia Walter, Xiaolin Wu, Lu Zhang, Mohammed D Alam,
Carole Anastasi, Ify C Aniebo, David M D Bailey, Iain R Bancarz, Saibal Banerjee,
Selena G Barbour, Primo A Baybayan, Vincent A Benoit, Kevin F Benson, Claire
Bevis, Phillip J Black, Asha Boodhun, Joe S Brennan, John A Bridgham, Rob C
Brown, Andrew A Brown, Dale H Buermann, Abass A Bundu, James C Burrows,
Nigel P Carter, Nestor Castillo, Maria Chiara E Catenazzi, Simon Chang, R. Neil
Cooley, Natasha R Crake, Olubunmi O Dada, Konstantinos D Diakoumakos, Belen Dominguez-Fernandez, David J Earnshaw, Ugonna C Egbujor, David W Elmore,
Sergey S Etchin, Mark R Ewan, Milan Fedurco, Louise J Fraser, Karin V Fuentes Fajardo, W. Scott Furey, David George, Kimberley J Gietzen, Colin P Goddard, George S
Golda, Philip A Granieri, David E Green, David L Gustafson, Nancy F Hansen, Kevin
Harnish, Christian D Haudenschild, Narinder I Heyer, Matthew M Hims, Johnny T Ho,
Adrian M Horgan, Katya Hoschler, Steve Hurwitz, Denis V Ivanov, Maria Q Johnson,
Terena James, T. A. Huw Jones, Gyoung-Dong Kang, Tzvetana H Kerelska, Alan D
Kersey, Irina Khrebtukova, Alex P Kindwall, Zoya Kingsbury, Paula I Kokko-Gonzales,
Anil Kumar, Marc A Laurent, Cynthia T Lawley, Sarah E Lee, Xavier Lee, Arnold K
Liao, Jennifer A Loch, Mitch Lok, Shujun Luo, Radhika M Mammen, John W Martin,
Patrick G McCauley, Paul McNitt, Parul Mehta, Keith W Moon, Joe W Mullens,
Taksina Newington, Zemin Ning, Bee Ling Ng, Sonia M Novo, Michael J O'Neill,
Mark A Osborne, Andrew Osnowski, Omead Ostadan, Lambros L Paraschos, Lea
Pickering, Andrew C Pike, Alger C Pike, D. Chris Pinkard, Daniel P Pliskin, Joe Podhasky, Victor J Quijano, Come Raczy, Vicki H Rae, Stephen R Rawlings, Ana Chiva
BIBLIOGRAPHY
116
Rodriguez, Phyllida M Roe, John Rogers, Maria C Rogert Bacigalupo, Nikolai Romanov, Anthony Romieu, Rithy K Roth, Natalie J Rourke, Silke T Ruediger, Eli Rusman, Raquel M Sanches-Kuiper, Martin R Schenker, Josena M Seoane, Richard J
Shaw, Mitch K Shiver, Steven W Short, Ning L Sizto, Johannes P Sluis, Melanie A
Smith, Jean Ernest Sohna Sohna, Eric J Spence, Kim Stevens, Neil Sutton, Lukasz
Szajkowski, Carolyn L Tregidgo, Gerardo Turcatti, Stephanie Vandevondele, Yuli Verhovsky, Selene M Virk, Suzanne Wakelin, Gregory C Walcott, Jingwen Wang, Graham J Worsley, Juying Yan, Ling Yau, Mike Zuerlein, Jane Rogers, James C Mullikin,
Matthew E Hurles, Nick J McCooke, John S West, Frank L Oaks, Peter L Lundberg,
David Klenerman, Richard Durbin, and Anthony J Smith.
genome sequencing using reversible terminator chemistry.
Accurate whole human
Nature,
456(7218):5359,
Nov 2008.
[19] Carolin Berner, Eva AumÃ×ller, Anne Gnauck, Manuela Nestelberger, A. Just, and
Alexander G Haslberger. Epigenetic control of estrogen receptor expression and tumor
suppressor genes is modulated by bioactive food compounds.
Ann Nutr Metab, 57(3-
4):183189, 2010.
[20] Donald A Berry, Constance Cirrincione, I. Craig Henderson, Marc L Citron, Daniel R
Budman, Lori J Goldstein, Silvana Martino, Edith A Perez, Hyman B Muss, Larry Norton, Cliord Hudis, and Eric P Winer. Estrogen-receptor status and outcomes of modern chemotherapy for patients with node-positive breast cancer.
JAMA, 295(14):1658
1667, Apr 2006.
[21] Maria J Blanco, Gema Moreno-Bueno, David Sarrio, Annamaria Locascio, Amparo
©
Cano, JosÃ
Palacios, and M. Angela Nieto.
Correlation of snail expression with
histological grade and lymph node status in breast carcinomas.
Oncogene, 21(20):3241
3246, May 2002.
[22] J. M. Bland and D. G. Altman. Multiple signicance tests: the bonferroni method.
BMJ, 310(6973):170, Jan 1995.
[23] K. I. Bland, M. M. Konstadoulakis, M. P. Vezeridis, and H. J. Wanebo.
Oncogene
protein co-expression. value of ha-ras, c-myc, c-fos, and p53 as prognostic discriminants
for breast carcinoma.
Ann Surg, 221(6):70618; discussion 71820, Jun 1995.
BIBLIOGRAPHY
117
[24] Fiona M Blows, Kristy E Driver, Marjanka K Schmidt, Annegien Broeks, Flora E
van Leeuwen, Jelle Wesseling, Maggie C Cheang, Karen Gelmon, Torsten O Nielsen,
¿
©
Carl Blomqvist, PÃ
slen, Louis R BÃ
ivi HeikkilÃ
¿
, Tuomas Heikkinen, Heli Nevanlinna, Lars A Ak-
gin, William D Foulkes, Fergus J Couch, Xianshu Wang, Vicky
Cafourek, Janet E Olson, Laura Baglietto, Graham G Giles, Gianluca Severi, Catriona A McLean, Melissa C Southey, Emad Rakha, Andrew R Green, Ian O Ellis, Mark E
Sherman, Jolanta Lissowska, William F Anderson, Angela Cox, Simon S Cross, Malcolm W R Reed, Elena Provenzano, Sarah-Jane Dawson, Alison M Dunning, Manjeet
Humphreys, Douglas F Easton, Montserrat GarcÃa-Closas, Carlos Caldas, Paul D
Pharoah, and David Huntsman. Subtyping of breast cancer by immunohistochemistry
to investigate a relationship between subtype and short and long term survival: a collaborative analysis of data for 10,159 cases from 12 studies.
PLoS Med, 7(5):e1000279,
2010.
[25] Kerrie B Bouker, Todd C Skaar, Rebecca B Riggins, David S Harburger, David R
Fernandez, Alan Zwart, Antai Wang, and Robert Clarke. Interferon regulatory factor1 (irf-1) exhibits tumor suppressor activities in breast cancer associated with caspase
activation and induction of apoptosis.
Carcinogenesis, 26(9):15271535, Sep 2005.
[26] D. Branton, D. W. Deamer, A. Marziali, H. Bayley, S. A. Benner, T. Butler, M. Di
Ventra, S. Garaj, A. Hibbs, X. Huang, S. B. Jovanovich, P. S. Krstic, S. Lindsay,
X. S. Ling, C. H. Mastrangelo, A. Meller, J. S. Oliver, Y. V. Pershin, J. M. Ramsey,
R. Riehn, G. V. Soni, V. Tabard-Cossa, M. Wanunu, M. Wiggin, and J. A. Schloss.
The potential and challenges of nanopore sequencing.
Nat. Biotechnol., 26(10):1146 1153, 2008.
[27] B. S. Braun, R. Frieden, S. L. Lessnick, W. A. May, and C. T. Denny.
Identica-
tion of target genes for the ewing's sarcoma ews/i fusion protein by representational
dierence analysis.
Mol Cell Biol, 15(8):46234630, Aug 1995.
[28] Miguel H. Bronchud, editor.
Principles of Molecular Oncology.
Humana Press, 2
edition, 1 2000.
[29] L. F. Brown, A. J. Guidi, S. J. Schnitt, L. Van De Water, M. L. Iruela-Arispe, T. K.
Yeo, K. Tognazzi, and H. F. Dvorak.
Vascular stroma formation in carcinoma in
BIBLIOGRAPHY
118
situ, invasive carcinoma, and metastatic carcinoma of the breast.
Clin Cancer Res,
5(5):10411056, May 1999.
[30] Thijn R Brummelkamp, RenÃ
©
Bernards, and Reuven Agami. A system for stable
expression of short interfering rnas in mammalian cells.
Science,
296(5567):550553,
Apr 2002.
[31] Jeremy Buhler and Martin Tompa. Finding motifs using random projections.
J Comput
Biol, 9(2):225242, 2002.
[32] Sarah E Burdall, Andrew M Hanby, Mark R J Lansdown, and Valerie Speirs. Breast
cancer cell lines: friend or foe?
Breast Cancer Res, 5(2):8995, 2003.
[33] Yi Cai, Jianghua Wang, Rile Li, Gustavo Ayala, Michael Ittmann, and Mingyao Liu.
Ggap2/pike-a directly activates both the akt and nuclear factor-kappab pathways and
promotes prostate cancer progression.
Cancer Res, 69(3):819827, Feb 2009.
[34] R. Cailleau, M. Olive, and Q. V. Cruciger. Long-term human breast carcinoma cell
lines of metastatic origin: preliminary characterization.
In Vitro, 14(11):911915, Nov
1978.
[35] Xia Cao, Shuai Cheng Li, and Anthony K. H. Tung.
Indexing dna sequences using
q-grams. pages 416, 2005.
[36] L. R. Cardon and G. D. Stormo. Expectation maximization algorithm for identifying
protein-binding sites with variable lengths from unaligned dna fragments.
J Mol Biol,
223(1):159170, Jan 1992.
[37] Lisa A Carey, Charles M Perou, Chad A Livasy, Lynn G Dressler, David Cowan,
Kathleen Conway, Gamze Karaca, Melissa A Troester, Chiu Kit Tse, Sharon Edmiston,
Sandra L Deming, Joseph Geradts, Maggie C U Cheang, Torsten O Nielsen, Patricia G
Moorman, H. Shelton Earp, and Robert C Millikan. Race, breast cancer subtypes, and
survival in the carolina breast cancer study.
JAMA, 295(21):24922502, Jun 2006.
[38] J. A. Carman and J. G. Monroe. The egr1 protein contains a discrete transcriptional
regulatory domain whose deletion results in a truncated protein that blocks egr1induced transcription.
DNA Cell Biol, 14(7):581589, Jul 1995.
BIBLIOGRAPHY
119
[39] M. Carr, I. Hurley, K. Fowler, A. Pomiankowski, and H.K. Smith.
Expression of
defective proventriculus during head capsule development is conserved in drosophila
and stalk-eyed ies (diopsidae).
Dev Genes Evol, 215:4029, 2005.
[40] Hexin Chen and Saraswati Sukumar. Role of homeobox genes in normal mammary
gland development and breast tumorigenesis.
J Mammary Gland Biol Neoplasia,
8(2):159175, Apr 2003.
[41] A. Cheong, A.J. Bingham, J. Li, B. Kumar, P. Sukumar, C. Munsch, N.J. Buckley, C.B.
Neylon, K.E. Porter, D.J. Beech, and I.C. Wood.
Downregulated rest transcription
factor is a switch enabling critical potassium channel expression and cell proliferation.
Mol Cell, 20:4552, 2005.
[42] J. A. Chong, J. Tapia-RamÃrez, S. Kim, J. J. Toledo-Aral, Y. Zheng, M. C. Boutros,
Y. M. Altshuller, M. A. Frohman, S. D. Kraner, and G. Mandel.
Rest:
a mam-
malian silencer protein that restricts sodium channel gene expression to neurons.
Cell,
80(6):949957, Mar 1995.
[43] Jesper Christensen, Karl Agger, Paul A C Cloos, Diego Pasini, Simon Rose, Lau
Sennels, Juri Rappsilber, Klaus H Hansen, Anna Elisabetta Salcini, and Kristian Helin.
Rbp2 belongs to a family of demethylases, specic for tri-and dimethylated lysine 4 on
histone 3.
Cell, 128(6):10631076, Mar 2007.
[44] Gerhard Christofori.
Snail1 links transcriptional control with epigenetic regulation.
EMBO J, 29(11):17871789, Jun 2010.
[45] Jonas Cicenas.
The potential role of the egfr/erbb2 heterodimer in breast cancer.
Expert Opinion on Therapeutic Patents, 17(6):607616, 2007.
[46] Rachel Ann Clark, Roy Levine, and Suzanne Snedeker. The biology of breast cancer,
fact sheet 5. Technical report, Cornell University, College of Veterinary Medicine, Vet
Box 31, Ithaca, NY 14853-6401, October 1997. [Online; accessed 19-July-2010].
[47] P. M. Clissold and C. P. Ponting. Jmjc: cupin metalloenzyme-like domains in jumonji,
hairless and phospholipase a2beta.
Trends Biochem Sci, 26(1):79, Jan 2001.
[48] Nicole Cloonan, Alistair R R Forrest, Gabriel Kolle, Brooke B A Gardiner, Georey J
Faulkner, Mellissa K Brown, Darrin F Taylor, Anita L Steptoe, Shivangi Wani, Graeme
BIBLIOGRAPHY
120
Bethel, Alan J Robertson, Andrew C Perkins, Stephen J Bruce, Clarence C Lee,
Swati S Ranade, Heather E Peckham, Jonathan M Manning, Kevin J McKernan,
and Sean M Grimmond.
sequencing.
Stem cell transcriptome proling via massive-scale mrna
Nat Methods, 5(7):613619, Jul 2008.
[49] Elisabeth D Coene, Catarina Gadelha, Nicholas White, Ashraf Malhas, Benjamin
Thomas, Michael Shaw, and David J Vaux. A novel role for brca1 in regulating breast
cancer cell spreading and motility.
J Cell Biol, 192(3):497512, Feb 2011.
[50] Collins. Collins english dictionary: 30th anniversary edition (dictonary). 6 2010.
[51] Cynthia S Collins, Jiyong Hong, Lisa Sapinoso, Yingyao Zhou, Zheng Liu, Kenneth
Micklash, Peter G Schultz, and Garret M Hampton. A small interfering rna screen for
modulators of tumor cell motility identies map4k4 as a promigratory kinase.
Proc
Natl Acad Sci U S A, 103(10):37753780, Mar 2006.
[52] Kathleen Collins, Tyler Jacks, and Nikola P. Pavletich.
The cell cycle and cancer.
Proceedings of the National Academy of Sciences of the United States of America,
94(7):27762778, 1997.
[53] Gene Ontology Consortium. Creating the gene ontology resource: design and implementation.
Genome Res, 11(8):14251433, Aug 2001.
[54] Carlo M. Croce. Oncogenes and cancer.
New England Journal of Medicine, 358(5):502
511, 2008.
[55] Modan K Das and Ho-Kwok Dai.
A survey of dna motif nding algorithms.
BMC
Bioinformatics, 8 Suppl 7:S21, 2007.
[56] J. R. Daviea. Histone modications.
New Compr. Biochem., 39(03):205 240, 2004.
[57] I. de Belle, R. P. Huang, Y. Fan, C. Liu, D. Mercola, and E. D. Adamson. p53 and
Egr-1 additively suppress transformed growth in HT1080 cells but Egr-1 counteracts
p53-dependent apoptosis.
Oncogene, 18:36333642, Jun 1999.
[58] Geneviève P Delcuve, Mojgan Rastegar, and James R Davie. Epigenetic control.
Cell. Physiol., 219(2):24350, May 2009.
J.
BIBLIOGRAPHY
121
[59] I. Van der Auwera, R. Limame, P. van Dam, P. B. Vermeulen, L. Y. Dirix, and
S. J. Van Laere. Integrated mirna and mrna expression proling of the inammatory
breast cancer subtype.
Br J Cancer, 103(4):532541, Aug 2010.
[60] Agata Desantis, Annalisa Onori, Maria Grazia Di Certo, Elisabetta Mattei, Maurizio
Fanciulli, Claudio Passananti, and Nicoletta Corbi. Novel activation domain derived
from che-1 cofactor coupled with the articial protein jazz drives utrophin upregulation.
Neuromuscul Disord, 19(2):158162, Feb 2009.
[61] V. G. Deshpande and P. K. Ranjekar. Repetitive dna in three gramineae species with
low dna content.
Hoppe Seylers Z Physiol Chem, 361(8):12231233, Aug 1980.
[62] Peter D'Eustachio. Reactome knowledgebase of human biological pathways and processes.
Methods Mol Biol, 694:4961, 2011.
[63] J. Dubnau and G. Struhl. Rna recognition and translational regulation by a homeodomain protein.
Nature, 379(6567):694699, Feb 1996.
[64] Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison.
Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids.
Biological
Cambridge
University Press, 1 edition, 5 1998.
[65] S. R. Eddy. Prole hidden markov models.
Bioinformatics, 14(9):755763, 1998.
[66] S. J. Elledge.
preventing an identity crisis.
Cell cycle checkpoints:
Science,
274(5293):16641672, Dec 1996.
[67] I. T. Ernberg. Oncogenes and tumor growth factors in breast cancer. a minireview.
Acta Oncol, 29(3):331334, 1990.
[68] E. Ernst. Mistletoe for cancer?
Eur J Cancer, 37(1):911, Jan 2001.
[69] Jason Ernst and Manolis Kellis. Discovery and characterization of chromatin states
for systematic annotation of the human genome.
Nat Biotechnol, 28(8):817825, Aug
2010.
[70] Eleazar Eskin. From proles to patterns and back again: a branch and bound algorithm
for nding near optimal motif proles. pages 115124, 2004.
BIBLIOGRAPHY
122
[71] Manel Esteller. Cancer epigenomics: Dna methylomes and histone-modication maps.
Nat. Rev. Genet., 8(4):28698, Apr 2007.
[72] S. Falcon and R. Gentleman. Using gostats to test gene lists for go term association.
Bioinformatics, 23(2):257258, Jan 2007.
[73] Cheng Fan, Daniel S Oh, Lodewyk Wessels, Britta Weigelt, Dimitry S A Nuyten,
Andrew B Nobel, Laura J van't Veer, and Charles M Perou.
gene-expression-based predictors for breast cancer.
Concordance among
N Engl J Med,
355(6):560569,
Aug 2006.
[74] Xiaochun Fan, Zarmik Moqtaderi, Yi Jin, Yong Zhang, X. Shirley Liu, and Kevin
Struhl. Nucleosome depletion at yeast terminators is not intrinsic and can occur by
a transcriptional mechanism linked to 3'-end formation.
Proc Natl Acad Sci U S A,
107(42):1794517950, Oct 2010.
[75] M. Fanciulli, T. Bruno, M. Di Padova, R. De Angelis, S. Iezzi, C. Iacobini, A. Floridi,
and C. Passananti. Identication of a novel partner of rna polymerase ii subunit 11,
che-1, which interacts with and aects the growth suppression function of rb.
FASEB
J, 14(7):904912, May 2000.
[76] M Faronato and JM Coulson. Rest (re1-silencing transcription factor).
Atlas Genet
Cytogenet Oncol Haematol, 2010.
[77] E. R. Fearon.
Human cancer syndromes: clues to the origin and nature of cancer.
Science, 278(5340):10431050, Nov 1997.
[78] M. Fedurco, A. Romieu, S. Williams, I. Lawrence, and G. Turcatti.
Bta, a novel
reagent for dna attachment on glass and ecient generation of solid-phase amplied
dna colonies.
Nucleic Acids Res., 34(3):e22, 2006.
[79] Anthony P. Fejes, Gordon Robertson, Mikhail Bilenky, Richard Varhol, Matthew Bainbridge, and Steven J. M. Jones.
Findpeaks 3.1: a tool for identifying areas of en-
richment from massively parallel short-read sequencing technology.
Bioinformatics,
24(15):17291730, Aug 2008.
[80] B. Felding-Habermann, T. E. O'Toole, J. W. Smith, E. Fransvea, Z. M. Ruggeri,
M. H. Ginsberg, P. E. Hughes, N. Pampori, S. J. Shattil, A. Saven, and B. M. Mueller.
BIBLIOGRAPHY
123
Integrin activation controls metastasis in human breast cancer.
Proc Natl Acad Sci U
S A, 98(4):18531858, Feb 2001.
[81] Paolo Ferragina and Giovanni Manzini. Opportunistic data structures with applications. pages 390398, 2000.
[82] Paul Flicek and Ewan Birney. Sense from sequence reads: methods for alignment and
assembly.
Nat Methods, 6(11 Suppl):S6S12, Nov 2009.
[83] Aristide Floridi and Maurizio Fanciulli. Che-1: a new eector of checkpoints signaling.
Cell Cycle, 6(7):804806, Apr 2007.
[84] John A Foekens, Anieta M Sieuwerts, Marcel Smid, Maxime P Look, Vanja de Weerd,
Antonius W M Boersma, Jan G M Klijn, Erik A C Wiemer, and John W M Martens.
Four mirnas associated with aggressiveness of lymph node-negative, estrogen receptor-
Proc Natl Acad Sci U S A,
positive human breast cancer.
105(35):1302113026, Sep
2008.
[85] Federico Forneris, Claudia Binda, Antonio Adamo, Elena Battaglioli, and Andrea Mattevi. Structural basis of lsd1-corest selectivity in histone h3 recognition.
J Biol Chem,
282(28):2007020074, Jul 2007.
[86] Mario F. Fraga and Manel Esteller.
draft of histone modications.
Towards the human cancer epigenome: a rst
Cell Cycle, 4(10):13771381, Oct 2005.
[87] Yutao Fu, Manisha Sinha, Craig L Peterson, and Zhiping Weng. The insulator binding
protein ctcf positions 20 nucleosomes around its binding sites across the human genome.
PLoS Genet, 4(7):e1000138, 2008.
[88] G. Fuh and J. A. Wells.
breast cancer cell lines.
Prolactin receptor antagonists that inhibit the growth of
J Biol Chem, 270(22):1313313137, Jun 1995.
[89] P. Andrew Futreal, Lachlan Coin, Mhairi Marshall, Thomas Down, Timothy Hubbard,
Richard Wooster, Nazneen Rahman, and Michael R Stratton.
cancer genes.
A census of human
Nat Rev Cancer, 4(3):177183, Mar 2004.
[90] J. FÃ×llgrabe, N. Hajji, and B. Joseph. Cracking the death code: apoptosis-related
histone modications.
Cell Death Dier, 17(8):12381243, Aug 2010.
BIBLIOGRAPHY
124
[91] Federica Galeano, Anne Leroy, Claudia Rossetti, Irina Gromova, Philippe Gautier,
Liam P Keegan, Luca Massimi, Concezio Di Rocco, Mary A O'Connell, and Angela
Gallo. Human blcap transcript: new editing events in normal and cancerous tissues.
Int J Cancer, 127(1):127137, Jul 2010.
[92] A. L. Gashler, S. Swaminathan, and V. P. Sukhatme. A novel repression module, an
extensive activation domain, and a bipartite nuclear localization signal dened in the
immediate-early transcription factor egr-1.
Mol Cell Biol, 13(8):45564571, Aug 1993.
[93] L. Giacinti, P.P. Claudio, M. Lopez, and A. Giordano.
estrogen receptor alpha expression in breast cancer.
Epigenetic information and
Oncologist, 11:18, 2006.
[94] T. J. Gibson and J. Spring. Genetic redundancy in vertebrates: polyploidy and persistence of genes encoding multidomain proteins.
Trends Genet, 14(2):469; discussion
4950, Feb 1998.
[95] C. K. Glass and M. G. Rosenfeld. The coregulator exchange in transcriptional functions
of nuclear receptors.
Genes Dev, 14(2):121141, Jan 2000.
[96] M.J. Gray, J. Zhang, L.M. Ellis, G.L. Semenza, D.B. Evans, S.S. Watowich, and G.E.
Gallick. Hif-1alpha, stat3, cbp/p300 and ref-1/ape are components of a transcriptional
complex that regulates src-dependent hypoxia-induced expression of vegf in pancreatic
and prostate carcinomas.
Oncogene, 24:311020, 2005.
[97] Grazia Graziani, Lucio Tentori, Alessia Muzi, Matteo Vergati, Giuseppe Tringali, Giacomo Pozzoli, and Pierluigi Navarra. Evidence that corticotropin-releasing hormone
inhibits cell growth of human breast cancer cells via the activation of crh-r1 receptor
subtype.
Mol Cell Endocrinol, 264(1-2):4449, Jan 2007.
[98] Christopher Greenman, Philip Stephens, Raaella Smith, Gillian L Dalgliesh, Christopher Hunter, Graham Bignell, Helen Davies, Jon Teague, Adam Butler, Claire Stevens,
Sarah Edkins, Sarah O'Meara, Imre Vastrik, Esther E Schmidt, Tim Avis, Syd
Barthorpe, Gurpreet Bhamra, Gemma Buck, Bhudipa Choudhury, Jody Clements,
Jennifer Cole, Ed Dicks, Simon Forbes, Kris Gray, Kelly Halliday, Rachel Harrison, Katy Hills, Jon Hinton, Andy Jenkinson, David Jones, Andy Menzies, Tatiana
Mironenko, Janet Perry, Keiran Raine, Dave Richardson, Rebecca Shepherd, Alexandra Small, Calli Tofts, Jennifer Varian, Tony Webb, Soe West, Sara Widaa, Andy
BIBLIOGRAPHY
125
Yates, Daniel P Cahill, David N Louis, Peter Goldstraw, Andrew G Nicholson, Francis Brasseur, Leendert Looijenga, Barbara L Weber, Yoke-Eng Chiew, Anna DeFazio,
Mel F Greaves, Anthony R Green, Peter Campbell, Ewan Birney, Douglas F Easton,
Georgia Chenevix-Trench, Min-Han Tan, Sok Kean Khoo, Bin Tean Teh, Siu Tsan
Yuen, Suet Yi Leung, Richard Wooster, P. Andrew Futreal, and Michael R Stratton.
Patterns of somatic mutation in human cancer genomes.
Nature,
446(7132):153158,
Mar 2007.
[99] Obi L Grith, Stephen B Montgomery, Bridget Bernier, Bryan Chu, Katayoon Kasaian, Stein Aerts, Shaun Mahony, Monica C Sleumer, Mikhail Bilenky, Maximilian
Haeussler, Malachi Grith, Steven M Gallo, Belinda Giardine, Bart Hooghe, Peter Van Loo, Enrique Blanco, Amy Ticoll, Stuart Lithwick, Elodie Portales-Casamar,
Ian J Donaldson, Gordon Robertson, Claes Wadelius, Pieter De Bleser, Dominique
Vlieghe, Marc S Halfon, Wyeth Wasserman, Ross Hardison, Casey M Bergman, Steven
J M Jones, and Open Regulatory Annotation Consortium. Oreganno: an open-access
community-driven resource for regulatory annotation.
Nucleic Acids Res, 36(Database
issue):D107D113, Jan 2008.
[100] Christian J Gruber, Doris M Gruber, Isabel M L Gruber, Fritz Wieser, and Johannes C
Huber. Anatomy of the estrogen response element.
Trends Endocrinol Metab, 15(2):73
78, Mar 2004.
[101] Stefan GrÃ
¿
f, Fiona G G Nielsen, Stefan Kurtz, Martijn A Huynen, Ewan Birney,
Henk Stunnenberg, and Paul Flicek. Optimized design and assessment of whole genome
tiling arrays.
Bioinformatics, 23(13):i195i204, Jul 2007.
[102] J. L. Guan. Role of focal adhesion kinase in integrin signaling.
Int J Biochem Cell
Biol, 29(8-9):10851096, 1997.
[103] Kristin C Gunsalus and Fabio Piano. Rnai as a tool to study cell biology: building
the genome-phenome bridge.
Curr Opin Cell Biol, 17(1):38, Feb 2005.
[104] Carmen Gurrola-Diaz, Jeannine Lacroix, Susanne Dihlmann, Cord-Michael Becker,
and Magnus von Knebel Doeberitz. Reduced expression of the neuron restrictive silencer factor permits transcription of glycine receptor alpha1 subunit in small-cell lung
cancer cells.
Oncogene, 22(36):56365645, Aug 2003.
BIBLIOGRAPHY
126
[105] A J Hackett, H S Smith, E L Springer, R B Owens, W A Nelson-Rees, J L Riggs,
and M B Gardner. Two syngeneic cell lines from human breast tissue: the aneuploid
mammary epithelial (hs578t) and the diploid myoepithelial (hs578bst) cell lines.
J.
Natl. Cancer Inst., 58(6):1795806, 1977.
[106] D. Hanahan and R. A. Weinberg. The hallmarks of cancer.
Cell,
100(1):5770, Jan
2000.
[107] M. F. Hansen and W. K. Cavenee. Tumor suppressors: recessive mutations that lead
to cancer.
Cell, 53(2):173174, Apr 1988.
[108] R. K. Hansen and M. J. Bissell.
Tissue architecture and breast cancer: the role of
extracellular matrix and steroid hormones.
Endocr Relat Cancer,
7(2):95113, Jun
2000.
[109] Kimberly A Hartwell, Beth Muir, Ferenc Reinhardt, Anne E Carpenter, Dennis C
Sgroi, and Robert A Weinberg.
tumor metastasis.
The spemann organizer gene, goosecoid, promotes
Proc Natl Acad Sci U S A, 103(50):1896918974, Dec 2006.
[110] Nathaniel D Heintzman, Rhona K Stuart, Gary Hon, Yutao Fu, Christina W Ching,
R. David Hawkins, Leah O Barrera, Sara Van Calcar, Chunxu Qu, Keith A Ching, Wei
Wang, Zhiping Weng, Roland D Green, Gregory E Crawford, and Bing Ren. Distinct
and predictive chromatin signatures of transcriptional promoters and enhancers in the
human genome.
Nat Genet, 39(3):311318, Mar 2007.
[111] S. Heniko, E. McKittrick, and K. Ahmad. Epigenetics, histone h3 variants, and the
inheritance of chromatin states.
Cold Spring Harb Symp Quant Biol, 69:235243, 2004.
[112] Nicolas Herranz, Diego Pasini, Victor M Diaz, Clara Francis, Arantxa Gutierrez, Natalia Dave, Maria Escriva, Inma Hernandez-Munoz, Luciano Di Croce, Kristian Helin,
Antonio GarcÃa de Herreros, and Sandra Peiro. Polycomb complex 2 is required for ecadherin repression by the snail1 transcription factor.
Mol Cell Biol, 28(15):47724781,
Aug 2008.
[113] G. Z. Hertz and G. D. Stormo. Identifying dna and protein patterns with statistically
signicant alignments of multiple sequences.
Bioinformatics, 15(7-8):563577, 1999.
BIBLIOGRAPHY
127
[114] Geo S Higgins,
Adrian L Harris,
McKenna, and Francesca M Bua.
sis in early breast cancer patients.
[115] B.
G.
Homan,
G.
Robertson,
Remko Prevo,
Thomas Helleday,
W. Gillies
Overexpression of polq confers a poor progno-
Oncotarget, 1(3):175184, Jul 2010.
B.
Zavaglia,
M.
Beach,
R.
Cullum,
S.
Lee,
G. Soukhatcheva, L. Li, E. D. Wederell, N. Thiessen, M. Bilenky, T. Cezard, A. Tam,
B. Kamoh, I. Birol, D. Dai, Y. Zhao, M. Hirst, C. B. Verchere, C. D. Helgason, M. A.
Marra, S. J. Jones, and P. A. Hoodless. Locus co-occupancy, nucleosome positioning,
and H3K4me1 regulate the functionality of FOXA2-, HNF4A-, and PDX1-bound loci
in islets and liver.
[116] K.E.V. Holde.
Genome Res., 20:10371051, Aug 2010.
Chromatin (Springer series in molecular biology). Springer-Verlag Berlin
and Heidelberg GmbH & Co. K, 12 1989.
[117] Frederik Holst, Phillip R Stahl, Christian Ruiz, Olaf Hellwinkel, Zeenath Jehan, Marc
Wendland, Annette Lebeau, Luigi Terracciano, Khawla Al-Kuraya, Fritz JÃ
¿
nicke,
Guido Sauter, and Ronald Simon. Estrogen receptor alpha (esr1) gene amplication
is frequent in breast cancer.
Nat Genet, 39(5):655660, May 2007.
[118] B. Horard and J-M. Vanacker. Estrogen receptor-related receptors: orphan receptors
desperately seeking a ligand.
J Mol Endocrinol, 31(3):349357, Dec 2003.
[119] Hugo M Horlings, Anna Bergamaschi, Silje H Nordgard, Young H Kim, Wonshik Han,
Dong-Young Noh, Keyan Salari, Simon A Joosse, Fabien Reyal, Ole Christian Lingjaerde, Vessela N Kristensen, Anne-Lise BÃºrresen-Dale, Jonathan Pollack, and Marc J
van de Vijver. Esr1 gene amplication in breast cancer: a common phenomenon?
Nat
Genet, 40(7):8078; author reply 8102, Jul 2008.
[120] Da Wei Huang, Brad T Sherman, and Richard A Lempicki. Systematic and integrative
analysis of large gene lists using david bioinformatics resources.
Nat Protoc, 4(1):4457,
2009.
[121] T. Hubbard, D. Andrews, M. Caccamo, G. Cameron, Y. Chen, M. Clamp, L. Clarke,
G. Coates, T. Cox, F. Cunningham, V. Curwen, T. Cutts, T. Down, R. Durbin,
X. M. Fernandez-Suarez, J. Gilbert, M. Hammond, J. Herrero, H. Hotz, K. Howe,
V. Iyer, K. Jekosch, A. Kahari, A. Kasprzyk, D. Keefe, S. Keenan, F. Kokocinsci,
D. London, I. Longden, G. McVicker, C. Melsopp, P. Meidl, S. Potter, G. Proctor,
BIBLIOGRAPHY
128
M. Rae, D. Rios, M. Schuster, S. Searle, J. Severin, G. Slater, D. Smedley, J. Smith,
W. Spooner, A. Stabenau, J. Stalker, R. Storey, S. Trevanion, A. Ureta-Vidal, J. Vogel,
S. White, C. Woodwark, and E. Birney. Ensembl 2005.
Nucleic Acids Res, 33(Database
issue):D447D453, Jan 2005.
[122] Philip Hublitz, Mareike Albert, and Antoine H F M Peters. Mechanisms of transcriptional repression by histone lysine methylation.
Int. J. Dev. Biol.,
53(2-3):33554,
2009.
[123] C A Iacobuzio-Donahue. Epigenetic changes in cancer.
Annu Rev Pathol, 4:229249,
2009.
[124] Marilena V Iorio, Manuela Ferracin, Chang-Gong Liu, Angelo Veronese, Riccardo
Spizzo,
Silvia Sabbioni,
©
Campiglio, Sylvie MÃ
Eros Magri,
Massimo Pedriali,
Muller Fabbri,
Manuela
nard, Juan P Palazzo, Anne Rosenberg, Piero Musiani, Ste-
fano Volinia, Italo Nenci, George A Calin, Patrizia Querzoli, Massimo Negrini, and
Carlo M Croce. Microrna gene expression deregulation in human breast cancer.
Can-
cer Res, 65(16):70657070, Aug 2005.
[125] Elizabeth Iorns, Christopher J Lord, Nicholas Turner, and Alan Ashworth. Utilizing
rna interference to enhance cancer drug discovery.
Nat Rev Drug Discov, 6(7):556568,
Jul 2007.
[126] Shigeki Iwase, Fei Lan, Peter Bayliss, Luis de la Torre-Ubieta, Maite Huarte, Hank H
Qi, Johnathan R Whetstine, Azad Bonni, Thomas M Roberts, and Yang Shi.
The
x-linked mental retardation gene smcx/jarid1c denes a family of histone h3 lysine 4
demethylases.
Cell, 128(6):10771088, Mar 2007.
[127] Rudolf Jaenisch and Adrian Bird. Epigenetic regulation of gene expression: how the
genome integrates intrinsic and environmental signals.
Nat Genet, 33 Suppl:245254,
Mar 2003.
[128] T. Jenuwein and C. D. Allis. Translating the histone code.
Science, 293(5532):1074
1080, Aug 2001.
[129] Peter A Jones and Stephen B Baylin. The epigenomics of cancer.
Feb 2007.
Cell, 128(4):683692,
BIBLIOGRAPHY
129
[130] Roy Joseph, Yuriy L Orlov, Mikael Huss, Wenjie Sun, Say Li Kong, Leena Ukil, You Fu
Pan, Guoliang Li, Michael Lim, Jane S Thomsen, Yijun Ruan, Neil D Clarke, Shyam
Prabhakar, Edwin Cheung, and Edison T Liu. Integrative model of genomic factors
for determining binding site selection by estrogen receptor-alpha.
Mol Syst Biol, 6:456,
Dec 2010.
[131] Luke Jostins. Basics: Sequencing dna, part 1, april 2009.
[132] S. M. Judge and R. T. Chatterton. Progesterone-specific stimulation of triglyceride
biosynthesis in a breast cancer cell line (t-47d).
Cancer Res, 43(9):4407-4412, Sep 1983.
[133] Masahiro Kajita, Karissa N McClinic, and Paul A Wade. Aberrant expression of the
transcription factors snail and slug alters the response to genotoxic stress.
Mol Cell Biol, 24(17):7559-7566, Sep 2004.
[134] Minoru Kanehisa. The kegg database.
Novartis Found Symp, 247:91-101; discussion 101-3, 119-28, 244-52, 2002.
[135] Minoru Kanehisa, Susumu Goto, Miho Furumichi, Mao Tanabe, and Mika Hirakawa.
Kegg for representation and analysis of molecular networks involving diseases and
drugs.
Nucleic Acids Res, 38(Database issue):D355-D360, Jan 2010.
[136] Jin Seok Kang, Na Jin Jung, Seyl Kim, Dae Joong Kim, Dong Deuk Jang, and Ki-Hwa
Yang. Downregulation of estrogen receptor alpha and beta expression in carcinogen-induced mammary gland tumors of rats.
Eksp Onkol, 26(1):31-35, Mar 2004.
[137] J. Kao, K. Salari, M. Bocanegra, Y. L. Choi, L. Girard, J. Gandhi, K. A. Kwei,
T. Hernandez-Boussard, P. Wang, A. F. Gazdar, J. D. Minna, and J. R. Pollack.
Molecular profiling of breast cancer cell lines defines relevant tumor models and provides a resource for cancer gene discovery.
PLoS ONE, 4:e6146, 2009.
[138] Amy V Kapp, Stefanie S Jeffrey, Anita Langerød, Anne-Lise Børresen-Dale, Wonshik Han, Dong-Young Noh, Ida R K Bukholm, Monica Nicolau, Patrick O Brown,
and Robert Tibshirani. Discovery and validation of breast cancer subtypes.
BMC Genomics, 7:231, 2006.
[139] Juha Karkkainen. Fast bwt in small space by blockwise suffix sorting.
Theoretical Computer Science, 387(3):249-257, 2007.
The Burrows-Wheeler Transform.
[140] Vladimir I Kashuba, Jingfeng Li, Fuli Wang, Vera N Senchenko, Alexey Protopopov,
Alena Malyukova, Alexey S Kutsenko, Elena Kadyrova, Veronika I Zabarovska, Olga V
Muravenko, Alexander V Zelenin, Lev L Kisselev, Igor Kuzmin, John D Minna,
Gösta Winberg, Ingemar Ernberg, Eleonora Braga, Michael I Lerman, George Klein,
and Eugene R Zabarovsky. Rbsp3 (hya22) is a tumor suppressor gene implicated in
major epithelial malignancies.
Proc Natl Acad Sci U S A, 101(14):4906-4911, Apr 2004.
[141] Michael B Kastan and Jiri Bartek.
Cell-cycle checkpoints and cancer.
Nature, 432(7015):316-323, Nov 2004.
[142] Y. Katayose, M. Kim, A. N. Rakkar, Z. Li, K. H. Cowan, and P. Seth.
Promoting
apoptosis: a novel activity associated with the cyclin-dependent kinase inhibitor p27.
Cancer Res, 57(24):5441-5445, Dec 1997.
[143] M. Katoh and M. Katoh. Comparative genomics on snai1, snai2, and snai3 orthologs.
Oncol Rep, 14:1083-6, 2005.
[144] L. H. Kedes. Histone genes and histone messengers.
Annu Rev Biochem, 48:837-870, 1979.
[145] U. Keich and P. A. Pevzner.
Finding motifs in the twilight zone.
Bioinformatics, 18(10):1374-1381, Oct 2002.
[146] W. James Kent. Blat--the blast-like alignment tool.
Genome Res, 12(4):656-664, Apr 2002.
[147] W. James Kent, Charles W Sugnet, Terrence S Furey, Krishna M Roskin, Tom H
Pringle, Alan M Zahler, and David Haussler.
The human genome browser at ucsc.
Genome Res, 12(6):996-1006, Jun 2002.
[148] I. Keydar, L. Chen, S. Karby, F. R. Weiss, J. Delarea, M. Radu, S. Chaitcik, and H. J.
Brenner. Establishment and characterization of a cell line of human breast carcinoma
origin.
Eur J Cancer, 15(5):659-670, May 1979.
[149] Peter V Kharchenko, Michael Y Tolstorukov, and Peter J Park. Design and analysis
of chip-seq experiments for dna-binding proteins.
Nat Biotechnol, 26(12):1351-1359, Dec 2008.
[150] Purvesh Khatri and Sorin Draghici.
Ontological analysis of gene expression data:
current tools, limitations, and open problems.
Bioinformatics, 21(18):3587-3595, Sep 2005.
[151] Mi-Jung Kim, Jae Y Ro, Sei-Hyun Ahn, Hak Hee Kim, Sung-Bae Kim, and Gyungyub
Gong.
Clinicopathologic significance of the basal-like subtype of breast cancer: a
comparison with hormone receptor and her2/neu-overexpressing phenotypes.
Hum Pathol, 37(9):1217-1226, Sep 2006.
[152] Sung-Mi Kim, Hae-Jin Kee, Nakwon Choe, Ji-Young Kim, Hoon Kook, Hyun Kook,
and Sang-Beom Seo. The histone methyltransferase activity of whistle is important
for the induction of apoptosis and hdac1-mediated transcriptional repression.
Exp Cell Res, 313(5):975-983, Mar 2007.
[153] Robert J Klose, Eric M Kallin, and Yi Zhang. Jmjc-domain-containing proteins and
histone demethylation.
Nat Rev Genet, 7(9):715-727, Sep 2006.
[154] Robert J Klose, Qin Yan, Zuzana Tothova, Kenichi Yamane, Hediye Erdjument-Bromage, Paul Tempst, D. Gary Gilliland, Yi Zhang, and William G Kaelin.
The retinoblastoma binding protein rbp2 is an h3k4 demethylase.
Cell, 128(5):889-900, Mar 2007.
[155] A. G. Knudson. Mutation and cancer: statistical study of retinoblastoma.
Proc Natl Acad Sci U S A, 68(4):820-823, Apr 1971.
[156] A. G. Knudson. Two genetic hits (more or less) to cancer.
Nat Rev Cancer, 1(2):157-162, Nov 2001.
[157] Daniel C Koboldt, Li Ding, Elaine R Mardis, and Richard K Wilson. Challenges of
sequencing human genomes.
Brief Bioinform, 11(5):484-498, Sep 2010.
[158] Tony Kouzarides. Chromatin modifications and their function.
Cell, 128(4):693-705, Feb 2007.
[159] Ana Kozomara and Sam Griffiths-Jones.
mirbase: integrating microrna annotation and deep-sequencing data.
Nucleic Acids Res, 39(Database issue):D152-D157, Jan 2011.
[160] Anja Krones-Herzig, Shalu Mittal, Kelly Yule, Hongyan Liang, Chris English, Rafael
Urcis, Tarun Soni, Eileen D Adamson, and Dan Mercola.
Early growth response 1
acts as a tumor suppressor in vivo and in vitro via regulation of p53.
Cancer Res, 65(12):5133-5143, Jun 2005.
[161] Stefan Kubicek and Thomas Jenuwein. A crack in histone lysine methylation.
Cell, 119(7):903-906, Dec 2004.
[162] J. Kuntzer, D. Eggle, H. P. Lenhof, H. Burtscher, and S. Klostermann. The roche cancer genome database (rcgdb).
Hum Mutat, 31(4):407-413, 2010. Specific link to BRCA1 data available as:
http://rcgdb.bioinf.uni-sb.de/MutomeWeb/MutatedCellLines?query=672.
[163] M. Lachner, R. Sengupta, G. Schotta, and T. Jenuwein.
Trilogies of histone lysine
methylation as epigenetic landmarks of the eukaryotic genome.
Cold Spring Harb Symp Quant Biol, 69:209-218, 2004.
[164] Istvan Ladunga, editor.
Computational Biology of Transcription Factor Binding (Methods in Molecular Biology). Humana Press, 1st edition, 9 2010.
[165] J. R. Lambert, V. W. Bilanchone, and M. G. Cumsky.
The ord1 gene encodes a
transcription factor involved in oxygen regulation and is identical to ixr1, a gene that
confers cisplatin sensitivity to saccharomyces cerevisiae.
Proc Natl Acad Sci U S A,
91(15):7345-7349, Jul 1994.
[166] Anja Lambrechts, Marleen Van Troys, and Christophe Ampe. The actin cytoskeleton
in normal and pathological cell motility.
Int J Biochem Cell Biol, 36(10):1890-1909, Oct 2004.
[167] Fei Lan, Amanda Clair Nottke, and Yang Shi. Mechanisms involved in the regulation
of histone lysine demethylases.
Curr Opin Cell Biol, 20(3):316-325, Jun 2008.
[168] Ben Langmead, Cole Trapnell, Mihai Pop, and Steven L Salzberg.
Ultrafast and memory-efficient alignment of short dna sequences to the human genome.
Genome Biol, 10(3):R25, 2009.
[169] Amy L Lark, Chad A Livasy, Lynn Dressler, Dominic T Moore, Robert C Millikan,
Joseph Geradts, Mary Iacocca, David Cowan, Debbie Little, Rolf J Craven, and
William Cance. High focal adhesion kinase expression in invasive breast carcinomas is
associated with an aggressive phenotype.
Mod Pathol, 18(10):1289-1294, Oct 2005.
[170] E. Y. Lasfargues, W. G. Coutinho, and E. S. Redfield. Isolation of two human tumor
epithelial cell lines from solid breast carcinomas.
J. Natl. Cancer Inst., 61(4):967-978, 1978.
[171] B.V. Latinkic and J.C. Smith. Goosecoid and mix.1 repress brachyury expression and
are required for head formation in xenopus.
Development, 126:1769-79, 1999.
[172] C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu, A. F. Neuwald, and J. C.
Wootton. Detecting subtle sequence signals: a gibbs sampling strategy for multiple
alignment.
Science, 262(5131):208-214, Oct 1993.
[173] C. E. Lawrence and A. A. Reilly. An expectation maximization (em) algorithm for the
identification and characterization of common sites in unaligned biopolymer sequences.
Proteins, 7(1):41-51, 1990.
[174] Ju Youn Lee, Ji Yeon Park, and Bin Tian. Identification of mrna polyadenylation sites
in genomes using cdna sequences, expressed sequence tags, and trace.
Methods Mol Biol, 419:23-37, 2008.
[175] M. G. Lee, C. Wynder, N. Cooch, and R. Shiekhattar. An essential role for CoREST
in nucleosomal histone 3 lysine 4 demethylation.
Nature, 437:432-435, Sep 2005.
[176] Min Gyu Lee, Jessica Norman, Ali Shilatifard, and Ramin Shiekhattar. Physical and
functional association of a trimethyl h3k4 demethylase and ring6a/mblr, a polycomb-like protein.
Cell, 128(5):877-887, Mar 2007.
[177] Min Gyu Lee, Christopher Wynder, Daniel A Bochar, Mohamed-Ali Hakimi, Neil
Cooch, and Ramin Shiekhattar.
Functional interplay between histone demethylase and deacetylase enzymes.
Mol Cell Biol, 26(17):6395-6402, Sep 2006.
[178] William Lee, Zhaoshi Jiang, Jinfeng Liu, Peter M Haverty, Yinghui Guan, Jeremy
Stinson, Peng Yue, Yan Zhang, Krishna P Pant, Deepali Bhatt, Connie Ha, Stephanie
Johnson, Michael I Kennemer, Sankar Mohan, Igor Nazarenko, Colin Watanabe, Andrew B Sparks, David S Shames, Robert Gentleman, Frederic J de Sauvage, Howard
Stern, Ajay Pandita, Dennis G Ballinger, Radoje Drmanac, Zora Modrusan, Somasekar
Seshagiri, and Zemin Zhang. The mutation spectrum revealed by paired genome sequences from a lung cancer patient.
Nature, 465(7297):473-477, May 2010.
[179] Pascal Lefevre and Constanze Bonifer. Analyzing histone modification using
crosslinked chromatin treated with micrococcal nuclease.
Methods Mol Biol, 325:315-325, 2006.
[180] Hui Sun Leong and David Kipling. Text-based over-representation analysis of microarray gene lists with annotation bias.
Nucleic Acids Res, 37(11):e79, Jun 2009.
[181] M. A. Lever, J. P. Th'ng, X. Sun, and M. J. Hendzel. Rapid exchange of histone h1.1
on chromatin in living human cells.
Nature, 408(6814):873-876, Dec 2000.
[182] Samuel Levy, Granger Sutton, Pauline C Ng, Lars Feuk, Aaron L Halpern, Brian P
Walenz, Nelson Axelrod, Jiaqi Huang, Ewen F Kirkness, Gennady Denisov, Yuan
Lin, Jeffrey R MacDonald, Andy Wing Chun Pang, Mary Shago, Timothy B Stockwell, Alexia Tsiamouri, Vineet Bafna, Vikas Bansal, Saul A Kravitz, Dana A Busam,
Karen Y Beeson, Tina C McIntosh, Karin A Remington, Josep F Abril, John Gill, Jon
Borman, Yu-Hui Rogers, Marvin E Frazier, Stephen W Scherer, Robert L Strausberg,
and J. Craig Venter. The diploid genome sequence of an individual human.
PLoS Biol,
5(10):e254, Sep 2007.
[183] M. T. Lewis. Homeobox genes in mammary gland development and neoplasia.
Breast Cancer Res, 2(3):158-169, 2000.
[184] M. T. Lewis, S. Ross, P. A. Strickland, C. J. Snyder, and C. W. Daniel. Regulated
expression patterns of irx-2, an iroquois-class homeobox gene, in the human breast.
Cell Tissue Res, 296(3):549-554, Jun 1999.
[185] Timothy J Ley, Elaine R Mardis, Li Ding, Bob Fulton, Michael D McLellan, Ken Chen,
David Dooling, Brian H Dunford-Shore, Sean McGrath, Matthew Hickenbotham, Lisa
Cook, Rachel Abbott, David E Larson, Dan C Koboldt, Craig Pohl, Scott Smith, Amy
Hawkins, Scott Abbott, Devin Locke, Ladeana W Hillier, Tracie Miner, Lucinda Fulton, Vincent Magrini, Todd Wylie, Jarret Glasscock, Joshua Conyers, Nathan Sander,
Xiaoqi Shi, John R Osborne, Patrick Minx, David Gordon, Asif Chinwalla, Yu Zhao,
Rhonda E Ries, Jacqueline E Payton, Peter Westervelt, Michael H Tomasson, Mark
Watson, Jack Baty, Jennifer Ivanovich, Sharon Heath, William D Shannon, Rakesh
Nagarajan, Matthew J Walter, Daniel C Link, Timothy A Graubert, John F DiPersio,
and Richard K Wilson. Dna sequencing of a cytogenetically normal acute myeloid
leukaemia genome.
Nature, 456(7218):66-72, Nov 2008.
[186] Haitao Li, Serge Ilin, Wooikoon Wang, Elizabeth M Duncan, Joanna Wysocka,
C. David Allis, and Dinshaw J Patel.
Molecular basis for site-specific read-out of
histone h3k4me3 by the bptf phd finger of nurf.
Nature, 442(7098):91-95, Jul 2006.
[187] Heng Li and Richard Durbin. Fast and accurate short read alignment with burrows-wheeler transform.
Bioinformatics, 25(14):1754-1760, Jul 2009.
[188] Heng Li and Nils Homer.
A survey of sequence alignment algorithms for next-generation sequencing.
Brief Bioinform, 11(5):473-483, Sep 2010.
[189] Heng Li, Jue Ruan, and Richard Durbin. Mapping short dna sequencing reads and
calling variants using mapping quality scores.
Genome Res, 18(11):1851-1858, Nov 2008.
[190] Ruiqiang Li, Yingrui Li, Karsten Kristiansen, and Jun Wang.
Soap: short oligonucleotide alignment program.
Bioinformatics, 24(5):713-714, Mar 2008.
[191] Ruiqiang Li, Chang Yu, Yingrui Li, Tak-Wah Lam, Siu-Ming Yiu, Karsten Kristiansen,
and Jun Wang. Soap2: an improved ultrafast tool for short read alignment.
Bioinformatics, 25(15):1966-1967, Aug 2009.
[192] Harry M Lightfoot, Amy Lark, Chad A Livasy, Dominic T Moore, David Cowan,
Lynn Dressler, Rolf J Craven, and William G Cance. Upregulation of focal adhesion
kinase (fak) expression in ductal carcinoma in situ (dcis) is an early event in breast
tumorigenesis.
Breast Cancer Res Treat, 88(2):109-116, Nov 2004.
[193] Hao Lin, Zefeng Zhang, Michael Q Zhang, Bin Ma, and Ming Li. Zoom! zillions of
oligos mapped.
Bioinformatics, 24(21):2431-2437, Nov 2008.
[194] T. Lin, A. Ponn, X. Hu, B. K. Law, and J. Lu. Requirement of the histone demethylase lsd1 in snai1-mediated transcriptional repression during epithelial-mesenchymal
transition.
Oncogene, 29(35):4896-4904, Sep 2010.
[195] Edison T Liu, Sebastian Pott, and Mikael Huss. Q&a: Chip-seq technologies and the
study of gene regulation.
BMC Biol, 8:56, 2010.
[196] Jingbo Liu, Ya-Guang Liu, Ruochun Huang, Chen Yao, Shiyong Li, Weimin Yang,
Dongzi Yang, and Ruo-Pan Huang. Concurrent down-regulation of egr-1 and gelsolin
in the majority of human breast cancer cells.
Cancer Genomics Proteomics, 4(6):377-385, 2007.
[197] George Locke, Denis Tolkunov, Zarmik Moqtaderi, Kevin Struhl, and Alexandre V
Morozov. High-throughput sequencing reveals a simple model of nucleosome energetics.
Proc Natl Acad Sci U S A, 107(49):20998-21003, Dec 2010.
[198] Harvey Lodish, Arnold Berk, Chris A. Kaiser, Monty Krieger, Matthew P. Scott, Anthony
Bretscher, Hidde Ploegh, and Paul Matsudaira.
Molecular Cell Biology (Lodish, Molecular Cell Biology).
W. H. Freeman, 6th edition, 6 2007.
[199] Leandro A Loss, Anguraj Sadanandam, Steffen Durinck, Shivani Nautiyal, Diane
Flaucher, Victoria E H Carlton, Martin Moorhead, Yontao Lu, Joe W Gray, Malek
Faham, Paul Spellman, and Bahram Parvin.
Prediction of epigenetically regulated genes in breast cancer cell lines.
BMC Bioinformatics, 11:305, 2010.
[200] K. Luger, A. W. Mader, R. K. Richmond, D. F. Sargent, and T. J. Richmond. Crystal
structure of the nucleosome core particle at 2.8 a resolution.
Nature, 389(6648):251-260, Sep 1997.
[201] Margus Lukk, Misha Kapushesky, Janne Nikkilä, Helen Parkinson, Angela
Goncalves, Wolfgang Huber, Esko Ukkonen, and Alvis Brazma.
A global map of human gene expression.
Nat Biotechnol, 28(4):322-324, Apr 2010.
[202] Bin Ma, John Tromp, and Ming Li. Patternhunter: faster and more sensitive homology
search.
Bioinformatics, 18(3):440-445, Mar 2002.
[203] SC Macevicz.
Dna sequencing by parallel oligonucleotide extensions.
Biofutur, 1997(163):45-45, 1997.
[204] Jeffrey P MacKeigan, Leon O Murphy, and John Blenis.
Sensitized rnai screen of
human kinases and phosphatases identifies new regulators of apoptosis and chemoresistance.
Nat Cell Biol, 7(6):591-600, Jun 2005.
[205] M. Maemura and R. B. Dickson.
Are cellular adhesion molecules involved in the metastasis of breast cancer?
Breast Cancer Res Treat, 32(3):239-260, 1994.
[206] Donna Maglott, Jim Ostell, Kim D Pruitt, and Tatiana Tatusova. Entrez gene: gene-centered information at ncbi.
Nucleic Acids Res, 33(Database issue):D54-D58, Jan 2005.
[207] Shaun Mahony and Panayiotis V Benos. Stamp: a web tool for exploring dna-binding
motif similarities.
Nucleic Acids Res, 35(Web Server issue):W253-W258, Jul 2007.
[208] Lira Mamanova, Alison J Coffey, Carol E Scott, Iwanka Kozarewa, Emily H Turner,
Akash Kumar, Eleanor Howard, Jay Shendure, and Daniel J Turner. Target-enrichment
strategies for next-generation sequencing.
Nat Methods, 7(2):111-118, Feb 2010.
[209] Yan-Gao Man and Qing-Xiang Amy Sang. The significance of focal myoepithelial cell
layer disruptions in human breast tumor invasion: a paradigm shift from the "protease-centered" hypothesis.
Exp Cell Res, 301(2):103-118, Dec 2004.
[210] Elaine R Mardis. The impact of next-generation sequencing technology on genetics.
Trends Genet., 24(3):133-41, Mar 2008.
[211] Marc Mareel and Ancy Leroy.
Clinical, cellular, and molecular aspects of cancer invasion.
Physiol Rev, 83(2):337-376, Apr 2003.
[212] M. Margulies, M. Egholm, W. E. Altman, S. Attiya, J. S. Bader, L. A. Bemben,
J. Berka, M. S. Braverman, Y.-J. Chen, Z. Chen, S. B. Dewell, L. Du, J. M. Fierro,
X. V. Gomes, B. C. Godwin, W. He, S. Helgesen, C. H. Ho, C. H. Ho, G. P. Irzyk,
S. C. Jando, M. L. I. Alenquer, T. P. Jarvie, K. B. Jirage, J.-B. Kim, J. R. Knight,
J. R. Lanza, J. H. Leamon, S. M. Lefkowitz, M. Lei, J. Li, K. L. Lohman, H. Lu, V. B.
Makhijani, K. E. McDade, M. P. McKenna, E. W. Myers, E. Nickerson, J. R. Nobile,
R. Plant, B. P. Puc, M. T. Ronan, G. T. Roth, G. J. Sarkis, J. F. Simons, J. W.
Simpson, M. Srinivasan, K. R. Tartaro, A. Tomasz, K. A. Vogt, G. A. Volkmer, S. H.
Wang, Y. Wang, M. P. Weiner, P. Yu, R. F. Begley, and J. M. Rothberg.
Genome sequencing in microfabricated high-density picolitre reactors.
Nature, 437(7057):376-380, 2005.
[213] Joan Massague, Gaorav P Gupta, and Andy Minn. Method of predicting and reducing
risk of metastasis of breast cancer to lung, 2008.
[214] S. Matikainen, T. Ronni, M. Hurme, R. Pine, and I. Julkunen. Retinoic acid activates
interferon regulatory factor-1 gene expression in myeloid cells.
Blood, 88(1):114-123, Jul 1996.
[215] V. Matys, E. Fricke, R. Geffers, E. Gössling, M. Haubrock, R. Hehl, K. Hornischer,
D. Karas, A. E. Kel, O. V. Kel-Margoulis, D-U. Kloos, S. Land, B. Lewicki-Potapov,
H. Michael, R. Münch, I. Reuter, S. Rotert, H. Saxel, M. Scheer, S. Thiele, and
E. Wingender. Transfac: transcriptional regulation, from patterns to profiles.
Nucleic Acids Res, 31(1):374-378, Jan 2003.
[216] J. McBryan, J. Howlin, P. A. Kenny, T. Shioda, and F. Martin. Eralpha-cited1 coregulated genes expressed during pubertal mammary gland development: implications
for breast cancer prognosis.
Oncogene, 26(44):6406-6419, Sep 2007.
[217] Kevin Judd McKernan, Heather E Peckham, Gina L Costa, Stephen F McLaughlin,
Yutao Fu, Eric F Tsung, Christopher R Clouser, Cisyla Duncan, Jeffrey K Ichikawa,
Clarence C Lee, Zheng Zhang, Swati S Ranade, Eileen T Dimalanta, Fiona C Hyland,
Tanya D Sokolsky, Lei Zhang, Andrew Sheridan, Haoning Fu, Cynthia L Hendrickson,
Bin Li, Lev Kotler, Jeremy R Stuart, Joel A Malek, Jonathan M Manning, Alena A Antipova, Damon S Perez, Michael P Moore, Kathleen C Hayashibara, Michael R Lyons,
Robert E Beaudoin, Brittany E Coleman, Michael W Laptewicz, Adam E Sannicandro, Michael D Rhodes, Rajesh K Gottimukkala, Shan Yang, Vineet Bafna, Ali Bashir,
Andrew MacBride, Can Alkan, Jeffrey M Kidd, Evan E Eichler, Martin G Reese, Francisco M De La Vega, and Alan P Blanchard. Sequence and structural variation in a
human genome uncovered by short-read, massively parallel ligation sequencing using
two-base encoding.
Genome Res, 19(9):1527-1541, Sep 2009.
[218] T. A. McKinsey, C. L. Zhang, and E. N. Olson.
Activation of the myocyte enhancer factor-2 transcription factor by
calcium/calmodulin-dependent protein kinase-stimulated binding of 14-3-3 to histone deacetylase 5.
Proc Natl Acad Sci U S A, 97(26):14400-14405, Dec 2000.
[219] Gunter Meister, Markus Landthaler, Agnieszka Patkaniowska, Yair Dorsett, Grace
Teng, and Thomas Tuschl.
Human argonaute2 mediates rna cleavage targeted by mirnas and sirnas.
Mol Cell, 15(2):185-197, Jul 2004.
[220] Eric Metzger, Axel Imhof, Dharmeshkumar Patel, Philip Kahl, Katrin Hoffmeyer,
Nicolaus Friedrichs, Judith M Müller, Holger Greschik, Jutta Kirfel, Sujuan Ji,
Natalia Kunowska, Christian Beisenherz-Huss, Thomas Günther, Reinhard Buettner, and Roland Schüle.
Phosphorylation of histone h3t6 by pkcbeta(i) controls
demethylation at histone h3k4.
Nature, 464(7289):792-796, Apr 2010.
[221] Robert C Millikan, Beth Newman, Chiu-Kit Tse, Patricia G Moorman, Kathleen Conway, Lynn G Dressler, Lisa V Smith, Miriam H Labbok, Joseph Geradts, Jeannette T
Bensen, Susan Jackson, Sarah Nyante, Chad Livasy, Lisa Carey, H. Shelton Earp, and
Charles M Perou. Epidemiology of basal-like breast cancer.
Breast Cancer Res Treat, 109(1):123-139, May 2008.
[222] Thomas A Milne, Yali Dou, Mary Ellen Martin, Hugh W Brock, Robert G Roeder,
and Jay L Hess.
Mll associates specifically with a subset of transcriptionally active target genes.
Proc Natl Acad Sci U S A, 102(41):14765-14770, Oct 2005.
[223] S. B. Montgomery, O. L. Griffith, M. C. Sleumer, C. M. Bergman, M. Bilenky, E. D.
Pleasance, Y. Prychyna, X. Zhang, and S. J M Jones.
Oreganno: an open access
database and curation system for literature-derived promoters, transcription factor
binding sites and regulatory variation.
Bioinformatics, 22(5):637-640, Mar 2006.
[224] Susan E Moody, Denise Perez, Tien chi Pan, Christopher J Sarkisian, Carla P Portocarrero, Christopher J Sterner, Kathleen L Notorfrancesco, Robert D Cardiff, and
Lewis A Chodosh. The transcriptional repressor snail promotes mammary tumor recurrence.
Cancer Cell, 8(3):197-209, Sep 2005.
[225] Eyal Mor, Yuval Cabilly, Yona Goldshmit, Harel Zalts, Shira Modai, Liat Edry, Orna
Elroy-Stein, and Noam Shomron. Species-specific microrna roles elucidated following
astrocyte activation.
Nucleic Acids Res, Jan 2011.
[226] Ali Mortazavi, Brian A Williams, Kenneth McCue, Lorian Schaeffer, and Barbara
Wold. Mapping and quantifying mammalian transcriptomes by rna-seq.
Nat Methods, 5(7):621-628, Jul 2008.
[227] Ettore Mosca, Roberta Alfieri, Ivan Merelli, Federica Viti, Andrea Calabria, and Luciano Milanesi. A multilevel data integration resource for breast cancer study.
BMC Syst Biol, 4:76, 2010.
[228] David W. Mount. Bioinformatics: Sequence and genome analysis, second edition. 7
2004.
[229] Ugrappa Nagalakshmi, Zhong Wang, Karl Waern, Chong Shou, Debasish Raha, Mark
Gerstein, and Michael Snyder.
The transcriptional landscape of the yeast genome defined by rna sequencing.
Science, 320(5881):1344-1349, Jun 2008.
[230] Niranjan Nagarajan, Neil Jones, and Uri Keich. Computing the p-value of the information content from an alignment of multiple sequences.
Bioinformatics, 21 Suppl 1:i311-i318, Jun 2005.
[231] Tatsuya Nakamura, Toshiki Mori, Shinichiro Tada, Wladyslaw Krajewski, Tanya Rozovskaia, Richard Wassell, Garrett Dubois, Alexander Mazo, Carlo M Croce, and Eli
Canaani. All-1 is a histone methyltransferase that assembles a supercomplex of proteins involved in transcriptional regulation.
Mol Cell, 10(5):1119-1128, Nov 2002.
[232] S. Nandi, R. C. Guzman, and J. Yang. Hormones and mammary carcinogenesis in mice,
rats, and humans: a unifying hypothesis.
Proc Natl Acad Sci U S A, 92(9):3650-3657, Apr 1995.
[233] T. Narita, N. Kawakami-Kimura, M. Sato, N. Matsuura, S. Higashiyama, N. Taniguchi,
and R. Kannagi. Alteration of integrins by heparin-binding egf-like growth factor in
human breast cancer cells.
Oncology, 53(5):374-381, 1996.
[234] Martijn C. Nawijn, Andrej Alendar, and Anton Berns. For better or for worse: the
role of pim oncogenes in tumorigenesis.
Nat Rev Cancer, 11(1):23-34, January 2011.
[235] S. B. Needleman and C. D. Wunsch. A general method applicable to the search for
similarities in the amino acid sequence of two proteins.
J Mol Biol, 48(3):443-453, Mar 1970.
[236] Richard M Neve, Koei Chin, Jane Fridlyand, Jennifer Yeh, Frederick L Baehner, Tea
Fevr, Laura Clark, Nora Bayani, Jean-Philippe Coppe, Frances Tong, Terry Speed,
Paul T Spellman, Sandy DeVries, Anna Lapuk, Nick J Wang, Wen-Lin Kuo, Jackie L
Stilwell, Daniel Pinkel, Donna G Albertson, Frederic M Waldman, Frank McCormick,
Robert B Dickson, Michael D Johnson, Marc Lippman, Stephen Ethier, Adi Gazdar,
and Joe W Gray. A collection of breast cancer cell lines for the study of functionally
distinct cancer subtypes.
Cancer Cell, 10(6):515-27, 2006.
[237] Torsten O Nielsen, Forrest D Hsu, Kristin Jensen, Maggie Cheang, Gamze Karaca,
Zhiyuan Hu, Tina Hernandez-Boussard, Chad Livasy, Dave Cowan, Lynn Dressler,
Lars A Akslen, Joseph Ragaz, Allen M Gown, C. Blake Gilks, Matt van de Rijn, and
Charles M Perou. Immunohistochemical and clinical characterization of the basal-like
subtype of invasive breast carcinoma.
Clin Cancer Res, 10(16):5367-5374, Aug 2004.
[238] M. Angela Nieto. The snail superfamily of zinc-finger transcription factors.
Nat Rev Mol Cell Biol, 3(3):155-166, Mar 2002.
[239] Karl P Nightingale, Susanne Gendreizig, Darren A White, Charlotte Bradbury, Florian
Hollfelder, and Bryan M Turner. Cross-talk between histone modifications in response
to histone deacetylase inhibitors: Mll4 links histone h3 acetylation and histone h3k4
methylation.
J Biol Chem, 282(7):4408-4416, Feb 2007.
[240] Z. Ning, A. J. Cox, and J. C. Mullikin.
Ssaha: a fast search method for large dna databases.
Genome Res, 11(10):1725-1729, Oct 2001.
[241] J. D. Norris, D. Fan, S. A. Kerner, and D. P. McDonnell.
Identification of a third
autonomous activation domain within the human estrogen receptor.
Mol Endocrinol, 11(6):747-754, Jun 1997.
[242] D. Olmeda, M. Jordà, H. Peinado, A. Fabra, and A. Cano. Snail silencing effectively
suppresses tumour growth and invasiveness.
Oncogene, 26(13):1862-1874, Mar 2007.
[243] M. V. Olson. Human genetics: Dr watson's base pairs.
Nature, 452(7189):819-820, 2008.
[244] Lezanne Ooi and Ian C Wood. Chromatin crosstalk in development and disease: lessons
from rest.
Nat Rev Genet, 8(7):544-554, Jul 2007.
[245] Cynthia Osborne, Paschal Wilson, and Debu Tripathy. Oncogenes and tumor suppressor genes in breast cancer: potential diagnostic and therapeutic applications.
Oncologist, 9(4):361-377, 2004.
[246] Monica Di Padova, Tiziana Bruno, Francesca De Nicola, Simona Iezzi, Carmen
D'Angelo, Rita Gallo, Daniela Nicosia, Nicoletta Corbi, Annamaria Biroccio, Aristide Floridi, Claudio Passananti, and Maurizio Fanciulli. Che-1 arrests human colon
carcinoma cell proliferation by displacing hdac1 from the p21waf1/cip1 promoter.
J Biol Chem, 278(38):36496-36504, Sep 2003.
[247] Eduardo Parra and Jorge Ferreira.
The effect of sirna-egr-1 and camptothecin on
growth and chemosensitivity of breast cancer cell lines.
Oncol Rep, 23(4):1159-1165, Apr 2010.
[248] Chiara Pastrello, Jerry Polesel, Lara Della Puppa, Alessandra Viel, and Roberta
Maestro.
Association between hsa-mir-146a genotype and tumor age-of-onset in
brca1/brca2-negative familial breast and ovarian cancer patients.
Carcinogenesis, 31(12):2124-2126, Dec 2010.
[249] Giulio Pavesi, Paolo Mereghetti, Giancarlo Mauri, and Graziano Pesole. Weeder web:
discovery of transcription factor binding sites in a set of sequences from co-regulated
genes.
Nucleic Acids Res, 32(Web Server issue):W199-W203, Jul 2004.
[250] Shannon R Payne and Christopher J Kemp. Tumor suppressor genetics.
Carcinogenesis, 26(12):2031-2045, Dec 2005.
[251] Hector Peinado, Faustino Marin, Eva Cubillo, Hans-Juergen Stark, Norbert Fusenig,
M. Angela Nieto, and Amparo Cano.
Snail and e47 repressors of e-cadherin induce
distinct invasive and angiogenic properties in vivo.
J Cell Sci, 117(Pt 13):2827-2839, Jun 2004.
[252] Héctor Peinado, Francisco Portillo, and Amparo Cano. Transcriptional regulation
of cadherins during development and carcinogenesis.
Int J Dev Biol, 48(5-6):365-375, 2004.
[253] Steve Pells, editor.
Nuclear Reprogramming: Methods and Protocols (Methods in Molecular Biology).
Humana Press, 1st edition, 12 2010.
[254] T. V. Perneger. What's wrong with bonferroni adjustments.
BMJ, 316(7139):1236-1238, Apr 1998.
[255] C. M. Perou, T. Sørlie, M. B. Eisen, M. van de Rijn, S. S. Jeffrey, C. A. Rees, J. R. Pollack, D. T. Ross, H. Johnsen, L. A. Akslen, O. Fluge, A. Pergamenschikov, C. Williams,
S. X. Zhu, P. E. Lønning, A. L. Børresen-Dale, P. O. Brown, and D. Botstein. Molecular portraits of human breast tumours.
Nature, 406(6797):747-752, Aug 2000.
[256] V. Petit and J. P. Thiery.
Focal adhesions: structure and dynamics.
Biol Cell, 92(7):477-494, Oct 2000.
[257] Pavel A. Pevzner.
Computational Molecular Biology: An Algorithmic Approach (Computational Molecular Biology). The MIT Press, 1 edition, 8 2000.
[258] S. Pietrokovski. Searching databases of conserved sequence regions by aligning protein
multiple-alignments.
Nucleic Acids Res, 24(19):3836-3845, Oct 1996.
[259] Eva Pizzoferrato, Ye Liu, Andrea Gambotto, Michaele J Armstrong, Michael T Stang,
William E Gooding, Sean M Alber, Stuart H Shand, Simon C Watkins, Walter J
Storkus, and John H Yim. Ectopic expression of interferon regulatory factor-1 promotes
human breast cancer cell death and results in reduced expression of survivin.
Cancer Res, 64(22):8381-8388, Nov 2004.
[260] Anna Portela and Manel Esteller. Epigenetic modifications and human disease.
Nat Biotechnol, 28(10):1057-1068, Oct 2010.
[261] Sandra Porter. Watson's genome, venter's genome, what's the difference?
Scitizen, September 2007. [online]
http://scitizen.com/biotechnology/watson-s-genome-venter-s-genome-what-s-the-difference-_a-28-1038.html.
[262] H. W. C. Postma. Rapid sequencing of individual dna molecules in graphene nanogaps.
Nano Lett., 10(2):420-425, 2010.
[263] Alkes Price, Sriram Ramabhadran, and Pavel A Pevzner.
Finding subtle motifs by branching from sample strings.
Bioinformatics, 19 Suppl 2:ii149-ii155, Oct 2003.
[264] Alexandre Prieur, Franck Tirode, Pinchas Cohen, and Olivier Delattre.
Ews/fli-1
silencing and gene profiling of ewing cells reveal downstream oncogenic pathways and
a crucial role for repression of insulin-like growth factor binding protein 3.
Mol Cell Biol, 24(16):7275-7283, Aug 2004.
[265] Beatriz Pérez-Cadahía, Bojan Drobic, Protiti Khan, Chaitra C Shivashankar, and
James R Davie. Current understanding and importance of histone phosphorylation in
regulating chromatin biology.
Curr Opin Drug Discov Devel, 13(5):613-622, Sep 2010.
[266] Jane Qiu. Epigenetics: unfinished symphony.
Nature, 441(7090):143-145, May 2006.
[267] Aaron R Quinlan and Ira M Hall. Bedtools: a flexible suite of utilities for comparing
genomic features.
Bioinformatics, 26(6):841-842, Mar 2010.
[268] M. Raica, I. Jung, Anca Maria Cimpean, C. Suciu, and Anca Maria Muresan. From
conventional pathologic diagnosis to the molecular classification of breast carcinoma:
are we ready for the change?
Rom J Morphol Embryol, 50(1):5-13, 2009.
[269] E. A. Rakha, M. E. El-Sayed, A. R. Green, E. C. Paish, A. H. S. Lee, and I. O. Ellis.
Breast carcinoma with basal differentiation: A proposal for pathology definition based
on basal cytokeratin expression.
Histopathology, 50(4):434-438, 2007.
[270] Kim R Rasmussen, Jens Stoye, and Eugene W Myers.
Efficient q-gram filters for finding all epsilon-matches over a given length.
J Comput Biol, 13(2):296-308, Mar 2006.
[271] M. Ravipati, S. Katragadda, P. D. Swaminathan, J. Molnar, and E. Zarling.
Pharmacotherapy plus endoscopic intervention is more effective than pharmacotherapy or
endoscopy alone in the secondary prevention of esophageal variceal bleeding: a meta-analysis of randomized, controlled trials, volume 70. 2009.
[272] Chandan K Reddy, Yao-Chung Weng, and Hsiao-Dong Chiang.
Refining motifs by
improving information content scores using neighborhood profile search.
Algorithms
Mol Biol, 1:23, 2006.
[273] Sirigiri Divijendra Natha Reddy, Kazufumi Ohshiro, Suresh K Rayala, and Rakesh
Kumar. Microrna-7, a homeobox d10 target, inhibits p21-activated kinase 1 and regulates
its functions.
Cancer Res, 68(20):8195-8200, Oct 2008.
[274] K. L. Redmond, N. T. Crawford, H. Farmer, Z. C. D'Costa, G. J. O'Brien, N. E.
Buckley, R. D. Kennedy, P. G. Johnston, D. P. Harkin, and P. B. Mullan. T-box 2
represses NDRG1 through an EGR1-dependent mechanism to drive the proliferation
of breast cancer cells.
Oncogene, 29:3252-3262, Jun 2010.
[275] John S Reece-Hoyes, Bart Deplancke, M. Inmaculada Barrasa, Julia Hatzold, Ryan B
Smit, H. Efsun Arda, Patricia A Pope, Jeb Gaudet, Barbara Conradt, and Albertha
J M Walhout. The c. elegans snail homolog ces-1 can activate gene expression in vivo
and share targets with bhlh transcription factors.
Nucleic Acids Res, 37(11):3689-3698, Jun 2009.
[276] Jüri Reimand, Meelis Kull, Hedi Peterson, Jaanus Hansen, and Jaak Vilo. g:profiler -
a web-based toolset for functional profiling of gene lists from large-scale experiments.
Nucleic Acids Res, 35(Web Server issue):W193-W200, Jul 2007.
[277] K.L. Rice, D.J. Izon, J. Ford, A. Boodhoo, U.R. Kees, and W.K. Greene. Overexpression of stem cell associated aldh1a1, a target of the leukemogenic transcription factor
tlx1/hox11, inhibits lymphopoiesis and promotes myelopoiesis in murine hematopoietic
progenitors.
Leuk Res, 32:873-83, 2008.
[278] A. Gordon Robertson, Mikhail Bilenky, Angela Tam, Yongjun Zhao, Thomas Zeng,
Nina Thiessen, Timothee Cezard, Anthony P Fejes, Elizabeth D Wederell, Rebecca
Cullum, Ghia Euskirchen, Martin Krzywinski, Inanc Birol, Michael Snyder, Pamela A
Hoodless, Martin Hirst, Marco A Marra, and Steven J M Jones. Genome-wide
relationship between histone h3 lysine 4 mono- and tri-methylation and transcription
factor binding.
Genome Res, 18(12):1906-1917, Dec 2008.
[279] K. D. Robertson. Dna methylation, methyltransferases, and cancer.
Oncogene, 20(24):3139-3155, May 2001.
[280] Stefan Roepcke, Steffen Grossmann, Sven Rahmann, and Martin Vingron. T-reg
comparator: an analysis tool for the comparison of position weight matrices.
Nucleic Acids Res, 33(Web Server issue):W438-W441, Jul 2005.
[281] M. Ronaghi, S. Karamohamed, B. Pettersson, M. Uhlén, and P. Nyrén. Real-time dna
sequencing using detection of pyrophosphate release.
Anal. Biochem., 242(1):84-89, 1996.
[282] Stephen M Rumble, Phil Lacroute, Adrian V Dalca, Marc Fiume, Arend Sidow, and
Michael Brudno. Shrimp: accurate mapping of short color-space reads.
PLoS Comput
Biol, 5(5):e1000386, May 2009.
[283] Albin Sandelin, Wynand Alkema, Pär Engström, Wyeth W Wasserman, and Boris
Lenhard. Jaspar: an open-access database for eukaryotic transcription factor binding
profiles.
Nucleic Acids Res, 32(Database issue):D91-D94, Jan 2004.
[284] Albin Sandelin and Wyeth W Wasserman. Constrained binding site diversity within
families of transcription factors enhances pattern discovery bioinformatics.
J Mol Biol, 338(2):207-215, Apr 2004.
[285] F. Sanger and A. R. Coulson. A rapid method for determining sequences in dna by
primed synthesis with dna polymerase.
J. Mol. Biol., 94(3):441-448, 1975.
[286] Carla Sawan, Thomas Vaissière, Rabih Murr, and Zdenko Herceg. Epigenetic drivers
and genetic passengers on the road to cancer.
Mutat. Res., 642(1-2):1-13, Jul 2008.
[287] Eric E Schadt, Steve Turner, and Andrew Kasarskis. A window into third-generation
sequencing.
Hum Mol Genet, 19(R2):R227-R240, Oct 2010.
[288] C. J. Schoenherr and D. J. Anderson. The neuron-restrictive silencer factor (nrsf): a
coordinate repressor of multiple neuron-specific genes.
Science, 267(5202):1360-1363, Mar 1995.
[289] C. J. Schoenherr, A. J. Paquette, and D. J. Anderson. Identification of potential
target genes for the neuron-restrictive silencer factor.
Proc. Natl. Acad. Sci. U.S.A., 93:9881-9886, Sep 1996.
[290] Johannes H Schulte, Tobias Marschall, Marcel Martin, Philipp Rosenstiel, Pieter
Mestdagh, Stefanie Schlierf, Theresa Thor, Jo Vandesompele, Angelika Eggert, Stefan
Schreiber, Sven Rahmann, and Alexander Schramm. Deep sequencing reveals differential
expression of micrornas in favorable versus unfavorable neuroblastoma.
Nucleic Acids Res, 38(17):5919-5928, Sep 2010.
[291] S. P. Shah, R. D. Morin, J. Khattra, L. Prentice, T. Pugh, A. Burleigh, A. Delaney,
K. Gelmon, R. Guliany, J. Senz, C. Steidl, R. A. Holt, S. Jones, M. Sun, G. Leung,
R. Moore, T. Severson, G. A. Taylor, A. E. Teschendorff, K. Tse, G. Turashvili,
R. Varhol, R. L. Warren, P. Watson, Y. Zhao, C. Caldas, D. Huntsman, M. Hirst, M. A.
Marra, and S. Aparicio. Mutational evolution in a lobular breast tumour profiled at
single nucleotide resolution.
Nature, 461:809-813, Oct 2009.
[292] S. T. Sherry, M. H. Ward, M. Kholodov, J. Baker, L. Phan, E. M. Smigielski, and
K. Sirotkin. dbsnp: the ncbi database of genetic variation.
Nucleic Acids Res, 29(1):308-311, Jan 2001.
[293] Yujiang Shi, Fei Lan, Caitlin Matson, Peter Mulligan, Johnathan R Whetstine,
Philip A Cole, Robert A Casero, and Yang Shi. Histone demethylation mediated
by the nuclear amine oxidase homolog lsd1.
Cell, 119(7):941-953, Dec 2004.
[294] A. Sigal and V. Rotter. Oncogenic mutations of the p53 tumor suppressor: the demons
of the guardian of the genome.
Cancer Res, 60(24):6788-6793, Dec 2000.
[295] Emily Singer. Sequencing tumors to target treatment.
Technology review india, 2009.
[296] D. J. Slamon, G. M. Clark, S. G. Wong, W. J. Levin, A. Ullrich, and W. L. McGuire.
Human breast cancer: correlation of relapse and survival with amplification of the
her-2/neu oncogene.
Science, 235(4785):177-182, Jan 1987.
[297] Martha L Slattery, Erica Wolff, Michael D Hoffman, Daniel F Pellatt, Brett Milash,
and Roger K Wolff. Micrornas and colon and rectal cancer: Differential expression by
tumor location and subtype.
Genes Chromosomes Cancer, Dec 2010.
[298] T. F. Smith and M. S. Waterman. Identification of common molecular subsequences.
J Mol Biol, 147(1):195-197, Mar 1981.
[299] N. R. Soman, P. Correa, B. A. Ruiz, and G. N. Wogan.
The tpr-met oncogenic
rearrangement is present and expressed in human gastric carcinoma and precursor
lesions.
Proc Natl Acad Sci U S A, 88(11):4892-4896, Jun 1991.
[300] H. Song, X. Jin, and J. Lin. Stat3 upregulates mek5 expression in human breast cancer
cells.
Oncogene, 23:8301-9, 2004.
[301] Wiley W. Souba and Douglas W. Wilmore, editors.
Surgical Research. Academic Press, 1st edition, 2 2001.
[302] H. D. Soule, J. Vazguez, A. Long, S. Albert, and M. Brennan. A human cell line from a
pleural effusion derived from a breast carcinoma.
J Natl Cancer Inst, 51(5):1409-1416, Nov 1973.
[303] B. D. Strahl and C. D. Allis. The language of covalent histone modifications.
Nature, 403(6765):41-45, Jan 2000.
[304] Michael R. Stratton.
Exploring the genomes of cancer cells: Progress and promise.
Science, 331(6024):1553-1558, 2011.
[305] Xiaohua Su, Deepavali Chakravarti, Min Soon Cho, Lingzhi Liu, Young Jin Gi, Yu-Li
Lin, Marco L Leung, Adel El-Naggar, Chad J Creighton, Milind B Suraokar, Ignacio
Wistuba, and Elsa R Flores. Tap63 suppresses metastasis through coordinate regulation of dicer and mirnas.
Nature, 467(7318):986-990, Oct 2010.
[306] Zu-Wen Sun and C. David Allis. Ubiquitination of histone h2b regulates h3 methylation
and gene silencing in yeast.
Nature, 418(6893):104-108, Jul 2002.
[307] A. H. Swirnoff, E. D. Apel, J. Svaren, B. R. Sevetson, D. B. Zimonjic, N. C. Popescu,
and J. Milbrandt. Nab1, a corepressor of ngfi-a (egr-1), contains an active
transcriptional repression domain.
Mol Cell Biol, 18(1):512-524, Jan 1998.
[308] T. Sørlie, C. M. Perou, R. Tibshirani, T. Aas, S. Geisler, H. Johnsen, T. Hastie,
M. B. Eisen, M. van de Rijn, S. S. Jeffrey, T. Thorsen, H. Quist, J. C. Matese, P. O.
Brown, D. Botstein, P. Eystein Lønning, and A. L. Børresen-Dale. Gene expression
patterns of breast carcinomas distinguish tumor subclasses with clinical implications.
Proc Natl Acad Sci U S A, 98(19):10869-10874, Sep 2001.
[309] Rulla M Tamimi, Heather J Baer, Jonathan Marotti, Mark Galan, Laurie Galaburda,
Yineng Fu, Anne C Deitz, James L Connolly, Stuart J Schnitt, Graham A Colditz,
and Laura C Collins. Comparison of molecular phenotypes of ductal carcinoma in situ
and invasive breast cancer.
Breast Cancer Res, 10(4):R67, 2008.
[310] H. Tanaka and T. Kawai. Partial sequencing of a single dna molecule with a scanning
tunnelling microscope.
Nat Nanotechnol, 4(8):518-522, 2009.
[311] M. Tanaka, M. Schinke, H. S. Liao, N. Yamasaki, and S. Izumo. Nkx2.5 and nkx2.6,
homologs of drosophila tinman, are required for development of the pharynx.
Mol Cell Biol, 21(13):4391-4398, Jul 2001.
[312] Xiaoqing Tian and Jingyuan Fang.
Current perspectives on histone demethylases.
Acta Biochim Biophys Sin (Shanghai), 39(2):81-88, Feb 2007.
[313] Martin Tompa, Nan Li, Timothy L Bailey, George M Church, Bart De Moor, Eleazar
Eskin, Alexander V Favorov, Martin C Frith, Yutao Fu, W. James Kent, Vsevolod J
Makeev, Andrei A Mironov, William Stafford Noble, Giulio Pavesi, Graziano Pesole,
Mireille Régnier, Nicolas Simonis, Saurabh Sinha, Gert Thijs, Jacques van Helden,
Mathias Vandenbogaert, Zhiping Weng, Christopher Workman, Chun Ye, and Zhou
Zhu. Assessing computational tools for the discovery of transcription factor binding
sites.
Nat Biotechnol, 23(1):137-144, Jan 2005.
[314] Alejandro Vaquero, Alejandra Loyola, and Danny Reinberg. The constantly changing
face of chromatin.
Sci Aging Knowledge Environ, 2003(14):RE4, Apr 2003.
[315] Ignacio Varela, Patrick Tarpey, Keiran Raine, Dachuan Huang, Choon Kiat Ong,
Philip Stephens, Helen Davies, David Jones, Meng-Lay Lin, Jon Teague, Graham
Bignell, Adam Butler, Juok Cho, Gillian L Dalgliesh, Danushka Galappaththige, Chris
Greenman, Claire Hardy, Mingming Jia, Calli Latimer, King Wai Lau, John Marshall,
Stuart McLaren, Andrew Menzies, Laura Mudie, Lucy Stebbings, David A Largaespada, L. F A Wessels, Stephane Richard, Richard J Kahnoski, John Anema, David A
Tuveson, Pedro A Perez-Mancera, Ville Mustonen, Andrej Fischer, David J Adams,
Alistair Rust, Waraporn Chan-on, Chutima Subimerb, Karl Dykema, Kyle Furge, Peter J Campbell, Bin Tean Teh, Michael R Stratton, and P. Andrew Futreal. Exome
sequencing identifies frequent mutation of the swi/snf complex gene pbrm1 in renal
carcinoma.
Nature, 469(7331):539-542, Jan 2011.
[316] Sonia Vega, Aixa V Morales, Oscar H Ocana, Francisco Valdes, Isabel Fabregat, and
M. Angela Nieto. Snail blocks the cell cycle and confers resistance to cell death.
Genes Dev, 18(10):1131-1143, May 2004.
[317] Reiner A Veitia. Dominant negative factors in health and disease.
J Pathol, 218(4):409-418, Aug 2009.
[318] R. I. Viji, V. B Sameer Kumar, M. S. Kiran, and P. R. Sudhakaran. Angiogenic
response of endothelial cells to heparin-binding domain of fibronectin.
Int J Biochem Cell Biol, 40(2):215-226, 2008.
[319] M. Wadman. James watson's genome sequenced at high speed.
Nature, 452(7189):788,
2008.
[320] M. P. Wagoner, K. T. Gunsalus, B. Schoenike, A. L. Richardson, A. Friedl, and
A. Roopra. The transcription factor REST is lost in aggressive breast cancer.
PLoS
Genet., 6:e1000979, 2010.
[321] Gang G. Wang, C. David Allis, and Ping Chi. Chromatin remodeling and cancer, part
ii: Atp-dependent chromatin remodeling.
Trends Mol Med, 13(9):373-380, Sep 2007.
[322] L. Wang, Q. Wu, P. Qiu, A. Mirza, M. McGuirk, P. Kirschmeier, J. R. Greene,
Y. Wang, C. B. Pickett, and S. Liu. Analyses of p53 target genes in the human genome
by bioinformatic and microarray approaches.
J Biol Chem, 276(47):43604-43610, Nov 2001.
[323] Ting Wang and Gary D Stormo. Combining phylogenetic data with co-regulated genes
to identify regulatory motifs.
Bioinformatics, 19(18):2369-2380, Dec 2003.
[324] Zhibin Wang, Chongzhi Zang, Jeffrey A Rosenfeld, Dustin E Schones, Artem Barski,
Suresh Cuddapah, Kairong Cui, Tae-Young Roh, Weiqun Peng, Michael Q Zhang, and
Keji Zhao. Combinatorial patterns of histone acetylations and methylations in the
human genome.
Nat. Genet., 40(7):897-903, 2008.
[325] Zhong Wang, Mark Gerstein, and Michael Snyder. Rna-seq: a revolutionary tool for
transcriptomics.
Nat Rev Genet, 10(1):57-63, Jan 2009.
[326] Wyeth W Wasserman and Albin Sandelin. Applied bioinformatics for the identification
of regulatory elements.
Nat Rev Genet, 5(4):276-287, Apr 2004.
[327] Robert A Waterland and Randy L Jirtle. Transposable elements: targets for early
nutritional effects on epigenetic gene regulation.
Mol Cell Biol, 23(15):5293-5300, Aug 2003.
[328] Ian C G Weaver, Frances A Champagne, Shelley E Brown, Sergiy Dymov, Shakti
Sharma, Michael J Meaney, and Moshe Szyf. Reversal of maternal programming of
stress responses in adult offspring through methyl supplementation: altering epigenetic
marking later in life.
J Neurosci, 25(47):11045-11054, Nov 2005.
[329] A. Wellstein, W. J. Fang, A. Khatri, Y. Lu, S. S. Swain, R. B. Dickson, J. Sasse,
A. T. Riegel, and M. E. Lippman.
A heparin-binding growth factor secreted from
breast cancer cells homologous to a developmentally regulated cytokine.
J Biol Chem, 267(4):2582-2587, Feb 1992.
[330] Thomas F Westbrook, Guang Hu, Xiaolu L Ang, Peter Mulligan, Natalya N Pavlova,
Anthony Liang, Yumei Leng, Rene Maehr, Yang Shi, J. Wade Harper, and Stephen J
Elledge. Scfbeta-trcp controls oncogenic transformation and neural differentiation
through rest degradation.
Nature, 452(7185):370-374, Mar 2008.
[331] David A. Wheeler, Maithreyan Srinivasan, Michael Egholm, Yufeng Shen, Lei Chen,
Amy McGuire, Wen He, Yi-Ju Chen, Vinod Makhijani, G. Thomas Roth, Xavier
Gomes, Karrie Tartaro, Faheem Niazi, Cynthia L. Turcotte, Gerard P. Irzyk, James R.
Lupski, Craig Chinault, Xing-zhi Song, Yue Liu, Ye Yuan, Lynne Nazareth, Xiang Qin,
Donna M. Muzny, Marcel Margulies, George M. Weinstock, Richard A. Gibbs, and
Jonathan M. Rothberg. The complete genome of an individual by massively parallel
dna sequencing.
Nature, 452(7189):872-876, Apr 2008.
[332] David L Wheeler, Tanya Barrett, Dennis A Benson, Stephen H Bryant, Kathi Canese,
Deanna M Church, Michael DiCuccio, Ron Edgar, Scott Federhen, Wolfgang Helmberg, David L Kenton, Oleg Khovayko, David J Lipman, Thomas L Madden, Donna R
Maglott, James Ostell, Joan U Pontius, Kim D Pruitt, Gregory D Schuler, Lynn M
Schriml, Edwin Sequeira, Steven T Sherry, Karl Sirotkin, Grigory Starchenko, Tugba O
Suzek, Roman Tatusov, Tatiana A Tatusova, Lukas Wagner, and Eugene Yaschenko.
Database resources of the national center for biotechnology information.
Nucleic Acids Res, 33(Database issue):D39-D45, Jan 2005.
[333] Nava Whiteford, Tom Skelly, Christina Curtis, Matt E Ritchie, Andrea Löhr,
Alexander Wait Zaranek, Irina Abnizova, and Clive Brown. Swift: primary data analysis for
the illumina solexa sequencing platform.
Bioinformatics, 25(17):2194-2199, Sep 2009.
[334] Brian T Wilhelm, Samuel Marguerat, Stephen Watt, Falk Schubert, Valerie Wood, Ian
Goodhead, Christopher J Penkett, Jane Rogers, and Jürg Bähler. Dynamic repertoire
of a eukaryotic transcriptome surveyed at single-nucleotide resolution.
Nature, 453(7199):1239-1243, Jun 2008.
[335] Laura D Wood, D. Williams Parsons, Siân Jones, Jimmy Lin, Tobias Sjöblom,
Rebecca J Leary, Dong Shen, Simina M Boca, Thomas Barber, Janine Ptak, Natalie
Silliman, Steve Szabo, Zoltan Dezso, Vadim Ustyanksky, Tatiana Nikolskaya, Yuri
Nikolsky, Rachel Karchin, Paul A Wilson, Joshua S Kaminker, Zemin Zhang, Randal
Croshaw, Joseph Willis, Dawn Dawson, Michail Shipitsin, James K V Willson,
Saraswati Sukumar, Kornelia Polyak, Ben Ho Park, Charit L Pethiyagoda, P. V Krishna
Pant, Dennis G Ballinger, Andrew B Sparks, James Hartigan, Douglas R Smith,
Erick Suh, Nickolas Papadopoulos, Phillip Buckhaults, Sanford D Markowitz, Giovanni
Parmigiani, Kenneth W Kinzler, Victor E Velculescu, and Bert Vogelstein. The
genomic landscapes of human breast and colorectal cancers.
Science, 318(5853):1108-1113, Nov 2007.
[336] Kenichi Yamane, Keisuke Tateishi, Robert J Klose, Jia Fang, Laura A Fabrizio, Hediye
Erdjument-Bromage, Joyce Taylor-Papadimitriou, Paul Tempst, and Yi Zhang. Plu-1
is an h3k4 demethylase involved in transcriptional repression and breast cancer cell
proliferation.
Mol Cell, 25(6):801-812, Mar 2007.
[337] Maojun Yang, Christian B Gocke, Xuelian Luo, Dominika Borek, Diana R Tomchick,
Mischa Machius, Zbyszek Otwinowski, and Hongtao Yu. Structural basis for
corest-dependent demethylation of nucleosomes by the human lsd1 histone demethylase.
Mol Cell, 23(3):377-387, Aug 2006.
[338] Fruma Yehiely, Jose V Moyano, Joseph R Evans, Torsten O Nielsen, and Vincent L
Cryns. Deconstructing the molecular portrait of basal-like breast cancer.
Trends Mol Med, 12(11):537-544, Nov 2006.
[339] Hong Yu, Shanshan Zhu, Bing Zhou, Huiling Xue, and Jing-Dong J Han. Inferring
causal relationships among different histone modifications and gene expression.
Genome Res., 18(8):1314-24, Aug 2008.
[340] Hua Yu, Marcin Kortylewski, and Drew Pardoll. Crosstalk between cancer and immune
cells: role of stat3 in the tumour microenvironment.
Nat Rev Immunol, 7(1):41-51, Jan 2007.
[341] J. S. Yu, S. Koujak, S. Nagase, C-M. Li, T. Su, X. Wang, M. Keniry, L. Memeo,
A. Rojtman, M. Mansukhani, H. Hibshoosh, B. Tycko, and R. Parsons. Pcdh8, the
human homolog of papc, is a candidate tumor suppressor of breast cancer.
Oncogene, 27(34):4657-4665, Aug 2008.
[342] N. Zhang, W. Shen, R. G. Hawley, and M. Lu. Hox11 interacts with ctf1 and mediates
hematopoietic precursor cell immortalization.
Oncogene, 18(13):2273-2279, Apr 1999.
[343] Y. Zhang and D. Reinberg. Transcription regulation by histone methylation:
interplay between different covalent modifications of the core histone tails.
Genes Dev, 15(18):2343-2360, Sep 2001.
[344] Yupeng Zheng, Sam John, James J Pesavento, Jennifer R Schultz-Norton, R. Louis
Schiltz, Sonjoon Baek, Ann M Nardulli, Gordon L Hager, Neil L Kelleher, and Craig A
Mizzen. Histone h1 phosphorylation is associated with transcription by rna
polymerases i and ii.
J Cell Biol, 189(3):407-415, May 2010.
[345] Qin Zhou, Jinjin Fan, Xuebing Ding, Wenxing Peng, Xueqing Yu, Yueqin Chen, and
Jing Nie. Tgf-beta-induced mir-491-5p expression promotes par-3 degradation in rat
proximal tubular epithelial cells.
J Biol Chem, 285(51):40019-40027, Dec 2010.
[346] XiaoGuang Zhou, LuFeng Ren, YunTao Li, Meng Zhang, YuDe Yu, and Jun Yu. The
next-generation sequencing technology: a technology review and future perspective.
Sci China Life Sci, 53(1):44-57, Jan 2010.
[347] Surekha M. Zingde. Cancer genes.
Current Science, 81(5):5085141, September 10 2001.