Proteomics of body fluids as a source of potential
methods for medical diagnostics
Prof. Dr. Evgeny Nikolaev
Institute for Biochemical Physics,
Rus. Acad. Sci., Moscow, Russia.
Institute for Energy Problems of Chemical Physics
Rus. Acad. Sci., Moscow, Russia.
High throughput proteome analyses by tandem
mass spectrometry methods
Workflow: Proteins -> digestion -> Peptides -> HPLC/MS -> MS/MS spectra (parent and fragment ion intensities) -> Mascot search against a protein DB -> protein and peptide identifications
[Figure: example MS/MS spectrum, S14_1 #3422, RT 52.14, NL 4.69E2; ITMS + c ESI d w Full ms2 600.81@cid35.00 [155.00-1215.00]. Axes: m/z vs. relative abundance.]
Problems of methods based on MS/MS
identification
- Sensitivity is lost: only MS/MS spectra are informative,
and their intensity is at least ~10-fold lower than
the intensity of MS spectra
- Not all peptides can be detected in a single
run
- The extra time needed to measure fragment spectra
requires longer chromatography (the use of
UPLC is questionable for some types of MS
instruments)
[Figure: relative numbers of new peptide identifications in several consecutive LC-MS runs (1, 2, 3) of the same sample.]
Another possibility in proteomics:
mass spectrometry with high mass measurement
accuracy
(From Alan Marshall, NHMFL)
An ion cyclotron resonance mass spectrometer can
measure masses with sub-ppm accuracy
[Figure: FTMS instrument schematic. Labels: FTMS data, linear ion trap, magnet (7 T), electron gun, IR laser.]
Other mass spectrometers with high mass measurement
accuracy are available now
Orbitraps
Q-TOFs
BRUKER micrOTOF-QII
...
Mass accuracy: 1-2 ppm (internal calibration), 5 ppm (external calibration)
Resolution: 20 000-60 000 FWHM
Rate of mass spectra measurements: >20 Hz
At an accuracy level of 1 ppm, the elemental composition
of a peptide with mass up to 600 Da
and
the amino acid composition of a peptide with mass up to
500 Da can be determined almost unambiguously.
This is not enough for peptide identification!
Accurate mass tag and retention time
Dick Smith group (PNNL)
Besides the accurate mass, we have another tag: LC retention time.
An accurate mass tag together with the retention time
can identify a peptide practically unambiguously!
LC reproducibility: Agilent 1100
[Figure: four stacked base peak chromatograms of urine runs (urine_1-5_01ul_150min, urine2nd, urine3thd, urine4th), RT 46.10-80.40; FTMS + p ESI Full ms [350.00-2000.00]; NL 1.07E6, 1.44E6, 1.83E6 and 5.89E5. Corresponding peak labels agree between runs to within a few tenths of a minute, e.g. 55.89, 55.66, 55.65 and 55.90.]
...TGLYCESQTPRSLTLGIEPVSPTSLRVGLQRYVQLRSLR ...
Vasorin (Homo sapiens protein)
Trypsin digestion gives the peptides:
…TGLYCESQTPR, SLTLGIEPVSPTSLR, VGLQR, YVQLR, SLR
SLTLGIEPVSPTSLR is fragment (463-477) of Vasorin
LC-MS/MS (e.g. with ion trap): identification
LC-FTICR: validation
[Figure: MS/MS spectrum with assigned fragment ions b6-b14 and y4-y13 (m/z axis), and a mass spectrum over m/z 200-1,800 with an inset at m/z 522.5-525.0.]
Accurate measured mass: 1568.8768
Putative mass tag from Homo sapiens: SLTLGIEPVSPTSLR
Calculated mass: 1568.8773
Together with the measured retention time, this gives a
validated accurate mass tag (SLTLGIEPVSPTSLR)
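The agreement between the calculated and measured mass can be checked with a few lines of Python. This is a minimal sketch, not the software used for the slides: it sums standard monoisotopic residue masses plus one water, and the function names monoisotopic_mass and ppm_error are chosen here for illustration only.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O is added for the free N- and C-termini


def monoisotopic_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def ppm_error(measured, theoretical):
    """Relative mass deviation in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


calculated = monoisotopic_mass("SLTLGIEPVSPTSLR")
error = ppm_error(1568.8768, calculated)  # measured mass from the slide
print(f"calculated {calculated:.4f} Da, error {error:.2f} ppm")
```

For the Vasorin peptide the deviation comes out at roughly 0.3 ppm in magnitude, which is consistent with the sub-ppm accuracy quoted for FTICR instruments.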
Thus, the general idea is to use MS/MS to create a
database of accurate mass tags and retention times as a
reference for quantitative proteomics
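A lookup against such a reference database could be sketched as follows. Everything here is illustrative: the AmtTag class, the match_feature function, the tolerance values and the normalized elution times (NET) are assumptions, and the second database entry is an invented near-isobaric peptide added only to show how retention time resolves a mass ambiguity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AmtTag:
    sequence: str
    mass: float  # theoretical monoisotopic mass, Da
    net: float   # normalized elution time (assumed 0..1 scale)


def match_feature(mass, net, tags, ppm_tol=5.0, net_tol=0.02):
    """Return all tags whose mass AND elution time agree with the feature."""
    hits = []
    for tag in tags:
        ppm = abs(mass - tag.mass) / tag.mass * 1e6
        if ppm <= ppm_tol and abs(net - tag.net) <= net_tol:
            hits.append(tag)
    return hits


tags = [
    AmtTag("SLTLGIEPVSPTSLR", 1568.8773, 0.45),  # NET value is made up
    AmtTag("INVENTEDPEPTIDE", 1568.8790, 0.80),  # hypothetical near-isobaric entry
]

# An LC-MS feature without MS/MS: measured mass and normalized elution time.
hits = match_feature(1568.8768, 0.46, tags)
print([tag.sequence for tag in hits])
```

Both entries lie within 5 ppm of the measured mass, so mass alone would leave two candidates; the elution time filter leaves only one.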
Analyses of the urine proteome
Urine is available in large quantities,
which makes it an ideal analyte for noninvasive diagnostics.
The possibility of biomarker discovery is attracting a lot of
attention.
1500 proteins!!! (from Mann’s group:
Adachi et al., Genome Biology 2006, Volume 7, Issue 9, Article R80)
Accurate mass tag retention time approach
[Diagram labels: Lab, Lab, Clinic; FT MS, ESI Q-TOF, ESI TOF.]
Statistics of the collected AMT tags in the urine
proteome
233 LC-MS (liquid chromatography coupled with mass
spectrometry) runs in total
(80% men and 20% women),
and
25 samples from each of 6 volunteers in long-term isolation
experiments (over 19 weeks), have been collected so far.
Number of peptides in the database: 2758
Number of urine proteins in the database: 840
Two kinds of sample donors:
people “from the street”
and
people in “special conditions”.
Screening before including a donor:
- General blood analysis
- Examination by an internist
- Blood pressure measurement
- Current control for urogenital and other pathology, including kidney
pathology, prostatitis, arterial hypertension, diabetes
- Analysis of archival information from medical records
- Control for treatment with diuretics and excessive
consumption of fluids
Result: decision to include a person
in the study group
Data recorded for each sample
1. Number
2. Name
3. Date of birth
4. Sex
5. Date of urine collection
6. Time of urine collection
7. Current smoking status (+/-)
8. Sample volume
9. Clinical parameters (other diseases)
10. Results of testing for bilirubin, urobilinogen, ketones,
glucose, protein, blood, nitrite, pH, specific gravity, leukocytes
For the “healthy people database” subset
we need urine samples from persons on a well-controlled
diet and with a healthy lifestyle.
In this case we can test urine temporal variability
and polymorphism.
These are people participating in long-term
isolation experiments within space
research programs, April-July 2009.
Ground-based experimental facility
Urine collection
Centrifugation
Sample concentration (Amicon Ultra Ultracel-15, 3 kDa)
Desalting and major protein removal
Carboxymethylation and trypsin digestion
LC-MS analyses
Search engine: Mascot
Database: IPI.Human v.3.52
Parent Tolerance: ± 5.0 PPM (Monoisotopic)
Fragment Tolerance: ± 0.50 Da (Monoisotopic)
Fixed Modifications: Carbamidomethyl (C)
Variable Modifications: Oxidation(M)
Digestion Enzyme: Trypsin
Max Missed Cleavages: 2
Instrument type: Ion-trap
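The enzyme and missed-cleavage settings define which candidate peptides the search engine considers. The sketch below is not Mascot's implementation; it only illustrates the common trypsin rule (cleave after K or R, but not before P) with up to two missed cleavages, applied to the Vasorin excerpt shown earlier. The function name tryptic_peptides is chosen here for illustration, and the first piece is incomplete because the excerpt starts mid-protein.

```python
import re


def tryptic_peptides(protein, max_missed=2):
    """In-silico trypsin digest: cleave after K/R unless followed by P."""
    # Zero-width split points (requires Python 3.7+ for empty-match splits).
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for i in range(len(pieces)):
        # Join 1..(max_missed + 1) consecutive pieces starting at piece i.
        last = min(i + max_missed, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


fragment = "TGLYCESQTPRSLTLGIEPVSPTSLRVGLQRYVQLRSLR"
fully_cleaved = tryptic_peptides(fragment, max_missed=0)
with_missed = tryptic_peptides(fragment, max_missed=2)
print(fully_cleaved)
print(len(with_missed), "candidates with up to 2 missed cleavages")
```

Allowing missed cleavages enlarges the candidate list, which is why the precursor tolerance (here ± 5 ppm) matters for keeping the search specific.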
What is in the DB
• Run in which this peptide was identified
• Peptide sequence
• Protein this sequence belongs to
• Mascot score
• Modifications
• Measured mass
• Theoretical mass
• Measured charge
• RT when the peptide began to elute from the column
• RT when the peptide finished eluting
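One database entry with the fields listed above could be represented as a small record type. This is a sketch under assumptions: the class and field names are hypothetical, and apart from the sequence, protein name and the two masses (taken from the Vasorin example), the values in the usage example are invented placeholders.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PeptideRecord:
    run_id: str                     # run in which the peptide was identified
    sequence: str
    protein: str
    mascot_score: float
    modifications: Tuple[str, ...]
    measured_mass: float            # Da
    theoretical_mass: float         # Da
    charge: int
    rt_start: float                 # min, start of elution
    rt_end: float                   # min, end of elution

    @property
    def mass_error_ppm(self):
        """Deviation of the measured mass from theory, in ppm."""
        return (self.measured_mass - self.theoretical_mass) / self.theoretical_mass * 1e6

    @property
    def elution_width(self):
        """Width of the elution window in minutes."""
        return self.rt_end - self.rt_start


record = PeptideRecord(
    run_id="example_run",           # placeholder
    sequence="SLTLGIEPVSPTSLR",
    protein="Vasorin",
    mascot_score=60.0,              # placeholder
    modifications=(),
    measured_mass=1568.8768,
    theoretical_mass=1568.8773,
    charge=3,                       # placeholder
    rt_start=55.2,                  # placeholder
    rt_end=55.9,                    # placeholder
)
print(f"{record.mass_error_ppm:.2f} ppm, {record.elution_width:.1f} min wide")
```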
Retention time normalization
Normalization: time scale alignment for a series of experiments
Several types of normalization are possible:
- By an added calibrant, i.e. external calibration (e.g.
Cytochrome C)
- By theoretically predicted RTs
- By peptides that are always present in the samples (for
example, peptides of the digestion enzyme)
We have chosen the last one, as it is rather robust and
doesn’t require any additional sample treatment.
RTs are renormalized every time a new run is added to the
database.
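Normalization by always-present peptides can be sketched as a linear least-squares alignment of the anchor peptides, followed by applying the fitted line to every peptide in the run. The slides do not specify the fitting procedure, so the linear model, the function names and all retention times below are assumptions; the anchor names stand in for, e.g., peptides of the digestion enzyme.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def normalize_run(run_rts, reference_rts):
    """Map a run's RTs onto the reference time scale via shared anchors."""
    anchors = sorted(set(run_rts) & set(reference_rts))
    if len(anchors) < 2:
        raise ValueError("need at least two anchor peptides")
    slope, intercept = fit_line(
        [run_rts[p] for p in anchors],
        [reference_rts[p] for p in anchors],
    )
    return {p: slope * rt + intercept for p, rt in run_rts.items()}


# Invented example: the new run elutes everything one minute late.
reference = {"ANCHOR_A": 20.0, "ANCHOR_B": 40.0, "ANCHOR_C": 60.0}
run = {"ANCHOR_A": 21.0, "ANCHOR_B": 41.0, "ANCHOR_C": 61.0,
       "SLTLGIEPVSPTSLR": 56.0}
normalized = normalize_run(run, reference)
print(normalized["SLTLGIEPVSPTSLR"])
```

In this toy case the fit recovers a slope of 1 and an offset of one minute, so the peptide of interest is moved back onto the reference scale.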
Normalization for runs without MS/MS
HPLC retention is considered to be linear, so different masses should retain their elution order
from run to run. We can use known masses as pivots and look for the same sequence of masses in the
run without MS/MS; our goal is to find the longest common subsequence.
[Figure: database masses sorted by average NET for a peptide, aligned with the peak list of run 5 (without MS/MS) sorted by RT; example masses 758.1, 878.1, 1024.5, 1150.3, 1575.1, 2330.9. The elution sequence of known masses is retained.]
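The longest common subsequence search can be sketched with the textbook dynamic-programming algorithm, with exact equality replaced by a ppm tolerance. This is an illustration rather than the authors' code: the function names are hypothetical, and the two lists reuse masses from the figure in an invented order.

```python
def same_mass(a, b, ppm_tol=5.0):
    """True if two masses agree within the given ppm tolerance."""
    return abs(a - b) / b * 1e6 <= ppm_tol


def longest_common_subsequence(reference, run, ppm_tol=5.0):
    """Index pairs (i, j) of the longest order-preserving mass match."""
    n, m = len(reference), len(run)
    # table[i][j] = LCS length of reference[i:] and run[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if same_mass(reference[i], run[j], ppm_tol):
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover the matched pairs.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if same_mass(reference[i], run[j], ppm_tol):
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


# Known masses sorted by average NET, and a run's peaks sorted by RT.
known = [758.1, 1024.5, 1575.1, 1150.3]
peaks = [878.1, 758.1, 1024.5, 2330.9, 1575.1, 1150.3, 1024.5]
pairs = longest_common_subsequence(known, peaks)
print(pairs)
```

The late peak at 1024.5 matches a known mass but breaks the elution order, so it is left out of the subsequence; only the order-preserving matches are then used as pivots for the RT fit.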
Normalization of runs without MS/MS
RT correlation between all masses
within 5 ppm
• 2 runs of different urine samples
performed at a 1-day interval
• Plotted are the RTs of all masses
matching within a 5 ppm tolerance
• Linear least-squares fit: y = 0.8988x + 3.704, R² = 0.7774
• An R² of only 0.7774 is poor: many of the
mass matches are false
[Figure: scatter plot of t2 (min) vs. t1 (min).]
RT correlation of the longest common
subsequence
• Linear least-squares fit: y = 0.9838x + 0.8356, R² = 0.9996
• An R² of 0.9996 means we have
an almost perfectly linear
correlation between the 2 datasets
[Figure: scatter plot of t2 (min) vs. t1 (min).]
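The effect of false mass matches on the fit quality can be reproduced with a small calculation. For a straight-line least-squares fit, R² equals the squared Pearson correlation of the two RT series, which is what the sketch computes. Only the line coefficients 0.9838 and 0.8356 come from the slide; the retention times and the two false matches are invented.

```python
def r_squared(xs, ys):
    """R^2 of a straight-line least-squares fit (squared Pearson r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Points lying on the fitted line from the slide (invented t1 values).
t1 = [20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
t2 = [0.9838 * t + 0.8356 for t in t1]
r2_clean = r_squared(t1, t2)

# Add two invented false matches: same mass, unrelated elution times.
r2_with_false = r_squared(t1 + [25.0, 66.0], t2 + [65.0, 22.0])
print(f"{r2_clean:.4f} vs {r2_with_false:.4f}")
```

Even a couple of coincidental mass matches pull R² far below 1, which is why the fit is restricted to the longest common subsequence.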
The total number of identified proteins in the database during
its creation/filling stage. Vertical blue arrows show steps of
equal protein count increase, the length of horizontal arrows
parallel to the abscissa axis is proportional to the time needed
to identify an additional protein.
[Figure: total number of identified proteins (0-700) vs. number of LC-MS runs (0-120).]
Smokers vs. non-smokers urine proteome
Current statistics of the urinary proteome database
233 LC-MS (liquid chromatography coupled with mass
spectrometry) runs in total:
102 with samples from smokers,
131 with samples from non-smokers.
Using all peptides
                    Peptides   Proteins
Non-smokers         2527       762
Smokers             1893       627
Total               2758       840

Overlap between non-smokers and smokers
                    Peptides   Proteins
Non-smokers only    865        213
Shared              1662       549
Smokers only        231        78
Not shared          40%        35%
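The overlap numbers follow from the three totals by inclusion-exclusion, which can be verified directly. The function name overlap is chosen here for illustration; the inputs are the counts from the table.

```python
def overlap(count_a, count_b, count_union):
    """Split two overlapping sets into only-A, shared, only-B counts."""
    shared = count_a + count_b - count_union  # inclusion-exclusion
    only_a = count_a - shared
    only_b = count_b - shared
    not_shared_fraction = (only_a + only_b) / count_union
    return only_a, shared, only_b, not_shared_fraction


# Non-smokers, smokers, total (from the table).
peptides = overlap(2527, 1893, 2758)
proteins = overlap(762, 627, 840)
print(peptides[:3], f"{peptides[3]:.0%}")
print(proteins[:3], f"{proteins[3]:.0%}")
```

The same check applied to a random split of a single group gives a baseline for how much of the difference comes from sampling alone.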
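Using all peptides (non-smokers split into odd and even subsets)
                    Peptides   Proteins
Odd                 2232       445
Even                2306       467
Non-smokers         2535       506

Overlap between the odd and even subsets
                    Peptides   Proteins
Odd only            229        49
Shared              2003       406
Even only           303        61
Not shared          20%        21%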
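Using all peptides (smokers split into two selections)
                    Peptides   Proteins
Selection1          1723       365
Selection2          1588       337
Smokers             1894       400

Overlap between Selection1 and Selection2
                    Peptides   Proteins
Selection1 only     306        63
Shared              1417       302
Selection2 only     171        35
Not shared          25%        25%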
Differences between smokers and non-smokers in the numbers
of observed urine proteins participating in particular
biological processes:
transport, homophilic cell adhesion, lipid
metabolic process, inflammatory response,
innate immune response, epidermis
development, defense response