Supplementary - University of Oxford

Structure, Volume 23
Supplemental Information
Collision Cross Sections for Structural Proteomics
Erik G. Marklund, Matteo T. Degiacomi, Carol V. Robinson, Andrew J. Baldwin*, and Justin L.P. Benesch*
Department of Chemistry, Physical & Theoretical Chemistry Laboratory, University of Oxford, South Parks Road, Oxford, Oxfordshire, OX1 3QZ, U.K.
*Correspondence to justin.benesch@chem.ox.ac.uk, andrew.baldwin@chem.ox.ac.uk
Contains: Supplemental Data (Tables S1–S2, Supplemental Figures S1–S3), Supplemental Experimental Procedures, Supplemental References

SUPPLEMENTAL DATA
$a$ | $b$ | $\sigma_\varepsilon$ (%) | $\sigma_{r,\mathrm{IMPACT}}^{(\mathrm{raw})}$ (%) | $\sigma_{r,\mathrm{TJM}}$ (%) | $\sigma_a$ (%)
0.843 | 1.051 | 1.08 | 0.10 | 0.49 | 0.95
Table S1, related to Figure 1: Calibration of IMPACT allows for accurate calculations. $a$ and $b$ are parameters for calibration of IMPACT against TJM, obtained by fitting the power law $\Omega_{\mathrm{TJM}} = a\,\Omega_{\mathrm{IMPACT}}^{b}$ to the calculated data (Fig 1B, Supplementary Figure S1A). $\sigma_\varepsilon$ is the standard deviation of the observed distribution of relative error between the calibrated IMPACT and TJM, which includes both systematic differences between the methods and statistical uncertainty from the MC integration. $\sigma_{r,\mathrm{IMPACT}}^{(\mathrm{raw})}$ and $\sigma_{r,\mathrm{TJM}}$ are the statistical uncertainties for IMPACT and TJM, expressed as relative standard deviations. From these quantities we can estimate the discrepancy between TJM and IMPACT to be $\sigma_a = 0.95\%$ in the limit of infinite precision (Supplemental Experimental Procedures), meaning that IMPACT can typically predict TJM results to within 1% for folded proteins.
[Figure S1. A: collision cross-section (TJM) / Å² versus collision cross-section (IMPACT) / Å². B: normalised standard error and normalised wall time versus number of interleaved calculations.]
Figure S1, related to Figure 1: Accuracy, robustness, and wall time. A: Similar to Fig. 1B, but including a linear fit for comparison. Both the linear and power-law fits contain the same number of degrees of freedom (two). The fitted power law yields lower errors than the linear fit and is therefore a better choice for calibrating PA against TJM. B: For 10,000 probes fired in total per run, distributed over $N$ interleaved calculations: the relative standard error of the mean, $s_r$, normalised by the target precision $\tau$ (blue), and the wall time required for convergence, normalised by the rightmost data point (red). For $N < 16$, and particularly for low $\tau$, $s_r$ is underestimated by IMPACT and the calculations stop before reaching the target precision, which is also reflected by a shorter wall time for low $N$. For $N \gtrsim 32$, however, IMPACT correctly estimates the standard error of the calculations.
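As a minimal sketch of the calibration in Table S1, the power law $\Omega_{\mathrm{TJM}} = a\,\Omega_{\mathrm{IMPACT}}^{b}$ can be fitted by ordinary least squares on the logarithms, since $\ln\Omega_{\mathrm{TJM}} = \ln a + b\ln\Omega_{\mathrm{IMPACT}}$. The fitting procedure actually used (log or linear space, weighting) is not stated, so this choice, and the function names `fit_power_law` and `calibrate`, are assumptions for illustration.

```python
import math


def fit_power_law(omega_impact, omega_tjm):
    """Fit omega_tjm = a * omega_impact**b by least squares in log-log space.

    Assumption: unweighted fit on logarithms; all CCS values must be positive.
    Returns the tuple (a, b).
    """
    xs = [math.log(x) for x in omega_impact]
    ys = [math.log(y) for y in omega_tjm]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def calibrate(omega_raw, a, b):
    """Map a raw PA value from IMPACT onto the TJM scale."""
    return a * omega_raw ** b


# Synthetic data generated exactly from the Table S1 parameters (illustration only)
raw = [500.0, 1500.0, 5000.0, 20000.0]
tjm = [calibrate(x, 0.843, 1.051) for x in raw]
a, b = fit_power_law(raw, tjm)
print(f"a = {a:.3f}, b = {b:.3f}")  # a = 0.843, b = 1.051
```

With real, noisy data, a fit in log space weights relative rather than absolute deviations, which suits CCS values spanning several orders of magnitude; a nonlinear fit in linear space would be an equally plausible alternative.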
Assembly | PDB code | $m$ / MDa | $\Omega$ / Å² | No. of atoms | $t_{\mathrm{opt}}$ ($t_0$) / s
70S ribosome | 2WDK–N | 2.4 | 40,975 | 148,020 | 1.3 (15.3)
STNV | 2BUK* | 1.8 | 33,627 | 182,258 | 1.2 (6.3)
The vault | 2ZUO, 2ZV4–5 | 3.5 | 285,372 | 483,912 | 2.9 (50.4)
Adenovirus | 1VSZ | 89 | 1,076,957 | 5,543,280 | 10.7 (47.4)
Table S2, related to Figure 2: Large protein complexes used for assessing the effect of octrees on IMPACT's performance. Four large macromolecular assemblies that, due to their size, are challenging targets for CCS calculations were used to assess the effect of octrees. $t_{\mathrm{opt}}$ and $t_0$ are the wall times for calculating the CCS to 1% precision using octrees with optimum depth and without octrees, respectively. *The model used contains the RNA genome modelled into the crystal structure of the capsid, together with water and salt (Larsson, 2012).
[Figure S2. A–C: flowcharts of the algorithm; D: optimal octree depth versus number of atoms, with regions D = 0, 1 and 2.]
Figure S2, related to Figure 2: Flowchart of the algorithm and heuristics for setting optimal octree depth. A: Overall process flow of IMPACT. B: The process of assessing convergence from several independent calculations. C: The MC integration with octrees. Incoming probe particles are first tested for collision with the bounding box that surrounds the structure, i.e. octree level $d = 0$, before they are tested for collision with the box's contents. At any level of the octree, a box can be empty, contain smaller boxes representing the next level of the octree, or contain atoms. IMPACT recursively traverses the octree to find atom-containing boxes that intersect the probe trajectory, and only those atoms are ever tested for collision with the probe. D: The optimal octree depth, based on wall time, increases with the number of atoms in the target. IMPACT uses this empirical relation to choose the optimal depth limit $D$ at runtime. The boundaries between regions in which IMPACT chooses $D = 0$, 1 and 2 are indicated with dashed lines (see also Fig 2).
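The traversal in panel C and the depth rule in panel D can be sketched as below. This is a simplified illustration, not IMPACT's code: it assumes the coordinates have already been rotated so that probes travel along $z$, builds the tree on those rotated coordinates, and folds the probe radius into each atom's collision radius. The names `Box`, `build_octree`, `probe_hits`, `choose_depth` and `make_tree` are hypothetical, and the depth thresholds (about $2\times10^3$ and $10^4$ atoms) are taken from the Supplemental Experimental Procedures.

```python
import random


class Box:
    """One node of a hypothetical octree: an axis-aligned box of atom indices."""

    def __init__(self, lo, hi, atoms):
        self.lo, self.hi = lo, hi  # (x, y, z) corners of the box
        self.atoms = atoms         # atom indices; kept only in leaf boxes
        self.children = []


def build_octree(coords, atoms, lo, hi, depth):
    """Split a box into its non-empty octants, recursing `depth` more levels."""
    box = Box(lo, hi, atoms)
    if depth == 0 or len(atoms) <= 1:
        return box
    mid = tuple((l + h) / 2.0 for l, h in zip(lo, hi))
    octants = {}
    for i in atoms:
        key = tuple(coords[i][k] >= mid[k] for k in range(3))
        octants.setdefault(key, []).append(i)
    for key, members in octants.items():  # empty octants are simply not created
        child_lo = tuple(mid[k] if key[k] else lo[k] for k in range(3))
        child_hi = tuple(hi[k] if key[k] else mid[k] for k in range(3))
        box.children.append(build_octree(coords, members, child_lo, child_hi, depth - 1))
    box.atoms = []  # atoms now live in the child boxes
    return box


def probe_hits(box, coords, radii, px, py, pad):
    """True if a probe travelling along z through (px, py) hits an atom in `box`."""
    # Box test in the x-y plane, padded by the largest collision radius
    if not (box.lo[0] - pad <= px <= box.hi[0] + pad
            and box.lo[1] - pad <= py <= box.hi[1] + pad):
        return False
    if box.children:
        return any(probe_hits(c, coords, radii, px, py, pad) for c in box.children)
    # Leaf box: only these atoms are tested against the probe
    return any((px - coords[i][0]) ** 2 + (py - coords[i][1]) ** 2 <= radii[i] ** 2
               for i in box.atoms)


def choose_depth(n_atoms):
    """Empirical depth limit: D = 0, 1 or 2 depending on the number of atoms."""
    if n_atoms < 2_000:
        return 0
    if n_atoms < 10_000:
        return 1
    return 2


def make_tree(coords, radii, depth=None):
    """Build the tree over the bounding box (level d = 0) of the structure."""
    lo = tuple(min(c[k] for c in coords) for k in range(3))
    hi = tuple(max(c[k] for c in coords) for k in range(3))
    if depth is None:
        depth = choose_depth(len(coords))
    return build_octree(coords, list(range(len(coords))), lo, hi, depth), max(radii)


# Usage with hypothetical random "atoms" (collision radius = atom + probe radius)
rng = random.Random(3)
coords = [tuple(rng.uniform(-30.0, 30.0) for _ in range(3)) for _ in range(500)]
radii = [2.7] * len(coords)
tree, pad = make_tree(coords, radii, depth=3)
probes = [(rng.uniform(-35.0, 35.0), rng.uniform(-35.0, 35.0)) for _ in range(2000)]
hit_fraction = sum(probe_hits(tree, coords, radii, px, py, pad) for px, py in probes) / len(probes)
print(hit_fraction)
```

Multiplying `hit_fraction` by the area of the firing square gives the projected area for this one orientation; padding every box by the largest collision radius is what guarantees that pruning a box never discards a genuine hit.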
[Figure S3. A: collision cross-section / Å² versus mass / Da, with best-fit and sphere curves. B, C: effective density / Da Å⁻³ versus mass / Da (colour scale: intensity or counts) for PDBe and PiQSi, with reference levels labelled Fischer et al., 0.48 Da/Å³ and 0.37 Da/Å³.]
Figure S3, related to Figure 3: Normalisation of cross sections links them to effective gas-phase density. A: CCS as a function of mass for all structures in the PiQSi database. The fitted line (black) is also the normalisation used for the shape factor in Fig. 3C. The red line represents the CCS for perfect spheres as a function of mass, assuming the same density as for proteins (Fischer et al., 2004) and a 1.4 Å radius for the buffer gas particles. B and C: Effective gas-phase densities $\rho_{\mathrm{eff}}$ for all biological assemblies in PDBe and PiQSi, respectively, determined by assuming a spherical shape for the proteins (Bush et al., 2010; Kaddis et al., 2007; Ruotolo et al., 2008). For both datasets, $\rho_{\mathrm{eff}}$ closely matches the values derived from IM-MS (Bush et al., 2010; Kaddis et al., 2007), but is considerably smaller than the density inferred from crystal structures (Fischer et al., 2004), even though both sets of structures used here were largely determined by X-ray crystallography. This implies that the effective gas-phase density is artificially low because of the assumption of spherical proteins.
SUPPLEMENTAL EXPERIMENTAL PROCEDURES
Projection approximation and the formulation of a stopping criterion
The PA equates the CCS, $\Omega$, to the rotationally averaged projected area of the target, taking into account the radius of the buffer-gas particles. This area is typically calculated through MC integration, in which probes representing the buffer gas are fired towards the randomly rotated target. If $n$ probes are fired at random within a region of area $A$ that encloses the projection of the target, and each probe hits the target with probability $p_h = \Omega/A$, then an estimate $\Omega'$ of the CCS can be made from the fraction of probes that hit:
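$$\Omega = A\,p_h = \lim_{n \to \infty} \frac{A\,h}{n} \qquad \text{(Eqn S1)}$$
where $h$ is the number of probes that hit, so that for finite $n$ the estimate is $\Omega' = A\,h/n$.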
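A minimal sketch of the estimate $\Omega' = Ah/n$ for a target made of hard spheres is given below. It draws a uniformly random orientation for every probe (via a random unit quaternion), fires the probe at a random point of a square of area $A$ that encloses the projection in every orientation, and counts a hit when the probe lies within the sum of atom and probe radii of any projected atom centre. The one-orientation-per-probe scheme and the function names are illustrative assumptions rather than a description of IMPACT's internals.

```python
import math
import random


def random_rotation(rng):
    """Uniformly distributed rotation matrix built from a random unit quaternion."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return (
        (1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)),
        (2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)),
        (2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)),
    )


def pa_estimate(atoms, probe_radius, n, rng):
    """Monte Carlo estimate Omega' = A * h / n (Eqn S1).

    atoms: list of (x, y, z, radius) tuples, in Å, ideally centred on the origin.
    """
    # Half-width of a square that encloses the projection in any orientation
    reach = max(math.sqrt(x * x + y * y + z * z) + r for x, y, z, r in atoms) + probe_radius
    area = (2.0 * reach) ** 2
    hits = 0
    for _ in range(n):
        (r00, r01, r02), (r10, r11, r12), _ = random_rotation(rng)
        px, py = rng.uniform(-reach, reach), rng.uniform(-reach, reach)
        for x, y, z, r in atoms:
            # Only the projected coordinates x', y' are needed
            xr = r00 * x + r01 * y + r02 * z
            yr = r10 * x + r11 * y + r12 * z
            if (px - xr) ** 2 + (py - yr) ** 2 <= (r + probe_radius) ** 2:
                hits += 1
                break
    return area * hits / n


rng = random.Random(0)
# A single sphere: the exact answer is pi * (1.0 + 1.0)**2, about 12.57 Å^2
omega = pa_estimate([(0.0, 0.0, 0.0, 1.0)], probe_radius=1.0, n=20_000, rng=rng)
print(omega)
```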
The value of $n$ required for $\Omega'$ to converge to $\Omega$ within a relative error $\tau$ is not known a priori. Firing additional probes beyond what is needed for the results to converge is, however, a waste of computing time. Here, we measure consensus among several independent replica CCS calculations run concurrently (Williams et al., 2009). We define the consensus as the relative standard error of the mean, $s_r$, which is monitored until it falls below a predefined threshold $\tau$. This has the advantage of connecting the stopping criterion to a measure of precision.
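The consensus criterion can be sketched as follows: $N$ replica estimates are advanced in interleaved batches, and the run stops once the relative standard error of their mean, $s_r$, falls below $\tau$ and a minimum number of probes has been fired (the safeguard described in the next paragraph). The batch size, the probe floor and the `fire_probe` callback, here a hypothetical disc-shaped target, are assumptions for illustration.

```python
import math
import random
import statistics


def consensus_pa(fire_probe, area, n_replicas=32, tau=0.01,
                 min_probes=10_000, batch=100, max_probes=10_000_000):
    """Run n_replicas independent PA estimates until s_r < tau.

    fire_probe() returns True for a hit. Returns (mean, s_r, total_probes).
    """
    hits = [0] * n_replicas
    per_replica = 0
    while True:
        # Interleave: every replica fires one batch per round
        for k in range(n_replicas):
            hits[k] += sum(fire_probe() for _ in range(batch))
        per_replica += batch
        estimates = [area * h / per_replica for h in hits]
        mean = statistics.fmean(estimates)
        sem = statistics.stdev(estimates) / math.sqrt(n_replicas)
        s_r = sem / mean if mean > 0 else math.inf
        total = per_replica * n_replicas
        if (total >= min_probes and s_r < tau) or total >= max_probes:
            return mean, s_r, total


# Hypothetical target: a disc of radius 2 Å in a 4 Å x 4 Å firing square
rng = random.Random(42)


def fire_probe():
    x, y = rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)
    return x * x + y * y <= 4.0


mean, s_r, total = consensus_pa(fire_probe, area=16.0)
print(mean, s_r, total)  # mean should lie close to 4*pi, about 12.57
```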
Relying on consensus runs the risk of stopping the calculations prematurely, because for small $n$ there is a chance that the $N$ independent estimates are close to their common mean $\langle\Omega'\rangle$ yet still far from the true $\Omega$. Empirically, with $N \gtrsim 32$, IMPACT was found to avoid stopping prematurely and to estimate the standard error of the mean robustly (Supplementary Figure S1B). To further mitigate this effect, IMPACT fires a tuneable minimum number of probes before assessing convergence.

Proof that convergence depends on the number of probes fired
Any sequence of probes fired at the target from a certain direction is a Bernoulli process. As such, the number of hits $h$ is binomially distributed, and after $n$ probes the variance of $h$ is $\sigma_h^2 = n\,p_h(1 - p_h)$. Combining this with Eqn S1 gives the variance of $\Omega'$: $\sigma^2 = A^2\sigma_h^2/n^2 = (A\Omega - \Omega^2)/n$. Hence the relative standard deviation is given by:
$$\sigma_r = \frac{\sigma}{\Omega} = \sqrt{\frac{A - \Omega}{n\,\Omega}} \qquad \text{(Eqn S2)}$$
For large $n$, the (scaled) binomial distribution of $\Omega'$ can be approximated by a normal distribution with mean $\Omega$ and standard deviation $\sigma$. We scale and shift the distribution to have mean 0 and standard deviation $\sigma_r$ in order to analyse the likelihood of $\Omega'$ being correct within a relative precision $\tau$. The total probability $p$ that the estimate is within $\Omega\tau$ of $\Omega$ is obtained by integrating the scaled and shifted distribution from $-\tau$ to $\tau$, which yields the probability for a single estimate to be correct within the specified precision:
$$p = \operatorname{erf}\left(\frac{\tau}{\sigma_r\sqrt{2}}\right) \qquad \text{(Eqn S3)}$$
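Eqns S2 and S3 are easy to evaluate numerically. The sketch below uses a hypothetical target with $\Omega = 4\pi$ Å² inside a firing area $A = 16$ Å², computes how many probes give $\sigma_r = \tau$ (for which Eqn S3 gives $p = \operatorname{erf}(1/\sqrt{2}) \approx 0.683$), and checks Eqn S2 against simulated Bernoulli probes. The function names are illustrative.

```python
import math
import random


def relative_sd(area, omega, n):
    """Eqn S2: relative standard deviation of one estimate from n probes."""
    return math.sqrt((area - omega) / (n * omega))


def prob_within(tau, sigma_r):
    """Eqn S3: probability that one estimate lies within relative error tau."""
    return math.erf(tau / (sigma_r * math.sqrt(2.0)))


def probes_for(area, omega, sigma_r):
    """Eqn S2 solved for n: probes needed to reach a given relative SD."""
    return (area - omega) / (omega * sigma_r ** 2)


A, OMEGA, TAU = 16.0, 4.0 * math.pi, 0.01  # hypothetical values (Å^2, Å^2, -)
n_needed = probes_for(A, OMEGA, TAU)
p_single = prob_within(TAU, relative_sd(A, OMEGA, n_needed))
print(round(n_needed), round(p_single, 3))  # 2732 0.683

# Empirical check of Eqn S2: spread of many independent estimates
rng = random.Random(7)
p_hit, n_sim, reps = OMEGA / A, 400, 1000
estimates = [A * sum(rng.random() < p_hit for _ in range(n_sim)) / n_sim
             for _ in range(reps)]
mean = sum(estimates) / reps
sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (reps - 1))
print(sd / mean, relative_sd(A, OMEGA, n_sim))  # the two should agree closely
```

Note that a single estimate with $\sigma_r = \tau$ is only within $\tau$ of the true value about two times in three; requiring higher confidence means driving $\sigma_r$ (or, for replicas, $s_r$) below $\tau$.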
To consider the average of $N$ separate estimates, we define $\aleph \equiv Nn$, i.e. the total number of probes fired in all $N$ independent calculations combined, and note, with the help of Eqn S2, that the relative standard error of the mean for $\Omega'$ is
$$s_r = \frac{\sigma_r}{\sqrt{N}} = \sqrt{\frac{A - \Omega}{\aleph\,\Omega}}.$$
Analogous to Eqn S3, the probability $p_s$ of meeting the stopping criterion then becomes
$$p_s = \operatorname{erf}\left(\frac{\tau}{s_r\sqrt{2}}\right) = \operatorname{erf}\left(\tau\sqrt{\frac{\aleph\,\Omega}{2(A - \Omega)}}\right) \qquad \text{(Eqn S4)}$$
which depends on the total number of probes that have been fired, $\aleph$, and is, for $N \gtrsim 32$ (Supplementary Figure S1B), independent of the number of separate estimates of $\Omega$.

Benchmarking the robustness of IMPACT
To assess how IMPACT's performance and precision are affected by various parameters, we calculated the CCS of the asymmetric unit from a crystal structure of the Norwalk virus capsid (NV, PDB code 1IHM) for all combinations of $N$, $\tau$ and maximum octree depth $D$. The parameter ranges were $N \in \{2, 4, 8, 16, 32, 64, 128\}$, $\tau \in \{0.01, 0.005, 0.001\}$ and $D \in \{0, 1, 2, 3, 4\}$ (Figure 2B, Supplementary Figure S2A–C). We repeated the calculations 200 times for each parameter combination. To determine the fraction of the wall time spent in different parts of IMPACT, we calculated the CCS of NV another 1,000 times with timing functions turned on. For NV, the wall time to calculate the CCS without octrees was 0.18 s, with 6% of the time spent on I/O operations, 11% on rotations, and 83% on the MC bombardment. With octrees ($D = 2$), the wall time more than halved, to 0.071 s, with 17%, 6% and 77% spent on I/O, octree management and MC bombardment (including rotations), respectively.

Benchmarking the accuracy of IMPACT
We ran IMPACT on a data set containing 428 protein structures used previously for benchmarking CCS calculations (Bleiholder et al., 2011). To encompass a larger mass range, this set was extended with a set of proteins that is commonly used for calibration of travelling-wave ion mobility (IM) instrumentation (Bush et al., 2010) and a complete satellite tobacco necrosis virus (STNV) structure (Supplementary Table S2). Some structures in the data set consisted of non-contiguous clusters of atoms, which is a highly unrealistic scenario in an IM experiment. Consequently, to filter out such structure models, we only considered structures that consisted of a single cluster, with every atom within 4 Å of another atom. This excluded six structures from the initial set of 428 (Bleiholder et al., 2011): PDB codes 1GO6, 1H1N, 1J0P, 1M0M, 1P7W and 1W3M. The target precision in the IMPACT calculations was set to 0.1% to allow for more confident separation of accuracy from precision (see below).
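The single-cluster filter can be sketched as a breadth-first search over atoms that are linked when they lie within 4 Å of each other, using a cell list so that only neighbouring cells are searched. Whether the original filter used $\le$ or $<$, and which atoms (for example hydrogens) it included, is not stated; `is_single_cluster` is a hypothetical name.

```python
import math
from collections import defaultdict, deque


def is_single_cluster(coords, cutoff=4.0):
    """True if every atom is connected to every other through a chain of
    atom pairs no more than `cutoff` Å apart."""
    n = len(coords)
    if n == 0:
        return False

    def cell(p):
        return tuple(math.floor(c / cutoff) for c in p)

    # Cell list: cells of edge `cutoff`, so neighbours are in adjacent cells
    cells = defaultdict(list)
    for i, p in enumerate(coords):
        cells[cell(p)].append(i)

    cut2 = cutoff * cutoff
    seen = [False] * n
    seen[0] = True
    queue = deque([0])
    reached = 1
    while queue:
        i = queue.popleft()
        xi, yi, zi = coords[i]
        cx, cy, cz = cell(coords[i])
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                        if seen[j]:
                            continue
                        xj, yj, zj = coords[j]
                        if (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2 <= cut2:
                            seen[j] = True
                            reached += 1
                            queue.append(j)
    return reached == n


chain = [(3.0 * i, 0.0, 0.0) for i in range(10)]                 # 3 Å spacing: one cluster
split = chain + [(100.0 + 3.0 * i, 0.0, 0.0) for i in range(5)]  # detached fragment
print(is_single_cluster(chain), is_single_cluster(split))  # True False
```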
For each structure, to compare the IMPACT results with TJM, we invoked MOBCAL at least 50 times, except for STNV, which needed only 20 runs to converge to sufficient precision. For structures where the standard error of the mean CCS was larger than 0.5%, we re-ran MOBCAL in further batches of 50 runs until the standard error for each structure was below 0.5%. We initiated each MOBCAL run with a different random seed and pooled the results to give average TJM and PA CCS values with associated error estimates. To validate IMPACT, we also recalculated the CCSs for this dataset using the same atomic radii as in the PA implementation in MOBCAL, revealing a good match between MOBCAL and IMPACT, with $R^2 > 0.999$ and an RMSD of 0.2%.

Deconvoluting accuracy and precision from an observed error distribution
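We define the relative error that describes the observed discrepancy between a CCS calculated with IMPACT (calibrated) and TJM as:
$$\varepsilon \equiv \frac{\Omega_{\mathrm{IMPACT}} - \Omega_{\mathrm{TJM}}}{\Omega_{\mathrm{TJM}}} = \frac{\Omega_{\mathrm{IMPACT}}}{\Omega_{\mathrm{TJM}}} - 1 \qquad \text{(Eqn S5)}$$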
Our calibration of IMPACT ensures that, on average, $\Omega_{\mathrm{IMPACT}}/\Omega_{\mathrm{TJM}} \approx 1$, and hence $\varepsilon \approx 0$. The errors are nevertheless spread around 0 (Fig 1B), which can be attributed to two distinct sources: underlying differences between TJM and IMPACT, and random error from the MC integration. The former determines the accuracy of IMPACT, whereas the latter is the combined precision of the TJM and IMPACT calculations. Their variances, $\sigma_a^2$ and $\sigma_p^2$, with subscripts denoting accuracy and precision, combine into the variance of the relative error:
$$\sigma_\varepsilon^2 = \sigma_a^2 + \sigma_p^2 \qquad \text{(Eqn S6)}$$
Firing a very large number of probes would make $\sigma_p$ vanishingly small, but the time-consuming TJM calculations make this approach impractical. We can, however, use error propagation of Eqn S5 to obtain $\sigma_p$, and subsequently obtain $\sigma_a$ from Eqn S6. The first term of Eqn S5 is a quotient, for which the relative variance is equal to the sum of the relative variances of the numerator and denominator, i.e. $(\sigma_q/q)^2 = (\sigma_x/x)^2 + (\sigma_y/y)^2$ for $q = x/y$ with independent $x$ and $y$. The last term of Eqn S5 is a constant (unity) and therefore has no impact on $\sigma_p^2$. Hence we have
$$\frac{\sigma_p^2}{\left(\Omega_{\mathrm{IMPACT}}/\Omega_{\mathrm{TJM}}\right)^2} = \left(\frac{\sigma_{\mathrm{IMPACT}}}{\Omega_{\mathrm{IMPACT}}}\right)^2 + \left(\frac{\sigma_{\mathrm{TJM}}}{\Omega_{\mathrm{TJM}}}\right)^2 \qquad \text{(Eqn S7)}$$
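The power-law calibration does not preserve the precision that was imposed on the raw PA calculations with IMPACT, but since $\Omega_{\mathrm{IMPACT}} = a\,(\Omega_{\mathrm{IMPACT}}^{(\mathrm{raw})})^{b}$, error propagation yields $\sigma_{\mathrm{IMPACT}}/\Omega_{\mathrm{IMPACT}} = b\,\sigma_{\mathrm{IMPACT}}^{(\mathrm{raw})}/\Omega_{\mathrm{IMPACT}}^{(\mathrm{raw})}$. Combining this with Eqns S6 and S7, and noting that $\Omega_{\mathrm{IMPACT}}/\Omega_{\mathrm{TJM}} \approx 1$ to within a few per cent, we get an expression for assessing the accuracy of IMPACT with regard to TJM from an observed distribution of relative errors:
$$\sigma_a = \sqrt{\sigma_\varepsilon^2 - \left(\frac{b\,\sigma_{\mathrm{IMPACT}}^{(\mathrm{raw})}}{\Omega_{\mathrm{IMPACT}}^{(\mathrm{raw})}}\right)^2 - \left(\frac{\sigma_{\mathrm{TJM}}}{\Omega_{\mathrm{TJM}}}\right)^2} \qquad \text{(Eqn S8)}$$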
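Eqn S8 can be checked against Table S1. With the tabulated (rounded) values $\sigma_\varepsilon = 1.08\%$, $b = 1.051$, $\sigma_{r,\mathrm{IMPACT}}^{(\mathrm{raw})} = 0.10\%$ and $\sigma_{r,\mathrm{TJM}} = 0.49\%$, the sketch below gives $\sigma_a \approx 0.96\%$, consistent with the reported 0.95% given that the inputs are rounded to two decimal places. The function name is illustrative.

```python
import math


def accuracy_sigma(sigma_eps, b, sigma_r_impact_raw, sigma_r_tjm):
    """Eqn S8: remove the precision terms from the observed spread of relative errors.

    All arguments are relative quantities in the same units (here per cent).
    """
    var_a = sigma_eps ** 2 - (b * sigma_r_impact_raw) ** 2 - sigma_r_tjm ** 2
    if var_a < 0:
        raise ValueError("precision terms exceed the observed spread")
    return math.sqrt(var_a)


sigma_a = accuracy_sigma(sigma_eps=1.08, b=1.051, sigma_r_impact_raw=0.10, sigma_r_tjm=0.49)
print(f"sigma_a = {sigma_a:.2f} %")  # sigma_a = 0.96 %
```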
If CCSs calculated using IMPACT are to be employed in structure modelling, the calculation error $\sigma_a$ should therefore be combined with the experimental uncertainty to establish an acceptable CCS difference between a model and experiment, or a restraining potential.

Decoupling the evaluation of the rotation matrix from the rotation of atoms
Affine transformations, such as rotation, of an object consisting of many vertices are dramatically faster if the transformation matrix $R$ is evaluated before it is applied to the object coordinates. For 3D rotation around the $y$ and $z$ axes only,
$$R = \begin{pmatrix} \cos\phi\cos\theta & -\sin\phi & \cos\phi\sin\theta \\ \sin\phi\cos\theta & \cos\phi & \sin\phi\sin\theta \\ -\sin\theta & 0 & \cos\theta \end{pmatrix} \qquad \text{(Eqn S9)}$$
where $\theta$ and $\phi$ are the rotation angles about the $y$ and $z$ axes, respectively.
When $R$ is applied to the coordinates $\mathbf{r}$ of an atom (in the case of molecular objects) to generate the rotated coordinates $\mathbf{r}'$, i.e. $\mathbf{r}' = R\mathbf{r}$, the matrix-vector product comprises the evaluation of 5 trigonometric functions, 5 multiplications and 2 additions each for $x'$ and $y'$. The rotated $z'$ coordinate is irrelevant for the PA, and its evaluation can be omitted. Sine and cosine functions take more than 100 clock cycles to complete, whereas multiplications and additions take about 2. If the trigonometric elements of $R$ have already been evaluated, applying the rotation matrix requires only 3 multiplications and 2 additions each for $x'$ and $y'$. When thousands of coordinates are rotated, this order of evaluation makes the rotation considerably faster than if the elements of $R$ were evaluated anew for every coordinate. As such, pre-evaluation of $R$ yields rotations that take about 20 clock cycles per coordinate, compared with about 2,000 clock cycles per coordinate without it. Factors such as compiler optimizations make the gain from pre-calculating the rotation matrix difficult to determine precisely, but the effect is demonstrably significant (Williams et al., 2009).
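The two orders of evaluation can be sketched in Python, which does not reproduce the clock-cycle figures above but shows the structure: `project_rotated` evaluates the trigonometric elements of Eqn S9 once and then spends 3 multiplications and 2 additions per coordinate on each of $x'$ and $y'$, whereas `project_naive` re-evaluates the sines and cosines for every atom. Both names are hypothetical, and $z'$ is skipped as in the text.

```python
import math


def rotation_yz(phi, theta):
    """Eqn S9: R = Rz(phi) @ Ry(theta), i.e. rotate about y by theta, then about z by phi."""
    cp, sp = math.cos(phi), math.sin(phi)
    ct, st = math.cos(theta), math.sin(theta)
    return ((cp * ct, -sp, cp * st),
            (sp * ct, cp, sp * st),
            (-st, 0.0, ct))


def project_rotated(coords, phi, theta):
    """Pre-evaluate R once, then compute only x' and y' for every atom."""
    (r00, r01, r02), (r10, r11, r12), _ = rotation_yz(phi, theta)
    return [(r00 * x + r01 * y + r02 * z, r10 * x + r11 * y + r12 * z)
            for x, y, z in coords]


def project_naive(coords, phi, theta):
    """Same projection, but with 10 trigonometric calls per atom."""
    out = []
    for x, y, z in coords:
        xr = (math.cos(phi) * math.cos(theta) * x - math.sin(phi) * y
              + math.cos(phi) * math.sin(theta) * z)
        yr = (math.sin(phi) * math.cos(theta) * x + math.cos(phi) * y
              + math.sin(phi) * math.sin(theta) * z)
        out.append((xr, yr))
    return out


coords = [(1.0, 2.0, 3.0), (-0.5, 0.0, 4.0)]
print(project_rotated(coords, 0.3, 1.1))
```

Both functions return the same projected coordinates; only the number of trigonometric evaluations per atom differs.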
"
The effect of octrees on the speed of the calculation
To assess the effect of octrees (Fig 2A, Supplementary Figure S2A–C), we processed the PiQSi data set, comprising 1,755 manually curated structures from the PDB (Levy, 2007), with IMPACT. We fired 100,000 probes at every structure for a range of octree depths, $D \in \{0, 1, 2, 3\}$. To see the effect of changing $D$ for larger structures, we selected a set of challenging high-molecular-weight complexes: the Thermus thermophilus 70S ribosome (Voorhees et al., 2009), satellite tobacco necrosis virus (STNV) with the genome modelled into the capsid (Jones and Liljas, 1984; Larsson, 2012), an intact vault particle from rat liver (Tanaka et al., 2009), and the adenovirus capsid (Reddy et al., 2010) (Table S2, Fig 2B). We ran each of these through IMPACT 100 times per value of $D$, with $D \in \{0, 1, \ldots, 5\}$.
Overall, the analysis revealed a clear dependence of the optimum octree depth on the number of atoms (Supplementary Figure S2D). Above approximately $2\times10^3$ and $1\times10^4$ atoms, $D_{\mathrm{opt}} = 1$ and $D_{\mathrm{opt}} = 2$, respectively. For higher $D$, the data are too sparse for reliable estimation of $D_{\mathrm{opt}}$. Based on this empirical relation, $D_{\mathrm{opt}}$ is selected automatically when IMPACT is run.

Benchmarking computational performance of IMPACT
In another batch of calculations, we used IMPACT, CCSCalc and MOBCAL to compute the CCS of NV in order to benchmark IMPACT's performance. To estimate the average wall time, we ran IMPACT 1,000 times. To circumvent CCSCalc's fixed random seed, which yields exactly the same CCS and approximately the same wall time on every run for a given structure, we generated a series of NV structures by rotating NV in 5° steps around the [1,1,1] axis, with the rotation angle ranging from 0° to 180°. We ran CCSCalc from the command line with the convergence threshold set to 1% for each structure, allowing an average wall time to be determined. Note that CCSCalc employs a different convergence criterion, which is not based on the standard error of the mean; however, emulating CCSCalc's convergence criterion with IMPACT yielded a similar difference in performance. Using MOBCAL, we generated 500 separate TJM calculations, with 1,000 trajectories each, and pooled them into block averages. We varied the block size until the standard error of the mean CCS within the blocks was below 1%. From the resulting block size and the time required for a single TJM calculation, we estimated the wall time for TJM. Analogously, we pooled 100 separate exact hard-sphere scattering (EHSS) and PA calculations, with 25,000 trajectories per run, into block averages. In this way, despite a rugged CCS distribution for TJM (Bleiholder et al., 2011), we could still extract a converged standard error of the mean, and therefore also a good estimate of the time needed to reach 1% error for these methods. The resulting average wall times are shown in Fig. 2C.

Calculating the CCS of all biological assemblies in the PDBe and PiQSi
We processed all biological units in the PDBe using IMPACT, with automatic octree depth determination. The dataset comprised 317,424 files containing 266,527 models, of which 266,516 were at least partly biomolecular. When a file contained multiple models, e.g. for NMR ensembles, we calculated the CCS of the individual molecules separately. We processed the PiQSi data set in an identical manner, omitting structure 2ULL from subsequent analysis because it was found to contain multiple superimposed structures not separated into individual models. We calculated the mass of every biological assembly in the PDBe and PiQSi as the sum of all residue masses based on the Chemical Component Dictionary, corrected for the mass loss corresponding to one water molecule per peptide or phosphodiester bond formed during polymerisation.
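The mass correction amounts to subtracting one water per bond formed, i.e. $n-1$ waters for a linear chain of $n$ residues. In the sketch below, the two-entry mass table holds approximate average masses of free glycine and alanine, standing in for per-residue values from a chemical component dictionary; the table, the function names and the linear-chain assumption (no cyclic chains or cross-links) are illustrative.

```python
# Approximate average masses (Da) of free amino acids, for illustration only
RESIDUE_MASS = {"GLY": 75.067, "ALA": 89.094}
WATER = 18.015  # Da, lost per peptide or phosphodiester bond formed


def chain_mass(residues, masses=RESIDUE_MASS):
    """Mass of a linear polymer: residue masses minus one water per bond."""
    if not residues:
        return 0.0
    return sum(masses[r] for r in residues) - WATER * (len(residues) - 1)


def assembly_mass(chains, masses=RESIDUE_MASS):
    """Mass of a biological assembly given one residue list per chain."""
    return sum(chain_mass(c, masses) for c in chains)


print(round(chain_mass(["GLY", "GLY"]), 3))                # 132.119
print(round(assembly_mass([["GLY", "ALA"], ["GLY"]]), 3))  # 221.213
```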
Calculating an effective density from a collision cross-section
The solution density of proteins, $\rho_s$, can be inferred from crystal structures through Connolly volumes (Quillin and Matthews, 2000) or Voronoi diagrams (Tsai et al., 1999). The gas-phase density, on the other hand, is sometimes estimated under the assumption that proteins are approximately spherical (Bush et al., 2010; Kaddis et al., 2007; Ruotolo et al., 2008), relating the CCS to the radius as $\Omega = \pi r^2$. Since $V = 4\pi r^3/3$ and $m = \rho V$, we can express the CCS as:
$$\Omega = \pi\left(\frac{3m}{4\pi\rho}\right)^{2/3} \qquad \text{(Eqn S10)}$$
Thus an effective density, $\rho_{\mathrm{eff}}$, can be inferred from $\Omega$ and $m$ under this assumption. The CCS for a perfect sphere as formulated above does not, however, take into account the finite size of the buffer gas particles. If the latter have a radius $r_g$, we get $\Omega = \pi\left[\left(3m/(4\pi\rho)\right)^{1/3} + r_g\right]^2$, yielding CCSs that are more easily compared with those calculated for atomistic structures, especially for structures with low mass.
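Eqn S10 and its buffer-gas-corrected form are straightforward to evaluate and invert. In the sketch below, masses are in Da, lengths in Å and densities in Da/Å³ (1 g/cm³ corresponds to about 0.602 Da/Å³); the function names and the 50 kDa example are illustrative, and the round trip simply confirms that the inversion is consistent with the forward formula.

```python
import math


def sphere_ccs(mass, density, r_gas=0.0):
    """Eqn S10 (r_gas = 0) or its buffer-gas-corrected form (r_gas > 0).

    mass in Da, density in Da/Å^3, r_gas in Å; returns the CCS in Å^2.
    """
    radius = (3.0 * mass / (4.0 * math.pi * density)) ** (1.0 / 3.0)
    return math.pi * (radius + r_gas) ** 2


def effective_density(ccs, mass, r_gas=0.0):
    """Invert the relation above: the density a sphere of this mass needs to give `ccs`."""
    radius = math.sqrt(ccs / math.pi) - r_gas
    if radius <= 0:
        raise ValueError("CCS too small for this buffer-gas radius")
    return 3.0 * mass / (4.0 * math.pi * radius ** 3)


omega = sphere_ccs(50_000.0, 0.8, r_gas=1.4)  # hypothetical 50 kDa protein
print(omega, effective_density(omega, 50_000.0, r_gas=1.4))  # density returns to 0.8
```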
The effective density inferred via Eqn S10 from experimental CCSs has been reported to be approximately a factor of two lower than the density $\rho_s$ for crystal structures. To see whether this difference arises from the assumption of spherical proteins, we inspected the quotient $\Omega/\Omega_{\mathrm{sphere}}$, where $\Omega$ is the CCS calculated from a native structure, and $\Omega_{\mathrm{sphere}}$ is the CCS we expect for a sphere of the same mass and a density $\rho_s$, taking into account the buffer gas radius. Using Eqn S10, we get $\Omega/\Omega_{\mathrm{sphere}} = (\rho_s/\rho_{\mathrm{eff}})^{2/3}$, which we rearrange into:
$$\rho_{\mathrm{eff}} = \frac{\rho_s}{\left(\Omega/\Omega_{\mathrm{sphere}}\right)^{3/2}} \qquad \text{(Eqn S11)}$$
Eqn S11 allows us to emulate how $\rho_{\mathrm{eff}}$ was inferred from the CCS under a spherical assumption (Bush et al., 2010; Kaddis et al., 2007; Ruotolo et al., 2008).

Calculating the collision cross-section of EM density maps
As a test case for computing the CCS of non-atomistic structural data with IMPACT, we chose the density map of GroEL solved by single-particle reconstruction at a resolution of 5.4 Å (EMD code 1457) (Stagg et al., 2008) (Figure 4A). We constructed bead models from the map by replacing all voxels with an electron density above a preset threshold by spheres with volumes corresponding to the EM grid spacing. We performed this process at different intensity thresholds, ranging from 0 to 10.5, to generate 500 models, for each of which the CCS was calculated with IMPACT.
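The thresholding step can be sketched as below. We read "volumes corresponding to the EM grid spacing" as beads whose volume equals one voxel, giving a radius $(3/4\pi)^{1/3}s \approx 0.62\,s$ for grid spacing $s$; that reading, the nested-list map layout with index order (x, y, z), the toy map and the function name are assumptions.

```python
import math


def bead_model(density, spacing, threshold, origin=(0.0, 0.0, 0.0)):
    """Replace every voxel with density above `threshold` by a bead.

    density: nested lists indexed density[i][j][k] along x, y, z (assumed order).
    spacing: voxel edge length in Å. Returns a list of (x, y, z, radius).
    """
    # Bead volume equals voxel volume: 4/3 * pi * r^3 = spacing^3
    radius = (3.0 / (4.0 * math.pi)) ** (1.0 / 3.0) * spacing
    beads = []
    for i, plane in enumerate(density):
        for j, row in enumerate(plane):
            for k, value in enumerate(row):
                if value > threshold:
                    beads.append((origin[0] + i * spacing,
                                  origin[1] + j * spacing,
                                  origin[2] + k * spacing,
                                  radius))
    return beads


# Tiny hypothetical 2 x 2 x 2 map; sweeping the threshold yields a series of models
toy_map = [[[0.1, 2.0], [5.0, 0.0]], [[7.5, 1.0], [3.0, 9.0]]]
for t in (0.0, 2.5, 8.0):
    print(t, len(bead_model(toy_map, spacing=2.0, threshold=t)))
```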
"
Calculating the collision cross-section of SAXS bead models
We generated SAXS bead models using the ATSAS package (Svergun et al., 1995; Volkov and Svergun, 2003) as follows. First, we simulated a SAXS curve from a crystal structure (PDB code 1OEL) using CRYSOL. We then generated a distance distribution from the SAXS curve with GNOM, followed by construction of 10 separate ab initio models with GASBORI. We superimposed these models and averaged them with DAMSUP and DAMAVER, and filtered the resulting average model with DAMFILT using 100 different target volumes ranging from 748,200 Å³ to 1,870,500 Å³. We used IMPACT to calculate the CCS for each of these models, setting the bead radius to 4.5 Å, which corresponds to half the bead spacing of the filtered models.

Calculating the collision cross-sections of NMR ensembles
We analysed the NMR ensembles 2K39 (Lange et al., 2008) and 2KOX (Bryn Fenwick et al., 2011) of ubiquitin using default parameters, yielding one CCS distribution for each of the two ensembles. We used the full width at half maximum of an experimental CCS distribution for the 7+ charge state of ubiquitin (Wyttenbach and Bowers, 2011) to produce a Gaussian distribution representing the experimentally observed peak width. We generated another Gaussian using a resolving power of 130, representing the peak expected for a single conformation (Koeniger et al., 2006). To facilitate comparison of peak shapes, we normalised each distribution by its respective average CCS (Fig 4B).
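The two reference peaks can be constructed from a full width at half maximum (FWHM) via $\sigma = \mathrm{FWHM}/(2\sqrt{2\ln 2})$. For the resolving power we assume the common definition $R = \Omega/\Delta\Omega$ with $\Delta\Omega$ the FWHM, so that on the normalised axis $\Omega/\langle\Omega\rangle$ the single-conformer peak has $\sigma = 1/(2\sqrt{2\ln 2}\,R)$. The experimental FWHM and mean CCS in the usage line are placeholders, not values from Wyttenbach and Bowers (2011), and the function names are illustrative.

```python
import math

FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # about 0.4247


def gaussian(x, mean, sigma):
    """Normalised Gaussian probability density."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def peak_from_fwhm(mean_ccs, fwhm):
    """(mean, sigma) of a peak on the normalised axis Omega / <Omega>."""
    return 1.0, FWHM_TO_SIGMA * fwhm / mean_ccs


def peak_from_resolving_power(resolving_power):
    """Single-conformer peak, assuming R = Omega / FWHM."""
    return 1.0, FWHM_TO_SIGMA / resolving_power


exp_peak = peak_from_fwhm(mean_ccs=1000.0, fwhm=60.0)  # placeholder numbers
single = peak_from_resolving_power(130.0)
print(exp_peak, single)
```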
"
Calculating the collision cross-sections of molecular-dynamics trajectories
To test whether IMPACT would be capable of calculating CCSs on the fly throughout an MD simulation, we analysed a 15 ns trajectory from a simulation of lysozyme in vacuo (Marklund et al., 2009) by calculating the CCS with IMPACT at 10 ps intervals to a precision of $\tau = 0.5\%$ (Figure 4C–D). To see whether the variations in CCS were reflected in the radius of gyration, $R_G$, of the protein, we calculated $R_G$ for each conformation in the trajectory using the GROMACS simulation package (Pronk et al., 2013).

SUPPLEMENTAL REFERENCES
Jones, T.A., and Liljas, L. (1984). Structure of satellite tobacco necrosis virus after crystallographic refinement at 2.5 Å resolution. J Mol Biol 177, 735–767.
Larsson, D. (2012). Exploring the molecular dynamics of proteins and viruses (Acta Universitatis Upsaliensis).
Pronk, S., Pall, S., Schulz, R., Larsson, P., Bjelkmar, P., Apostolov, R., Shirts, M.R., Smith, J.C., Kasson, P.M., van der Spoel, D., et al. (2013). GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics 29, 845–854.
Quillin, M.L., and Matthews, B.W. (2000). Accurate calculation of the density of proteins. Acta Crystallogr Sect D: Biol Crystallogr 56, 791–794.
Reddy, V.S., Natchiar, S.K., Stewart, P.L., and Nemerow, G.R. (2010). Crystal structure of human adenovirus at 3.5 Å resolution. Science 329, 1071–1075.
Stagg, S.M., Lander, G.C., Quispe, J., Voss, N.R., Cheng, A., Bradlow, H., Bradlow, S., Carragher, B., and Potter, C.S. (2008). A test-bed for optimizing high-resolution single particle reconstructions. J Struct Biol 163, 29–39.
Tanaka, H., Kato, K., Yamashita, E., Sumizawa, T., Zhou, Y., Yao, M., Iwasaki, K., Yoshimura, M., and Tsukihara, T. (2009). The structure of rat liver vault at 3.5 Å resolution. Science 323, 384–388.
Tsai, J., Taylor, R., Chothia, C., and Gerstein, M. (1999). The packing density in proteins: standard radii and volumes. J Mol Biol 290, 253–266.
Voorhees, R.M., Weixlbaumer, A., Loakes, D., Kelley, A.C., and Ramakrishnan, V. (2009). Insights into substrate stabilization from snapshots of the peptidyl transferase center of the intact 70S ribosome. Nat Struct Mol Biol 16, 528–533.