MS-DIAL tutorial - Platform for RIKEN Metabolomics

MS-DIAL Tutorial
Last edited on 2015/5/01
ABSTRACT
Novel mass spectrometers perform ultra-fast, accurate data acquisition at the MS and
MS/MS levels, either without selecting specific precursor ions (as in SWATH approaches) or by
integrating different collision energy levels in MS/MS spectral acquisition (as in MSE or all-ions
approaches). Such data-independent MS/MS approaches provide richer information content
than classic data-dependent MS/MS experiments.
MS-DIAL aims to provide a total solution for both data-dependent and
data-independent MS/MS experiments in metabolomics, lipidomics, and proteomics research. It
features (1) spectral de-convolution for data-independent MS/MS, (2) streamlined criteria for peak
identification, (3) support for all data processing steps from raw data import to statistical analysis, and
(4) a user-friendly graphical user interface.
MS-DIAL has been developed as a collaboration between Prof. Masanori Arita's team
(RIKEN, Reifycs Inc.) and Prof. Oliver Fiehn's team (UC Davis), supported by the JST/NSF SICORP
“Metabolomics for the low carbon society” project.
Hiroshi Tsugawa
RIKEN Center for Sustainable Resource Science
hiroshi.tsugawa@riken.jp
MS-DIAL screenshot
Table of Contents
Required software programs and files
Downloading the ABF converter from Reifycs Inc.
File conversion
The result of ABF converter: Centroid or Profile?
MSP format MS/MS library
Text format retention time and accurate mass library for post identification
Starting MS-DIAL
Starting up your project
Importing ABF files
Setting parameters
Data collection tab
Peak detection tab
Deconvolution tab
Identification tab
Adduct tab
Alignment tab
MS-DIAL viewer
Mouse function
Peak viewer
Display filter
Alignment viewer
MS/MS spectrum viewer
Compound search
Normalization and Statistical analysis
Menu
File
New project
Open project
Save project
Save parameter setting
Data processing
Identification
View
Option
Export
Software Environments
Microsoft Windows XP, Vista, 7 or 8
.NET Framework 4.0 or later
Required software programs and files
Reifycs Analysis Base File Converter (ABF file converter)*
  Download link: http://www.reifycs.com/english/AbfConverter/index.html
MS-DIAL
  Download link: http://prime.psc.riken.jp/Metabolomics_Software/MS-DIAL/index.html
Reference library for compound identification (MSP format file)
  Example library link: http://prime.psc.riken.jp/Metabolomics_Software/MS-DIAL/index.html
Demonstration file
  Download link: http://prime.psc.riken.jp/?action=drop_index
*MS-DIAL imports the “analysis base file (abf)” format. The file converter is freely available from the
above link. The ABF file converter and MS-DIAL have been tested on the MS platforms from Agilent
Technologies, AB Sciex, Bruker Daltonics, Waters, and Thermo Fisher Scientific.
*MS-DIAL has been validated under the following conditions:
Data-dependent MS/MS acquisition: Agilent Technologies, AB Sciex, Bruker Daltonics, Waters, and
Thermo Fisher Scientific
Data-independent MS/MS acquisition: Agilent Technologies (All-ions), AB Sciex (SWATH), Waters (MSE),
and Thermo Fisher Scientific (All Ion Fragmentation). Positive/negative switching mode has not been
tested yet.
For LECO Citius: Please convert the raw files to mzML with the vendor’s software. Then convert the
mzML files to ABF files with the ABF file converter.
*2015/5/1: We are currently fixing the converter program for Waters MS data. The problem will be
fixed as soon as possible.
Downloading the ABF converter from Reifycs Inc.
1. Go to http://www.reifycs.com/english/AbfConverter/.
2. Check the requirements and license terms, and download the converter.
File conversion
1. Start “AnalysisBaseFileConverter.exe”.
2. Drag & drop MS vendor files into this program.
3. Click “Convert”.
4. The ABF files are generated in the same directory as the raw data files.
The result of ABF converter: Centroid or Profile?
With the default settings of each MS instrument, the ABF converter exports
the vendor's file as:
AB Sciex Q-TOF: Profile
Thermo Q-Exactive: Profile
Agilent LC-QTOF: Centroid
Bruker LC-QTOF and FT-ICR: Centroid
Waters LC-Xevo QTOF or Synapt: Centroid
mzML: depends on the export method by the ProteoWizard program etc.
If centroid data are stored in the vendor's raw file (for example, in a .D folder), the ABF
converter tries to export the centroid data instead of the profile data to the ABF file. However, it is
much better to check the result in MS-DIAL as follows:
1. Start an MS-DIAL project in 'Centroid' mode with only one file.
2. View the MS1 or MS2 spectrum in the MS-DIAL peak viewer and check the shape of the spectrum.
3. If the shape looks like profile data, please restart your project in 'Profile' mode.
MSP format MS/MS library
MS-DIAL supports the MSP format (http://www.nist.gov/srd/upload/NIST1a11Ver2-0Man.pdf)
in ASCII text. In addition, the software can utilize the “RETENTIONTIME:”,
“PRECURSORTYPE:”, and “FORMULA:” fields for metabolite identification (field names are
case-insensitive). Please add retention time information in minutes [min] if available. The adduct ion
information, i.e. the ‘Precursor type’ here, is used by the adduct ion search algorithm (see also the
adduct ion format* below).
* Adduct ion format: [M+Na]+, [M+2H]2+, [M-2H2O+H]+, [2M+FA-H]-, etc.
1. The square brackets ‘[’ and ‘]’ must be used to enclose the ion information.
2. The charge sign, + or -, is required after ']', and the charge number, if any, must be written before
the sign.
3. When you define an organic formula such as C6H12O5, write each element only once and do not
use parentheses. Apart from the abbreviations and common formulas listed in item 6, forms such as
[M+C2H5COOH-H] or [M+H+(CH3)3SiOH] are not accepted.
4. A leading number in an organic formula, such as the '2' in 2H2O, is interpreted as a multiplier
(H2O × 2). Again, never write 2(H2O) for that.
5. Sequential additions and losses are acceptable: [2M+H-C6H12O5+Na]2+
6. MS-DIAL accepts some abbreviations or common organic formulas for adduct types as follows.
For Acetonitrile: ACN, CH3CN
For Methanol: CH3OH
For Isopropanol: IsoProp, C3H7OH
For Dimethyl sulfoxide: DMSO
For Formic acid: FA, HCOOH
For Acetic acid: Hac, CH3COOH
For Trifluoroacetic acid: TFA, CF3COOH
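The adduct format rules can be illustrated with a small checker. This is only a sketch in Python and not MS-DIAL's own parser: the names `ADDUCT_RE` and `parse_adduct` are hypothetical, and the sketch checks the brackets, the charge notation, and the ban on parentheses, but it does not check for repeated elements.

```python
import re

# Hypothetical pattern for strings such as [M+Na]+ or [2M+FA-H]-.
# Assumption: a term is a sign, an optional multiplier, then letters and digits.
ADDUCT_RE = re.compile(
    r"^\[(\d*)M"                            # '[' , optional multimer count, 'M'
    r"((?:[+-]\d*[A-Za-z][A-Za-z0-9]*)*)"   # zero or more signed terms, no parentheses
    r"\](\d*)([+-])$"                       # ']' , optional charge number, charge sign
)
TERM_RE = re.compile(r"([+-])(\d*)([A-Za-z][A-Za-z0-9]*)")


def parse_adduct(text):
    """Return a dict describing the adduct, or None if the format is not followed."""
    match = ADDUCT_RE.match(text.strip())
    if match is None:
        return None
    multimer, terms, charge_number, charge_sign = match.groups()
    charge = int(charge_number) if charge_number else 1
    return {
        "multimer": int(multimer) if multimer else 1,
        # Each term is (sign, multiplier, formula or abbreviation).
        "terms": [(s, int(n) if n else 1, f) for s, n, f in TERM_RE.findall(terms)],
        "charge": charge if charge_sign == "+" else -charge,
    }


for example in ["[M+Na]+", "[M+2H]2+", "[M-2H2O+H]+", "[M+2(H2O)]+"]:
    print(example, parse_adduct(example))
```

Strings that use parentheses or omit the charge sign return `None`, which makes it easy to screen a library before importing it.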
Text format retention time and accurate mass library for post identification
MS-DIAL also supports a tab-delimited text format library for peak identification by
means of retention time and MS1 accurate mass information. This identification is performed
after the peak identification based on the MSP format library is finished, which is why we call it
“post identification”. The first row should contain header information. The first,
second, and third columns should be the metabolite name, accurate mass [Da], and retention time [min],
respectively. This library can easily be made in Microsoft Excel: add the compound information
and save it as “Tab delimited text format”. This option is useful for internal standard
identification etc. (Even if you don’t have MS/MS libraries, peak identification based on
retention time and accurate mass is available through this option.)
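As an alternative to Microsoft Excel, the same three-column layout can be written with a short script. The sketch below is in Python. The function name, the header wording, and the two entries are placeholders chosen for illustration, since the text only requires that a header row exists.

```python
import csv
import io


def build_post_identification_library(compounds):
    """Return tab-delimited text: a header row, then name, mass [Da], RT [min]."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Header wording is an assumption; only the column order comes from the tutorial.
    writer.writerow(["Metabolite name", "Accurate mass [Da]", "Retention time [min]"])
    for name, mass, rt in compounds:
        writer.writerow([name, f"{mass:.4f}", f"{rt:.2f}"])
    return buffer.getvalue()


# Placeholder entries for illustration only; replace them with your own standards.
library_text = build_post_identification_library([
    ("Internal standard A", 500.1234, 3.21),
    ("Internal standard B", 750.5678, 7.89),
])
print(library_text)
```

Writing `library_text` to a .txt file gives a library that follows the column order described above.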
Starting MS-DIAL
1. Starting up your project
2. Importing ABF files
3. Setting parameters
4. Running the software (1-2 min / sample)
*The tutorial uses 23 demonstration files and the lipid reference library which are downloadable
from the above link. The common measurement conditions of the demonstration files were as
follows.
Liquid chromatography: total 15 min run per sample with Waters Acquity UPLC CSH C18 column
(100×2.1 mm; 1.7 μm).
Mass spectrometer: SWATH method in negative ion mode.
MS1 accumulation time, 100 ms
MS2 accumulation time, 10 ms
Collision energy, 45 V
Collision energy spread, 15 V
Cycle time, 731 ms
Q1 window, 21 Da
Mass range, m/z 100-1250
Starting up your project
1. File -> New project
2. Set your project file path to the directory of your ABF files.
3-1. Select your method type: either “data-dependent” or “independent.”
3-2. (Data-independent mode only) Make an experiment file* and select it. To follow this tutorial,
please select ABSciex_Experiment_Information_CSH21Da.txt.
*In the case of SWATH data-independent analysis, the experiment file can be made in PeakView
(Show -> Sample information). Please never change its format (“SCAN” and “SWATH” must be in
capital letters). Even if you use a data-independent analysis other than SWATH, please keep
the word “SWATH” and change only the m/z range information.
4. Choose the data type: either “profile” or “centroid.”
5. Choose the ion mode: either “positive” or “negative.”
6. Choose the target omics: either “metabolomics” or “lipidomics.”
If you select a ‘lipidomics’ project, you do not have to prepare a NIST MSP format library. You only
have to select what you want to find in your data sets. On the other hand, if you select a ‘metabolomics’
project, your own MSP file is required for compound identification.
Importing ABF files
1. Select ABF files
2. If the file is a “quality control” sample for peak alignment, then set the type as such.
Note: Please finalize your file name here, because you cannot change it later.
Setting parameters
Data collection tab
Data collection parameters: You can set the analysis ranges (RT and MS1 axes). For example, if your
expected data range is 0.5-10 min and 100-1250 Da, set the parameters accordingly.
Centroid parameters: After the peak detection algorithm is applied along the MS axis with a very
low threshold, MS-DIAL performs spectral centroiding. By default, the mass spectrum within ±0.01 and
±0.1 Da of each peak top is integrated in MS1 and MS2, respectively. MS-DIAL provides
another option to skip the peak detection before centroiding. To choose this option, tick the checkbox
of “peak detection-based”. This option integrates all spectral signals. If the accumulation time is not
long enough for centroiding, this option is useful for capturing low-intensity spectra.
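The idea of integrating the signal around each peak top can be sketched as follows. This Python example is a simplified illustration under stated assumptions, not MS-DIAL's implementation: peak tops are taken as simple local maxima, and the function name `centroid` and the `min_height` argument are hypothetical.

```python
def centroid(mz, intensity, tol=0.01, min_height=0.0):
    """Sum the profile signal within +/- tol Da of each local maximum.

    mz and intensity are equal-length lists sorted by m/z.
    Returns a list of (peak-top m/z, integrated intensity) pairs.
    """
    peaks = []
    for i in range(1, len(mz) - 1):
        # Assumption: a peak top is a point higher than its left neighbour
        # and not lower than its right neighbour.
        is_top = intensity[i - 1] < intensity[i] >= intensity[i + 1]
        if is_top and intensity[i] >= min_height:
            area = sum(y for x, y in zip(mz, intensity) if abs(x - mz[i]) <= tol)
            peaks.append((mz[i], area))
    return peaks


profile_mz = [100.004, 100.008, 100.010, 100.012, 100.016, 100.050]
profile_int = [1, 4, 10, 4, 1, 2]
print(centroid(profile_mz, profile_int, tol=0.01))
```

With `tol=0.01` the five points around m/z 100.010 collapse into a single centroid. Passing `tol=0.1` would mimic the wider default window quoted for MS2.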
Peak detection tab
Peak detection parameters: A linear-weighted moving average is used for peak detection by
default to accurately determine the left and right edges of each peak. The recommended smoothing
level is 1 or 2. MS-DIAL provides two simple thresholds: minimum values for peak width and height.
Peaks below these thresholds are ignored. For FT-ICR or Orbitrap data, the minimum peak height
should be more than 20,000.
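A linear-weighted moving average can be sketched as below. This Python example assumes the common definition in which the weights fall off linearly from the center point (for smoothing level 1 the weights are 1, 2, 1). Whether MS-DIAL uses exactly these weights and this edge handling is an assumption, and the function name `lwma` is hypothetical.

```python
def lwma(values, level=1):
    """Linear-weighted moving average with a window of 2 * level + 1 points."""
    offsets = range(-level, level + 1)
    weights = [level + 1 - abs(k) for k in offsets]  # e.g. [1, 2, 1] for level 1
    smoothed = []
    for i in range(len(values)):
        numerator = 0.0
        denominator = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(values):  # near the edges, use only the points that exist
                numerator += w * values[j]
                denominator += w
        smoothed.append(numerator / denominator)
    return smoothed


print(lwma([0, 0, 4, 0, 0], level=1))
```

A higher level widens the window and smooths more strongly, at the cost of broadening narrow peaks.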
Peak spotting parameters: The width of mass slice is set here. From our experience, 0.1 or 0.05 is
suitable for Agilent Q-TOF, AB Sciex TripleTOF, and Thermo Q-Exactive. If you already know
unwanted m/z peaks from columns or solvent, you can specify them in the “Exclusion mass list.”
Deconvolution tab
Baseline correction and de-convolution parameters: Please do not manipulate default values unless
you fully understand the deconvolution process. The details are described in Supplementary Note 1.
If you want to remove the product ions after the focused precursor ion (recommended for
metabolomics and lipidomics), check “Exclude after precursor.”
Identification tab
Database: If you selected a ‘metabolomics’ project, set your MSP file here (tutorial data:
LipidBlast_Nega_Algae_vs5.msp). If you selected a ‘lipidomics’ project, please select what you
want to find in your data sets for lipid profiling.
Parameters: If you put retention time (RT) information in your MSP file, set the RT tolerance value
(default is 0.5). For example, the tutorial data (LipidBlast_Nega_Algae_vs5.msp) include RT
information optimized for our 15 min LC method. If suitable RT information is unavailable, set the
tolerance to 100 or larger (larger than your LC run time).
Two mass tolerances, for MS1 and MS2, are required for the compound search. They
depend on your instrument performance.
The cutoff of the identification score should be greater than 0.7 or 0.8.
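The role of the RT and MS1 tolerances can be illustrated with a simple candidate filter. This Python sketch shows only the tolerance windows, not MS-DIAL's scoring: the function name, the `ms1_tol` default of 0.01 Da, and the reference entries are assumptions made for illustration.

```python
def find_candidates(peak_mz, peak_rt, library, ms1_tol=0.01, rt_tol=0.5):
    """Return names of library entries within both tolerance windows.

    library is a list of (name, reference m/z, reference RT in minutes).
    """
    hits = []
    for name, ref_mz, ref_rt in library:
        if abs(peak_mz - ref_mz) <= ms1_tol and abs(peak_rt - ref_rt) <= rt_tol:
            hits.append(name)
    return hits


# Placeholder reference entries for illustration only.
reference = [
    ("ref A", 500.000, 5.0),
    ("ref B", 500.004, 9.0),
    ("ref C", 600.000, 5.1),
]
print(find_candidates(500.002, 5.2, reference))               # RT is used
print(find_candidates(500.002, 5.2, reference, rt_tol=100))   # RT effectively ignored
```

Setting the RT tolerance above the LC run time, as suggested for libraries without suitable RT information, leaves the accurate mass as the only filter in this sketch.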
Text file: If you want to perform “post identification” processing, set your text file here. (Tutorial
data: Lipid_Nega_IS_PostIdentification_vs1.txt)
Parameters: The meanings of the parameters are the same as in MSP-based identification.
Advanced library search option: The options for your library search are defined here. In the current
program (2014/11/30), there are two options for the library search.
MS/MS tab:
1. Relative abundance cut off: mass spectral peaks below the user-defined value are not
used for the MS/MS similarity calculation.
Post ident. tab:
2. Only report the top hit: Since several chromatographic peaks may be annotated as the same
compound by the identification algorithm, this option keeps only one
candidate among such multiple results, chosen by the identification score.
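Both library search options can be sketched in a few lines of Python. The sketch assumes that the relative abundance cut off is expressed as a percentage of the most intense peak in the spectrum, which the text does not state. The function names and the example annotations are hypothetical.

```python
def apply_relative_cutoff(peaks, cutoff_percent):
    """Drop MS/MS peaks whose intensity is below cutoff_percent of the base peak.

    peaks is a list of (m/z, intensity) pairs.
    """
    base = max(intensity for _, intensity in peaks)
    return [(mz, i) for mz, i in peaks if 100.0 * i / base >= cutoff_percent]


def keep_top_hits(annotations):
    """Keep, for each compound name, only the peak with the highest score.

    annotations is a list of (peak ID, compound name, identification score).
    """
    best = {}
    for peak_id, name, score in annotations:
        if name not in best or score > best[name][2]:
            best[name] = (peak_id, name, score)
    return sorted(best.values())


spectrum = [(100.0, 1000), (150.0, 40), (200.0, 50)]
print(apply_relative_cutoff(spectrum, cutoff_percent=5))

# Placeholder annotations for illustration only.
hits = [(1, "compound X", 0.82), (2, "compound X", 0.91), (3, "compound Y", 0.75)]
print(keep_top_hits(hits))
```

In this sketch the peak at 4 % of the base peak is dropped, and peak 2 is kept as the only hit for "compound X" because it has the higher score.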
Adduct tab
Adduct ion setting: You can tick the adduct ions and charge values to be considered.
Alignment tab
Parameters: If you already have suitable quality control (QC) data, typically data from a mixed sample,
then specify the QC file here. All sample data will be aligned to this QC file. The RT and MS1
tolerances for peak alignment depend on your chromatographic conditions. Do not change these
parameters unless you know the details of the procedure. If you want to remove peaks that are
detected in only a few samples of the alignment, specify the peak count filter. For example, the
tutorial data include at least 4 biological replicates with the same peak information, and the total
number of data files is 23. Then you may set the peak count filter to (4/23)*100 = 17.4 %. This means
that peaks detected in fewer than 17.4 % of the samples (that is, in fewer than 4 of the 23 files) will be
removed.
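The arithmetic of the peak count filter can be written out as a short Python sketch. It assumes that the filter keeps an aligned peak when the percentage of files in which it was detected reaches the threshold. Both function names are hypothetical.

```python
def peak_count_filter_percent(min_replicates, total_files):
    """Threshold in percent, e.g. 4 replicates out of 23 files."""
    return 100.0 * min_replicates / total_files


def passes_peak_count_filter(detected_flags, threshold_percent):
    """True if the peak was detected in at least threshold_percent of the files."""
    detected = sum(1 for flag in detected_flags if flag)
    return 100.0 * detected / len(detected_flags) >= threshold_percent


threshold = peak_count_filter_percent(4, 23)
print(round(threshold, 1))  # 17.4

in_four_files = [True] * 4 + [False] * 19
in_three_files = [True] * 3 + [False] * 20
print(passes_peak_count_filter(in_four_files, threshold))
print(passes_peak_count_filter(in_three_files, threshold))
```

Note that 4/23 is about 17.39 %, so a threshold typed in as the rounded value 17.4 is slightly stricter than "4 of 23 files". Rounding down (for example to 17.3) avoids excluding peaks found in exactly four files under the assumption used here.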
If you can prepare many QC sample data, tick the “QC at least filter” box. Then a peak will be
removed if it is missing in any of the QC samples. The “Gap filling option” must always be
checked.
Note: When you execute the compound identification, the representative spectrum for each identification
result is automatically chosen from the samples as the spectrum with the highest identification score.
MS-DIAL viewer
Mouse function
A) Mouse right click (or hold) and move: zoom in and out
B) Mouse left click (or hold) and move: select and scroll
C) Mouse left double click: reset range and select files in the file navigator
D) Mouse wheel: zoom in and out
E) Right click: popup context menu
Peak viewer
In the main viewer of MS-DIAL, the detected peak information is shown in the center window when you
double-click the file name in the File navigator. In the center window, each spot
denotes a detected peak: blue spots describe peaks of lower abundance in the sample,
red spots describe peaks of higher abundance, and green spots describe peaks of middle abundance.
The left window displays the MS1 spectrum of the focused peak, and the upper window displays the
extracted ion chromatogram of the focused peak. The right window displays the MS/MS spectrum
(blue or green) and the reference MS/MS spectrum (red). Other peak information is displayed in the
top-right of this window.
Display filter
Label: You can check the peak information such as retention time, accurate mass, metabolite name,
adduct ion name and isotope ion in the center window of MS-DIAL. Shown below are examples.
Height filter: This filter is used to check the peak abundance. Each peak is assigned a rank with
respect to its peak abundance in the focused sample.
Display filter:
1. “Identified” shows only identified peaks with the MS/MS spectrum.
2. “Annotated” shows only identified peaks without the MS/MS spectrum.
3. “Molecular ion” shows de-isotoped molecular ions only.
4. “MS/MS” shows only peaks having the MS/MS spectrum.
Alignment viewer
Alignment viewer: Each spot represents an aligned peak and holds the retention times, accurate masses,
intensities, and MS/MS spectra of all samples. As in the Peak viewer, red, blue, and green
“alignment” spots denote higher, lower, and middle abundance (on average) in the alignment,
respectively. By clicking a spot, you can check the retention times and accurate masses of all aligned
samples. The green spot is associated with the “detected” flag, showing whether all samples contain
the spot. The red spot is associated with the “interpolated” flag, showing whether the software
program filled in originally missing values. More details are shown in Supplementary Note 1.
MS/MS spectrum viewer
This viewer is intended for data-independent MS/MS analysis, except for the Act. vs. Ref. window.
Act. vs. Ref.: The upper spectrum (blue) displays the centroided MS/MS spectrum.
The lower spectrum (red) displays the reference MS/MS spectrum. In the case of data-independent
MS/MS analysis, the de-convoluted MS/MS spectrum can be displayed by clicking the de-convolution
icon.
MS2 Chrom.: The MS/MS chromatograms inside the sky-blue rectangle in the center window are
displayed.
This icon displays the raw MS/MS chromatograms.
This icon displays the de-convoluted MS/MS chromatograms.
This icon displays both the raw and de-convoluted MS/MS chromatograms.
Raw vs. Decon.: The upper and bottom windows display the raw and de-convoluted MS/MS
spectrum, respectively.
Rep. vs. Ref.: In combination with the alignment viewer, the window compares a representative
MS/MS spectrum and a reference MS/MS spectrum. The representative MS/MS is automatically
selected as the spectrum of the highest identification score for all samples aligned to the focused
alignment spot.
Compound search
The automatic identification process cannot completely avoid misidentification. MS-DIAL
provides a user interface so that users can manually correct the identification results. In this option,
you can classify each identification into three levels: “confident”, “unsettled”, and
“unknown.” For example, in phospholipid identification, we often determine only the cumulative
composition, such as PC 36:1, without the positions of the acyl chains, e.g. PC(18:0/18:1). You can add
the “unsettled” tag to such peaks as a signpost meaning “we only checked the cumulative
composition”.
Identification information is available not only in the “peak viewer” but also in the
“alignment viewer”. Although you only see the representative spectra from all samples in the alignment
viewer, it is very helpful for making a data matrix and for checking your peak identification results.
A) Double-click a row of the library information to show the identification details.
B) Enter a tolerance value for identification and click the “Search” button.
C) You can select either “A: Confident”, “B: Unsettled” or “C: Unknown.”
Normalization and Statistical analysis
A) Data normalization by internal standards or LOESS algorithm
B) Principal component analysis
A) If you want to use internal standards to normalize your peak list, you have to set the IS
information in the Option menu. MS-DIAL also supports the LOESS and cubic spline algorithms to
correct batch or amplitude drifts. In order to use the LOESS algorithm, you have to set the
“quality control” and “analytical order” information correctly in the Option menu.
B) If you want to use other statistical methods, please go to the PRIMe web site:
http://prime.psc.riken.jp/Metabolomics_Software/StatisticalAnalysisOnMicrosoftExcel/index.html
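Normalization by internal standards can be sketched as dividing each aligned peak by its assigned internal standard, sample by sample. This Python example is an illustration under stated assumptions, not MS-DIAL's implementation: the function name, the dictionary layout, and the use of peak heights are all hypothetical.

```python
def normalize_by_internal_standard(heights, internal_standard_of):
    """Divide each aligned peak by its internal standard in every sample.

    heights maps alignment ID -> list of peak heights (one per sample).
    internal_standard_of maps alignment ID -> alignment ID of its internal standard.
    """
    normalized = {}
    for alignment_id, values in heights.items():
        is_values = heights[internal_standard_of[alignment_id]]
        normalized[alignment_id] = [
            value / ref if ref > 0 else float("nan")  # undefined if the IS is missing
            for value, ref in zip(values, is_values)
        ]
    return normalized


# Placeholder data: alignment ID 1 is the internal standard for all three peaks.
heights = {0: [100.0, 300.0], 1: [50.0, 100.0], 2: [25.0, 50.0]}
assignment = {0: 1, 1: 1, 2: 1}
print(normalize_by_internal_standard(heights, assignment))
```

After normalization the internal standard itself has the value 1.0 in every sample, which is a quick sanity check on the assignment.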
Menu
File
New project
When you start a project, use this option and see the document of MS-DIAL start-up as
described above.
Open project
The project file is saved in MTD file format automatically whenever you run data
processing. Manual saving is described below. You can restart your project from the MTD
file. Manual curation of the peak identification results is highly recommended. In addition, the
internal standard information can be set from the Identification menu.
Save project
Although your project is saved automatically whenever you run data processing, it
is not saved automatically after manual modifications such as curation of identification results,
internal standard settings, and file or class information settings. Therefore, you have to save your
project with this option after such modifications.
Save parameter setting
Your data processing parameters can be saved as a MED format file. When you want to reuse
these parameters for data processing, select your MED format file in the data
processing setting.
Data processing
All processing: reruns all the processing steps with current parameters.
Adduct ion picking, De-convolution, Identification, Alignment: runs each process independently.
Identification
Identification setting: You can manually correct identification results. This option may be useful to
check internal standards which are not included in the reference library.
View
Display total ion chromatogram: You can see the total ion chromatogram of the focused sample.
Display extracted ion chromatogram: You can see the extracted ion chromatograms which you want
to display for the focused sample.
Option
You can set properties of aligned peaks and files. In the file properties, you can reset file
type, class ID, or analytical order (but not the file name). If you clear the check box of the “Included”
column, the corresponding data are no longer used in the statistical analysis. In the alignment
properties, you can set internal standard information for each aligned peak. Please make sure to
assign “alignment ID” in the “internal standard” column.
Export
A) Peak list export
B) Alignment result export
C) Context menu strip
A) Peak list export: You can get the peak list information of each sample including retention time,
m/z, MS/MS spectra information, and so on. Available formats are MSP, MGF or Text.
Step1. Choose an export folder path.
Step2. Choose files which you want to export.
Step3. Select export format.
Step4. Push the export button.
B) Alignment result export: You can get data matrix or spectral information.
Step1. Choose an export folder path.
Step2. Choose an alignment file which you want to export.
Step3. Select export format if you want to export the representative spectra.
Step4. Push the export button.
C) Context menu strip: Right-click to pop up the context menu strip, from which you can export
spectra, chromatograms, or PCA results as ASCII text, a bitmap image, or a vector image.