Deep learning the collisional cross sections of the peptide universe from a million experimental values

Meier, Florian (GND: 1315806932; ORCID: 0000-0003-4729-175X). Affiliation: Functional Proteomics, Jena University Hospital, Jena, Germany
Köhler, Niklas D. Affiliation: Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Neuherberg, Germany
Brunner, Andreas-David (ORCID: 0000-0002-2733-7899). Affiliation: Department Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
Wanka, Jean-Marc H. Affiliation: Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Neuherberg, Germany
Voytik, Eugenia. Affiliation: Department Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
Strauss, Maximilian T. (ORCID: 0000-0003-3320-6833). Affiliation: Department Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
Theis, Fabian J. (ORCID: 0000-0002-2419-1943). Affiliation: Department of Mathematics, TU München, Munich, Germany
Mann, Matthias (ORCID: 0000-0003-1292-4799). Affiliation: NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark

The size and shape of peptide ions in the gas phase are an under-explored dimension for mass spectrometry-based proteomics. To investigate the nature and utility of the peptide collisional cross section (CCS) space, we measure more than a million data points from whole-proteome digests of five organisms with trapped ion mobility spectrometry (TIMS) and parallel accumulation-serial fragmentation (PASEF). The scale and precision (CV < 1%) of our data are sufficient to train a deep recurrent neural network that accurately predicts CCS values solely from the peptide sequence. Cross section predictions for the synthetic ProteomeTools peptides validate the model with a median relative error of 1.4% (R > 0.99). Hydrophobicity, the proportion of prolines and the position of histidines are the main determinants of the cross sections, in addition to sequence-specific interactions. CCS values can now be predicted for any peptide and organism, forming a basis for advanced proteomics workflows that make full use of this additional information.
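The accuracy figures quoted above (replicate precision as a coefficient of variation, model accuracy as a median relative error and a correlation R) can be made concrete with a minimal sketch. It assumes CCS values are plain floats in Å², that CV is the sample standard deviation divided by the mean, and that R denotes the Pearson correlation coefficient; the function names and the numbers in the usage note are hypothetical illustrations, not data or code from the study.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean of repeated CCS measurements.

    Assumption: CV is computed per peptide over replicate runs.
    """
    return statistics.stdev(values) / statistics.mean(values)


def median_relative_error(predicted, observed):
    """Median over peptides of |predicted - observed| / observed."""
    errors = [abs(p - o) / o for p, o in zip(predicted, observed)]
    return statistics.median(errors)


def pearson_r(x, y):
    """Pearson correlation between predicted and observed CCS values."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical replicate measurements of one peptide ion (Å²).
    replicates = [400.0, 402.0, 398.0]
    # Hypothetical predicted vs. observed CCS values for three peptides (Å²).
    predicted = [404.0, 495.0, 612.0]
    observed = [400.0, 500.0, 600.0]
    print(coefficient_of_variation(replicates))      # 0.005, i.e. 0.5%
    print(median_relative_error(predicted, observed))
    print(pearson_r(predicted, observed))
```

With these made-up values the replicate CV is 0.5% and the median relative error is 1%; taking the median rather than the mean keeps a few badly predicted peptides from dominating the summary error.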


Rights

Rights holder: © The Author(s) 2021

Use and reproduction: