MAD HATTER Correctly Annotates 98% of Small Molecule Tandem Mass Spectra Searching in PubChem

Hoffmann, Martin A.; Kretschmer, Fleming; Ludwig, Marcus; Böcker, Sebastian

doi:10.3390/metabo13030314

Article / Essay, Tue., 21 Feb. 2023, CC BY 4.0

Published

Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics usually relies on mass spectrometry, a technology capable of detecting thousands of compounds in a biological sample. Metabolite annotation is executed using tandem mass spectrometry. Spectral library search is far from comprehensive, and numerous compounds remain unannotated. So-called in silico methods allow us to overcome the restrictions of spectral libraries, by searching in much larger molecular structure databases. Yet, after more than a decade of method development, in silico methods still do not reach the correct annotation rates that users would wish for. Here, we present a novel computational method called Mad Hatter for this task. Mad Hatter combines CSI:FingerID results with information from the searched structure database via a metascore. Compound information includes the melting point, and the number of words in the compound description starting with the letter ‘u’. We then show that Mad Hatter reaches a stunning 97.6% correct annotations when searching PubChem, one of the largest and most comprehensive molecular structure databases. Unfortunately, Mad Hatter is not a real method. Rather, we developed Mad Hatter solely for the purpose of demonstrating common issues in computational method development and evaluation. We explain what evaluation glitches were necessary for Mad Hatter to reach this annotation level, what is wrong with similar metascores in general, and why metascores may screw up not only method evaluations but also the analysis of biological experiments. This paper may serve as an example of problems in the development and evaluation of machine learning models for metabolite annotation.

Classification

Editor(s): Ji, Hongchao; Fan, Xiaqiong; Moseley, Hunter N. B.
Published in: Metabolites
Vol. 13, No. 3 (21 Feb. 2023), Art. No. 314, pp. 1-15
Volume: 13
Issue: 3
Publication date: 21 Feb. 2023
URN: urn:nbn:de:gbv:27-dbt-20230308-051424-001
DOI: 10.3390/metabo13030314
Language: English
Resource type: Text
Extent: 15 pages
Keywords: metabolite annotation; molecular structure; tandem mass spectrometry; in silico methods; database search; metascores; parody paper
DDC subject group (DNB): 570 Life sciences, biology
Institution: Friedrich-Schiller-Universität Jena, Fakultät für Mathematik und Informatik
Note: Secondary publication


