Abstract Title: | Fragmentation tree prediction based on molecular fingerprints |
Presenter Name: | Ms Viktoriia Turkina |
Co-authors: | Ms Denice van Herwerden, Dr Jake W. O'Brien |
Company/Organisation: | Van ’t Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam |
Country: | Netherlands |
Abstract Information:
High-resolution nontargeted mass spectrometry has been gaining popularity because it can provide comprehensive information about both known and unknown compounds present in a sample in a single analysis. The identification of the latter, however, remains a bottleneck in the analytical workflow. The conventional method is based on comparing experimental spectra with reference spectra. Since this approach is inherently limited by the number of spectra in spectral libraries, whose coverage of existing chemicals is still relatively small, new solutions are required. To tackle this issue, in silico methods have been introduced into the field. Such methods exploit information from molecular structure databases, which are several orders of magnitude larger than the spectral ones. Currently, two approaches are in use: MS-to-compound matching and compound-to-MS matching. Algorithms based on the first approach extract the most likely features from the acquired MS spectrum and look up the chemical structure in databases via feature matching (e.g. CSI:FingerID, MetFrag), whereas the second approach predicts the MS spectra of chemicals. The latter allows the expansion of available reference spectral libraries and enables the identification of unknowns by comparing the experimental and predicted reference spectra (e.g. CFM-ID). Despite the continuous improvement of existing algorithms and the development of new ones, the inaccuracy of identification is still one of the greatest challenges to be addressed. Therefore, this work is dedicated to the development of a fragmentation tree prediction model based on machine learning. Structures (i.e. SMILES) of compounds from publicly available databases were converted into molecular fingerprints. Using a combination of the computed fingerprints and mass spectra, the probabilities of occurrence of specific cumulative neutral losses (CNLs) were calculated.
These probabilities were converted to thresholds to define the most likely CNLs. Finally, for performance evaluation, the results were compared with those obtained using CFM-ID, which is widely regarded as one of the best-performing methods for spectrum prediction.
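
As a rough illustration of the last two steps, the sketch below derives CNLs from a spectrum and keeps those whose predicted probability passes a threshold. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a CNL is the difference between the precursor m/z and a fragment m/z, and the function names, the rounding precision, the 0.5 cut-off, and all numeric values are hypothetical. The probabilities stand in for the output of a fingerprint-based model, which the abstract does not specify.

```python
def cumulative_neutral_losses(precursor_mz, fragment_mzs, decimals=2):
    """Return the sorted, de-duplicated CNLs of one spectrum.

    Assumption: a CNL is taken as precursor m/z minus fragment m/z,
    rounded to `decimals` places so that near-identical losses merge.
    """
    losses = {
        round(precursor_mz - mz, decimals)
        for mz in fragment_mzs
        if mz < precursor_mz  # skip the precursor peak itself
    }
    return sorted(losses)


def select_likely_cnls(cnl_probabilities, threshold=0.5):
    """Keep the CNLs whose predicted probability reaches the threshold.

    `cnl_probabilities` maps a CNL mass to a probability; here it is a
    placeholder for whatever a fingerprint-based model would predict.
    """
    return sorted(
        cnl for cnl, prob in cnl_probabilities.items() if prob >= threshold
    )


# Hypothetical spectrum: precursor at m/z 200.0 with two fragments.
observed = cumulative_neutral_losses(200.0, [182.0, 156.0, 200.0])

# Hypothetical model output for the same compound.
predicted = select_likely_cnls({18.0: 0.9, 44.0: 0.3}, threshold=0.5)
```

With these made-up inputs, `observed` contains both losses present in the spectrum, while `predicted` retains only the loss whose probability reaches the cut-off. Comparing the two lists is one simple way such a prediction could be scored.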