PEFTEC 2017

PEFTEC 2019 - Abstract

Abstract Title: Themis: batch pre-processing for ultrahigh resolution petroleomics data
Abstract Type: Poster
Session Choice: Analytical Techniques: Mass Spectrometry
Presenter Name: Mr Remy Gavard
Co-authors: Dr Mark Barrow, Dr David Rossell, Dr Simon Spencer
Company/Organisation: University of Warwick
Country: United Kingdom

Abstract Information:

Using Fourier transform ion cyclotron resonance mass spectrometry (FTICR MS), scientists are able to determine an unprecedented number of components in crude oil. The statistical tools required to analyse the mass spectra struggle to keep pace with advancing instrument capabilities and increasing quantities of data. Today, most ultrahigh resolution analyses of petroleum samples are based on very limited numbers of mass spectra per sample. Because researchers often base findings on single experiments processed with labour-heavy approaches, it can be challenging to monitor repeatability and to differentiate between noise and true signals. As a result, mistakes and false positive findings can be common. One difficulty is reliably distinguishing genuine peaks from noise: when peaks are selected by signal-to-noise ratio alone, genuine peaks are often removed if the threshold is set too high, and noise peaks produce false positives if it is set too low.

False positive peaks often appear in only a single mass spectrum, whereas reliable peaks appear in multiple (if not all) spectra. By combining information across datasets, we can obtain more reliable information with a smaller margin of error. We present a new algorithm developed in R, named Themis, to jointly pre-process replicate measurements of a complex sample. This improves consistency as a preliminary step to assigning chemical compositions, and the algorithm includes a quality control criterion. Through the use of peak alignment and an adaptive mixture model-based strategy, it is possible to distinguish true peaks from noise.
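The replicate-consistency idea can be illustrated with a minimal Python sketch. This is not the Themis algorithm (which is written in R and uses an adaptive mixture model); it is a simplified stand-in that aligns peaks across replicates within a ppm tolerance and keeps only those seen in several replicates. The function names, the 1 ppm tolerance and the minimum replicate count are assumptions chosen for illustration.

```python
from typing import List, Tuple


def ppm_diff(a: float, b: float) -> float:
    """Absolute difference between two m/z values, in parts per million."""
    return abs(a - b) / b * 1e6


def _flush(cluster: List[Tuple[float, int]], min_count: int, kept: List[float]) -> None:
    """Keep the cluster's mean m/z if enough distinct replicates contributed."""
    replicate_ids = {rep for _, rep in cluster}
    if len(replicate_ids) >= min_count:
        kept.append(sum(mz for mz, _ in cluster) / len(cluster))


def consistent_peaks(
    replicates: List[List[float]],
    tol_ppm: float = 1.0,   # hypothetical alignment tolerance
    min_count: int = 2,     # hypothetical minimum number of replicates
) -> List[float]:
    """Return mean m/z of peaks that align across at least min_count replicates."""
    # Pool every peak with the index of the replicate it came from, sorted by m/z.
    pooled = sorted((mz, i) for i, rep in enumerate(replicates) for mz in rep)
    kept: List[float] = []
    cluster: List[Tuple[float, int]] = []
    for mz, rep in pooled:
        # A gap larger than the tolerance closes the current cluster.
        if cluster and ppm_diff(mz, cluster[-1][0]) > tol_ppm:
            _flush(cluster, min_count, kept)
            cluster = []
        cluster.append((mz, rep))
    if cluster:
        _flush(cluster, min_count, kept)
    return kept


# Made-up m/z values: two peaks recur in all three replicates, two appear once.
replicates = [
    [200.0000, 300.0000, 411.1234],
    [200.0001, 300.0002],
    [200.00005, 300.0001, 350.5],
]
reliable = consistent_peaks(replicates)
```

Here the peaks near m/z 200 and 300 are retained, while the isolated peaks at 350.5 and 411.1234 are discarded. Clustering by gaps between neighbouring peaks can chain together peaks in dense spectral regions, which is one reason a model-based approach may be preferable in practice.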

We applied Themis to a variety of crude oils and naphthenic acid samples. The results demonstrated more effective removal of noise-related peaks and the preservation and improvement of the chemical composition profile. Applied to the NIST crude oil sample, Themis reduced the number of peaks from more than 16000 to 2260 without changing the compositional assignment of the high intensity N1 class, and the root mean square (RMS) error improved from 0.24 ppm to 0.22 ppm. The low intensity NS class saw an improvement in its compositional assignment, with well distributed series, removal of isolated assignments and a reduction of the RMS error from 0.38 ppm to 0.21 ppm.
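For readers unfamiliar with the metric, the RMS figures quoted above are conventionally computed from the ppm mass error of each assigned peak. The Python sketch below shows that calculation under this assumption; the function names and the example m/z values are hypothetical and are not taken from the study.

```python
import math
from typing import List, Tuple


def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error of one assignment, in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def rms_ppm(pairs: List[Tuple[float, float]]) -> float:
    """Root mean square of the ppm errors over (measured, theoretical) pairs."""
    errors = [ppm_error(m, t) for m, t in pairs]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Made-up assignments: one peak is off by 0.5 ppm, the other matches exactly.
example = [(400.0002, 400.0), (500.0, 500.0)]
rms = rms_ppm(example)
```

Because the errors are squared before averaging, a few poorly matched or isolated assignments raise the RMS disproportionately, so removing spurious peaks tends to lower it.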

Themis therefore affords greater success in assigning chemical compositions to low-intensity peaks using petroleomics software. In addition, improved monitoring of data quality and handling of replicate datasets will allow researchers to process larger numbers of samples with greater confidence. The algorithm will soon be made available for academic use via a web server.