|Abstract Title:||Themis: batch pre-processing for ultrahigh resolution petroleomics data|
|Session Choice:||Analytical Techniques: Mass Spectrometry|
|Presenter Name:||Mr Remy Gavard|
|Co-authors:||Dr Mark Barrow, Dr David Rossell, Dr Simon Spencer|
|Company/Organisation:||University of Warwick|
Abstract Information:
Using Fourier transform ion cyclotron resonance mass spectrometry (FTICR MS), scientists are able to determine an unprecedented number of components in crude oil. The statistical tools required to analyse the mass spectra, however, struggle to keep pace with advancing instrument capabilities and increasing quantities of data. Today, most ultrahigh resolution analyses of petroleum samples are based on very limited numbers of mass spectra per sample. Because researchers often base findings on single experiments processed with labour-intensive approaches, it can be challenging to monitor repeatability and to differentiate between noise and true signals. As a result, mistakes and false positive findings can be common. One difficulty is the reliable differentiation of genuine peaks from noise: when peaks are selected by signal-to-noise ratio alone, genuine peaks are often removed if the threshold is set too high, while noise peaks produce false positives if the threshold is set too low.
False positive peaks often appear in only a single mass spectrum, whereas reliable peaks appear in multiple (if not all) samples. By combining information across datasets, we can obtain more reliable information with a smaller margin for error. We present a new algorithm developed in R, named Themis, to jointly pre-process replicate measurements of a complex sample. This improves consistency as a preliminary step before chemical compositions are assigned, and the algorithm includes a quality control criterion. Through the use of peak alignment and an adaptive mixture model-based strategy, it is possible to distinguish true peaks from noise.
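The idea of combining replicates can be illustrated with a minimal Python sketch. This is not the Themis implementation (which is written in R and uses an adaptive mixture model); it swaps in a much simpler rule, keeping a peak only if it is observed in a minimum number of replicates. The function names `align_peaks` and `keep_reproducible`, the 1 ppm tolerance and the example m/z values are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def align_peaks(replicates: List[List[float]], tol_ppm: float = 1.0) -> List[Dict[str, float]]:
    """Group m/z values from all replicates into clusters of neighbouring peaks.

    Hypothetical helper: a greedy one-dimensional clustering in which a new
    cluster starts whenever the gap to the previous pooled peak exceeds tol_ppm.
    """
    # Pool every peak with the index of the replicate it came from, sorted by m/z.
    pooled = sorted((mz, i) for i, rep in enumerate(replicates) for mz in rep)
    clusters = []
    current = []
    for mz, idx in pooled:
        if current:
            previous_mz = current[-1][0]
            gap_ppm = (mz - previous_mz) / previous_mz * 1e6
            if gap_ppm > tol_ppm:
                clusters.append(current)
                current = []
        current.append((mz, idx))
    if current:
        clusters.append(current)
    # Summarise each cluster by its mean m/z and the number of distinct replicates.
    return [
        {
            "mz": sum(m for m, _ in c) / len(c),
            "n_replicates": len({i for _, i in c}),
        }
        for c in clusters
    ]


def keep_reproducible(aligned: List[Dict[str, float]], min_replicates: int) -> List[float]:
    """Simplified stand-in for the mixture model: keep peaks seen in enough replicates."""
    return [p["mz"] for p in aligned if p["n_replicates"] >= min_replicates]


# Three made-up replicate peak lists; two peaks recur, two appear only once.
replicates = [
    [200.10000, 300.20000, 350.12345],
    [200.10005, 300.20010],
    [200.09995, 300.19995, 410.55555],
]
aligned = align_peaks(replicates, tol_ppm=1.0)
kept = keep_reproducible(aligned, min_replicates=2)
print(kept)
```

In this toy input the two peaks that recur across replicates are retained and the two singletons are dropped. A fixed occurrence count is cruder than a model-based criterion, since it ignores intensity and how tightly the replicate masses agree.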
We applied Themis to a variety of crude oils and naphthenic acid samples. The results demonstrated more effective removal of noise-related peaks, together with the preservation and improvement of the chemical composition profile. Applied to the NIST crude oil sample, Themis reduced the number of peaks from more than 16000 to 2260 without changing the compositional assignment of the high intensity N1 class, and the root mean square (RMS) error improved from 0.24 ppm to 0.22 ppm. The low intensity NS class saw an improvement in its compositional assignment, with well distributed series, removal of isolated assignments and a reduction of the RMS error from 0.38 ppm to 0.21 ppm.
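For readers unfamiliar with the metric, the following Python sketch shows one common way to compute an RMS value in ppm, assuming it refers to the root mean square of the mass errors between measured and theoretical m/z values of the assigned compositions. The function names and the example values are hypothetical and are not taken from the study.

```python
import math
from typing import List, Tuple


def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Relative mass error of one assignment, in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def rms_ppm(pairs: List[Tuple[float, float]]) -> float:
    """Root mean square of ppm errors over (measured, theoretical) m/z pairs."""
    if not pairs:
        raise ValueError("at least one assignment is required")
    errors = [ppm_error(measured, theoretical) for measured, theoretical in pairs]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Made-up assignments: (measured m/z, theoretical m/z).
assignments = [
    (100.0001, 100.0000),
    (100.0000, 100.0001),
]
print(round(rms_ppm(assignments), 3))
```

Because each error is squared before averaging, a few badly matched assignments raise the RMS sharply, which is consistent with the RMS falling when isolated assignments are removed.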
Themis therefore affords greater success in assigning chemical compositions to low-intensity peaks using petroleomics software. In addition, improved monitoring of data quality and handling of replicate datasets will allow researchers to process larger numbers of samples with greater confidence. The algorithm will soon be made available for academic use via a web server.