Abstract Title: | Replicates of Complex Mixtures in Ultra-High Resolution Mass Spectrometry Could Help Pave The Way to Big Data |
Abstract Type: | Seminar |
Session Choice: | Big Data, The Last Hyphenation |
Presenter Name: | Mr Remy Gavard |
Co-authors: | Dr Mark Barrow, Dr Simon Spencer, Dr David Rossell |
Company/Organisation: | University of Warwick |
Country: | United Kingdom |
Abstract Information:
By using Fourier transform ion cyclotron resonance mass spectrometry (FTICR MS),
scientists are able to determine an unprecedented number of components in crude oil. The
statistical tools required to analyse the mass spectra struggle to keep pace with advancing
instrument capabilities and increasing quantities of data. Today, we face "fat data", with
many attributes per sample, but not "tall data", as few exploitable training samples are
available. This is because most ultrahigh-resolution analyses of complex mixture samples
are based on single, labour-intensive experiments. As a result, it can be challenging to
monitor repeatability and differentiate between noise and true signals. Another factor
contributing to the low number of training samples available is that the data analysis is
usually performed once for the purpose of a specific investigation but may not be stored for
later use. In order to be able to develop methods to exploit greater numbers of samples, we
need to ensure the consistency, the reliability and the organisation of MS data.
We present a new algorithm developed in R, named Themis, to jointly pre-process replicate
measurements of a complex sample. False positive peaks with low intensity can arise
throughout a single mass spectrum due to the presence of noise. The locations of these
peaks are not consistent between replicate samples, due to the randomness of the noise.
Researchers typically face a trade-off: peak-picking thresholds must be set low enough to
avoid omitting genuine peaks, but a threshold that low can also admit large numbers of
noise peaks. By combining information across replicate datasets, we identify and remove
false positive peaks with a smaller margin of error. This
enables true peaks of low intensity to be extracted from the background noise and improves
consistency as a preliminary step to assigning chemical compositions and data analysis.
Through the use of peak alignment and an adaptive mixture-model-based strategy, it is
possible to distinguish true peaks from noise and obtain more reliable datasets for further
use.
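
The replicate-consistency idea described above can be illustrated with a minimal Python sketch. It is not the Themis implementation, which is written in R and uses an adaptive mixture-model-based strategy; here that strategy is replaced by a plain count of how many replicates support each aligned peak. The function name `consensus_peaks`, the parameters `tol_ppm` and `min_replicates`, and the simple gap-based alignment are all assumptions made for illustration.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def consensus_peaks(
    replicates: List[List[Peak]],
    tol_ppm: float = 2.0,       # hypothetical alignment tolerance
    min_replicates: int = 2,    # hypothetical support threshold
) -> List[Peak]:
    """Keep peaks whose m/z recurs, within tol_ppm, in enough replicates."""
    # Pool the peaks from all replicates, remembering where each came from.
    pooled = sorted(
        (mz, intensity, rep_idx)
        for rep_idx, peaks in enumerate(replicates)
        for mz, intensity in peaks
    )

    consensus: List[Peak] = []
    group: list = []

    def flush() -> None:
        # A group is kept only if it is supported by enough distinct replicates.
        if len({rep for _, _, rep in group}) >= min_replicates:
            mean_mz = sum(mz for mz, _, _ in group) / len(group)
            mean_intensity = sum(i for _, i, _ in group) / len(group)
            consensus.append((mean_mz, mean_intensity))

    for peak in pooled:
        # Start a new group when the gap to the previous peak exceeds tol_ppm.
        if group and (peak[0] - group[-1][0]) / group[-1][0] * 1e6 > tol_ppm:
            flush()
            group = []
        group.append(peak)
    if group:
        flush()
    return consensus


# Invented example data: two peaks shared by all replicates, plus one
# isolated low-intensity peak per replicate standing in for noise.
replicate_1 = [(300.0000, 5.0), (450.1000, 1.2), (512.3000, 0.9)]
replicate_2 = [(300.0002, 5.2), (450.1003, 1.0), (388.7000, 0.8)]
replicate_3 = [(300.0001, 4.8), (450.0998, 1.1), (601.4000, 0.7)]

for mz, intensity in consensus_peaks([replicate_1, replicate_2, replicate_3]):
    print(f"{mz:.4f}  {intensity:.2f}")
```

In this toy input, the peak near m/z 450 has an intensity comparable to the isolated peaks, so an intensity threshold alone could not separate them; the peak is retained because it recurs at a consistent location across replicates, whereas the isolated peaks do not.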
We applied Themis to a variety of crude oils and naphthenic acid samples. The results
demonstrated more effective removal of noise-related peaks, together with the preservation
and improvement of the chemical composition profile. Themis enabled the isolation of peaks that
would have otherwise been discarded using traditional peak picking (based upon signal-to-noise
ratio alone) for a single spectrum, and therefore Themis ensures the inclusion of
information that would typically be lost, while also reducing data set sizes.
Themis affords greater success with the assignment of chemical compositions to low-intensity
peaks using petroleomic software. In addition, improved monitoring of data quality
and handling of replicate datasets will allow researchers to process larger numbers of
samples with greater confidence. This, in turn, will enable larger-scale data analysis
methods that can inform decision making.