HTC-15 - Abstract

Abstract Title: Replicates of Complex Mixtures in Ultra-High Resolution Mass Spectrometry Could Help Pave The Way to Big Data
Abstract Type: Seminar
Session Choice: Big Data, The Last Hyphenation
Presenter Name: Mr Remy Gavard
Co-authors: Dr Mark Barrow, Dr Simon Spencer, Dr David Rossell
Company/Organisation: University of Warwick
Country: United Kingdom

Abstract Information:

By using Fourier transform ion cyclotron resonance mass spectrometry (FTICR MS), scientists are able to determine an unprecedented number of components in crude oil. However, the statistical tools required to analyse the mass spectra struggle to keep pace with advancing instrument capabilities and increasing quantities of data. Today, we face "fat data", with many attributes per sample, but not "tall data", because the number of exploitable training samples is limited. This is because most ultrahigh resolution analyses of complex mixture samples are based on single, labour-intensive experiments. As a result, it can be challenging to monitor repeatability and to differentiate between noise and true signals. Another factor contributing to the low number of available training samples is that the data analysis is usually performed once, for the purpose of a specific investigation, and the data may not be stored for later use. To develop methods that exploit greater numbers of samples, we need to ensure the consistency, reliability and organisation of MS data.

We present a new algorithm developed in R, named Themis, to jointly pre-process replicate measurements of a complex sample. False positive peaks with low intensity can arise throughout a single mass spectrum due to the presence of noise. Because the noise is random, the locations of these peaks are not consistent between replicate measurements. Researchers are typically faced with a trade-off: peak picking thresholds must be set low enough to avoid omitting genuine peaks, but a threshold that low can also admit large numbers of noise peaks. By combining information across replicate datasets, we identify and reduce false positive peaks with a smaller margin for error. This enables true peaks of low intensity to be extracted from the background noise and improves consistency as a preliminary step to assigning chemical compositions and data analysis. Through the use of peak alignment and an adaptive mixture-model-based strategy, it is possible to distinguish true peaks from noise and obtain more reliable datasets for further use.
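
The idea of using replicates to separate recurring peaks from random noise peaks can be illustrated with a minimal sketch. This is not the Themis implementation: it is written in Python rather than R, and it replaces the adaptive mixture-model-based strategy with a much simpler rule that keeps a peak only if it recurs in a minimum number of replicates. The function name `consensus_peaks`, the parameters `tol_ppm` and `min_replicates`, and the example values are all hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def consensus_peaks(replicates: List[List[Peak]],
                    tol_ppm: float = 1.0,
                    min_replicates: int = 2) -> List[Peak]:
    """Keep peaks whose m/z recurs in at least `min_replicates` spectra.

    Illustrative stand-in for replicate-based filtering; the tolerance
    and the replicate-count rule are assumptions, not the Themis method.
    """
    # Pool every peak, tagged with its replicate index, sorted by m/z.
    pooled = sorted((mz, intensity, idx)
                    for idx, spectrum in enumerate(replicates)
                    for mz, intensity in spectrum)

    # Greedy alignment: a peak joins the current cluster if it lies
    # within tol_ppm of the first (lowest m/z) peak of that cluster.
    clusters: List[List[Tuple[float, float, int]]] = []
    for peak in pooled:
        if clusters:
            anchor_mz = clusters[-1][0][0]
            if (peak[0] - anchor_mz) / anchor_mz * 1e6 <= tol_ppm:
                clusters[-1].append(peak)
                continue
        clusters.append([peak])

    # Keep clusters seen in enough distinct replicates; report the
    # mean m/z and mean intensity of each retained cluster.
    kept: List[Peak] = []
    for cluster in clusters:
        n_reps = len({idx for _, _, idx in cluster})
        if n_reps >= min_replicates:
            mean_mz = sum(p[0] for p in cluster) / len(cluster)
            mean_intensity = sum(p[1] for p in cluster) / len(cluster)
            kept.append((mean_mz, mean_intensity))
    return kept


# Made-up replicates: two shared peaks plus one random noise peak each.
rep_a = [(200.00000, 5.0), (300.00000, 1.2), (412.34567, 0.9)]
rep_b = [(200.00010, 5.2), (300.00010, 1.0), (555.55555, 0.8)]
rep_c = [(200.00005, 4.8), (300.00020, 1.1), (678.91011, 1.0)]

kept = consensus_peaks([rep_a, rep_b, rep_c], tol_ppm=1.0, min_replicates=2)
```

In this toy example the low-intensity peak near m/z 300 is retained because it appears in all three replicates, whereas the three noise peaks of similar intensity are dropped because each appears in only one. An intensity threshold applied to a single spectrum could not make that distinction.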

We applied Themis to a variety of crude oils and naphthenic acid samples. The results demonstrated more effective removal of noise-related peaks, together with the preservation and improvement of the chemical composition profile. Themis enabled the isolation of peaks that would otherwise have been discarded by traditional peak picking (based upon signal-to-noise ratio alone) on a single spectrum. Themis therefore retains information that would typically be lost, while also reducing dataset sizes.

Themis affords greater success with the assignment of chemical compositions to low-intensity peaks using petroleomic software. In addition, improved monitoring of data quality and handling of replicate datasets will allow researchers to process larger numbers of samples with greater confidence. This, in turn, will enable larger-scale data analysis methods, which inform decision making.