A paradigm shift for (big) data analysis in chromatography: on the use of Bayesian statistics
Big Data Chemometrics and Method development (In-Silico) (KVCV)
Abstract Information :
Data analysis methods applied to chromatographic data, including baseline correction, peak detection, alignment and peak tracking, calibration and/or classification, are a routine part of most modern analytical workflows. With the emergence of hyphenation (especially with high-resolution mass spectrometry) and two-dimensional methods (e.g. LCxLC), new challenges for data analysis are emerging. The amount of data to be processed is booming, to the point that we can start to talk about Big Data in analytical chemistry. Analysing these enormous and complex quantities of data is a tremendous challenge, especially because it has to be done automatically. Traditionally, chromatographic data has been processed using the so-called frequentist approach. This approach yields only a final answer about the hypotheses being tested, with no information about the probability of that answer being true.
In contrast to the frequentist approach, Bayesian statistics offers a very interesting alternative, estimating the probabilities of the hypotheses involved in the processes mentioned above. This way of thinking opens a new world of possibilities, especially in the area of automated massive data treatment. The chromatographer no longer has to "trust" the results of the data analysis; instead, (s)he decides among the different configurations that explain the data, based on the probability of each one.
We have applied this way of thinking to a broad range of situations. One example concerns toxicological screening, in which the probabilities of a list of compounds being present in a sample analysed with LC-MS are estimated. Using a Bayesian approach, it is easy to build up evidence about the presence/absence of a compound by taking into account adduct formation, isotope ratios, retention times and mass values, resulting in more accurate probability values. Another example is a Bayesian view of the well-known peak tracking methods. In (traditional) peak tracking methods, peaks of the same compound are recognized under different chromatographic conditions. Bayesian thinking approaches the problem in a probabilistic way, i.e. by assigning probabilities to the different possible matches between peaks and the available compounds.
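To make the idea of building up evidence concrete, the following is a minimal sketch in Python, not the method used in the work described here. It assumes that the individual pieces of evidence are conditionally independent given presence or absence of the compound (a naive Bayes simplification), and every number in it (the prior and the likelihoods) is hypothetical and chosen only for illustration. The function names `bayes_update` and `posterior_presence` are likewise invented for this example.

```python
def bayes_update(prior, lik_present, lik_absent):
    """Return P(present | evidence) from P(present) and the two likelihoods.

    lik_present: P(evidence | compound present)
    lik_absent:  P(evidence | compound absent)
    """
    numerator = prior * lik_present
    return numerator / (numerator + (1.0 - prior) * lik_absent)


def posterior_presence(prior, evidence):
    """Update the probability of presence one piece of evidence at a time.

    evidence: iterable of (name, lik_present, lik_absent) tuples.
    Assumes the pieces of evidence are conditionally independent.
    Returns a list of (name, posterior_after_this_evidence).
    """
    history = []
    p = prior
    for name, lik_present, lik_absent in evidence:
        p = bayes_update(p, lik_present, lik_absent)
        history.append((name, p))
    return history


if __name__ == "__main__":
    # Hypothetical values: a 1% prior that the compound is in the sample,
    # and made-up likelihoods for each kind of LC-MS evidence.
    evidence = [
        ("mass value matches", 0.9, 0.05),
        ("retention time matches", 0.8, 0.1),
        ("isotope ratio matches", 0.7, 0.2),
        ("expected adduct observed", 0.6, 0.3),
    ]
    for name, p in posterior_presence(0.01, evidence):
        print(f"after '{name}': P(present) = {p:.3f}")
```

Each observation that is more likely when the compound is present than when it is absent raises the probability of presence, so the chromatographer sees how strongly each piece of evidence supports the hypothesis instead of a single yes/no answer.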
In my opinion, the use of Bayesian statistics to deal with massive data treatment in chromatography constitutes a shift in the way we think about data analysis. In essence, we propose to work with probabilities of hypotheses (and to update them as more information/data is taken into account), as opposed to delivering a single final answer to the chromatographer.