Type of Publication: PhD Thesis
University: Wright State University
Nuclear magnetic resonance (NMR)-based metabolomics is a developing research field with broad applicability, including the identification of biomarkers associated with pathophysiologic changes, the classification of samples by type of toxic exposure, and clinical diagnosis. These applications require statistical and computational techniques to support the associated data analysis. Moreover, a typical ¹H NMR spectrum of pure proteins, biofluids, or tissue may contain thousands of resonances (i.e., peaks), so visual inspection alone cannot make full use of the spectral information. Common practice within the metabolomics community is to evaluate and validate novel algorithms on empirical data and on simplified simulated data. Empirical data capture the complex characteristics of experimental data; however, evaluations on empirical data often rely on indirect performance metrics because the optimal or correct output is difficult to obtain a priori. To avoid relying on indirect performance metrics, researchers often evaluate their algorithms on simplified simulated data, but conclusions drawn from such data can be difficult to generalize to real experimental data. This dissertation combines the advantages of both approaches by generating high-fidelity synthetic data sets that emulate the salient features of experimental data.

The analysis of NMR metabolic spectroscopic data can be divided into four steps: (1) standard post-instrumental processing of the spectroscopic data; (2) quantification of spectral features; (3) normalization and scaling; and (4) multivariate statistical modeling. Quantification of spectral features, step (2), is central to developing classification algorithms and identifying biomarkers (i.e., pattern recognition).
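To illustrate why synthetic data allow direct evaluation of step (2), here is a minimal Python sketch. It assumes Lorentzian line shapes and a uniform random shift per peak as the misalignment model, and uses equal-width binning as one common quantification baseline. The function names (`lorentzian`, `synthetic_spectrum`, `uniform_binning`) and all parameter values are hypothetical and are not the dissertation's simulation code. Because the true peak centers are known, one can check directly whether a peak's intensity is split across a bin boundary.

```python
import random


def lorentzian(x, center, fwhm, height):
    """Lorentzian line shape (peak value `height`, full width at half maximum `fwhm`)."""
    half = fwhm / 2
    return height * half ** 2 / ((x - center) ** 2 + half ** 2)


def synthetic_spectrum(ppm, peaks, max_shift=0.0, seed=0):
    """Sum of Lorentzian peaks given as (center, fwhm, height) tuples.
    Each center is jittered uniformly by up to +/- max_shift ppm to mimic
    sample-to-sample misalignment (an assumed, simplified model)."""
    rng = random.Random(seed)
    spectrum = [0.0] * len(ppm)
    for center, fwhm, height in peaks:
        shifted = center + rng.uniform(-max_shift, max_shift)
        for i, x in enumerate(ppm):
            spectrum[i] += lorentzian(x, shifted, fwhm, height)
    return spectrum


def uniform_binning(ppm, spectrum, n_bins):
    """Equal-width binning: sum the intensity falling in each of n_bins fixed bins."""
    lo, hi = min(ppm), max(ppm)
    width = (hi - lo) / n_bins
    bins = [0.0] * n_bins
    for x, y in zip(ppm, spectrum):
        # Clamp so the right edge (x == hi) lands in the last bin.
        bins[min(int((x - lo) / width), n_bins - 1)] += y
    return bins


if __name__ == "__main__":
    ppm = [i / 1000 for i in range(1001)]  # hypothetical 0-1 ppm grid
    peaks = [(0.25, 0.005, 1.0), (0.50, 0.005, 1.0)]  # second peak sits on a bin edge
    for seed in (1, 2):
        bins = uniform_binning(ppm, synthetic_spectrum(ppm, peaks, 0.01, seed), 10)
        # Show how the boundary peak's intensity is divided between bins 4 and 5.
        print(f"sample {seed}: bin4={bins[4]:.2f} bin5={bins[5]:.2f}")
```

Running the demo with different seeds shows the boundary peak's intensity moving between two bins, while the mid-bin peak stays in one bin; that instability is the failure mode the binning methods below aim to reduce.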
Algorithms for spectral quantification are designed to enhance the efficacy of pattern recognition and multivariate statistical techniques for metabolomics. They do so by reducing the dimensionality of the spectra while retaining salient information and mitigating peak misalignment. This dissertation develops two novel spectral quantification techniques: Gaussian binning and dynamic adaptive binning. Gaussian binning uses a kernel-based binning algorithm to decrease sensitivity to peak misalignment. Dynamic adaptive binning optimizes the bin boundaries with respect to an objective function using a dynamic programming strategy. Both techniques are compared with common spectral binning techniques in terms of their ability to reduce the probability that peaks span bin boundaries and to increase the interpretability of the results. Finally, a case study shows how dynamic adaptive binning and Gaussian binning enhance the analysis of a ¹H NMR-based experiment monitoring rat urinary metabolites following exposure to the toxin α-naphthylisothiocyanate.
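The two ideas can be sketched in Python as follows. This is an illustration under stated assumptions, not the dissertation's implementation. The per-point normalized Gaussian kernel, the use of local minima as candidate boundaries, the `score` objective (interior maximum minus mean boundary intensity), and the `min_width` constraint are all assumptions chosen for clarity, and the function names are hypothetical. In practice the boundaries would be chosen on a reference profile (assumed here; for example, the mean spectrum of all samples) and then applied to every spectrum.

```python
import math


def gaussian_binning(ppm, spectrum, centers, sigma):
    """Kernel-based (soft) binning: each point's intensity is shared among bins in
    proportion to a Gaussian weight on its distance to each bin center. The
    per-point normalization (an assumption) keeps total intensity unchanged."""
    values = [0.0] * len(centers)
    for x, y in zip(ppm, spectrum):
        weights = [math.exp(-((x - c) ** 2) / (2 * sigma ** 2)) for c in centers]
        total = sum(weights)
        if total == 0.0:
            continue  # point lies too far from every center to contribute
        for j, w in enumerate(weights):
            values[j] += y * w / total
    return values


def dynamic_adaptive_binning(profile, min_width):
    """Choose bin boundaries (indices into `profile`) by dynamic programming.
    Candidates are the two endpoints plus every local minimum; the hypothetical
    objective sums, over bins, the interior maximum minus the mean intensity at
    the bin's two boundaries, subject to a minimum bin width in points."""
    n = len(profile)
    cands = [0] + [i for i in range(1, n - 1)
                   if profile[i] <= profile[i - 1] and profile[i] <= profile[i + 1]]
    cands.append(n - 1)

    def score(a, b):
        return max(profile[a:b + 1]) - (profile[a] + profile[b]) / 2

    best = [-math.inf] * len(cands)  # best[j]: best objective ending at cands[j]
    prev = [-1] * len(cands)         # back-pointer to the previous boundary
    best[0] = 0.0
    for j in range(1, len(cands)):
        for i in range(j):
            if best[i] == -math.inf or cands[j] - cands[i] < min_width:
                continue
            total = best[i] + score(cands[i], cands[j])
            if total > best[j]:
                best[j], prev[j] = total, i
    if best[-1] == -math.inf:
        return [0, n - 1]  # no feasible split: fall back to a single bin
    bounds, j = [], len(cands) - 1
    while j != -1:  # walk the back-pointers from the last boundary to the first
        bounds.append(cands[j])
        j = prev[j]
    return bounds[::-1]


def integrate_bins(spectrum, bounds):
    """Sum intensities between consecutive boundaries; the last bin includes the end point."""
    out = []
    for k in range(len(bounds) - 1):
        end = bounds[k + 1] + (1 if k == len(bounds) - 2 else 0)
        out.append(sum(spectrum[bounds[k]:end]))
    return out
```

In this sketch, boundaries are only ever placed in spectral valleys, so a small peak shift moves intensity across a boundary only if the peak crosses a valley; the Gaussian kernel instead spreads intensity smoothly between neighboring bins. Both choices target the same problem of peaks spanning bin boundaries.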