# Analysis of Biofuels Using Gas Chromatography and Multivariate Statistical Methods

### Program in Applied Mathematics Colloquium

**Location:** MATH 501

**Presenter:** Edward Soares, Mathematics and Computer Science, College of the Holy Cross

Gas chromatography (GC) is a technique used in analytical chemistry for the separation and analysis of compounds that can be vaporized without decomposition. In particular, it can separate the components of a mixture and determine their relative amounts. When applied to biodiesel fuels, GC allows us to identify the fatty acid methyl esters (FAMEs) present in a sample, whose presence depends on the underlying feedstock type. Typically, a set of replicate chromatograms is measured for each sample, and several biofuel classes are compared. Each chromatogram quantifies molecular abundance at each of several retention times, which in turn correspond to particular carbon chains. Variation in peak height relative to retention time serves to differentiate biofuel classes.

However, inherent in GC is measurement error in the form of variation in peak location (known as drift), which must be removed prior to chemometric analysis. This is usually accomplished by aligning each chromatogram to that of a reference sample. Another important factor to consider in GC is the type of column used, which is essentially a tube coated on the inside with a polymer. A carrier gas (mobile phase) provides a mechanism for the sample to interact with the column (stationary phase) and for the data to be measured. This interaction causes each compound to elute at a different retention time, so that different FAMEs may be identified. Longer columns yield better separation of FAMEs at the cost of longer data acquisition time.

In the first part of this talk, I will discuss a method of optimizing chromatogram alignment using the Hotelling trace criterion (HTC). The HTC can be thought of as the multivariate and multi-class extension of the square of the two-sample t-statistic. Large values of the HTC correspond to better separation of the groups of principal component scores for each class.

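
To make the criterion concrete, here is a minimal sketch of one common form of the Hotelling trace, J = tr(S_w⁻¹ S_b), where S_b is the between-class scatter of the class means and S_w is the weighted average of the within-class covariances. The function name `hotelling_trace`, the weighting of each class by its sample fraction, and the synthetic "scores" are all assumptions made for illustration; the abstract does not specify the exact formulation used in the talk.

```python
import numpy as np


def hotelling_trace(scores, labels):
    """Return J = trace(S_w^{-1} S_b) for class-labelled score vectors.

    scores: (n_samples, n_components) array, e.g. principal component scores.
    labels: length n_samples array of class labels.
    Assumption: each class is weighted by its fraction of the samples,
    and every class has at least two samples.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    n_total, n_components = scores.shape
    grand_mean = scores.mean(axis=0)

    s_between = np.zeros((n_components, n_components))
    s_within = np.zeros((n_components, n_components))
    for cls in np.unique(labels):
        group = scores[labels == cls]
        weight = group.shape[0] / n_total
        diff = (group.mean(axis=0) - grand_mean)[:, None]
        s_between += weight * (diff @ diff.T)
        # Unbiased sample covariance of this class (rows are observations).
        s_within += weight * np.cov(group, rowvar=False)

    # solve(S_w, S_b) gives S_w^{-1} S_b without forming an explicit inverse.
    return float(np.trace(np.linalg.solve(s_within, s_between)))


# Synthetic stand-in for PC scores of three biofuel classes (not real data).
rng = np.random.default_rng(0)
noise = rng.normal(size=(60, 2))
labels = np.repeat([0, 1, 2], 20)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])

htc_far = hotelling_trace(noise + centers[labels], labels)
htc_near = hotelling_trace(noise + 0.1 * centers[labels], labels)
print(f"well-separated classes: J = {htc_far:.2f}")
print(f"overlapping classes:    J = {htc_near:.2f}")
```

Because both calls share the same within-class noise, the difference in J comes only from how far apart the class means are, which is the sense in which a larger HTC indicates better-separated groups of scores.
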
In the second part of this talk, I will discuss an approach to quantifying equivalence between longer and shorter columns using the Mahalanobis distance. The Mahalanobis distance measures the distance between the multivariate means of two probability distributions while accounting for the variances of, and the correlations between, the random variables. I will show that a shorter, faster column can produce data that are statistically equivalent to those from a longer, slower column. Finally, I will briefly discuss current and future work on identifying an optimal reference sample for chromatogram alignment.
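
The distance described above can be sketched in a few lines. The version below computes the Mahalanobis distance between the means of two groups of observations using a pooled covariance estimate, d = sqrt((m_a − m_b)ᵀ S⁻¹ (m_a − m_b)). The function name `mahalanobis_between_groups`, the choice of a pooled covariance, and the toy numbers are assumptions for illustration only; how the talk turns this distance into a statement of statistical equivalence between columns is not given in the abstract.

```python
import numpy as np


def mahalanobis_between_groups(group_a, group_b):
    """Mahalanobis distance between the means of two groups.

    group_a, group_b: (n_samples, n_features) arrays, e.g. score vectors
    measured on a longer and a shorter column.
    Assumption: both groups share a covariance, estimated by pooling.
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    n_a, n_b = a.shape[0], b.shape[0]

    # Pooled unbiased covariance; atleast_2d handles the one-feature case.
    pooled = (
        (n_a - 1) * np.cov(a, rowvar=False) + (n_b - 1) * np.cov(b, rowvar=False)
    ) / (n_a + n_b - 2)
    pooled = np.atleast_2d(pooled)

    delta = a.mean(axis=0) - b.mean(axis=0)
    return float(np.sqrt(delta @ np.linalg.solve(pooled, delta)))


# Toy two-feature data (hypothetical, not measurements from the talk).
long_column = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
short_column = long_column + np.array([0.2, -0.1])
print(mahalanobis_between_groups(long_column, short_column))
```

A distance near zero means the two group means are close relative to the spread and correlation of the data, which is the intuition behind using it to compare columns.
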