Multivariate SPC for process spectroscopy: T-squared and Q-residual

Univariate statistical process control plots one number over time and asks whether the latest reading sits inside three sigma of the mean. It works when the process is summarised by a single variable. It breaks when the process is summarised by a near-infrared or Raman spectrum with a thousand wavelength channels, each correlated with several of its neighbours. A control chart per channel produces a thousand charts, a near-certain false alarm somewhere on every sample, and no useful picture of whether the process actually drifted.

Multivariate statistical process control - MSPC - is the answer that chemometricians settled on in the early 1990s and that has not been displaced since. The mechanics are simple enough to explain in a page and stable enough that the same two diagnostic statistics, T-squared and Q-residual, still anchor most modern process monitoring systems, from the open-source PCA-based monitors built on scikit-learn to the proprietary platforms shipped by vendors of inline analysers. This piece sets out what the two statistics measure, how the limits are calculated, and where the practical pitfalls sit. It assumes a working familiarity with PCA in the form covered in our PCA vs PLS guide.

The core idea

MSPC begins from a PCA model trained on a reference set of spectra that represent the process running in control. The reference set defines a hyperplane in spectral space: the subspace spanned by the retained principal components. Every routine measurement is then projected onto that hyperplane, and two distances are computed.

The first distance, Hotelling’s T-squared, measures how far the projection sits from the centre of the model along the retained components. It is the multivariate analogue of a squared z-score: each score is squared and divided by the variance of its component in the calibration set, so that components with little variance contribute proportionally more if a sample swings along them. A large T-squared means the sample is unusual in the directions the PCA model captures - the process has moved within the spectral subspace the calibration spans.

The second distance, the Q-residual (also called squared prediction error, or SPE), measures how far the original spectrum lies from the hyperplane itself. It is the sum of squared residuals after the spectrum has been reconstructed from its scores and loadings. A large Q-residual means the sample contains spectral features that the calibration set did not see at all - the process has moved out of the spectral subspace.
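The two distances can be written out for a single spectrum in a few lines. The sketch below is a toy worked example, not a production implementation: it uses plain Python lists so the arithmetic stays visible, and it assumes the PCA model already exists as a calibration mean, a set of orthonormal loading vectors, and the variance of the calibration scores on each component. The function name t2_and_q and the three-channel model are invented for illustration.

```python
def t2_and_q(spectrum, mean, loadings, eigenvalues):
    """Return (T-squared, Q-residual) for one spectrum.

    loadings: one list per retained component, assumed orthonormal.
    eigenvalues: variance of the calibration scores on each component.
    """
    # Centre the spectrum with the calibration mean.
    centred = [x - m for x, m in zip(spectrum, mean)]
    # Scores: projection of the centred spectrum onto each loading vector.
    scores = [sum(p_j * x_j for p_j, x_j in zip(p, centred)) for p in loadings]
    # T-squared: squared scores, each divided by its component's variance.
    t2 = sum(t * t / lam for t, lam in zip(scores, eigenvalues))
    # Reconstruct the centred spectrum from scores and loadings.
    reconstructed = [
        sum(t * p[j] for t, p in zip(scores, loadings))
        for j in range(len(centred))
    ]
    # Q-residual: sum of squared differences between spectrum and reconstruction.
    q = sum((x - r) ** 2 for x, r in zip(centred, reconstructed))
    return t2, q


# Toy model (hypothetical): three channels, two retained components
# lying along the first two channel axes.
mean = [0.0, 0.0, 0.0]
loadings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
eigenvalues = [4.0, 1.0]

t2, q = t2_and_q([2.0, 1.0, 3.0], mean, loadings, eigenvalues)
print(t2, q)  # 2.0 9.0
```

In this toy case the third channel lies entirely outside the model, so its value of 3.0 shows up only in Q (3.0 squared), while the first two channels show up only in T-squared. A real monitor would do the same arithmetic on arrays with a numerical library.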

The two statistics are complementary: they measure variation in orthogonal parts of spectral space, one inside the model subspace and one perpendicular to it. T-squared rises when the process drifts in a familiar direction; Q-residual rises when something new appears. The Kresta, MacGregor, and Marlin paper of 1991 set out this decomposition for a continuous chemical process, and a generation of subsequent work has done little more than adapt it to new measurement modalities.

Control limits

A control chart needs a limit. For T-squared, the limit is computed from an F-distribution with A and N - A degrees of freedom, where A is the number of retained components and N is the number of calibration samples. The standard practice is to use a 95% limit for warning and a 99% limit for action; both are routinely shown on the chart. For Q-residual, the limit is usually computed from the normal approximation due to Jackson and Mudholkar, parameterised by the eigenvalues of the unretained components; a weighted chi-squared approximation is a common alternative. The same 95%/99% convention applies.
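Both limits reduce to short formulas. The following is a minimal sketch under stated assumptions: the T-squared limit uses the scaling commonly applied to new observations, A(N-1)(N+1)/(N(N-A)) times the F quantile, and takes that quantile as an argument so the example needs only the standard library; the Q limit uses the Jackson and Mudholkar normal approximation. The function names are invented, and the ten equal residual eigenvalues are made-up inputs.

```python
from statistics import NormalDist


def t2_limit(n_samples, n_components, f_quantile):
    """T-squared limit for new observations.

    f_quantile: F-distribution quantile at the chosen confidence level with
    (n_components, n_samples - n_components) degrees of freedom. It is passed
    in by the caller (assumption made to keep this sketch dependency-free).
    """
    a, n = n_components, n_samples
    return a * (n - 1) * (n + 1) / (n * (n - a)) * f_quantile


def q_limit(residual_eigenvalues, confidence):
    """Jackson-Mudholkar limit for the Q-residual.

    residual_eigenvalues: eigenvalues of the unretained components.
    Assumes h0 > 0, which holds for typical eigenvalue spectra.
    """
    theta1 = sum(residual_eigenvalues)
    theta2 = sum(lam ** 2 for lam in residual_eigenvalues)
    theta3 = sum(lam ** 3 for lam in residual_eigenvalues)
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)
    # Standard normal deviate for the chosen confidence level.
    c = NormalDist().inv_cdf(confidence)
    bracket = (
        c * (2.0 * theta2 * h0 ** 2) ** 0.5 / theta1
        + 1.0
        + theta2 * h0 * (h0 - 1.0) / theta1 ** 2
    )
    return theta1 * bracket ** (1.0 / h0)


# Hypothetical residual spectrum: ten unretained components of equal variance.
residual_eigenvalues = [1.0] * 10
warning = q_limit(residual_eigenvalues, 0.95)
action = q_limit(residual_eigenvalues, 0.99)
print(round(warning, 2), round(action, 2))
```

With SciPy available, the F quantile for the 95% limit would come from scipy.stats.f.ppf(0.95, A, N - A). With equal residual eigenvalues the Q-residual follows a scaled chi-squared distribution, so the printed values can be checked against a chi-squared table with ten degrees of freedom.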

Both limits depend on the calibration set’s representativeness. If the calibration contains a single batch’s worth of variation, the limits will be tight, and any normal seasonal drift will trip them. If the calibration spans the realistic operating envelope - several batches, several operators, the range of feedstock variation the plant actually sees - the limits will be useful. This is the unglamorous part of MSPC, and it is where most of the work sits. The chemometric literature is consistent that the calibration set determines the model’s performance more than the choice of statistic.

The two-phase logic from classical SPC carries over. Phase 1 is the design phase: the calibration set is screened for outliers, the limits are computed, and the model is locked. Phase 2 is the routine monitoring phase: new spectra are projected, the two statistics are computed, and the chart is read.
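The Phase 2 step of reading the chart can be sketched as a comparison against limits that were locked in Phase 1. The limit values, status labels, and function names below are placeholders chosen for illustration; nothing here is taken from a particular monitoring package.

```python
# Hypothetical limits locked at the end of Phase 1:
# (warning limit at 95%, action limit at 99%) for each statistic.
LOCKED_LIMITS = {"t2": (8.6, 13.1), "q": (18.3, 23.2)}


def chart_status(value, warning_limit, action_limit):
    """Classify one statistic against its warning and action limits."""
    if value > action_limit:
        return "action"
    if value > warning_limit:
        return "warning"
    return "in control"


def read_sample(t2, q, limits=LOCKED_LIMITS):
    """Phase 2: classify T-squared and Q-residual for one new spectrum."""
    return {
        "t2": chart_status(t2, *limits["t2"]),
        "q": chart_status(q, *limits["q"]),
    }


status = read_sample(t2=9.4, q=30.0)
print(status)  # {'t2': 'warning', 'q': 'action'}
```

Keeping the limits in a single object that Phase 2 only reads reflects the point that the model is locked: routine monitoring never recomputes the limits from new data.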

Contribution plots

A T-squared or Q-residual flag tells the operator something has changed but not what changed. Contribution plots fill that gap. Each statistic can be decomposed into contributions from individual wavelength channels: for Q-residual the decomposition is exact, for T-squared it follows the form set out by Westerhuis, Gurden, and Smilde in 2000. The plot shows which channels drove the alarm, and a chemist who knows the spectrum can usually identify the underlying change - a new impurity band, a shift in particle scattering, a calibration lamp drift in the instrument itself.
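As an illustration of the decomposition, the sketch below computes per-channel contributions for a toy one-component model. The Q contributions are simply the squared residuals per channel. The T-squared contributions use one common form, in which each channel's centred value is multiplied by the variance-weighted sum of scores times that channel's loadings, so that the contributions add up to T-squared; whether a given package uses exactly this form is an assumption. All names and numbers are hypothetical.

```python
def _centre_and_score(spectrum, mean, loadings):
    """Centre a spectrum and project it onto orthonormal loading vectors."""
    centred = [x - m for x, m in zip(spectrum, mean)]
    scores = [sum(p_j * x_j for p_j, x_j in zip(p, centred)) for p in loadings]
    return centred, scores


def q_contributions(spectrum, mean, loadings):
    """Per-channel squared residuals; they sum exactly to the Q-residual."""
    centred, scores = _centre_and_score(spectrum, mean, loadings)
    return [
        (centred[j] - sum(t * p[j] for t, p in zip(scores, loadings))) ** 2
        for j in range(len(centred))
    ]


def t2_contributions(spectrum, mean, loadings, eigenvalues):
    """Per-channel T-squared contributions; they sum to T-squared.

    Individual contributions can be negative in this form.
    """
    centred, scores = _centre_and_score(spectrum, mean, loadings)
    return [
        centred[j]
        * sum(t / lam * p[j] for t, lam, p in zip(scores, eigenvalues, loadings))
        for j in range(len(centred))
    ]


# Toy model (hypothetical): three channels, one retained component.
mean = [0.0, 0.0, 0.0]
loadings = [[0.6, 0.8, 0.0]]
eigenvalues = [2.0]
sample = [1.0, 2.0, 0.5]

q_c = q_contributions(sample, mean, loadings)
t2_c = t2_contributions(sample, mean, loadings, eigenvalues)
print([round(c, 4) for c in q_c])
print([round(c, 4) for c in t2_c])
```

In this example the third channel has a zero loading, so it contributes nothing to T-squared but carries the largest share of Q: the pattern a contribution plot would show for a feature the model has never seen.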

Contribution plots are diagnostic, not confirmatory. A spike at a wavelength known to belong to an analyte band is suggestive, but the only way to confirm a concentration excursion is to predict the concentration with a separate PLS model and compare against specification. Conflating the descriptive MSPC layer with the predictive PLS layer is one of the failure modes covered in our PCA vs PLS piece.

Batch processes

The Nomikos and MacGregor extension of 1994 generalises MSPC to batch processes, where each batch produces a time-resolved matrix of spectra rather than a single observation, and a set of batches forms a three-way array of batches by time points by wavelength channels. The array is unfolded into a two-dimensional matrix, with one row per batch, and a multiway PCA model is built. T-squared and Q-residual statistics for the batch as a whole, and for the batch’s evolution against an in-control trajectory, follow the same logic. Batch MSPC underpins most pharmaceutical fermentation and crystallisation monitoring sold today, and the implementations differ in detail but not in mechanics.
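The unfolding step is mostly bookkeeping and can be sketched with nested lists. This example assumes batch-wise unfolding, in which each batch becomes one row holding its time points in order, and it assumes all batches have already been aligned to the same number of time points. The function name and the tiny data set are invented for illustration.

```python
def unfold_batchwise(batches):
    """Unfold batches[i][k][j] (batch, time point, channel) into one row per batch.

    Each row concatenates the spectra of one batch in time order, so a set of
    I batches with K time points and J channels becomes an I x (K*J) matrix.
    """
    shape = (len(batches[0]), len(batches[0][0]))
    rows = []
    for batch in batches:
        # Batches must share the same number of time points and channels.
        if len(batch) != shape[0] or any(len(s) != shape[1] for s in batch):
            raise ValueError("all batches must have the same shape")
        rows.append([value for spectrum in batch for value in spectrum])
    return rows


# Hypothetical data: 2 batches, 3 time points, 2 channels.
batches = [
    [[1, 2], [3, 4], [5, 6]],
    [[7, 8], [9, 10], [11, 12]],
]
unfolded = unfold_batchwise(batches)
print(unfolded)  # [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
```

The unfolded matrix can then be mean-centred column by column and passed to an ordinary PCA, which is what makes the batch statistics follow the same logic as the continuous case.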

Regulatory framing

For pharmaceutical operators, MSPC sits inside the chemometric procedures that ICH Q14 expects to see described in the lifecycle file, and the USP general chapter on chemometrics treats T-squared and Q-residual as standard tools without prescribing the algorithm. ASTM E2891 is the most prescriptive document; it sets out, in language inspectors recognise, what an MSPC implementation in pharmaceutical development and manufacturing should document. The standard does not require MSPC, but it provides the vocabulary an inspector will use.

The contrast with univariate SPC is worth stating clearly. Univariate charts are not banned in spectroscopy. They are simply blind to the bulk of the information the spectrum carries. A facility that runs only univariate charts on a multivariate measurement is, in inspection terms, leaving most of its data on the table.

What MSPC does not do

MSPC monitors the spectrum, not the chemistry directly. It will flag a change in spectral space whether or not the change corresponds to a release-relevant attribute. A solvent swap upstream may produce a strong T-squared signal and no change in tablet potency; a slow drift in active concentration may stay within historical T-squared and Q-residual limits while sliding out of specification. The decision logic is therefore two-track: MSPC for spectral-space surveillance, PLS or its equivalents for the predicted attribute. Both layers run together; neither replaces the other.
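The two-track logic can be made concrete as a small decision table. The sketch below is only one way to combine the two layers: the outcome labels, the function name, and the specification limits are hypothetical, and the predicted value is assumed to come from a separate PLS model that is not shown.

```python
def two_track_decision(spectral_alarm, predicted, spec_low, spec_high):
    """Combine the MSPC flag with a predicted attribute checked against spec.

    spectral_alarm: True if T-squared or Q-residual exceeded its action limit.
    predicted: attribute value from a separate predictive model (assumed PLS).
    """
    in_spec = spec_low <= predicted <= spec_high
    if spectral_alarm and in_spec:
        # The spectrum looks unfamiliar, so the prediction may not be trustworthy.
        return "spectral change: verify prediction before relying on it"
    if spectral_alarm and not in_spec:
        return "spectral change and attribute out of specification"
    if not in_spec:
        # Spectrum looks normal but the attribute has drifted out of spec.
        return "attribute out of specification"
    return "in control"


# Hypothetical potency specification of 95 to 105 percent of label claim.
print(two_track_decision(True, 100.2, 95.0, 105.0))
print(two_track_decision(False, 94.1, 95.0, 105.0))
```

The first call corresponds to the solvent-swap case, where the spectrum has changed but the predicted attribute has not; the second corresponds to the slow concentration drift that MSPC alone would miss.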

Closing

T-squared and Q-residual are not new, not glamorous, and not optional in a properly instrumented PAT installation. They are the layer that tells an operator the spectrum no longer looks like one the predictive model was trained on - which is the only honest warning the analyser can give before the predicted value becomes untrustworthy. The chemometric literature has spent thirty years refining how to read the two statistics, but the statistics themselves have not moved. If a process monitor does not compute them, the first task is to add them; everything else is downstream of that.