A small but recurrent observation across PAT projects: the meeting that should produce a working analyzer instead produces two parallel monologues. The chemometrician describes a model. The process engineer describes a line. Neither is wrong; they are describing different things with overlapping words. The result is a project that takes three times longer than it should because each party assumes the other has internalized the constraint they are emphasizing.

This is a field-notes piece. It synthesizes patterns from multiple PAT projects and is not attributed to a single named source. The patterns are recurrent enough that any practitioner who has run a few of these meetings will recognize them.

“Accuracy”

The chemometrician means RMSEP — root-mean-square error of prediction on a held-out test set, expressed in the units of the reference method. This is a well-defined number. It is in the model report.

The process engineer means how often does the alarm go off when nothing is wrong, and how often does the alarm fail to go off when something is wrong. These are also well-defined numbers, but they are not RMSEP — they are functions of the alarm threshold, the process variability, and the rate at which the process generates real excursions.

A model with an RMSEP of 0.5 % can produce a useful alarm or a useless alarm depending on the alarm threshold and the process variability. The chemometrician’s report does not address this and the engineer’s experience does not address RMSEP. Both think they have communicated.

The bridge: ask the chemometrician for the RMSEP and the model uncertainty distribution under representative process conditions; ask the engineer for the alarm threshold derivation and the false-positive rate target. Compare. The conversation usually clears.
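That comparison can be made concrete with a quick simulation. The sketch below is illustrative only: the function name, the Gaussian assumptions for process variation and model error, and every number in it are invented for the example, not values from any real project.

```python
# Illustrative sketch only: how one RMSEP can yield very different
# false-alarm rates depending on the alarm threshold. The Gaussian
# assumptions and all numbers are invented for the example.
import random

def false_alarm_rate(rmsep, process_sd, threshold, target=10.0, n=100_000, seed=0):
    """Fraction of in-control samples whose predicted value lands outside
    target +/- threshold, when true values vary by process_sd around the
    target and the model adds rmsep of prediction error (both Gaussian)."""
    rng = random.Random(seed)
    alarms = 0
    for _ in range(n):
        true_value = rng.gauss(target, process_sd)          # real process variation
        predicted = true_value + rng.gauss(0.0, rmsep)      # model error on top
        if abs(predicted - target) > threshold:
            alarms += 1
    return alarms / n

# Same model, RMSEP = 0.5; only the alarm threshold changes.
tight = false_alarm_rate(rmsep=0.5, process_sd=0.3, threshold=1.0)  # several percent
loose = false_alarm_rate(rmsep=0.5, process_sd=0.3, threshold=2.0)  # well under 1%
```

The same model produces an alarm that fires on several percent of perfectly normal samples at one threshold and stays essentially silent at another; neither fact is visible in the RMSEP alone.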

“Validation”

The chemometrician means a documented exercise that demonstrates the model’s predictive performance on independent data, against a reference method, across the range of validity claimed by the model. ICH Q14 and ASTM E1655 both define this rigorously.

The process engineer means the model has run on the line for a few weeks without surprising us, which is a different and softer kind of claim: about operational stability rather than statistical performance.

Both meanings are legitimate; both are required for production deployment; neither is sufficient on its own. In every project we have observed where one party silently assumed the other’s meaning, quality issues surfaced at the next inspection or the next production batch deviation.

“The model”

The chemometrician means a fixed function — a matrix of regression coefficients, a set of preprocessing steps, a defined input-to-output mapping — that has been validated on a specific dataset and operates within a defined validity range.

The process engineer means a software product — running on the analyzer, alongside an HMI, integrated with the DCS, with operator screens and alarm logic — that does some processing and produces a number on a control chart.

The model report describes the first. The validation against ICH Q14 covers the first. The 21 CFR Part 11 audit covers the second. The thing that goes wrong on a Tuesday at 2 a.m. is almost always the second.

The bridge is to be precise: the calibration, the model’s runtime container, the application layer, the HMI. These are four distinct artifacts; collapsing them into one word has caused project delays.
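One lightweight way to enforce that precision is to write the vocabulary down as data. The sketch below is purely illustrative: the owners and change-control paths are invented placeholders, not a recommendation for any particular organization.

```python
# Illustrative sketch only: the four-artifact vocabulary as an explicit
# table. Owners and change-control paths are invented placeholders.
ARTIFACT_STACK = {
    "calibration": {"owner": "chemometrician", "change_control": "model revalidation"},
    "runtime":     {"owner": "analyzer vendor", "change_control": "software change request"},
    "application": {"owner": "automation team", "change_control": "software change request"},
    "hmi":         {"owner": "automation team", "change_control": "operator-facing change review"},
}

def owner_of(artifact: str) -> str:
    """Look up who owns a named artifact.
    Vague words like 'the model' deliberately raise a KeyError."""
    return ARTIFACT_STACK[artifact]["owner"]
```

The point of the table is not the code; it is that "the model" is not a key, so the question "who owns the model?" has no answer until it is rephrased in terms of a named artifact.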

“Outliers”

The chemometrician means samples whose multivariate profile lies outside the calibration training distribution — diagnosed by Hotelling T² or Q residuals, both well-defined statistics.

The process engineer means samples that came back from the lab with an unexpected value, regardless of whether the spectrum looks unusual to the model.

A spectrum can be a chemometric outlier (high Q) without being a process outlier (the predicted concentration is on-target). A sample can be a process outlier (the lab reports a deviation from spec) without being a chemometric outlier (the spectrum looks completely normal).

The two cases imply different actions. A chemometric outlier with a normal lab reading suggests something has changed about the spectrum that the model does not understand — instrument drift, fouling, a new product variant. A process outlier without a chemometric flag suggests the model is missing the variation that matters — a flaw in calibration coverage. Treating these the same produces incorrect remediation in both cases.
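The two-by-two logic above can be written out directly. This is an illustrative sketch: the threshold handling is simplified (a single Q-residual limit, a simple spec window), and the returned strings are placeholders for whatever a real deviation workflow would trigger.

```python
# Illustrative sketch only: the 2x2 outlier triage described above.
# Thresholds and return strings are simplified placeholders.
def triage(q_residual, q_limit, lab_value, spec_low, spec_high):
    """Classify a sample by whether the model and/or the lab flagged it."""
    chemometric = q_residual > q_limit                       # spectrum unfamiliar to the model
    process = not (spec_low <= lab_value <= spec_high)       # lab result out of spec
    if chemometric and not process:
        return "investigate spectrum: drift, fouling, or a new variant"
    if process and not chemometric:
        return "investigate calibration coverage: model misses the relevant variation"
    if chemometric and process:
        return "investigate both: model and process flagged"
    return "no action"
```

The four branches are the point: a single "outlier" bucket collapses them, and the first two branches call for opposite remediations.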

“We need more data”

The chemometrician means more samples that span the variability range, with reference values from a validated method. This is expensive, slow, and the scarce resource on most projects.

The process engineer means more spectra. Spectra are cheap; reference values are not.

A common project failure is the assumption that running the analyzer for six more months will accumulate the data needed to improve the model. It does not — six months of unreferenced spectra is six months of unlabeled data, useful for unsupervised methods (PCA, anomaly detection) but not for supervised retraining. Improving a regression model requires reference values, which means lab work, which means budget that has often been spent.
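A back-of-envelope accounting makes the asymmetry visible. The numbers below are invented for illustration (one spectrum per hour, one lab reference every 90 spectra); the field names are placeholders.

```python
# Illustrative sketch only: six months of analyzer spectra versus the
# subset usable for supervised retraining. All numbers are invented.
records = [
    {"spectrum_id": i, "reference_value": (7.2 if i % 90 == 0 else None)}
    for i in range(4320)   # one spectrum per hour for ~180 days
]
labeled = [r for r in records if r["reference_value"] is not None]
unlabeled = [r for r in records if r["reference_value"] is None]
# The thousands of unlabeled spectra still feed PCA and anomaly
# monitoring; only the labeled few dozen can update the regression.
```

Six months of running yields thousands of spectra but only a few dozen labeled samples, and the regression can learn only from the latter.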

The conversation that needs to happen early: what is our reference-method capacity, and how is it allocated between calibration build, ongoing performance verification, and re-training?

“It’s working”

The chemometrician means the model’s residuals on a verification set are within the validated specification.

The process engineer means the analyzer has been online for the last week and has not been the cause of any deviations.

These are weakly correlated. A model can be statistically valid and operationally a nuisance (frequent false alarms, frequent calibration drifts requiring intervention). A model can be operationally invisible and statistically poor (predicting a constant offset that the operators have learned to ignore). “It’s working” needs a definition.
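The definition can be as simple as requiring both claims at once. The function below is a hedged sketch: the metric names and thresholds are placeholders, and in practice the acceptance criterion belongs in the monitoring plan, not in code.

```python
# Illustrative sketch only: "it's working" as the conjunction of the
# statistical claim and the operational claim. Names are placeholders.
def is_working(rmsep_verification, rmsep_spec, false_alarms_per_week, max_false_alarms):
    statistically_valid = rmsep_verification <= rmsep_spec      # chemometrician's claim
    operationally_quiet = false_alarms_per_week <= max_false_alarms  # engineer's claim
    return statistically_valid and operationally_quiet
```

Either claim alone leaves one of the failure modes above undetected; the conjunction is the minimum useful definition.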

What helps

The teams we have observed running smooth PAT programs share three habits:

First, they hold a monthly shared-metrics review. RMSEP and Q-statistic excursion rate from the chemometrician; alarm rate, false-alarm rate, and intervention rate from the engineer. Both at the same table; everyone sees both.

Second, they use a single language for the artifact stack. Calibration. Runtime. Application. HMI. Each named, each with an owner, each with a change-control path. The vocabulary is not glamorous; it prevents the confusion that consumes weeks.

Third, they treat the first ninety days after deployment as a discovery phase, not a steady state. Both teams expect surprises and have a structured process for capturing them, diagnosing them, and feeding back into the next model revision. After ninety days, the rate of surprise drops materially. Before ninety days, there is no such thing as a smoothly running new PAT installation.

The technical work is well-documented. The communication work is not. It is the bigger lever.