Deep learning has been advertised as the successor to partial least squares (PLS) in process Raman work for at least five years. The honest summary of the last twelve months is that the picture is finally sharpening. Several reference benchmarks have appeared, the bioprocess-monitoring literature now has a public dataset to argue over, and the first explainability tooling aimed at regulated use has been published.

This is a reading list, not a position piece. The papers below were selected from the arXiv physics.chem-ph and cs.LG feeds and from recent issues of Measurement, Sensors, and Biotechnology and Bioprocess Engineering. The aim is to capture where the methodology is actually moving, not where it is being marketed. Readers who want a primer on the underlying multivariate models can start with our chemometrics 101 overview, and our open-source chemometric libraries note covers the software side.

A public bioprocess benchmark, finally

The most consequential paper for industrial bioprocess teams is Lange et al., “Deep learning for Raman spectroscopy: Benchmarking models for upstream bioprocess monitoring” (Measurement, 13 September 2025, DOI 10.1016/j.measurement.2025.118884). The TU Berlin group, working with collaborators at DataHow, released a reference dataset of 6,960 annotated Raman spectra covering eight key fermentation substrates, generated through an automated pipetting system. Eleven model types were compared on it: convolutional neural networks, attention-based transformers, ensemble methods, and traditional chemometric approaches including PLS.

The headline conclusion is the one practitioners have been waiting for: several deep-learning approaches significantly outperform PLS on coefficient of determination and mean absolute error, but no single architecture wins across all eight analytes. The authors flag the Tabular Prior-data Fitted Network (TabPFN), an in-context learning model, as promising but inconsistent. For teams used to deploying one PLS model per analyte, the absence of a single winner is the practical takeaway; just as important, the dataset is public, the model implementations are versioned, and the results can be replicated outside the lab that produced them. That is a much rarer state of affairs than it should be.
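
The evaluation protocol is easy to replicate in miniature. The sketch below fits PLS and a small neural baseline on the same spectra and scores both on R² and MAE, the paper's two headline metrics; the synthetic data generator and the scikit-learn MLP stand-in are illustrative assumptions, not Lange et al.'s dataset or model implementations.

```python
# A minimal sketch of the benchmark's head-to-head protocol: same spectra,
# same split, R^2 and MAE per model. The synthetic data and the MLP
# stand-in are assumptions, not the Lange et al. dataset or models.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error

rng = np.random.default_rng(0)
n_spectra, n_shifts = 600, 800          # spectra x Raman-shift channels
X = rng.normal(size=(n_spectra, n_shifts))
# Fake a concentration that depends on two "bands" plus noise.
y = 2.0 * X[:, 100:120].mean(axis=1) - 1.5 * X[:, 500:520].mean(axis=1)
y += 0.05 * rng.normal(size=n_spectra)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "PLS (10 LVs)": PLSRegression(n_components=10),
    "MLP baseline": MLPRegressor(hidden_layer_sizes=(64, 32),
                                 max_iter=2000, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = np.ravel(model.predict(X_te))
    print(f"{name:14s}  R2={r2_score(y_te, pred):.3f}  "
          f"MAE={mean_absolute_error(y_te, pred):.3f}")
```

Swapping the MLP for a CNN or a transformer, as the benchmark does, changes only the models dictionary; the split and the metrics stay fixed, which is the whole point of a shared protocol.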

A second benchmark for classification

A second benchmark, published in January 2026, addresses Raman classification rather than concentration prediction. Sineesh and Kamsali’s “Benchmarking Deep Learning Models for Raman Spectroscopy Across Open-Source Datasets” (arXiv 2601.16107, 22 January 2026) evaluates five Raman-specific deep-learning architectures across three open-source datasets under a unified training and hyperparameter-tuning protocol. The authors report that SANet posted the strongest accuracy and macro-averaged F1 across the three datasets, and that transformer-based models underperformed relative to their reputation, which they attribute to insufficient training data and to architectures that have not yet been adapted to spectra.
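
What “unified protocol” means in practice is simple and worth stating: every architecture sees the same split, the same tuning budget, and the same metrics, with macro-averaged F1 there to keep majority classes from hiding per-class failures. A minimal sketch, with a logistic-regression placeholder where SANet and its competitors would slot in, and synthetic five-class spectra standing in for the open-source datasets:

```python
# A minimal sketch of a unified classification protocol: fixed stratified
# split, accuracy plus macro-averaged F1. The data and the logistic-
# regression placeholder are assumptions, not the paper's models.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 300))                  # spectra x channels
y = rng.integers(0, 5, size=500)                 # five substance classes
X[np.arange(500), y * 50] += 3.0                 # give each class a marker channel

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = model.predict(X_te)
print(f"accuracy={accuracy_score(y_te, pred):.3f}  "
      f"macro-F1={f1_score(y_te, pred, average='macro'):.3f}")
```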

The two benchmarks point in the same direction. Deep models beat the classical baselines when there is enough data, and the gap closes or reverses when there is not. PLS has not been retired by either paper.

Self-supervised pre-training

The data-hunger problem is exactly what self-supervised pre-training is intended to address. Ren, Zhou, and Li’s “A Self-supervised Learning Method for Raman Spectroscopy based on Masked Autoencoders” (arXiv 2504.16130, 21 April 2025) trains a transformer encoder on unannotated spectra by masking and reconstructing patches. After fine-tuning with a limited amount of labelled data, the model reaches 83.90% identification accuracy on a 30-class pathogenic bacteria task, narrowly above a supervised ResNet baseline at 83.40%. The reconstruction step also delivers more than a twofold signal-to-noise improvement on the pre-training corpus.
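
The objective itself is compact enough to sketch. Below, random patches of each unlabelled spectrum are zeroed and a small autoencoder is trained to reconstruct what was hidden; the tiny MLP encoder-decoder and every size here are illustrative assumptions, not the paper's transformer.

```python
# A minimal sketch of masked-reconstruction pre-training on spectra: hide
# random patches, reconstruct them, score only the hidden regions. The
# architecture and all sizes are assumptions, not the paper's model.
import torch
import torch.nn as nn

n_shifts, patch, mask_frac = 800, 16, 0.5
n_patches = n_shifts // patch

class TinyMAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_shifts, 256), nn.ReLU())
        self.dec = nn.Linear(256, n_shifts)
    def forward(self, x):
        return self.dec(self.enc(x))

model = TinyMAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

spectra = torch.randn(64, n_shifts)      # stand-in for unlabelled spectra
for step in range(100):
    # Zero out a random ~50% of the patches in each spectrum.
    keep = torch.rand(64, n_patches) > mask_frac
    mask = keep.repeat_interleave(patch, dim=1).float()
    recon = model(spectra * mask)
    # The loss counts only the hidden patches, as in MAE training.
    loss = ((recon - spectra) ** 2 * (1 - mask)).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

The fine-tuning stage then reuses the trained encoder with a small classification head on whatever labelled spectra exist, which is where the label efficiency comes from.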

This is not yet a process-analytics result. The technique matters because spectra are cheap to acquire in a running plant and expensive to annotate, which is the exact regime where masked-autoencoder pre-training has paid off in other modalities.

Explainability for GMP-bound users

Blake et al.’s “SpecReX: Explainable AI for Raman Spectroscopy” (arXiv 2503.14567, 18 March 2025) is the smallest paper on this list but possibly the most useful for anyone heading into a GMP model validation review. The tool computes causal responsibility maps that quantify which spectral regions drive a deep model’s classification, by iteratively perturbing the input spectrum and checking whether the original output is retained. The authors validate the maps against synthetic spectra with a seeded ground-truth band. For users who need to defend a black-box model to a regulator, having a method that localises model attention onto physically meaningful Raman shifts is a measurable improvement over “we did cross-validation.”
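
The perturb-and-check idea at the core is straightforward to illustrate. The sketch below occludes a sliding window of the spectrum and records where the classifier's original decision flips; SpecReX's causal-responsibility computation is considerably more refined, and classify here is a hypothetical stand-in for any trained model.

```python
# A minimal sketch of the perturb-and-check idea behind responsibility
# maps. `classify` is a hypothetical stand-in for a trained model; this is
# plain occlusion, not SpecReX's causal-responsibility algorithm.
import numpy as np

def responsibility_map(spectrum, classify, window=20, step=5):
    original = classify(spectrum)
    resp = np.zeros_like(spectrum)
    counts = np.zeros_like(spectrum)
    for start in range(0, len(spectrum) - window + 1, step):
        perturbed = spectrum.copy()
        perturbed[start:start + window] = spectrum.mean()  # occlude window
        # If occluding this region flips the prediction, the region was
        # responsible for the original decision.
        flipped = classify(perturbed) != original
        resp[start:start + window] += float(flipped)
        counts[start:start + window] += 1
    return resp / np.maximum(counts, 1)

# Toy usage: a "model" that keys on a single band near channel 400.
spectrum = np.random.default_rng(0).normal(size=800)
spectrum[390:410] += 5.0
classify = lambda s: int(s[390:410].mean() > 1.0)
print(responsibility_map(spectrum, classify).argmax())  # lands in the band
```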

A worked bioprocess case

For an example of all of the above being applied to a real fermentation, Zhou, Duan, Liu, and colleagues’ “Accelerating process development of recombinant protein in Escherichia coli fermentation through deep learning and Raman process analysis techniques” (Biotechnology and Bioprocess Engineering, 2026, DOI 10.1007/s12257-026-00259-5) uses a genetic-algorithm-tuned semi-supervised CNN (GA-SCNN) to predict target protein (ProA5m) concentration directly from inline Raman spectra. Under their reported optimum (cooling temperature 22 °C, induction time 10.3 h, feed rate 0.27 h⁻¹), titre rose from 2.80 g/L to 9.37 g/L, roughly a 3.3-fold increase. The transferable point is not the titre, which is specific to that strain and that plasmid; it is that the SCNN, trained on very few annotated samples, serves as the optimisation loop’s predictor, not merely as a final reporting model.
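
That surrogate-in-the-loop pattern is worth seeing in skeletal form. In the paper the genetic algorithm also tunes the network itself; the sketch below shows only the outer search over process conditions, with a hypothetical predict_titre standing in for the trained GA-SCNN and bounds chosen purely for illustration.

```python
# A minimal sketch of a GA searching process conditions against a trained
# surrogate. `predict_titre` is a hypothetical stand-in for the GA-SCNN
# (here a toy peaked at the paper's reported optimum); bounds and GA
# settings are illustrative assumptions, not the paper's values.
import numpy as np

rng = np.random.default_rng(0)
# (cooling temp degC, induction time h, feed rate 1/h) search bounds
lo = np.array([18.0, 6.0, 0.10])
hi = np.array([30.0, 14.0, 0.40])

def predict_titre(x):                      # stand-in surrogate model
    target = np.array([22.0, 10.3, 0.27])
    return 9.0 - ((x - target) / (hi - lo) * 4.0) ** 2 @ np.ones(3)

pop = rng.uniform(lo, hi, size=(40, 3))
for gen in range(50):
    fitness = np.array([predict_titre(x) for x in pop])
    parents = pop[np.argsort(fitness)[-20:]]          # keep best half
    children = parents[rng.integers(0, 20, 40)]       # resample parents
    children += rng.normal(scale=0.02, size=children.shape) * (hi - lo)
    pop = np.clip(children, lo, hi)                   # mutate within bounds

best = pop[np.argmax([predict_titre(x) for x in pop])]
print(f"best conditions: T={best[0]:.1f} degC, "
      f"t_ind={best[1]:.1f} h, feed={best[2]:.2f} 1/h")
```

Every fitness evaluation is a model call rather than a fermentation run, which is what makes the loop affordable; the risk, as always with surrogates, is optimising into regions the model has never seen.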

Methodological hygiene

The January 2026 Sensors review by Liu, Wu, Wang, Qi, Zhou, and Xue (“Recent Advances in Raman Spectral Classification with Machine Learning”, Sensors 26(1):341, DOI 10.3390/s26010341) is the place to send a new team member. It catalogues applications across biomedical diagnostics, food authentication, microplastics, and forensics, and it is unusually blunt about the field’s standing problems: limited training data, poor cross-laboratory generalisation, no agreed metadata or preprocessing protocols, and reproducibility barriers when raw data and code are not released. Those issues are not unique to Raman, but they apply to it in particularly direct ways, and most published gains are still reported on single-site datasets.
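
One concrete habit the review’s reproducibility complaint points at: write the preprocessing down as code and version it with the model. A minimal sketch of one common recipe, Savitzky-Golay smoothing followed by standard-normal-variate (SNV) scaling, with parameters that are illustrative assumptions rather than an agreed standard:

```python
# A minimal sketch of a declared, versionable preprocessing step:
# Savitzky-Golay smoothing plus SNV scaling. One common Raman recipe
# among many; the parameters are illustrative assumptions.
import numpy as np
from scipy.signal import savgol_filter

def preprocess(spectra, sg_window=11, sg_order=3):
    """Smooth each spectrum, then SNV-normalise it. spectra: (n, channels)."""
    smoothed = savgol_filter(spectra, sg_window, sg_order, axis=1)
    mean = smoothed.mean(axis=1, keepdims=True)
    std = smoothed.std(axis=1, keepdims=True)
    return (smoothed - mean) / std

spectra = np.random.default_rng(0).normal(size=(5, 800))
print(preprocess(spectra).shape)   # (5, 800), ready for any model above
```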

What to take from the set

Three threads run through the six papers. First, public datasets and unified protocols are starting to appear, which is the precondition for the field having useful arguments at all. Second, deep models beat PLS in data-rich settings and tie or lose in data-poor ones, so the architecture choice still depends on the deployment regime. Third, the tooling needed to make deep-learning chemometrics deployable in regulated environments, explainability in particular, has begun to ship as research code. None of those threads is closed, and none of them removes PLS from a working PAT lab. They do make the case that the next round of inline Raman method development should at least include a deep-learning baseline alongside the conventional one.