Anthropic has published a research blog post showing that Claude Opus 4.7, a general-purpose large language model, can match, and in some tests surpass, specialized chemistry software on NMR (nuclear magnetic resonance) spectroscopy tasks. The finding is significant because Claude received no chemistry-specific fine-tuning.
Key Highlights
- Hydrogen shift prediction error: ±0.079 ppm for Opus 4.7, well under the ±0.20 ppm industry tolerance
- Carbon shift prediction error: ±1.37 ppm, effectively tied with MestReNova at ±1.48 ppm
- Splitting pattern accuracy: approximately 80% within 0.5 Hz, compared to 26–35% for ChemDraw and MestReNova
- Structure elucidation: 100% correct on all 8 simpler molecules; 4 of 7 complex molecules solved on every run
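As a rough illustration of how figures like these are typically computed, here is a minimal Python sketch. It assumes the reported shift errors are mean absolute errors between predicted and observed values, which the post summary does not state explicitly, and it treats the splitting-pattern figure as the share of predicted coupling constants that land within 0.5 Hz of the observed ones. The function names and all numbers are hypothetical and are not data from the benchmark.

```python
from statistics import mean


def mean_absolute_error(predicted, observed):
    """Average absolute difference between paired values in the same units."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return mean(abs(p - o) for p, o in zip(predicted, observed))


def fraction_within(predicted, observed, tolerance):
    """Share of paired values whose absolute difference is <= tolerance."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    hits = sum(1 for p, o in zip(predicted, observed) if abs(p - o) <= tolerance)
    return hits / len(predicted)


# Hypothetical values for illustration only, NOT data from the benchmark.
predicted_shifts_ppm = [7.25, 3.5, 1.25]
observed_shifts_ppm = [7.5, 3.5, 1.0]
predicted_couplings_hz = [7.0, 2.0, 8.5, 1.0]
observed_couplings_hz = [7.25, 3.0, 8.5, 1.25]

# Assumed metric: mean absolute error of the chemical shifts, in ppm.
shift_mae = mean_absolute_error(predicted_shifts_ppm, observed_shifts_ppm)

# Assumed metric: fraction of coupling constants within the 0.5 Hz window.
coupling_hit_rate = fraction_within(
    predicted_couplings_hz, observed_couplings_hz, tolerance=0.5
)

print(f"shift MAE: {shift_mae:.3f} ppm")
print(f"couplings within 0.5 Hz: {coupling_hit_rate:.0%}")
```

Under this reading, a tool passes the hydrogen criterion when `shift_mae` is below 0.20 ppm, and the 80% versus 26–35% comparison is a comparison of `coupling_hit_rate` values.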
What Is NMR Spectroscopy?
Nuclear magnetic resonance spectroscopy is the primary analytical technique chemists use to identify and verify molecular structures. After synthesizing a new compound, a chemist must manually match each spectral peak to a specific atom in the proposed structure — a time-consuming process that represents one of the last major manual bottlenecks in synthetic chemistry.
Current specialist tools like ChemDraw and MestReNova handle forward prediction (structure to spectrum) reasonably well, but inverse prediction — inferring a molecule's structure from its spectrum — is almost entirely left to the chemist's judgment. Claude now handles both.
How Claude Was Evaluated
Anthropic researcher David Kamber evaluated three Claude models (Opus 4.7, Opus 4.6, and Sonnet 4.6) against ChemDraw and MestReNova. The benchmark used 20 compounds drawn from ChemRxiv preprints published after the models' training cutoff, spanning four structural families: chloropyridazines, Boc-N-aryl maleimides, spirobicyclic ketones, and alpha-silyl methanesulfonamides.
Each Claude model ran the prediction tasks three times per compound; classical tools ran once. The test covered three solvents (DMSO-d6, CDCl3, and D2O) and both forward and inverse prediction modes.
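Because each Claude model was run three times per compound, a result such as "solved on every run" has to be aggregated across runs. The sketch below shows one way that bookkeeping could look. The compound names, the pass/fail values, and the `summarize_runs` helper are hypothetical and do not reflect the actual evaluation harness.

```python
RUNS_PER_COMPOUND = 3  # per the article: three runs per compound for each Claude model


def summarize_runs(results_by_compound):
    """Group compounds by whether they were solved on every, some, or no runs."""
    summary = {"every_run": [], "some_runs": [], "no_runs": []}
    for name, runs in results_by_compound.items():
        if len(runs) != RUNS_PER_COMPOUND:
            raise ValueError(f"{name}: expected {RUNS_PER_COMPOUND} runs, got {len(runs)}")
        if all(runs):
            summary["every_run"].append(name)
        elif any(runs):
            summary["some_runs"].append(name)
        else:
            summary["no_runs"].append(name)
    return summary


# Hypothetical outcomes (True = correct structure proposed on that run).
example_results = {
    "compound_a": [True, True, True],
    "compound_b": [True, False, True],
    "compound_c": [False, False, False],
}

run_summary = summarize_runs(example_results)
print(run_summary)
```

Counting only the `every_run` group is the strictest reading of a multi-run result, since a compound solved on two of three runs does not count as solved.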
Why This Matters
The results challenge a long-standing assumption: that scientific AI must be domain-fine-tuned to be useful. Claude's multimodal capabilities let it read experimental data directly from journal figures and hand-drawn sketches, removing the need for pre-curated databases.
"Claude is starting to meaningfully assist chemists with the daily translation, recall, and integration work that complements their judgment," Anthropic noted in the post.
Opus 4.7 outperformed both ChemDraw and MestReNova on splitting pattern accuracy (roughly 80% vs. 26–35%) and matched MestReNova on carbon shift prediction — metrics that matter most for practical lab work.
What's Next
Anthropic plans to expand the work to four further bottlenecks: chemical structure recognition, synthetic reasoning, reaction mechanism explanation, and literature comprehension.
The "Making Claude a Chemist" blog post positions Anthropic as a serious player in AI-for-science, a space where Google DeepMind's AlphaFold has already staked out major ground. Whether the chemistry work leads to a dedicated research tool remains to be seen, but the benchmark results suggest that general-purpose AI models are closing the gap with domain-specific scientific software faster than most anticipated.