Three questions are circulating at every licensed producer currently evaluating AI for GMP-critical workflows.

Can we use ChatGPT for batch review? Does our AI vendor's compliance certification cover our GMP obligations? Are we ready for 2027 enforcement?

The short answers: no, no, and almost certainly not, but not for the reason most operators assume.

‍

The compliance failure most likely to surface at inspection in 2027-2028 will not be an AI vendor using the wrong model architecture. It will be an operator whose records infrastructure cannot reconstruct an AI-influenced release decision from a batch number. The AI tool passed its validation. The records layer wasn't built to receive what the AI tool produced.

‍

This article explains what Annex 22 requires, where the operator's obligation begins, and what needs to be in place before any AI tool goes near a GMP-critical workflow.

‍

What is EU GMP Annex 22?

‍

EU GMP Annex 22 is a new six-page annex to the EU Good Manufacturing Practice guidelines, published for consultation in July 2025 alongside revisions to Annex 11 and Chapter 4. It governs the use of artificial intelligence in GMP-regulated environments, specifically AI systems with direct impact on patient safety, product quality, or data integrity.

It extends Annex 11. It does not replace it. Every access control, audit trail, electronic signature, periodic review, and security requirement in Annex 11 continues to apply to the computerised system hosting the AI. Annex 22 adds ten sections on top of that foundation, covering intended use, model validation, test-data integrity, explainability, confidence scoring, human oversight, and change control for AI-specific obligations.

‍

Consultation closed in October 2025. Enforcement is anticipated in the 2027-2028 window, consistent with commentary from PIC/S. For operators with validated GMP systems, the relevant timeline is not 2027; it is now. Records infrastructure changes in a GMP environment take 12 to 18 months to design, validate, and implement. That cycle needs to start before an enforcement deadline is visible on the horizon.

Annex 22 applies to any AI system used in batch release, deviation triage, QC classification, defect detection, yield prediction, or environmental anomaly detection. If the AI output influences a GMP record, Annex 22 applies.

‍

Which AI systems are allowed in GMP workflows?

‍

Section 1 of Annex 22 establishes the scope of the regulation and draws a hard boundary around which AI systems may operate in GMP-critical applications.

‍

The regulation applies only to static, deterministic models. Static means weights are frozen at deployment, and the model does not adapt during use by incorporating new data. Deterministic means identical inputs produce identical outputs, every time. Both conditions must hold simultaneously.

‍

Everything outside that boundary is excluded from GMP-critical workflows.

‍

Dynamic models, those that continue learning after deployment, are explicitly excluded. Probabilistic models (those that may return different outputs for identical inputs) are excluded. Generative AI and large language models, including ChatGPT, Claude, and Gemini, are excluded from the GMP-critical path in all circumstances. Non-critical use of LLMs is permitted only where a qualified human reviews every output before it influences a GMP record.

"The document does not apply to Generative AI and Large Language Models (LLM), and such models should not be used in critical GMP applications." (Annex 22 §1)

Operators evaluating AI tools for GMP-critical workflows have one threshold question: are weights frozen at deployment, and do identical inputs return identical outputs? Both must be confirmed in writing before any other evaluation begins. If the vendor cannot confirm both in writing, the conversation ends at §1.
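
The two conditions can also be spot-checked on the operator's side. The sketch below is a minimal illustration, not a validation procedure: `demo_predict` and `WEIGHTS` are hypothetical stand-ins for a vendor model, and the approach (fingerprint the weights before and after use, call the model repeatedly on the same inputs) is an assumption about how such a check might look. Passing it is supporting evidence only and does not replace the vendor's written confirmation.

```python
import hashlib
from typing import Callable, Sequence


def weights_fingerprint(weights: bytes) -> str:
    """SHA-256 of the serialized weights; changes if any weight changes."""
    return hashlib.sha256(weights).hexdigest()


def is_deterministic(predict: Callable[[Sequence[float]], float],
                     inputs: Sequence[Sequence[float]],
                     repeats: int = 3) -> bool:
    """True if every input returns the same output on every repeated call."""
    for x in inputs:
        first = predict(x)
        if any(predict(x) != first for _ in range(repeats - 1)):
            return False
    return True


# Hypothetical stand-in for a vendor model: a fixed linear scorer.
WEIGHTS = (0.4, 0.6)


def demo_predict(x: Sequence[float]) -> float:
    return sum(w * v for w, v in zip(WEIGHTS, x))


# Static: the weight fingerprint is unchanged after the model has been used.
before = weights_fingerprint(repr(WEIGHTS).encode())
deterministic = is_deterministic(demo_predict, [(1.0, 2.0), (0.5, 0.5)])
after = weights_fingerprint(repr(WEIGHTS).encode())
static = before == after

print("static:", static, "deterministic:", deterministic)
```

A finite set of repeated calls can show that a model is not deterministic, but it cannot prove that it is, which is why the written confirmation remains the threshold.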

‍

This also applies to tools already in use. If someone on your QA team is pasting batch deviations into a general-purpose LLM, even just for first-draft analysis that a human then reviews, that use needs to be assessed against §1 before 2027.

What data do you need before deploying AI?

‍

Before any AI model can be deployed in a GMP-critical workflow, two things must exist that most operators don't currently have documented.

‍

First, a characterised training sample space. Section 3 of Annex 22 requires an intended-use description authored by a process subject-matter expert and approved before acceptance testing begins. That description must define the full range of inputs the model will encounter, including rare variations, edge cases, and subgroup stratification by strain, equipment, growth stage, operator, and environment. The data science team cannot write this document. It requires someone who knows the process.

‍

Second, documented baseline performance for the manual process being replaced. Section 4.3 states that the AI model's acceptance criteria must be at least as high as the performance of the process it replaces. To set those criteria, operators need documented reject rates, false-reject rates, and false-pass rates for the current manual workflow, gathered by a documented method, not estimated.
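
As an illustration of how that comparison might be set up, here is a minimal sketch. It assumes the false-reject rate is measured over good units and the false-pass rate over defective units; Annex 22 as described above does not prescribe these definitions, and the class name, function name, and counts are hypothetical, not real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InspectionCounts:
    """Outcome counts judged against a confirmed reference result."""
    true_pass: int      # good unit, passed
    false_reject: int   # good unit, rejected
    true_reject: int    # defective unit, rejected
    false_pass: int     # defective unit, passed

    @property
    def false_reject_rate(self) -> float:
        good = self.true_pass + self.false_reject
        return self.false_reject / good if good else 0.0

    @property
    def false_pass_rate(self) -> float:
        defective = self.true_reject + self.false_pass
        return self.false_pass / defective if defective else 0.0


def meets_baseline(model: InspectionCounts, manual: InspectionCounts) -> bool:
    """Model must be at least as good as the manual process on both rates."""
    return (model.false_reject_rate <= manual.false_reject_rate
            and model.false_pass_rate <= manual.false_pass_rate)


# Illustrative numbers only.
manual = InspectionCounts(true_pass=940, false_reject=20, true_reject=36, false_pass=4)
model = InspectionCounts(true_pass=950, false_reject=10, true_reject=38, false_pass=2)

print("manual false-pass rate:", manual.false_pass_rate)
print("model meets baseline:", meets_baseline(model, manual))
```

The comparison is only as good as the manual counts behind it, which is why the baseline has to come from documented records rather than estimates.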

‍

Both of these requirements point at the same place: your existing batch and traceability records.

‍

The historical data needed to characterise a training sample space (lot-level attributes, equipment identifiers, operator records, environmental conditions, QC outcomes) exists in structured traceability systems. The baseline performance data §4.3 requires comes from documented batch records showing how the manual process has actually performed over time.

Operators without structured traceability records face a significant upstream problem before they can begin AI model validation. They either reconstruct historical data from incomplete sources or start collecting it from scratch, which adds months to the compliance timeline before a single line of model code is written.

‍

This is one of the least-discussed practical implications of Annex 22. The conversation has focused on AI tool selection. The data foundation that makes AI tool validation possible has received almost no attention.

‍

What does Annex 22 actually require from operators?

‍

The majority of analysis published on Annex 22 has concentrated on Sections 1 through 7: intended use definitions, validation obligations, the LLM exclusion, statistical requirements for confidence scoring, test-data independence, and the four-eyes principle for small teams.

That analysis is accurate. It addresses the AI system's obligations.

‍

Sections 8 through 10 are different. They do not define what the AI system must be able to do. They define what the operator must be able to prove. The compliance obligations in §8-§10 are not addressed to AI vendors. They are addressed to the operators running the GMP environment, and they resolve entirely into records infrastructure requirements.

§8: Explainability records. The factors contributing to an AI classification or decision must be identified and retained, not just generated. SHAP values, LIME outputs, and attention heatmaps are named as examples. These artefacts become part of the documented decision history. They cannot sit in a separate analytics environment with no enforced structural link to the batch record they relate to.

§9: Confidence scores and human review documentation. Confidence scores must be captured at the point of output, not averaged across a run and not aggregated by batch. Where a threshold is not met and the decision escalates to human review, that review event must be documented at individual decision granularity: who reviewed, in what role, at what time, with what outcome. A batch-level electronic signature covering multiple decisions is not sufficient.

§10: Model versioning, change control, and override retention. Every AI-influenced GMP decision must be traceable back to the exact model version that produced it. Changes to model configuration, including retraining triggered by performance monitoring, must be documented. Every human override must be individually retained. The records must survive for the retention period applicable to the batch records they touch.

Taken together, §8-§10 require operators to answer one question on demand: starting from a batch number, can you reconstruct the complete AI-influenced decision pathway (confidence score, explainability artefact, model version, HITL approval) as it existed at the moment the decision was made?
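
One way to picture what that implies for the records layer is a per-decision record that carries all four links. The sketch below is an assumption about a possible shape, not a schema taken from Annex 22: every class and field name (`AIDecisionRecord`, `explainability_ref`, `gaps`, and so on) is hypothetical, as are the identifiers in the example.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class HumanReview:
    reviewer: str
    role: str
    reviewed_at: datetime
    outcome: str  # e.g. "accepted" or "overridden"


@dataclass(frozen=True)
class AIDecisionRecord:
    decision_id: str
    batch_number: str
    model_version: str        # hash or equivalent identifier
    confidence: float         # captured for this single output
    threshold: float          # threshold in force at decision time
    explainability_ref: str   # key of the stored SHAP/LIME artefact
    decided_at: datetime
    review: Optional[HumanReview] = None

    def needs_review(self) -> bool:
        return self.confidence < self.threshold

    def gaps(self) -> List[str]:
        """Names of the links an inspector would find missing."""
        missing = []
        if not self.model_version:
            missing.append("model_version")
        if not self.explainability_ref:
            missing.append("explainability_ref")
        if self.needs_review() and self.review is None:
            missing.append("review")
        return missing


record = AIDecisionRecord(
    decision_id="D-0001",
    batch_number="B-2026-014",
    model_version="model-hash-placeholder",
    confidence=0.62,
    threshold=0.80,
    explainability_ref="artefact-D-0001",
    decided_at=datetime(2026, 3, 2, 9, 15, tzinfo=timezone.utc),
)
print(record.gaps())  # below threshold and no review attached yet

reviewed = replace(record, review=HumanReview(
    reviewer="j.doe", role="QA reviewer",
    reviewed_at=datetime(2026, 3, 2, 10, 5, tzinfo=timezone.utc),
    outcome="accepted",
))
print(reviewed.gaps())
```

The records are frozen so that a review is attached by creating a new record rather than mutating the original, which loosely mirrors the expectation that the decision is retained as it existed when it was made.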

‍

If any link in that chain is broken, the compliance picture collapses. Not because the AI tool failed its validation. Because the records infrastructure wasn't built to receive what the AI tool produced.

‍

What happens during an Annex 22 inspection?

‍

The §8-§10 requirements become concrete when you follow the sequence an inspector would follow.

The inspector starts with a batch number. They want to examine an AI-influenced release decision made six months ago.

‍

Step one: retrieve the batch record and identify the AI-assisted decision event. This requires the records system to have captured the AI output as a structured, searchable record at the time of the decision: not a log entry or a PDF attachment, but a linked record with a retrievable decision identifier.

Step two: surface the confidence score associated with that specific decision. Not the model's average confidence over that batch run. The value captured at the moment that particular output was generated, stored as a data field against that decision record.

‍

Step three: retrieve the explainability artefact (the SHAP output or equivalent) linked to that decision. The link must be structural, enforced by the records system. A naming convention or a manually maintained cross-reference does not meet §8.1.

Step four: confirm which model version was running when the decision was made. The batch record must carry a model version identifier (a hash or equivalent) that can be matched against the change control log to confirm no updates occurred between the model version that produced this output and the one currently deployed.

Step five: retrieve the HITL approval record if the confidence threshold was not met. Individual record: the reviewer, their role, the timestamp, the outcome.

‍

If any step requires the operator to leave the records environment, query a separate system, rebuild a timeline from log files, or reconstruct associations manually, the records infrastructure is not inspection-ready.
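
The five steps above can be sketched as a single query against a records store where the links are enforced by the database itself. This is a minimal illustration using Python's built-in `sqlite3` module with an in-memory database; the table names, column names, and sample rows are all hypothetical, and a real GMP records system would differ in almost every detail.

```python
import sqlite3

SCHEMA = """
CREATE TABLE batch (batch_number TEXT PRIMARY KEY);
CREATE TABLE model_version (          -- one row per change-controlled release
    version_hash TEXT PRIMARY KEY,
    approved_at  TEXT NOT NULL
);
CREATE TABLE ai_decision (
    decision_id  TEXT PRIMARY KEY,
    batch_number TEXT NOT NULL REFERENCES batch(batch_number),
    version_hash TEXT NOT NULL REFERENCES model_version(version_hash),
    confidence   REAL NOT NULL,       -- value for this single output
    threshold    REAL NOT NULL,
    decided_at   TEXT NOT NULL
);
CREATE TABLE explainability (
    decision_id TEXT PRIMARY KEY REFERENCES ai_decision(decision_id),
    artefact    TEXT NOT NULL         -- e.g. serialized SHAP values
);
CREATE TABLE hitl_review (
    decision_id TEXT PRIMARY KEY REFERENCES ai_decision(decision_id),
    reviewer TEXT NOT NULL, role TEXT NOT NULL,
    reviewed_at TEXT NOT NULL, outcome TEXT NOT NULL
);
"""

QUERY = """
SELECT d.decision_id, d.confidence, d.threshold, d.version_hash,
       m.approved_at, e.artefact,
       r.reviewer, r.role, r.reviewed_at, r.outcome
FROM ai_decision d
JOIN model_version m ON m.version_hash = d.version_hash
LEFT JOIN explainability e ON e.decision_id = d.decision_id
LEFT JOIN hitl_review r ON r.decision_id = d.decision_id
WHERE d.batch_number = ?
ORDER BY d.decided_at
"""


def reconstruct(conn, batch_number):
    """Return every AI decision for a batch, with any missing links listed."""
    pathway = []
    for row in conn.execute(QUERY, (batch_number,)):
        decision = dict(row)
        gaps = []
        if decision["artefact"] is None:
            gaps.append("explainability")
        below = decision["confidence"] < decision["threshold"]
        if below and decision["reviewer"] is None:
            gaps.append("hitl_review")
        decision["gaps"] = gaps
        pathway.append(decision)
    return pathway


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces links only when enabled
conn.executescript(SCHEMA)
conn.execute("INSERT INTO batch VALUES ('B-001')")
conn.execute("INSERT INTO model_version VALUES ('mv-hash-001', '2026-01-10')")
conn.executemany("INSERT INTO ai_decision VALUES (?, ?, ?, ?, ?, ?)", [
    ("D-1", "B-001", "mv-hash-001", 0.95, 0.80, "2026-03-02T09:15:00Z"),
    ("D-2", "B-001", "mv-hash-001", 0.60, 0.80, "2026-03-02T09:16:00Z"),
])
conn.executemany("INSERT INTO explainability VALUES (?, ?)",
                 [("D-1", "{}"), ("D-2", "{}")])

for decision in reconstruct(conn, "B-001"):
    print(decision["decision_id"], decision["gaps"])
```

In this sample, decision `D-2` falls below its threshold and has no review row, so it is reported with a `hitl_review` gap. The foreign keys are the point of the design: an explainability artefact or review cannot be stored without an existing decision to attach to, so the link does not depend on a naming convention.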

‍

Most EU GMP-compliant operators today have functioning batch records, working audit trails, and Annex 11-ready electronic signatures. What they almost certainly do not have is a records layer structured to store and link the four data types that §8-§10 introduce:

– Confidence scores as structured batch-level data fields. Timestamped values attached to the specific decision event they describe.

– Explainability artefacts with enforced batch linkage. SHAP outputs and their equivalents were not anticipated in legacy QMS architecture. Most systems have no native field or relationship type for them.

– HITL approvals at individual decision granularity. Annex 11-style batch-level e-signatures cover the batch. §9 requires a discrete approval record per AI-escalated decision.

– Model version identifiers tied to batch records. A records environment built around equipment IDs and SOPs has no equivalent for a model version hash.

None of this is an AI vendor obligation. The AI tool produces the confidence score. The records infrastructure has to store it, link it, and make it retrievable. Those are two different systems. Closing the gap between them is the operator's responsibility.

‍

Who is responsible: you or your AI vendor?

Chapter 4 §4.24 of the EU GMP guidelines places accountability for data processed with AI with the regulated user. The operator owns that obligation, regardless of which vendor supplied the tool.

This has a direct consequence for how AI vendor contracts should be read.

‍

Most AI vendor agreements are structured around the vendor's product compliance: documentation of the model's validation, explainability architecture, and confidence scoring methodology. They do not address the operator's records infrastructure obligations because those obligations are not the vendor's to address. §4.24 makes that explicit.

An operator who signs an AI vendor agreement and assumes the vendor's compliance certification transfers to their GMP inspection readiness has misread §4.24. The vendor delivers a compliant AI tool. The operator is responsible for the records layer that receives it.

An AI vendor claiming Annex 22 compliance has likely satisfied §1-§7: static model, validated, explainability-capable. §8-§10 are the operator's obligation. The vendor's certification says nothing about your records layer.

Before signing any AI vendor agreement for a GMP-critical workflow, operators should be asking two questions the contract probably doesn't answer:

Does our records infrastructure currently support the storage and retrieval requirements of §8-§10? If not, who is responsible for closing that gap before deployment?

What needs to happen before 2027?

‍

The enforcement window is 2027-2028. That sounds like sufficient runway. It isn't, once the actual sequence of work is mapped out.

‍

Records infrastructure changes in a validated GMP environment move through a fixed sequence: gap assessment, change design, change control review, validation planning, IQ/OQ/PQ execution, user acceptance testing, go-live. For systems touching batch records, that sequence rarely completes in under 12 months. Eighteen months is typical. Compressed timelines require additional resource and carry validation risk.
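
The arithmetic behind that runway can be checked directly. The sketch below assumes "early 2027" means 1 January 2027, which is an illustrative reading and not a date given by any regulator; `months_before` is a hypothetical helper.

```python
from datetime import date


def months_before(target: date, months: int) -> date:
    """First day of the month that lies `months` months before `target`."""
    index = target.year * 12 + (target.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


TARGET = date(2027, 1, 1)  # assumption: "early 2027" read as 1 January 2027

latest_start_fast = months_before(TARGET, 12)     # rarely faster than this
latest_start_typical = months_before(TARGET, 18)  # typical duration

print("12-month cycle must start by:", latest_start_fast.isoformat())
print("18-month cycle must start by:", latest_start_typical.isoformat())
```

Under that assumption, a typical 18-month cycle has to begin by mid-2025 and even a fast 12-month cycle by the start of 2026.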

‍

The sequence for an operator who wants to be inspection-ready in early 2027:

– Now
Records infrastructure gap assessment.
Map your current records system against the five §8-§10 capabilities. Identify what can be configured, what requires development, and what requires a new system component. This assessment should run in parallel to any AI vendor evaluation currently underway, not after vendor selection.

– Q3-Q4 2025
Change design and validation planning.
For each identified gap, design the records infrastructure change and initiate the change control process. Validation planning for new or modified GMP system components begins here.

– 2026
Validation execution.
IQ, OQ, and PQ for records infrastructure changes. User acceptance testing. Any iteration required by test outcomes extends this phase.

– Early 2027
AI tool integration into validated records layer.
Only at this point does AI tool selection and integration become the operational priority. Deploying an AI tool into a records infrastructure that hasn't completed this sequence does not produce inspection readiness.

– 2027-2028
Inspection window.
Operators who completed the records validation cycle have something to show. Operators who started with AI tool selection and treated records infrastructure as a downstream concern do not.

The operators who navigate the 2027-2028 enforcement window without disruption will not necessarily be the ones who selected the best AI model. They will be the ones who assessed their records layer first, identified the gaps, and started the validation cycle early enough that the infrastructure was ready before enforcement arrived.

‍

The sequencing is the compliance strategy. Build or validate the records layer first. Then integrate AI into it.

‍

Where do you start?

‍

Most current AI evaluation processes begin with the AI tool and treat the records layer as a downstream integration problem. Annex 22 inverts that priority.

‍

The assessment that should be running now, in parallel to any AI vendor evaluation: can your records infrastructure fully reconstruct an AI-influenced GMP decision at batch level (confidence score, explainability artefact, model version, HITL approval) from a single starting point, under inspection conditions?

If the answer is no, no AI tool you evaluate is ready to deploy into a GMP-critical workflow. The records gap is the compliance gap.

‍

Cannavigia's traceability platform maintains batch-level operational, quality, and compliance records as a connected, inspection-ready data layer across the full product lifecycle. The historical traceability data §3 and §4.3 require for AI model validation (lot attributes, equipment records, QC outcomes, operator history) is already structured and retrievable. The batch-level linkage architecture that §8-§10 depend on is the foundation the system was built on, not a feature added for Annex 22 readiness.

For operators currently assessing where their records infrastructure sits against the §8-§10 requirements, that assessment starts with the batch record. Cannavigia is built around exactly that starting point.

Before your next AI vendor meeting, know what your records layer can and cannot do. Cannavigia's team can run that assessment with you.
