ARIA

How ARIA evaluates its own AI

ARIA's compliance work uses Claude for narration and ingestion. A nightly evaluation harness asserts invariants on every surface that touches the LLM: quote integrity, derivability, citation resolution, and PII pre-flight. The live posture shown on this page is the same data our ops team watches. ADR-014 governs the no-invention rule that the harness enforces.


What we test for

Quote integrity
Every extracted quote appears verbatim in the source document.
Fabricated evidence cannot get through. If a quote isn't in the source, the harness flags it and the extraction is rejected before it reaches a reviewer.
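A minimal sketch of a verbatim-quote check, assuming exact substring matching (this page doesn't say whether whitespace or Unicode is normalized first). `check_quote_integrity` and `QuoteCheck` are illustrative names, not ARIA's code.

```python
from dataclasses import dataclass


@dataclass
class QuoteCheck:
    accepted: bool
    missing: list[str]


def check_quote_integrity(quotes: list[str], source: str) -> QuoteCheck:
    """Accept an extraction only if every quote occurs verbatim in the source."""
    # Exact substring match; an empty quote counts as a failure instead of trivially passing.
    missing = [q for q in quotes if not q or q not in source]
    # A single unverifiable quote rejects the whole extraction.
    return QuoteCheck(accepted=not missing, missing=missing)


source = "The vendor rotates access keys every 90 days."
print(check_quote_integrity(["rotates access keys every 90 days"], source))
# QuoteCheck(accepted=True, missing=[])
print(check_quote_integrity(["rotates keys every 30 days"], source))
# QuoteCheck(accepted=False, missing=['rotates keys every 30 days'])
```

Rejecting the whole extraction, rather than silently dropping the bad quote, keeps a reviewer from seeing a partially fabricated result.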
Derivability gate
Narrations are split into clauses; unsupported clauses are dropped.
Every clause in a Pattern 1 narration has to map back to an input observation. Clauses without support never ship.
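A sketch of the gate's shape, assuming clauses are split after sentence punctuation and semicolons. How ARIA decides that an observation supports a clause isn't described here, so the `supports` predicate is pluggable. The default, `token_overlap_supports`, is a hypothetical word-overlap placeholder with an arbitrary 0.6 threshold that stands in for whatever check the real gate uses.

```python
import re
from typing import Callable

# A support predicate decides whether one observation backs one clause.
Supports = Callable[[str, str], bool]


def split_clauses(narration: str) -> list[str]:
    # Illustrative splitter: break after sentence-ending punctuation or semicolons.
    parts = re.split(r"(?<=[.;!?])\s+", narration.strip())
    return [p for p in parts if p]


def _words(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def token_overlap_supports(clause: str, observation: str, threshold: float = 0.6) -> bool:
    # Placeholder heuristic: share of the clause's words that also occur in the observation.
    words = _words(clause)
    if not words:
        return False
    obs_words = set(_words(observation))
    hits = sum(1 for w in words if w in obs_words)
    return hits / len(words) >= threshold


def derivability_gate(
    narration: str,
    observations: list[str],
    supports: Supports = token_overlap_supports,
) -> tuple[list[str], list[str]]:
    """Split a narration into clauses; keep only clauses that some input observation supports."""
    kept: list[str] = []
    dropped: list[str] = []
    for clause in split_clauses(narration):
        if any(supports(clause, obs) for obs in observations):
            kept.append(clause)
        else:
            dropped.append(clause)
    return kept, dropped


observations = ["MFA is enforced for all admin accounts."]
narration = "MFA is enforced for all admin accounts. The vendor is SOC 2 certified."
kept, dropped = derivability_gate(narration, observations)
print(kept)     # ['MFA is enforced for all admin accounts.']
print(dropped)  # ['The vendor is SOC 2 certified.']
```

The second clause may well be true, but no observation fed to the model says so, which is exactly the case the gate exists to drop.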
Citation resolution
Pattern 2 narrations cite observations via inline markers.
Each citation must resolve to a real observation we fed the model. Invented references are rejected by the parser.
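A minimal sketch of the parser-side check. The marker syntax `[obs:<id>]`, the `resolve_citations` function, and the `UnresolvedCitation` error are hypothetical; the page says only that Pattern 2 narrations cite observations through inline markers.

```python
import re

# Hypothetical marker syntax: [obs:<id>], e.g. "[obs:a1]".
MARKER = re.compile(r"\[obs:([A-Za-z0-9_-]+)\]")


class UnresolvedCitation(ValueError):
    """Raised when a narration cites an observation the model was never given."""


def resolve_citations(narration: str, fed_ids: set[str]) -> list[str]:
    # Collect every cited id, in order of appearance.
    cited = MARKER.findall(narration)
    # Any id outside the set we actually sent to the model is an invented reference.
    unknown = [c for c in cited if c not in fed_ids]
    if unknown:
        raise UnresolvedCitation(f"unknown observation ids: {unknown}")
    return cited


fed = {"a1", "b2"}
print(resolve_citations("Keys rotate quarterly [obs:a1]; MFA is enforced [obs:b2].", fed))
# ['a1', 'b2']
```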
PII pre-flight
Documents with HIPAA-class PII never reach the LLM.
A pre-flight scan blocks the upstream API call entirely. Blocked documents never leave the customer's tenancy.
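A sketch of the ordering a pre-flight gate enforces: scan first, call second. The two regular expressions (US Social Security numbers and labelled medical record numbers) are illustrative stand-ins, not ARIA's detector; HIPAA-class identifiers cover far more than these. `call_llm` is a hypothetical placeholder for the upstream API client, and `PreflightBlocked` is likewise an illustrative name.

```python
import re
from typing import Callable

# Illustrative patterns only; real HIPAA identifier detection needs much broader coverage.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "mrn": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
}


class PreflightBlocked(Exception):
    def __init__(self, findings: list[str]) -> None:
        super().__init__(f"blocked by PII pre-flight: {findings}")
        self.findings = findings


def scan_for_pii(document: str) -> list[str]:
    # Names of every pattern that matches anywhere in the document.
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(document)]


def preflight_then_call(document: str, call_llm: Callable[[str], str]) -> str:
    findings = scan_for_pii(document)
    if findings:
        # Blocked: call_llm is never invoked, so the text is never handed to the upstream API.
        raise PreflightBlocked(findings)
    return call_llm(document)


def fake_llm(text: str) -> str:
    return "narration"


print(preflight_then_call("Quarterly access review completed.", fake_llm))  # narration
try:
    preflight_then_call("Patient SSN 123-45-6789", fake_llm)
except PreflightBlocked as exc:
    print(exc.findings)  # ['ssn']
```

Raising instead of returning a flag means a blocked document cannot be forwarded by accident: the caller has to handle the exception explicitly.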