ARIA

How ARIA evaluates its own AI

ARIA's compliance work uses Claude for narration and ingestion. A nightly evaluation harness asserts invariants on each LLM-touching surface: quote integrity, derivability, citation resolution, and PII pre-flight. The live posture below is the same data our ops team watches. ADR-014 governs the no-invention rule that the harness enforces.

Loading live posture…

What we test for

Quote integrity

Every extracted quote appears verbatim in the source document.

Fabricated evidence cannot get past the extractor. If a quote isn't in the source, the harness flags it and the extraction is rejected before it reaches a reviewer.
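
As a rough illustration of the verbatim check, here is a minimal Python sketch. The names `QuoteCheck`, `check_quotes`, and `extraction_passes` are hypothetical, and treating "verbatim" as an exact substring match with no whitespace or Unicode normalization is an assumption, not a description of the production harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuoteCheck:
    """Result of checking one extracted quote (hypothetical type)."""
    quote: str
    found: bool


def check_quotes(source_text: str, quotes: list[str]) -> list[QuoteCheck]:
    # Assumption: "verbatim" means an exact, case-sensitive substring match.
    # Empty quotes are treated as failures rather than trivially matching.
    return [QuoteCheck(q, bool(q) and q in source_text) for q in quotes]


def extraction_passes(source_text: str, quotes: list[str]) -> bool:
    # One missing quote is enough to reject the whole extraction.
    return all(check.found for check in check_quotes(source_text, quotes))
```

Under these assumptions, a quote that differs from the source by a single character fails the check, which is the behavior the invariant describes.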

Derivability gate

Narrations are split into clauses; unsupported clauses are dropped.

Every clause in a Pattern 1 narration has to map back to an input observation. Clauses without support never ship.
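
The gate's shape can be sketched in Python as follows. How the real harness decides that a clause "maps back" to an observation is not specified here, so the token-subset test in `is_supported` is only a stand-in. The clause splitter, the stopword list, and all function names are likewise assumptions made for illustration.

```python
import re

# Hypothetical stopword list; words ignored when comparing clause and observation.
STOPWORDS = frozenset({"a", "an", "the", "is", "was", "were", "are", "and", "of", "to", "in", "on"})


def split_clauses(narration: str) -> list[str]:
    # Crude splitter: breaks on sentence punctuation and semicolons.
    parts = re.split(r"[.;!?]+", narration)
    return [p.strip() for p in parts if p.strip()]


def content_tokens(text: str) -> set[str]:
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def is_supported(clause: str, observations: list[str]) -> bool:
    # Stand-in support test: every content token of the clause must appear
    # in a single input observation.
    tokens = content_tokens(clause)
    if not tokens:
        return False
    return any(tokens <= content_tokens(obs) for obs in observations)


def gate(narration: str, observations: list[str]) -> tuple[list[str], list[str]]:
    # Returns (kept, dropped); only kept clauses would ship.
    kept: list[str] = []
    dropped: list[str] = []
    for clause in split_clauses(narration):
        (kept if is_supported(clause, observations) else dropped).append(clause)
    return kept, dropped
```

The point of the sketch is the control flow: the narration is decomposed first, each clause is judged on its own, and anything without support is removed rather than rewritten.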

Citation resolution

Pattern 2 narrations cite observations via inline markers.

Each citation must resolve to a real observation we fed the model. Invented references are rejected by the parser.
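
A minimal Python sketch of citation resolution is shown below. The marker syntax `[obs:ID]`, the `UnresolvedCitation` exception, and the `resolve_citations` function are hypothetical; the actual inline marker format used by Pattern 2 narrations is not stated here.

```python
import re

# Assumed marker format for illustration only: [obs:ID]
CITATION = re.compile(r"\[obs:([A-Za-z0-9_-]+)\]")


class UnresolvedCitation(ValueError):
    """Raised when a narration cites an observation that was never supplied."""


def resolve_citations(narration: str, observations: dict[str, str]) -> list[str]:
    # Collect every cited id in order of appearance.
    cited = CITATION.findall(narration)
    # Any id that is not among the observations fed to the model is invented.
    missing = [obs_id for obs_id in cited if obs_id not in observations]
    if missing:
        raise UnresolvedCitation(f"unknown observation ids: {missing}")
    return cited
```

In this sketch the parser fails closed: a single unknown id rejects the whole narration instead of silently stripping the marker.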

PII pre-flight

Documents with HIPAA-class PII never reach the LLM.

A pre-flight scan blocks the upstream API call entirely. Blocked documents never leave the customer's tenancy.
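
The pre-flight pattern can be sketched in Python like this. The two regular expressions are illustrative placeholders (a US SSN-shaped number and a made-up medical record number format) and fall far short of real HIPAA-class detection. `PiiBlocked`, `scan`, and `guarded_call` are hypothetical names, and `send` stands in for whatever client makes the upstream API call.

```python
import re
from typing import Callable

# Illustrative detectors only; a real scan would cover many more identifier types.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "mrn": re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
}


class PiiBlocked(Exception):
    """Raised instead of making the upstream call."""

    def __init__(self, kinds: set[str]):
        super().__init__(f"blocked before upstream call: {sorted(kinds)}")
        self.kinds = kinds


def scan(text: str) -> set[str]:
    # Names of every detector that matched somewhere in the document.
    return {name for name, pattern in PII_PATTERNS.items() if pattern.search(text)}


def guarded_call(text: str, send: Callable[[str], str]) -> str:
    # The scan runs first; send() is never invoked for a blocked document.
    kinds = scan(text)
    if kinds:
        raise PiiBlocked(kinds)
    return send(text)
```

Placing the check in front of the call, rather than filtering the response afterwards, is what keeps a blocked document from being transmitted at all.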