VRI binds file-integrity enforcement and knowledge traceability into a single enforceable layer. Two independently deployable components implement it and integrate at the artifact boundary.
Both components share a cryptographic foundation, SHA-256 content addressing, but operate at different semantic layers.
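A minimal sketch of SHA-256 content addressing follows. It assumes an artifact's identifier is simply the hex digest of its raw bytes. The `sha256:` prefix and the function names are hypothetical and are not the components' actual identifier format.

```python
import hashlib
from pathlib import Path


def artifact_id(data: bytes) -> str:
    """Return a content address for a byte string (hypothetical 'sha256:<hex>' format)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def artifact_id_for_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file incrementally so large scans never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


if __name__ == "__main__":
    a = artifact_id(b"field report, page 1")
    b = artifact_id(b"field report, page 2")
    print(a == artifact_id(b"field report, page 1"))  # True: same bytes, same ID
    print(a == b)  # False: a one-character edit yields a different ID
```

Because the identifier is derived from content, both components can refer to the same artifact without coordinating on a naming scheme. Any party holding the bytes can recompute the identifier and check it.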
Provenance engine: cryptographic artifact anchoring with forensic change detection, and hardware-isolated signing.
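The source does not say how forensic change detection localizes a modification. One plausible approach is sketched here purely as an assumption. At anchoring time, it records per-block digests alongside the whole-file digest. Verification can then report which regions changed, rather than only that the file differs. `Anchor`, `anchor`, and `changed_blocks` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    """Hypothetical anchor record: whole-file digest plus per-block digests."""
    digest: str
    block_size: int
    block_digests: tuple[str, ...]
    length: int


def anchor(data: bytes, block_size: int = 4096) -> Anchor:
    """Record the digest of the whole artifact and of each fixed-size block."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return Anchor(
        digest=hashlib.sha256(data).hexdigest(),
        block_size=block_size,
        block_digests=tuple(hashlib.sha256(b).hexdigest() for b in blocks),
        length=len(data),
    )


def changed_blocks(record: Anchor, data: bytes) -> list[int]:
    """Return indices of blocks that differ from the anchored copy (empty if intact)."""
    if hashlib.sha256(data).hexdigest() == record.digest:
        return []  # fast path: whole-file digest still matches
    current = anchor(data, record.block_size).block_digests
    n = max(len(current), len(record.block_digests))
    # A block counts as changed if it was added, removed, or rehashes differently.
    return [
        i for i in range(n)
        if i >= len(current)
        or i >= len(record.block_digests)
        or current[i] != record.block_digests[i]
    ]
```

Fixed-size blocks localize in-place edits well. However, an insertion or deletion shifts every later block and flags them all. Content-defined chunking is a common alternative when that matters.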
Knowledge extraction engine: a claim-centric graph with source-sentence provenance, and sensitivity-gated retrieval.
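The following toy model makes "claim-centric with source-sentence provenance" concrete. Each claim carries the sentences that support it. Retrieval applies a sensitivity ceiling before matching. The three-level `Sensitivity` scale, the substring matching, and all class and function names are illustrative assumptions, not the engine's schema.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical ordered sensitivity levels; higher means more restricted."""
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2


@dataclass(frozen=True)
class Sentence:
    document_id: str  # content address of the source document
    index: int        # position of the sentence within the document
    text: str


@dataclass
class Claim:
    text: str
    sensitivity: Sensitivity
    evidence: list[Sentence] = field(default_factory=list)


def retrieve(
    claims: list[Claim], query: str, clearance: Sensitivity
) -> list[tuple[str, list[Sentence]]]:
    """Return matching claims with their source sentences, hiding anything above clearance."""
    q = query.lower()
    results = []
    for claim in claims:
        if claim.sensitivity > clearance:
            continue  # gate first: claims above clearance are never matched or returned
        if q in claim.text.lower():
            results.append((claim.text, list(claim.evidence)))
    return results
```

Every result carries its supporting sentences, so a caller can always show where an answer came from.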
Unified deployment: a claim is valid only if its source document is intact, and a source document is meaningful only if its claims can be traced. Both conditions are enforced at every point in the chain.
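The two conditions can be expressed as a single check. This sketch makes two assumptions. First, each evidence link stores the hex SHA-256 of its source document, standing in for an anchored artifact_id, together with the cited sentence. Second, a caller-supplied `load_bytes` function returns the document's current bytes. All names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Evidence:
    artifact_id: str  # hex SHA-256 recorded when the source was anchored
    sentence: str     # source sentence cited in support of the claim


@dataclass(frozen=True)
class Claim:
    text: str
    evidence: tuple[Evidence, ...]


def claim_is_valid(claim: Claim, load_bytes: Callable[[str], bytes]) -> bool:
    """A claim holds only if it is traced and every cited source still matches its anchor."""
    if not claim.evidence:
        return False  # untraceable claim
    for ev in claim.evidence:
        current = load_bytes(ev.artifact_id)
        if hashlib.sha256(current).hexdigest() != ev.artifact_id:
            return False  # source modified after anchoring
        if ev.sentence.encode() not in current:
            return False  # cited sentence does not appear in the anchored source
    return True
```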
Three distinct trust zones govern data movement. Each boundary crossing is enforced by specific, named controls.
Both components are in active pre-SBIR development. The table below characterizes the current readiness of each subsystem so evaluators can calibrate their assessment.
The historical-newspaper OCR pipeline produces article-level text with structural provenance for Gemynd ingestion. On the hardest examples in the gold-quality set, character recognition misses roughly 2% of characters. The remaining aggregate error comes from layout segmentation rather than recognition, so current development is focused on the boundary-detection and reading-order stages.
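A standard way to express a per-character figure like the ~2% above is character error rate, CER = (S + D + I) / N. Here S, D, and I are the substitutions, deletions, and insertions needed to turn the gold transcription into the OCR output, and N is the gold length. Whether the pipeline's "miss rate" is computed exactly this way is an assumption. The sketch only illustrates the metric.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character substitutions, deletions, and insertions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = cur
    return prev[-1]


def character_error_rate(hypothesis: str, reference: str) -> float:
    """CER = (S + D + I) / N, where N is the length of the gold reference."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

Measuring CER separately on correctly segmented regions and on whole-article output is one way to attribute error to recognition versus segmentation.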
| Component | Status | Scope |
|---|---|---|
| GraphRAG Ingest Pipeline | DEPLOYED | 5,400-document corpus, five extraction layers, production graph |
| GraphRAG Retrieval API | DEPLOYED | JWT auth, RBAC, sensitivity gating, rate limiting |
| OCR Segmentation Pipeline | FUNCTIONAL | Article-level extraction on gold-quality historical newspapers. Recognition at ~2% per-character miss rate; segmentation isolated as the residual error source. D-FINE-seg boundary detection, β-skeleton reading-order reasoner. |
| Aletheia Core (Python) | FUNCTIONAL | SHA-256 anchoring, ALBC barcoding, software Ed25519 signing |
| Hardware Signing Oracle | FUNCTIONAL | Odin binary on RPi 5, AOKF key format, audit log, wire protocol |
| Aletheia ↔ GraphRAG Integration | P2 ROADMAP | Anchor source documents before extraction; record artifact_id on Document nodes |
| Salience-Aware Retrieval | P1 ROADMAP | Conversation-log weight updates, template re-ranking |
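The OCR row names a β-skeleton reading-order reasoner. The sketch below is a hedged illustration of the underlying geometric structure only, since how the pipeline turns the graph into a reading order is not described here. It builds the lune-based β-skeleton over text-block centroids for β ≥ 1. An edge p–q exists when no other centroid lies strictly inside the lune formed by two disks of radius β·|pq|/2 centered at (1 − β/2)p + (β/2)q and (β/2)p + (1 − β/2)q. β = 1 gives the Gabriel graph, and β = 2 gives the relative neighborhood graph. Function names are hypothetical.

```python
import math
from itertools import combinations

Point = tuple[float, float]


def _in_lune(p: Point, q: Point, r: Point, beta: float) -> bool:
    """True if r lies strictly inside the lune-based beta region of segment pq (beta >= 1)."""
    radius = beta * math.dist(p, q) / 2
    a, b = 1 - beta / 2, beta / 2
    c1 = (a * p[0] + b * q[0], a * p[1] + b * q[1])
    c2 = (b * p[0] + a * q[0], b * p[1] + a * q[1])
    return math.dist(r, c1) < radius and math.dist(r, c2) < radius


def beta_skeleton(points: list[Point], beta: float = 1.0) -> set[tuple[int, int]]:
    """Edges (i, j), i < j, of the lune-based beta-skeleton; O(n^3), fine for one page of blocks."""
    if beta < 1:
        raise ValueError("the lune-based definition used here requires beta >= 1")
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        p, q = points[i], points[j]
        blocked = any(
            _in_lune(p, q, points[k], beta)
            for k in range(len(points))
            if k not in (i, j)
        )
        if not blocked:
            edges.add((i, j))
    return edges
```

The lune grows with β, so larger β removes edges (the β = 2 graph is a subgraph of the β = 1 graph). This gives one knob for how aggressively neighboring blocks are linked before any ordering step.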
The system has been deployed against a multi-corpus archive spanning the Turnbull National Wildlife Refuge documentary record and the Spokane historical newspaper corpus. Every claim in the graph traces to a source through an evidence relationship. Source-traceability is an enforced property of the store, not something that holds for only a subset of claims.
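One way to make traceability a property of the whole store, rather than of a subset, is to check it on every write. The sketch below uses a toy in-memory store that refuses any claim lacking an evidence link to a known source sentence. `ClaimStore`, `add_claim`, and `UntraceableClaimError` are hypothetical names, not the production graph's API.

```python
from dataclasses import dataclass, field


class UntraceableClaimError(ValueError):
    """Raised when a claim would enter the store without a resolvable evidence link."""


@dataclass
class ClaimStore:
    """Toy in-memory store in which source-traceability is a write-time invariant."""
    sentences: dict[str, str] = field(default_factory=dict)      # sentence_id -> document artifact_id
    evidence: dict[str, set[str]] = field(default_factory=dict)  # claim_id -> supporting sentence_ids

    def add_sentence(self, sentence_id: str, artifact_id: str) -> None:
        self.sentences[sentence_id] = artifact_id

    def add_claim(self, claim_id: str, sentence_ids: set[str]) -> None:
        # Validate before writing, so a rejected claim leaves no trace in the store.
        if not sentence_ids:
            raise UntraceableClaimError(f"{claim_id}: no evidence")
        missing = {s for s in sentence_ids if s not in self.sentences}
        if missing:
            raise UntraceableClaimError(f"{claim_id}: unknown sentences {sorted(missing)}")
        self.evidence[claim_id] = set(sentence_ids)

    def sources(self, claim_id: str) -> set[str]:
        """Artifact IDs of every document a claim traces to."""
        return {self.sentences[s] for s in self.evidence[claim_id]}
```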