§ 1
kettle_vri / technical_overview

Verifiable Record Infrastructure — Technical Overview

VRI binds file integrity enforcement and knowledge traceability into a single enforceable layer. Two independently deployable components implement it, and they integrate at the artifact boundary.

SHA-256 Content Addressing · Ed25519 Identity Signing · Hardware Key Custody · Five-Layer Graph Retrieval · On-Premise Deployment

§ 2
kettle_vri / component_map

Component architecture

Both components share a common cryptographic foundation in SHA-256 content addressing. They operate at different semantic layers.

Wreon File Layer

Provenance engine. Cryptographic artifact anchoring with forensic change detection. Hardware-isolated signing.

  • SHA-256 content-addressed artifact identity
  • Entropy-based forensic barcoding (ALBC v1/v2)
  • Ed25519 signing via hardware oracle (Argon2id + ChaCha20-Poly1305)
  • Append-only access log
  • Cryptographic custody chain
  • Byte-level change localization on mismatch
  • Deterministic ingest
  • Re-anchor is idempotent
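
As a minimal sketch of the first and last bullets above, content-addressed identity and idempotent re-anchoring can be illustrated with the standard library alone. `AnchorStore`, `anchor`, and `verify` are hypothetical names chosen for illustration and are not taken from the actual codebase; signing, barcoding, and the access log are out of scope here.

```python
import hashlib


def artifact_id(data: bytes) -> str:
    """Content address: the hex SHA-256 digest of the raw bytes."""
    return hashlib.sha256(data).hexdigest()


class AnchorStore:
    """Hypothetical in-memory anchor registry keyed by artifact ID."""

    def __init__(self) -> None:
        self._anchors: dict = {}  # artifact_id -> byte length at anchor time

    def anchor(self, data: bytes) -> str:
        """Anchor the bytes; re-anchoring identical bytes changes nothing."""
        aid = artifact_id(data)
        self._anchors.setdefault(aid, len(data))
        return aid

    def verify(self, aid: str, data: bytes) -> bool:
        """True only if the ID is anchored and the bytes still hash to it."""
        return aid in self._anchors and artifact_id(data) == aid

    def __len__(self) -> int:
        return len(self._anchors)
```

Because the identifier is derived only from the bytes, anchoring the same file twice yields the same ID and leaves the registry unchanged, while any single-byte change produces a different digest and fails `verify`.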

Gemynd Knowledge Layer

Knowledge extraction engine. Claim-centric graph with source-sentence provenance. Sensitivity-gated retrieval.

  • Five-layer extraction (OCR → structure → semantics → graph → retrieval)
  • Claims traced to exact source sentence, paragraph, page
  • Pre-graph sensitivity gate (PII, indigenous, living-person)
  • RBAC at Cypher layer
  • JWT auth with token versioning
  • Prompt injection defense
  • PII redaction on output
  • Idempotent graph writes (UNWIND MERGE)
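
The idempotent-write pattern named in the last bullet can be sketched as follows. The node labels, the `EVIDENCED_BY` relationship type, and the property names are assumptions made for illustration, not the production schema. The Cypher text is held as a plain string, and `merge_into` is an in-memory stand-in that mimics MERGE semantics so the replay behavior can be checked without a database.

```python
# Hypothetical Cypher: UNWIND feeds rows one by one, MERGE matches or creates.
UPSERT_CLAIMS = """
UNWIND $rows AS row
MERGE (c:Claim {claim_id: row.claim_id})
SET c.text = row.text
MERGE (d:Document {doc_id: row.doc_id})
MERGE (c)-[:EVIDENCED_BY]->(d)
"""


def build_rows(claims):
    """De-duplicate claims by claim_id (last one wins) into parameter rows."""
    rows = {}
    for c in claims:
        rows[c["claim_id"]] = {
            "claim_id": c["claim_id"],
            "text": c["text"],
            "doc_id": c["doc_id"],
        }
    return list(rows.values())


def merge_into(graph, rows):
    """In-memory stand-in for MERGE: keyed upsert, so replays add nothing."""
    for row in rows:
        graph.setdefault("claims", {})[row["claim_id"]] = row["text"]
        graph.setdefault("docs", set()).add(row["doc_id"])
        graph.setdefault("edges", set()).add((row["claim_id"], row["doc_id"]))
```

With a Cypher-compatible driver the rows would be passed as the `$rows` parameter; running the same batch twice should leave node and relationship counts unchanged, which is the property the stand-in demonstrates.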

Kettle Core

Unified deployment. A claim is valid only if its source document is intact, and a source document is meaningful only if its claims can be traced. Both conditions are enforced at every point in the chain.
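
The two-sided condition can be expressed as a single predicate. This is a sketch under stated assumptions: the `Claim` fields and the `claim_is_valid` function are hypothetical, and traceability is reduced here to a literal substring check, whereas the real system traces claims to sentence, paragraph, and page.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    claim_id: str
    text: str
    artifact_id: str      # SHA-256 of the source document at anchor time
    source_sentence: str  # exact sentence the claim was extracted from


def claim_is_valid(claim: Claim, source_bytes: bytes) -> bool:
    """Valid only if the source is intact AND the claim traces into it."""
    intact = hashlib.sha256(source_bytes).hexdigest() == claim.artifact_id
    traceable = claim.source_sentence.encode("utf-8") in source_bytes
    return intact and traceable
```

Either failure alone invalidates the claim: a modified source breaks the digest match, and a claim whose source sentence cannot be found has nothing to stand on even if the file is intact.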

§ 3
kettle_vri / trust_boundary_enforcement

Trust boundary enforcement

Three distinct trust zones govern data movement. Each boundary crossing is enforced by specific, named controls.

Zone A — Source Documents

Ingest Boundary ↓
  • Path traversal guard (canonicalized, relative_to CWD)
  • Pre-graph sensitivity gate (auto-quarantine)
  • Idempotent writes (UNWIND MERGE, config fingerprint)
  • Aletheia provenance anchor (integration P2)
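
A path traversal guard of the kind named in the first bullet can be sketched with `pathlib`: canonicalize the candidate, then require it to sit under the allowed root. The function name `safe_ingest_path` and its `root` parameter are hypothetical; the default root is the current working directory, as the bullet suggests.

```python
from pathlib import Path
from typing import Optional


def safe_ingest_path(user_path: str, root: Optional[Path] = None) -> Path:
    """Resolve user_path under root (default: CWD) or raise ValueError."""
    base = (root or Path.cwd()).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (base / user_path).resolve()
    # relative_to raises ValueError when candidate is not inside base.
    candidate.relative_to(base)
    return candidate
```

Resolving before comparing is the important step: a check on the raw string would accept inputs such as `docs/../../secret`, whereas the canonical path makes the escape visible.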

Zone B — Graph Store

Retrieval Boundary ↓
  • JWT authentication (token versioning, revocation)
  • RBAC at Cypher layer (server-side only)
  • Quarantine exclusion at context assembly
  • Prompt injection defense (XML escaping, regex)
  • PII redaction on synthesis output
  • Rate limiter (20 req/60s sliding window)
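
The rate limit in the last bullet (20 requests per 60-second sliding window) might look like the sketch below. The class name, the per-client keying, and the injectable `clock` are assumptions for illustration; the deployed limiter may key and store timestamps differently.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client in any `window_s` span."""

    def __init__(self, limit=20, window_s=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits = {}  # client key -> deque of request timestamps

    def allow(self, key):
        now = self.clock()
        hits = self._hits.setdefault(key, deque())
        # Evict timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Unlike a fixed-window counter, the sliding window cannot be gamed by bursting at a window edge: each request is judged against the 60 seconds immediately preceding it.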

Zone C — Client / Retrieval API
  • All answers carry supporting_claim_ids
  • Claims traceable to source sentence
  • Source documents verifiable against anchor
  • On-premise deployment
  • No data leaves the network

Status legend: ENFORCED, IN PROGRESS

§ 4
kettle_vri / system_maturity

Component maturity

Both components are in active pre-SBIR development. The table below characterizes current readiness so evaluators can calibrate expectations.

The historical-newspaper OCR pipeline produces article-level text with structural provenance for Gemynd ingestion. On the hardest gold-quality examples, character-level recognition reaches roughly a 2% per-character miss rate. The residual aggregate error is attributable to layout segmentation rather than recognition, so current development is concentrated on the boundary-detection and reading-order stages.

Component | Status | Scope
--- | --- | ---
GraphRAG Ingest Pipeline | DEPLOYED | 5,400-document corpus, five extraction layers, production graph
GraphRAG Retrieval API | DEPLOYED | JWT auth, RBAC, sensitivity gating, rate limiting
OCR Segmentation Pipeline | FUNCTIONAL | Article-level extraction on gold-quality historical newspapers. Recognition at ~2% per-character miss rate; segmentation isolated as the residual error source. D-FINE-seg boundary detection, β-skeleton reading-order reasoner.
Aletheia Core (Python) | FUNCTIONAL | SHA-256 anchoring, ALBC barcoding, software Ed25519 signing
Hardware Signing Oracle | FUNCTIONAL | Odin binary on RPi 5, AOKF key format, audit log, wire protocol
Aletheia ↔ GraphRAG Integration | P2 ROADMAP | Anchor source docs before extraction, artifact_id on Document nodes
Salience-Aware Retrieval | P1 ROADMAP | Conversation log weight updates, template re-ranking

§ 5
kettle_vri / deployment_proof

Active deployment: Turnbull NWR & Spokane corpora

The system has been deployed against a multi-corpus archive spanning the Turnbull National Wildlife Refuge documentary record and the Spokane historical newspaper corpus. Every claim in the graph traces to a source via an evidence relationship — source-traceability is an enforced property of the store, not a property of a subset.

  • 5,400 Documents
  • ~355K Graph Nodes
  • ~830K Relationships
  • 56,499 Claims, all source-evidenced