corpus semantic registry
documents define.
the registry identifies.
a registry layer over a corpus of documents: one symbol, one definition home, one content-hashed identity. edit a pinned source and every dependent claim is flagged in ci until it is re-verified and re-pinned.
three live registries
compiled from source and rendered. each is a browsable index: symbols, definition homes, dependency edges, verification states, and the collisions and aliases the registry resolves.
fold (real paper)
the claim-status table of a mathematics preprint (v67), maintained as a registry with dependency edges and sha256 pins against the pdf. change a byte of the paper and every affected claim is flagged until re-verified.
relay (worked example)
an auth design doc and an api spec. two teams used session to mean different things; the registry records the collision and its resolution. api_key was renamed access_token, and an alias keeps old references resolving.
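a minimal sketch of how an alias can keep an old name resolving, assuming aliases are stored as a plain old-name to new-name map. the function resolve and the qualified name csr.Relay.access_token are hypothetical illustrations, not csr's actual api; only csr.Auth.session appears in the demo output on this page.

```python
def resolve(name: str, symbols: set[str], aliases: dict[str, str]) -> str:
    """follow alias links from `name` until reaching a defined symbol (hypothetical helper)."""
    seen: set[str] = set()
    current = name
    while current not in symbols:
        if current in seen:
            # a -> b -> a would loop forever; report it instead
            raise ValueError(f"alias cycle through {current!r}")
        if current not in aliases:
            # neither a defined symbol nor a known alias: a dangling reference
            raise KeyError(f"unresolved reference {name!r}")
        seen.add(current)
        current = aliases[current]
    return current


symbols = {"csr.Auth.session", "csr.Relay.access_token"}   # hypothetical names
aliases = {"csr.Relay.api_key": "csr.Relay.access_token"}  # old name -> new name
print(resolve("csr.Relay.api_key", symbols, aliases))  # csr.Relay.access_token
```

keeping the alias as data rather than rewriting every old reference means documents written before the rename keep resolving, while the registry still knows there is exactly one definition.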
gambit (worked example)
a card-game rulebook vs. its engine: the same machinery, applied to prose-vs-code drift. capture collides between the two; the registry resolves it on the rulebook sense.
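one way to picture collision handling, as a sketch under stated assumptions: each source's use of a term is recorded with a short gloss of its meaning, a term with more than one distinct gloss is a collision, and resolving it keeps the canonical source's sense and lists the sources that must be renamed. the Usage record, both functions, the file paths, and the glosses are all invented for illustration; the page does not say what the engine means by capture.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    term: str    # the shared word
    source: str  # file that uses it (hypothetical paths below)
    sense: str   # short gloss of what that file means by it (invented)


def find_collisions(usages: list[Usage]) -> dict[str, list[Usage]]:
    """terms used with more than one distinct sense across sources."""
    by_term: defaultdict[str, list[Usage]] = defaultdict(list)
    for u in usages:
        by_term[u.term].append(u)
    return {t: us for t, us in by_term.items() if len({u.sense for u in us}) > 1}


def resolve_on(usages: list[Usage], canonical_source: str) -> tuple[str, list[str]]:
    """keep the canonical source's sense; list the sources that must change."""
    winner = next(u.sense for u in usages if u.source == canonical_source)
    return winner, sorted({u.source for u in usages if u.sense != winner})


usages = [
    Usage("capture", "rules/rulebook.md", "take an opponent's piece"),
    Usage("capture", "engine/moves.py", "snapshot the game state"),
    Usage("deck", "rules/rulebook.md", "the draw pile"),
    Usage("deck", "engine/moves.py", "the draw pile"),
]
collisions = find_collisions(usages)
print(list(collisions))  # ['capture']
print(resolve_on(collisions["capture"], "rules/rulebook.md"))
# ("take an opponent's piece", ['engine/moves.py'])
```

deck is shared with one meaning, so it is not a collision; only capture needs a resolution.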
how drift detection works
for corpora where documents, code, and ai agents all touch the same vocabulary: research frameworks, spec-driven codebases, long-lived design docs.
1. register
documents own the prose and the definitions. the registry records each symbol, its one definition home, and its dependency edges.
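a minimal in-memory sketch of what the register step records, assuming a symbol is a qualified name, one definition home (a document path), and a list of dependency edges. the Symbol and Registry classes, the dangling_edges check, and every name except csr.Auth.session are hypothetical; csr's real data model and error codes are not shown on this page.

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    name: str                # qualified name, e.g. "csr.Auth.session"
    home: str                # the single document that defines it
    depends_on: list[str] = field(default_factory=list)  # dependency edges


class Registry:
    """hypothetical in-memory registry: one definition home per symbol."""

    def __init__(self) -> None:
        self.symbols: dict[str, Symbol] = {}

    def register(self, sym: Symbol) -> None:
        existing = self.symbols.get(sym.name)
        if existing is not None and existing.home != sym.home:
            # a second definition home is an error, not a silent override
            raise ValueError(
                f"{sym.name}: defined in both {existing.home} and {sym.home}"
            )
        self.symbols[sym.name] = sym

    def dangling_edges(self) -> list[tuple[str, str]]:
        """dependency edges that point at symbols nobody has registered."""
        return [
            (s.name, dep)
            for s in self.symbols.values()
            for dep in s.depends_on
            if dep not in self.symbols
        ]


reg = Registry()
reg.register(Symbol("csr.Auth.session", "docs/auth_design_v1.md"))
reg.register(Symbol("csr.Relay.access_token", "docs/api_spec.md",
                    depends_on=["csr.Auth.session", "csr.Relay.scope"]))
print(reg.dangling_edges())  # [('csr.Relay.access_token', 'csr.Relay.scope')]
```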
2. pin
every definition home is pinned with a sha256 content hash. the registry compiles to a lockfile, a browsable wiki, a dependency graph, and a validation report with 20+ typed error codes.
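a sketch of the pinning step, assuming a pin is the sha256 of a definition home's full file bytes and the lockfile is a json map from symbol to pin. the sha256:&lt;hex&gt; form mirrors the demo output below; the json layout, the file name csr.lock, and both functions are hypothetical, and csr may hash a narrower span than the whole file.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def pin(path: Path) -> str:
    """sha256 of a definition home's bytes, written as 'sha256:<hex>'."""
    return "sha256:" + hashlib.sha256(path.read_bytes()).hexdigest()


def write_lockfile(homes: dict[str, Path], lockfile: Path) -> dict[str, str]:
    """pin each symbol's definition home and store symbol -> hash as json (hypothetical layout)."""
    pins = {symbol: pin(path) for symbol, path in homes.items()}
    lockfile.write_text(json.dumps(pins, indent=2, sort_keys=True) + "\n")
    return pins


with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "auth_design_v1.md"
    doc.write_bytes(b"session: placeholder definition text\n")  # invented content
    pins = write_lockfile({"csr.Auth.session": doc}, Path(tmp) / "csr.lock")
    print(pins["csr.Auth.session"][:13] + "..")  # short form, like the demo output
```

hashing raw bytes (not decoded text) means any change, even a trailing newline, produces a different pin.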
3. detect
any change to a pinned source flags every dependent symbol with a CSR004 drift diagnostic until it is re-verified and re-pinned.
$ echo >> examples/relay/docs/auth_design_v1.md
$ python3 tools/csr.py --root examples/relay/csr build
CSR004 hash_drift: symbol csr.Auth.session: source changed after hash pinning (pinned sha256:6dfd2f.. != current sha256:e42753..)
this exact check runs in ci on every commit (the demo is real). these pages rebuild from source weekly and on every push.
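the detect step can be sketched as two passes, assuming pins and dependency edges are already loaded: first flag every symbol whose definition home no longer matches its pin, then walk the reversed dependency edges breadth-first so everything downstream is flagged too. this is an illustration, not csr's implementation; detect_drift, its message strings, and csr.Relay.access_token are hypothetical, and only the idea of hash drift flagging dependents comes from this page.

```python
import hashlib
from collections import deque


def sha256(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def detect_drift(
    pins: dict[str, str],              # symbol -> hash recorded at pin time
    homes: dict[str, bytes],           # symbol -> current bytes of its home
    depends_on: dict[str, list[str]],  # symbol -> symbols it depends on
) -> dict[str, str]:
    """map every symbol that needs re-verification to the reason why."""
    flagged: dict[str, str] = {}
    queue: deque[str] = deque()

    # 1. direct drift: the definition home no longer matches its pin
    for sym, pinned in pins.items():
        current = sha256(homes[sym])
        if current != pinned:
            flagged[sym] = f"hash drift ({pinned[:13]}.. != {current[:13]}..)"
            queue.append(sym)

    # 2. invert the edges so we can walk from a changed symbol to its dependents
    dependents: dict[str, list[str]] = {}
    for sym, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(sym)

    # 3. breadth-first walk: everything downstream of a drift is flagged too
    while queue:
        changed = queue.popleft()
        for child in dependents.get(changed, []):
            if child not in flagged:
                flagged[child] = f"depends on drifted {changed}"
                queue.append(child)
    return flagged


pins = {"csr.Auth.session": sha256(b"v1"), "csr.Relay.access_token": sha256(b"spec")}
homes = {"csr.Auth.session": b"v1\n", "csr.Relay.access_token": b"spec"}  # like `echo >>`
deps = {"csr.Relay.access_token": ["csr.Auth.session"]}
flagged = detect_drift(pins, homes, deps)
print(sorted(flagged))  # ['csr.Auth.session', 'csr.Relay.access_token']
```

the api spec's own bytes did not change, yet it is flagged because it depends on the drifted auth definition; a ci gate would fail while this map is non-empty.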
part of a verification stack
csr handles identity. its siblings handle provenance and proof-term evidence. each is usable on its own.
#introspect: proof-term dag + leakage report (sorry, mvars, dependency surface) as json