Interpretive truth: Verifiably constrained AI judges for arbitration at scale
Questions like “Is player X more valuable to team A than player Y is to team B?” are “interpretive”: there is no universally correct answer, but there can be an accepted answer given a “decision framework” and a bounded set of “evidence”. A rational observer who accepts the framework and the evidence can accept the answer.
Most arbitration between humans, and soon between AI agents, is interpretive:
- “Is the work portfolio good enough to enter our freelance marketplace?”
- “Does this content violate our privacy policy?”
- “Was the promised set of deliverables and SLAs not met?”
In a world where AI agents transact billions of times every second, interpretive judgements need to be resolved at scale. The solution is AI judges that:
- adjudicate on open “decision frameworks” & “evidence”
- produce a reasoning trace that is verifiable by any observer
- are hardened against evidence poisoning attacks
Near-term: Interpretive prediction markets
Is Erling Haaland more valuable to Man City than Kylian Mbappé is to Real Madrid by end of the season?
Tier 1 evidence is decisive. Haaland accounts for 41% of City's PL goals vs Mbappé's 33% at Madrid, and City's scoring rate drops ~40% in Haaland's off-minutes vs Madrid's ~10% in Mbappé's. Tier 2 scout analysis (McGuire, Lowe) agrees: Haaland is structurally load-bearing in a struggling system; Mbappé's individual brilliance has not translated into collective value. Manager quotes from both sides are consistent but not load-bearing under the framework — Pep and Arbeloa have direct conflicts of interest.
Accepting a framework allows accepting an AI verdict
Declare the framework
IPFS: The framework is the rule of law for the market. It is a content-addressed tarball that bundles the rules and instructions the AI follows when interpreting evidence, the evidence types it will accept, the kinds of questions it can adjudicate, and backtests of example cases, so its behaviour is auditable before live use.
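As a rough illustration of what "content-addressed" buys here, the sketch below packs two files into a tarball with fixed metadata and derives a digest from the bytes. The file contents, the helper names, and the use of plain SHA-256 with a `0x` prefix are assumptions for illustration; the actual packaging pipeline and the way IPFS derives its identifier are not described in this document.

```python
import hashlib
import io
import tarfile


def build_tarball(files: dict[str, bytes]) -> bytes:
    """Pack files into an uncompressed tar archive with fixed metadata.

    Sorting names and zeroing mtime/uid/gid keeps the bytes reproducible,
    so identical inputs always produce an identical digest.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def tarball_digest(tarball: bytes) -> str:
    """Hex SHA-256 of the archive bytes (assumed digest format)."""
    return "0x" + hashlib.sha256(tarball).hexdigest()


# Hypothetical, heavily shortened framework files.
files = {
    "manifest.json": b'{"name": "football-player-value-v1"}',
    "framework.md": b"# football-player-value-v1\n",
}

digest_a = tarball_digest(build_tarball(files))
# Same files, inserted in a different order: the digest should not change.
digest_b = tarball_digest(build_tarball(dict(reversed(list(files.items())))))
# One extra space in framework.md: the digest should change.
edited = dict(files, **{"framework.md": b"# football-player-value-v1 \n"})
digest_c = tarball_digest(build_tarball(edited))
```

Any edit to the rules, however small, yields a different identifier, which is why a market that references the digest cannot have its framework swapped silently.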
manifest.json
{
  "name": "football-player-value-v1",
  "version": "2.1.0",
  "applicableTo": ["player_value", "player_ranking"],
  "description": "Tiered evidence dossier: primary on/off splits + team-share metrics, secondary independent scout analysis, tertiary decorative quotes.",
  "model": {
    "id": "zai-org/GLM-4.7-FP8",
    "sampling": { "temperature": 0, "topP": 1.0, "seed": 0, "maxTokens": 2000 }
  },
  "evidenceSchema": "schemas/dossierV1.json",
  "outputSchema": {
    "outcome": { "enum": [0, 1, 2] },
    "confidence": { "type": "number", "min": 0, "max": 1 },
    "rationale": { "type": "string", "maxLength": 2400 },
    "citations": { "type": "array", "items": "string" }
  },
  "promptTemplate": {
    "system": "framework.md",
    "userTemplate": "Question: {{question}}\n\nDossier:\n{{evidence_json}}"
  }
}
framework.md
# football-player-value-v1 (v2.1.0)

"Value to club" is not the same as "transfer market value." This is the interpretive heart of the question: reasonable analysts disagree on the weighting. The framework declares its stance up front.

Three dimensions, in priority order — they compound, they do not add:

1. Productive output: goals, assists, chances created, defensive contribution, save percentage. Position-adjusted, per-90.
2. Irreplaceability: on/off team output. Structural dependence ("the system runs through them"). Absence of like-for-like alternatives.
3. Ceiling and durability: age, contract length, injury history. Realistic horizon of contribution, not just one match.

## Evidence hierarchy (new in v1.3.0)

Tier 1 (primary, weight heavily): on/off splits, substitution patterns, team-share metrics, big-game start rate, trophies/standings differential. Things the player cannot author through PR.
Tier 2 (secondary, weight moderately): independent scout notes, peer-club bids actually received, salary % of total wage bill (what the wallet says, not the mouth).
Tier 3 (decorative, low weight): manager quotes, fan sentiment, media narratives. Included for colour, never load-bearing.

When Tier 1 and Tier 3 disagree, Tier 1 wins.

How to weigh signals:
- In-form play > season aggregates
- Transfer-market value is a sanity check, not a primary signal
- Long-form context_notes are load-bearing — read them

Edge cases — when signals conflict:
- Stats vs. narrative → prefer the side that explains more of the dossier
- Output vs. system player → lean system player when on/off splits agree
- Established vs. emerging → weight present output over projection

Hard rules:
- Do not import facts outside the dossier
- If a referenced player isn't described, return outcome: 2
- If confidence falls below 0.55, prefer outcome: 2
- If reasoning leans on Tier 3, cap confidence at 0.65

Return a single JSON object:
{
  "outcome": 1,
  "confidence": 0.72,
  "rationale": "2–4 sentence explanation, declaring which tier drove the call.",
  "citations": ["Pedri.team_share", "Pedri.on_off_splits", …]
}

Citation conventions:
- Cite per subject. Include at least one stats path per subject.
- Order: team_share / on_off_splits → stats → match_reports → scout_notes → manager_quotes → context_notes
- Aim for 8–16 citations — confidence signals breadth, citations signal weight. More citations when multiple Tier 1 fields contribute orthogonally.

Keep the rationale terse — it is part of the on-chain verdict and will be replayed for verification. Extra prose changes the hash.
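The hard rules in framework.md are written as instructions to the model, but they are mechanical enough to double as a post-check on any verdict. The sketch below is one possible reading, not the judge's actual code: the function name, the `leans_on_tier3` flag (how "leaning on Tier 3" is detected is not specified), and the treatment of outcome 2 as a "declined to pick a side" fallback are all assumptions.

```python
FALLBACK_OUTCOME = 2         # assumed meaning: the judge declines to pick a side
MIN_CONFIDENCE = 0.55        # from "If confidence falls below 0.55, prefer outcome: 2"
TIER3_CONFIDENCE_CAP = 0.65  # from "If reasoning leans on Tier 3, cap confidence at 0.65"


def apply_hard_rules(
    verdict: dict,
    dossier: dict,
    subjects: list[str],
    leans_on_tier3: bool,
) -> dict:
    """Return a copy of the verdict with the framework's hard rules applied.

    `dossier` maps subject name -> evidence; `leans_on_tier3` is supplied by
    the caller because the detection method is an open question here.
    """
    result = dict(verdict)

    # A referenced player who is not described in the dossier forces the fallback.
    if any(subject not in dossier for subject in subjects):
        result["outcome"] = FALLBACK_OUTCOME
        return result

    # Tier 3 evidence is decorative, so it cannot justify high confidence.
    if leans_on_tier3:
        result["confidence"] = min(result["confidence"], TIER3_CONFIDENCE_CAP)

    # Low confidence falls back rather than committing to a side.
    if result["confidence"] < MIN_CONFIDENCE:
        result["outcome"] = FALLBACK_OUTCOME

    return result


dossier = {"Erling Haaland": {}, "Kylian Mbappe": {}}
subjects = ["Erling Haaland", "Kylian Mbappe"]

kept = apply_hard_rules({"outcome": 1, "confidence": 0.74}, dossier, subjects, False)
capped = apply_hard_rules({"outcome": 1, "confidence": 0.74}, dossier, subjects, True)
weak = apply_hard_rules({"outcome": 0, "confidence": 0.50}, dossier, subjects, False)
missing = apply_hard_rules({"outcome": 1, "confidence": 0.90}, dossier, ["Pedri"], False)
```

Because the cap (0.65) sits above the fallback threshold (0.55), a Tier 3 lean lowers confidence without forcing a fallback by itself.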
Register framework + judge on-chain
chain: Frameworks register against FrameworkRegistry. Judges (images that will execute the resolution) register their Ritual TEE-derived signer against JudgeRegistry. Both are append-only; once a judge image is registered, only its attested key can sign for it.
Create the market
chain: A market binds a question to a framework, a source allowlist, the dossier subjects, and a resolution time. All rules are frozen before the AI ever sees the data.
$ manager/create-market
npm run create-market market.json

# market.json
{
  "question": "Is Erling Haaland more valuable to Man City than Kylian Mbappé is to Real Madrid by end of the season?",
  "frameworkId": "0x572f174004cb7791ebb89118750af59e2c7ac93e…",
  "sourceAllowlist": ["https://fbref.com/", "https://www.premierleague.com/", …],
  "dossierSubjects": ["Erling Haaland", "Kylian Mbappe"],
  "model": "zai-org/GLM-4.7-FP8",
  "resolutionTime": 1748152800
}
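A source allowlist only helps against evidence poisoning if it is matched on the parsed URL rather than on a raw string prefix. The following sketch shows one way to do that; the function name and the matching rule (same scheme, same host, path under the entry's path) are assumptions, since the market's real matching semantics are not stated, and the example URLs beyond the two allowlist entries are made up.

```python
from urllib.parse import urlsplit


def is_allowed(source_uri: str, allowlist: list[str]) -> bool:
    """True if source_uri shares scheme and host with an allowlist entry
    and its path sits at or below that entry's path."""
    target = urlsplit(source_uri)
    path = target.path or "/"
    for entry in allowlist:
        allowed = urlsplit(entry)
        if target.scheme != allowed.scheme:
            continue
        # Compare parsed hostnames so look-alike hosts and userinfo tricks fail.
        if target.hostname != allowed.hostname:
            continue
        base = allowed.path.rstrip("/")
        if path == base or path.startswith(base + "/"):
            return True
    return False


allowlist = ["https://fbref.com/", "https://www.premierleague.com/"]

ok = is_allowed("https://fbref.com/en/players/example", allowlist)
lookalike = is_allowed("https://fbref.com.evil.example/en/players/example", allowlist)
userinfo = is_allowed("https://fbref.com@evil.example/", allowlist)
downgraded = is_allowed("http://fbref.com/en/players/example", allowlist)
```

A naive `startswith` check against the allowlist strings would catch the look-alike host here only because the entries end in `/`; parsing the URL first removes that dependence on how entries happen to be written.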
Judge resolves with pinned inference on Ritual
ritual: The judge runs inside Ritual's TEE: it pulls the framework from IPFS, verifies its hash, fetches the dossier, and adjudicates via a Ritual L1 LLM-precompile call against zai-org/GLM-4.7-FP8 with pinned model, sampling, and seed. The verdict matches the framework's output schema: outcome, confidence, rationale, citations.
verdict.json
{
  "outcome": 1,
  "confidence": 0.74,
  "rationale": "Tier 1 evidence is decisive. Haaland accounts for 41% of City's PL goals vs Mbappé's 33% at Madrid, and City's scoring rate drops ~40% in Haaland's off-minutes vs Madrid's ~10% in Mbappé's. Tier 2 scout analysis (McGuire, Lowe) agrees. Manager quotes from both sides are consistent but not load-bearing under the framework — Pep and Arbeloa have direct conflicts of interest.",
  "citations": [
    "Erling Haaland.team_share",
    "Kylian Mbappe.team_share",
    "Erling Haaland.on_off_splits",
    "Kylian Mbappe.on_off_splits",
    "Erling Haaland.substitution_patterns",
    "Kylian Mbappe.substitution_patterns",
    "Erling Haaland.contract_signals",
    "Kylian Mbappe.contract_signals",
    "Erling Haaland.stats.season.notes",
    "Kylian Mbappe.stats.season.notes",
    "Erling Haaland.scout_notes[0]",
    "Kylian Mbappe.scout_notes[0]",
    "Erling Haaland.match_reports[0]",
    "Kylian Mbappe.match_reports[1]",
    "Kylian Mbappe.scout_notes[1]",
    "Erling Haaland.manager_quotes[1]"
  ]
}
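The citation conventions in the framework can be checked against a verdict like this one without re-running the model. The sketch below tests three things: the 8–16 citation target, at least one `stats` path per subject, and no citation pointing at an unknown subject. The function name and the decision to report problems as a list of strings are assumptions; the 8–16 range is phrased as an aim in the framework, so treating a miss as a warning rather than a failure would be equally reasonable.

```python
def check_citations(citations: list[str], subjects: list[str]) -> list[str]:
    """Return a list of human-readable problems; empty means the
    citations follow the conventions checked here."""
    problems = []

    if not 8 <= len(citations) <= 16:
        problems.append(f"expected 8-16 citations, got {len(citations)}")

    for subject in subjects:
        stats_prefix = f"{subject}.stats"
        has_stats = any(
            c == stats_prefix or c.startswith(stats_prefix + ".") for c in citations
        )
        if not has_stats:
            problems.append(f"no stats path cited for {subject}")

    for citation in citations:
        if not any(citation.startswith(subject + ".") for subject in subjects):
            problems.append(f"citation outside dossier subjects: {citation}")

    return problems


subjects = ["Erling Haaland", "Kylian Mbappe"]
citations = [
    "Erling Haaland.team_share", "Kylian Mbappe.team_share",
    "Erling Haaland.on_off_splits", "Kylian Mbappe.on_off_splits",
    "Erling Haaland.substitution_patterns", "Kylian Mbappe.substitution_patterns",
    "Erling Haaland.contract_signals", "Kylian Mbappe.contract_signals",
    "Erling Haaland.stats.season.notes", "Kylian Mbappe.stats.season.notes",
    "Erling Haaland.scout_notes[0]", "Kylian Mbappe.scout_notes[0]",
    "Erling Haaland.match_reports[0]", "Kylian Mbappe.match_reports[1]",
    "Kylian Mbappe.scout_notes[1]", "Erling Haaland.manager_quotes[1]",
]

problems = check_citations(citations, subjects)
# Dropping the stats paths should surface one problem per subject.
without_stats = [c for c in citations if ".stats" not in c]
problems_without_stats = check_citations(without_stats, subjects)
```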
pinned inference
{
  "model": "zai-org/GLM-4.7-FP8",
  "sampling": { "temperature": 0, "topP": 1.0, "seed": 0, "maxTokens": 2000 }
}
// pinned model + sampling, executed by the
// Ritual L1 LLM precompile inside a TEE —
// the attested executor key signs the verdict.
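Pinned sampling only makes a verdict replayable if the prompt bytes are reproducible too. The sketch below renders the manifest's `userTemplate` in a single pass and hashes the assembled prompt. The canonical JSON settings and the way the system and user parts are combined before hashing are assumptions; the document does not say how `assembledSha256` is actually computed.

```python
import hashlib
import json
import re

# Taken from promptTemplate.userTemplate in manifest.json.
USER_TEMPLATE = "Question: {{question}}\n\nDossier:\n{{evidence_json}}"


def canonical_json(value) -> str:
    """Serialize with sorted keys and no extra whitespace so that equal
    data always yields equal bytes (assumed canonical form)."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def render_user_prompt(question: str, dossier: dict) -> str:
    """Fill the template in one pass, so placeholder-like text inside the
    question or the dossier is never expanded a second time."""
    values = {"question": question, "evidence_json": canonical_json(dossier)}
    return re.sub(r"\{\{(\w+)\}\}", lambda m: values[m.group(1)], USER_TEMPLATE)


def assembled_sha256(system: str, user: str) -> str:
    """Digest over both prompt parts (the combination scheme is hypothetical)."""
    payload = canonical_json({"system": system, "user": user}).encode("utf-8")
    return "0x" + hashlib.sha256(payload).hexdigest()


prompt = render_user_prompt("Q?", {"b": 1, "a": 2})
same_prompt = render_user_prompt("Q?", {"a": 2, "b": 1})
digest = assembled_sha256("framework text", prompt)
```

The single-pass substitution matters for evidence poisoning: a dossier field containing the literal text `{{question}}` is passed through as data instead of being treated as a template placeholder.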
Verifiable forever
chain: The judge pins a full re-execution bundle to IPFS (framework + dossier + prompt + raw model response + citations), signs the verdict digest with its TEE-attested executor key, and calls Market.resolve(). Anyone can now fetch the bundle, recompute every hash, and check that the signature recovers to the executor registered for the judge image on-chain. No backend is trusted at any step.
bundle.json (pinned to IPFS)
{
  "marketId": 1,
  "frameworkTarballSha256": "0x9c7a4f3b2e1d8a05…",
  "notarizedData": {
    "raw": {…dossier…},
    "sourceUri": "https://…",
    "fetchedAt": 1748441727
  },
  "prompt": {
    "system": "…framework.md…",
    "user": "…",
    "assembledSha256": "0x4e1a772b09cf38d2"
  },
  "judge": {
    "model": "zai-org/GLM-4.7-FP8",
    "sampling": { "temperature": 0, "seed": 0, … }
  },
  "verdictPayload": {
    "outcome": 1,
    "confidence": 0.74,
    "rationale": "…",
    "citations": [16]
  },
  "onChainVerdict": {
    "outcome": 1,
    "confidence": "740000000000000000",
    "verdictHash": "0xab12cd34…"
  },
  "attestation": {
    "executor": "0x9dc11412391Dc3ED…",
    "signature": "0x…",
    "chain": "ritual"
  }
}
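One check an observer can run on a bundle like this is that the off-chain verdict payload and the on-chain verdict agree. The sketch below compares the two, assuming the on-chain confidence is the decimal value scaled by 10^18; that scaling is inferred from the example values (0.74 and "740000000000000000") and is not stated anywhere. Hash and signature checks are left out because the digest and signing schemes are not specified here.

```python
from decimal import Decimal

SCALE = 10 ** 18  # assumed fixed-point scaling, inferred from the example bundle


def to_fixed(confidence: float) -> str:
    """Convert a confidence in [0, 1] to its assumed on-chain integer string.

    Going through Decimal(str(...)) avoids binary floating-point drift,
    e.g. 0.74 * 10**18 evaluated directly in floats.
    """
    return str(int(Decimal(str(confidence)) * SCALE))


def check_bundle(bundle: dict) -> list[str]:
    """Return mismatches between verdictPayload and onChainVerdict."""
    payload = bundle["verdictPayload"]
    on_chain = bundle["onChainVerdict"]
    problems = []
    if payload["outcome"] != on_chain["outcome"]:
        problems.append("outcome mismatch")
    if to_fixed(payload["confidence"]) != on_chain["confidence"]:
        problems.append("confidence mismatch")
    return problems


bundle = {
    "verdictPayload": {"outcome": 1, "confidence": 0.74},
    "onChainVerdict": {"outcome": 1, "confidence": str(74 * 10 ** 16)},
}
problems = check_bundle(bundle)

tampered = {
    "verdictPayload": {"outcome": 1, "confidence": 0.74},
    "onChainVerdict": {"outcome": 0, "confidence": str(75 * 10 ** 16)},
}
tampered_problems = check_bundle(tampered)
```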
The arc of interpretive truth.
In a world of sovereign AI agents that run their own companies and hold property, the same interpretive disputes we arbitrate today need to be arbitrated on the internet. Networks vote on, fork, and remix decision frameworks the way they vote on protocol upgrades.
One TEE-attested signer per judge image, with its public track record.
Eligibility gate
An agent applies to a marketplace tier that requires demonstrated competence in security audits. The judge runs the competence framework over its work-samples.
Rejected deliverable
Agent A rejects B's logo as off-brief; B wants to get paid. The judge runs the decision framework over the brief and the delivered files.
zai-org/GLM-4.7-FP8 via Ritual LLM precompile
IPFS (Pinata) · Postgres