How risky is the code your AI just wrote?
CI/CD pipelines treat all AI-generated code equally — a one-line docstring edit and a 500-line autonomous refactor get the same review process. PRISM quantifies risk from VIBES audit signals, so your pipeline can tell the difference.
Built on VIBES audit data · Attested via VERIFY · Feeds EVOLVE learning
For a plain-language overview, see PRISM for Users →
AI-generated code is flooding into production at a pace human reviewers cannot match. But not all AI-generated code carries the same risk. A model adding a docstring is fundamentally different from a model autonomously creating an authentication handler — yet most teams have no way to distinguish between them at review time.
Without quantified risk, a pipeline cannot tell these changes apart: every AI-generated change gets the same review process regardless of how it was produced or how much of the codebase it touches.
VIBES already captures the signals that risk assessment needs — action types, scope, assurance levels, review status, temperature. PRISM defines how to combine those signals into a single quantified score that your CI/CD pipeline can act on.
PRISM (Provenance & Risk Intelligence Scoring Model) is a standalone risk scoring extension built on VIBES audit data. Every annotation in a VIBES audit trail carries contextual signals — what kind of action was taken, how large the change was, what assurance level was configured versus what was actually recorded, whether human review occurred. PRISM combines these signals into a single 0.0–1.0 score.
PRISM is a framework, not a fixed formula. The reference algorithm below uses a weighted average, but implementors are free to substitute their own scoring models as long as the output conforms to the 0.0–1.0 range and the risk_factors array provides transparency into which signals drove the score.
PRISM integrates with VERIFY for attested risk scores and with EVOLVE for agent learning feedback — but operates independently as its own extension.
PRISM scores map to four severity bands, each with a recommended action for CI/CD pipeline integration.
| Band | Range | Meaning | Recommended Action |
|---|---|---|---|
| Low | 0.00 – 0.29 | Routine change with minimal risk signals | Auto-merge permitted |
| Medium | 0.30 – 0.59 | Moderate risk — larger scope or assurance gap | Flag for review; require approval |
| High | 0.60 – 0.79 | Significant risk — complex change or missing review | Block merge; require senior review |
| Critical | 0.80 – 1.00 | Extreme risk — large unreviewed creation at high temperature | Block merge; escalate to security team |
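The band lookup above can be sketched in a few lines. The function name `prism_band` and the action dictionary are illustrative, not part of the specification:

```python
# Sketch: map a PRISM score to its severity band and recommended action.
# Boundaries follow the severity band table; names are illustrative.

def prism_band(score: float) -> str:
    """Return the severity band for a 0.0-1.0 PRISM score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"PRISM score out of range: {score}")
    if score < 0.30:
        return "low"
    if score < 0.60:
        return "medium"
    if score < 0.80:
        return "high"
    return "critical"

# Recommended pipeline action per band, as listed in the table.
RECOMMENDED_ACTION = {
    "low": "auto-merge permitted",
    "medium": "flag for review; require approval",
    "high": "block merge; require senior review",
    "critical": "block merge; escalate to security team",
}
```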
The following signals are available for PRISM computation. Each signal produces a normalized 0.0–1.0 value and carries an implementor-defined weight.
- **temperature** (optional): Model sampling temperature at generation time. Higher temperatures increase output randomness and reduce reproducibility.
- **action_type** (required): Whether the change was a create, modify, or review. New file creation carries higher risk than modification of existing, reviewed code.
- **scope_lines** (required): Total line count of the annotated change. Larger changes have more surface area for defects.
- **assurance_gap** (optional): Difference between the configured assurance level and the actual assurance data recorded. A project configured for High assurance that only captured Low-level data has a significant gap.
- **review_status** (optional): Whether the change has been human-reviewed. Unreviewed AI-generated code carries inherently higher risk.
- **prompt_complexity** (optional): Estimated complexity of the prompt that generated the change, derived from token count and instruction structure. Available at Medium assurance and above.

The reference PRISM computation is a weighted average of available signals. Implementors may substitute any scoring model as long as the output conforms to the 0.0–1.0 range and provides a transparent risk_factors array.
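A minimal sketch of that reference weighted average follows. The default weights and the handling of missing optional signals are illustrative assumptions; the specification leaves both to implementors:

```python
# Sketch of the reference weighted-average computation. Signal names follow
# the list above; the weights below are illustrative defaults, not spec.

DEFAULT_WEIGHTS = {
    "temperature": 0.10,
    "action_type": 0.25,
    "scope_lines": 0.25,
    "assurance_gap": 0.15,
    "review_status": 0.20,
    "prompt_complexity": 0.05,
}

def prism_score(signals: dict[str, float],
                weights: dict[str, float] = DEFAULT_WEIGHTS) -> tuple[float, list]:
    """Weighted average over the normalized (0.0-1.0) signals present.

    Missing optional signals are excluded; their weight is effectively
    redistributed by renormalizing over the signals that were recorded.
    """
    present = {k: v for k, v in signals.items() if k in weights}
    total_weight = sum(weights[k] for k in present)
    score = sum(weights[k] * v for k, v in present.items()) / total_weight
    # The risk_factors array makes the score transparent: one entry per signal.
    risk_factors = [
        {"signal": k, "value": v, "weight": weights[k]}
        for k, v in present.items()
    ]
    return round(score, 2), risk_factors
```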
PRISM data is stored directly on VIBES annotation records using two optional fields: risk_score (the computed 0.0–1.0 value) and risk_factors (an array of signal assessments providing transparency into the score).
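As an illustration, an annotation record carrying PRISM data might look like the following. Only `risk_score` and `risk_factors` are the fields defined above; every other field name and value here is a hypothetical stand-in, not the actual VIBES annotation schema:

```json
{
  "file": "src/auth/handler.py",
  "action_type": "create",
  "scope_lines": 212,
  "temperature": 0.9,
  "review_status": "unreviewed",
  "risk_score": 0.89,
  "risk_factors": [
    {"signal": "action_type", "value": 1.0, "weight": 0.25},
    {"signal": "scope_lines", "value": 0.7, "weight": 0.25},
    {"signal": "review_status", "value": 1.0, "weight": 0.20},
    {"signal": "temperature", "value": 0.9, "weight": 0.10}
  ]
}
```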
PRISM scores are most powerful when they drive automated pipeline decisions. Rather than treating every AI-generated change identically, teams can set thresholds that gate merges based on quantified risk — low-risk changes flow through automatically while high-risk changes require human review.
The vibecheck CLI provides built-in commands for PRISM evaluation. Run these in your CI pipeline to enforce risk-based gating without custom scripting.
Run vibecheck risk in your project directory. This scans the .ai-audit/annotations.jsonl file, computes PRISM scores for every annotation that has signal data, and outputs a summary with per-file scores and an aggregate project score.
Use vibecheck risk --threshold 0.6 --ci to fail the pipeline if any annotation exceeds the threshold. The --ci flag sets the exit code to non-zero on threshold violation, making it compatible with any CI system that checks exit codes.
Use vibecheck risk --format json to produce machine-readable output. Pipe this into your PR bot or GitHub Action to post a risk summary comment on every pull request, giving reviewers immediate visibility into which files carry elevated risk.
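As a sketch of that pattern, the helper below formats a Markdown comment from the parsed JSON report. The report shape assumed here (a top-level `files` list with `path` and `risk_score` keys) is an assumption; check it against your `vibecheck` version's actual output:

```python
# Sketch: turn parsed `vibecheck risk --format json` output into a PR comment.
# The assumed report shape ("files" -> "path"/"risk_score") is hypothetical.

def risk_comment(report: dict) -> str:
    """Format a Markdown risk summary, highest-risk files first."""
    lines = ["## PRISM risk summary", ""]
    for entry in sorted(report.get("files", []),
                        key=lambda e: e["risk_score"], reverse=True):
        # Mark files at or above the High band threshold (0.60).
        flag = " (high)" if entry["risk_score"] >= 0.60 else ""
        lines.append(f"- `{entry['path']}`: {entry['risk_score']:.2f}{flag}")
    return "\n".join(lines)
```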
For high-stakes repositories, set a hard gate: vibecheck risk --threshold 0.8 --ci --fail-on critical. Any annotation in the Critical band (PRISM ≥ 0.80) blocks the merge and triggers an escalation notification to the security team.
A minimal GitHub Actions step that blocks merges where any annotation exceeds the High severity threshold:
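A sketch of such a step, assuming `vibecheck` has already been installed earlier in the job (the installation method is environment-specific and omitted here):

```yaml
jobs:
  risk-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Fails the job (non-zero exit) if any annotation scores >= 0.6,
      # the lower bound of the High severity band.
      - name: Enforce PRISM risk threshold
        run: vibecheck risk --threshold 0.6 --ci
```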
The --format json flag produces structured output suitable for dashboards, PR bots, and downstream analysis tools.
Build a custom policy engine. PRISM is a framework, not a fixed formula. If your organization has domain-specific risk factors — compliance requirements, security-sensitive file paths, model allowlists — you can build a custom scoring model that incorporates them. As long as the output conforms to the 0.0–1.0 range and provides transparent risk_factors, your policy engine is a first-class PRISM implementation. See the Implementors Guide for integration details.
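One way such a policy engine might look is sketched below: a wrapper that boosts the base score for security-sensitive file paths. The path prefixes, boost value, and function name are hypothetical, chosen only for illustration:

```python
# Sketch of a custom policy engine: wrap a base PRISM score with a
# domain-specific factor for security-sensitive paths. The prefixes and
# the 0.2 boost are illustrative assumptions, not part of the spec.

SENSITIVE_PREFIXES = ("src/auth/", "src/payments/", "infra/")

def policy_score(path: str, base_score: float,
                 base_factors: list) -> tuple[float, list]:
    """Raise the score for changes under security-sensitive paths.

    The output stays in 0.0-1.0 and the risk_factors array stays
    transparent, so this remains a conforming PRISM implementation.
    """
    factors = list(base_factors)
    if path.startswith(SENSITIVE_PREFIXES):
        factors.append({"signal": "sensitive_path", "value": 1.0, "weight": 0.3})
        base_score = min(1.0, base_score + 0.2)
    return round(base_score, 2), factors
```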
PRISM is one of four complementary standards in the VIBES ecosystem.
PRISM computes risk scores from VIBES audit data. Risk scores can be cryptographically attested via VERIFY, and feed into EVOLVE agent learning pipelines as quantitative signal.
Whether you want to join the community, build risk infrastructure, or champion adoption — there's a path for you.
Contribute to the PRISM specification, propose new risk signals, and help refine severity band thresholds for real-world pipelines.
Get involved →

Integrate PRISM scoring into your CI/CD pipeline, build a custom policy engine with domain-specific risk factors, or add risk visualization to your developer tooling.
Implementation guide →

Define risk policies for your organization. Make the case for quantified, automated risk gating on AI-generated code — before a high-risk change slips through unnoticed.
Resources →