Guardrails & Policies

Guardrails and Policies are the core mechanisms through which GovernanceAI enforces governance rules on AI applications.

What Are Guardrails?

Guardrails are intelligent filters that evaluate AI requests and responses against predefined rules in real time. They act as the “gates” between your application and the LLM, ensuring that only safe, compliant interactions proceed.

Guardrail Types

1. Runtime Guardrails

Applied to standard LLM interactions in your application.

Flow:

Your App → Request Evaluation → LLM Call → Response Evaluation → User

Examples:

  • Block requests containing PII (credit cards, SSNs)
  • Filter responses containing toxic content
  • Enforce response length limits
  • Redact sensitive information
  • Rate limit by user or IP
  • Verify data classification tags
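A runtime PII guardrail of this kind can be sketched in a few lines. The regex patterns and the block/redact actions below are illustrative assumptions, not GovernanceAI's actual detection logic:

```python
import re

# Hypothetical sketch of a block_pii-style runtime guardrail.
# Patterns and action names are assumptions for illustration only.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def evaluate_pii(text: str, action: str = "block") -> dict:
    """Return a guardrail decision for the given text."""
    found = [name for name, pat in PII_PATTERNS.items() if pat.search(text)]
    if not found:
        return {"decision": "allow", "detected_pii": []}
    if action == "redact":
        # Transform instead of blocking: replace each match in place.
        for name in found:
            text = PII_PATTERNS[name].sub("[REDACTED]", text)
        return {"decision": "transform", "detected_pii": found, "content": text}
    return {"decision": "block", "detected_pii": found}
```

Whether a rule blocks or redacts is typically a per-guardrail configuration choice, as shown in the `config.action` field of the API examples later in this page.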

2. Agent Guardrails

Applied to multi-step AI agents that can take actions.

Flow:

Agent Loop
├─ Plan Generation → Evaluate against guardrails
├─ Tool Selection → Verify tool is approved
├─ Action Execution → Check for policy violations
├─ Observation → Sanitize outputs
└─ Loop back to step 1

Examples:

  • Control which tools an agent can use
  • Require human approval for destructive actions
  • Log all agent decisions for audit trails
  • Prevent access to restricted APIs
  • Enforce budget limits on external calls
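The tool-control and approval checks above might look roughly like this inside the agent loop. The tool names and the three-way allow/approve/block outcome are hypothetical:

```python
# Hypothetical sketch of an agent_tool_control check.
# Tool names and outcomes are assumptions, not GovernanceAI's API.
APPROVED_TOOLS = {"search", "calculator"}
DESTRUCTIVE_TOOLS = {"delete_record", "send_email"}

def check_tool_call(tool: str) -> str:
    """Decide what to do before the agent executes a tool."""
    if tool not in APPROVED_TOOLS | DESTRUCTIVE_TOOLS:
        return "block"             # tool is not on any approved list
    if tool in DESTRUCTIVE_TOOLS:
        return "require_approval"  # pause and wait for a human reviewer
    return "allow"
```

Each decision would also be logged, so the audit trail records why a tool call was allowed, paused, or blocked.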

Built-in Guardrail Rules

GovernanceAI includes common rules you can enable:

| Rule | Type | Purpose |
| --- | --- | --- |
| block_toxic_content | Runtime | Detect and block toxic/abusive language |
| block_pii | Runtime | Redact PII like SSNs, credit cards, emails |
| enforce_classification_tags | Runtime | Require data classification on inputs |
| rate_limit | Runtime | Limit requests by user/IP/org |
| content_filter | Runtime | Filter adult, violent, or illegal content |
| jailbreak_detection | Runtime | Detect and block jailbreak attempts |
| agent_tool_control | Agent | Restrict which tools agents can call |
| agent_action_approval | Agent | Require human approval for actions |
| budget_limit | Agent | Set spending limits on external calls |
| output_sanitization | Both | Sanitize outputs before returning |

What Are Policies?

Policies are sets of guardrails organized by purpose and scope. They define how your organization governs AI usage.

Policy Structure

Policy: "Production LLM Governance"
├─ Scope: All production environments
├─ Guardrails:
│  ├─ block_toxic_content (severity: high)
│  ├─ block_pii (severity: high)
│  ├─ rate_limit (100 req/min per user)
│  └─ jailbreak_detection (severity: medium)
├─ Overrides:
│  ├─ For admins: allow 500 req/min
│  └─ For reports: skip PII block
└─ Audit: Log all decisions

Policy Scopes

Policies can be applied at different scopes:

| Scope | Level | Use Case |
| --- | --- | --- |
| Organization | Highest | Company-wide compliance rules |
| Workspace | Middle | Department or team rules |
| Application | Lower | App-specific guardrails |
| User | Lowest | Individual user overrides |
Priority: Narrower scopes override broader scopes (user > app > workspace > org)
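The merge order can be sketched as follows, assuming each policy carries a scope and a flat map of guardrail settings (the field names are illustrative):

```python
# Hypothetical sketch of scope-priority merging: policies are applied
# broad-to-narrow, so narrower scopes overwrite broader ones.
SCOPE_PRIORITY = ["organization", "workspace", "application", "user"]

def merge_policies(policies: list) -> dict:
    """Merge per-scope guardrail settings; narrower scopes win."""
    merged = {}
    for scope in SCOPE_PRIORITY:
        for policy in policies:
            if policy["scope"] == scope:
                merged.update(policy["guardrails"])
    return merged
```

For example, an organization policy setting `rate_limit: 100` combined with a user override of `rate_limit: 500` yields 500 for that user, while every other organization-level setting still applies.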

Creating Guardrails

Via Dashboard

  • Go to Guardrails section
  • Click Create Guardrail
  • Select rule type and configure parameters
  • Set severity level (low, medium, high, critical)
  • Add description for team reference
  • Click Save

Via API

curl -X POST https://api.governanceai.com/v1/guardrails/create \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Block Toxic Content",
    "rule_type": "block_toxic_content",
    "severity": "high",
    "enabled": true,
    "config": {
      "toxicity_threshold": 0.8,
      "action": "block",
      "log_violations": true
    },
    "description": "Blocks responses with toxic language"
  }'

Creating Policies

Via Dashboard

  • Go to Policies section
  • Click Create Policy
  • Enter policy name and description
  • Select scope (organization, workspace, or application)
  • Add guardrails:
    • Select existing guardrails
    • Or create new ones
    • Set priority if multiple rules apply
  • Configure overrides (optional)
  • Set rollout strategy (immediate, staged, or scheduled)
  • Click Create

Via API

curl -X POST https://api.governanceai.com/v1/policies/create \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production LLM Governance",
    "description": "Company-wide LLM safety policy",
    "scope": "organization",
    "guardrail_ids": [
      "guardrail_toxic_123",
      "guardrail_pii_456",
      "guardrail_rate_limit_789"
    ],
    "enforcement": {
      "mode": "blocking",
      "log_all_evaluations": true,
      "alert_on_violation": true
    },
    "overrides": [
      {
        "user_group": "admins",
        "guardrails_disabled": ["guardrail_rate_limit_789"]
      }
    ]
  }'

Guardrail Evaluation Flow

Detailed Evaluation Process

- Request arrives
├─ Extract metadata (user, org, context)
└─ Check if guardrails apply to this request
- Load applicable policies
├─ Query organization policy
├─ Check workspace policy
├─ Check application policy
└─ Merge with priority (narrowest scope wins)
- Evaluate each guardrail in sequence
├─ Run rule logic
├─ Generate violation data
├─ Determine action (allow/block/transform)
└─ Accumulate risk score
- Make decision
├─ If high-severity violation → Block
├─ If medium-severity → Transform or log
├─ If low-severity → Log only
└─ Calculate overall risk score
- Execute action
├─ If block → Return 403 Forbidden
├─ If transform → Return transformed content
├─ If allow → Continue to LLM
└─ Log decision for audit trail
- Return response to application
└─ Include decision, violations, risk score
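The decision step above can be sketched as a severity-to-action mapping plus risk accumulation. The rank thresholds and field names below are assumptions for illustration:

```python
# Hypothetical sketch of the "Make decision" step: the worst violation
# severity determines the action, and the risk score is accumulated
# as the maximum across violations. Thresholds are assumptions.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}

def decide(violations: list) -> dict:
    """Map accumulated guardrail violations to an overall decision."""
    if not violations:
        return {"decision": "allow", "risk_score": 0.0}
    worst = max(violations, key=lambda v: SEVERITY_RANK[v["severity"]])
    risk = max(v["risk_score"] for v in violations)
    if SEVERITY_RANK[worst["severity"]] >= SEVERITY_RANK["high"]:
        return {"decision": "block", "risk_score": risk}      # 403 to caller
    if worst["severity"] == "medium":
        return {"decision": "transform", "risk_score": risk}  # or log
    return {"decision": "allow", "risk_score": risk}          # low: log only
```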

Real Example

Request:

{
  "messages": [{"role": "user", "content": "My SSN is 123-45-6789"}],
  "context": {"org_id": "org_123", "user_id": "user_456"}
}

Evaluation:

- Check organization policy: "Production Governance"
├─ Guardrail 1: block_toxic_content → Pass (no toxic content)
├─ Guardrail 2: block_pii
│ └─ VIOLATION: SSN detected
│ ├─ Severity: High
│ ├─ Action: Block
│ └─ Risk score: 0.95
└─ Result: BLOCK

Response:

{
  "decision": "block",
  "policy_violations": [
    {
      "guardrail_id": "guardrail_pii_456",
      "guardrail_name": "block_pii",
      "severity": "high",
      "violation_type": "pii_detected",
      "detected_pii": ["ssn"]
    }
  ],
  "risk_score": 0.95,
  "action": "Blocked due to PII detection"
}
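On the application side, a block decision like the one above might be handled as follows. The helper itself is hypothetical, but the field names follow the response example:

```python
# Hypothetical client-side handling of a GovernanceAI evaluation response.
# Field names ("decision", "policy_violations", "violation_type") follow
# the response example above; the user-facing message is an assumption.
def handle_evaluation(resp: dict) -> str:
    """Turn an evaluation response into content or a safe error message."""
    if resp["decision"] == "block":
        kinds = {v["violation_type"] for v in resp["policy_violations"]}
        return (f"Request blocked ({', '.join(sorted(kinds))}); "
                "please remove sensitive data.")
    # Allowed (or transformed) requests carry content through unchanged.
    return resp.get("content", "")
```

Surfacing the violation type, rather than echoing the blocked content, keeps the sensitive data out of error messages and client logs.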

Policy Versioning

Policies are versioned to track changes and enable rollback.

Policy Version History:
├─ v1.0 (Jan 1, 2024) - Initial policy
│ └─ Guardrails: toxic_content, rate_limit
├─ v1.1 (Jan 15, 2024) - Added PII block
│ └─ Guardrails: toxic_content, pii, rate_limit
├─ v2.0 (Feb 1, 2024) - Added jailbreak detection
│  └─ Guardrails: toxic_content, pii, rate_limit, jailbreak_detection
└─ v2.1 (Current) - Tightened rate limits
   └─ Guardrails: toxic_content, pii, rate_limit (100→50), jailbreak_detection

Rollback Example:

# Rollback to v1.1
curl -X POST https://api.governanceai.com/v1/policies/rollback \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"policy_id": "policy_123", "version": "v1.1"}'

Policy Rollout Strategies

Immediate

Policy takes effect instantly for all users.

└─ All users immediately → New policy

Pros: Complete control, simple
Cons: Risk of disruption

Staged

Roll out to a small subset first, then expand.

Day 1-3:
├─ 5% of users → New policy
└─ 95% of users → Old policy
Day 4-6:
├─ 25% of users → New policy
└─ 75% of users → Old policy
Day 7+:
└─ 100% of users → New policy

Pros: Detect issues early, minimize risk
Cons: Requires monitoring
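A staged rollout needs each user to land in a stable cohort as the percentage grows. One common approach, an assumption here rather than GovernanceAI's documented mechanism, is deterministic hashing:

```python
import hashlib

# Hypothetical sketch of staged-rollout bucketing: hash each user ID into
# a stable 0-99 bucket, so the same user stays in the same cohort as the
# rollout percentage increases (5% → 25% → 100%).
def in_rollout(user_id: str, percent: int) -> bool:
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < percent

def active_policy(user_id: str, percent: int) -> str:
    return "new" if in_rollout(user_id, percent) else "old"
```

Because the bucket depends only on the user ID, users who saw the new policy at 5% keep it at 25% and beyond, which makes issues reported during the early stage reproducible.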

Scheduled

Activate at specific time.

├─ Now until Jan 15 → Old policy
└─ Jan 15 at 2 AM UTC → Switch to new policy

Pros: Control timing, notify users
Cons: Single point of failure

Monitoring & Debugging

View Policy Evaluations

curl https://api.governanceai.com/v1/audit/logs \
  -H "Authorization: Bearer $API_KEY" \
  -H "X-Filter: resource_type=policy,action=evaluate" \
  -H "X-Limit: 100"

Enable Policy Debugging

In Dashboard:

  • Go to Policies → Select policy
  • Click ⋯ (More) → Debug Mode
  • Set logging level: DEBUG, INFO, or ERROR
  • Policy now logs every evaluation detail

Common Issues

“No applicable policies found”

  • Verify policy scope matches request context
  • Check organization/workspace IDs

“Policy evaluation timeout”

  • Policy has too many complex rules
  • Optimize or split into multiple policies
  • Contact support for performance tuning

Best Practices

Do:

  • Start with pre-built rules and customize
  • Use policy versioning for changes
  • Test in staging before production rollout
  • Monitor evaluation metrics regularly
  • Document policy decisions for compliance
  • Review policies quarterly

Don’t:

  • Create overly complex policies with many rules
  • Apply broad policies without understanding impact
  • Forget to test policy interactions
  • Ignore policy evaluation metrics
  • Make policy changes without version control

Next Steps