Guardrails & Policies

Guardrails and Policies are the core mechanisms through which GovernanceAI enforces governance rules on AI applications.

What Are Guardrails?

Guardrails are intelligent filters that evaluate AI requests and responses against predefined rules in real time. They act as the “gates” between your application and the LLM, ensuring that only safe, compliant interactions proceed.

Guardrail Types

1. Runtime Guardrails

Applied to standard LLM interactions in your application.

Flow:

Your App → Request Evaluation → LLM Call → Response Evaluation → User

Examples:

  • Block requests containing PII (credit cards, SSNs)
  • Filter responses containing toxic content
  • Enforce response length limits
  • Redact sensitive information
  • Rate limit by user or IP
  • Verify data classification tags
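A runtime PII guardrail of this kind can be sketched in a few lines. The regex patterns and the block/redact actions below are illustrative assumptions, not GovernanceAI's actual detection logic:

```python
import re

# Hypothetical sketch of a block_pii-style runtime guardrail.
# Patterns and action names are assumptions for illustration only.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def evaluate_pii(text: str, action: str = "block") -> dict:
    """Return a guardrail decision for the given text."""
    found = [name for name, pat in PII_PATTERNS.items() if pat.search(text)]
    if not found:
        return {"decision": "allow", "detected_pii": []}
    if action == "redact":
        # Transform instead of blocking: replace each match in place.
        for name in found:
            text = PII_PATTERNS[name].sub("[REDACTED]", text)
        return {"decision": "transform", "detected_pii": found, "content": text}
    return {"decision": "block", "detected_pii": found}
```

Whether a rule blocks or redacts is typically a per-guardrail configuration choice, as shown in the `config.action` field of the API examples later in this page.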

2. Agent Guardrails

Applied to multi-step AI agents that can take actions.

Flow:

Agent Loop
├─ Plan Generation → Evaluate against guardrails
├─ Tool Selection → Verify tool is approved
├─ Action Execution → Check for policy violations
├─ Observation → Sanitize outputs
└─ Loop back to step 1

Examples:

  • Control which tools an agent can use
  • Require human approval for destructive actions
  • Log all agent decisions for audit trails
  • Prevent access to restricted APIs
  • Enforce budget limits on external calls
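The tool-control and approval checks above might look roughly like this inside the agent loop. The tool names and the three-way allow/approve/block outcome are hypothetical:

```python
# Hypothetical sketch of an agent_tool_control check.
# Tool names and outcomes are assumptions, not GovernanceAI's API.
APPROVED_TOOLS = {"search", "calculator"}
DESTRUCTIVE_TOOLS = {"delete_record", "send_email"}

def check_tool_call(tool: str) -> str:
    """Decide what to do before the agent executes a tool."""
    if tool not in APPROVED_TOOLS | DESTRUCTIVE_TOOLS:
        return "block"             # tool is not on any approved list
    if tool in DESTRUCTIVE_TOOLS:
        return "require_approval"  # pause and wait for a human reviewer
    return "allow"
```

Each decision would also be logged, so the audit trail records why a tool call was allowed, paused, or blocked.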

Built-in Guardrail Rules

GovernanceAI includes common rules you can enable:

| Rule | Type | Purpose |
| --- | --- | --- |
| block_toxic_content | Runtime | Detect and block toxic/abusive language |
| block_pii | Runtime | Redact PII like SSNs, credit cards, emails |
| enforce_classification_tags | Runtime | Require data classification on inputs |
| rate_limit | Runtime | Limit requests by user/IP/org |
| content_filter | Runtime | Filter adult, violent, or illegal content |
| jailbreak_detection | Runtime | Detect and block jailbreak attempts |
| agent_tool_control | Agent | Restrict which tools agents can call |
| agent_action_approval | Agent | Require human approval for actions |
| budget_limit | Agent | Set spending limits on external calls |
| output_sanitization | Both | Sanitize outputs before returning |

What Are Policies?

Policies are sets of guardrails organized by purpose and scope. They define how your organization governs AI usage.

Policy Structure

Policy: "Production LLM Governance"
├─ Scope: All production environments
├─ Guardrails:
│  ├─ block_toxic_content (severity: high)
│  ├─ block_pii (severity: high)
│  ├─ rate_limit (100 req/min per user)
│  └─ jailbreak_detection (severity: medium)
├─ Overrides:
│  ├─ For admins: allow 500 req/min
│  └─ For reports: skip PII block
└─ Audit: Log all decisions

Policy Scopes

Policies can be applied at different scopes:

| Scope | Level | Use Case |
| --- | --- | --- |
| Organization | Highest | Company-wide compliance rules |
| Workspace | Middle | Department or team rules |
| Application | Lower | App-specific guardrails |
| User | Lowest | Individual user overrides |
Priority: Narrower scopes override broader scopes (user > app > workspace > org)
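The merge order can be sketched as follows, assuming each policy carries a scope and a flat map of guardrail settings (the field names are illustrative):

```python
# Hypothetical sketch of scope-priority merging: policies are applied
# broad-to-narrow, so narrower scopes overwrite broader ones.
SCOPE_PRIORITY = ["organization", "workspace", "application", "user"]

def merge_policies(policies: list) -> dict:
    """Merge per-scope guardrail settings; narrower scopes win."""
    merged = {}
    for scope in SCOPE_PRIORITY:
        for policy in policies:
            if policy["scope"] == scope:
                merged.update(policy["guardrails"])
    return merged
```

For example, an organization policy setting `rate_limit: 100` combined with a user override of `rate_limit: 500` yields 500 for that user, while every other organization-level setting still applies.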

Creating Guardrails

Via Dashboard

  • Go to Guardrails section
  • Click Create Guardrail
  • Select rule type and configure parameters
  • Set severity level (low, medium, high, critical)
  • Add description for team reference
  • Click Save

Via API

curl -X POST https://api.governanceai.com/v1/guardrails/create \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Block Toxic Content",
    "rule_type": "block_toxic_content",
    "severity": "high",
    "enabled": true,
    "config": {
      "toxicity_threshold": 0.8,
      "action": "block",
      "log_violations": true
    },
    "description": "Blocks responses with toxic language"
  }'

Creating Policies

Via Dashboard

  • Go to Policies section
  • Click Create Policy
  • Enter policy name and description
  • Select scope (organization, workspace, or application)
  • Add guardrails:
    • Select existing guardrails
    • Or create new ones
    • Set priority if multiple rules apply
  • Configure overrides (optional)
  • Set rollout strategy (immediate, staged, or scheduled)
  • Click Create

Via API

curl -X POST https://api.governanceai.com/v1/policies/create \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production LLM Governance",
    "description": "Company-wide LLM safety policy",
    "scope": "organization",
    "guardrail_ids": [
      "guardrail_toxic_123",
      "guardrail_pii_456",
      "guardrail_rate_limit_789"
    ],
    "enforcement": {
      "mode": "blocking",
      "log_all_evaluations": true,
      "alert_on_violation": true
    },
    "overrides": [
      {
        "user_group": "admins",
        "guardrails_disabled": ["guardrail_rate_limit_789"]
      }
    ]
  }'

Guardrail Evaluation Flow

Detailed Evaluation Process

- Request arrives
├─ Extract metadata (user, org, context)
└─ Check if guardrails apply to this request
- Load applicable policies
├─ Query organization policy
├─ Check workspace policy
├─ Check application policy
└─ Merge with priority (narrowest scope wins)
- Evaluate each guardrail in sequence
├─ Run rule logic
├─ Generate violation data
├─ Determine action (allow/block/transform)
└─ Accumulate risk score
- Make decision
├─ If high-severity violation → Block
├─ If medium-severity → Transform or log
├─ If low-severity → Log only
└─ Calculate overall risk score
- Execute action
├─ If block → Return 403 Forbidden
├─ If transform → Return transformed content
├─ If allow → Continue to LLM
└─ Log decision for audit trail
- Return response to application
└─ Include decision, violations, risk score
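The decision step above can be sketched as a severity-to-action mapping plus risk accumulation. The rank thresholds and field names below are assumptions for illustration:

```python
# Hypothetical sketch of the "Make decision" step: the worst violation
# severity determines the action, and the risk score is accumulated
# as the maximum across violations. Thresholds are assumptions.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}

def decide(violations: list) -> dict:
    """Map accumulated guardrail violations to an overall decision."""
    if not violations:
        return {"decision": "allow", "risk_score": 0.0}
    worst = max(violations, key=lambda v: SEVERITY_RANK[v["severity"]])
    risk = max(v["risk_score"] for v in violations)
    if SEVERITY_RANK[worst["severity"]] >= SEVERITY_RANK["high"]:
        return {"decision": "block", "risk_score": risk}      # 403 to caller
    if worst["severity"] == "medium":
        return {"decision": "transform", "risk_score": risk}  # or log
    return {"decision": "allow", "risk_score": risk}          # low: log only
```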

Real Example

Request:

{
  "messages": [{"role": "user", "content": "My SSN is 123-45-6789"}],
  "context": {"org_id": "org_123", "user_id": "user_456"}
}

Evaluation:

- Check organization policy: "Production Governance"
├─ Guardrail 1: block_toxic_content → Pass (no toxic content)
├─ Guardrail 2: block_pii
│ └─ VIOLATION: SSN detected
│ ├─ Severity: High
│ ├─ Action: Block
│ └─ Risk score: 0.95
└─ Result: BLOCK

Response:

{
  "decision": "block",
  "policy_violations": [
    {
      "guardrail_id": "guardrail_pii_456",
      "guardrail_name": "block_pii",
      "severity": "high",
      "violation_type": "pii_detected",
      "detected_pii": ["ssn"]
    }
  ],
  "risk_score": 0.95,
  "action": "Blocked due to PII detection"
}
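On the application side, a block decision like the one above might be handled as follows. The helper itself is hypothetical, but the field names follow the response example:

```python
# Hypothetical client-side handling of a GovernanceAI evaluation response.
# Field names ("decision", "policy_violations", "violation_type") follow
# the response example above; the user-facing message is an assumption.
def handle_evaluation(resp: dict) -> str:
    """Turn an evaluation response into content or a safe error message."""
    if resp["decision"] == "block":
        kinds = {v["violation_type"] for v in resp["policy_violations"]}
        return (f"Request blocked ({', '.join(sorted(kinds))}); "
                "please remove sensitive data.")
    # Allowed (or transformed) requests carry content through unchanged.
    return resp.get("content", "")
```

Surfacing the violation type, rather than echoing the blocked content, keeps the sensitive data out of error messages and client logs.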

Policy Versioning

Policies are versioned to track changes and enable rollback.

Policy Version History:
├─ v1.0 (Jan 1, 2024) - Initial policy
│ └─ Guardrails: toxic_content, rate_limit
├─ v1.1 (Jan 15, 2024) - Added PII block
│ └─ Guardrails: toxic_content, pii, rate_limit
├─ v2.0 (Feb 1, 2024) - Added jailbreak detection
│  └─ Guardrails: toxic_content, pii, rate_limit, jailbreak_detection
└─ v2.1 (Current) - Tightened rate limits
   └─ Guardrails: toxic_content, pii, rate_limit (100→50), jailbreak_detection

Rollback Example:

# Rollback to v1.1
curl -X POST https://api.governanceai.com/v1/policies/rollback \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"policy_id": "policy_123", "version": "v1.1"}'

Policy Rollout Strategies

Immediate

Policy takes effect instantly for all users.

└─ All users immediately → New policy

Pros: Complete control, simple
Cons: Risk of disruption

Staged

Roll out to a small subset first, then expand.

Day 1-3:
├─ 5% of users → New policy
└─ 95% of users → Old policy
Day 4-6:
├─ 25% of users → New policy
└─ 75% of users → Old policy
Day 7+:
└─ 100% of users → New policy

Pros: Detect issues early, minimize risk
Cons: Requires monitoring
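A staged rollout needs each user to land in a stable cohort as the percentage grows. One common approach, an assumption here rather than GovernanceAI's documented mechanism, is deterministic hashing:

```python
import hashlib

# Hypothetical sketch of staged-rollout bucketing: hash each user ID into
# a stable 0-99 bucket, so the same user stays in the same cohort as the
# rollout percentage increases (5% → 25% → 100%).
def in_rollout(user_id: str, percent: int) -> bool:
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < percent

def active_policy(user_id: str, percent: int) -> str:
    return "new" if in_rollout(user_id, percent) else "old"
```

Because the bucket depends only on the user ID, users who saw the new policy at 5% keep it at 25% and beyond, which makes issues reported during the early stage reproducible.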

Scheduled

Activate at specific time.

├─ Now until Jan 15 → Old policy
└─ Jan 15 at 2 AM UTC → Switch to new policy

Pros: Control timing, notify users
Cons: Single point of failure

Monitoring & Debugging

View Policy Evaluations

curl https://api.governanceai.com/v1/audit/logs \
  -H "Authorization: Bearer $API_KEY" \
  -H "X-Filter: resource_type=policy,action=evaluate" \
  -H "X-Limit: 100"

Enable Policy Debugging

In Dashboard:

  • Go to Policies → Select policy
  • Click ⋯ (More) → Debug Mode
  • Set logging level: DEBUG, INFO, or ERROR
  • Policy now logs every evaluation detail

Common Issues

“No applicable policies found”

  • Verify policy scope matches request context
  • Check organization/workspace IDs

“Policy evaluation timeout”

  • Policy has too many complex rules
  • Optimize or split into multiple policies
  • Contact support for performance tuning

Best Practices

Do:

  • Start with pre-built rules and customize
  • Use policy versioning for changes
  • Test in staging before production rollout
  • Monitor evaluation metrics regularly
  • Document policy decisions for compliance
  • Review policies quarterly

Don’t:

  • Create overly complex policies with many rules
  • Apply broad policies without understanding impact
  • Forget to test policy interactions
  • Ignore policy evaluation metrics
  • Make policy changes without version control

Next Steps