LiteLLM Provider Integration

Use GovernanceAI as a drop-in LiteLLM-compatible proxy to add AI governance to any application using LiteLLM.

Overview

GovernanceAI provides a LiteLLM-compatible API endpoint that:

  • ✅ Accepts the same requests as OpenAI, Claude, and other provider APIs
  • ✅ Applies guardrails and policies
  • ✅ Returns policy-compliant responses
  • ✅ Logs all activity for audit

No code changes needed: just point your LiteLLM client at GovernanceAI.

Setup

Step 1: Get Proxy Endpoint

  • Navigate to Integrations → LiteLLM
  • Copy your endpoint: https://litellm.governanceai.com/v1
  • Generate API key (or use existing)

Step 2: Configure LiteLLM

import litellm

# Point LiteLLM to GovernanceAI
litellm.api_base = "https://litellm.governanceai.com/v1"
litellm.api_key = "gak_prod_your_api_key"

# Use normally - guardrails applied automatically
response = litellm.completion(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

print(response.choices[0].message.content)

Step 3: Supported Models

All models are supported; requests pass through to the underlying provider:

# OpenAI models
litellm.completion(model="openai/gpt-4", ...)
litellm.completion(model="openai/gpt-3.5-turbo", ...)

# Claude models
litellm.completion(model="anthropic/claude-3-opus", ...)

# Cohere models
litellm.completion(model="cohere/command", ...)

# And more...

Configuration

Model Routing

Route different models through different guardrails:

$curl -X POST https://api.governanceai.com/v1/litellm/model-routing \
>   -H "Authorization: Bearer $API_KEY" \
>   -d '{
>     "routes": [
>       {
>         "model_pattern": "gpt-4",
>         "guardrail_policy": "strict"
>       },
>       {
>         "model_pattern": "gpt-3.5-turbo",
>         "guardrail_policy": "standard"
>       },
>       {
>         "model_pattern": "claude-*",
>         "guardrail_policy": "standard"
>       }
>     ]
>   }'
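The `model_pattern` values above look like shell-style globs. As a minimal sketch, here is how first-match routing could resolve a policy for a given model name; the glob semantics, first-match-wins ordering, and helper names are assumptions for illustration, not documented behavior:

```python
from fnmatch import fnmatch

# Routes mirror the configuration above.
ROUTES = [
    {"model_pattern": "gpt-4", "guardrail_policy": "strict"},
    {"model_pattern": "gpt-3.5-turbo", "guardrail_policy": "standard"},
    {"model_pattern": "claude-*", "guardrail_policy": "standard"},
]

def resolve_policy(model: str, routes=ROUTES, default="standard") -> str:
    """Return the policy of the first route whose pattern matches the model."""
    for route in routes:
        if fnmatch(model, route["model_pattern"]):
            return route["guardrail_policy"]
    return default

print(resolve_policy("gpt-4"))          # strict
print(resolve_policy("claude-3-opus"))  # standard (matches "claude-*")
```

Unmatched models fall back to the default policy, which is why a catch-all route (or explicit default) is worth configuring.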

Rate Limiting

Configure per-model rate limits:

$curl -X POST https://api.governanceai.com/v1/litellm/rate-limits \
>   -H "Authorization: Bearer $API_KEY" \
>   -d '{
>     "limits": [
>       {
>         "model": "gpt-4",
>         "requests_per_minute": 60,
>         "tokens_per_minute": 300000
>       },
>       {
>         "model": "*",
>         "requests_per_minute": 1000,
>         "tokens_per_minute": 1000000
>       }
>     ]
>   }'

Usage Example

Python Application

import litellm

# Configure
litellm.api_base = "https://litellm.governanceai.com/v1"
litellm.api_key = "gak_prod_..."

# Make request with context (optional)
response = litellm.completion(
    model="openai/gpt-4",
    messages=[{
        "role": "user",
        "content": "What is the capital of France?"
    }],
    # GovernanceAI-specific context
    metadata={
        "org_id": "org_123",
        "user_id": "user_456",
        "workspace_id": "ws_789"
    }
)

# Response includes GovernanceAI metadata
print(f"Content: {response.choices[0].message.content}")
print(f"Risk Score: {response.risk_score}")  # GovernanceAI addition
print(f"Policy Violations: {response.policy_violations}")  # GovernanceAI addition

LangChain Integration

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

# Configure to use GovernanceAI
chat = ChatOpenAI(
    model_name="gpt-4",
    openai_api_base="https://litellm.governanceai.com/v1",
    openai_api_key="gak_prod_...",
    temperature=0
)

# Use normally - all requests go through GovernanceAI
messages = [HumanMessage(content="Hello!")]
response = chat(messages)

print(response.content)

LlamaIndex Integration

from llama_index.llms import OpenAI

# Use GovernanceAI endpoint
llm = OpenAI(
    model="gpt-4",
    api_base="https://litellm.governanceai.com/v1",
    api_key="gak_prod_..."
)

# All requests go through GovernanceAI guardrails
response = llm.complete("What is AI governance?")

Monitoring & Metrics

View Usage

$# Get LiteLLM endpoint usage
$curl -H "Authorization: Bearer $API_KEY" \
>   https://api.governanceai.com/v1/litellm/usage

# Returns:
{
  "total_requests": 45230,
  "total_tokens": 12453000,
  "avg_latency_ms": 245,
  "policy_violations": 123,
  "blocked_requests": 45,
  "transformed_responses": 78
}
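To turn these raw counters into something alertable, you can compute simple ratios client-side. The field names below mirror the example response; the helper names are illustrative, and fetching the payload is left to your HTTP client:

```python
# Example usage payload, copied from the response above.
usage = {
    "total_requests": 45230,
    "total_tokens": 12453000,
    "avg_latency_ms": 245,
    "policy_violations": 123,
    "blocked_requests": 45,
    "transformed_responses": 78,
}

def violation_rate(u: dict) -> float:
    """Fraction of requests that triggered a policy violation."""
    return u["policy_violations"] / u["total_requests"]

def blocked_rate(u: dict) -> float:
    """Fraction of requests that were blocked outright."""
    return u["blocked_requests"] / u["total_requests"]

print(f"violation rate: {violation_rate(usage):.4%}")
print(f"blocked rate:   {blocked_rate(usage):.4%}")
```

A sudden jump in either ratio is usually a better alert signal than the absolute counts, since it is independent of traffic volume.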

Per-Model Metrics

$curl -H "Authorization: Bearer $API_KEY" \
>   'https://api.governanceai.com/v1/litellm/usage/by-model'

# Returns metrics per model (gpt-4, gpt-3.5-turbo, etc.)

Error Handling

GovernanceAI returns standard OpenAI error codes:

try:
    response = litellm.completion(
        model="openai/gpt-4",
        messages=[...]
    )
except litellm.APIError as e:
    # Handle API errors
    print(f"Error: {e.http_status} - {e.message}")

# Common errors:
# 400 - Invalid request (malformed guardrail config)
# 401 - Authentication failed (invalid API key)
# 429 - Rate limit exceeded
# 500 - Server error (try again)
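For the transient errors (429 and 500), a retry loop with exponential backoff is a common pattern. This is a sketch under stated assumptions: the exception exposes `http_status` as in the example above, and both helper names are ours, not part of litellm:

```python
import random
import time

def backoff_schedule(attempts: int, base: float = 0.5, cap: float = 30.0):
    """Delays of 0.5s, 1s, 2s, ... doubling each attempt, capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]

def call_with_retry(make_request, attempts: int = 4):
    """Retry only transient errors (429, 500); re-raise everything else."""
    last_exc = None
    for delay in backoff_schedule(attempts):
        try:
            return make_request()
        except Exception as e:  # e.g. litellm.APIError
            status = getattr(e, "http_status", None)
            if status not in (429, 500):
                raise  # 400/401 will not succeed on retry
            last_exc = e
            time.sleep(delay + random.uniform(0, 0.1))  # jitter avoids herding
    raise last_exc
```

Re-raising non-retryable errors immediately matters: retrying a 401 just burns your rate limit without ever succeeding.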

Performance

Latency Impact

GovernanceAI adds minimal latency:

  • Average overhead: 45-100ms
  • P95: 150ms
  • P99: 250ms

Varies based on:

  • Policy complexity
  • Model response size
  • Network latency to provider

Caching

Enable response caching:

$curl -X POST https://api.governanceai.com/v1/litellm/caching \
>   -H "Authorization: Bearer $API_KEY" \
>   -d '{
>     "enabled": true,
>     "ttl_seconds": 3600,
>     "cache_identical_requests": true
>   }'
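Conceptually, `cache_identical_requests` with `ttl_seconds` behaves like a TTL cache keyed on the request payload: identical model + messages are served from cache until the TTL expires. This client-side sketch only illustrates those semantics; the real cache lives server-side, and the key scheme here is an assumption:

```python
import hashlib
import json
import time

class TTLCache:
    """Illustrative TTL cache keyed on a hash of the request payload."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}

    @staticmethod
    def key(model: str, messages: list) -> str:
        # Canonical JSON so identical requests hash identically.
        blob = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def get(self, k):
        hit = self._store.get(k)
        if hit is None:
            return None
        value, stored_at = hit
        if time.monotonic() - stored_at > self.ttl:
            del self._store[k]  # expired: evict and miss
            return None
        return value

    def put(self, k, value):
        self._store[k] = (value, time.monotonic())
```

One implication of keying on the exact payload: any change to the messages, even whitespace, produces a different key and therefore a cache miss.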

Best Practices

Do:

  • Use org_id and user_id in metadata
  • Set appropriate rate limits
  • Monitor usage regularly
  • Test policies before production
  • Use different keys per environment
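For per-environment keys, one common approach is to read them from environment variables at startup so no key is hard-coded. The variable names below are illustrative, not an official convention:

```python
import os

def load_governanceai_config(env: dict = None) -> dict:
    """Read endpoint and key from the environment; fail fast if the key is missing."""
    env = os.environ if env is None else env
    key = env.get("GOVERNANCEAI_API_KEY")
    if not key:
        raise RuntimeError("GOVERNANCEAI_API_KEY is not set")
    return {
        "api_base": env.get("GOVERNANCEAI_API_BASE",
                            "https://litellm.governanceai.com/v1"),
        "api_key": key,
    }

# At application startup:
# cfg = load_governanceai_config()
# litellm.api_base = cfg["api_base"]
# litellm.api_key = cfg["api_key"]
```

Set `GOVERNANCEAI_API_KEY` to a different key in dev, staging, and prod, and the same code runs unchanged in every environment.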

Don’t:

  • Share API keys between environments
  • Disable audit logging
  • Route sensitive data without PII guardrails
  • Forget to set up alerts

Troubleshooting

  • 401 Unauthorized - Check API key
  • Rate limit exceeded - Check configured limits
  • Slow responses - Check policy complexity
  • Connection refused - Verify endpoint URL

Next Steps