LiteLLM Provider Integration

Use GovernanceAI as a drop-in LiteLLM-compatible proxy to add AI governance to any application using LiteLLM.

Overview

GovernanceAI provides a LiteLLM-compatible API endpoint that:

  • ✅ Accepts the same requests as OpenAI, Claude, and other provider APIs
  • ✅ Applies guardrails and policies
  • ✅ Returns policy-compliant responses
  • ✅ Logs all activity for audit

No code changes needed: just point your LiteLLM client at GovernanceAI.

Setup

Step 1: Get Proxy Endpoint

  • Navigate to Integrations → LiteLLM
  • Copy your endpoint: https://litellm.governanceai.com/v1
  • Generate API key (or use existing)

Step 2: Configure LiteLLM

import litellm

# Point LiteLLM to GovernanceAI
litellm.api_base = "https://litellm.governanceai.com/v1"
litellm.api_key = "gak_prod_your_api_key"

# Use normally - guardrails applied automatically
response = litellm.completion(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

print(response.choices[0].message.content)

Step 3: Supported Models

All models are supported; requests pass through to the underlying provider:

# OpenAI models
litellm.completion(model="openai/gpt-4", ...)
litellm.completion(model="openai/gpt-3.5-turbo", ...)

# Claude models
litellm.completion(model="anthropic/claude-3-opus", ...)

# Cohere models
litellm.completion(model="cohere/command", ...)

# And more...

Configuration

Model Routing

Route different models through different guardrails:

$curl -X POST https://api.governanceai.com/v1/litellm/model-routing \
>   -H "Authorization: Bearer $API_KEY" \
>   -d '{
>     "routes": [
>       {
>         "model_pattern": "gpt-4",
>         "guardrail_policy": "strict"
>       },
>       {
>         "model_pattern": "gpt-3.5-turbo",
>         "guardrail_policy": "standard"
>       },
>       {
>         "model_pattern": "claude-*",
>         "guardrail_policy": "standard"
>       }
>     ]
>   }'
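The `model_pattern` values above look like shell-style globs. As a minimal sketch, here is how first-match routing could resolve a policy for a given model name; the glob semantics, first-match-wins ordering, and helper names are assumptions for illustration, not documented behavior:

```python
from fnmatch import fnmatch

# Routes mirror the configuration above.
ROUTES = [
    {"model_pattern": "gpt-4", "guardrail_policy": "strict"},
    {"model_pattern": "gpt-3.5-turbo", "guardrail_policy": "standard"},
    {"model_pattern": "claude-*", "guardrail_policy": "standard"},
]

def resolve_policy(model: str, routes=ROUTES, default="standard") -> str:
    """Return the policy of the first route whose pattern matches the model."""
    for route in routes:
        if fnmatch(model, route["model_pattern"]):
            return route["guardrail_policy"]
    return default

print(resolve_policy("gpt-4"))          # strict
print(resolve_policy("claude-3-opus"))  # standard (matches "claude-*")
```

Unmatched models fall back to the default policy, which is why a catch-all route (or explicit default) is worth configuring.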

Rate Limiting

Configure per-model rate limits:

$curl -X POST https://api.governanceai.com/v1/litellm/rate-limits \
>   -H "Authorization: Bearer $API_KEY" \
>   -d '{
>     "limits": [
>       {
>         "model": "gpt-4",
>         "requests_per_minute": 60,
>         "tokens_per_minute": 300000
>       },
>       {
>         "model": "*",
>         "requests_per_minute": 1000,
>         "tokens_per_minute": 1000000
>       }
>     ]
>   }'

Usage Example

Python Application

import litellm

# Configure
litellm.api_base = "https://litellm.governanceai.com/v1"
litellm.api_key = "gak_prod_..."

# Make request with context (optional)
response = litellm.completion(
    model="openai/gpt-4",
    messages=[{
        "role": "user",
        "content": "What is the capital of France?"
    }],
    # GovernanceAI-specific context
    metadata={
        "org_id": "org_123",
        "user_id": "user_456",
        "workspace_id": "ws_789"
    }
)

# Response includes GovernanceAI metadata
print(f"Content: {response.choices[0].message.content}")
print(f"Risk Score: {response.risk_score}")  # GovernanceAI addition
print(f"Policy Violations: {response.policy_violations}")  # GovernanceAI addition

LangChain Integration

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

# Configure to use GovernanceAI
chat = ChatOpenAI(
    model_name="gpt-4",
    openai_api_base="https://litellm.governanceai.com/v1",
    openai_api_key="gak_prod_...",
    temperature=0
)

# Use normally - all requests go through GovernanceAI
messages = [HumanMessage(content="Hello!")]
response = chat(messages)

print(response.content)

LlamaIndex Integration

from llama_index.llms import OpenAI

# Use GovernanceAI endpoint
llm = OpenAI(
    model="gpt-4",
    api_base="https://litellm.governanceai.com/v1",
    api_key="gak_prod_..."
)

# All requests go through GovernanceAI guardrails
response = llm.complete("What is AI governance?")

Monitoring & Metrics

View Usage

$# Get LiteLLM endpoint usage
$curl -H "Authorization: Bearer $API_KEY" \
>   https://api.governanceai.com/v1/litellm/usage

# Returns:
{
  "total_requests": 45230,
  "total_tokens": 12453000,
  "avg_latency_ms": 245,
  "policy_violations": 123,
  "blocked_requests": 45,
  "transformed_responses": 78
}
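To turn these raw counters into something alertable, you can compute simple ratios client-side. The field names below mirror the example response; the helper names are illustrative, and fetching the payload is left to your HTTP client:

```python
# Example usage payload, copied from the response above.
usage = {
    "total_requests": 45230,
    "total_tokens": 12453000,
    "avg_latency_ms": 245,
    "policy_violations": 123,
    "blocked_requests": 45,
    "transformed_responses": 78,
}

def violation_rate(u: dict) -> float:
    """Fraction of requests that triggered a policy violation."""
    return u["policy_violations"] / u["total_requests"]

def blocked_rate(u: dict) -> float:
    """Fraction of requests that were blocked outright."""
    return u["blocked_requests"] / u["total_requests"]

print(f"violation rate: {violation_rate(usage):.4%}")
print(f"blocked rate:   {blocked_rate(usage):.4%}")
```

A sudden jump in either ratio is usually a better alert signal than the absolute counts, since it is independent of traffic volume.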

Per-Model Metrics

$curl -H "Authorization: Bearer $API_KEY" \
>   'https://api.governanceai.com/v1/litellm/usage/by-model'

# Returns metrics per model (gpt-4, gpt-3.5-turbo, etc.)

Error Handling

GovernanceAI returns standard OpenAI error codes:

try:
    response = litellm.completion(
        model="openai/gpt-4",
        messages=[...]
    )
except litellm.APIError as e:
    # Handle API errors
    print(f"Error: {e.http_status} - {e.message}")

# Common errors:
# 400 - Invalid request (malformed guardrail config)
# 401 - Authentication failed (invalid API key)
# 429 - Rate limit exceeded
# 500 - Server error (try again)
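For the transient errors (429 and 500), a retry loop with exponential backoff is a common pattern. This is a sketch under stated assumptions: the exception exposes `http_status` as in the example above, and both helper names are ours, not part of litellm:

```python
import random
import time

def backoff_schedule(attempts: int, base: float = 0.5, cap: float = 30.0):
    """Delays of 0.5s, 1s, 2s, ... doubling each attempt, capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]

def call_with_retry(make_request, attempts: int = 4):
    """Retry only transient errors (429, 500); re-raise everything else."""
    last_exc = None
    for delay in backoff_schedule(attempts):
        try:
            return make_request()
        except Exception as e:  # e.g. litellm.APIError
            status = getattr(e, "http_status", None)
            if status not in (429, 500):
                raise  # 400/401 will not succeed on retry
            last_exc = e
            time.sleep(delay + random.uniform(0, 0.1))  # jitter avoids herding
    raise last_exc
```

Re-raising non-retryable errors immediately matters: retrying a 401 just burns your rate limit without ever succeeding.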

Performance

Latency Impact

GovernanceAI adds minimal latency:

  • Average overhead: 45-100ms
  • P95: 150ms
  • P99: 250ms

Varies based on:

  • Policy complexity
  • Model response size
  • Network latency to provider

Caching

Enable response caching:

$curl -X POST https://api.governanceai.com/v1/litellm/caching \
>   -H "Authorization: Bearer $API_KEY" \
>   -d '{
>     "enabled": true,
>     "ttl_seconds": 3600,
>     "cache_identical_requests": true
>   }'
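Conceptually, `cache_identical_requests` with `ttl_seconds` behaves like a TTL cache keyed on the request payload: identical model + messages are served from cache until the TTL expires. This client-side sketch only illustrates those semantics; the real cache lives server-side, and the key scheme here is an assumption:

```python
import hashlib
import json
import time

class TTLCache:
    """Illustrative TTL cache keyed on a hash of the request payload."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}

    @staticmethod
    def key(model: str, messages: list) -> str:
        # Canonical JSON so identical requests hash identically.
        blob = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def get(self, k):
        hit = self._store.get(k)
        if hit is None:
            return None
        value, stored_at = hit
        if time.monotonic() - stored_at > self.ttl:
            del self._store[k]  # expired: evict and miss
            return None
        return value

    def put(self, k, value):
        self._store[k] = (value, time.monotonic())
```

One implication of keying on the exact payload: any change to the messages, even whitespace, produces a different key and therefore a cache miss.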

Best Practices

Do:

  • Use org_id and user_id in metadata
  • Set appropriate rate limits
  • Monitor usage regularly
  • Test policies before production
  • Use different keys per environment
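For per-environment keys, one common approach is to read them from environment variables at startup so no key is hard-coded. The variable names below are illustrative, not an official convention:

```python
import os

def load_governanceai_config(env: dict = None) -> dict:
    """Read endpoint and key from the environment; fail fast if the key is missing."""
    env = os.environ if env is None else env
    key = env.get("GOVERNANCEAI_API_KEY")
    if not key:
        raise RuntimeError("GOVERNANCEAI_API_KEY is not set")
    return {
        "api_base": env.get("GOVERNANCEAI_API_BASE",
                            "https://litellm.governanceai.com/v1"),
        "api_key": key,
    }

# At application startup:
# cfg = load_governanceai_config()
# litellm.api_base = cfg["api_base"]
# litellm.api_key = cfg["api_key"]
```

Set `GOVERNANCEAI_API_KEY` to a different key in dev, staging, and prod, and the same code runs unchanged in every environment.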

Don’t:

  • Share API keys between environments
  • Disable audit logging
  • Route sensitive data without PII guardrails
  • Forget to set up alerts

Troubleshooting

  • 401 Unauthorized - Check API key
  • Rate limit exceeded - Check configured limits
  • Slow responses - Check policy complexity
  • Connection refused - Verify endpoint URL

Next Steps