Configuration¶
Agent Gateway is configured via a gateway.yaml file in the workspace directory. All settings have sensible defaults so you can start with an empty or minimal file and add sections as needed.
Configuration precedence (highest to lowest):
- Environment variables (AGENT_GATEWAY_*)
- gateway.yaml
- Built-in defaults
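The precedence order above can be pictured as three layers merged key by key. The following is a minimal sketch, not the gateway's actual loader: `deep_merge` and `resolve_config` are hypothetical helper names, and the sample values are made up for illustration.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which values from `override` win over `base`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them instead of replacing wholesale.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(
    defaults: dict[str, Any],
    yaml_values: dict[str, Any],
    env_values: dict[str, Any],
) -> dict[str, Any]:
    """Apply layers from lowest to highest precedence (hypothetical helper)."""
    return deep_merge(deep_merge(defaults, yaml_values), env_values)


defaults = {"server": {"host": "0.0.0.0", "port": 8000, "workers": 1}}
yaml_values = {"server": {"port": 8080, "workers": 4}}  # from gateway.yaml
env_values = {"server": {"port": 9000}}  # from AGENT_GATEWAY_SERVER__PORT

effective = resolve_config(defaults, yaml_values, env_values)
print(effective)
```

In this sketch the port comes from the environment, the worker count from gateway.yaml, and the host from the built-in defaults.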
gateway.yaml reference¶
server¶
Controls the HTTP server:
server:
  host: "0.0.0.0" # Bind address (default: 0.0.0.0)
  port: 8000 # Port (default: 8000)
  workers: 1 # Number of worker processes (default: 1)
model¶
Default LLM settings used by all agents unless overridden in AGENT.md:
model:
  default: "gpt-4o-mini" # LiteLLM model identifier
  temperature: 0.1 # Sampling temperature (default: 0.1)
  max_tokens: 50000 # Maximum output tokens (default: 50000)
  fallback: null # Fallback model if primary fails (default: none)
Model names follow LiteLLM format. Examples: gpt-4o, anthropic/claude-3-5-sonnet-20241022, gemini/gemini-2.0-flash.
guardrails¶
Hard limits applied to every agent execution:
guardrails:
  max_tool_calls: 20 # Maximum tool calls per execution (default: 20)
  max_iterations: 10 # Maximum LLM reasoning iterations (default: 10)
  timeout_ms: 60000 # Execution timeout in milliseconds (default: 60000)
  max_delegation_depth: 3 # Maximum agent-to-agent delegation depth (default: 3)
An execution that hits any of these limits is stopped with the appropriate stop_reason.
auth¶
Authentication configuration:
auth:
  enabled: true
  mode: api_key # api_key | oauth2 | composite | custom | none
  api_keys:
    - name: production
      key: "${API_KEY}"
      scopes: ["*"]
  oauth2:
    issuer: "https://auth.example.com"
    audience: "my-api"
    jwks_uri: null # Auto-derived from issuer if null
    algorithms: [RS256, ES256]
    scope_claim: "scope" # Use "scp" for Azure AD
    clock_skew_seconds: 30
  public_paths:
    - /v1/health
See Authentication for the full authentication guide.
persistence¶
Database storage for conversations, execution records, audit logs, schedules, and memories. When enabled, chat sessions also survive server restarts via session rehydration.
persistence:
  enabled: true
  backend: sqlite # sqlite | postgres
  url: "sqlite+aiosqlite:///agent_gateway.db" # Database URL
  table_prefix: "" # Optional table name prefix
  db_schema: null # PostgreSQL schema (default: public)
For PostgreSQL:
persistence:
  backend: postgres
  url: "postgresql+asyncpg://user:password@host:5432/dbname"
  db_schema: "agent_gw"
telemetry¶
OpenTelemetry tracing:
telemetry:
  enabled: true
  service_name: "agent-gateway"
  exporter: console # console | otlp | none
  endpoint: "http://localhost:4317"
  protocol: grpc # grpc | http
  sample_rate: 1.0 # 0.0 to 1.0
queue¶
Background execution queue for async agents:
queue:
  backend: none # none | memory | redis | rabbitmq
  redis_url: "redis://localhost:6379/0"
  rabbitmq_url: "amqp://guest:guest@localhost:5672/"
  stream_key: "ag:executions" # Redis stream key
  queue_name: "ag.executions" # RabbitMQ queue name
  consumer_group: "ag-workers" # Redis consumer group
  workers: 4 # Number of concurrent workers
  max_retries: 3 # Retry attempts for failed executions
  visibility_timeout_s: 300 # Seconds before a claimed job is requeued
  drain_timeout_s: 30 # Seconds to wait for workers on shutdown
  default_execution_mode: sync # sync | async (used when agent doesn't specify)
When backend: none, async executions run in-process using asyncio. For production, use redis or rabbitmq.
scheduler¶
Controls cron-based agent scheduling (requires agents with schedules: defined):
scheduler:
  enabled: true
  misfire_grace_seconds: 60 # How late a job can start before being skipped (default: 60)
  max_instances: 1 # Max concurrent instances of the same job (default: 1)
  coalesce: true # Merge missed firings into one (default: true)
  distributed_lock:
    enabled: false # Enable to prevent duplicate firings across multiple instances
    backend: auto # auto | redis | postgres | none
    redis_url: null # Redis URL (defaults to queue.redis_url when omitted)
    key_prefix: "ag:sched-lock:"
    lock_ttl_seconds: 300
When running multiple gateway instances or worker processes, set distributed_lock.enabled: true so only one instance fires each scheduled job. The backend: auto setting detects the right backend automatically — Redis when a Redis queue is configured, PostgreSQL when a PostgreSQL persistence backend is in use.
See the Scheduling guide for a full walkthrough of distributed locking options.
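The backend: auto behaviour described above can be sketched as a small decision function. This is an illustration only: `resolve_lock_backend` is a hypothetical name, and both the order of the checks (Redis before PostgreSQL) and the fallback to "none" are assumptions rather than documented behaviour.

```python
def resolve_lock_backend(
    configured: str,
    queue_backend: str,
    persistence_backend: str,
) -> str:
    """Pick a distributed-lock backend (hypothetical sketch of backend: auto)."""
    if configured != "auto":
        # An explicit redis | postgres | none setting is used as given.
        return configured
    if queue_backend == "redis":
        return "redis"
    if persistence_backend == "postgres":
        return "postgres"
    # Assumption: with neither Redis nor PostgreSQL available, no lock is used.
    return "none"


print(resolve_lock_backend("auto", queue_backend="redis", persistence_backend="sqlite"))
print(resolve_lock_backend("auto", queue_backend="none", persistence_backend="postgres"))
```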
mcp¶
Settings for MCP server connections:
mcp:
  tool_call_timeout_ms: 30000 # Per-tool-call timeout in ms (default: 30000)
  connection_timeout_ms: 10000 # Connection startup timeout in ms (default: 10000)
These timeouts apply to all MCP servers registered via gw.add_mcp_server() or the Admin API.
context_retrieval¶
Controls how context is fetched from retrievers and static files:
context_retrieval:
  retriever_timeout_seconds: 10.0 # Per-retriever timeout (default: 10.0)
  max_retrieved_chars: 50000 # Max total chars from all retrievers (default: 50000)
  max_context_file_chars: 100000 # Max chars from static context files (default: 100000)
memory¶
Global memory defaults (overridable per-agent in AGENT.md):
memory:
  enabled: false # Enable memory globally (default: false)
  max_injected_chars: 4000 # Max characters of memory injected per turn (default: 4000)
  extraction_model: null # Model used for memory extraction (defaults to global model)
  auto_extract: false # Auto-extract memories after each turn (default: false)
  max_memory_md_lines: 200 # Max lines in MEMORY.md file (default: 200)
  compaction:
    enabled: true # Enable automatic memory compaction (default: true)
    max_memories_per_scope: 100 # Trigger compaction when scope exceeds this (default: 100)
    compact_ratio: 0.5 # Fraction of memories to compact (default: 0.5)
    min_age_hours: 24 # Don't compact memories younger than this (default: 24)
    importance_threshold: 0.8 # Never compact memories with importance >= this (default: 0.8)
    decay_factor: 0.95 # Relevance decay per day since last access (default: 0.95)
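To get a feel for how the compaction settings interact, here is a rough sketch. It assumes that decay_factor is applied exponentially per day and that min_age_hours and importance_threshold act as simple eligibility filters; the function names are hypothetical and the gateway's real scoring may differ.

```python
def decayed_relevance(
    base_relevance: float,
    days_since_access: float,
    decay_factor: float = 0.95,
) -> float:
    """Assumed model: relevance shrinks by `decay_factor` for each idle day."""
    return base_relevance * decay_factor ** days_since_access


def is_compaction_candidate(
    age_hours: float,
    importance: float,
    min_age_hours: float = 24,
    importance_threshold: float = 0.8,
) -> bool:
    """Old enough and not important enough to be protected (assumed rule)."""
    return age_hours >= min_age_hours and importance < importance_threshold


# Two weeks without access leaves a bit under half of the original relevance.
print(round(decayed_relevance(1.0, days_since_access=14), 3))
print(is_compaction_candidate(age_hours=48, importance=0.5))
print(is_compaction_candidate(age_hours=48, importance=0.9))
```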
cors¶
Cross-Origin Resource Sharing headers:
cors:
  enabled: false
  allow_origins:
    - "https://app.example.com"
  allow_methods: [GET, POST, DELETE, OPTIONS]
  allow_headers: [Authorization, Content-Type]
  allow_credentials: false
  max_age: 3600
allow_credentials: true cannot be combined with allow_origins: ["*"] — specify explicit origins instead.
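The restriction above mirrors the browser rule that a credentialed response may not use a wildcard origin. A pre-deployment check might look like the following sketch; `validate_cors` is a hypothetical helper, not part of the gateway's API.

```python
def validate_cors(allow_origins: list[str], allow_credentials: bool) -> None:
    """Reject the wildcard-plus-credentials combination (hypothetical check)."""
    if allow_credentials and "*" in allow_origins:
        raise ValueError(
            "allow_credentials: true requires explicit allow_origins, not '*'"
        )


# Explicit origins with credentials pass the check.
validate_cors(["https://app.example.com"], allow_credentials=True)

# A wildcard is acceptable only when credentials are disabled.
validate_cors(["*"], allow_credentials=False)
```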
rate_limit¶
Rate limiting for API endpoints (requires slowapi):
rate_limit:
  enabled: false
  default_limit: "100/minute" # Default rate limit for all endpoints
  storage_uri: "redis://localhost:6379" # Shared storage for multi-worker deployments
  trust_forwarded_for: false # Use X-Forwarded-For header for client IP
Install the optional dependency: pip install "agents-gateway[rate-limiting]" (the quotes keep shells such as zsh from interpreting the square brackets)
When running with multiple workers, set storage_uri to a Redis URL so rate limits are enforced across all processes. Without it, each worker maintains its own counter.
See the Rate Limiting guide for details.
security¶
Security headers are injected into every HTTP response by default:
security:
  enabled: true # Enabled by default (set false to disable)
  x_content_type_options: "nosniff"
  x_frame_options: "DENY"
  strict_transport_security: "max-age=31536000; includeSubDomains"
  content_security_policy: "default-src 'self'"
  referrer_policy: "strict-origin-when-cross-origin"
Unlike CORS and rate limiting, security headers are enabled by default (opt-out). Dashboard paths automatically receive a relaxed Content-Security-Policy that allows inline styles and scripts.
See the Security Headers guide for details.
dashboard¶
The built-in monitoring dashboard (opt-in):
dashboard:
  enabled: false
  title: "Agent Gateway"
  logo_url: null
  favicon_url: null
  auth:
    enabled: true
    username: admin
    password: "${DASHBOARD_PASSWORD}"
    login_button_text: "Sign in with SSO"
    session_secret: "" # Auto-generated if empty
    oauth2: # Optional OAuth2/OIDC SSO (replaces password auth)
      issuer: "https://auth.example.com"
      client_id: "dashboard-client"
      client_secret: "${DASHBOARD_CLIENT_SECRET}"
      scopes: [openid, profile, email]
  theme:
    mode: auto # light | dark | auto
    colors:
      primary: "#6366f1"
      primary_dark: "#818cf8"
      secondary: "#64748b"
      secondary_dark: "#94a3b8"
      surface: "#ffffff"
      surface_dark: "#141b2d"
      sidebar: "#0f172a"
      sidebar_dark: "#0b0f1a"
      danger: "#ef4444"
      danger_dark: "#f87171"
notifications¶
Global notification backends:
notifications:
  slack:
    enabled: false
    bot_token: "${SLACK_BOT_TOKEN}"
    default_channel: "#agent-alerts"
  webhooks:
    - name: monitoring
      url: "https://hooks.example.com/agent-events"
      secret: "${WEBHOOK_SECRET}"
      events: [] # Empty = all events
      payload_template: null # Custom Jinja2 template for payload
Multiple webhook endpoints can be defined. Each has a unique name referenced in agent notification config.
context¶
Arbitrary key-value data available to agent prompts and tool handlers:
timezone¶
Global default timezone for schedules and timestamps:
timezone: "UTC"
Valid values: UTC, Europe/London, America/New_York, Asia/Tokyo, etc.
Environment variable overrides¶
Any configuration value can be overridden with an environment variable. The prefix is AGENT_GATEWAY_ and nested keys are separated by __ (double underscore):
AGENT_GATEWAY_SERVER__PORT=9000
AGENT_GATEWAY_MODEL__DEFAULT=gpt-4o
AGENT_GATEWAY_AUTH__ENABLED=false
AGENT_GATEWAY_PERSISTENCE__BACKEND=postgres
AGENT_GATEWAY_PERSISTENCE__URL=postgresql+asyncpg://...
Environment variables always take precedence over gateway.yaml.
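The naming rule above (prefix AGENT_GATEWAY_, nested keys joined by a double underscore) can be illustrated with a short parser. This is a sketch under assumptions: `env_to_overrides` is a hypothetical helper, and it leaves every value as a string, whereas the real loader presumably coerces types such as integers and booleans.

```python
from typing import Any, Mapping

PREFIX = "AGENT_GATEWAY_"


def env_to_overrides(environ: Mapping[str, str]) -> dict[str, Any]:
    """Turn AGENT_GATEWAY_A__B=value entries into {"a": {"b": "value"}}."""
    overrides: dict[str, Any] = {}
    for name, raw in environ.items():
        if not name.startswith(PREFIX):
            continue  # Unrelated variables such as PATH are ignored.
        # A double underscore separates nesting levels; single ones stay in the key.
        path = name[len(PREFIX):].lower().split("__")
        node = overrides
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = raw
    return overrides


sample_env = {
    "AGENT_GATEWAY_SERVER__PORT": "9000",
    "AGENT_GATEWAY_GUARDRAILS__MAX_TOOL_CALLS": "30",
    "PATH": "/usr/bin",
}
print(env_to_overrides(sample_env))
```

In real use the mapping passed in would be os.environ; a plain dict keeps the example deterministic.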
Variable interpolation¶
Use ${VAR_NAME} syntax in any YAML string value to reference environment variables. The gateway substitutes the value at startup and raises an error if the variable is not set:
auth:
  api_keys:
    - name: production
      key: "${PRODUCTION_API_KEY}"
persistence:
  url: "postgresql+asyncpg://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:5432/agent_gw"
notifications:
  slack:
    bot_token: "${SLACK_BOT_TOKEN}"
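The substitution step can be approximated with a regular expression. The sketch below is not the gateway's implementation: `interpolate` is a hypothetical function, and the exception type it raises for an unset variable is an assumption.

```python
import re
from typing import Mapping

# Matches ${VAR_NAME} where the name is a conventional environment identifier.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(value: str, environ: Mapping[str, str]) -> str:
    """Replace each ${VAR_NAME} in `value`; fail if a variable is not set."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in environ:
            raise KeyError(f"environment variable {name!r} is not set")
        return environ[name]

    return _VAR_PATTERN.sub(replace, value)


sample_env = {"DB_USER": "gw", "DB_PASSWORD": "s3cret", "DB_HOST": "db.internal"}
url = interpolate(
    "postgresql+asyncpg://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:5432/agent_gw",
    sample_env,
)
print(url)
```

Failing at startup on a missing variable, rather than substituting an empty string, surfaces misconfiguration before the first request is served.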
Example configurations¶
Minimal¶
Just agents, no extras — the smallest possible config:
# gateway.yaml — minimal
auth:
  mode: api_key
  api_keys:
    - name: default
      key: "${API_KEY}"
      scopes: ["*"]
All other settings use built-in defaults (SQLite, console telemetry, no dashboard).
Development¶
# gateway.yaml — development
server:
  port: 8000
model:
  default: "gpt-4o-mini"
  temperature: 0.1
auth:
  mode: api_key
  api_keys:
    - name: dev
      key: "dev-api-key-change-me"
      scopes: ["*"]
persistence:
  backend: sqlite
  url: "sqlite+aiosqlite:///agent_gateway.db"
telemetry:
  enabled: true
  exporter: console
cors:
  enabled: true
dashboard:
  enabled: true
  auth:
    username: admin
    password: "admin"
timezone: "UTC"
Production¶
# gateway.yaml — production
server:
  host: "0.0.0.0"
  port: 8000
  workers: 4
model:
  default: "gpt-4o"
  temperature: 0.1
  max_tokens: 8192
  fallback: "gpt-4o-mini"
guardrails:
  max_tool_calls: 30
  timeout_ms: 120000
auth:
  mode: oauth2
  oauth2:
    issuer: "${OAUTH2_ISSUER}"
    audience: "${OAUTH2_AUDIENCE}"
  public_paths:
    - /v1/health
persistence:
  backend: postgres
  url: "${DATABASE_URL}"
  db_schema: "agent_gw"
queue:
  backend: redis
  redis_url: "${REDIS_URL}"
  workers: 8
  max_retries: 3
telemetry:
  enabled: true
  service_name: "agent-gateway-prod"
  exporter: otlp
  endpoint: "${OTEL_EXPORTER_OTLP_ENDPOINT}"
  protocol: grpc
  sample_rate: 0.1
memory:
  enabled: true
  auto_extract: true
  compaction:
    enabled: true
cors:
  enabled: true
  allow_origins:
    - "https://app.example.com"
  allow_credentials: false
notifications:
  slack:
    enabled: true
    bot_token: "${SLACK_BOT_TOKEN}"
    default_channel: "#agent-alerts"
  webhooks:
    - name: pagerduty
      url: "${PAGERDUTY_WEBHOOK_URL}"
      secret: "${PAGERDUTY_WEBHOOK_SECRET}"
dashboard:
  enabled: true
  title: "Agent Gateway — Production"
  auth:
    enabled: true
    oauth2:
      issuer: "${OAUTH2_ISSUER}"
      client_id: "${DASHBOARD_CLIENT_ID}"
      client_secret: "${DASHBOARD_CLIENT_SECRET}"
  theme:
    mode: dark
    colors:
      primary: "#2563eb"
      sidebar: "#0f172a"
timezone: "UTC"