Deploy Codex on AWS, Azure, or GCP — or let us manage the infrastructure. Production-grade scaling and monitoring included.
Three deployment models — self-hosted on your cloud, Codex-managed hosting, or hybrid — pick the one that fits your infrastructure strategy.
Codex Enterprise offers flexible deployment options that accommodate different organizational requirements. Self-hosted deployments run the full Codex platform within your AWS, Azure, or GCP account using Terraform modules maintained by Codex. Managed hosting shifts infrastructure operations to Codex while your organization retains control over data, access policies, and integration points. Hybrid deployments split responsibilities — for example, running the inference engine on Codex-managed infrastructure while keeping the control plane and data storage within your VPC.
All deployment models share the same core platform: identical API surface, identical CLI behavior, and identical feature set. Switching between models requires no changes to client tools or CI/CD pipelines beyond updating the endpoint URL. This consistency means you can start with managed hosting for rapid evaluation and migrate to self-hosted when your infrastructure requirements mature, without changing how your developers interact with Codex.
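As a rough illustration of the endpoint being the only moving part, the sketch below loads a client configuration from an environment variable. The variable name `CODEX_BASE_URL`, the `CodexClientConfig` class, and the example hostnames are assumptions made for this example, not documented Codex settings.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CodexClientConfig:
    """Hypothetical client settings; only base_url differs per deployment model."""
    base_url: str
    timeout_seconds: float = 30.0


def load_config(env: Optional[Mapping[str, str]] = None) -> CodexClientConfig:
    # CODEX_BASE_URL is an assumed variable name, not a documented setting.
    env = os.environ if env is None else env
    base_url = env.get("CODEX_BASE_URL", "").rstrip("/")
    if not base_url.startswith("https://"):
        raise ValueError("CODEX_BASE_URL must be an https:// URL")
    return CodexClientConfig(base_url=base_url)


# Placeholder hostnames: the same code path serves both deployment models.
managed = load_config({"CODEX_BASE_URL": "https://codex.managed.example.com/"})
self_hosted = load_config({"CODEX_BASE_URL": "https://codex.internal.example.com"})
```

Keeping the URL in the environment rather than in code means a migration between models touches deployment settings only, which matches the claim in the paragraph above.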
Codex runs natively on all three major cloud platforms with provider-optimized deployment modules.
| Capability | AWS | Azure | GCP |
|---|---|---|---|
| Deployment Method | Terraform + EKS | Terraform + AKS | Terraform + GKE |
| Private Networking | PrivateLink | Private Link | VPC Service Controls |
| Key Management | AWS KMS | Azure Key Vault | Cloud KMS |
| Container Registry | ECR | ACR | Artifact Registry |
| Load Balancing | ALB/NLB | Application Gateway | Cloud Load Balancing |
| Monitoring Integration | CloudWatch | Azure Monitor | Cloud Monitoring |
| GPU Support | P4d/P5 Instances | NCv4 Series | L4/A100 VMs |
| Hybrid Available | Yes | Yes | Yes |
Full control over infrastructure, networking, and data — deploy Codex into your cloud account with production-hardened Terraform modules.
Self-hosted deployment gives your organization complete control over the infrastructure running Codex. The platform ships as a set of Terraform modules — one per cloud provider — that provision Kubernetes clusters, databases, object storage, load balancers, and monitoring infrastructure. The modules are production-hardened: they include VPC configuration with private subnets, security groups with least-privilege rules, IAM roles with scoped permissions, and encryption configuration using your own KMS keys.
Deployment typically completes within two hours for a single-region setup. The Terraform modules support multi-region deployments with cross-region replication for disaster recovery. Configuration is declarative: describe your desired topology in Terraform variables, and the modules provision infrastructure that matches. Upgrades use a rolling strategy: the new platform version is deployed alongside the existing one, traffic shifts after health checks pass, and the previous version is decommissioned after a configurable drain period. If issues surface after an upgrade, rolling back takes a single Terraform apply.
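The upgrade sequence described here can be sketched as a small state walk. This is an illustrative model only: the function name, the step strings, and the `health_check` callback are hypothetical and do not reflect how the Terraform modules are implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class UpgradeResult:
    active_version: str
    steps: List[str] = field(default_factory=list)


def rolling_upgrade(current: str, candidate: str,
                    health_check: Callable[[str], bool],
                    drain_seconds: int) -> UpgradeResult:
    """Model the order of operations; nothing is actually deployed."""
    result = UpgradeResult(active_version=current)
    result.steps.append(f"deploy {candidate} alongside {current}")
    if not health_check(candidate):
        # Traffic never moved, so the existing version keeps serving.
        result.steps.append(f"health check failed; remove {candidate}")
        return result
    result.steps.append(f"shift traffic to {candidate}")
    result.active_version = candidate
    result.steps.append(f"drain {current} for {drain_seconds}s, then decommission")
    return result


ok = rolling_upgrade("1.4", "1.5", health_check=lambda v: True, drain_seconds=300)
failed = rolling_upgrade("1.4", "1.5", health_check=lambda v: False, drain_seconds=300)
```

The point of the ordering is that the previous version stays in place until the new one has passed health checks and the drain period has elapsed, so a failed check never interrupts traffic.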
Codex operates the infrastructure — your team focuses on building software, not managing deployment infrastructure.
Managed hosting shifts infrastructure operations to the Codex team. Your organization gets a dedicated Codex deployment, not a shared multi-tenant one, running on infrastructure managed by Codex SREs. You control data residency region, encryption keys, access policies, and integration points. Codex handles OS patching, platform upgrades, database maintenance, backup management, and 24/7 monitoring. Managed hosting carries a 99.9% uptime SLA, backed by financial penalties for missed uptime targets.
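To make the 99.9% figure concrete, the following sketch converts an uptime percentage into a downtime budget. The measurement window is an assumption: the text does not say whether the SLA is evaluated monthly or annually, so both are shown.

```python
def downtime_budget_minutes(sla_percent: float, period_days: int) -> float:
    """Minutes of downtime permitted in the period without breaching the SLA."""
    if not 0 < sla_percent <= 100:
        raise ValueError("sla_percent must be in (0, 100]")
    period_minutes = period_days * 24 * 60
    return period_minutes * (100 - sla_percent) / 100


# Assumed windows; the actual SLA measurement period is not stated in the text.
monthly = downtime_budget_minutes(99.9, 30)   # about 43.2 minutes
yearly = downtime_budget_minutes(99.9, 365)   # about 525.6 minutes (8.76 hours)
```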
Managed hosting is the fastest path to production. Codex provisions your dedicated environment within two business days of contract signing. The environment comes pre-configured with monitoring dashboards, alerting rules, and backup schedules based on production patterns from hundreds of existing deployments. Your team connects through the same API endpoints and CLI commands as any other deployment model. Migration from managed hosting to self-hosted is supported — Codex provides database exports and configuration manifests that replicate your environment in your own cloud account.
Split Codex components across your infrastructure and Codex-managed infrastructure — control what matters, offload what does not.
Hybrid deployment addresses organizations with nuanced infrastructure requirements. A common pattern: the Codex control plane — API gateway, authentication, project management, and dashboards — runs in your VPC behind your firewall, while inference workloads run on Codex-managed GPU clusters optimized for low-latency code generation. This configuration keeps sensitive data (user identities, project metadata, audit logs) within your network while leveraging Codex infrastructure for computationally intensive AI processing where data passes through ephemerally and is never stored.
The reverse pattern is also supported: inference workloads run on your GPU infrastructure (useful if you have reserved GPU capacity or specialized hardware), while Codex manages the control plane. Hybrid deployments use mTLS for component-to-component communication, with certificates issued and rotated either by your own CA or by Codex. Configuration is managed through a unified deployment manifest that declares which components run where; the platform handles service discovery and secure communication transparently.
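One way to picture a manifest that declares component placement is as a validated mapping from component to location. The sketch below is hypothetical throughout: the component names, the placement values, and the link list are invented for illustration and are not the actual manifest schema.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Placement(Enum):
    CUSTOMER_VPC = "customer_vpc"
    CODEX_MANAGED = "codex_managed"


# Hypothetical component names, loosely based on the components named in the prose.
COMPONENTS = ("api_gateway", "authentication", "dashboards", "inference")

# Hypothetical call paths between components.
LINKS: List[Tuple[str, str]] = [
    ("api_gateway", "authentication"),
    ("api_gateway", "inference"),
    ("dashboards", "api_gateway"),
]


def parse_manifest(raw: Dict[str, str]) -> Dict[str, Placement]:
    """Validate that every component has exactly one known placement."""
    missing = sorted(set(COMPONENTS) - set(raw))
    unknown = sorted(set(raw) - set(COMPONENTS))
    if missing or unknown:
        raise ValueError(f"missing={missing} unknown={unknown}")
    # Placement(...) raises ValueError for an unrecognized location string.
    return {name: Placement(raw[name]) for name in COMPONENTS}


def cross_boundary_links(placement: Dict[str, Placement]) -> List[Tuple[str, str]]:
    """Links whose two ends run on different infrastructure."""
    return [(a, b) for a, b in LINKS if placement[a] is not placement[b]]


# The common pattern from the text: control plane in your VPC, inference managed.
manifest = parse_manifest({
    "api_gateway": "customer_vpc",
    "authentication": "customer_vpc",
    "dashboards": "customer_vpc",
    "inference": "codex_managed",
})
```

Listing the links that cross the boundary is a useful review step, since those are the connections where certificate trust between the two sides has to be configured.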
Autoscaling based on real workload metrics (inference queue depth, API latency, and concurrent session count) keeps Codex responsive as load changes, within the scaling bounds you configure.
Codex cloud deployments use Kubernetes Horizontal Pod Autoscaling driven by platform-specific metrics. The primary scaling signal is inference queue depth: as more code generation and review requests arrive, the platform scales inference workers to maintain response time targets. Secondary signals include API request latency (scaling the gateway tier) and concurrent session count (scaling the session management tier). Scaling policies are configurable — set minimum and maximum pod counts per component, cooldown periods, and scale-up aggressiveness.
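For readers unfamiliar with how Horizontal Pod Autoscaling picks a replica count, here is a minimal sketch of the calculation documented for the Kubernetes HPA: the current replica count is scaled by the ratio of observed to target metric, rounded up, and clamped to the configured bounds. Treating the metric as average pending requests per inference worker is an assumption for this example, and the numbers are illustrative.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Replica count following the documented Kubernetes HPA formula."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Within tolerance of the target: leave the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured minimum and maximum pod counts.
    return max(min_replicas, min(max_replicas, desired))


# 4 workers averaging 30 pending requests each against a target of 10.
scaled_up = desired_replicas(4, current_metric=30, target_metric=10,
                             min_replicas=2, max_replicas=20)
```

The minimum and maximum arguments correspond to the per-component pod bounds mentioned above; cooldown behavior is a separate mechanism and is not modeled here.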
GPU node autoscaling is managed through the cloud provider's Kubernetes node autoscaler. Codex configures node groups with appropriate GPU instance types and sets scaling bounds that control infrastructure cost. For predictable workloads — daily stand-up spikes when entire teams begin coding, or CI pipeline bursts before deployment deadlines — scheduled scaling rules can pre-warm capacity before demand arrives. The scaling configuration reference in the documentation covers every parameter with guidance for common team sizes and usage patterns.
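Scheduled pre-warming can be thought of as temporarily raising the minimum replica count during known busy windows. The sketch below shows that idea under stated assumptions: the `PrewarmWindow` type, the window times, and the replica numbers are hypothetical and are not taken from the scaling configuration reference.

```python
from dataclasses import dataclass
from datetime import time
from typing import Iterable


@dataclass(frozen=True)
class PrewarmWindow:
    start: time
    end: time
    min_replicas: int

    def covers(self, now: time) -> bool:
        # Simplification: windows that cross midnight are not handled.
        return self.start <= now < self.end


def effective_min_replicas(base_min: int, windows: Iterable[PrewarmWindow],
                           now: time) -> int:
    """Highest minimum among the base setting and any active windows."""
    active = [w.min_replicas for w in windows if w.covers(now)]
    return max([base_min, *active])


# Hypothetical schedule: raise the floor ahead of a morning stand-up spike.
windows = [PrewarmWindow(start=time(8, 30), end=time(10, 0), min_replicas=12)]
during = effective_min_replicas(2, windows, now=time(9, 0))
after = effective_min_replicas(2, windows, now=time(11, 0))
```

Raising only the floor leaves the metric-driven autoscaler free to scale further if demand exceeds the pre-warmed capacity.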
Every Codex deployment ships with Prometheus, Grafana, and structured logging — Enterprise adds Datadog and Splunk integration.
Codex deployments include a comprehensive observability stack out of the box. Prometheus scrapes metrics from every platform component — API latency histograms, inference queue depths, error rates by endpoint, database query performance, and resource utilization. Grafana dashboards provide pre-built views for operations teams: platform health overview, inference performance, capacity planning, and SLO tracking against the SLA targets. Alerting rules ship with sensible defaults and are fully customizable — integrate with PagerDuty, Opsgenie, or your existing incident management tool.
Enterprise customers can integrate with Datadog and Splunk for centralized observability. The Datadog integration exports metrics, traces, and logs to your existing Datadog account with pre-built dashboards and monitors. Splunk integration streams structured audit logs and platform events into your Splunk instance for correlation with other infrastructure and security events. Both integrations use standard agents and require no custom code — configuration is a few lines in the deployment manifest.
AWS, Azure, and GCP — each with native Terraform modules that use provider-specific services for networking, encryption, and monitoring.
Codex provides production-hardened deployment modules for all three major cloud providers. The AWS module uses EKS, RDS, ElastiCache, S3, and PrivateLink. Azure uses AKS, Azure Database for PostgreSQL, Azure Cache for Redis, Blob Storage, and Private Link. GCP uses GKE, Cloud SQL, Memorystore, Cloud Storage, and VPC Service Controls. Each module is maintained in its own repository with provider-specific documentation and examples. The modules produce functionally identical deployments — platform behavior, API surface, and CLI interaction are indistinguishable across providers.
Horizontal Pod Autoscaling driven by inference queue depth and API response time — plus GPU node autoscaling for the inference tier with configurable scaling bounds.
The Codex autoscaling system responds to real workload signals rather than CPU or memory utilization alone. Inference queue depth is the primary metric — when pending code generation requests exceed a threshold, additional inference workers start within 30 seconds. GPU nodes scale through the cloud provider's Kubernetes node autoscaler with instance type preferences and scaling limits you configure. All scaling parameters — thresholds, cooldown periods, minimum and maximum counts — are exposed in the deployment configuration. For teams with predictable usage patterns, scheduled scaling can pre-warm capacity before daily stand-ups or CI pipeline bursts.
Yes — split control plane and inference workloads across your VPC and Codex-managed infrastructure with mTLS-secured inter-component communication.
Hybrid deployment gives you fine-grained control over where each Codex component runs. A common pattern keeps the control plane in your VPC (where user identities, project metadata, and audit logs reside) while running inference on Codex-managed GPU clusters (where code passes through ephemerally for processing). The reverse is also supported: run inference on your GPU hardware while Codex manages the control plane. Component-to-component communication uses mTLS with configurable certificate authorities. The deployment manifest declares component placement, and the platform handles service discovery, secure communication, and health monitoring across infrastructure boundaries.
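What distinguishes mTLS from ordinary TLS is that the server also demands and verifies a client certificate. The sketch below shows that setting with Python's standard `ssl` module; it is a generic illustration, not Codex's implementation, and the function name and file-path parameters are placeholders.

```python
import ssl
from typing import Optional


def build_mtls_server_context(ca_file: Optional[str] = None,
                              cert_file: Optional[str] = None,
                              key_file: Optional[str] = None) -> ssl.SSLContext:
    """Server-side TLS context that rejects peers without a valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Requiring a client certificate is what makes the handshake mutual.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # The CA that signs peer certificates (your own CA or a managed one).
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # This component's own certificate and private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


# Without file paths the context is still built, which is handy for inspection.
ctx = build_mtls_server_context()
```

In a real deployment all three paths would be supplied, and certificate rotation would replace the files and rebuild the context.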
Prometheus metrics, Grafana dashboards, structured logging, and configurable alerting — Enterprise adds Datadog and Splunk for centralized observability.
Every Codex deployment includes a complete observability stack. Prometheus collects metrics from all platform components at 15-second intervals. Grafana provides pre-built dashboards for operations, inference performance, and SLO tracking. Alerting rules ship with defaults for critical conditions (high error rates, degraded inference latency, database replication lag) and are integrated with Alertmanager for routing to your incident management tools. Enterprise customers can additionally stream metrics, traces, and logs to Datadog for integration with existing monitoring infrastructure, and export structured audit events to Splunk for security information and event management (SIEM) correlation.
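As an illustration of what a high-error-rate condition evaluates, the sketch below computes an error ratio from cumulative counters sampled every 15 seconds. The sample layout, the function names, and the 5% threshold are assumptions for this example; the shipped alerting rules and their default thresholds are not specified in the text.

```python
from typing import Sequence, Tuple

# (timestamp_seconds, total_requests, total_errors), all cumulative counters.
Sample = Tuple[float, float, float]


def error_ratio(samples: Sequence[Sample]) -> float:
    """Errors divided by requests between the first and last sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a ratio")
    _, req_start, err_start = samples[0]
    _, req_end, err_end = samples[-1]
    delta_requests = req_end - req_start
    if delta_requests <= 0:
        # No traffic in the window (counter resets are not handled here).
        return 0.0
    return (err_end - err_start) / delta_requests


def should_alert(samples: Sequence[Sample], threshold: float = 0.05) -> bool:
    # The 5% default is an assumed value, not a documented one.
    return error_ratio(samples) > threshold


window = [(0, 1000, 10), (15, 1100, 12), (30, 1200, 30)]
ratio = error_ratio(window)  # 20 errors over 200 requests
```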
Whether you are looking to download Codex for the first time, explore the Codex CLI for terminal-native development, or understand how Codex AI transforms your engineering practice, the platform provides integrated tools for every stage of software delivery. The AI code generation engine produces idiomatic code across 40+ languages, while intelligent code review catches bugs before they reach production. Teams can automate testing with the integrated testing suite, debug efficiently with automated debugging, and enforce quality standards with deep code analysis.
Developers integrating Codex into their toolchain start with CLI installation and IDE plugin setup for their preferred editor. The comprehensive API enables custom automation, CI/CD pipeline integration connects Codex to your deployment workflow, and Docker containerization simplifies environment configuration. For deeper integration, see the full documentation covering every feature in detail.