As organizations scale their AI applications, many are looking for ways to maintain complete control over their infrastructure while leveraging the power of unified LLM routing. Self-hosting Requesty on Kubernetes provides the perfect balance of flexibility, security, and performance for teams with specific compliance requirements or existing Kubernetes infrastructure.
In this comprehensive guide, we'll walk through everything you need to know about deploying Requesty on Kubernetes using Helm, from initial setup to production-ready configurations. Whether you're managing air-gapped environments or simply want full control over your LLM gateway, this guide will help you get Requesty running smoothly in your Kubernetes cluster.
Why Self-Host Requesty on Kubernetes?
Before diving into the technical details, let's explore why self-hosting Requesty might be the right choice for your organization.
Complete Data Control: When you self-host Requesty, all your LLM traffic, caching data, and usage analytics remain within your infrastructure. This is crucial for organizations handling sensitive data or operating in regulated industries.
Compliance and Data Residency: Many enterprises have strict requirements about where data can be stored and processed. Self-hosting ensures you meet these requirements while still benefiting from Requesty's smart routing and optimization features.
Air-Gapped Environments: For organizations operating in secure, isolated networks, self-hosting is often the only option. Requesty's Helm chart supports air-gapped installations, allowing you to deploy without internet access.
Cost Optimization at Scale: While Requesty's cloud offering already provides up to 80% cost savings through intelligent caching and routing, self-hosting gives you additional control over resource allocation and scaling strategies.
Prerequisites and System Requirements
Before beginning your Requesty deployment, ensure you have the following prerequisites in place:
Kubernetes Cluster Requirements
- Kubernetes Version: 1.19 or higher (1.24+ recommended for latest features)
- Cluster Resources: Minimum 3 nodes with 4 CPU cores and 16GB RAM each
- Storage: SSD-class persistent storage with at least 100GB available
- Networking: Cluster networking configured with DNS and an ingress controller
Required Tools
- Helm: Version 3.8 or higher installed on your local machine
- kubectl: Configured with access to your target cluster
- Storage Class: Dynamic provisioning enabled for persistent volumes
- Ingress Controller: NGINX or similar for external access
Recommended Resources for Production
For production deployments handling significant traffic through Requesty's 160+ supported models, we recommend:
- API Gateway Pods: 3 replicas with 2 CPU / 4GB RAM each
- Cache Layer: Redis with 8GB+ memory for optimal performance
- Database: PostgreSQL with 4 CPU / 16GB RAM for analytics and metadata
- Object Storage: S3-compatible storage for long-term cache persistence
Step-by-Step Deployment Guide
Now let's walk through the complete deployment process for Requesty on Kubernetes.
Step 1: Add the Requesty Helm Repository
First, add the official Requesty Helm repository to your local Helm installation:
```bash
helm repo add requesty https://helm.requesty.ai
helm repo update
```
For air-gapped environments, download the chart package separately and transfer it to your isolated network.
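For example, on a machine with internet access you can pull the chart as a local archive, then install from that file inside the isolated network (the chart version shown is illustrative; use the version you actually downloaded):

```shell
# On a connected machine: download the chart archive
helm pull requesty/requesty --version 1.0.0

# After transferring requesty-1.0.0.tgz into the air-gapped network:
helm install requesty ./requesty-1.0.0.tgz \
  --namespace requesty \
  --values values.yaml
```

You will also need to mirror the container images referenced by the chart into a registry reachable from the isolated cluster.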
Step 2: Create a Namespace
Create a dedicated namespace for your Requesty deployment:
```bash
kubectl create namespace requesty
```
This isolation helps with resource management and security policies.
Step 3: Configure Your values.yaml
Create a custom `values.yaml` file to configure your Requesty deployment. Here's a production-ready example:
```yaml
# Requesty API configuration
requesty:
  replicaCount: 3
  resources:
    limits:
      cpu: "2"
      memory: "4Gi"
    requests:
      cpu: "1"
      memory: "2Gi"

  # Core configuration
  config:
    # Enable all routing features
    smartRouting:
      enabled: true
    fallbackPolicies:
      enabled: true
    caching:
      enabled: true
      ttl: 3600

    # Security settings
    guardrails:
      enabled: true
      promptInjection: true
      piiRedaction: true

# PostgreSQL configuration
postgresql:
  deploy: true  # Set to false for an external database
  auth:
    database: requesty
    username: requesty
    password: "changeme"  # Use secrets in production
  resources:
    limits:
      cpu: "4"
      memory: "16Gi"
    requests:
      cpu: "2"
      memory: "8Gi"
  persistence:
    enabled: true
    size: 100Gi
    storageClass: fast-ssd

# Redis configuration
redis:
  deploy: true  # Set to false for an external Redis
  auth:
    enabled: true
    password: "changeme"  # Use secrets in production
  master:
    resources:
      limits:
        cpu: "2"
        memory: "8Gi"
      requests:
        cpu: "1"
        memory: "4Gi"
    persistence:
      enabled: true
      size: 50Gi

# Ingress configuration
ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: requesty.yourdomain.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: requesty-tls
      hosts:
        - requesty.yourdomain.com

# Object storage (S3-compatible)
objectStorage:
  provider: s3
  s3:
    bucket: requesty-cache
    region: us-east-1
    endpoint: https://s3.amazonaws.com
    accessKeyId:
      valueFrom:
        secretKeyRef:
          name: s3-credentials
          key: access-key-id
    secretAccessKey:
      valueFrom:
        secretKeyRef:
          name: s3-credentials
          key: secret-access-key
```
Step 4: Create Required Secrets
Before deploying, create the necessary Kubernetes secrets:
```bash
# S3 credentials
kubectl create secret generic s3-credentials \
  --from-literal=access-key-id=YOUR_ACCESS_KEY \
  --from-literal=secret-access-key=YOUR_SECRET_KEY \
  -n requesty

# Database passwords (if using external services)
kubectl create secret generic db-credentials \
  --from-literal=postgres-password=YOUR_DB_PASSWORD \
  --from-literal=redis-password=YOUR_REDIS_PASSWORD \
  -n requesty
```
Step 5: Deploy Requesty
Now deploy Requesty using your custom configuration:
```bash
helm install requesty requesty/requesty \
  --namespace requesty \
  --values values.yaml \
  --wait
```
The `--wait` flag ensures Helm waits for all pods to be ready before completing.
Step 6: Verify the Deployment
Check that all pods are running successfully:
```bash
kubectl get pods -n requesty
```
You should see output similar to:

```
NAME                            READY   STATUS    RESTARTS   AGE
requesty-api-7d9b8c6f5-abc123   1/1     Running   0          2m
requesty-api-7d9b8c6f5-def456   1/1     Running   0          2m
requesty-api-7d9b8c6f5-ghi789   1/1     Running   0          2m
requesty-postgresql-0           1/1     Running   0          2m
requesty-redis-master-0         1/1     Running   0          2m
```
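Beyond pod status, you can confirm the gateway is actually serving traffic by port-forwarding the API service and querying its health endpoint (the service name `requesty-api` is an assumption based on the pod names above; check `kubectl get svc -n requesty` for the real name):

```shell
# Forward the API service locally, then probe the health endpoint
kubectl port-forward svc/requesty-api 8080:8080 -n requesty &
curl http://localhost:8080/health
```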
Advanced Configuration Options
Requesty's Helm chart supports numerous advanced configurations to optimize your deployment for specific use cases.
External Database Configuration
For production environments, you may want to use managed database services:
```yaml
postgresql:
  deploy: false
  external:
    host: your-rds-instance.region.rds.amazonaws.com
    port: 5432
    database: requesty
    username: requesty
    passwordSecret:
      name: db-credentials
      key: postgres-password
```
High Availability Setup
Enable high availability for critical components:
```yaml
requesty:
  replicaCount: 5
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                    - requesty
            topologyKey: kubernetes.io/hostname
```
Custom Model Configuration
Configure specific models and their routing preferences:
```yaml
requesty:
  config:
    models:
      # Configure model-specific settings
      gpt-4o:
        maxTokens: 8192
        temperature: 0.7
      claude-4:
        maxTokens: 16384
        temperature: 0.5

    # Smart routing preferences
    smartRouting:
      costWeight: 0.4
      latencyWeight: 0.3
      qualityWeight: 0.3
```
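To build intuition for what these weights do, here is a hypothetical sketch of weighted routing, not Requesty's actual algorithm: each candidate model gets a score combining normalized cost, latency, and quality, and the router prefers the highest score. The metric values below are made up for illustration.

```python
# Hypothetical weighted-routing illustration; the formula and metrics
# are assumptions for intuition, not Requesty's implementation.

def route_score(cost, latency, quality,
                cost_w=0.4, latency_w=0.3, quality_w=0.3):
    """Lower cost/latency and higher quality give a higher score.

    All inputs are assumed normalized to [0, 1].
    """
    return cost_w * (1 - cost) + latency_w * (1 - latency) + quality_w * quality

# Made-up per-model metrics
models = {
    "model-a": {"cost": 0.2, "latency": 0.5, "quality": 0.7},
    "model-b": {"cost": 0.8, "latency": 0.2, "quality": 0.9},
}

best = max(models, key=lambda m: route_score(**models[m]))
```

With these example numbers, raising `costWeight` shifts selection toward the cheaper model even when a pricier one scores better on quality.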
Resource Monitoring
Enable Prometheus metrics for monitoring:
```yaml
metrics:
  enabled: true
  serviceMonitor:
    enabled: true
    namespace: monitoring
    labels:
      prometheus: kube-prometheus
```
Security Best Practices
When self-hosting Requesty, security should be a top priority. Here are essential security configurations:
Network Policies
Implement strict network policies to control traffic:
```yaml
networkPolicy:
  enabled: true
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: requesty
        - podSelector:
            matchLabels:
              app: requesty
```
Secret Management
Use external secret management solutions:
```yaml
externalSecrets:
  enabled: true
  backend: vault
  vaultServer: https://vault.yourdomain.com
  auth:
    method: kubernetes
    role: requesty
```
Enable Guardrails
Requesty's security guardrails protect against various threats:
```yaml
requesty:
  config:
    guardrails:
      promptInjection:
        enabled: true
        action: block
      piiRedaction:
        enabled: true
        patterns:
          - ssn
          - credit_card
          - email
      contentFiltering:
        enabled: true
        categories:
          - violence
          - hate_speech
```
Troubleshooting Common Issues
Here are solutions to common deployment challenges:
Pod Restart Loops
If pods are restarting frequently, work through these checks:

- Increase memory limits if containers are OOMKilled
- Check database connectivity
- Verify secret configurations
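The usual diagnostic commands for a crash-looping pod (substitute a real pod name from `kubectl get pods -n requesty`):

```shell
# Show events and last container state (e.g. OOMKilled, exit codes)
kubectl describe pod <pod-name> -n requesty

# Logs from the previous (crashed) container instance
kubectl logs <pod-name> -n requesty --previous
```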
Slow Response Times
Optimize caching and routing:
- Ensure Redis has sufficient memory
- Enable auto-caching for frequently used prompts
- Configure fallback policies for model availability
Storage Issues
For persistent volume problems:
- Verify the StorageClass supports dynamic provisioning
- Check available disk space on nodes
- Ensure proper permissions for volume mounts
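These checks map to a couple of quick commands:

```shell
# Claims should show STATUS "Bound"; Pending usually means no provisioner matched
kubectl get pvc -n requesty

# Confirm a StorageClass with a dynamic provisioner exists (e.g. fast-ssd)
kubectl get storageclass
```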
Monitoring and Maintenance
Once deployed, maintain optimal performance with these practices:
Health Checks
Configure comprehensive health checks:
```yaml
requesty:
  livenessProbe:
    httpGet:
      path: /health
      port: 8080
    initialDelaySeconds: 30
    periodSeconds: 10
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 5
```
Backup Strategy
Implement regular backups:
- Database snapshots every 6 hours
- Redis persistence for cache data
- Configuration backups in version control
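The 6-hour database snapshot can be automated in-cluster. A minimal sketch of a CronJob running `pg_dump` is below; the image tag, service hostname, secret name, and `/backup` destination are assumptions you should adapt (e.g. mount a PVC or sync to S3):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: requesty-pg-backup
  namespace: requesty
spec:
  schedule: "0 */6 * * *"  # every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:16
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: db-credentials
                      key: postgres-password
              command:
                - /bin/sh
                - -c
                - "pg_dump -h requesty-postgresql -U requesty requesty > /backup/requesty-$(date +%s).sql"
              # Mount durable storage (PVC or an S3 sync sidecar) at /backup
```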
Scaling Policies
Configure horizontal pod autoscaling:
```yaml
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80
```
Integration with Your Applications
After deployment, integrate your applications with self-hosted Requesty:
API Configuration
Update your application to use the self-hosted endpoint:
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-requesty-api-key",
    base_url="https://requesty.yourdomain.com/v1",
)

# Use any of the 160+ models
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
Enable Advanced Features
Take advantage of Requesty's advanced features:
- Smart routing for automatic model selection
- Structured outputs for consistent JSON responses
- Streaming for real-time responses
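Streaming uses the standard OpenAI SDK interface against your self-hosted endpoint; a minimal sketch (hostname and API key are placeholders, as above):

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-requesty-api-key",
    base_url="https://requesty.yourdomain.com/v1",
)

# stream=True yields incremental chunks instead of one final response
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about Kubernetes."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```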
Conclusion
Self-hosting Requesty on Kubernetes gives you complete control over your LLM infrastructure while maintaining all the benefits of unified routing, intelligent caching, and cost optimization. With proper configuration and monitoring, your self-hosted Requesty deployment can handle millions of requests while reducing costs by up to 80%.
Whether you're managing sensitive data, operating in air-gapped environments, or simply prefer full control over your infrastructure, Requesty's Helm chart makes deployment straightforward and maintainable.
Ready to get started? Sign up for Requesty to get your API keys and access to our complete documentation. For enterprise deployments and dedicated support, check out our enterprise features.
Have questions about self-hosting? Join our Discord community where our team and 15k+ developers are ready to help you optimize your LLM infrastructure.