As organizations scale their AI applications, many are looking for ways to maintain complete control over their infrastructure while leveraging the power of unified LLM routing. Self-hosting Requesty on Kubernetes provides the perfect balance of flexibility, security, and performance for teams with specific compliance requirements or existing Kubernetes infrastructure.
In this comprehensive guide, we'll walk through everything you need to know about deploying Requesty on Kubernetes using Helm, from initial setup to production-ready configurations. Whether you're managing air-gapped environments or simply want full control over your LLM gateway, this guide will help you get Requesty running smoothly in your Kubernetes cluster.
Why Self-Host Requesty on Kubernetes?
Before diving into the technical details, let's explore why self-hosting Requesty might be the right choice for your organization.
Complete Data Control: When you self-host Requesty, all your LLM traffic, caching data, and usage analytics remain within your infrastructure. This is crucial for organizations handling sensitive data or operating in regulated industries.
Compliance and Data Residency: Many enterprises have strict requirements about where data can be stored and processed. Self-hosting ensures you meet these requirements while still benefiting from Requesty's smart routing and optimization features.
Air-Gapped Environments: For organizations operating in secure, isolated networks, self-hosting is often the only option. Requesty's Helm chart supports air-gapped installations, allowing you to deploy without internet access.
Cost Optimization at Scale: While Requesty's cloud offering already provides up to 80% cost savings through intelligent caching and routing, self-hosting gives you additional control over resource allocation and scaling strategies.
Prerequisites and System Requirements
Before beginning your Requesty deployment, ensure you have the following prerequisites in place:
Kubernetes Cluster Requirements
- Kubernetes Version: 1.19 or higher (1.24+ recommended for latest features)
- Cluster Resources: Minimum 3 nodes with 4 CPU cores and 16GB RAM each
- Storage: SSD-class persistent storage with at least 100GB available
- Networking: Cluster networking configured with DNS and an ingress controller
Required Tools
- Helm: Version 3.8 or higher installed on your local machine
- kubectl: Configured with access to your target cluster
- Storage Class: Dynamic provisioning enabled for persistent volumes
- Ingress Controller: NGINX or similar for external access
Recommended Resources for Production
For production deployments handling significant traffic through Requesty's 160+ supported models, we recommend:
- API Gateway Pods: 3 replicas with 2 CPU / 4GB RAM each
- Cache Layer: Redis with 8GB+ memory for optimal performance
- Database: PostgreSQL with 4 CPU / 16GB RAM for analytics and metadata
- Object Storage: S3-compatible storage for long-term cache persistence
Step-by-Step Deployment Guide
Now let's walk through the complete deployment process for Requesty on Kubernetes.
Step 1: Add the Requesty Helm Repository
First, add the official Requesty Helm repository to your local Helm installation:
```bash
helm repo add requesty https://helm.requesty.ai
helm repo update
```
For air-gapped environments, download the chart package separately and transfer it to your isolated network.
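For example, on a machine with internet access you can pull the chart as a local archive, then install from that file inside the isolated network (the chart version shown is illustrative; use the version you actually downloaded):

```shell
# On a connected machine: download the chart archive
helm pull requesty/requesty --version 1.0.0

# After transferring requesty-1.0.0.tgz into the air-gapped network:
helm install requesty ./requesty-1.0.0.tgz \
  --namespace requesty \
  --values values.yaml
```

You will also need to mirror the container images referenced by the chart into a registry reachable from the isolated cluster.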
Step 2: Create a Namespace
Create a dedicated namespace for your Requesty deployment:
```bash
kubectl create namespace requesty
```
This isolation helps with resource management and security policies.
Step 3: Configure Your values.yaml
Create a custom `values.yaml` file to configure your Requesty deployment. Here's a production-ready example:
```yaml
# Requesty API configuration
requesty:
  replicaCount: 3
  resources:
    limits:
      cpu: "2"
      memory: "4Gi"
    requests:
      cpu: "1"
      memory: "2Gi"

  # Core configuration
  config:
    # Enable all routing features
    smartRouting:
      enabled: true
    fallbackPolicies:
      enabled: true
    caching:
      enabled: true
      ttl: 3600

    # Security settings
    guardrails:
      enabled: true
      promptInjection: true
      piiRedaction: true

# PostgreSQL configuration
postgresql:
  deploy: true  # Set to false for an external database
  auth:
    database: requesty
    username: requesty
    password: "changeme"  # Use secrets in production
  resources:
    limits:
      cpu: "4"
      memory: "16Gi"
    requests:
      cpu: "2"
      memory: "8Gi"
  persistence:
    enabled: true
    size: 100Gi
    storageClass: fast-ssd

# Redis configuration
redis:
  deploy: true  # Set to false for an external Redis
  auth:
    enabled: true
    password: "changeme"  # Use secrets in production
  master:
    resources:
      limits:
        cpu: "2"
        memory: "8Gi"
      requests:
        cpu: "1"
        memory: "4Gi"
    persistence:
      enabled: true
      size: 50Gi

# Ingress configuration
ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: requesty.yourdomain.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: requesty-tls
      hosts:
        - requesty.yourdomain.com

# Object storage (S3-compatible)
objectStorage:
  provider: s3
  s3:
    bucket: requesty-cache
    region: us-east-1
    endpoint: https://s3.amazonaws.com
    accessKeyId:
      valueFrom:
        secretKeyRef:
          name: s3-credentials
          key: access-key-id
    secretAccessKey:
      valueFrom:
        secretKeyRef:
          name: s3-credentials
          key: secret-access-key
```
Step 4: Create Required Secrets
Before deploying, create the necessary Kubernetes secrets:
```bash
# S3 credentials
kubectl create secret generic s3-credentials \
  --from-literal=access-key-id=YOUR_ACCESS_KEY \
  --from-literal=secret-access-key=YOUR_SECRET_KEY \
  -n requesty

# Database passwords (if using external services)
kubectl create secret generic db-credentials \
  --from-literal=postgres-password=YOUR_DB_PASSWORD \
  --from-literal=redis-password=YOUR_REDIS_PASSWORD \
  -n requesty
```
Step 5: Deploy Requesty
Now deploy Requesty using your custom configuration:
```bash
helm install requesty requesty/requesty \
  --namespace requesty \
  --values values.yaml \
  --wait
```
The `--wait` flag ensures Helm waits for all pods to be ready before completing.
Step 6: Verify the Deployment
Check that all pods are running successfully:
```bash
kubectl get pods -n requesty
```
You should see output similar to:

```
NAME                            READY   STATUS    RESTARTS   AGE
requesty-api-7d9b8c6f5-abc123   1/1     Running   0          2m
requesty-api-7d9b8c6f5-def456   1/1     Running   0          2m
requesty-api-7d9b8c6f5-ghi789   1/1     Running   0          2m
requesty-postgresql-0           1/1     Running   0          2m
requesty-redis-master-0         1/1     Running   0          2m
```
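Beyond pod status, you can confirm the gateway is actually serving traffic by port-forwarding the API service and querying its health endpoint (the service name `requesty-api` is an assumption based on the pod names above; check `kubectl get svc -n requesty` for the real name):

```shell
# Forward the API service locally, then probe the health endpoint
kubectl port-forward svc/requesty-api 8080:8080 -n requesty &
curl http://localhost:8080/health
```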
Advanced Configuration Options
Requesty's Helm chart supports numerous advanced configurations to optimize your deployment for specific use cases.
External Database Configuration
For production environments, you may want to use managed database services:
```yaml
postgresql:
  deploy: false
  external:
    host: your-rds-instance.region.rds.amazonaws.com
    port: 5432
    database: requesty
    username: requesty
    passwordSecret:
      name: db-credentials
      key: postgres-password
```
High Availability Setup
Enable high availability for critical components:
```yaml
requesty:
  replicaCount: 5
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                    - requesty
            topologyKey: kubernetes.io/hostname
```
Custom Model Configuration
Configure specific models and their routing preferences:
```yaml
requesty:
  config:
    models:
      # Configure model-specific settings
      gpt-4o:
        maxTokens: 8192
        temperature: 0.7
      claude-4:
        maxTokens: 16384
        temperature: 0.5

    # Smart routing preferences
    smartRouting:
      costWeight: 0.4
      latencyWeight: 0.3
      qualityWeight: 0.3
```
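To build intuition for what these weights do, here is a hypothetical sketch of weighted routing, not Requesty's actual algorithm: each candidate model gets a score combining normalized cost, latency, and quality, and the router prefers the highest score. The metric values below are made up for illustration.

```python
# Hypothetical weighted-routing illustration; the formula and metrics
# are assumptions for intuition, not Requesty's implementation.

def route_score(cost, latency, quality,
                cost_w=0.4, latency_w=0.3, quality_w=0.3):
    """Lower cost/latency and higher quality give a higher score.

    All inputs are assumed normalized to [0, 1].
    """
    return cost_w * (1 - cost) + latency_w * (1 - latency) + quality_w * quality

# Made-up per-model metrics
models = {
    "model-a": {"cost": 0.2, "latency": 0.5, "quality": 0.7},
    "model-b": {"cost": 0.8, "latency": 0.2, "quality": 0.9},
}

best = max(models, key=lambda m: route_score(**models[m]))
```

With these example numbers, raising `costWeight` shifts selection toward the cheaper model even when a pricier one scores better on quality.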
Resource Monitoring
Enable Prometheus metrics for monitoring:
```yaml
metrics:
  enabled: true
  serviceMonitor:
    enabled: true
    namespace: monitoring
    labels:
      prometheus: kube-prometheus
```
Security Best Practices
When self-hosting Requesty, security should be a top priority. Here are essential security configurations:
Network Policies
Implement strict network policies to control traffic:
```yaml
networkPolicy:
  enabled: true
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: requesty
        - podSelector:
            matchLabels:
              app: requesty
```
Secret Management
Use external secret management solutions:
```yaml
externalSecrets:
  enabled: true
  backend: vault
  vaultServer: https://vault.yourdomain.com
  auth:
    method: kubernetes
    role: requesty
```
Enable Guardrails
Requesty's security guardrails protect against various threats:
```yaml
requesty:
  config:
    guardrails:
      promptInjection:
        enabled: true
        action: block
      piiRedaction:
        enabled: true
        patterns:
          - ssn
          - credit_card
          - email
      contentFiltering:
        enabled: true
        categories:
          - violence
          - hate_speech
```
Troubleshooting Common Issues
Here are solutions to common deployment challenges:
Pod Restart Loops
If pods are restarting frequently, work through these checks:

- Increase memory limits if containers are OOMKilled
- Check database connectivity
- Verify secret configurations
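The usual diagnostic commands for a crash-looping pod (substitute a real pod name from `kubectl get pods -n requesty`):

```shell
# Show events and last container state (e.g. OOMKilled, exit codes)
kubectl describe pod <pod-name> -n requesty

# Logs from the previous (crashed) container instance
kubectl logs <pod-name> -n requesty --previous
```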
Slow Response Times
Optimize caching and routing:
- Ensure Redis has sufficient memory
- Enable auto-caching for frequently used prompts
- Configure fallback policies for model availability
Storage Issues
For persistent volume problems:
- Verify the StorageClass supports dynamic provisioning
- Check available disk space on nodes
- Ensure proper permissions for volume mounts
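These checks map to a couple of quick commands:

```shell
# Claims should show STATUS "Bound"; Pending usually means no provisioner matched
kubectl get pvc -n requesty

# Confirm a StorageClass with a dynamic provisioner exists (e.g. fast-ssd)
kubectl get storageclass
```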
Monitoring and Maintenance
Once deployed, maintain optimal performance with these practices:
Health Checks
Configure comprehensive health checks:
```yaml
requesty:
  livenessProbe:
    httpGet:
      path: /health
      port: 8080
    initialDelaySeconds: 30
    periodSeconds: 10
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 5
```
Backup Strategy
Implement regular backups:
- Database snapshots every 6 hours
- Redis persistence for cache data
- Configuration backups in version control
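The 6-hour database snapshot can be automated in-cluster. A minimal sketch of a CronJob running `pg_dump` is below; the image tag, service hostname, secret name, and `/backup` destination are assumptions you should adapt (e.g. mount a PVC or sync to S3):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: requesty-pg-backup
  namespace: requesty
spec:
  schedule: "0 */6 * * *"  # every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:16
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: db-credentials
                      key: postgres-password
              command:
                - /bin/sh
                - -c
                - "pg_dump -h requesty-postgresql -U requesty requesty > /backup/requesty-$(date +%s).sql"
              # Mount durable storage (PVC or an S3 sync sidecar) at /backup
```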
Scaling Policies
Configure horizontal pod autoscaling:
```yaml
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80
```
Integration with Your Applications
After deployment, integrate your applications with self-hosted Requesty:
API Configuration
Update your application to use the self-hosted endpoint:
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-requesty-api-key",
    base_url="https://requesty.yourdomain.com/v1",
)

# Use any of the 160+ models
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
Enable Advanced Features
Take advantage of Requesty's advanced features:
- Smart routing for automatic model selection
- Structured outputs for consistent JSON responses
- Streaming for real-time responses
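Streaming uses the standard OpenAI SDK interface against your self-hosted endpoint; a minimal sketch (hostname and API key are placeholders, as above):

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-requesty-api-key",
    base_url="https://requesty.yourdomain.com/v1",
)

# stream=True yields incremental chunks instead of one final response
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about Kubernetes."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```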
Conclusion
Self-hosting Requesty on Kubernetes gives you complete control over your LLM infrastructure while maintaining all the benefits of unified routing, intelligent caching, and cost optimization. With proper configuration and monitoring, your self-hosted Requesty deployment can handle millions of requests while reducing costs by up to 80%.
Whether you're managing sensitive data, operating in air-gapped environments, or simply prefer full control over your infrastructure, Requesty's Helm chart makes deployment straightforward and maintainable.
Ready to get started? Sign up for Requesty to get your API keys and access to our complete documentation. For enterprise deployments and dedicated support, check out our enterprise features.
Have questions about self-hosting? Join our Discord community where our team and 15k+ developers are ready to help you optimize your LLM infrastructure.