Helm Deployment
Deploy Actyze to Kubernetes using Helm charts for production environments.
Overview
Helm charts provide production-ready Kubernetes deployment with:
- High availability with multiple replicas
- Horizontal pod autoscaling based on CPU/memory
- Persistent storage for databases and models
- Ingress configuration for external access
- Health checks and monitoring
- Rolling updates with zero downtime
- Always pulls latest images from Docker Hub for automatic updates
Prerequisites
- Kubernetes cluster (v1.24+)
- Helm 3.x installed
- kubectl configured to access your cluster
- 4GB+ RAM per node
- Storage class for persistent volumes
Repository Structure
Helm charts are maintained in a separate repository:
Repository: https://github.com/actyze/helm-charts
helm-charts/
├── dashboard/
│ ├── Chart.yaml # Chart metadata
│ ├── values.yaml # Main configuration
│ ├── values-secrets.yaml.template # Secrets template
│ ├── templates/ # Kubernetes manifests
│ │ ├── frontend-deployment.yaml
│ │ ├── nexus-deployment.yaml
│ │ ├── schema-service-deployment.yaml
│ │ ├── postgres.yaml
│ │ ├── trino-deployment.yaml
│ │ ├── ingress.yaml
│ │ └── secrets.yaml
│ ├── VALUES_README.md # Configuration reference
│ ├── LLM_PROVIDERS.md # LLM setup guide
│ └── MIGRATIONS_README.md # Database migrations
├── DEPLOYMENT.md # Deployment guide
└── README.md # Repository overview
Quick Start
1. Clone Helm Charts Repository
git clone https://github.com/actyze/helm-charts.git
cd helm-charts
2. Configure Secrets
# Copy secrets template
cp dashboard/values-secrets.yaml.template dashboard/values-secrets.yaml
# Edit with your credentials
nano dashboard/values-secrets.yaml
Add your configuration:
secrets:
# External LLM API Key
externalLLM:
apiKey: "your-api-key-here"
# PostgreSQL Password
postgres:
password: "your-secure-password"
# Trino Credentials (if using external Trino)
trino:
user: "your-trino-username"
password: "your-trino-password"
3. Deploy to Kubernetes
helm install dashboard ./dashboard \
--namespace actyze \
--create-namespace \
--values dashboard/values.yaml \
--values dashboard/values-secrets.yaml \
--wait
4. Verify Deployment
# Check pod status
kubectl get pods -n actyze
# Check services
kubectl get svc -n actyze
# Check ingress
kubectl get ingress -n actyze
Expected output:
NAME READY STATUS RESTARTS AGE
dashboard-frontend-xxx 1/1 Running 0 2m
dashboard-nexus-xxx 1/1 Running 0 2m
dashboard-schema-service-xxx 1/1 Running 0 2m
dashboard-postgres-0 1/1 Running 0 2m
dashboard-trino-xxx 1/1 Running 0 2m
Configuration
Main Configuration File
The values.yaml file contains all non-sensitive configuration:
# Service toggles
services:
frontend:
enabled: true
replicas: 2
nexus:
enabled: true
replicas: 3
schemaService:
enabled: true
replicas: 2
postgres:
enabled: true
trino:
enabled: true
# LLM Configuration
modelStrategy:
externalLLM:
enabled: true
provider: "anthropic"
model: "claude-sonnet-4-20250514"
baseUrl: "https://api.anthropic.com/v1/messages"
authType: "x-api-key"
extraHeaders: '{"anthropic-version": "2023-06-01"}'
maxTokens: 4096
temperature: 0.1
# Ingress Configuration
ingress:
enabled: true
className: "nginx"
hosts:
- host: analytics.yourcompany.com
paths:
- path: /
pathType: Prefix
service: frontend
- path: /api
pathType: Prefix
service: nexus
See: VALUES_README.md for all options.
Docker Image Configuration
Helm charts are configured to always pull the latest images from Docker Hub:
nexus:
image:
repository: actyze/dashboard-nexus
tag: main-llm-flex
pullPolicy: Always # Always pull latest from Docker Hub
frontend:
image:
repository: actyze/dashboard-frontend
tag: latest
pullPolicy: Always # Always pull latest from Docker Hub
schemaService:
image:
repository: actyze/dashboard-schema-service
tag: latest
pullPolicy: Always # Always pull latest from Docker Hub
Benefits:
- Automatic updates: Get the latest features and bug fixes
- No local builds: No need to build images from source
- Consistent deployments: All environments use the same images from Docker Hub
- Faster deployments: Images are pre-built and optimized
Image Pull Policy:
- Always: Kubernetes always pulls the image, even if it exists locally
- Ensures you're running the latest version on every deployment
- Critical for images tagged with latest or mutable version tags that may be updated
Service Ports and Networking
Internal Service Ports (within Kubernetes cluster):
| Service | Internal Port | Protocol | Purpose |
|---|---|---|---|
| Frontend | 80 | HTTP | React UI (nginx) |
| Nexus | 8002 | HTTP | FastAPI backend API |
| Schema Service | 8000 | HTTP | FAISS recommendations |
| PostgreSQL | 5432 | TCP | Database server |
| Trino | 8080 | HTTP | SQL query engine |
External Access:
Access to Actyze is configured through Kubernetes Ingress, not direct port exposure:
ingress:
enabled: true
className: "nginx"
hosts:
- host: analytics.yourcompany.com
paths:
- path: / # Routes to Frontend (port 80)
pathType: Prefix
service: frontend
- path: /api # Routes to Nexus (port 8002)
pathType: Prefix
service: nexus
Key Points:
- Services communicate internally using Kubernetes DNS (e.g., http://dashboard-nexus:8002)
- External users access via the Ingress hostname (e.g., https://analytics.yourcompany.com)
- TLS/SSL termination happens at the Ingress level
- No NodePort or LoadBalancer services required
For local development/testing, use port-forwarding:
kubectl port-forward -n actyze svc/dashboard-frontend 3000:80
kubectl port-forward -n actyze svc/dashboard-nexus 8002:8002
See Helm Setup Guide - Production Access for complete Ingress configuration including SSL/TLS.
LLM Provider Configuration
Actyze supports multiple LLM providers. Configure in values.yaml:
Anthropic Claude (Recommended):
modelStrategy:
externalLLM:
enabled: true
provider: "anthropic"
model: "claude-sonnet-4-20250514"
baseUrl: "https://api.anthropic.com/v1/messages"
authType: "x-api-key"
extraHeaders: '{"anthropic-version": "2023-06-01"}'
OpenAI GPT-4:
modelStrategy:
externalLLM:
enabled: true
provider: "openai"
model: "gpt-4"
baseUrl: "https://api.openai.com/v1/chat/completions"
authType: "bearer"
extraHeaders: ''
Perplexity:
modelStrategy:
externalLLM:
enabled: true
provider: "perplexity"
model: "sonar-reasoning-pro"
baseUrl: "https://api.perplexity.ai/chat/completions"
authType: "bearer"
extraHeaders: ''
See:
- AI Providers - All 100+ supported providers
- LLM Provider Configuration - Detailed setup guide
Resource Configuration
Configure resource requests and limits for production-grade performance. Choose the configuration tier that matches your deployment size and performance requirements.
Production Resource Tiers
Minimum Configuration (Development/Testing):
- Best for: Development, testing, POC environments
- Total cluster requirements: ~2 CPUs, ~4Gi RAM (requests)
- Note: Not recommended for production workloads
- Use file: values-production-optimized.yaml from the helm-charts repo
Recommended Configuration (Production-Grade, No Bottlenecks):
- Best for: Standard production deployments with good performance
- Total cluster requirements: ~13 CPUs, ~24Gi RAM (requests)
- This is the recommended production baseline
- Handles large schemas, complex queries, and concurrent users
- Default settings in values.yaml
Enterprise Configuration (High-Performance, Large Scale):
- Best for: Enterprise deployments, very large data, high concurrency
- Total cluster requirements: ~27 CPUs, ~54Gi RAM (requests)
- Maximum performance with no bottlenecks
- Supports hundreds of concurrent users and complex federated queries
Resource Specifications by Service
Frontend (React/Nginx) - Lightweight static content server
| Tier | Replicas | CPU Request | Memory Request | CPU Limit | Memory Limit |
|---|---|---|---|---|---|
| Minimum | 2 | 50m | 64Mi | 100m | 128Mi |
| Recommended | 2 | 150m | 256Mi | 300m | 512Mi |
| Enterprise | 3-5 | 200m | 512Mi | 500m | 1Gi |
Nexus (FastAPI Backend) - API orchestration and LLM integration
| Tier | Replicas | CPU Request | Memory Request | CPU Limit | Memory Limit |
|---|---|---|---|---|---|
| Minimum | 2 | 150m | 256Mi | 300m | 512Mi |
| Recommended | 3 | 500m | 1Gi | 1000m | 2Gi |
| Enterprise | 5-10 | 1000m | 2Gi | 2000m | 4Gi |
Schema Service (FAISS + Embeddings) - CPU-intensive similarity search and NLP
| Tier | Replicas | CPU Request | Memory Request | CPU Limit | Memory Limit |
|---|---|---|---|---|---|
| Minimum | 1 | 500m | 1Gi | 1000m | 2Gi |
| Recommended | 1 | 4000m | 6Gi | 8000m | 12Gi |
| Enterprise | 1 | 8000m | 12Gi | 16000m | 24Gi |
Notes on Schema Service:
- CPU-intensive for FAISS similarity search and sentence transformer embeddings
- Requires high CPU allocation for fast schema recommendations (<100ms response time)
- Memory scales with schema size (columns, tables, metadata)
- Does not benefit from horizontal scaling (FAISS index is in-memory)
PostgreSQL (Operational Database) - User data, query history, metadata
| Tier | Replicas | CPU Request | Memory Request | CPU Limit | Memory Limit | Storage |
|---|---|---|---|---|---|---|
| Minimum | 1 | 100m | 256Mi | 250m | 512Mi | 10Gi |
| Recommended | 1 | 500m | 2Gi | 1000m | 4Gi | 50Gi |
| Enterprise | 1 | 1000m | 4Gi | 2000m | 8Gi | 100Gi |
Trino (Query Engine) - Distributed SQL engine for federated queries
| Tier | Replicas | CPU Request | Memory Request | CPU Limit | Memory Limit | JVM Heap |
|---|---|---|---|---|---|---|
| Minimum | 1 | 1000m | 2Gi | 2000m | 3Gi | 1.5G |
| Recommended | 1 | 6000m | 12Gi | 8000m | 16Gi | 9G |
| Enterprise | 1 | 12000m | 24Gi | 16000m | 32Gi | 18G |
Notes on Trino:
- Memory-intensive for query processing, joins, and aggregations
- CPU-intensive for parsing, planning, and execution
- JVM heap should be 75% of memory request
- For large datasets and complex queries, more resources = significantly better performance
- Consider Trino workers (additional replicas) for distributed processing of very large queries
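The "heap should be 75% of the memory request" rule above can be sketched as a small helper (illustrative only; the function name is ours, not part of the chart):

```python
def trino_heap_size(memory_request_gi: float, fraction: float = 0.75) -> str:
    """Return a JVM max-heap value sized to a fraction of the container
    memory request, leaving headroom for off-heap and native allocations."""
    heap_gi = memory_request_gi * fraction
    # Express whole-Gi values as "NG", fractional ones in MiB (e.g. 2Gi -> 1536M).
    if heap_gi.is_integer():
        return f"{int(heap_gi)}G"
    return f"{int(heap_gi * 1024)}M"

# The three tiers from the table above:
print(trino_heap_size(2))   # Minimum:     "1536M"
print(trino_heap_size(12))  # Recommended: "9G"
print(trino_heap_size(24))  # Enterprise:  "18G"
```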
Example: Minimum Configuration (Development/Testing Only)
# Frontend
frontend:
replicaCount: 2
resources:
requests:
memory: "64Mi"
cpu: "50m"
limits:
memory: "128Mi"
cpu: "100m"
# Nexus
nexus:
replicaCount: 2
resources:
requests:
memory: "256Mi"
cpu: "150m"
limits:
memory: "512Mi"
cpu: "300m"
# Schema Service
schemaService:
replicaCount: 1
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
storage:
size: 2Gi
probes:
liveness:
initialDelaySeconds: 120
# PostgreSQL
postgres:
resources:
requests:
memory: "256Mi"
cpu: "100m"
limits:
memory: "512Mi"
cpu: "250m"
persistence:
size: 10Gi
# Trino
trino:
replicaCount: 1
resources:
requests:
memory: "2Gi"
cpu: "1000m"
limits:
memory: "3Gi"
cpu: "2000m"
jvm:
maxHeapSize: "1536M" # 75% of 2Gi
# Note: This configuration will have performance limitations
# Use only for development, testing, or POC environments
Example: Recommended Configuration (Production-Grade)
# Frontend - Handles static content and routing
frontend:
replicaCount: 2
resources:
requests:
memory: "256Mi"
cpu: "150m"
limits:
memory: "512Mi"
cpu: "300m"
# Nexus - API orchestration with LLM integration
nexus:
replicaCount: 3 # HA with 3 replicas
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
# Schema Service - CPU-intensive FAISS and embeddings
schemaService:
replicaCount: 1
resources:
requests:
memory: "6Gi" # Large memory for schema metadata
cpu: "4000m" # 4 CPUs for fast similarity search
limits:
memory: "12Gi" # Burst capacity for large schemas
cpu: "8000m" # 8 CPUs for peak performance
storage:
size: 10Gi
probes:
liveness:
initialDelaySeconds: 300
readiness:
initialDelaySeconds: 60
# PostgreSQL - Operational database
postgres:
resources:
requests:
memory: "2Gi"
cpu: "500m"
limits:
memory: "4Gi"
cpu: "1000m"
persistence:
size: 50Gi # Sufficient for production metadata and query history
# Trino - Query engine for federated queries
trino:
replicaCount: 1
resources:
requests:
memory: "12Gi" # Large memory for query processing
cpu: "6000m" # 6 CPUs for complex queries
limits:
memory: "16Gi" # Burst capacity for heavy queries
cpu: "8000m" # 8 CPUs for peak performance
jvm:
maxHeapSize: "9G" # 75% of 12Gi
Example: Enterprise Configuration (Maximum Performance)
# Frontend - High availability for many concurrent users
frontend:
replicaCount: 3
resources:
requests:
memory: "512Mi"
cpu: "200m"
limits:
memory: "1Gi"
cpu: "500m"
# Nexus - Scaled for high concurrency and throughput
nexus:
replicaCount: 5 # High availability with 5 replicas
resources:
requests:
memory: "2Gi"
cpu: "1000m" # 1 CPU per replica
limits:
memory: "4Gi"
cpu: "2000m" # 2 CPUs burst capacity
env:
cache:
queryMaxSize: 500 # Larger cache for high traffic
queryTtl: 3600
llmMaxSize: 1000
llmTtl: 14400
# Schema Service - Maximum CPU for instant recommendations
schemaService:
replicaCount: 1
resources:
requests:
memory: "12Gi" # Large memory for extensive schemas
cpu: "8000m" # 8 CPUs for sub-50ms response times
limits:
memory: "24Gi" # Burst capacity for very large schemas
cpu: "16000m" # 16 CPUs for peak performance
storage:
size: 20Gi
probes:
liveness:
initialDelaySeconds: 300
readiness:
initialDelaySeconds: 60
# PostgreSQL - High-performance database
postgres:
resources:
requests:
memory: "4Gi"
cpu: "1000m"
limits:
memory: "8Gi"
cpu: "2000m"
persistence:
size: 100Gi # Large storage for extensive query history
# Trino - Maximum performance for complex federated queries
trino:
replicaCount: 1 # Consider adding workers for distributed processing
resources:
requests:
memory: "24Gi" # Large memory for complex joins and aggregations
cpu: "12000m" # 12 CPUs for fast query execution
limits:
memory: "32Gi" # Burst capacity for very complex queries
cpu: "16000m" # 16 CPUs for peak performance
jvm:
maxHeapSize: "18G" # 75% of 24Gi
additionalOptions:
- "--add-opens=java.base/java.nio=ALL-UNNAMED"
- "-XX:+UseG1GC"
- "-XX:G1HeapRegionSize=32M"
Cluster Size Calculator
Minimum Configuration Total:
- CPU Requests: ~2.0 CPUs
- Memory Requests: ~4.0Gi
- CPU Limits: ~4.05 CPUs
- Memory Limits: ~6.75Gi
- Recommended cluster: 1-2 nodes × 4 CPU × 8Gi RAM
- Use case: Development, testing, POC only
Recommended Configuration Total (Production-Grade):
- CPU Requests: ~12.65 CPUs (4 CPUs Schema + 6 CPUs Trino + 1.5 CPUs Nexus + 0.5 CPU Postgres + 0.3 CPU Frontend + 0.35 CPU system)
- Memory Requests: ~24.25Gi (6Gi Schema + 12Gi Trino + 3Gi Nexus + 2Gi Postgres + 0.5Gi Frontend + ~0.75Gi system)
- CPU Limits: ~21.3 CPUs
- Memory Limits: ~34.5Gi
- Recommended cluster: 4-5 nodes × 8 CPU × 16Gi RAM OR 3 nodes × 16 CPU × 32Gi RAM
- Use case: Standard production with no performance bottlenecks
Enterprise Configuration Total (Maximum Performance):
- CPU Requests: ~26.7 CPUs (8 CPUs Schema + 12 CPUs Trino + 5 CPUs Nexus + 1 CPU Postgres + 0.6 CPU Frontend + ~0.1 CPU system)
- Memory Requests: ~53.5Gi (12Gi Schema + 24Gi Trino + 10Gi Nexus + 4Gi Postgres + 1.5Gi Frontend + ~2Gi system)
- CPU Limits: ~37 CPUs
- Memory Limits: ~69Gi
- Recommended cluster: 3 nodes × 16 CPU × 32Gi RAM OR 2 nodes × 32 CPU × 64Gi RAM
- Use case: Enterprise deployments, very large data, hundreds of concurrent users
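The tier totals above can be recomputed from the per-service resource tables. A sketch for the Recommended tier (values copied from the tables; system overhead is added separately):

```python
# Per-service (replicas, cpu_request_cores, mem_request_gi) for the
# Recommended tier, taken from the resource tables above.
RECOMMENDED = {
    "frontend":       (2, 0.15, 0.25),
    "nexus":          (3, 0.5,  1.0),
    "schema-service": (1, 4.0,  6.0),
    "postgres":       (1, 0.5,  2.0),
    "trino":          (1, 6.0, 12.0),
}

def cluster_totals(tier: dict) -> tuple[float, float]:
    """Sum CPU and memory requests across all replicas (excluding system overhead)."""
    cpu = sum(r * c for r, c, _ in tier.values())
    mem = sum(r * m for r, _, m in tier.values())
    return cpu, mem

cpu, mem = cluster_totals(RECOMMENDED)
print(f"{cpu:.2f} CPUs, {mem}Gi")  # ~12.3 CPUs, 23.5Gi before system overhead
```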
Autoscaling Configuration
Enable horizontal pod autoscaling for dynamic traffic. Autoscaling is recommended for production deployments to handle traffic spikes efficiently.
Minimum Configuration (Conservative):
autoscaling:
nexus:
enabled: true
minReplicas: 2
maxReplicas: 4
targetCPUUtilizationPercentage: 80 # Scale less aggressively
frontend:
enabled: true
minReplicas: 2
maxReplicas: 3
targetCPUUtilizationPercentage: 80
schemaService:
enabled: false # Typically doesn't benefit from scaling
Recommended Configuration (Production Standard):
autoscaling:
nexus:
enabled: true
minReplicas: 3
maxReplicas: 10
targetCPUUtilizationPercentage: 70 # Scale proactively
frontend:
enabled: true
minReplicas: 2
maxReplicas: 5
targetCPUUtilizationPercentage: 70
schemaService:
enabled: false # Does not benefit from horizontal scaling
Enterprise Configuration (Maximum Performance):
autoscaling:
nexus:
enabled: true
minReplicas: 5
maxReplicas: 20 # High scale-out capacity
targetCPUUtilizationPercentage: 60 # Very aggressive scaling
frontend:
enabled: true
minReplicas: 3
maxReplicas: 10
targetCPUUtilizationPercentage: 60
schemaService:
enabled: false
Notes:
- Schema Service does not benefit from horizontal scaling (FAISS index is in-memory, not distributed)
- For very large schemas, scale vertically (more CPU/memory) not horizontally
- Lower targetCPUUtilizationPercentage = more aggressive scaling = better performance but higher cost
- Enterprise tier scales early to maintain consistently low response times under any load
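To see how the target percentage drives scaling, here is the standard Kubernetes HPA rule applied to the Recommended Nexus settings (the formula is the documented HPA algorithm, not Actyze-specific code):

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_cpu_pct: float,
                         target_cpu_pct: float,
                         min_replicas: int,
                         max_replicas: int) -> int:
    """Kubernetes HPA scaling rule:
    desired = ceil(current * currentUtilization / targetUtilization),
    clamped to [minReplicas, maxReplicas]."""
    desired = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, desired))

# Recommended tier for Nexus: minReplicas=3, maxReplicas=10, target=70%.
print(hpa_desired_replicas(3, 140, 70, 3, 10))  # 6 -- load doubled, replicas double
print(hpa_desired_replicas(3, 35, 70, 3, 10))   # 3 -- clamped at minReplicas
```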
Storage Configuration
Configure persistent volumes:
persistence:
postgres:
enabled: true
storageClass: "standard"
size: "10Gi"
schemaService:
enabled: true
storageClass: "standard"
size: "5Gi"
Operational Configuration
Configure timeouts, caching, connection pools, and other operational parameters for production performance.
Cache Configuration
Actyze uses in-memory caching to reduce load on databases and LLM APIs. Configure cache sizes and TTLs based on your workload:
Development/Testing:
nexus:
env:
cache:
enabled: true
type: "memory"
queryMaxSize: 100 # Number of query results to cache
queryTtl: 1800 # 30 minutes
llmMaxSize: 200 # Number of LLM responses to cache
llmTtl: 7200 # 2 hours (LLM calls are expensive)
Recommended (Production):
nexus:
env:
cache:
enabled: true
type: "memory"
queryMaxSize: 1000 # Larger cache for high traffic
queryTtl: 3600 # 1 hour
llmMaxSize: 500 # More LLM response caching
llmTtl: 14400 # 4 hours
schemaMaxSize: 1000 # FAISS schema recommendations
schemaTtl: 7200 # 2 hours
metadataMaxSize: 500 # Schema metadata
metadataTtl: 3600 # 1 hour
Enterprise (High Performance):
nexus:
env:
cache:
enabled: true
type: "memory"
queryMaxSize: 5000 # Very large cache
queryTtl: 7200 # 2 hours
llmMaxSize: 2000 # Extensive LLM caching
llmTtl: 28800 # 8 hours
schemaMaxSize: 5000 # Large schema cache
schemaTtl: 14400 # 4 hours
metadataMaxSize: 2000 # Extensive metadata cache
metadataTtl: 7200 # 2 hours
Cache Guidelines:
- Query Cache: Caches SQL query results. Higher values reduce database load but use more memory.
- LLM Cache: Caches LLM API responses. Essential for cost reduction (LLM calls are expensive).
- Schema Cache: Caches FAISS similarity search results. Reduces load on Schema Service.
- TTL (Time-To-Live): Balance between freshness and performance. Longer TTL = fewer API/DB calls but potentially stale data.
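The maxSize + TTL semantics described above can be sketched in miniature (this class is purely illustrative and is not the actual Nexus cache implementation):

```python
import time
from collections import OrderedDict

class TTLCache:
    """Sketch of a size-bounded cache with per-entry TTL: entries expire
    after ttl seconds, and the oldest entry is evicted once the cache
    holds max_size items."""

    def __init__(self, max_size: int, ttl: float):
        self.max_size, self.ttl = max_size, ttl
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires = item
        if time.monotonic() > expires:
            del self._data[key]   # expired: drop the entry and miss
            return None
        return value

    def put(self, key, value):
        if len(self._data) >= self.max_size:
            self._data.popitem(last=False)  # evict the oldest entry
        self._data[key] = (value, time.monotonic() + self.ttl)

# queryMaxSize=2, queryTtl=3600 in miniature:
cache = TTLCache(max_size=2, ttl=3600)
cache.put("q1", [1, 2]); cache.put("q2", [3]); cache.put("q3", [4])
print(cache.get("q1"))  # None -- evicted when q3 arrived
print(cache.get("q3"))  # [4]
```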
Timeout Configuration
Configure timeouts to prevent hung requests and ensure system responsiveness:
Development/Testing:
nexus:
env:
# SQL Execution Timeouts
# defaultTimeoutSeconds also drives the frontend HTTP timeout (+30s buffer)
sqlExecution:
defaultTimeoutSeconds: 120 # Query execution timeout (frontend waits 150s)
defaultMaxResults: 100 # Max rows returned
# External Service Timeouts
schemaServiceTimeout: 10 # Schema service calls
llmServiceTimeout: 120 # LLM API calls
trinoTimeout: 120 # Trino query timeout
# Ingress Timeouts — must exceed defaultTimeoutSeconds + buffer
ingress:
annotations:
nginx.ingress.kubernetes.io/proxy-read-timeout: "300" # 5 minutes
nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
Recommended (Production):
nexus:
env:
# SQL Execution Timeouts
# defaultTimeoutSeconds also drives the frontend HTTP timeout (+30s buffer)
sqlExecution:
defaultTimeoutSeconds: 120 # Allow complex queries (frontend waits 150s)
defaultMaxResults: 1000 # More results for analytics
# External Service Timeouts
schemaServiceTimeout: 15 # FAISS can be CPU-intensive
llmServiceTimeout: 120 # LLMs can be slow
trinoTimeout: 300 # Complex federated queries need time
# Ingress Timeouts — must exceed defaultTimeoutSeconds + buffer
ingress:
annotations:
nginx.ingress.kubernetes.io/proxy-read-timeout: "600" # 10 minutes
nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
nginx.ingress.kubernetes.io/proxy-connect-timeout: "120"
Enterprise (Long-Running Queries):
nexus:
env:
# SQL Execution Timeouts
sqlExecution:
defaultTimeoutSeconds: 300 # 5 minutes for complex analytics
defaultMaxResults: 10000 # Large result sets
# External Service Timeouts
schemaServiceTimeout: 30 # More time for large schemas
llmServiceTimeout: 120 # Complex LLM reasoning
trinoTimeout: 600 # 10 minutes for very complex queries
# Ingress Timeouts
ingress:
annotations:
nginx.ingress.kubernetes.io/proxy-read-timeout: "1800" # 30 minutes
nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"
nginx.ingress.kubernetes.io/proxy-connect-timeout: "300"
Timeout Guidelines:
- SQL Timeouts: Set based on expected query complexity. Too short = failed queries, too long = hung connections.
- LLM Timeouts: LLM APIs can be slow during high demand. Set conservatively.
- Ingress Timeouts: Must be >= longest expected API call. Important for file uploads and long-running queries.
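The constraint that Ingress timeouts must cover the longest API call can be checked mechanically. A sketch (the chart itself performs no such validation; the function name and buffer default are ours):

```python
def check_timeouts(sql_timeout_s: int, trino_timeout_s: int,
                   ingress_read_timeout_s: int, buffer_s: int = 30) -> list[str]:
    """Flag timeout settings that violate the guideline above: the Ingress
    read timeout must cover both the SQL timeout plus the frontend's +30s
    buffer and the Trino query timeout."""
    problems = []
    if ingress_read_timeout_s < sql_timeout_s + buffer_s:
        problems.append("proxy-read-timeout shorter than SQL timeout + buffer")
    if ingress_read_timeout_s < trino_timeout_s:
        problems.append("proxy-read-timeout shorter than Trino timeout")
    return problems

# Recommended tier: SQL 120s (+30s buffer), Trino 300s, ingress 600s -> OK
print(check_timeouts(120, 300, 600))  # []
# Misconfigured: a 120s ingress timeout cannot cover a 300s Trino query
print(check_timeouts(120, 300, 120))  # both problems flagged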
Database Connection Pool
Configure PostgreSQL connection pooling for optimal performance:
Development/Testing:
nexus:
env:
postgres:
poolSize: 10 # Connections per Nexus replica
maxOverflow: 10 # Additional connections during spikes
poolTimeout: 30 # Seconds to wait for connection
poolRecycle: 3600 # Recycle connections after 1 hour
Recommended (Production):
nexus:
env:
postgres:
poolSize: 20 # Base connection pool
maxOverflow: 30 # Allow burst traffic
poolTimeout: 30
poolRecycle: 3600
poolPrePing: true # Verify connections before use
Enterprise (High Concurrency):
nexus:
env:
postgres:
poolSize: 50 # Large pool for high concurrency
maxOverflow: 50 # Substantial overflow capacity
poolTimeout: 60 # More patient during load spikes
poolRecycle: 1800 # Recycle more frequently
poolPrePing: true
Connection Pool Guidelines:
- Pool Size: Base connections maintained per Nexus replica. Total = poolSize × replicas.
- Max Overflow: Additional connections during traffic spikes. Prevents connection exhaustion.
- Pool Timeout: How long to wait for an available connection. Too short = connection errors.
- Calculate Total: With 3 Nexus replicas, poolSize=20, maxOverflow=30 = max 150 connections.
- PostgreSQL max_connections: Ensure PostgreSQL max_connections exceeds the total pool capacity.
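The "Calculate Total" guideline above is simple arithmetic, worth encoding once (a sketch; the function name is ours):

```python
def total_pg_connections(replicas: int, pool_size: int, max_overflow: int) -> int:
    """Worst-case PostgreSQL connections opened by all Nexus replicas:
    each replica may hold poolSize + maxOverflow connections at peak."""
    return replicas * (pool_size + max_overflow)

peak = total_pg_connections(replicas=3, pool_size=20, max_overflow=30)
print(peak)  # 150 -- PostgreSQL max_connections must exceed this
```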
Retry & Circuit Breaker
Configure retry logic for transient failures:
Recommended (Production):
nexus:
env:
# Schema Service Retries
schemaServiceRetries: 3
schemaServiceRetryDelay: 1 # Seconds between retries
schemaServiceRetryBackoff: 2 # Exponential backoff multiplier
# LLM API Retries
llmServiceRetries: 3
llmServiceRetryDelay: 2
llmServiceRetryBackoff: 2
# Trino Query Retries
trinoRetries: 2
trinoRetryDelay: 5
Enterprise (High Reliability):
nexus:
env:
# More aggressive retries for mission-critical operations
schemaServiceRetries: 5
schemaServiceRetryDelay: 1
schemaServiceRetryBackoff: 2
schemaServiceCircuitBreakerThreshold: 5 # Open circuit after 5 failures
schemaServiceCircuitBreakerTimeout: 60 # Try again after 60s
llmServiceRetries: 5
llmServiceRetryDelay: 2
llmServiceRetryBackoff: 2
llmServiceCircuitBreakerThreshold: 5
llmServiceCircuitBreakerTimeout: 120
trinoRetries: 3
trinoRetryDelay: 5
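The retryDelay/retryBackoff pairs above imply an exponential schedule. A sketch, assuming delay × backoff^attempt growth (the exact formula Nexus uses is an assumption on our part):

```python
def retry_delays(retries: int, delay: float, backoff: float) -> list[float]:
    """Wait times between attempts for a given retry count, base delay,
    and exponential backoff multiplier."""
    return [delay * backoff ** attempt for attempt in range(retries)]

# llmServiceRetries=3, llmServiceRetryDelay=2, llmServiceRetryBackoff=2
print(retry_delays(3, 2, 2))  # [2, 4, 8] seconds between attempts
```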
Rate Limiting
Configure rate limiting for API protection:
Ingress Rate Limiting:
ingress:
annotations:
nginx.ingress.kubernetes.io/limit-rps: "100" # Requests per second per IP
nginx.ingress.kubernetes.io/limit-burst-multiplier: "5" # Allow bursts up to 5x
nginx.ingress.kubernetes.io/limit-connections: "50" # Concurrent connections per IP
Application-Level Rate Limiting:
nexus:
env:
rateLimit:
enabled: true
queriesPerMinute: 60 # Per user
queriesPerHour: 1000
llmCallsPerHour: 100 # Expensive operations
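With nginx-ingress, the burst queue per client IP is limit-rps multiplied by limit-burst-multiplier; the annotations above therefore allow short spikes well above the steady rate. A sketch of the implied limits:

```python
def nginx_rate_limits(rps: int, burst_multiplier: int) -> dict:
    """Effective nginx-ingress limits implied by the annotations above:
    steady-state requests/second per client IP, plus the burst queue
    (limit-rps * limit-burst-multiplier) that absorbs short spikes."""
    return {"steady_rps": rps, "burst": rps * burst_multiplier}

print(nginx_rate_limits(100, 5))  # {'steady_rps': 100, 'burst': 500}
```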
Query Result Limits
Prevent memory exhaustion from large result sets:
nexus:
env:
sqlExecution:
defaultMaxResults: 1000 # Default row limit
maxMaxResults: 100000 # Absolute maximum
streamingThreshold: 10000 # Stream results above this size
File Upload Limits
Configure file upload limits for CSV/Excel features:
ingress:
annotations:
nginx.ingress.kubernetes.io/proxy-body-size: "100m" # Max upload size
nexus:
env:
fileUpload:
maxSize: 104857600 # 100MB in bytes
allowedExtensions: [".csv", ".xlsx", ".xls"]
maxRows: 1000000 # 1 million rows max
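The ingress annotation ("100m") and the Nexus maxSize (bytes) describe the same limit in different units; nginx size suffixes are binary (k = 1024, m = 1024²). A sketch for converting between them:

```python
def nginx_size_to_bytes(size: str) -> int:
    """Convert an nginx size string ("100m", "512k") to bytes,
    using nginx's binary suffixes (k = 1024, m = 1024**2)."""
    units = {"k": 1024, "m": 1024 ** 2}
    suffix = size[-1].lower()
    if suffix in units:
        return int(size[:-1]) * units[suffix]
    return int(size)

# The ingress annotation and the Nexus maxSize above should agree:
print(nginx_size_to_bytes("100m"))               # 104857600
print(nginx_size_to_bytes("100m") == 104857600)  # True
```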
Complete Production Example
nexus:
replicaCount: 3
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
env:
debug: false
logLevel: "INFO"
# Cache Configuration
cache:
enabled: true
queryMaxSize: 1000
queryTtl: 3600
llmMaxSize: 500
llmTtl: 14400
schemaMaxSize: 1000
schemaTtl: 7200
# Timeout Configuration
sqlExecution:
defaultTimeoutSeconds: 60
defaultMaxResults: 1000
schemaServiceTimeout: 15
llmServiceTimeout: 60
trinoTimeout: 120
# Connection Pool
postgres:
poolSize: 20
maxOverflow: 30
poolTimeout: 30
poolRecycle: 3600
poolPrePing: true
# Retry Logic
schemaServiceRetries: 3
schemaServiceRetryDelay: 1
llmServiceRetries: 3
llmServiceRetryDelay: 2
trinoRetries: 2
# Rate Limiting
rateLimit:
enabled: true
queriesPerMinute: 60
llmCallsPerHour: 100
# Ingress Configuration
ingress:
enabled: true
annotations:
# Timeouts
nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
nginx.ingress.kubernetes.io/proxy-connect-timeout: "120"
# Rate Limiting
nginx.ingress.kubernetes.io/limit-rps: "100"
nginx.ingress.kubernetes.io/limit-connections: "50"
# File Upload
nginx.ingress.kubernetes.io/proxy-body-size: "100m"
Ingress Setup
Basic Ingress
ingress:
enabled: true
className: "nginx"
hosts:
- host: analytics.yourcompany.com
paths:
- path: /
pathType: Prefix
service: frontend
- path: /api
pathType: Prefix
service: nexus
SSL/TLS with cert-manager
ingress:
enabled: true
className: "nginx"
annotations:
cert-manager.io/cluster-issuer: "letsencrypt-prod"
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
hosts:
- host: analytics.yourcompany.com
paths:
- path: /
pathType: Prefix
service: frontend
- path: /api
pathType: Prefix
service: nexus
tls:
- secretName: dashboard-tls
hosts:
- analytics.yourcompany.com
Cloud Provider Ingress
AWS ALB:
ingress:
className: "alb"
annotations:
alb.ingress.kubernetes.io/scheme: "internet-facing"
alb.ingress.kubernetes.io/target-type: "ip"
alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
GCP GKE:
ingress:
className: "gce"
annotations:
kubernetes.io/ingress.global-static-ip-name: "dashboard-ip"
Azure AKS:
ingress:
className: "nginx"
annotations:
kubernetes.io/ingress.class: "nginx"
nginx.ingress.kubernetes.io/use-regex: "true"
Access Methods
Port Forwarding (Development)
For local testing without Ingress:
# Forward frontend port (internal port 80 → local port 3000)
kubectl port-forward -n actyze svc/dashboard-frontend 3000:80
# Forward API port (internal port 8002 → local port 8002)
kubectl port-forward -n actyze svc/dashboard-nexus 8002:8002
Open http://localhost:3000 for the UI and http://localhost:8002/docs for the API.
Ingress (Production)
Configure DNS to point to your ingress controller:
# Get ingress IP
kubectl get ingress -n actyze
# Add DNS A record
# analytics.yourcompany.com → <INGRESS_IP>
Access at https://analytics.yourcompany.com
Management Commands
Upgrade Deployment
# Pull latest chart changes
cd helm-charts
git pull origin main
# Upgrade release
helm upgrade dashboard ./dashboard \
-f dashboard/values.yaml \
-f dashboard/values-secrets.yaml \
-n actyze
Rollback Deployment
# View release history
helm history dashboard -n actyze
# Rollback to previous version
helm rollback dashboard -n actyze
# Rollback to specific revision
helm rollback dashboard 2 -n actyze
View Configuration
# View current values
helm get values dashboard -n actyze
# View all values (including defaults)
helm get values dashboard -n actyze --all
# View rendered manifests
helm get manifest dashboard -n actyze
Uninstall
# Uninstall release (keeps PVCs)
helm uninstall dashboard -n actyze
# Delete namespace and all resources
kubectl delete namespace actyze
Troubleshooting
Pods Not Starting
# Check pod status
kubectl get pods -n actyze
# Describe pod for events
kubectl describe pod <pod-name> -n actyze
# Check logs
kubectl logs <pod-name> -n actyze
# Check previous logs (if crashed)
kubectl logs <pod-name> -n actyze --previous
Common issues:
- ImagePullBackOff: Check image names and registry access
- CrashLoopBackOff: Check logs for application errors
- Pending: Check resource availability and storage class
- OOMKilled: Increase memory limits
Database Connection Issues
# Check PostgreSQL pod
kubectl get pods -n actyze -l app=postgres
# View PostgreSQL logs
kubectl logs -n actyze deployment/dashboard-postgres
# Verify secret
kubectl get secret dashboard-secrets -n actyze -o yaml
# Test connection from Nexus pod
kubectl exec -it -n actyze deployment/dashboard-nexus -- \
psql -h dashboard-postgres -U dashboard_user -d dashboard
LLM API Issues
# Check Nexus logs for API errors
kubectl logs -n actyze deployment/dashboard-nexus | grep -i "llm\|error"
# Verify environment variables
kubectl exec -n actyze deployment/dashboard-nexus -- env | grep EXTERNAL_LLM
# Check secret
kubectl get secret dashboard-secrets -n actyze -o jsonpath='{.data.EXTERNAL_LLM_API_KEY}' | base64 -d
Ingress Issues
# Check ingress status
kubectl get ingress -n actyze
kubectl describe ingress dashboard-ingress -n actyze
# Check ingress controller
kubectl get pods -n ingress-nginx
# View ingress logs
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller
# Test DNS resolution
nslookup analytics.yourcompany.com
Storage Issues
# Check PVCs
kubectl get pvc -n actyze
# Check storage class
kubectl get storageclass
# Describe PVC for events
kubectl describe pvc <pvc-name> -n actyze
# Check PV
kubectl get pv
Performance Issues
# Check resource usage
kubectl top pods -n actyze
kubectl top nodes
# Check HPA status
kubectl get hpa -n actyze
kubectl describe hpa dashboard-nexus-hpa -n actyze
# View metrics
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
Monitoring
Health Checks
# Port-forward and test health endpoint
kubectl port-forward -n actyze svc/dashboard-nexus 8002:8002
curl http://localhost:8002/health
View Logs
# All pods
kubectl logs -n actyze -l app.kubernetes.io/name=dashboard --tail=100
# Specific service
kubectl logs -n actyze deployment/dashboard-nexus --tail=100 -f
# Multiple pods
kubectl logs -n actyze -l app=nexus --tail=50 --all-containers=true
Resource Usage
# Pod resource usage
kubectl top pods -n actyze
# Node resource usage
kubectl top nodes
# Detailed pod info
kubectl describe pod <pod-name> -n actyze
Production Best Practices
High Availability
services:
nexus:
replicas: 3
frontend:
replicas: 2
schemaService:
replicas: 2
# Enable pod disruption budgets
podDisruptionBudget:
enabled: true
minAvailable: 1
# Anti-affinity rules
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
topologyKey: kubernetes.io/hostname
Security
# Use Kubernetes secrets
secrets:
externalLLM:
apiKey: "use-sealed-secrets-or-external-secrets-operator"
# Enable network policies
networkPolicy:
enabled: true
# Use service accounts
serviceAccount:
create: true
name: "dashboard-sa"
# Security context
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
Monitoring & Logging
# Enable Prometheus metrics
monitoring:
enabled: true
serviceMonitor:
enabled: true
# Configure logging
logging:
level: "INFO"
format: "json"
# Health checks
livenessProbe:
enabled: true
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
enabled: true
initialDelaySeconds: 10
periodSeconds: 5
Backup & Recovery
# Backup PostgreSQL
kubectl exec -n actyze deployment/dashboard-postgres -- \
pg_dump -U dashboard_user dashboard > backup.sql
# Backup PVC
kubectl get pvc -n actyze
# Use Velero or cloud provider backup solutions
# Restore database
kubectl exec -i -n actyze deployment/dashboard-postgres -- \
psql -U dashboard_user dashboard < backup.sql
Database Migrations
Migrations run automatically via Kubernetes Job:
# Check migration status
kubectl get jobs -n actyze
kubectl logs -n actyze job/dashboard-db-migration
# Manual migration (if needed)
kubectl exec -it -n actyze deployment/dashboard-nexus -- \
alembic upgrade head
# Rollback migration
kubectl exec -it -n actyze deployment/dashboard-nexus -- \
alembic downgrade -1
Upgrading
Update Helm Charts
# Pull latest changes
cd helm-charts
git pull origin main
# Review changes
git log --oneline -10
git diff HEAD~5 dashboard/
# Upgrade deployment
helm upgrade dashboard ./dashboard \
-f dashboard/values.yaml \
-f dashboard/values-secrets.yaml \
-n actyze
Update Docker Images
# Update image tags in values.yaml
services:
nexus:
image:
tag: "v1.2.0" # Update version
# Upgrade
helm upgrade dashboard ./dashboard \
-f dashboard/values.yaml \
-f dashboard/values-secrets.yaml \
-n actyze
# Or force pod restart
kubectl rollout restart deployment/dashboard-nexus -n actyze
CI/CD Integration
GitHub Actions Example
name: Deploy to Kubernetes
on:
push:
branches: [main]
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Configure kubectl
uses: azure/k8s-set-context@v1
with:
kubeconfig: ${{ secrets.KUBE_CONFIG }}
- name: Deploy with Helm
run: |
helm upgrade --install dashboard ./dashboard \
-f dashboard/values.yaml \
--set secrets.externalLLM.apiKey=${{ secrets.LLM_API_KEY }} \
-n actyze \
--create-namespace
Next Steps
- AI Providers - Choose from 100+ AI providers
- Configure LLM Provider - Set up your AI model
- Database Connectors - Connect to Trino
- Monitoring Setup - Set up observability
- API Reference - Integrate with your apps
Additional Resources
- Helm Charts Repository - Source code
- VALUES_README.md - All configuration options
- LLM_PROVIDERS.md - LLM setup guide
- DEPLOYMENT.md - Deployment guide
Support
For issues and questions:
- GitHub Issues: github.com/actyze/helm-charts/issues
- Documentation: docs.actyze.com
- Kubernetes Docs: kubernetes.io