
Helm Deployment

Deploy Actyze to Kubernetes using Helm charts for production environments.

Overview

Helm charts provide production-ready Kubernetes deployment with:

  • High availability with multiple replicas
  • Horizontal pod autoscaling based on CPU/memory
  • Persistent storage for databases and models
  • Ingress configuration for external access
  • Health checks and monitoring
  • Rolling updates with zero downtime
  • Automatic updates via an always-pull image policy (latest images from Docker Hub)

Prerequisites

  • Kubernetes cluster (v1.24+)
  • Helm 3.x installed
  • kubectl configured to access your cluster
  • 4GB+ RAM per node
  • Storage class for persistent volumes

Repository Structure

Helm charts are maintained in a separate repository:

Repository: https://github.com/actyze/helm-charts

helm-charts/
├── dashboard/
│   ├── Chart.yaml                     # Chart metadata
│   ├── values.yaml                    # Main configuration
│   ├── values-secrets.yaml.template   # Secrets template
│   ├── templates/                     # Kubernetes manifests
│   │   ├── frontend-deployment.yaml
│   │   ├── nexus-deployment.yaml
│   │   ├── schema-service-deployment.yaml
│   │   ├── postgres.yaml
│   │   ├── trino-deployment.yaml
│   │   ├── ingress.yaml
│   │   └── secrets.yaml
│   ├── VALUES_README.md               # Configuration reference
│   ├── LLM_PROVIDERS.md               # LLM setup guide
│   └── MIGRATIONS_README.md           # Database migrations
├── DEPLOYMENT.md                      # Deployment guide
└── README.md                          # Repository overview

Quick Start

1. Clone Helm Charts Repository

git clone https://github.com/actyze/helm-charts.git
cd helm-charts

2. Configure Secrets

# Copy secrets template
cp dashboard/values-secrets.yaml.template dashboard/values-secrets.yaml

# Edit with your credentials
nano dashboard/values-secrets.yaml

Add your configuration:

secrets:
  # External LLM API Key
  externalLLM:
    apiKey: "your-api-key-here"

  # PostgreSQL Password
  postgres:
    password: "your-secure-password"

  # Trino Credentials (if using external Trino)
  trino:
    user: "your-trino-username"
    password: "your-trino-password"

3. Deploy to Kubernetes

helm install dashboard ./dashboard \
  --namespace actyze \
  --create-namespace \
  --values dashboard/values.yaml \
  --values dashboard/values-secrets.yaml \
  --wait

4. Verify Deployment

# Check pod status
kubectl get pods -n actyze

# Check services
kubectl get svc -n actyze

# Check ingress
kubectl get ingress -n actyze

Expected output:

NAME                           READY   STATUS    RESTARTS   AGE
dashboard-frontend-xxx         1/1     Running   0          2m
dashboard-nexus-xxx            1/1     Running   0          2m
dashboard-schema-service-xxx   1/1     Running   0          2m
dashboard-postgres-0           1/1     Running   0          2m
dashboard-trino-xxx            1/1     Running   0          2m

Configuration

Main Configuration File

The values.yaml file contains all non-sensitive configuration:

# Service toggles
services:
  frontend:
    enabled: true
    replicas: 2
  nexus:
    enabled: true
    replicas: 3
  schemaService:
    enabled: true
    replicas: 2
  postgres:
    enabled: true
  trino:
    enabled: true

# LLM Configuration
modelStrategy:
  externalLLM:
    enabled: true
    provider: "anthropic"
    model: "claude-sonnet-4-20250514"
    baseUrl: "https://api.anthropic.com/v1/messages"
    authType: "x-api-key"
    extraHeaders: '{"anthropic-version": "2023-06-01"}'
    maxTokens: 4096
    temperature: 0.1

# Ingress Configuration
ingress:
  enabled: true
  className: "nginx"
  hosts:
    - host: analytics.yourcompany.com
      paths:
        - path: /
          pathType: Prefix
          service: frontend
        - path: /api
          pathType: Prefix
          service: nexus

See VALUES_README.md in the helm-charts repository for all options.

Docker Image Configuration

Helm charts are configured to always pull the latest images from Docker Hub:

nexus:
  image:
    repository: actyze/dashboard-nexus
    tag: main-llm-flex
    pullPolicy: Always  # Always pull latest from Docker Hub

frontend:
  image:
    repository: actyze/dashboard-frontend
    tag: latest
    pullPolicy: Always  # Always pull latest from Docker Hub

schemaService:
  image:
    repository: actyze/dashboard-schema-service
    tag: latest
    pullPolicy: Always  # Always pull latest from Docker Hub

Benefits:

  • Automatic updates: Get the latest features and bug fixes
  • No local builds: No need to build images from source
  • Consistent deployments: All environments use the same images from Docker Hub
  • Faster deployments: Images are pre-built and optimized

Image Pull Policy:

  • Always: Kubernetes always pulls the image, even if it exists locally
  • Ensures you're running the latest version on every deployment
  • Critical for images tagged with latest or version tags that may be updated

Service Ports and Networking

Internal Service Ports (within Kubernetes cluster):

Service          Internal Port   Protocol   Purpose
Frontend         80              HTTP       React UI (nginx)
Nexus            8002            HTTP       FastAPI backend API
Schema Service   8000            HTTP       FAISS recommendations
PostgreSQL       5432            TCP        Database server
Trino            8080            HTTP       SQL query engine

External Access:

Access to Actyze is configured through Kubernetes Ingress, not direct port exposure:

ingress:
  enabled: true
  className: "nginx"
  hosts:
    - host: analytics.yourcompany.com
      paths:
        - path: /  # Routes to Frontend (port 80)
          pathType: Prefix
          service: frontend
        - path: /api  # Routes to Nexus (port 8002)
          pathType: Prefix
          service: nexus

Key Points:

  • Services communicate internally using Kubernetes DNS (e.g., http://dashboard-nexus:8002)
  • External users access via Ingress hostname (e.g., https://analytics.yourcompany.com)
  • TLS/SSL termination happens at Ingress level
  • No NodePort or LoadBalancer services required

For local development/testing, use port-forwarding:

kubectl port-forward -n actyze svc/dashboard-frontend 3000:80
kubectl port-forward -n actyze svc/dashboard-nexus 8002:8002

See Helm Setup Guide - Production Access for complete Ingress configuration including SSL/TLS.

LLM Provider Configuration

Actyze supports multiple LLM providers. Configure in values.yaml:

Anthropic Claude (Recommended):

modelStrategy:
  externalLLM:
    enabled: true
    provider: "anthropic"
    model: "claude-sonnet-4-20250514"
    baseUrl: "https://api.anthropic.com/v1/messages"
    authType: "x-api-key"
    extraHeaders: '{"anthropic-version": "2023-06-01"}'

OpenAI GPT-4:

modelStrategy:
  externalLLM:
    enabled: true
    provider: "openai"
    model: "gpt-4"
    baseUrl: "https://api.openai.com/v1/chat/completions"
    authType: "bearer"
    extraHeaders: ''

Perplexity:

modelStrategy:
  externalLLM:
    enabled: true
    provider: "perplexity"
    model: "sonar-reasoning-pro"
    baseUrl: "https://api.perplexity.ai/chat/completions"
    authType: "bearer"
    extraHeaders: ''

See LLM_PROVIDERS.md in the helm-charts repository for additional providers and setup details.

Resource Configuration

Configure resource requests and limits for production-grade performance. Choose the configuration tier that matches your deployment size and performance requirements.

Production Resource Tiers

Minimum Configuration (Development/Testing):

  • Best for: Development, testing, POC environments
  • Total cluster requirements: ~2 CPUs, ~4Gi RAM (requests)
  • Note: Not recommended for production workloads
  • Use file: values-production-optimized.yaml from helm-charts repo

Recommended Configuration (Production-Grade, No Bottlenecks):

  • Best for: Standard production deployments with good performance
  • Total cluster requirements: ~12 CPUs, ~24Gi RAM (requests)
  • This is the recommended production baseline
  • Handles large schemas, complex queries, and concurrent users
  • Default settings in values.yaml

Enterprise Configuration (High-Performance, Large Scale):

  • Best for: Enterprise deployments, very large data, high concurrency
  • Total cluster requirements: ~27 CPUs, ~54Gi RAM (requests)
  • Maximum performance with no bottlenecks
  • Supports hundreds of concurrent users and complex federated queries

Resource Specifications by Service

Frontend (React/Nginx) - Lightweight static content server

Tier          Replicas   CPU Request   Memory Request   CPU Limit   Memory Limit
Minimum       2          50m           64Mi             100m        128Mi
Recommended   2          150m          256Mi            300m        512Mi
Enterprise    3-5        200m          512Mi            500m        1Gi

Nexus (FastAPI Backend) - API orchestration and LLM integration

Tier          Replicas   CPU Request   Memory Request   CPU Limit   Memory Limit
Minimum       2          150m          256Mi            300m        512Mi
Recommended   3          500m          1Gi              1000m       2Gi
Enterprise    5-10       1000m         2Gi              2000m       4Gi

Schema Service (FAISS + Embeddings) - CPU-intensive similarity search and NLP

Tier          Replicas   CPU Request   Memory Request   CPU Limit   Memory Limit
Minimum       1          500m          1Gi              1000m       2Gi
Recommended   1          4000m         6Gi              8000m       12Gi
Enterprise    1          8000m         12Gi             16000m      24Gi

Notes on Schema Service:

  • CPU-intensive for FAISS similarity search and sentence transformer embeddings
  • Requires high CPU allocation for fast schema recommendations (<100ms response time)
  • Memory scales with schema size (columns, tables, metadata)
  • Does not benefit from horizontal scaling (FAISS index is in-memory)

PostgreSQL (Operational Database) - User data, query history, metadata

Tier          Replicas   CPU Request   Memory Request   CPU Limit   Memory Limit   Storage
Minimum       1          100m          256Mi            250m        512Mi          10Gi
Recommended   1          500m          2Gi              1000m       4Gi            50Gi
Enterprise    1          1000m         4Gi              2000m       8Gi            100Gi

Trino (Query Engine) - Distributed SQL engine for federated queries

Tier          Replicas   CPU Request   Memory Request   CPU Limit   Memory Limit   JVM Heap
Minimum       1          1000m         2Gi              2000m       3Gi            1.5G
Recommended   1          6000m         12Gi             8000m       16Gi           9G
Enterprise    1          12000m        24Gi             16000m      32Gi           18G

Notes on Trino:

  • Memory-intensive for query processing, joins, and aggregations
  • CPU-intensive for parsing, planning, and execution
  • JVM heap should be 75% of memory request
  • For large datasets and complex queries, more resources = significantly better performance
  • Consider Trino workers (additional replicas) for distributed processing of very large queries
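The 75% heap rule above can be computed mechanically from the memory request. A small illustrative sketch (the helper name is mine, not part of the charts):

```python
# Sketch: derive Trino's JVM max heap from its Kubernetes memory request,
# following the "heap = 75% of memory request" rule above.

def trino_heap_from_request(request_gi: float, fraction: float = 0.75) -> str:
    """Return a JVM heap size string for a given memory request in Gi."""
    heap_gi = request_gi * fraction
    # Express sub-2Gi heaps in MiB so the minimum tier gets "1536M", not "1.5G"
    if heap_gi < 2:
        return f"{int(heap_gi * 1024)}M"
    return f"{heap_gi:g}G"

print(trino_heap_from_request(2))    # minimum tier     -> 1536M
print(trino_heap_from_request(12))   # recommended tier -> 9G
print(trino_heap_from_request(24))   # enterprise tier  -> 18G
```

The three outputs match the maxHeapSize values used in the tier examples below (1536M, 9G, 18G).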

Example: Minimum Configuration (Development/Testing Only)

# Frontend
frontend:
  replicaCount: 2
  resources:
    requests:
      memory: "64Mi"
      cpu: "50m"
    limits:
      memory: "128Mi"
      cpu: "100m"

# Nexus
nexus:
  replicaCount: 2
  resources:
    requests:
      memory: "256Mi"
      cpu: "150m"
    limits:
      memory: "512Mi"
      cpu: "300m"

# Schema Service
schemaService:
  replicaCount: 1
  resources:
    requests:
      memory: "1Gi"
      cpu: "500m"
    limits:
      memory: "2Gi"
      cpu: "1000m"
  storage:
    size: 2Gi
  probes:
    liveness:
      initialDelaySeconds: 120

# PostgreSQL
postgres:
  resources:
    requests:
      memory: "256Mi"
      cpu: "100m"
    limits:
      memory: "512Mi"
      cpu: "250m"
  persistence:
    size: 10Gi

# Trino
trino:
  replicaCount: 1
  resources:
    requests:
      memory: "2Gi"
      cpu: "1000m"
    limits:
      memory: "3Gi"
      cpu: "2000m"
  jvm:
    maxHeapSize: "1536M"  # 75% of 2Gi

# Note: This configuration will have performance limitations
# Use only for development, testing, or POC environments
Example: Recommended Configuration (Production-Grade)

# Frontend - Handles static content and routing
frontend:
  replicaCount: 2
  resources:
    requests:
      memory: "256Mi"
      cpu: "150m"
    limits:
      memory: "512Mi"
      cpu: "300m"

# Nexus - API orchestration with LLM integration
nexus:
  replicaCount: 3  # HA with 3 replicas
  resources:
    requests:
      memory: "1Gi"
      cpu: "500m"
    limits:
      memory: "2Gi"
      cpu: "1000m"

# Schema Service - CPU-intensive FAISS and embeddings
schemaService:
  replicaCount: 1
  resources:
    requests:
      memory: "6Gi"    # Large memory for schema metadata
      cpu: "4000m"     # 4 CPUs for fast similarity search
    limits:
      memory: "12Gi"   # Burst capacity for large schemas
      cpu: "8000m"     # 8 CPUs for peak performance
  storage:
    size: 10Gi
  probes:
    liveness:
      initialDelaySeconds: 300
    readiness:
      initialDelaySeconds: 60

# PostgreSQL - Operational database
postgres:
  resources:
    requests:
      memory: "2Gi"
      cpu: "500m"
    limits:
      memory: "4Gi"
      cpu: "1000m"
  persistence:
    size: 50Gi  # Sufficient for production metadata and query history

# Trino - Query engine for federated queries
trino:
  replicaCount: 1
  resources:
    requests:
      memory: "12Gi"  # Large memory for query processing
      cpu: "6000m"    # 6 CPUs for complex queries
    limits:
      memory: "16Gi"  # Burst capacity for heavy queries
      cpu: "8000m"    # 8 CPUs for peak performance
  jvm:
    maxHeapSize: "9G"  # 75% of 12Gi

Example: Enterprise Configuration (Maximum Performance)

# Frontend - High availability for many concurrent users
frontend:
  replicaCount: 3
  resources:
    requests:
      memory: "512Mi"
      cpu: "200m"
    limits:
      memory: "1Gi"
      cpu: "500m"

# Nexus - Scaled for high concurrency and throughput
nexus:
  replicaCount: 5  # High availability with 5 replicas
  resources:
    requests:
      memory: "2Gi"
      cpu: "1000m"  # 1 CPU per replica
    limits:
      memory: "4Gi"
      cpu: "2000m"  # 2 CPUs burst capacity
  env:
    cache:
      queryMaxSize: 500  # Larger cache for high traffic
      queryTtl: 3600
      llmMaxSize: 1000
      llmTtl: 14400

# Schema Service - Maximum CPU for instant recommendations
schemaService:
  replicaCount: 1
  resources:
    requests:
      memory: "12Gi"   # Large memory for extensive schemas
      cpu: "8000m"     # 8 CPUs for sub-50ms response times
    limits:
      memory: "24Gi"   # Burst capacity for very large schemas
      cpu: "16000m"    # 16 CPUs for peak performance
  storage:
    size: 20Gi
  probes:
    liveness:
      initialDelaySeconds: 300
    readiness:
      initialDelaySeconds: 60

# PostgreSQL - High-performance database
postgres:
  resources:
    requests:
      memory: "4Gi"
      cpu: "1000m"
    limits:
      memory: "8Gi"
      cpu: "2000m"
  persistence:
    size: 100Gi  # Large storage for extensive query history

# Trino - Maximum performance for complex federated queries
trino:
  replicaCount: 1  # Consider adding workers for distributed processing
  resources:
    requests:
      memory: "24Gi"   # Large memory for complex joins and aggregations
      cpu: "12000m"    # 12 CPUs for fast query execution
    limits:
      memory: "32Gi"   # Burst capacity for very complex queries
      cpu: "16000m"    # 16 CPUs for peak performance
  jvm:
    maxHeapSize: "18G"  # 75% of 24Gi
    additionalOptions:
      - "--add-opens=java.base/java.nio=ALL-UNNAMED"
      - "-XX:+UseG1GC"
      - "-XX:G1HeapRegionSize=32M"

Cluster Size Calculator

Minimum Configuration Total:

  • CPU Requests: ~2.0 CPUs
  • Memory Requests: ~4.0Gi
  • CPU Limits: ~4.05 CPUs
  • Memory Limits: ~6.75Gi
  • Recommended cluster: 1-2 nodes × 4 CPU × 8Gi RAM
  • Use case: Development, testing, POC only

Recommended Configuration Total (Production-Grade):

  • CPU Requests: ~12.65 CPUs (4 CPUs Schema + 6 CPUs Trino + 1.5 CPUs Nexus + 0.5 CPU Postgres + 0.3 CPU Frontend + 0.35 CPU system)
  • Memory Requests: ~24.25Gi (6Gi Schema + 12Gi Trino + 3Gi Nexus + 2Gi Postgres + 0.5Gi Frontend + ~0.75Gi system)
  • CPU Limits: ~21.3 CPUs
  • Memory Limits: ~34.5Gi
  • Recommended cluster: 4-5 nodes × 8 CPU × 16Gi RAM OR 3 nodes × 16 CPU × 32Gi RAM
  • Use case: Standard production with no performance bottlenecks

Enterprise Configuration Total (Maximum Performance):

  • CPU Requests: ~26.7 CPUs (8 CPUs Schema + 12 CPUs Trino + 5 CPUs Nexus + 1 CPU Postgres + 0.6 CPU Frontend + 0.1 CPU system)
  • Memory Requests: ~54.6Gi (12Gi Schema + 24Gi Trino + 10Gi Nexus + 4Gi Postgres + 2.56Gi Frontend + ~2Gi system)
  • CPU Limits: ~37 CPUs
  • Memory Limits: ~69Gi
  • Recommended cluster: 3 nodes × 16 CPU × 32Gi RAM OR 2 nodes × 32 CPU × 64Gi RAM
  • Use case: Enterprise deployments, very large data, hundreds of concurrent users
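The tier totals can be sanity-checked by summing per-pod requests times replica counts from the service tables earlier in this page. An illustrative script for the Minimum tier (the data structure is mine; the numbers mirror the tables):

```python
# Sketch: recompute Minimum-tier request totals from the per-service tables
# (replicas x per-pod request). Illustrative only, not part of the charts.

minimum_tier = {
    # service: (replicas, cpu_millicores, memory_mib)
    "frontend":       (2,   50,   64),
    "nexus":          (2,  150,  256),
    "schema-service": (1,  500, 1024),
    "postgres":       (1,  100,  256),
    "trino":          (1, 1000, 2048),
}

total_cpu = sum(r * cpu for r, cpu, _ in minimum_tier.values()) / 1000
total_mem = sum(r * mem for r, _, mem in minimum_tier.values()) / 1024

print(f"CPU requests:    {total_cpu:.2f} cores")  # 2.00
print(f"Memory requests: {total_mem:.3f} Gi")     # 3.875 (doc rounds to ~4Gi)
```

Swapping in the Recommended or Enterprise rows reproduces the other tier totals (before the small system overhead allowance).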

Autoscaling Configuration

Enable horizontal pod autoscaling for dynamic traffic. Autoscaling is recommended for production deployments to handle traffic spikes efficiently.

Minimum Configuration (Conservative):

autoscaling:
  nexus:
    enabled: true
    minReplicas: 2
    maxReplicas: 4
    targetCPUUtilizationPercentage: 80  # Scale less aggressively
  frontend:
    enabled: true
    minReplicas: 2
    maxReplicas: 3
    targetCPUUtilizationPercentage: 80
  schemaService:
    enabled: false  # Typically doesn't benefit from scaling

Recommended Configuration (Production Standard):

autoscaling:
  nexus:
    enabled: true
    minReplicas: 3
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70  # Scale proactively
  frontend:
    enabled: true
    minReplicas: 2
    maxReplicas: 5
    targetCPUUtilizationPercentage: 70
  schemaService:
    enabled: false  # Does not benefit from horizontal scaling

Enterprise Configuration (Maximum Performance):

autoscaling:
  nexus:
    enabled: true
    minReplicas: 5
    maxReplicas: 20  # High scale-out capacity
    targetCPUUtilizationPercentage: 60  # Very aggressive scaling
  frontend:
    enabled: true
    minReplicas: 3
    maxReplicas: 10
    targetCPUUtilizationPercentage: 60
  schemaService:
    enabled: false

Notes:

  • Schema Service does not benefit from horizontal scaling (FAISS index is in-memory, not distributed)
  • For very large schemas, scale vertically (more CPU/memory) not horizontally
  • Lower targetCPUUtilizationPercentage = more aggressive scaling = better performance but higher cost
  • Enterprise tier scales early to maintain consistently low response times under any load
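The effect of targetCPUUtilizationPercentage follows from the standard Kubernetes HPA formula, desiredReplicas = ceil(currentReplicas × currentUtilization / target), clamped to the min/max bounds. A small sketch of that calculation (the helper is illustrative, not Actyze code):

```python
# Sketch: how the Kubernetes HPA derives a replica count from CPU utilization.
# A lower target percentage makes the same load scale out earlier.

from math import ceil

def desired_replicas(current: int, utilization: int, target: int,
                     lo: int, hi: int) -> int:
    desired = ceil(current * utilization / target)
    return max(lo, min(hi, desired))  # clamp to minReplicas/maxReplicas

# Nexus running 3 replicas at 85% average CPU:
print(desired_replicas(3, 85, target=70, lo=3, hi=10))  # recommended target -> 4
print(desired_replicas(3, 85, target=60, lo=3, hi=10))  # enterprise target  -> 5
```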

Storage Configuration

Configure persistent volumes:

persistence:
  postgres:
    enabled: true
    storageClass: "standard"
    size: "10Gi"

  schemaService:
    enabled: true
    storageClass: "standard"
    size: "5Gi"

Operational Configuration

Configure timeouts, caching, connection pools, and other operational parameters for production performance.

Cache Configuration

Actyze uses in-memory caching to reduce load on databases and LLM APIs. Configure cache sizes and TTLs based on your workload:

Development/Testing:

nexus:
  env:
    cache:
      enabled: true
      type: "memory"
      queryMaxSize: 100  # Number of query results to cache
      queryTtl: 1800     # 30 minutes
      llmMaxSize: 200    # Number of LLM responses to cache
      llmTtl: 7200       # 2 hours (LLM calls are expensive)

Recommended (Production):

nexus:
  env:
    cache:
      enabled: true
      type: "memory"
      queryMaxSize: 1000    # Larger cache for high traffic
      queryTtl: 3600        # 1 hour
      llmMaxSize: 500       # More LLM response caching
      llmTtl: 14400         # 4 hours
      schemaMaxSize: 1000   # FAISS schema recommendations
      schemaTtl: 7200       # 2 hours
      metadataMaxSize: 500  # Schema metadata
      metadataTtl: 3600     # 1 hour

Enterprise (High Performance):

nexus:
  env:
    cache:
      enabled: true
      type: "memory"
      queryMaxSize: 5000     # Very large cache
      queryTtl: 7200         # 2 hours
      llmMaxSize: 2000       # Extensive LLM caching
      llmTtl: 28800          # 8 hours
      schemaMaxSize: 5000    # Large schema cache
      schemaTtl: 14400       # 4 hours
      metadataMaxSize: 2000  # Extensive metadata cache
      metadataTtl: 7200      # 2 hours

Cache Guidelines:

  • Query Cache: Caches SQL query results. Higher values reduce database load but use more memory.
  • LLM Cache: Caches LLM API responses. Essential for cost reduction (LLM calls are expensive).
  • Schema Cache: Caches FAISS similarity search results. Reduces load on Schema Service.
  • TTL (Time-To-Live): Balance between freshness and performance. Longer TTL = fewer API/DB calls but potentially stale data.
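The semantics behind the maxSize/TTL knobs above can be sketched as a tiny in-memory cache. This is not Actyze's implementation, just an illustration of the behavior the queryMaxSize/queryTtl settings control (bounded size with eviction, entries expiring after the TTL):

```python
# Sketch: a minimal max-size + TTL in-memory cache, illustrating the
# semantics of the queryMaxSize/queryTtl settings above.

import time
from collections import OrderedDict

class TTLCache:
    def __init__(self, max_size: int, ttl: float):
        self.max_size, self.ttl = max_size, ttl
        self._data = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._data.pop(key, None)  # drop missing/expired entry
            return None
        return entry[1]

    def set(self, key, value):
        if len(self._data) >= self.max_size:
            self._data.popitem(last=False)  # evict the oldest entry
        self._data[key] = (time.monotonic() + self.ttl, value)

cache = TTLCache(max_size=1000, ttl=3600)  # queryMaxSize / queryTtl
cache.set("SELECT 1", [[1]])
print(cache.get("SELECT 1"))  # [[1]] until the TTL expires, then None
```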

Timeout Configuration

Configure timeouts to prevent hung requests and ensure system responsiveness:

Development/Testing:

nexus:
  env:
    # SQL Execution Timeouts
    # defaultTimeoutSeconds also drives the frontend HTTP timeout (+30s buffer)
    sqlExecution:
      defaultTimeoutSeconds: 120  # Query execution timeout (frontend waits 150s)
      defaultMaxResults: 100      # Max rows returned

    # External Service Timeouts
    schemaServiceTimeout: 10  # Schema service calls
    llmServiceTimeout: 120    # LLM API calls
    trinoTimeout: 120         # Trino query timeout

# Ingress Timeouts — must exceed defaultTimeoutSeconds + buffer
ingress:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"  # 5 minutes
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"

Recommended (Production):

nexus:
  env:
    # SQL Execution Timeouts
    # defaultTimeoutSeconds also drives the frontend HTTP timeout (+30s buffer)
    sqlExecution:
      defaultTimeoutSeconds: 120  # Allow complex queries (frontend waits 150s)
      defaultMaxResults: 1000     # More results for analytics

    # External Service Timeouts
    schemaServiceTimeout: 15  # FAISS can be CPU-intensive
    llmServiceTimeout: 120    # LLMs can be slow
    trinoTimeout: 300         # Complex federated queries need time

# Ingress Timeouts — must exceed defaultTimeoutSeconds + buffer
ingress:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"  # 10 minutes
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "120"

Enterprise (Long-Running Queries):

nexus:
  env:
    # SQL Execution Timeouts
    sqlExecution:
      defaultTimeoutSeconds: 300  # 5 minutes for complex analytics
      defaultMaxResults: 10000    # Large result sets

    # External Service Timeouts
    schemaServiceTimeout: 30  # More time for large schemas
    llmServiceTimeout: 120    # Complex LLM reasoning
    trinoTimeout: 600         # 10 minutes for very complex queries

# Ingress Timeouts
ingress:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "1800"  # 30 minutes
    nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "300"

Timeout Guidelines:

  • SQL Timeouts: Set based on expected query complexity. Too short = failed queries, too long = hung connections.
  • LLM Timeouts: LLM APIs can be slow during high demand. Set conservatively.
  • Ingress Timeouts: Must be >= longest expected API call. Important for file uploads and long-running queries.
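The timeout chain can be verified programmatically: the ingress read timeout must cover the slowest downstream wait, i.e. the SQL timeout plus the frontend's 30s buffer, and the Trino timeout. A small illustrative check (function name and structure are mine):

```python
# Sketch: verify the timeout chain described above. The ingress
# proxy-read-timeout must be >= the longest expected backend wait.

def timeouts_consistent(sql_timeout: int, trino_timeout: int,
                        ingress_read_timeout: int, buffer: int = 30) -> bool:
    # Frontend waits sql_timeout + buffer; Trino may run up to trino_timeout.
    longest_wait = max(sql_timeout + buffer, trino_timeout)
    return ingress_read_timeout >= longest_wait

# Recommended tier: sql=120 (frontend waits 150s), trino=300, ingress=600
print(timeouts_consistent(120, 300, 600))  # True

# Misconfiguration: enterprise-style backend timeouts behind a 300s ingress
print(timeouts_consistent(300, 600, 300))  # False -- queries would be cut off
```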

Database Connection Pool

Configure PostgreSQL connection pooling for optimal performance:

Development/Testing:

nexus:
  env:
    postgres:
      poolSize: 10       # Connections per Nexus replica
      maxOverflow: 10    # Additional connections during spikes
      poolTimeout: 30    # Seconds to wait for connection
      poolRecycle: 3600  # Recycle connections after 1 hour

Recommended (Production):

nexus:
  env:
    postgres:
      poolSize: 20     # Base connection pool
      maxOverflow: 30  # Allow burst traffic
      poolTimeout: 30
      poolRecycle: 3600
      poolPrePing: true  # Verify connections before use

Enterprise (High Concurrency):

nexus:
  env:
    postgres:
      poolSize: 50       # Large pool for high concurrency
      maxOverflow: 50    # Substantial overflow capacity
      poolTimeout: 60    # More patient during load spikes
      poolRecycle: 1800  # Recycle more frequently
      poolPrePing: true

Connection Pool Guidelines:

  • Pool Size: Base connections maintained per Nexus replica. Total = poolSize × replicas.
  • Max Overflow: Additional connections during traffic spikes. Prevents connection exhaustion.
  • Pool Timeout: How long to wait for an available connection. Too short = connection errors.
  • Calculate Total: With 3 Nexus replicas, poolSize=20, maxOverflow=30 = max 150 connections.
  • PostgreSQL max_connections: Ensure PostgreSQL max_connections > total pool capacity.
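The "Calculate Total" guideline above is a one-line computation, sketched here so it can be reused when sizing max_connections (the helper is illustrative):

```python
# Sketch: worst-case PostgreSQL connection demand from the pool settings
# above. PostgreSQL's max_connections must exceed this total.

def max_connections_needed(replicas: int, pool_size: int, max_overflow: int) -> int:
    # Each Nexus replica can open up to poolSize + maxOverflow connections.
    return replicas * (pool_size + max_overflow)

print(max_connections_needed(3, 20, 30))  # recommended tier -> 150
print(max_connections_needed(5, 50, 50))  # enterprise tier  -> 500
```

For the enterprise numbers, remember that autoscaling can push Nexus beyond its minimum replica count, so size max_connections against maxReplicas, not minReplicas.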

Retry & Circuit Breaker

Configure retry logic for transient failures:

Recommended (Production):

nexus:
  env:
    # Schema Service Retries
    schemaServiceRetries: 3
    schemaServiceRetryDelay: 1    # Seconds between retries
    schemaServiceRetryBackoff: 2  # Exponential backoff multiplier

    # LLM API Retries
    llmServiceRetries: 3
    llmServiceRetryDelay: 2
    llmServiceRetryBackoff: 2

    # Trino Query Retries
    trinoRetries: 2
    trinoRetryDelay: 5

Enterprise (High Reliability):

nexus:
  env:
    # More aggressive retries for mission-critical operations
    schemaServiceRetries: 5
    schemaServiceRetryDelay: 1
    schemaServiceRetryBackoff: 2
    schemaServiceCircuitBreakerThreshold: 5  # Open circuit after 5 failures
    schemaServiceCircuitBreakerTimeout: 60   # Try again after 60s

    llmServiceRetries: 5
    llmServiceRetryDelay: 2
    llmServiceRetryBackoff: 2
    llmServiceCircuitBreakerThreshold: 5
    llmServiceCircuitBreakerTimeout: 120

    trinoRetries: 3
    trinoRetryDelay: 5

Rate Limiting

Configure rate limiting for API protection:

Ingress Rate Limiting:

ingress:
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "100"             # Requests per second per IP
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "5"  # Allow bursts up to 5x
    nginx.ingress.kubernetes.io/limit-connections: "50"      # Concurrent connections per IP

Application-Level Rate Limiting:

nexus:
  env:
    rateLimit:
      enabled: true
      queriesPerMinute: 60  # Per user
      queriesPerHour: 1000
      llmCallsPerHour: 100  # Expensive operations

Query Result Limits

Prevent memory exhaustion from large result sets:

nexus:
  env:
    sqlExecution:
      defaultMaxResults: 1000    # Default row limit
      maxMaxResults: 100000      # Absolute maximum
      streamingThreshold: 10000  # Stream results above this size

File Upload Limits

Configure file upload limits for CSV/Excel features:

ingress:
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"  # Max upload size

nexus:
  env:
    fileUpload:
      maxSize: 104857600  # 100MB in bytes
      allowedExtensions: [".csv", ".xlsx", ".xls"]
      maxRows: 1000000    # 1 million rows max
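The checks those settings imply can be sketched as follows. This is not Actyze's upload handler; the function and messages are illustrative, but the constants mirror the values above (104857600 = 100 × 1024 × 1024 bytes):

```python
# Sketch: the validation implied by the fileUpload settings above.

MAX_SIZE = 104_857_600            # 100MB = 100 * 1024 * 1024 bytes
ALLOWED = {".csv", ".xlsx", ".xls"}

def check_upload(filename: str, size_bytes: int) -> str:
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in ALLOWED:
        return f"rejected: extension {ext or '(none)'} not allowed"
    if size_bytes > MAX_SIZE:
        return "rejected: file exceeds 100MB limit"
    return "accepted"

print(check_upload("sales.csv", 5_000_000))     # accepted
print(check_upload("dump.parquet", 5_000_000))  # rejected (extension)
print(check_upload("big.xlsx", 200_000_000))    # rejected (size)
```

Keep the ingress proxy-body-size and the application maxSize in sync; if the ingress limit is smaller, uploads fail at the proxy before the application check ever runs.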

Complete Production Example

nexus:
  replicaCount: 3
  resources:
    requests:
      memory: "1Gi"
      cpu: "500m"
    limits:
      memory: "2Gi"
      cpu: "1000m"
  env:
    debug: false
    logLevel: "INFO"

    # Cache Configuration
    cache:
      enabled: true
      queryMaxSize: 1000
      queryTtl: 3600
      llmMaxSize: 500
      llmTtl: 14400
      schemaMaxSize: 1000
      schemaTtl: 7200

    # Timeout Configuration
    sqlExecution:
      defaultTimeoutSeconds: 60
      defaultMaxResults: 1000
    schemaServiceTimeout: 15
    llmServiceTimeout: 60
    trinoTimeout: 120

    # Connection Pool
    postgres:
      poolSize: 20
      maxOverflow: 30
      poolTimeout: 30
      poolRecycle: 3600
      poolPrePing: true

    # Retry Logic
    schemaServiceRetries: 3
    schemaServiceRetryDelay: 1
    llmServiceRetries: 3
    llmServiceRetryDelay: 2
    trinoRetries: 2

    # Rate Limiting
    rateLimit:
      enabled: true
      queriesPerMinute: 60
      llmCallsPerHour: 100

# Ingress Configuration
ingress:
  enabled: true
  annotations:
    # Timeouts
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "120"

    # Rate Limiting
    nginx.ingress.kubernetes.io/limit-rps: "100"
    nginx.ingress.kubernetes.io/limit-connections: "50"

    # File Upload
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"

Ingress Setup

Basic Ingress

ingress:
  enabled: true
  className: "nginx"
  hosts:
    - host: analytics.yourcompany.com
      paths:
        - path: /
          pathType: Prefix
          service: frontend
        - path: /api
          pathType: Prefix
          service: nexus

SSL/TLS with cert-manager

ingress:
  enabled: true
  className: "nginx"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
  hosts:
    - host: analytics.yourcompany.com
      paths:
        - path: /
          pathType: Prefix
          service: frontend
        - path: /api
          pathType: Prefix
          service: nexus
  tls:
    - secretName: dashboard-tls
      hosts:
        - analytics.yourcompany.com

Cloud Provider Ingress

AWS ALB:

ingress:
  className: "alb"
  annotations:
    alb.ingress.kubernetes.io/scheme: "internet-facing"
    alb.ingress.kubernetes.io/target-type: "ip"
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'

GCP GKE:

ingress:
  className: "gce"
  annotations:
    kubernetes.io/ingress.global-static-ip-name: "dashboard-ip"

Azure AKS:

ingress:
  className: "nginx"
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/use-regex: "true"

Access Methods

Port Forwarding (Development)

For local testing without Ingress:

# Forward frontend port (internal port 80 → local port 3000)
kubectl port-forward -n actyze svc/dashboard-frontend 3000:80

# Forward API port (internal port 8002 → local port 8002)
kubectl port-forward -n actyze svc/dashboard-nexus 8002:8002

Open http://localhost:3000 for the UI and http://localhost:8002/docs for the API.

Ingress (Production)

Configure DNS to point to your ingress controller:

# Get ingress IP
kubectl get ingress -n actyze

# Add DNS A record
# analytics.yourcompany.com → <INGRESS_IP>

Access at https://analytics.yourcompany.com

Management Commands

Upgrade Deployment

# Pull latest chart changes
cd helm-charts
git pull origin main

# Upgrade release
helm upgrade dashboard ./dashboard \
  -f dashboard/values.yaml \
  -f dashboard/values-secrets.yaml \
  -n actyze

Rollback Deployment

# View release history
helm history dashboard -n actyze

# Rollback to previous version
helm rollback dashboard -n actyze

# Rollback to specific revision
helm rollback dashboard 2 -n actyze

View Configuration

# View current values
helm get values dashboard -n actyze

# View all values (including defaults)
helm get values dashboard -n actyze --all

# View rendered manifests
helm get manifest dashboard -n actyze

Uninstall

# Uninstall release (keeps PVCs)
helm uninstall dashboard -n actyze

# Delete namespace and all resources
kubectl delete namespace actyze

Troubleshooting

Pods Not Starting

# Check pod status
kubectl get pods -n actyze

# Describe pod for events
kubectl describe pod <pod-name> -n actyze

# Check logs
kubectl logs <pod-name> -n actyze

# Check previous logs (if crashed)
kubectl logs <pod-name> -n actyze --previous

Common issues:

  • ImagePullBackOff: Check image names and registry access
  • CrashLoopBackOff: Check logs for application errors
  • Pending: Check resource availability and storage class
  • OOMKilled: Increase memory limits

Database Connection Issues

# Check PostgreSQL pod
kubectl get pods -n actyze -l app=postgres

# View PostgreSQL logs
kubectl logs -n actyze deployment/dashboard-postgres

# Verify secret
kubectl get secret dashboard-secrets -n actyze -o yaml

# Test connection from Nexus pod
kubectl exec -it -n actyze deployment/dashboard-nexus -- \
  psql -h dashboard-postgres -U dashboard_user -d dashboard

LLM API Issues

# Check Nexus logs for API errors
kubectl logs -n actyze deployment/dashboard-nexus | grep -i "llm\|error"

# Verify environment variables
kubectl exec -n actyze deployment/dashboard-nexus -- env | grep EXTERNAL_LLM

# Check secret
kubectl get secret dashboard-secrets -n actyze -o jsonpath='{.data.EXTERNAL_LLM_API_KEY}' | base64 -d

Ingress Issues

# Check ingress status
kubectl get ingress -n actyze
kubectl describe ingress dashboard-ingress -n actyze

# Check ingress controller
kubectl get pods -n ingress-nginx

# View ingress logs
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller

# Test DNS resolution
nslookup analytics.yourcompany.com

Storage Issues

# Check PVCs
kubectl get pvc -n actyze

# Check storage class
kubectl get storageclass

# Describe PVC for events
kubectl describe pvc <pvc-name> -n actyze

# Check PV
kubectl get pv

Performance Issues

# Check resource usage
kubectl top pods -n actyze
kubectl top nodes

# Check HPA status
kubectl get hpa -n actyze
kubectl describe hpa dashboard-nexus-hpa -n actyze

# View metrics
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes

Monitoring

Health Checks

# Port-forward and test health endpoint (Nexus listens on 8002)
kubectl port-forward -n actyze svc/dashboard-nexus 8002:8002

curl http://localhost:8002/health

View Logs

# All pods
kubectl logs -n actyze -l app.kubernetes.io/name=dashboard --tail=100

# Specific service
kubectl logs -n actyze deployment/dashboard-nexus --tail=100 -f

# Multiple pods
kubectl logs -n actyze -l app=nexus --tail=50 --all-containers=true

Resource Usage

# Pod resource usage
kubectl top pods -n actyze

# Node resource usage
kubectl top nodes

# Detailed pod info
kubectl describe pod <pod-name> -n actyze

Production Best Practices

High Availability

services:
  nexus:
    replicas: 3
  frontend:
    replicas: 2
  schemaService:
    replicas: 2

# Enable pod disruption budgets
podDisruptionBudget:
  enabled: true
  minAvailable: 1

# Anti-affinity rules
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname

Security

# Use Kubernetes secrets
secrets:
  externalLLM:
    apiKey: "use-sealed-secrets-or-external-secrets-operator"

# Enable network policies
networkPolicy:
  enabled: true

# Use service accounts
serviceAccount:
  create: true
  name: "dashboard-sa"

# Security context
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

Monitoring & Logging

# Enable Prometheus metrics
monitoring:
  enabled: true
  serviceMonitor:
    enabled: true

# Configure logging
logging:
  level: "INFO"
  format: "json"

# Health checks
livenessProbe:
  enabled: true
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  enabled: true
  initialDelaySeconds: 10
  periodSeconds: 5

Backup & Recovery

# Backup PostgreSQL
kubectl exec -n actyze deployment/dashboard-postgres -- \
  pg_dump -U dashboard_user dashboard > backup.sql

# Backup PVC
kubectl get pvc -n actyze
# Use Velero or cloud provider backup solutions

# Restore database
kubectl exec -i -n actyze deployment/dashboard-postgres -- \
  psql -U dashboard_user dashboard < backup.sql

Database Migrations

Migrations run automatically via Kubernetes Job:

# Check migration status
kubectl get jobs -n actyze
kubectl logs -n actyze job/dashboard-db-migration

# Manual migration (if needed)
kubectl exec -it -n actyze deployment/dashboard-nexus -- \
  alembic upgrade head

# Rollback migration
kubectl exec -it -n actyze deployment/dashboard-nexus -- \
  alembic downgrade -1

Upgrading

Update Helm Charts

# Pull latest changes
cd helm-charts
git pull origin main

# Review changes
git log --oneline -10
git diff HEAD~5 dashboard/

# Upgrade deployment
helm upgrade dashboard ./dashboard \
  -f dashboard/values.yaml \
  -f dashboard/values-secrets.yaml \
  -n actyze

Update Docker Images

# Update image tags in values.yaml
services:
  nexus:
    image:
      tag: "v1.2.0"  # Update version

# Upgrade
helm upgrade dashboard ./dashboard \
  -f dashboard/values.yaml \
  -f dashboard/values-secrets.yaml \
  -n actyze

# Or force pod restart
kubectl rollout restart deployment/dashboard-nexus -n actyze

CI/CD Integration

GitHub Actions Example

name: Deploy to Kubernetes

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Configure kubectl
        uses: azure/k8s-set-context@v1
        with:
          kubeconfig: ${{ secrets.KUBE_CONFIG }}

      - name: Deploy with Helm
        run: |
          helm upgrade --install dashboard ./dashboard \
            -f dashboard/values.yaml \
            --set secrets.externalLLM.apiKey=${{ secrets.LLM_API_KEY }} \
            -n actyze \
            --create-namespace

Next Steps

Additional Resources

Support

For issues and questions, open an issue on the helm-charts repository: https://github.com/actyze/helm-charts