
Helm Deployment

Deploy Actyze to Kubernetes using Helm charts for production environments.

Overview

Helm charts provide production-ready Kubernetes deployment with:

  • High availability with multiple replicas
  • Horizontal pod autoscaling based on CPU/memory
  • Persistent storage for databases and models
  • Ingress configuration for external access
  • Health checks and monitoring
  • Rolling updates with zero downtime
  • Automatic updates via an always-pull image policy (latest images from Docker Hub)

Prerequisites

  • Kubernetes cluster (v1.24+)
  • Helm 3.x installed
  • kubectl configured to access your cluster
  • 4GB+ RAM per node
  • Storage class for persistent volumes

Repository Structure

Helm charts are maintained in a separate repository:

Repository: https://github.com/actyze/helm-charts

helm-charts/
├── dashboard/
│   ├── Chart.yaml                     # Chart metadata
│   ├── values.yaml                    # Main configuration
│   ├── values-secrets.yaml.template   # Secrets template
│   ├── templates/                     # Kubernetes manifests
│   │   ├── frontend-deployment.yaml
│   │   ├── nexus-deployment.yaml
│   │   ├── schema-service-deployment.yaml
│   │   ├── postgres.yaml
│   │   ├── trino-deployment.yaml
│   │   ├── ingress.yaml
│   │   └── secrets.yaml
│   ├── VALUES_README.md               # Configuration reference
│   ├── LLM_PROVIDERS.md               # LLM setup guide
│   └── MIGRATIONS_README.md           # Database migrations
├── DEPLOYMENT.md                      # Deployment guide
└── README.md                          # Repository overview

Quick Start

1. Clone Helm Charts Repository

git clone https://github.com/actyze/helm-charts.git
cd helm-charts

2. Configure Secrets

# Copy secrets template
cp dashboard/values-secrets.yaml.template dashboard/values-secrets.yaml

# Edit with your credentials
nano dashboard/values-secrets.yaml

Add your configuration:

secrets:
  # External LLM API Key
  externalLLM:
    apiKey: "your-api-key-here"

  # PostgreSQL Password
  postgres:
    password: "your-secure-password"

  # Trino Credentials (if using external Trino)
  trino:
    user: "your-trino-username"
    password: "your-trino-password"

3. Deploy to Kubernetes

helm install dashboard ./dashboard \
  --namespace actyze \
  --create-namespace \
  --values dashboard/values.yaml \
  --values dashboard/values-secrets.yaml \
  --wait

4. Verify Deployment

# Check pod status
kubectl get pods -n actyze

# Check services
kubectl get svc -n actyze

# Check ingress
kubectl get ingress -n actyze

Expected output:

NAME                           READY   STATUS    RESTARTS   AGE
dashboard-frontend-xxx         1/1     Running   0          2m
dashboard-nexus-xxx            1/1     Running   0          2m
dashboard-schema-service-xxx   1/1     Running   0          2m
dashboard-postgres-0           1/1     Running   0          2m
dashboard-trino-xxx            1/1     Running   0          2m

Configuration

Main Configuration File

The values.yaml file contains all non-sensitive configuration:

# Service toggles
services:
  frontend:
    enabled: true
    replicas: 2
  nexus:
    enabled: true
    replicas: 3
  schemaService:
    enabled: true
    replicas: 2
  postgres:
    enabled: true
  trino:
    enabled: true

# LLM Configuration
modelStrategy:
  externalLLM:
    enabled: true
    provider: "anthropic"
    model: "claude-sonnet-4-20250514"
    baseUrl: "https://api.anthropic.com/v1/messages"
    authType: "x-api-key"
    extraHeaders: '{"anthropic-version": "2023-06-01"}'
    maxTokens: 4096
    temperature: 0.1

# Ingress Configuration
ingress:
  enabled: true
  className: "nginx"
  hosts:
    - host: analytics.yourcompany.com
      paths:
        - path: /
          pathType: Prefix
          service: frontend
        - path: /api
          pathType: Prefix
          service: nexus

See VALUES_README.md in the helm-charts repository for all options.

Docker Image Configuration

Helm charts are configured to always pull the latest images from Docker Hub:

nexus:
  image:
    repository: actyze/dashboard-nexus
    tag: main-llm-flex
    pullPolicy: Always  # Always pull latest from Docker Hub

frontend:
  image:
    repository: actyze/dashboard-frontend
    tag: latest
    pullPolicy: Always  # Always pull latest from Docker Hub

schemaService:
  image:
    repository: actyze/dashboard-schema-service
    tag: latest
    pullPolicy: Always  # Always pull latest from Docker Hub

Benefits:

  • Automatic updates: Get the latest features and bug fixes
  • No local builds: No need to build images from source
  • Consistent deployments: All environments use the same images from Docker Hub
  • Faster deployments: Images are pre-built and optimized

Image Pull Policy:

  • Always: Kubernetes always pulls the image, even if it exists locally
  • Ensures you're running the latest version on every deployment
  • Critical for images tagged with latest or version tags that may be updated

Service Ports and Networking

Internal Service Ports (within Kubernetes cluster):

Service          Internal Port   Protocol   Purpose
Frontend         80              HTTP       React UI (nginx)
Nexus            8002            HTTP       FastAPI backend API
Schema Service   8000            HTTP       FAISS recommendations
PostgreSQL       5432            TCP        Database server
Trino            8080            HTTP       SQL query engine

External Access:

Access to Actyze is configured through Kubernetes Ingress, not direct port exposure:

ingress:
  enabled: true
  className: "nginx"
  hosts:
    - host: analytics.yourcompany.com
      paths:
        - path: /  # Routes to Frontend (port 80)
          pathType: Prefix
          service: frontend
        - path: /api  # Routes to Nexus (port 8002)
          pathType: Prefix
          service: nexus

Key Points:

  • Services communicate internally using Kubernetes DNS (e.g., http://dashboard-nexus:8002)
  • External users access via Ingress hostname (e.g., https://analytics.yourcompany.com)
  • TLS/SSL termination happens at Ingress level
  • No NodePort or LoadBalancer services required

For local development/testing, use port-forwarding:

kubectl port-forward -n actyze svc/dashboard-frontend 3000:80
kubectl port-forward -n actyze svc/dashboard-nexus 8002:8002

See Helm Setup Guide - Production Access for complete Ingress configuration including SSL/TLS.

LLM Provider Configuration

Actyze supports multiple LLM providers. Configure in values.yaml:

Anthropic Claude (Recommended):

modelStrategy:
  externalLLM:
    enabled: true
    provider: "anthropic"
    model: "claude-sonnet-4-20250514"
    baseUrl: "https://api.anthropic.com/v1/messages"
    authType: "x-api-key"
    extraHeaders: '{"anthropic-version": "2023-06-01"}'

OpenAI GPT-4:

modelStrategy:
  externalLLM:
    enabled: true
    provider: "openai"
    model: "gpt-4"
    baseUrl: "https://api.openai.com/v1/chat/completions"
    authType: "bearer"
    extraHeaders: ''

Perplexity:

modelStrategy:
  externalLLM:
    enabled: true
    provider: "perplexity"
    model: "sonar-reasoning-pro"
    baseUrl: "https://api.perplexity.ai/chat/completions"
    authType: "bearer"
    extraHeaders: ''

See LLM_PROVIDERS.md in the helm-charts repository for additional providers and setup details.

Resource Configuration

Configure resource requests and limits for production-grade performance. Choose the configuration tier that matches your deployment size and performance requirements.

Production Resource Tiers

Minimum Configuration (Development/Testing):

  • Best for: Development, testing, POC environments
  • Total cluster requirements: ~2 CPUs, ~4Gi RAM (requests)
  • Note: Not recommended for production workloads
  • Use file: values-production-optimized.yaml from helm-charts repo

Recommended Configuration (Production-Grade, No Bottlenecks):

  • Best for: Standard production deployments with good performance
  • Total cluster requirements: ~12 CPUs, ~24Gi RAM (requests)
  • This is the recommended production baseline
  • Handles large schemas, complex queries, and concurrent users
  • Default settings in values.yaml

Enterprise Configuration (High-Performance, Large Scale):

  • Best for: Enterprise deployments, very large data, high concurrency
  • Total cluster requirements: ~27 CPUs, ~54Gi RAM (requests)
  • Maximum performance with no bottlenecks
  • Supports hundreds of concurrent users and complex federated queries

Resource Specifications by Service

Frontend (React/Nginx) - Lightweight static content server

Tier          Replicas   CPU Request   Memory Request   CPU Limit   Memory Limit
Minimum       2          50m           64Mi             100m        128Mi
Recommended   2          150m          256Mi            300m        512Mi
Enterprise    3-5        200m          512Mi            500m        1Gi

Nexus (FastAPI Backend) - API orchestration and LLM integration

Tier          Replicas   CPU Request   Memory Request   CPU Limit   Memory Limit
Minimum       2          150m          256Mi            300m        512Mi
Recommended   3          500m          1Gi              1000m       2Gi
Enterprise    5-10       1000m         2Gi              2000m       4Gi

Schema Service (FAISS + Embeddings) - CPU-intensive similarity search and NLP

Tier          Replicas   CPU Request   Memory Request   CPU Limit   Memory Limit
Minimum       1          500m          1Gi              1000m       2Gi
Recommended   1          4000m         6Gi              8000m       12Gi
Enterprise    1          8000m         12Gi             16000m      24Gi

Notes on Schema Service:

  • CPU-intensive for FAISS similarity search and sentence transformer embeddings
  • Requires high CPU allocation for fast schema recommendations (<100ms response time)
  • Memory scales with schema size (columns, tables, metadata)
  • Does not benefit from horizontal scaling (FAISS index is in-memory)

PostgreSQL (Operational Database) - User data, query history, metadata

Tier          Replicas   CPU Request   Memory Request   CPU Limit   Memory Limit   Storage
Minimum       1          100m          256Mi            250m        512Mi          10Gi
Recommended   1          500m          2Gi              1000m       4Gi            50Gi
Enterprise    1          1000m         4Gi              2000m       8Gi            100Gi

Trino (Query Engine) - Distributed SQL engine for federated queries

Tier          Replicas   CPU Request   Memory Request   CPU Limit   Memory Limit   JVM Heap
Minimum       1          1000m         2Gi              2000m       3Gi            1.5G
Recommended   1          6000m         12Gi             8000m       16Gi           9G
Enterprise    1          12000m        24Gi             16000m      32Gi           18G

Notes on Trino:

  • Memory-intensive for query processing, joins, and aggregations
  • CPU-intensive for parsing, planning, and execution
  • JVM heap should be 75% of memory request
  • For large datasets and complex queries, more resources = significantly better performance
  • Consider Trino workers (additional replicas) for distributed processing of very large queries
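The 75% heap rule above can be computed mechanically from the memory request. A small illustrative sketch (the helper name is mine, not part of the charts):

```python
# Sketch: derive Trino's JVM max heap from its Kubernetes memory request,
# following the "heap = 75% of memory request" rule above.

def trino_heap_from_request(request_gi: float, fraction: float = 0.75) -> str:
    """Return a JVM heap size string for a given memory request in Gi."""
    heap_gi = request_gi * fraction
    # Express sub-2Gi heaps in MiB so the minimum tier gets "1536M", not "1.5G"
    if heap_gi < 2:
        return f"{int(heap_gi * 1024)}M"
    return f"{heap_gi:g}G"

print(trino_heap_from_request(2))    # minimum tier     -> 1536M
print(trino_heap_from_request(12))   # recommended tier -> 9G
print(trino_heap_from_request(24))   # enterprise tier  -> 18G
```

The three outputs match the maxHeapSize values used in the tier examples below (1536M, 9G, 18G).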

Example: Minimum Configuration (Development/Testing Only)

# Frontend
frontend:
  replicaCount: 2
  resources:
    requests:
      memory: "64Mi"
      cpu: "50m"
    limits:
      memory: "128Mi"
      cpu: "100m"

# Nexus
nexus:
  replicaCount: 2
  resources:
    requests:
      memory: "256Mi"
      cpu: "150m"
    limits:
      memory: "512Mi"
      cpu: "300m"

# Schema Service
schemaService:
  replicaCount: 1
  resources:
    requests:
      memory: "1Gi"
      cpu: "500m"
    limits:
      memory: "2Gi"
      cpu: "1000m"
  storage:
    size: 2Gi
  probes:
    liveness:
      initialDelaySeconds: 120

# PostgreSQL
postgres:
  resources:
    requests:
      memory: "256Mi"
      cpu: "100m"
    limits:
      memory: "512Mi"
      cpu: "250m"
  persistence:
    size: 10Gi

# Trino
trino:
  replicaCount: 1
  resources:
    requests:
      memory: "2Gi"
      cpu: "1000m"
    limits:
      memory: "3Gi"
      cpu: "2000m"
  jvm:
    maxHeapSize: "1536M"  # 75% of 2Gi

# Note: This configuration will have performance limitations
# Use only for development, testing, or POC environments
Example: Recommended Configuration (Production-Grade)

# Frontend - Handles static content and routing
frontend:
  replicaCount: 2
  resources:
    requests:
      memory: "256Mi"
      cpu: "150m"
    limits:
      memory: "512Mi"
      cpu: "300m"

# Nexus - API orchestration with LLM integration
nexus:
  replicaCount: 3  # HA with 3 replicas
  resources:
    requests:
      memory: "1Gi"
      cpu: "500m"
    limits:
      memory: "2Gi"
      cpu: "1000m"

# Schema Service - CPU-intensive FAISS and embeddings
schemaService:
  replicaCount: 1
  resources:
    requests:
      memory: "6Gi"    # Large memory for schema metadata
      cpu: "4000m"     # 4 CPUs for fast similarity search
    limits:
      memory: "12Gi"   # Burst capacity for large schemas
      cpu: "8000m"     # 8 CPUs for peak performance
  storage:
    size: 10Gi
  probes:
    liveness:
      initialDelaySeconds: 300
    readiness:
      initialDelaySeconds: 60

# PostgreSQL - Operational database
postgres:
  resources:
    requests:
      memory: "2Gi"
      cpu: "500m"
    limits:
      memory: "4Gi"
      cpu: "1000m"
  persistence:
    size: 50Gi  # Sufficient for production metadata and query history

# Trino - Query engine for federated queries
trino:
  replicaCount: 1
  resources:
    requests:
      memory: "12Gi"  # Large memory for query processing
      cpu: "6000m"    # 6 CPUs for complex queries
    limits:
      memory: "16Gi"  # Burst capacity for heavy queries
      cpu: "8000m"    # 8 CPUs for peak performance
  jvm:
    maxHeapSize: "9G"  # 75% of 12Gi

Example: Enterprise Configuration (Maximum Performance)

# Frontend - High availability for many concurrent users
frontend:
  replicaCount: 3
  resources:
    requests:
      memory: "512Mi"
      cpu: "200m"
    limits:
      memory: "1Gi"
      cpu: "500m"

# Nexus - Scaled for high concurrency and throughput
nexus:
  replicaCount: 5  # High availability with 5 replicas
  resources:
    requests:
      memory: "2Gi"
      cpu: "1000m"  # 1 CPU per replica
    limits:
      memory: "4Gi"
      cpu: "2000m"  # 2 CPUs burst capacity
  env:
    cache:
      queryMaxSize: 500  # Larger cache for high traffic
      queryTtl: 3600
      llmMaxSize: 1000
      llmTtl: 14400

# Schema Service - Maximum CPU for instant recommendations
schemaService:
  replicaCount: 1
  resources:
    requests:
      memory: "12Gi"   # Large memory for extensive schemas
      cpu: "8000m"     # 8 CPUs for sub-50ms response times
    limits:
      memory: "24Gi"   # Burst capacity for very large schemas
      cpu: "16000m"    # 16 CPUs for peak performance
  storage:
    size: 20Gi
  probes:
    liveness:
      initialDelaySeconds: 300
    readiness:
      initialDelaySeconds: 60

# PostgreSQL - High-performance database
postgres:
  resources:
    requests:
      memory: "4Gi"
      cpu: "1000m"
    limits:
      memory: "8Gi"
      cpu: "2000m"
  persistence:
    size: 100Gi  # Large storage for extensive query history

# Trino - Maximum performance for complex federated queries
trino:
  replicaCount: 1  # Consider adding workers for distributed processing
  resources:
    requests:
      memory: "24Gi"   # Large memory for complex joins and aggregations
      cpu: "12000m"    # 12 CPUs for fast query execution
    limits:
      memory: "32Gi"   # Burst capacity for very complex queries
      cpu: "16000m"    # 16 CPUs for peak performance
  jvm:
    maxHeapSize: "18G"  # 75% of 24Gi
    additionalOptions:
      - "--add-opens=java.base/java.nio=ALL-UNNAMED"
      - "-XX:+UseG1GC"
      - "-XX:G1HeapRegionSize=32M"

Cluster Size Calculator

Minimum Configuration Total:

  • CPU Requests: ~2.0 CPUs
  • Memory Requests: ~4.0Gi
  • CPU Limits: ~4.05 CPUs
  • Memory Limits: ~6.75Gi
  • Recommended cluster: 1-2 nodes × 4 CPU × 8Gi RAM
  • Use case: Development, testing, POC only

Recommended Configuration Total (Production-Grade):

  • CPU Requests: ~12.65 CPUs (4 CPUs Schema + 6 CPUs Trino + 1.5 CPUs Nexus + 0.5 CPU Postgres + 0.3 CPU Frontend + 0.35 CPU system)
  • Memory Requests: ~24.25Gi (6Gi Schema + 12Gi Trino + 3Gi Nexus + 2Gi Postgres + 0.5Gi Frontend + ~0.75Gi system)
  • CPU Limits: ~21.3 CPUs
  • Memory Limits: ~34.5Gi
  • Recommended cluster: 4-5 nodes × 8 CPU × 16Gi RAM OR 3 nodes × 16 CPU × 32Gi RAM
  • Use case: Standard production with no performance bottlenecks

Enterprise Configuration Total (Maximum Performance):

  • CPU Requests: ~26.7 CPUs (8 CPUs Schema + 12 CPUs Trino + 5 CPUs Nexus + 1 CPU Postgres + 0.6 CPU Frontend + 0.1 CPU system)
  • Memory Requests: ~54.6Gi (12Gi Schema + 24Gi Trino + 10Gi Nexus + 4Gi Postgres + 2.56Gi Frontend + ~2Gi system)
  • CPU Limits: ~37 CPUs
  • Memory Limits: ~69Gi
  • Recommended cluster: 3 nodes × 16 CPU × 32Gi RAM OR 2 nodes × 32 CPU × 64Gi RAM
  • Use case: Enterprise deployments, very large data, hundreds of concurrent users
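The tier totals can be sanity-checked by summing per-pod requests times replica counts from the service tables earlier in this page. An illustrative script for the Minimum tier (the data structure is mine; the numbers mirror the tables):

```python
# Sketch: recompute Minimum-tier request totals from the per-service tables
# (replicas x per-pod request). Illustrative only, not part of the charts.

minimum_tier = {
    # service: (replicas, cpu_millicores, memory_mib)
    "frontend":       (2,   50,   64),
    "nexus":          (2,  150,  256),
    "schema-service": (1,  500, 1024),
    "postgres":       (1,  100,  256),
    "trino":          (1, 1000, 2048),
}

total_cpu = sum(r * cpu for r, cpu, _ in minimum_tier.values()) / 1000
total_mem = sum(r * mem for r, _, mem in minimum_tier.values()) / 1024

print(f"CPU requests:    {total_cpu:.2f} cores")  # 2.00
print(f"Memory requests: {total_mem:.3f} Gi")     # 3.875 (doc rounds to ~4Gi)
```

Swapping in the Recommended or Enterprise rows reproduces the other tier totals (before the small system overhead allowance).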

Autoscaling Configuration

Enable horizontal pod autoscaling for dynamic traffic. Autoscaling is recommended for production deployments to handle traffic spikes efficiently.

Minimum Configuration (Conservative):

autoscaling:
  nexus:
    enabled: true
    minReplicas: 2
    maxReplicas: 4
    targetCPUUtilizationPercentage: 80  # Scale less aggressively
  frontend:
    enabled: true
    minReplicas: 2
    maxReplicas: 3
    targetCPUUtilizationPercentage: 80
  schemaService:
    enabled: false  # Typically doesn't benefit from scaling

Recommended Configuration (Production Standard):

autoscaling:
  nexus:
    enabled: true
    minReplicas: 3
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70  # Scale proactively
  frontend:
    enabled: true
    minReplicas: 2
    maxReplicas: 5
    targetCPUUtilizationPercentage: 70
  schemaService:
    enabled: false  # Does not benefit from horizontal scaling

Enterprise Configuration (Maximum Performance):

autoscaling:
  nexus:
    enabled: true
    minReplicas: 5
    maxReplicas: 20  # High scale-out capacity
    targetCPUUtilizationPercentage: 60  # Very aggressive scaling
  frontend:
    enabled: true
    minReplicas: 3
    maxReplicas: 10
    targetCPUUtilizationPercentage: 60
  schemaService:
    enabled: false

Notes:

  • Schema Service does not benefit from horizontal scaling (FAISS index is in-memory, not distributed)
  • For very large schemas, scale vertically (more CPU/memory) not horizontally
  • Lower targetCPUUtilizationPercentage = more aggressive scaling = better performance but higher cost
  • Enterprise tier scales early to maintain consistently low response times under any load
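The effect of targetCPUUtilizationPercentage follows from the standard Kubernetes HPA formula, desiredReplicas = ceil(currentReplicas × currentUtilization / target), clamped to the min/max bounds. A small sketch of that calculation (the helper is illustrative, not Actyze code):

```python
# Sketch: how the Kubernetes HPA derives a replica count from CPU utilization.
# A lower target percentage makes the same load scale out earlier.

from math import ceil

def desired_replicas(current: int, utilization: int, target: int,
                     lo: int, hi: int) -> int:
    desired = ceil(current * utilization / target)
    return max(lo, min(hi, desired))  # clamp to minReplicas/maxReplicas

# Nexus running 3 replicas at 85% average CPU:
print(desired_replicas(3, 85, target=70, lo=3, hi=10))  # recommended target -> 4
print(desired_replicas(3, 85, target=60, lo=3, hi=10))  # enterprise target  -> 5
```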

Storage Configuration

Configure persistent volumes:

persistence:
  postgres:
    enabled: true
    storageClass: "standard"
    size: "10Gi"

  schemaService:
    enabled: true
    storageClass: "standard"
    size: "5Gi"

Operational Configuration

Configure timeouts, caching, connection pools, and other operational parameters for production performance.

Cache Configuration

Actyze uses in-memory caching to reduce load on databases and LLM APIs. Configure cache sizes and TTLs based on your workload:

Development/Testing:

nexus:
  env:
    cache:
      enabled: true
      type: "memory"
      queryMaxSize: 100  # Number of query results to cache
      queryTtl: 1800     # 30 minutes
      llmMaxSize: 200    # Number of LLM responses to cache
      llmTtl: 7200       # 2 hours (LLM calls are expensive)

Recommended (Production):

nexus:
  env:
    cache:
      enabled: true
      type: "memory"
      queryMaxSize: 1000    # Larger cache for high traffic
      queryTtl: 3600        # 1 hour
      llmMaxSize: 500       # More LLM response caching
      llmTtl: 14400         # 4 hours
      schemaMaxSize: 1000   # FAISS schema recommendations
      schemaTtl: 7200       # 2 hours
      metadataMaxSize: 500  # Schema metadata
      metadataTtl: 3600     # 1 hour

Enterprise (High Performance):

nexus:
  env:
    cache:
      enabled: true
      type: "memory"
      queryMaxSize: 5000     # Very large cache
      queryTtl: 7200         # 2 hours
      llmMaxSize: 2000       # Extensive LLM caching
      llmTtl: 28800          # 8 hours
      schemaMaxSize: 5000    # Large schema cache
      schemaTtl: 14400       # 4 hours
      metadataMaxSize: 2000  # Extensive metadata cache
      metadataTtl: 7200      # 2 hours

Cache Guidelines:

  • Query Cache: Caches SQL query results. Higher values reduce database load but use more memory.
  • LLM Cache: Caches LLM API responses. Essential for cost reduction (LLM calls are expensive).
  • Schema Cache: Caches FAISS similarity search results. Reduces load on Schema Service.
  • TTL (Time-To-Live): Balance between freshness and performance. Longer TTL = fewer API/DB calls but potentially stale data.
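The semantics behind the maxSize/TTL knobs above can be sketched as a tiny in-memory cache. This is not Actyze's implementation, just an illustration of the behavior the queryMaxSize/queryTtl settings control (bounded size with eviction, entries expiring after the TTL):

```python
# Sketch: a minimal max-size + TTL in-memory cache, illustrating the
# semantics of the queryMaxSize/queryTtl settings above.

import time
from collections import OrderedDict

class TTLCache:
    def __init__(self, max_size: int, ttl: float):
        self.max_size, self.ttl = max_size, ttl
        self._data = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._data.pop(key, None)  # drop missing/expired entry
            return None
        return entry[1]

    def set(self, key, value):
        if len(self._data) >= self.max_size:
            self._data.popitem(last=False)  # evict the oldest entry
        self._data[key] = (time.monotonic() + self.ttl, value)

cache = TTLCache(max_size=1000, ttl=3600)  # queryMaxSize / queryTtl
cache.set("SELECT 1", [[1]])
print(cache.get("SELECT 1"))  # [[1]] until the TTL expires, then None
```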

Timeout Configuration

Configure timeouts to prevent hung requests and ensure system responsiveness:

Development/Testing:

nexus:
  env:
    # SQL Execution Timeouts
    # defaultTimeoutSeconds also drives the frontend HTTP timeout (+30s buffer)
    sqlExecution:
      defaultTimeoutSeconds: 120  # Query execution timeout (frontend waits 150s)
      defaultMaxResults: 100      # Max rows returned

    # External Service Timeouts
    schemaServiceTimeout: 10  # Schema service calls
    llmServiceTimeout: 120    # LLM API calls
    trinoTimeout: 120         # Trino query timeout

# Ingress Timeouts — must exceed defaultTimeoutSeconds + buffer
ingress:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"  # 5 minutes
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"

Recommended (Production):

nexus:
  env:
    # SQL Execution Timeouts
    # defaultTimeoutSeconds also drives the frontend HTTP timeout (+30s buffer)
    sqlExecution:
      defaultTimeoutSeconds: 120  # Allow complex queries (frontend waits 150s)
      defaultMaxResults: 1000     # More results for analytics

    # External Service Timeouts
    schemaServiceTimeout: 15  # FAISS can be CPU-intensive
    llmServiceTimeout: 120    # LLMs can be slow
    trinoTimeout: 300         # Complex federated queries need time

# Ingress Timeouts — must exceed defaultTimeoutSeconds + buffer
ingress:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"  # 10 minutes
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "120"

Enterprise (Long-Running Queries):

nexus:
  env:
    # SQL Execution Timeouts
    sqlExecution:
      defaultTimeoutSeconds: 300  # 5 minutes for complex analytics
      defaultMaxResults: 10000    # Large result sets

    # External Service Timeouts
    schemaServiceTimeout: 30  # More time for large schemas
    llmServiceTimeout: 120    # Complex LLM reasoning
    trinoTimeout: 600         # 10 minutes for very complex queries

# Ingress Timeouts
ingress:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "1800"  # 30 minutes
    nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "300"

Timeout Guidelines:

  • SQL Timeouts: Set based on expected query complexity. Too short = failed queries, too long = hung connections.
  • LLM Timeouts: LLM APIs can be slow during high demand. Set conservatively.
  • Ingress Timeouts: Must be >= longest expected API call. Important for file uploads and long-running queries.
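The timeout chain can be verified programmatically: the ingress read timeout must cover the slowest downstream wait, i.e. the SQL timeout plus the frontend's 30s buffer, and the Trino timeout. A small illustrative check (function name and structure are mine):

```python
# Sketch: verify the timeout chain described above. The ingress
# proxy-read-timeout must be >= the longest expected backend wait.

def timeouts_consistent(sql_timeout: int, trino_timeout: int,
                        ingress_read_timeout: int, buffer: int = 30) -> bool:
    # Frontend waits sql_timeout + buffer; Trino may run up to trino_timeout.
    longest_wait = max(sql_timeout + buffer, trino_timeout)
    return ingress_read_timeout >= longest_wait

# Recommended tier: sql=120 (frontend waits 150s), trino=300, ingress=600
print(timeouts_consistent(120, 300, 600))  # True

# Misconfiguration: enterprise-style backend timeouts behind a 300s ingress
print(timeouts_consistent(300, 600, 300))  # False -- queries would be cut off
```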

Database Connection Pool

Configure PostgreSQL connection pooling for optimal performance:

Development/Testing:

nexus:
  env:
    postgres:
      poolSize: 10       # Connections per Nexus replica
      maxOverflow: 10    # Additional connections during spikes
      poolTimeout: 30    # Seconds to wait for connection
      poolRecycle: 3600  # Recycle connections after 1 hour

Recommended (Production):

nexus:
  env:
    postgres:
      poolSize: 20     # Base connection pool
      maxOverflow: 30  # Allow burst traffic
      poolTimeout: 30
      poolRecycle: 3600
      poolPrePing: true  # Verify connections before use

Enterprise (High Concurrency):

nexus:
  env:
    postgres:
      poolSize: 50       # Large pool for high concurrency
      maxOverflow: 50    # Substantial overflow capacity
      poolTimeout: 60    # More patient during load spikes
      poolRecycle: 1800  # Recycle more frequently
      poolPrePing: true

Connection Pool Guidelines:

  • Pool Size: Base connections maintained per Nexus replica. Total = poolSize × replicas.
  • Max Overflow: Additional connections during traffic spikes. Prevents connection exhaustion.
  • Pool Timeout: How long to wait for an available connection. Too short = connection errors.
  • Calculate Total: With 3 Nexus replicas, poolSize=20, maxOverflow=30 = max 150 connections.
  • PostgreSQL max_connections: Ensure PostgreSQL max_connections > total pool capacity.
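The "Calculate Total" guideline above is a one-line computation, sketched here so it can be reused when sizing max_connections (the helper is illustrative):

```python
# Sketch: worst-case PostgreSQL connection demand from the pool settings
# above. PostgreSQL's max_connections must exceed this total.

def max_connections_needed(replicas: int, pool_size: int, max_overflow: int) -> int:
    # Each Nexus replica can open up to poolSize + maxOverflow connections.
    return replicas * (pool_size + max_overflow)

print(max_connections_needed(3, 20, 30))  # recommended tier -> 150
print(max_connections_needed(5, 50, 50))  # enterprise tier  -> 500
```

For the enterprise numbers, remember that autoscaling can push Nexus beyond its minimum replica count, so size max_connections against maxReplicas, not minReplicas.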

Retry & Circuit Breaker

Configure retry logic for transient failures:

Recommended (Production):

nexus:
  env:
    # Schema Service Retries
    schemaServiceRetries: 3
    schemaServiceRetryDelay: 1    # Seconds between retries
    schemaServiceRetryBackoff: 2  # Exponential backoff multiplier

    # LLM API Retries
    llmServiceRetries: 3
    llmServiceRetryDelay: 2
    llmServiceRetryBackoff: 2

    # Trino Query Retries
    trinoRetries: 2
    trinoRetryDelay: 5

Enterprise (High Reliability):

nexus:
  env:
    # More aggressive retries for mission-critical operations
    schemaServiceRetries: 5
    schemaServiceRetryDelay: 1
    schemaServiceRetryBackoff: 2
    schemaServiceCircuitBreakerThreshold: 5  # Open circuit after 5 failures
    schemaServiceCircuitBreakerTimeout: 60   # Try again after 60s

    llmServiceRetries: 5
    llmServiceRetryDelay: 2
    llmServiceRetryBackoff: 2
    llmServiceCircuitBreakerThreshold: 5
    llmServiceCircuitBreakerTimeout: 120

    trinoRetries: 3
    trinoRetryDelay: 5

Rate Limiting

Configure rate limiting for API protection:

Ingress Rate Limiting:

ingress:
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "100"             # Requests per second per IP
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "5"  # Allow bursts up to 5x
    nginx.ingress.kubernetes.io/limit-connections: "50"      # Concurrent connections per IP

Application-Level Rate Limiting:

nexus:
  env:
    rateLimit:
      enabled: true
      queriesPerMinute: 60  # Per user
      queriesPerHour: 1000
      llmCallsPerHour: 100  # Expensive operations

Query Result Limits

Prevent memory exhaustion from large result sets:

nexus:
  env:
    sqlExecution:
      defaultMaxResults: 1000    # Default row limit
      maxMaxResults: 100000      # Absolute maximum
      streamingThreshold: 10000  # Stream results above this size

File Upload Limits

Configure file upload limits for CSV/Excel features:

ingress:
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"  # Max upload size

nexus:
  env:
    fileUpload:
      maxSize: 104857600  # 100MB in bytes
      allowedExtensions: [".csv", ".xlsx", ".xls"]
      maxRows: 1000000    # 1 million rows max
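The checks those settings imply can be sketched as follows. This is not Actyze's upload handler; the function and messages are illustrative, but the constants mirror the values above (104857600 = 100 × 1024 × 1024 bytes):

```python
# Sketch: the validation implied by the fileUpload settings above.

MAX_SIZE = 104_857_600            # 100MB = 100 * 1024 * 1024 bytes
ALLOWED = {".csv", ".xlsx", ".xls"}

def check_upload(filename: str, size_bytes: int) -> str:
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in ALLOWED:
        return f"rejected: extension {ext or '(none)'} not allowed"
    if size_bytes > MAX_SIZE:
        return "rejected: file exceeds 100MB limit"
    return "accepted"

print(check_upload("sales.csv", 5_000_000))     # accepted
print(check_upload("dump.parquet", 5_000_000))  # rejected (extension)
print(check_upload("big.xlsx", 200_000_000))    # rejected (size)
```

Keep the ingress proxy-body-size and the application maxSize in sync; if the ingress limit is smaller, uploads fail at the proxy before the application check ever runs.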

Complete Production Example

nexus:
  replicaCount: 3
  resources:
    requests:
      memory: "1Gi"
      cpu: "500m"
    limits:
      memory: "2Gi"
      cpu: "1000m"
  env:
    debug: false
    logLevel: "INFO"

    # Cache Configuration
    cache:
      enabled: true
      queryMaxSize: 1000
      queryTtl: 3600
      llmMaxSize: 500
      llmTtl: 14400
      schemaMaxSize: 1000
      schemaTtl: 7200

    # Timeout Configuration
    sqlExecution:
      defaultTimeoutSeconds: 60
      defaultMaxResults: 1000
    schemaServiceTimeout: 15
    llmServiceTimeout: 60
    trinoTimeout: 120

    # Connection Pool
    postgres:
      poolSize: 20
      maxOverflow: 30
      poolTimeout: 30
      poolRecycle: 3600
      poolPrePing: true

    # Retry Logic
    schemaServiceRetries: 3
    schemaServiceRetryDelay: 1
    llmServiceRetries: 3
    llmServiceRetryDelay: 2
    trinoRetries: 2

    # Rate Limiting
    rateLimit:
      enabled: true
      queriesPerMinute: 60
      llmCallsPerHour: 100

# Ingress Configuration
ingress:
  enabled: true
  annotations:
    # Timeouts
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "120"

    # Rate Limiting
    nginx.ingress.kubernetes.io/limit-rps: "100"
    nginx.ingress.kubernetes.io/limit-connections: "50"

    # File Upload
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"

Ingress Setup

Basic Ingress

ingress:
  enabled: true
  className: "nginx"
  hosts:
    - host: analytics.yourcompany.com
      paths:
        - path: /
          pathType: Prefix
          service: frontend
        - path: /api
          pathType: Prefix
          service: nexus

SSL/TLS with cert-manager

ingress:
  enabled: true
  className: "nginx"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
  hosts:
    - host: analytics.yourcompany.com
      paths:
        - path: /
          pathType: Prefix
          service: frontend
        - path: /api
          pathType: Prefix
          service: nexus
  tls:
    - secretName: dashboard-tls
      hosts:
        - analytics.yourcompany.com

Cloud Provider Ingress

AWS ALB:

ingress:
  className: "alb"
  annotations:
    alb.ingress.kubernetes.io/scheme: "internet-facing"
    alb.ingress.kubernetes.io/target-type: "ip"
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'

GCP GKE:

ingress:
  className: "gce"
  annotations:
    kubernetes.io/ingress.global-static-ip-name: "dashboard-ip"

Azure AKS:

ingress:
  className: "nginx"
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/use-regex: "true"

Access Methods

Port Forwarding (Development)

For local testing without Ingress:

# Forward frontend port (internal port 80 → local port 3000)
kubectl port-forward -n actyze svc/dashboard-frontend 3000:80

# Forward API port (internal port 8002 → local port 8002)
kubectl port-forward -n actyze svc/dashboard-nexus 8002:8002

Open http://localhost:3000 for the UI and http://localhost:8002/docs for the API.

Ingress (Production)

Configure DNS to point to your ingress controller:

# Get ingress IP
kubectl get ingress -n actyze

# Add DNS A record
# analytics.yourcompany.com → <INGRESS_IP>

Access at https://analytics.yourcompany.com

Management Commands

Upgrade Deployment

# Pull latest chart changes
cd helm-charts
git pull origin main

# Upgrade release
helm upgrade dashboard ./dashboard \
  -f dashboard/values.yaml \
  -f dashboard/values-secrets.yaml \
  -n actyze

Rollback Deployment

# View release history
helm history dashboard -n actyze

# Rollback to previous version
helm rollback dashboard -n actyze

# Rollback to specific revision
helm rollback dashboard 2 -n actyze

View Configuration

# View current values
helm get values dashboard -n actyze

# View all values (including defaults)
helm get values dashboard -n actyze --all

# View rendered manifests
helm get manifest dashboard -n actyze

Uninstall

# Uninstall release (keeps PVCs)
helm uninstall dashboard -n actyze

# Delete namespace and all resources
kubectl delete namespace actyze

Troubleshooting

Pods Not Starting

# Check pod status
kubectl get pods -n actyze

# Describe pod for events
kubectl describe pod <pod-name> -n actyze

# Check logs
kubectl logs <pod-name> -n actyze

# Check previous logs (if crashed)
kubectl logs <pod-name> -n actyze --previous

Common issues:

  • ImagePullBackOff: Check image names and registry access
  • CrashLoopBackOff: Check logs for application errors
  • Pending: Check resource availability and storage class
  • OOMKilled: Increase memory limits

Database Connection Issues

# Check PostgreSQL pod
kubectl get pods -n actyze -l app=postgres

# View PostgreSQL logs
kubectl logs -n actyze deployment/dashboard-postgres

# Verify secret
kubectl get secret dashboard-secrets -n actyze -o yaml

# Test connection from Nexus pod
kubectl exec -it -n actyze deployment/dashboard-nexus -- \
  psql -h dashboard-postgres -U dashboard_user -d dashboard

LLM API Issues

# Check Nexus logs for API errors
kubectl logs -n actyze deployment/dashboard-nexus | grep -i "llm\|error"

# Verify environment variables
kubectl exec -n actyze deployment/dashboard-nexus -- env | grep EXTERNAL_LLM

# Check secret
kubectl get secret dashboard-secrets -n actyze -o jsonpath='{.data.EXTERNAL_LLM_API_KEY}' | base64 -d

Ingress Issues

# Check ingress status
kubectl get ingress -n actyze
kubectl describe ingress dashboard-ingress -n actyze

# Check ingress controller
kubectl get pods -n ingress-nginx

# View ingress logs
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller

# Test DNS resolution
nslookup analytics.yourcompany.com

Storage Issues

# Check PVCs
kubectl get pvc -n actyze

# Check storage class
kubectl get storageclass

# Describe PVC for events
kubectl describe pvc <pvc-name> -n actyze

# Check PV
kubectl get pv

Performance Issues

# Check resource usage
kubectl top pods -n actyze
kubectl top nodes

# Check HPA status
kubectl get hpa -n actyze
kubectl describe hpa dashboard-nexus-hpa -n actyze

# View metrics
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes

Monitoring

Health Checks

# Port-forward and test health endpoint (Nexus listens on 8002)
kubectl port-forward -n actyze svc/dashboard-nexus 8002:8002

curl http://localhost:8002/health

View Logs

# All pods
kubectl logs -n actyze -l app.kubernetes.io/name=dashboard --tail=100

# Specific service
kubectl logs -n actyze deployment/dashboard-nexus --tail=100 -f

# Multiple pods
kubectl logs -n actyze -l app=nexus --tail=50 --all-containers=true

Resource Usage

# Pod resource usage
kubectl top pods -n actyze

# Node resource usage
kubectl top nodes

# Detailed pod info
kubectl describe pod <pod-name> -n actyze

Production Best Practices

High Availability

services:
  nexus:
    replicas: 3
  frontend:
    replicas: 2
  schemaService:
    replicas: 2

# Enable pod disruption budgets
podDisruptionBudget:
  enabled: true
  minAvailable: 1

# Anti-affinity rules
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname

Security

# Use Kubernetes secrets
secrets:
  externalLLM:
    apiKey: "use-sealed-secrets-or-external-secrets-operator"

# Enable network policies
networkPolicy:
  enabled: true

# Use service accounts
serviceAccount:
  create: true
  name: "dashboard-sa"

# Security context
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

Monitoring & Logging

# Enable Prometheus metrics
monitoring:
  enabled: true
  serviceMonitor:
    enabled: true

# Configure logging
logging:
  level: "INFO"
  format: "json"

# Health checks
livenessProbe:
  enabled: true
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  enabled: true
  initialDelaySeconds: 10
  periodSeconds: 5

Backup & Recovery

# Backup PostgreSQL
kubectl exec -n actyze deployment/dashboard-postgres -- \
  pg_dump -U dashboard_user dashboard > backup.sql

# Backup PVC
kubectl get pvc -n actyze
# Use Velero or cloud provider backup solutions

# Restore database
kubectl exec -i -n actyze deployment/dashboard-postgres -- \
  psql -U dashboard_user dashboard < backup.sql

Database Migrations

Migrations run automatically via Kubernetes Job:

# Check migration status
kubectl get jobs -n actyze
kubectl logs -n actyze job/dashboard-db-migration

# Manual migration (if needed)
kubectl exec -it -n actyze deployment/dashboard-nexus -- \
  alembic upgrade head

# Rollback migration
kubectl exec -it -n actyze deployment/dashboard-nexus -- \
  alembic downgrade -1

Upgrading

Update Helm Charts

# Pull latest changes
cd helm-charts
git pull origin main

# Review changes
git log --oneline -10
git diff HEAD~5 dashboard/

# Upgrade deployment
helm upgrade dashboard ./dashboard \
  -f dashboard/values.yaml \
  -f dashboard/values-secrets.yaml \
  -n actyze

Update Docker Images

# Update image tags in values.yaml
services:
  nexus:
    image:
      tag: "v1.2.0"  # Update version

# Upgrade
helm upgrade dashboard ./dashboard \
  -f dashboard/values.yaml \
  -f dashboard/values-secrets.yaml \
  -n actyze

# Or force pod restart
kubectl rollout restart deployment/dashboard-nexus -n actyze

CI/CD Integration

GitHub Actions Example

name: Deploy to Kubernetes

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Configure kubectl
        uses: azure/k8s-set-context@v1
        with:
          kubeconfig: ${{ secrets.KUBE_CONFIG }}

      - name: Deploy with Helm
        run: |
          helm upgrade --install dashboard ./dashboard \
            -f dashboard/values.yaml \
            --set secrets.externalLLM.apiKey=${{ secrets.LLM_API_KEY }} \
            -n actyze \
            --create-namespace

Next Steps

Additional Resources

Support

For issues and questions, open an issue on the helm-charts repository: https://github.com/actyze/helm-charts