Predictive Intelligence

Create ML-powered predictions from your data — no data science knowledge required. Choose an outcome, select your data, and Actyze handles model selection, training, and deployment automatically.

Prediction Types

Forecast

Predict future values over a time horizon.

Best for: revenue projections, demand forecasting, traffic predictions, inventory planning.

Example: "Forecast daily revenue for the next 30 days"
→ AutoGluon trains an ensemble of ARIMA + ETS + Theta + tree models
→ Output: predicted values with confidence intervals for each day

Models used: AutoGluon TimeSeries (preferred), XGBoost

Supports covariates: forecast revenue considering ad spend, promotions, and seasonality together.

Classify

Predict which category a record falls into (binary or multi-class).

Best for: churn prediction, fraud detection, lead scoring, trial-to-paid conversion.

Example: "Which customers are likely to churn in the next 90 days?"
→ LightGBM trains a gradient-boosted classifier
→ Output: churn probability per customer (0.0–1.0)

Models used: LightGBM (preferred for >100K rows), XGBoost

Estimate

Predict a continuous numeric value.

Best for: customer lifetime value, pricing optimization, support ticket priority, maintenance cost.

Example: "Estimate the lifetime value of each customer"
→ XGBoost trains a regression model
→ Output: predicted CLV per customer with confidence range

Models used: LightGBM (preferred for >100K rows), XGBoost

Detect Anomalies

Find unusual data points — fully unsupervised, no labels needed.

Best for: suspicious transactions, sensor anomalies, billing outliers, process deviations.

Example: "Detect anomalous transactions in the payments table"
→ Isolation Forest scores each row
→ Output: anomaly flag + anomaly score per record

Models used: Isolation Forest (runs on the XGBoost worker)
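To make the scoring step concrete, here is a minimal pure-Python sketch of the Isolation Forest idea: rows that random splits isolate quickly get scores close to 1. It is a simplified illustration, not the code the worker runs; the function names, the tree representation, and the defaults (100 trees, samples of 256 rows) are assumptions chosen for the example.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected depth of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    dim = rng.randrange(len(rows[0]))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:  # identical values cannot be split on this feature
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([r for r in rows if r[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([r for r in rows if r[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(row, node, depth=0):
    """Depth at which the row lands, plus a correction for unsplit leaves."""
    if "size" in node:
        return depth + average_path_length(node["size"])
    child = node["left"] if row[node["dim"]] < node["split"] else node["right"]
    return path_length(row, child, depth + 1)


def anomaly_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Return one score per row in (0, 1]; higher means more anomalous."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = average_path_length(sample_size)
    scores = []
    for row in rows:
        mean_depth = sum(path_length(row, tree) for tree in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores
```

An anomaly flag can then be derived by thresholding the score; the cut-off is a tuning choice and is not specified here.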

Data Sources

Predictions can be trained from two source types:

Source	Description	Best for
KPI	Use a materialized scheduled KPI table	Simple, reliable — data is already clean and on a schedule
Custom SQL	Write any SQL query, including multi-table JOINs via Trino	Complex features from multiple tables
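As an illustration of a Custom SQL source, the query below joins two tables into one feature row per customer for a churn model. All catalog, schema, table, and column names are hypothetical; only the Trino SQL functions (`date_diff`, `coalesce`, `count`, `sum`) are real. It is held in a Python constant purely so it can be pasted into a script.

```python
# Hypothetical feature query for a churn pipeline (Trino SQL dialect).
# Every table and column name here is an assumption made for the example.
CHURN_FEATURES_SQL = """
SELECT
    c.customer_id,
    c.plan,
    date_diff('day', c.signup_date, current_date) AS tenure_days,
    count(i.invoice_id) AS invoice_count,
    coalesce(sum(i.amount), 0) AS total_billed,
    c.churned
FROM crm.public.customers AS c
LEFT JOIN billing.public.invoices AS i
    ON i.customer_id = c.customer_id
GROUP BY c.customer_id, c.plan, c.signup_date, c.churned
"""
```

In this sketch `churned` would be the target column and the remaining non-key columns the features.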

Creating a Pipeline

Step 1: Choose Outcome

Select the prediction type (Forecast, Classify, Estimate, or Detect). Then choose your data source — a KPI or custom SQL.

Step 2: Configure

Target column — the column you want to predict (not needed for anomaly detection)
Feature columns — columns the model should learn from (auto-selected if omitted)
Time column — required for forecasting
Forecast horizon — how many periods to predict ahead (forecasting only)
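The field rules above can be summarized as a small validation routine. This is a sketch under stated assumptions: `PipelineConfig`, `validate_config`, and the lowercase type names are hypothetical and do not describe an Actyze API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PREDICTION_TYPES = {"forecast", "classify", "estimate", "detect"}


@dataclass
class PipelineConfig:
    prediction_type: str
    target_column: Optional[str] = None
    feature_columns: List[str] = field(default_factory=list)  # empty means auto-select
    time_column: Optional[str] = None
    forecast_horizon: Optional[int] = None


def validate_config(cfg: PipelineConfig) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    if cfg.prediction_type not in PREDICTION_TYPES:
        return [f"unknown prediction type: {cfg.prediction_type!r}"]

    errors = []
    # Anomaly detection is unsupervised, so it is the only type without a target.
    if cfg.prediction_type != "detect" and not cfg.target_column:
        errors.append("target_column is required")

    if cfg.prediction_type == "forecast":
        if not cfg.time_column:
            errors.append("time_column is required for forecasting")
        if cfg.forecast_horizon is None or cfg.forecast_horizon < 1:
            errors.append("forecast_horizon must be a positive number of periods")
    elif cfg.forecast_horizon is not None:
        errors.append("forecast_horizon applies to forecasting only")

    return errors
```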

Step 3: Train

Click Train and the system:

Validates data quality — checks for sufficient rows, missing values, class imbalance
Selects the best model — based on data size and prediction type
Trains the model — hyperparameters tuned automatically
Writes predictions — output table created in prediction_data schema
Registers with FAISS — predictions become queryable via natural language

Data Quality Gates

Actyze validates your data before training begins:

Blocking Issues (training won't start)

Fewer than the minimum required rows (e.g., 60 data points for a 30-day forecast)
Target column has too many missing values
Extreme class imbalance (e.g., 99.5% / 0.5% split)

Warnings (training proceeds with caution)

Low class balance confidence
Time-aggregated data used for classification (may need entity-level columns)
Limited number of distinct classes

All issues are shown in the UI with specific, actionable guidance.
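The blocking checks can be pictured roughly as follows. The thresholds are assumptions made for this sketch: two rows per forecast period matches the 60-points-for-30-days example, while the missing-value and minority-class limits are placeholders, since the actual limits are not stated here.

```python
from collections import Counter
from typing import List, Optional, Sequence

ROWS_PER_HORIZON_STEP = 2        # assumption, consistent with 60 rows for 30 days
MAX_MISSING_TARGET_RATIO = 0.5   # hypothetical threshold
MIN_MINORITY_CLASS_RATIO = 0.01  # hypothetical; would block a 99.5% / 0.5% split


def blocking_issues(
    prediction_type: str,
    targets: Sequence[object],
    forecast_horizon: Optional[int] = None,
) -> List[str]:
    """Return reasons training should not start; None marks a missing target."""
    issues = []
    total = len(targets)
    present = [t for t in targets if t is not None]

    if prediction_type == "forecast" and forecast_horizon:
        required = ROWS_PER_HORIZON_STEP * forecast_horizon
        if total < required:
            issues.append(f"need at least {required} rows, found {total}")

    if total and (total - len(present)) / total > MAX_MISSING_TARGET_RATIO:
        issues.append("target column has too many missing values")

    if prediction_type == "classify" and present:
        counts = Counter(present)
        if len(counts) >= 2 and min(counts.values()) / len(present) < MIN_MINORITY_CLASS_RATIO:
            issues.append("extreme class imbalance")

    return issues
```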

ML Models

Actyze deploys three ML workers as separate containers. Each can be scaled independently.

XGBoost Worker

Tasks: classification, regression, anomaly detection (Isolation Forest), and forecasting when AutoGluon is not deployed
When used: default for all prediction types, any data size
Docker image: prediction-worker-xgboost

LightGBM Worker

Tasks: classification, regression
When used: preferred for large datasets (>100K rows) for faster training
Docker image: prediction-worker-lightgbm

AutoGluon Worker

Tasks: time-series forecasting (univariate and multivariate)
When used: preferred for forecasting when deployed (superior accuracy via ensemble)
Docker image: prediction-worker-autogluon
Note: larger image size (~2 GB), optional deployment

Model Selection Logic

Forecast?
  → AutoGluon deployed? → AutoGluon
  → else → XGBoost

Classify or Estimate?
  → Rows > 100K and LightGBM deployed? → LightGBM
  → else → XGBoost

Detect Anomalies?
  → Always XGBoost (Isolation Forest)
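
The decision tree above translates directly into a small function. The function name, the lowercase type and worker names, and the `deployed` set are illustrative assumptions; only the branching rules come from the logic above.

```python
from typing import Set

LARGE_DATASET_ROWS = 100_000


def select_worker(prediction_type: str, row_count: int, deployed: Set[str]) -> str:
    """Pick a worker name given the prediction type, data size, and deployed workers."""
    if prediction_type == "forecast":
        return "autogluon" if "autogluon" in deployed else "xgboost"
    if prediction_type in ("classify", "estimate"):
        if row_count > LARGE_DATASET_ROWS and "lightgbm" in deployed:
            return "lightgbm"
        return "xgboost"
    if prediction_type == "detect":
        return "xgboost"  # Isolation Forest runs on the XGBoost worker
    raise ValueError(f"unknown prediction type: {prediction_type!r}")
```

Note that the threshold is strict: a dataset of exactly 100K rows still goes to XGBoost.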

Training Triggers

Trigger	Description	Best for
After KPI Collection	Retrain automatically when the linked KPI gets new data	Production cadence — predictions stay current with latest data
Scheduled	Retrain every N hours (1–720)	When you want a fixed retraining interval
Manual	Retrain on demand via the UI	Experimentation, one-off analysis
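
For the Scheduled trigger, the allowed interval of 1 to 720 hours (720 hours is 30 days) can be checked before computing the next run. This is a hypothetical helper written for illustration, not the scheduler's own code.

```python
from datetime import datetime, timedelta

MIN_INTERVAL_HOURS = 1
MAX_INTERVAL_HOURS = 720  # 30 days


def next_scheduled_run(last_run: datetime, interval_hours: int) -> datetime:
    """Return the next retraining time, rejecting intervals outside 1-720 hours."""
    if not MIN_INTERVAL_HOURS <= interval_hours <= MAX_INTERVAL_HOURS:
        raise ValueError(
            f"interval must be between {MIN_INTERVAL_HOURS} and "
            f"{MAX_INTERVAL_HOURS} hours, got {interval_hours}"
        )
    return last_run + timedelta(hours=interval_hours)
```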

Accuracy Display

Actyze shows business-friendly accuracy instead of raw ML metrics:

Type	Example display
Forecast	"Predictions within ±8% of actual values"
Classify	"Correctly identifies 92% of churned customers"
Estimate	"Estimates within ±$12.50 of actual values"
Detect	"Found 47 anomalies (10.2%) in 461 rows"
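
One way to picture how such display strings could be derived is sketched below. The mapping to metrics is an assumption: it treats the Forecast figure as mean absolute percentage error and the Classify figure as recall on the positive class, neither of which is confirmed here. The function names are hypothetical.

```python
from typing import Sequence


def describe_forecast(actuals: Sequence[float], predictions: Sequence[float]) -> str:
    """Summarize forecast error as a mean absolute percentage (assumed metric)."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actuals, predictions) if a != 0]
    if not errors:
        raise ValueError("need at least one non-zero actual value")
    mape = 100.0 * sum(errors) / len(errors)
    return f"Predictions within ±{mape:.0f}% of actual values"


def describe_recall(true_positives: int, actual_positives: int, label: str) -> str:
    """Summarize the share of real positives the classifier caught (assumed metric)."""
    pct = 100.0 * true_positives / actual_positives
    return f"Correctly identifies {pct:.0f}% of {label}"


def describe_anomalies(anomaly_count: int, total_rows: int) -> str:
    """Summarize how many rows were flagged as anomalous."""
    pct = 100.0 * anomaly_count / total_rows
    return f"Found {anomaly_count} anomalies ({pct:.1f}%) in {total_rows} rows"
```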

Querying Predictions

Prediction output tables are stored in the prediction_data schema (e.g., pred_revenue_forecast_30d) and registered with FAISS. You can:

Ask in natural language: "Show me the churn predictions" — the AI discovers the prediction table automatically
Explore in Queries: click "Explore in Queries" on a completed run to open the full query page with charts and CSV export
Get Recommendations: click "Get Recommendations" to have the LLM analyze results and suggest actionable next steps
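
If you prefer to write SQL yourself, a query against a prediction table might be built as below. This is a sketch that assumes the prediction_data schema is reachable from your default catalog; the helper name and the identifier check are illustrative, and the output columns depend on the pipeline, so the example selects everything.

```python
import re


def prediction_table_query(table: str, limit: int = 100) -> str:
    """Build a simple SELECT for a prediction output table."""
    # Accept only plain lowercase identifiers to avoid injecting arbitrary SQL.
    if not re.fullmatch(r"[a-z_][a-z0-9_]*", table):
        raise ValueError(f"invalid table name: {table!r}")
    if limit < 1:
        raise ValueError("limit must be positive")
    return f"SELECT * FROM prediction_data.{table} LIMIT {int(limit)}"
```

For example, `prediction_table_query("pred_revenue_forecast_30d", 10)` produces a query for the first ten rows of the forecast table mentioned above.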

Configuration

Environment Variables

Variable	Default	Description
`PREDICTION_WORKER_XGBOOST_URL`	`http://prediction-worker-xgboost:8000`	XGBoost worker endpoint
`PREDICTION_WORKER_LIGHTGBM_URL`	`http://prediction-worker-lightgbm:8000`	LightGBM worker endpoint
`PREDICTION_WORKER_AUTOGLUON_URL`	`http://prediction-worker-autogluon:8000`	AutoGluon worker endpoint
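
A client that honors these variables could resolve the endpoints as follows. The variable names and defaults come from the table above; the helper itself is a hypothetical sketch.

```python
import os
from typing import Dict, Mapping, Optional

DEFAULT_WORKER_URLS = {
    "PREDICTION_WORKER_XGBOOST_URL": "http://prediction-worker-xgboost:8000",
    "PREDICTION_WORKER_LIGHTGBM_URL": "http://prediction-worker-lightgbm:8000",
    "PREDICTION_WORKER_AUTOGLUON_URL": "http://prediction-worker-autogluon:8000",
}


def worker_urls(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return each worker endpoint, preferring the environment over the default."""
    source = os.environ if env is None else env
    return {name: source.get(name, default) for name, default in DEFAULT_WORKER_URLS.items()}
```

Passing a mapping explicitly, such as `worker_urls({"PREDICTION_WORKER_LIGHTGBM_URL": "http://localhost:8011"})`, is convenient when testing against locally published ports.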

Helm Deployment

Each worker is an independently deployable Helm subchart:

# values.yaml
predictionWorkerXgboost:
  enabled: true
  replicas: 1
  resources:
    requests:
      cpu: 500m
      memory: 1Gi

predictionWorkerLightgbm:
  enabled: true
  replicas: 1

predictionWorkerAutogluon:
  enabled: false  # Enable for forecasting
  replicas: 1
  resources:
    requests:
      cpu: 1000m
      memory: 2Gi

Docker Compose

Workers are included in docker-compose.yml and start automatically:

# docker-compose.yml (excerpt)
services:
  prediction-worker-xgboost:
    build: ./docker/prediction-worker-xgboost
    ports:
      - "8010:8000"  # host port 8010 maps to container port 8000

  prediction-worker-lightgbm:
    build: ./docker/prediction-worker-lightgbm
    ports:
      - "8011:8000"

  prediction-worker-autogluon:
    build: ./docker/prediction-worker-autogluon
    ports:
      - "8012:8000"

Industry Use Cases

Industry	Forecast	Classify	Estimate	Detect
E-commerce	Demand forecasting, revenue projections	Customer churn, fraud detection	Customer lifetime value, lead scoring	Unusual transactions, pricing anomalies
SaaS	MRR/ARR forecasting, usage trends	Churn risk, trial-to-paid conversion	Expansion revenue, support ticket priority	Usage spikes, abnormal API patterns
Finance	Cash flow forecasting, market trends	Credit risk, transaction fraud	Portfolio value, risk scoring	Suspicious transactions, market anomalies
Healthcare	Patient volume, resource planning	Readmission risk, diagnosis classification	Treatment cost, length of stay	Abnormal lab results, billing outliers
Manufacturing	Production demand, supply planning	Quality defect detection, equipment failure	Maintenance cost, yield optimization	Sensor anomalies, process deviations
Logistics	Shipment volume, delivery time	Route optimization, delay risk	Freight cost, warehouse capacity	Delivery exceptions, fleet irregularities
