Predictive Intelligence

Create ML-powered predictions from your data — no data science knowledge required. Choose an outcome, select your data, and Actyze handles model selection, training, and deployment automatically.

Prediction Types

Forecast

Predict future values over a time horizon.

Best for: revenue projections, demand forecasting, traffic predictions, inventory planning.

Example: "Forecast daily revenue for the next 30 days"
→ AutoGluon trains an ensemble of ARIMA + ETS + Theta + tree models
→ Output: predicted values with confidence intervals for each day

Models used: AutoGluon TimeSeries (preferred), XGBoost

Supports covariates: forecast revenue considering ad spend, promotions, and seasonality together.

Classify

Predict which category a record falls into (binary or multi-class).

Best for: churn prediction, fraud detection, lead scoring, trial-to-paid conversion.

Example: "Which customers are likely to churn in the next 90 days?"
→ LightGBM trains a gradient-boosted classifier
→ Output: churn probability per customer (0.0–1.0)

Models used: LightGBM (preferred for >100K rows), XGBoost

Estimate

Predict a continuous numeric value.

Best for: customer lifetime value, pricing optimization, support ticket priority, maintenance cost.

Example: "Estimate the lifetime value of each customer"
→ XGBoost trains a regression model
→ Output: predicted CLV per customer with confidence range

Models used: LightGBM, XGBoost

Detect Anomalies

Find unusual data points — fully unsupervised, no labels needed.

Best for: suspicious transactions, sensor anomalies, billing outliers, process deviations.

Example: "Detect anomalous transactions in the payments table"
→ Isolation Forest scores each row
→ Output: anomaly flag + anomaly score per record

Models used: Isolation Forest (runs on the XGBoost worker)

Data Sources

Prediction models can be trained from two source types:

| Source | Description | Best for |
| --- | --- | --- |
| KPI | Use a materialized scheduled KPI table | Simple, reliable; data is already clean and on a schedule |
| Custom SQL | Write any SQL query, including multi-table JOINs via Trino | Complex features from multiple tables |

Creating a Pipeline

Step 1: Choose Outcome

Select the prediction type (Forecast, Classify, Estimate, or Detect). Then choose your data source — a KPI or custom SQL.

Step 2: Configure

  • Target column — the column you want to predict (not needed for anomaly detection)
  • Feature columns — columns the model should learn from (auto-selected if omitted)
  • Time column — required for forecasting
  • Forecast horizon — how many periods to predict ahead (forecasting only)
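The configuration rules above can be sketched as a small validation routine. The `PipelineConfig` class and its field names are hypothetical and are not part of any Actyze API; the sketch only encodes the rules stated in the list, and it assumes that setting a horizon on a non-forecast pipeline counts as an error.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PREDICTION_TYPES = {"forecast", "classify", "estimate", "detect"}


@dataclass
class PipelineConfig:
    """Hypothetical container for the Step 2 settings (not an Actyze API)."""
    prediction_type: str
    target_column: Optional[str] = None
    feature_columns: List[str] = field(default_factory=list)  # empty = auto-select
    time_column: Optional[str] = None
    forecast_horizon: Optional[int] = None

    def validate(self) -> List[str]:
        """Return a list of human-readable problems; an empty list means valid."""
        problems: List[str] = []
        if self.prediction_type not in PREDICTION_TYPES:
            problems.append(f"unknown prediction type: {self.prediction_type!r}")
            return problems
        # Anomaly detection is unsupervised, so it is the only type without a target.
        if self.prediction_type != "detect" and not self.target_column:
            problems.append("target column is required")
        if self.prediction_type == "forecast":
            if not self.time_column:
                problems.append("time column is required for forecasting")
            if not self.forecast_horizon or self.forecast_horizon < 1:
                problems.append("forecast horizon must be a positive number of periods")
        elif self.forecast_horizon is not None:
            # Assumption: a horizon on a non-forecast pipeline is treated as a mistake.
            problems.append("forecast horizon applies to forecasting only")
        return problems
```

For example, `PipelineConfig("forecast", target_column="revenue", time_column="day", forecast_horizon=30).validate()` returns an empty list, while a Classify pipeline with no target column reports one problem.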

Step 3: Train

Click Train and the system:

  1. Validates data quality — checks for sufficient rows, missing values, class imbalance
  2. Selects the best model — based on data size and prediction type
  3. Trains the model — hyperparameters tuned automatically
  4. Writes predictions — output table created in prediction_data schema
  5. Registers with FAISS — predictions become queryable via natural language

Data Quality Gates

Actyze validates your data before training begins:

Blocking Issues (training won't start)

  • Fewer than the minimum required rows (e.g., 60 data points for a 30-day forecast)
  • Target column has too many missing values
  • Extreme class imbalance (e.g., 99.5% / 0.5% split)

Warnings (training proceeds with caution)

  • Low class balance confidence
  • Time-aggregated data used for classification (may need entity-level columns)
  • Limited number of distinct classes

All issues are shown in the UI with specific, actionable guidance.
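A minimal sketch of how such gates might be evaluated is shown below. Every threshold is an assumption made for illustration: the two-rows-per-horizon-step rule is inferred from the "60 data points for a 30-day forecast" example, the 0.5% minority cutoff comes from the quoted 99.5% / 0.5% split, and the missing-value and warning limits are invented placeholders. The function name `check_quality` is hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# All thresholds are illustrative assumptions, not Actyze's real limits.
MIN_ROWS_PER_HORIZON_STEP = 2      # inferred from "60 data points for a 30-day forecast"
MAX_MISSING_TARGET_RATIO = 0.5     # assumed placeholder
MIN_MINORITY_CLASS_RATIO = 0.005   # the 99.5% / 0.5% split quoted above
WARN_MINORITY_CLASS_RATIO = 0.10   # assumed placeholder


def check_quality(
    target: Sequence[Optional[object]],
    prediction_type: str,
    horizon: Optional[int] = None,
) -> Tuple[List[str], List[str]]:
    """Return (blocking, warnings) for a target column; None marks a missing value."""
    blocking: List[str] = []
    warnings: List[str] = []
    n = len(target)
    present = [v for v in target if v is not None]

    # Gate 1: enough history for the requested forecast horizon.
    if prediction_type == "forecast" and horizon is not None:
        required = MIN_ROWS_PER_HORIZON_STEP * horizon
        if n < required:
            blocking.append(f"need at least {required} rows, found {n}")

    # Gate 2: share of missing target values.
    if n and (n - len(present)) / n > MAX_MISSING_TARGET_RATIO:
        blocking.append("target column has too many missing values")

    # Gate 3: class balance, measured by the rarest class's share.
    if prediction_type == "classify" and present:
        counts = Counter(present)
        minority = min(counts.values()) / len(present)
        if len(counts) < 2 or minority <= MIN_MINORITY_CLASS_RATIO:
            blocking.append("extreme class imbalance")
        elif minority < WARN_MINORITY_CLASS_RATIO:
            warnings.append("low class balance")
    return blocking, warnings
```

Returning blocking issues and warnings as separate lists mirrors the two severity levels described above: a caller would refuse to train when the first list is non-empty and proceed otherwise.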

ML Models

Actyze provides three ML workers, each running in its own container and scaling independently. The AutoGluon worker is optional.

XGBoost Worker

  • Tasks: classification, regression, anomaly detection (Isolation Forest)
  • When used: default for all prediction types, any data size
  • Docker image: prediction-worker-xgboost

LightGBM Worker

  • Tasks: classification, regression
  • When used: preferred for large datasets (>100K rows) for faster training
  • Docker image: prediction-worker-lightgbm

AutoGluon Worker

  • Tasks: time-series forecasting (univariate and multivariate)
  • When used: preferred for forecasting when deployed (superior accuracy via ensemble)
  • Docker image: prediction-worker-autogluon
  • Note: larger image size (~2GB), optional deployment

Model Selection Logic

Forecast?
→ AutoGluon deployed? → AutoGluon
→ else → XGBoost

Classify or Estimate?
→ Rows > 100K and LightGBM deployed? → LightGBM
→ else → XGBoost

Detect Anomalies?
→ Always XGBoost (Isolation Forest)
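The decision rules above translate directly into a small function. This is a sketch of the logic as documented, not Actyze's actual source; the function name and the string identifiers for prediction types and workers are illustrative.

```python
from typing import Set

LIGHTGBM_ROW_THRESHOLD = 100_000  # the ">100K rows" rule above


def select_worker(prediction_type: str, row_count: int, deployed: Set[str]) -> str:
    """Pick a worker name by following the documented decision rules.

    `deployed` is the set of worker names currently available.
    """
    if prediction_type == "forecast":
        return "autogluon" if "autogluon" in deployed else "xgboost"
    if prediction_type in ("classify", "estimate"):
        if row_count > LIGHTGBM_ROW_THRESHOLD and "lightgbm" in deployed:
            return "lightgbm"
        return "xgboost"
    if prediction_type == "detect":
        return "xgboost"  # this worker runs Isolation Forest
    raise ValueError(f"unknown prediction type: {prediction_type!r}")
```

Note that the threshold is strict: a dataset with exactly 100,000 rows still goes to XGBoost under this reading of ">100K".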

Training Triggers

| Trigger | Description | Best for |
| --- | --- | --- |
| After KPI Collection | Retrain automatically when the linked KPI gets new data | Production cadence: predictions stay current with the latest data |
| Scheduled | Retrain every N hours (1–720) | When you want a fixed retraining interval |
| Manual | Retrain on demand via the UI | Experimentation, one-off analysis |
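For the Scheduled trigger, the allowed range of 1 to 720 hours corresponds to intervals from one hour up to 30 days. The sketch below shows how the bound check and the next due time could be computed; `next_retrain_time` is a hypothetical helper, not an Actyze function.

```python
from datetime import datetime, timedelta

MIN_INTERVAL_HOURS = 1
MAX_INTERVAL_HOURS = 720  # 720 hours = 30 days


def next_retrain_time(last_run: datetime, interval_hours: int) -> datetime:
    """Return when the next scheduled retrain is due after `last_run`."""
    if not MIN_INTERVAL_HOURS <= interval_hours <= MAX_INTERVAL_HOURS:
        raise ValueError(
            f"interval must be between {MIN_INTERVAL_HOURS} and "
            f"{MAX_INTERVAL_HOURS} hours, got {interval_hours}"
        )
    return last_run + timedelta(hours=interval_hours)
```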

Accuracy Display

Actyze shows business-friendly accuracy instead of raw ML metrics:

| Type | Example display |
| --- | --- |
| Forecast | "Predictions within ±8% of actual values" |
| Classify | "Correctly identifies 92% of churned customers" |
| Estimate | "Estimates within ±$12.50 of actual values" |
| Detect | "Found 47 anomalies (10.2%) in 461 rows" |
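The sketch below shows one way such messages could be derived from standard metrics. The mapping is an assumption, since the page does not say which metric backs each message: here the Forecast message uses mean absolute percentage error, the Classify message uses recall on the positive class, and the Detect message uses a simple count and share. All function names are hypothetical.

```python
from typing import Sequence


def forecast_summary(actual: Sequence[float], predicted: Sequence[float]) -> str:
    """Mean absolute percentage error, skipping rows where the actual value is zero."""
    errors = [abs(p - a) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    mape = 100 * sum(errors) / len(errors)
    return f"Predictions within ±{mape:.0f}% of actual values"


def classify_summary(actual: Sequence, predicted: Sequence, positive_label, noun: str) -> str:
    """Recall on the positive class: the share of true positives that were caught."""
    positives = [(a, p) for a, p in zip(actual, predicted) if a == positive_label]
    caught = sum(1 for _, p in positives if p == positive_label)
    recall = 100 * caught / len(positives)
    return f"Correctly identifies {recall:.0f}% of {noun}"


def detect_summary(anomaly_count: int, row_count: int) -> str:
    """Anomaly count and its share of all scored rows."""
    share = 100 * anomaly_count / row_count
    return f"Found {anomaly_count} anomalies ({share:.1f}%) in {row_count} rows"
```

Calling `detect_summary(47, 461)` reproduces the Detect example above, since 47 / 461 is about 10.2%.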

Querying Predictions

Prediction output tables are stored in the prediction_data schema (e.g., pred_revenue_forecast_30d) and registered with FAISS. You can:

  1. Ask in natural language: "Show me the churn predictions" — the AI discovers the prediction table automatically
  2. Explore in Queries: click "Explore in Queries" on a completed run to open the full query page with charts and CSV export
  3. Get Recommendations: click "Get Recommendations" to have the LLM analyze results and suggest actionable next steps

Configuration

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| PREDICTION_WORKER_XGBOOST_URL | http://prediction-worker-xgboost:8000 | XGBoost worker endpoint |
| PREDICTION_WORKER_LIGHTGBM_URL | http://prediction-worker-lightgbm:8000 | LightGBM worker endpoint |
| PREDICTION_WORKER_AUTOGLUON_URL | http://prediction-worker-autogluon:8000 | AutoGluon worker endpoint |
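As an illustration of how these variables behave, the following sketch resolves each endpoint from the environment and falls back to the documented default when a variable is unset. The `worker_urls` helper is hypothetical; only the variable names and default values come from the table above.

```python
import os
from typing import Dict, Mapping, Optional

WORKER_URL_DEFAULTS = {
    "PREDICTION_WORKER_XGBOOST_URL": "http://prediction-worker-xgboost:8000",
    "PREDICTION_WORKER_LIGHTGBM_URL": "http://prediction-worker-lightgbm:8000",
    "PREDICTION_WORKER_AUTOGLUON_URL": "http://prediction-worker-autogluon:8000",
}


def worker_urls(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Resolve each worker endpoint, using the documented default when unset."""
    env = os.environ if env is None else env
    return {name: env.get(name, default) for name, default in WORKER_URL_DEFAULTS.items()}
```

Passing an explicit mapping instead of reading `os.environ` keeps the helper easy to test, for example `worker_urls({"PREDICTION_WORKER_LIGHTGBM_URL": "http://localhost:8011"})`.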

Helm Deployment

Each worker is an independently deployable Helm subchart:

# values.yaml
predictionWorkerXgboost:
  enabled: true
  replicas: 1
  resources:
    requests:
      cpu: 500m
      memory: 1Gi

predictionWorkerLightgbm:
  enabled: true
  replicas: 1

predictionWorkerAutogluon:
  enabled: false  # Enable for forecasting
  replicas: 1
  resources:
    requests:
      cpu: 1000m
      memory: 2Gi

Docker Compose

Workers are included in docker-compose.yml and start automatically:

prediction-worker-xgboost:
  build: ./docker/prediction-worker-xgboost
  ports:
    - "8010:8000"

prediction-worker-lightgbm:
  build: ./docker/prediction-worker-lightgbm
  ports:
    - "8011:8000"

prediction-worker-autogluon:
  build: ./docker/prediction-worker-autogluon
  ports:
    - "8012:8000"

Industry Use Cases

| Industry | Forecast | Classify | Estimate | Detect |
| --- | --- | --- | --- | --- |
| E-commerce | Demand forecasting, revenue projections | Customer churn, fraud detection | Customer lifetime value, lead scoring | Unusual transactions, pricing anomalies |
| SaaS | MRR/ARR forecasting, usage trends | Churn risk, trial-to-paid conversion | Expansion revenue, support ticket priority | Usage spikes, abnormal API patterns |
| Finance | Cash flow forecasting, market trends | Credit risk, transaction fraud | Portfolio value, risk scoring | Suspicious transactions, market anomalies |
| Healthcare | Patient volume, resource planning | Readmission risk, diagnosis classification | Treatment cost, length of stay | Abnormal lab results, billing outliers |
| Manufacturing | Production demand, supply planning | Quality defect detection, equipment failure | Maintenance cost, yield optimization | Sensor anomalies, process deviations |
| Logistics | Shipment volume, delivery time | Route optimization, delay risk | Freight cost, warehouse capacity | Delivery exceptions, fleet irregularities |