Predictive Intelligence
Create ML-powered predictions from your data — no data science knowledge required. Choose an outcome, select your data, and Actyze handles model selection, training, and deployment automatically.
Prediction Types
Forecast
Predict future values over a time horizon.
Best for: revenue projections, demand forecasting, traffic predictions, inventory planning.
Example: "Forecast daily revenue for the next 30 days"
→ AutoGluon trains an ensemble of ARIMA + ETS + Theta + tree models
→ Output: predicted values with confidence intervals for each day
Models used: AutoGluon TimeSeries (preferred), XGBoost
Supports covariates: forecast revenue considering ad spend, promotions, and seasonality together.
Classify
Predict which category a record falls into (binary or multi-class).
Best for: churn prediction, fraud detection, lead scoring, trial-to-paid conversion.
Example: "Which customers are likely to churn in the next 90 days?"
→ LightGBM trains a gradient-boosted classifier
→ Output: churn probability per customer (0.0–1.0)
Models used: LightGBM (preferred for >100K rows), XGBoost
Estimate
Predict a continuous numeric value.
Best for: customer lifetime value, pricing optimization, support ticket priority, maintenance cost.
Example: "Estimate the lifetime value of each customer"
→ XGBoost trains a regression model
→ Output: predicted CLV per customer with confidence range
Models used: LightGBM, XGBoost
Detect Anomalies
Find unusual data points — fully unsupervised, no labels needed.
Best for: suspicious transactions, sensor anomalies, billing outliers, process deviations.
Example: "Detect anomalous transactions in the payments table"
→ Isolation Forest scores each row
→ Output: anomaly flag + anomaly score per record
Models used: Isolation Forest (runs on the XGBoost worker)
Data Sources
Predictions can be trained from two source types:
| Source | Description | Best for |
|---|---|---|
| KPI | Use a materialized scheduled KPI table | Simple, reliable — data is already clean and on a schedule |
| Custom SQL | Write any SQL query, including multi-table JOINs via Trino | Complex features from multiple tables |
Creating a Pipeline
Step 1: Choose Outcome
Select the prediction type (Forecast, Classify, Estimate, or Detect). Then choose your data source — a KPI or custom SQL.
Step 2: Configure
- Target column — the column you want to predict (not needed for anomaly detection)
- Feature columns — columns the model should learn from (auto-selected if omitted)
- Time column — required for forecasting
- Forecast horizon — how many periods to predict ahead (forecasting only)
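The configuration rules above can be sketched as a small validation routine. This is an illustrative sketch only: `PipelineConfig`, its field names, and the lowercase type identifiers are hypothetical and are not Actyze's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical identifiers for the four prediction types.
PREDICTION_TYPES = {"forecast", "classify", "estimate", "detect"}


@dataclass
class PipelineConfig:
    """Hypothetical container for the Step 2 settings."""
    prediction_type: str
    target_column: Optional[str] = None
    feature_columns: List[str] = field(default_factory=list)  # empty = auto-select
    time_column: Optional[str] = None
    forecast_horizon: Optional[int] = None

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the config is consistent."""
        problems: List[str] = []
        if self.prediction_type not in PREDICTION_TYPES:
            return [f"unknown prediction type: {self.prediction_type!r}"]
        # Anomaly detection is unsupervised, so it is the only type without a target.
        if self.prediction_type != "detect" and not self.target_column:
            problems.append("target column is required")
        if self.prediction_type == "forecast":
            if not self.time_column:
                problems.append("time column is required for forecasting")
            if not self.forecast_horizon or self.forecast_horizon < 1:
                problems.append("forecast horizon must be a positive number of periods")
        elif self.forecast_horizon is not None:
            problems.append("forecast horizon applies to forecasting only")
        return problems
```

For example, `PipelineConfig("forecast", target_column="revenue", time_column="day", forecast_horizon=30).validate()` returns an empty list, while omitting `time_column` reports the missing time column.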
Step 3: Train
Click Train and the system:
- Validates data quality: checks for sufficient rows, missing values, class imbalance
- Selects the best model: based on data size and prediction type
- Trains the model: hyperparameters tuned automatically
- Writes predictions: output table created in the prediction_data schema
- Registers with FAISS: predictions become queryable via natural language
Data Quality Gates
Actyze validates your data before training begins:
Blocking Issues (training won't start)
- Fewer than the minimum required rows (e.g., 60 data points for a 30-day forecast)
- Target column has too many missing values
- Extreme class imbalance (e.g., 99.5% / 0.5% split)
Warnings (training proceeds with caution)
- Low class balance confidence
- Time-aggregated data used for classification (may need entity-level columns)
- Limited number of distinct classes
All issues are shown in the UI with specific, actionable guidance.
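To make the difference between blocking issues and warnings concrete, here is a minimal sketch of such a gate. The function name and every threshold are assumptions chosen to match the examples above (two rows per forecast period gives 60 rows for a 30-day forecast; a 0.5% minority class blocks); the actual limits Actyze applies are not documented here.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence

# Assumed thresholds, for illustration only.
MIN_ROWS_PER_PERIOD = 2          # 60 rows for a 30-period forecast
MAX_MISSING_TARGET_RATIO = 0.5   # share of missing targets that blocks training
BLOCKING_MINORITY_RATIO = 0.005  # the 99.5% / 0.5% split
WARNING_MINORITY_RATIO = 0.10    # below this, warn about class balance


def check_quality(
    prediction_type: str,
    targets: Sequence[Optional[object]],
    horizon: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Classify data issues as blocking or warnings. None marks a missing target."""
    blocking: List[str] = []
    warnings: List[str] = []
    n = len(targets)
    present = [t for t in targets if t is not None]

    if prediction_type == "forecast" and horizon:
        required = MIN_ROWS_PER_PERIOD * horizon
        if n < required:
            blocking.append(f"need at least {required} rows, got {n}")

    if n and (n - len(present)) / n > MAX_MISSING_TARGET_RATIO:
        blocking.append("target column has too many missing values")

    if prediction_type == "classify" and present:
        counts = Counter(present)
        minority = min(counts.values()) / len(present)
        if len(counts) < 2:
            warnings.append("limited number of distinct classes")
        elif minority <= BLOCKING_MINORITY_RATIO:
            blocking.append("extreme class imbalance")
        elif minority < WARNING_MINORITY_RATIO:
            warnings.append("low class balance")

    return {"blocking": blocking, "warnings": warnings}
```

Training would start only when the `blocking` list is empty; entries in `warnings` would be surfaced to the user without stopping the run.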
ML Models
Actyze deploys three ML workers as separate containers. Each can be scaled independently.
XGBoost Worker
- Tasks: classification, regression, anomaly detection (Isolation Forest)
- When used: default for all prediction types, any data size
- Docker image: prediction-worker-xgboost
LightGBM Worker
- Tasks: classification, regression
- When used: preferred for large datasets (>100K rows) for faster training
- Docker image: prediction-worker-lightgbm
AutoGluon Worker
- Tasks: time-series forecasting (univariate and multivariate)
- When used: preferred for forecasting when deployed (superior accuracy via ensemble)
- Docker image: prediction-worker-autogluon
- Note: larger image size (~2 GB), optional deployment
Model Selection Logic
Forecast?
→ AutoGluon deployed? → AutoGluon
→ else → XGBoost
Classify or Estimate?
→ Rows > 100K and LightGBM deployed? → LightGBM
→ else → XGBoost
Detect Anomalies?
→ Always the XGBoost worker (Isolation Forest)
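The decision tree above translates directly into a small function. The function name, the lowercase type identifiers, and the idea of passing the set of deployed workers as an argument are illustrative assumptions; only the branching rules come from this page.

```python
from typing import Set

LARGE_DATASET_ROWS = 100_000  # threshold stated in the selection logic


def select_worker(prediction_type: str, row_count: int, deployed: Set[str]) -> str:
    """Return the worker name ("autogluon", "lightgbm", or "xgboost") for a run."""
    if prediction_type == "forecast":
        # AutoGluon is optional, so fall back to XGBoost when it is not deployed.
        return "autogluon" if "autogluon" in deployed else "xgboost"
    if prediction_type in ("classify", "estimate"):
        if row_count > LARGE_DATASET_ROWS and "lightgbm" in deployed:
            return "lightgbm"
        return "xgboost"
    if prediction_type == "detect":
        # Isolation Forest runs on the XGBoost worker.
        return "xgboost"
    raise ValueError(f"unknown prediction type: {prediction_type!r}")
```

Note that the comparison is strict: a dataset of exactly 100,000 rows still goes to XGBoost, matching the "Rows > 100K" rule.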
Training Triggers
| Trigger | Description | Best for |
|---|---|---|
| After KPI Collection | Retrain automatically when the linked KPI gets new data | Production cadence — predictions stay current with latest data |
| Scheduled | Retrain every N hours (1–720) | When you want a fixed retraining interval |
| Manual | Retrain on demand via the UI | Experimentation, one-off analysis |
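For the Scheduled trigger, the arithmetic is simple: the next run is the last run plus N hours, with N limited to 1–720 (720 hours is 30 days). The helper below is a hypothetical sketch of that rule, not Actyze's scheduler.

```python
from datetime import datetime, timedelta

MIN_INTERVAL_HOURS = 1
MAX_INTERVAL_HOURS = 720  # 30 days


def next_scheduled_run(last_run: datetime, interval_hours: int) -> datetime:
    """Return when the next retraining is due, rejecting out-of-range intervals."""
    if not MIN_INTERVAL_HOURS <= interval_hours <= MAX_INTERVAL_HOURS:
        raise ValueError(
            f"interval must be {MIN_INTERVAL_HOURS}-{MAX_INTERVAL_HOURS} hours, "
            f"got {interval_hours}"
        )
    return last_run + timedelta(hours=interval_hours)
```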
Accuracy Display
Actyze shows business-friendly accuracy instead of raw ML metrics:
| Type | Example display |
|---|---|
| Forecast | "Predictions within ±8% of actual values" |
| Classify | "Correctly identifies 92% of churned customers" |
| Estimate | "Estimates within ±$12.50 of actual values" |
| Detect | "Found 47 anomalies (10.2%) in 461 rows" |
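The display strings in the table can be produced from ordinary metrics with simple formatting. The sketch below assumes a particular mapping (mean absolute percentage error for Forecast, recall for Classify, mean absolute error for Estimate); this page does not state which metrics Actyze actually uses, and the function names are hypothetical.

```python
def describe_forecast(mape: float) -> str:
    """mape is a fraction, e.g. 0.08 for 8% (assumed metric)."""
    return f"Predictions within ±{mape * 100:.0f}% of actual values"


def describe_classification(recall: float, positive_label: str) -> str:
    """recall is a fraction of true positives found (assumed metric)."""
    return f"Correctly identifies {recall * 100:.0f}% of {positive_label}"


def describe_estimate(mae_dollars: float) -> str:
    """mae_dollars is the mean absolute error in dollars (assumed metric)."""
    return f"Estimates within ±${mae_dollars:.2f} of actual values"


def describe_anomalies(anomaly_count: int, row_count: int) -> str:
    """Report the anomaly count and its share of all scored rows."""
    share = 100 * anomaly_count / row_count
    return f"Found {anomaly_count} anomalies ({share:.1f}%) in {row_count} rows"
```

With the numbers from the table, `describe_anomalies(47, 461)` reproduces the Detect row, since 47 / 461 is about 10.2%.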
Querying Predictions
Prediction output tables are stored in the prediction_data schema (e.g., pred_revenue_forecast_30d) and registered with FAISS. You can:
- Ask in natural language: "Show me the churn predictions" — the AI discovers the prediction table automatically
- Explore in Queries: click "Explore in Queries" on a completed run to open the full query page with charts and CSV export
- Get Recommendations: click "Get Recommendations" to have the LLM analyze results and suggest actionable next steps
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| PREDICTION_WORKER_XGBOOST_URL | http://prediction-worker-xgboost:8000 | XGBoost worker endpoint |
| PREDICTION_WORKER_LIGHTGBM_URL | http://prediction-worker-lightgbm:8000 | LightGBM worker endpoint |
| PREDICTION_WORKER_AUTOGLUON_URL | http://prediction-worker-autogluon:8000 | AutoGluon worker endpoint |
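As an illustration of how a service might consume these variables, the sketch below resolves each worker URL from the environment and falls back to the documented default. The variable names and defaults come from the table; the `worker_urls` helper and the short worker keys are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

# (environment variable, documented default) per worker
WORKER_URL_DEFAULTS = {
    "xgboost": ("PREDICTION_WORKER_XGBOOST_URL", "http://prediction-worker-xgboost:8000"),
    "lightgbm": ("PREDICTION_WORKER_LIGHTGBM_URL", "http://prediction-worker-lightgbm:8000"),
    "autogluon": ("PREDICTION_WORKER_AUTOGLUON_URL", "http://prediction-worker-autogluon:8000"),
}


def worker_urls(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Resolve worker endpoints, preferring environment overrides over defaults."""
    source = os.environ if env is None else env
    return {
        name: source.get(var, default)
        for name, (var, default) in WORKER_URL_DEFAULTS.items()
    }
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the helper easy to test.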
Helm Deployment
Each worker is an independently deployable Helm subchart:
# values.yaml
predictionWorkerXgboost:
  enabled: true
  replicas: 1
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
predictionWorkerLightgbm:
  enabled: true
  replicas: 1
predictionWorkerAutogluon:
  enabled: false # Enable for forecasting
  replicas: 1
  resources:
    requests:
      cpu: 1000m
      memory: 2Gi
Docker Compose
Workers are included in docker-compose.yml and start automatically:
prediction-worker-xgboost:
  build: ./docker/prediction-worker-xgboost
  ports:
    - "8010:8000"
prediction-worker-lightgbm:
  build: ./docker/prediction-worker-lightgbm
  ports:
    - "8011:8000"
prediction-worker-autogluon:
  build: ./docker/prediction-worker-autogluon
  ports:
    - "8012:8000"
Industry Use Cases
| Industry | Forecast | Classify | Estimate | Detect |
|---|---|---|---|---|
| E-commerce | Demand forecasting, revenue projections | Customer churn, fraud detection | Customer lifetime value, lead scoring | Unusual transactions, pricing anomalies |
| SaaS | MRR/ARR forecasting, usage trends | Churn risk, trial-to-paid conversion | Expansion revenue, support ticket priority | Usage spikes, abnormal API patterns |
| Finance | Cash flow forecasting, market trends | Credit risk, transaction fraud | Portfolio value, risk scoring | Suspicious transactions, market anomalies |
| Healthcare | Patient volume, resource planning | Readmission risk, diagnosis classification | Treatment cost, length of stay | Abnormal lab results, billing outliers |
| Manufacturing | Production demand, supply planning | Quality defect detection, equipment failure | Maintenance cost, yield optimization | Sensor anomalies, process deviations |
| Logistics | Shipment volume, delivery time | Route optimization, delay risk | Freight cost, warehouse capacity | Delivery exceptions, fleet irregularities |