Contributing to the Semantic Layer
Actyze's semantic layer is what turns natural-language questions into accurate SQL. It relies on community-contributed knowledge packs that teach the system about industry-specific terminology, relationships, metrics, and query patterns.
This guide explains each contribution type and how to submit yours.
Synonym Packs
Synonym packs map business terms to database column names so the AI understands domain-specific language.
Each pack is a YAML file organised by industry:
# synonyms/ecommerce.yaml
synonyms:
- term: "revenue"
columns: ["total_amount", "order_total", "gross_revenue", "net_sales"]
- term: "customer"
columns: ["buyer", "client", "account", "purchaser"]
- term: "SKU"
columns: ["product_id", "item_code", "product_code"]
- term: "churn"
columns: ["is_churned", "churn_flag", "subscription_status"]
Guidelines:
- Group synonyms by industry (e-commerce, SaaS, finance, healthcare, etc.).
- Include common abbreviations and acronyms.
- Keep mappings general enough to apply across different schemas.
Relationship Graph (Implemented)
The relationship graph is a PostgreSQL-backed store of table-to-table JOIN relationships that the LLM uses during SQL generation. Instead of guessing JOINs from column names, it gets verified relationship paths with confidence scores.
How It Works
Relationships are populated through three layers:
| Layer | How | Confidence | Effort |
|---|---|---|---|
| Convention inference | Auto-detects *_id, *_uuid, *_fk patterns | 0.4–0.6 | Zero (runs in background) |
| Query history mining | Parses JOINs from successful past SQL | 0.5–0.9 | Zero (runs in background) |
| Admin curation | Manual entry via Data Intelligence → Relationships UI | 1.0 | Low (point and click) |
Managing Relationships
- Go to Data Intelligence → Relationships tab
- Use Auto-Infer to detect relationships from column naming conventions
- Use Mine History to extract JOIN patterns from past queries
- Use + Add to manually create relationships
- Verify (✓) correct relationships → confidence becomes 100%
- Disable (🚫) incorrect ones → excluded from LLM prompts
Configuration
# Enable/disable in docker-compose.yml or Helm values
RELATIONSHIP_GRAPH_ENABLED=true # Default: true
When disabled, falls back to column-name inference (the previous behavior).
API Endpoints
GET /api/relationships # List (filterable by catalog/method)
GET /api/relationships/graph/path?tables=a,b # Find optimal join path (BFS)
POST /api/relationships # Create (admin)
POST /api/relationships/{id}/verify # Verify → confidence=1.0 (admin)
POST /api/relationships/{id}/disable # Soft delete (admin)
POST /api/relationships/infer # Trigger convention inference (admin)
POST /api/relationships/mine # Mine query history (admin)
Verified Query Templates
Verified query templates are pre-approved SQL patterns for common business questions. They serve as few-shot examples that improve accuracy for frequently asked queries.
# templates/saas_metrics.yaml
templates:
- question: "What is the monthly recurring revenue?"
sql: |
SELECT
DATE_TRUNC('month', payment_date) AS month,
SUM(amount) AS mrr
FROM subscriptions
WHERE status = 'active'
GROUP BY 1
ORDER BY 1
tags: ["saas", "revenue", "mrr"]
- question: "What is the customer churn rate?"
sql: |
SELECT
DATE_TRUNC('month', churned_at) AS month,
COUNT(*) * 100.0 / LAG(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', churned_at)) AS churn_rate
FROM customers
WHERE churned_at IS NOT NULL
GROUP BY 1
ORDER BY 1
tags: ["saas", "churn", "retention"]
Guidelines:
- Use standard SQL (ANSI) where possible; note dialect-specific syntax.
- Include multiple phrasings of the same question in the
questionfield or as aliases. - Tag templates so the system can match them to the right context.
KPI Definition Packs
KPI packs define standard metrics per industry with their formulas, so the AI can compute them correctly without guessing.
# kpis/saas.yaml
kpis:
- name: "MRR (Monthly Recurring Revenue)"
formula: "SUM(amount) WHERE status = 'active' GROUP BY month"
description: "Total recurring revenue normalised to a monthly period."
- name: "CAC (Customer Acquisition Cost)"
formula: "SUM(marketing_spend + sales_spend) / COUNT(new_customers)"
description: "Average cost to acquire a new customer."
- name: "LTV (Lifetime Value)"
formula: "AVG(revenue_per_customer) * AVG(customer_lifespan_months)"
description: "Predicted total revenue from a customer over their lifetime."
# kpis/ecommerce.yaml
kpis:
- name: "AOV (Average Order Value)"
formula: "SUM(order_total) / COUNT(DISTINCT order_id)"
description: "Average revenue per order."
- name: "Cart Abandonment Rate"
formula: "(COUNT(carts_created) - COUNT(orders_completed)) / COUNT(carts_created) * 100"
description: "Percentage of shopping carts that do not convert to orders."
Available industry packs: SaaS, E-commerce, Finance, Healthcare.
How to Test Your Contributions
-
Fork the repository and create a branch for your contribution.
-
Validate YAML syntax:
pip install yamllint
yamllint your-file.yaml -
Run the test suite to ensure your additions do not break existing behaviour:
make test-semantic -
Test with sample queries — connect a database with a matching schema and verify that your synonyms, heuristics, or templates improve query accuracy.
-
Submit a pull request with:
- A clear description of the industry or use case your contribution covers.
- Example questions that are now answered correctly thanks to your changes.
- Any edge cases or limitations you are aware of.
File Organisation
semantic/
├── synonyms/
│ ├── ecommerce.yaml
│ ├── saas.yaml
│ ├── finance.yaml
│ └── healthcare.yaml
├── relationships/
│ └── naming_conventions.yaml
├── templates/
│ ├── saas_metrics.yaml
│ ├── ecommerce_analytics.yaml
│ └── finance_reporting.yaml
└── kpis/
├── saas.yaml
├── ecommerce.yaml
├── finance.yaml
└── healthcare.yaml