Skip to main content

Smart Relationship Graph

The relationship graph is Actyze's semantic layer for table connections. It stores verified JOIN relationships between tables and provides them to the LLM at query time, so AI-generated SQL uses correct JOINs instead of guessing.

Why It Matters

Without relationship context, LLMs frequently hallucinate JOIN conditions — guessing column names, inventing foreign keys, or joining on the wrong columns. The relationship graph solves this by giving the LLM a map of how your tables actually connect.

Before (without graph):

-- LLM guesses the join condition
SELECT * FROM orders JOIN customers ON orders.cust = customers.customer_id
-- Wrong: the actual column is orders.customer_id = customers.id

After (with graph):

-- LLM uses the verified relationship
SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id
-- Correct: relationship graph provided the exact join condition

Three-Layer Discovery

Relationships are discovered and managed through three layers, each with increasing confidence:

Layer 1: Inferred (Automatic)

On first schema load, Actyze scans column names for foreign-key naming patterns:

PatternExampleInferred Relationship
*_id suffixorders.customer_idorders.customer_id → customers.id
*_fk suffixinvoices.order_fkinvoices.order_fk → orders.id
Exact table name matchorders.productorders.product → products.id

Confidence: 0.4–0.6 (medium — naming conventions aren't always correct)

Layer 2: Mined from Query History

Actyze parses every successful query using sqlglot to extract JOIN patterns:

-- This successful query:
SELECT o.id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.total > 100

-- Produces this mined relationship:
-- orders.customer_id → customers.id (confidence: 0.55)

Confidence grows with frequency: min(0.9, 0.5 + 0.05 × occurrence_count)

After 8 successful queries using the same JOIN, confidence reaches 0.9.

Key behavior:

  • Mining is triggered manually via the UI or API (POST /api/relationships/mine)
  • Only successful queries are mined (failed queries are ignored)
  • Duplicate JOINs increase confidence rather than creating new entries

Layer 3: Admin-Verified

Admins can create, verify, or disable relationships through the Data Intelligence → Relationships tab:

  • Create — add a relationship with a custom join condition
  • Verify — mark an inferred or mined relationship as verified (highest confidence)
  • Disable — disable a problematic relationship without deleting it
  • Bulk import — upload relationships from CSV

Confidence: verified relationships get the highest priority regardless of score.

How the Graph Powers Queries

When a user asks a natural language question that spans multiple tables, Actyze uses the relationship graph in a three-tier fallback:

Tier 1: Direct Relationships

If the tables are directly connected, the exact join condition and relationship type are passed to the LLM:

"Show me orders with customer names"
→ Graph provides: orders.customer_id → customers.id (1:N, confidence: 0.85)
→ LLM generates: SELECT o.*, c.name FROM orders o JOIN customers c ON o.customer_id = c.id

Tier 2: Multi-Hop Path Finding (BFS)

If the tables aren't directly connected, BFS traversal finds the shortest path:

"Show orders with country names"
→ No direct relationship between orders and nations
→ BFS finds: orders → customers → nations
→ Graph provides both join conditions:
orders.customer_id → customers.id
customers.nation_id → nations.id
→ LLM generates the correct two-JOIN query

Tier 3: Schema Fallback

If no relationship path exists, the LLM falls back to full schema context and attempts to generate JOINs from column names alone. When this query succeeds, the JOIN pattern is captured for future mining.

New tables with no relationships yet
→ LLM uses full schema context
→ Query succeeds
→ Next mining run captures the JOIN pattern
→ Future queries use the mined relationship

This creates a self-improving loop: the more you query, the smarter the graph gets.

Relationship Metadata

Each relationship stores:

FieldDescription
source_tableThe table with the foreign key
target_tableThe referenced table
join_conditionExact SQL condition (e.g., source.customer_id = target.id)
relationship_type1:1, 1:N, N:1, or M:N
source_methodHow discovered: inferred, mined, or admin
confidenceScore from 0.0 to 1.0
is_verifiedWhether an admin has verified this relationship
is_disabledWhether this relationship is excluded from LLM context
usage_countHow many times used in AI-generated queries
last_used_atTimestamp of most recent use

Admin UI

Access via Data Intelligence → Relationships tab.

Filtering

  • Filter by source method (inferred / mined / admin)
  • Filter by table name
  • Filter by confidence threshold
  • Filter by verified / unverified status

Actions

  • Verify — mark a relationship as admin-verified
  • Disable — exclude from LLM context without deleting
  • Edit — modify join condition or relationship type
  • Delete — permanently remove a relationship
  • Bulk Import — upload from CSV with columns: source_catalog, source_schema, source_table, target_catalog, target_schema, target_table, join_condition, relationship_type

Usage Insights

The table shows usage_count and last_used_at for each relationship, so you can see which connections the AI relies on most.

API Endpoints

List Relationships

GET /api/relationships
GET /api/relationships?source_method=mined&min_confidence=0.7

Find Join Path

GET /api/relationships/graph/path?tables=orders,nations
# Returns: orders → customers → nations with join conditions

Trigger Mining

POST /api/relationships/mine
# Parses query history and creates/updates relationships

Create Relationship

POST /api/relationships
Content-Type: application/json

{
"source_catalog": "postgresql",
"source_schema": "public",
"source_table": "orders",
"target_catalog": "postgresql",
"target_schema": "public",
"target_table": "customers",
"join_condition": "orders.customer_id = customers.id",
"relationship_type": "N:1"
}

Configuration

VariableDefaultDescription
RELATIONSHIP_GRAPH_ENABLEDtrueInclude relationship context in LLM prompts

When disabled, the LLM falls back to schema-only context (Tier 3 behavior).

Best Practices

Getting Started

  1. Enable the graph (on by default) — inferred relationships are created on first schema load
  2. Run mining after you have some query history — click "Mine Query History" in the Relationships tab
  3. Verify key relationships — verify the most important joins so they always rank highest

Ongoing Maintenance

  • Re-mine periodically — as users write new queries, new JOIN patterns emerge
  • Disable bad relationships — if the AI generates wrong JOINs, check for incorrect relationships and disable them
  • Review low-confidence entries — relationships with confidence < 0.5 may be unreliable; verify or disable them

For Large Schemas

  • Focus on high-traffic tables — verify relationships for tables that appear in the most queries
  • Use bulk import — if you have ERD documentation or a data dictionary, export it as CSV and import
  • Monitor usage tracking — relationships with zero usage after weeks may be candidates for cleanup