Skip to main content

Metadata Descriptions

Metadata descriptions provide organization-level context for your databases, schemas, tables, and columns. Well-written metadata significantly improves AI recommendation accuracy and helps users understand your data.

Overview

Metadata in Actyze serves multiple purposes:

  • Improves AI recommendations - Better semantic understanding of data
  • Documents data assets - Human-readable descriptions for all users
  • Clarifies business context - Explains what data means in your organization
  • Reduces ambiguity - Distinguishes between similar-sounding tables

How It Works

Without Metadata

Query: "Show customer transactions"

AI analyzes table names:
- customers_v2 (unclear what v2 means)
- txn_archive (abbreviation unclear)
- payment_data (is this transactions?)

Result: May choose wrong table or ask for clarification

With Metadata

Query: "Show customer transactions"

AI reads metadata:
- customers_v2: "Current active customers - replaces deprecated customers table"
- txn_archive: "Historical transactions older than 2 years - use for compliance only"
- payment_data: "Real-time customer payment transactions - primary source for sales analysis"

Result: Confidently selects payment_data (best match)

Metadata Levels

Database-Level Metadata

Describes the overall purpose of a database:

Database: postgres
Description: "Primary operational database containing customer, order, and product data. Updated in real-time. Use for current business intelligence and reporting."

Best For:

  • Explaining database purpose
  • Data freshness information
  • Primary use cases

Schema-Level Metadata

Describes groups of related tables:

Schema: postgres.analytics
Description: "Denormalized reporting tables optimized for BI queries. Refreshed nightly at 2 AM EST. Contains aggregated metrics from last 3 years."

Best For:

  • Data refresh schedules
  • Schema-specific conventions
  • Performance characteristics

Table-Level Metadata

Describes what a specific table contains:

Table: postgres.analytics.customer_revenue
Description: "Customer lifetime value and revenue metrics. One row per customer. Includes total revenue, average order value, and purchase frequency. Updated daily."

Key Columns:
- customer_id: Unique customer identifier
- total_revenue: Sum of all purchases (USD)
- avg_order_value: Average purchase amount (USD)
- purchase_count: Total number of orders
- last_purchase_date: Date of most recent order

Best For:

  • Table purpose and grain
  • Key business metrics
  • Important relationships

Column-Level Metadata

Describes individual columns in detail:

Column: postgres.analytics.customer_revenue.segment
Description: "Customer segmentation based on lifetime value: 'high' (>$10k), 'medium' ($1k-$10k), 'low' (<$1k). Recalculated monthly."

Format: String (enum: 'high', 'medium', 'low')
Business Rules: Based on total_revenue column
Last Updated: Monthly, first day of month

Best For:

  • Column meaning and format
  • Valid values and ranges
  • Calculation methodology

Adding Metadata

Via User Interface

Step 1: Navigate to Data Catalog

  1. Go to Data IntelligenceData Catalog
  2. Browse to database → schema → table
  3. Click on asset to view details

Step 2: Add Description

  1. Click Edit Metadata button
  2. Enter description in text field
  3. Add column descriptions (optional)
  4. Click Save

Step 3: Verify

  • Description appears in catalog
  • AI uses description for recommendations
  • All users can see metadata

Via API

Add Database Metadata:

curl -X POST https://your-actyze.com/api/metadata \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"database_name": "postgres",
"description": "Primary operational database with real-time customer and transaction data",
"tags": ["production", "real-time", "customer-data"]
}'

Add Table Metadata:

curl -X POST https://your-actyze.com/api/metadata \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"database_name": "postgres",
"schema_name": "analytics",
"table_name": "customer_revenue",
"description": "Customer lifetime value metrics - one row per customer, updated daily",
"business_owner": "Sales Analytics Team",
"data_classification": "Internal",
"tags": ["customer", "revenue", "kpi"]
}'

Add Column Metadata:

curl -X POST https://your-actyze.com/api/metadata \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"database_name": "postgres",
"schema_name": "analytics",
"table_name": "customer_revenue",
"column_name": "segment",
"description": "Customer value segment: high (>$10k), medium ($1k-$10k), low (<$1k)",
"data_type": "VARCHAR",
"is_pii": false,
"valid_values": ["high", "medium", "low"]
}'

Bulk Upload Metadata:

curl -X POST https://your-actyze.com/api/metadata/bulk \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"metadata": [
{
"database_name": "postgres",
"schema_name": "analytics",
"table_name": "customer_revenue",
"column_name": "customer_id",
"description": "Unique customer identifier - FK to customers.id"
},
{
"database_name": "postgres",
"schema_name": "analytics",
"table_name": "customer_revenue",
"column_name": "total_revenue",
"description": "Sum of all customer purchases in USD"
}
]
}'

Writing Effective Metadata

Best Practices

Be Specific and Clear:

❌ Bad: "Customer data"
✓ Good: "Active customers who made purchases in last 12 months - one row per customer"

Include Business Context:

❌ Bad: "Sales table"
✓ Good: "Daily sales transactions from POS system - use for revenue reporting and trend analysis"

Document Freshness:

❌ Bad: "Order information"
✓ Good: "Customer orders - real-time updates via CDC from operational DB, 5-second lag"

Explain Abbreviations:

❌ Bad: "TXN archive"
✓ Good: "Transaction (TXN) archive - historical payment records older than 2 years"

Note Important Caveats:

❌ Bad: "Product catalog"
✓ Good: "Product catalog - excludes discontinued items (see product_archive for historical products)"

Metadata Template

Use this template for consistent table metadata:

[Table Name]: [Brief purpose - one sentence]

Grain: [What does one row represent?]
Source: [Where does data come from?]
Refresh: [How often is data updated?]
Coverage: [Date range or scope]
Use For: [Primary business use cases]
Avoid: [What NOT to use this table for]

Key Columns:
- [column1]: [description]
- [column2]: [description]

Related Tables:
- [table1]: [relationship]
- [table2]: [relationship]

Example:

customer_revenue: Customer lifetime value and purchase metrics for revenue analysis

Grain: One row per active customer
Source: Aggregated from sales.transactions table
Refresh: Daily at 3 AM EST
Coverage: All customers with at least one purchase
Use For: Customer segmentation, lifetime value analysis, revenue reporting
Avoid: Real-time sales data (use sales.transactions instead)

Key Columns:
- customer_id: Unique identifier - FK to customers.id
- total_revenue: Sum of all purchases (USD)
- purchase_count: Number of completed orders
- segment: Value tier (high/medium/low based on total_revenue)

Related Tables:
- customers: Customer demographics and contact info
- sales.transactions: Individual purchase records

Metadata Organization

Prioritization

Focus on:

  1. Most-queried tables - High-traffic data needs good metadata
  2. Ambiguous names - Tables with unclear purposes (e.g., "data_v2", "temp_analytics")
  3. Similar tables - Distinguish between similar options (e.g., customers vs customer_archive)
  4. Complex calculations - Explain derived metrics and business logic

Lower priority:

  1. Internal/system tables
  2. Obvious table names (e.g., "customers" in "customers" table)
  3. Rarely-accessed archives

Tagging Strategy

Use tags for:

  • Data domain: customer, product, sales, finance
  • Environment: production, staging, archive
  • Freshness: real-time, daily, weekly, monthly
  • Sensitivity: pii, confidential, internal, public
  • Team ownership: sales-team, engineering, analytics

Example:

{
"table_name": "customer_revenue",
"tags": [
"customer",
"revenue",
"daily",
"internal",
"sales-team"
]
}

Governance

Assign metadata owners:

  • Database admins for database-level metadata
  • Data engineers for schema-level metadata
  • Domain experts for table/column metadata

Regular reviews:

  • Quarterly metadata audits
  • Update descriptions when data changes
  • Remove metadata for deprecated tables

Use Cases

Scenario 1: Disambiguating Similar Tables

Problem: Multiple customer tables exist with unclear purposes.

Without Metadata:

Tables:
- customers
- customers_v2
- customers_archive
- customer_master

User asks: "Show all customers"
AI: Unclear which table to use

With Metadata:

- customers: "Deprecated - do not use, replaced by customers_v2"
- customers_v2: "Current active customers - primary source for customer data"
- customers_archive: "Inactive customers from legacy system - historical reference only"
- customer_master: "Master data management staging - for ETL only, not for analysis"

User asks: "Show all customers"
AI: Confidently selects customers_v2

Scenario 2: Complex Metrics

Problem: Table contains calculated fields with unclear methodology.

Without Metadata:

Column: customer_ltv
User asks: "How is lifetime value calculated?"
No documentation available

With Metadata:

Column: customer_ltv
Description: "Customer lifetime value prediction based on historical purchase behavior.
Calculation: (Average Order Value × Purchase Frequency × Average Customer Lifespan) - Customer Acquisition Cost.
Model updated quarterly. Use for marketing budget allocation and customer segmentation."

Scenario 3: Data Freshness

Problem: Users don't know if data is current or how often it updates.

Without Metadata:

Table: sales_summary
Unknown: When was it last updated?

With Metadata:

Table: sales_summary
Description: "Daily sales aggregations by product and region. Updated every morning at 6 AM EST with previous day's data.
For real-time sales, use sales.transactions table instead."

Viewing Metadata

Data Catalog Browser

Navigate:

  1. Go to Data IntelligenceData Catalog
  2. Browse database structure
  3. Click on any asset to view metadata

View includes:

  • Description
  • Tags
  • Business owner
  • Data classification
  • Last updated
  • Usage statistics

API

Get Table Metadata:

curl -X GET "https://your-actyze.com/api/metadata?database=postgres&schema=analytics&table=customer_revenue" \
-H "Authorization: Bearer YOUR_TOKEN"

Response:

{
"database_name": "postgres",
"schema_name": "analytics",
"table_name": "customer_revenue",
"description": "Customer lifetime value metrics - one row per customer, updated daily",
"business_owner": "Sales Analytics Team",
"data_classification": "Internal",
"tags": ["customer", "revenue", "kpi"],
"columns": [
{
"column_name": "customer_id",
"description": "Unique customer identifier - FK to customers.id",
"data_type": "INTEGER",
"is_pii": false
},
{
"column_name": "total_revenue",
"description": "Sum of all customer purchases in USD",
"data_type": "DECIMAL(10,2)",
"is_pii": false
}
],
"created_at": "2025-01-10T08:00:00Z",
"updated_at": "2025-02-01T10:30:00Z",
"updated_by": "admin@company.com"
}

Search Metadata:

curl -X GET "https://your-actyze.com/api/metadata/search?q=customer+revenue&tags=kpi" \
-H "Authorization: Bearer YOUR_TOKEN"

Managing Metadata

Update Metadata

UI:

  1. Navigate to asset in Data Catalog
  2. Click Edit Metadata
  3. Modify description, tags, or other fields
  4. Click Save

API:

curl -X PATCH https://your-actyze.com/api/metadata/{metadata_id} \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"description": "Updated description with more context",
"tags": ["customer", "revenue", "kpi", "sales"]
}'

Delete Metadata

API:

curl -X DELETE https://your-actyze.com/api/metadata/{metadata_id} \
-H "Authorization: Bearer YOUR_TOKEN"

Note: Only delete metadata when the data asset itself is removed.

Version History

Track metadata changes over time:

curl -X GET https://your-actyze.com/api/metadata/{metadata_id}/history \
-H "Authorization: Bearer YOUR_TOKEN"

Impact on AI Recommendations

Semantic Search Enhancement

Metadata descriptions are included in semantic similarity calculations:

Query: "Show high-value customers"

Without metadata:
Matches "customers" table (generic name match)

With metadata:
"customer_revenue table - includes customer segmentation
by lifetime value: high (>$10k), medium, low"
Matches "customer_revenue" table (semantic + metadata)

Disambiguation

When multiple tables match a query, metadata provides tiebreaker:

Query: "Show product sales"

Tables:
1. sales.transactions (score: 0.85)
Metadata: "Individual sales transactions - real-time"

2. sales.product_summary (score: 0.82)
Metadata: "Daily product sales aggregations - use for reporting"

AI: For reporting query → selects product_summary
For transactional query → selects transactions

Column Selection

Column metadata helps AI choose relevant fields:

Query: "Show customer names and revenue"

Table: customer_revenue (50 columns)

Column metadata guides selection:
- customer_name: "Full customer name"
- total_revenue: "Lifetime revenue (USD)"

AI selects: customer_name, total_revenue
(Instead of customer_display_name, revenue_ytd, etc.)

Permissions

ActionUSERADMINREADONLY
View metadata
Add metadata
Edit metadata
Delete metadata
View audit history

Note: All users can view metadata. Only USER and ADMIN can modify.

Integration with Schema Service

Metadata descriptions are embedded and indexed:

# Schema Service process
1. Extract metadata descriptions
2. Generate embeddings using sentence transformers
3. Index in FAISS vector database
4. Enable semantic similarity search

Named Entity Recognition (NER)

spaCy NER extracts entities from metadata:

Metadata: "Customer revenue in USD for Q1 2025 sales analysis"

Entities extracted:
- MONEY: USD
- DATE: Q1 2025
- ORG: sales

Enhances: Query matching for date/money/org references

Best Practices

Start with High-Impact Tables

Priority 1: Top 20% most-queried tables

# Get query statistics
curl -X GET https://your-actyze.com/api/analytics/table-usage \
-H "Authorization: Bearer ADMIN_TOKEN"

Priority 2: Tables with ambiguous names

Examples: data_v2, temp_analytics, staging_table

Priority 3: Domain-specific tables

Finance, HR, PII-containing tables

Maintain Consistency

Use standard format:

  • Start with purpose (one sentence)
  • Explain grain (what's a row?)
  • Note refresh frequency
  • List key use cases

Consistent terminology:

  • Use organizational vocabulary
  • Match business definitions
  • Align with data dictionary

Keep Current

Update when:

  • Schema changes
  • Business logic changes
  • New tables added
  • Tables deprecated

Review schedule:

  • Monthly: High-impact tables
  • Quarterly: All tables
  • Ad-hoc: When users report issues

Encourage Contribution

Enable self-service:

  • Train users to add metadata
  • Provide metadata templates
  • Make UI easy to use

Recognize contributors:

  • Track metadata additions
  • Highlight top contributors
  • Reward quality documentation

Troubleshooting

Metadata Not Improving Recommendations

Check:

  1. Metadata is descriptive enough (not just table name)
  2. Metadata includes relevant keywords
  3. Metadata is semantically rich

Improve:

❌ Bad: "Customer table"
✓ Good: "Active customers with purchase history - use for customer segmentation and lifetime value analysis"

Cannot Add Metadata

Error: "Permission denied" or "Write access required"

Cause: READONLY users cannot add metadata.

Solution: Upgrade to USER role.

Metadata Not Showing in Catalog

Check:

  1. Metadata was saved successfully
  2. User has access to the table
  3. Browser cache (clear and refresh)

Additional Resources

Support

For metadata issues:

  1. Verify metadata is saved
  2. Check user permissions (USER or ADMIN required)
  3. Review metadata format and content
  4. Test with example queries
  5. Check FAISS indexing status (admin)