Metadata Descriptions
Metadata descriptions provide organization-level context for your databases, schemas, tables, and columns. Well-written metadata significantly improves AI recommendation accuracy and helps users understand your data.
Overview
Metadata in Actyze serves multiple purposes:
- Improves AI recommendations - Better semantic understanding of data
- Documents data assets - Human-readable descriptions for all users
- Clarifies business context - Explains what data means in your organization
- Reduces ambiguity - Distinguishes between similar-sounding tables
How It Works
Without Metadata
Query: "Show customer transactions"
AI analyzes table names:
- customers_v2 (unclear what v2 means)
- txn_archive (abbreviation unclear)
- payment_data (is this transactions?)
Result: May choose wrong table or ask for clarification
With Metadata
Query: "Show customer transactions"
AI reads metadata:
- customers_v2: "Current active customers - replaces deprecated customers table"
- txn_archive: "Historical transactions older than 2 years - use for compliance only"
- payment_data: "Real-time customer payment transactions - primary source for sales analysis"
Result: Confidently selects payment_data (best match)
Metadata Levels
Database-Level Metadata
Describes the overall purpose of a database:
Database: postgres
Description: "Primary operational database containing customer, order, and product data. Updated in real-time. Use for current business intelligence and reporting."
Best For:
- Explaining database purpose
- Data freshness information
- Primary use cases
Schema-Level Metadata
Describes groups of related tables:
Schema: postgres.analytics
Description: "Denormalized reporting tables optimized for BI queries. Refreshed nightly at 2 AM EST. Contains aggregated metrics from last 3 years."
Best For:
- Data refresh schedules
- Schema-specific conventions
- Performance characteristics
Table-Level Metadata
Describes what a specific table contains:
Table: postgres.analytics.customer_revenue
Description: "Customer lifetime value and revenue metrics. One row per customer. Includes total revenue, average order value, and purchase frequency. Updated daily."
Key Columns:
- customer_id: Unique customer identifier
- total_revenue: Sum of all purchases (USD)
- avg_order_value: Average purchase amount (USD)
- purchase_count: Total number of orders
- last_purchase_date: Date of most recent order
Best For:
- Table purpose and grain
- Key business metrics
- Important relationships
Column-Level Metadata
Describes individual columns in detail:
Column: postgres.analytics.customer_revenue.segment
Description: "Customer segmentation based on lifetime value: 'high' (>$10k), 'medium' ($1k-$10k), 'low' (<$1k). Recalculated monthly."
Format: String (enum: 'high', 'medium', 'low')
Business Rules: Based on total_revenue column
Last Updated: Monthly, first day of month
Best For:
- Column meaning and format
- Valid values and ranges
- Calculation methodology
Adding Metadata
Via User Interface
Step 1: Navigate to Data Catalog
- Go to Data Intelligence → Data Catalog
- Browse to database → schema → table
- Click on asset to view details
Step 2: Add Description
- Click Edit Metadata button
- Enter description in text field
- Add column descriptions (optional)
- Click Save
Step 3: Verify
- Description appears in catalog
- AI uses description for recommendations
- All users can see metadata
Via API
Add Database Metadata:
curl -X POST https://your-actyze.com/api/metadata \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"database_name": "postgres",
"description": "Primary operational database with real-time customer and transaction data",
"tags": ["production", "real-time", "customer-data"]
}'
Add Table Metadata:
curl -X POST https://your-actyze.com/api/metadata \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"database_name": "postgres",
"schema_name": "analytics",
"table_name": "customer_revenue",
"description": "Customer lifetime value metrics - one row per customer, updated daily",
"business_owner": "Sales Analytics Team",
"data_classification": "Internal",
"tags": ["customer", "revenue", "kpi"]
}'
Add Column Metadata:
curl -X POST https://your-actyze.com/api/metadata \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"database_name": "postgres",
"schema_name": "analytics",
"table_name": "customer_revenue",
"column_name": "segment",
"description": "Customer value segment: high (>$10k), medium ($1k-$10k), low (<$1k)",
"data_type": "VARCHAR",
"is_pii": false,
"valid_values": ["high", "medium", "low"]
}'
Bulk Upload Metadata:
curl -X POST https://your-actyze.com/api/metadata/bulk \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"metadata": [
{
"database_name": "postgres",
"schema_name": "analytics",
"table_name": "customer_revenue",
"column_name": "customer_id",
"description": "Unique customer identifier - FK to customers.id"
},
{
"database_name": "postgres",
"schema_name": "analytics",
"table_name": "customer_revenue",
"column_name": "total_revenue",
"description": "Sum of all customer purchases in USD"
}
]
}'
Writing Effective Metadata
Best Practices
Be Specific and Clear:
❌ Bad: "Customer data"
✓ Good: "Active customers who made purchases in last 12 months - one row per customer"
Include Business Context:
❌ Bad: "Sales table"
✓ Good: "Daily sales transactions from POS system - use for revenue reporting and trend analysis"
Document Freshness:
❌ Bad: "Order information"
✓ Good: "Customer orders - real-time updates via CDC from operational DB, 5-second lag"
Explain Abbreviations:
❌ Bad: "TXN archive"
✓ Good: "Transaction (TXN) archive - historical payment records older than 2 years"
Note Important Caveats:
❌ Bad: "Product catalog"
✓ Good: "Product catalog - excludes discontinued items (see product_archive for historical products)"
Metadata Template
Use this template for consistent table metadata:
[Table Name]: [Brief purpose - one sentence]
Grain: [What does one row represent?]
Source: [Where does data come from?]
Refresh: [How often is data updated?]
Coverage: [Date range or scope]
Use For: [Primary business use cases]
Avoid: [What NOT to use this table for]
Key Columns:
- [column1]: [description]
- [column2]: [description]
Related Tables:
- [table1]: [relationship]
- [table2]: [relationship]
Example:
customer_revenue: Customer lifetime value and purchase metrics for revenue analysis
Grain: One row per active customer
Source: Aggregated from sales.transactions table
Refresh: Daily at 3 AM EST
Coverage: All customers with at least one purchase
Use For: Customer segmentation, lifetime value analysis, revenue reporting
Avoid: Real-time sales data (use sales.transactions instead)
Key Columns:
- customer_id: Unique identifier - FK to customers.id
- total_revenue: Sum of all purchases (USD)
- purchase_count: Number of completed orders
- segment: Value tier (high/medium/low based on total_revenue)
Related Tables:
- customers: Customer demographics and contact info
- sales.transactions: Individual purchase records
Metadata Organization
Prioritization
Focus on:
- Most-queried tables - High-traffic data needs good metadata
- Ambiguous names - Tables with unclear purposes (e.g., "data_v2", "temp_analytics")
- Similar tables - Distinguish between similar options (e.g., customers vs customer_archive)
- Complex calculations - Explain derived metrics and business logic
Lower priority:
- Internal/system tables
- Obvious table names (e.g., "customers" in "customers" table)
- Rarely-accessed archives
Tagging Strategy
Use tags for:
- Data domain: customer, product, sales, finance
- Environment: production, staging, archive
- Freshness: real-time, daily, weekly, monthly
- Sensitivity: pii, confidential, internal, public
- Team ownership: sales-team, engineering, analytics
Example:
{
"table_name": "customer_revenue",
"tags": [
"customer",
"revenue",
"daily",
"internal",
"sales-team"
]
}
Governance
Assign metadata owners:
- Database admins for database-level metadata
- Data engineers for schema-level metadata
- Domain experts for table/column metadata
Regular reviews:
- Quarterly metadata audits
- Update descriptions when data changes
- Remove metadata for deprecated tables
Use Cases
Scenario 1: Disambiguating Similar Tables
Problem: Multiple customer tables exist with unclear purposes.
Without Metadata:
Tables:
- customers
- customers_v2
- customers_archive
- customer_master
User asks: "Show all customers"
AI: Unclear which table to use
With Metadata:
- customers: "Deprecated - do not use, replaced by customers_v2"
- customers_v2: "Current active customers - primary source for customer data"
- customers_archive: "Inactive customers from legacy system - historical reference only"
- customer_master: "Master data management staging - for ETL only, not for analysis"
User asks: "Show all customers"
AI: Confidently selects customers_v2
Scenario 2: Complex Metrics
Problem: Table contains calculated fields with unclear methodology.
Without Metadata:
Column: customer_ltv
User asks: "How is lifetime value calculated?"
No documentation available
With Metadata:
Column: customer_ltv
Description: "Customer lifetime value prediction based on historical purchase behavior.
Calculation: (Average Order Value × Purchase Frequency × Average Customer Lifespan) - Customer Acquisition Cost.
Model updated quarterly. Use for marketing budget allocation and customer segmentation."
Scenario 3: Data Freshness
Problem: Users don't know if data is current or how often it updates.
Without Metadata:
Table: sales_summary
Unknown: When was it last updated?
With Metadata:
Table: sales_summary
Description: "Daily sales aggregations by product and region. Updated every morning at 6 AM EST with previous day's data.
For real-time sales, use sales.transactions table instead."
Viewing Metadata
Data Catalog Browser
Navigate:
- Go to Data Intelligence → Data Catalog
- Browse database structure
- Click on any asset to view metadata
View includes:
- Description
- Tags
- Business owner
- Data classification
- Last updated
- Usage statistics
API
Get Table Metadata:
curl -X GET "https://your-actyze.com/api/metadata?database=postgres&schema=analytics&table=customer_revenue" \
-H "Authorization: Bearer YOUR_TOKEN"
Response:
{
"database_name": "postgres",
"schema_name": "analytics",
"table_name": "customer_revenue",
"description": "Customer lifetime value metrics - one row per customer, updated daily",
"business_owner": "Sales Analytics Team",
"data_classification": "Internal",
"tags": ["customer", "revenue", "kpi"],
"columns": [
{
"column_name": "customer_id",
"description": "Unique customer identifier - FK to customers.id",
"data_type": "INTEGER",
"is_pii": false
},
{
"column_name": "total_revenue",
"description": "Sum of all customer purchases in USD",
"data_type": "DECIMAL(10,2)",
"is_pii": false
}
],
"created_at": "2025-01-10T08:00:00Z",
"updated_at": "2025-02-01T10:30:00Z",
"updated_by": "admin@company.com"
}
Search Metadata:
curl -X GET "https://your-actyze.com/api/metadata/search?q=customer+revenue&tags=kpi" \
-H "Authorization: Bearer YOUR_TOKEN"
Managing Metadata
Update Metadata
UI:
- Navigate to asset in Data Catalog
- Click Edit Metadata
- Modify description, tags, or other fields
- Click Save
API:
curl -X PATCH https://your-actyze.com/api/metadata/{metadata_id} \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"description": "Updated description with more context",
"tags": ["customer", "revenue", "kpi", "sales"]
}'
Delete Metadata
API:
curl -X DELETE https://your-actyze.com/api/metadata/{metadata_id} \
-H "Authorization: Bearer YOUR_TOKEN"
Note: Only delete metadata when the data asset itself is removed.
Version History
Track metadata changes over time:
curl -X GET https://your-actyze.com/api/metadata/{metadata_id}/history \
-H "Authorization: Bearer YOUR_TOKEN"
Impact on AI Recommendations
Semantic Search Enhancement
Metadata descriptions are included in semantic similarity calculations:
Query: "Show high-value customers"
Without metadata:
Matches "customers" table (generic name match)
With metadata:
"customer_revenue table - includes customer segmentation
by lifetime value: high (>$10k), medium, low"
Matches "customer_revenue" table (semantic + metadata)
Disambiguation
When multiple tables match a query, metadata provides tiebreaker:
Query: "Show product sales"
Tables:
1. sales.transactions (score: 0.85)
Metadata: "Individual sales transactions - real-time"
2. sales.product_summary (score: 0.82)
Metadata: "Daily product sales aggregations - use for reporting"
AI: For reporting query → selects product_summary
For transactional query → selects transactions
Column Selection
Column metadata helps AI choose relevant fields:
Query: "Show customer names and revenue"
Table: customer_revenue (50 columns)
Column metadata guides selection:
- customer_name: "Full customer name"
- total_revenue: "Lifetime revenue (USD)"
AI selects: customer_name, total_revenue
(Instead of customer_display_name, revenue_ytd, etc.)
Permissions
| Action | USER | ADMIN | READONLY |
|---|---|---|---|
| View metadata | ✓ | ✓ | ✓ |
| Add metadata | ✓ | ✓ | ✗ |
| Edit metadata | ✓ | ✓ | ✗ |
| Delete metadata | ✓ | ✓ | ✗ |
| View audit history | ✓ | ✓ | ✗ |
Note: All users can view metadata. Only USER and ADMIN can modify.
Integration with Schema Service
FAISS Vector Search
Metadata descriptions are embedded and indexed:
# Schema Service process
1. Extract metadata descriptions
2. Generate embeddings using sentence transformers
3. Index in FAISS vector database
4. Enable semantic similarity search
Named Entity Recognition (NER)
spaCy NER extracts entities from metadata:
Metadata: "Customer revenue in USD for Q1 2025 sales analysis"
Entities extracted:
- MONEY: USD
- DATE: Q1 2025
- ORG: sales
Enhances: Query matching for date/money/org references
Best Practices
Start with High-Impact Tables
Priority 1: Top 20% most-queried tables
# Get query statistics
curl -X GET https://your-actyze.com/api/analytics/table-usage \
-H "Authorization: Bearer ADMIN_TOKEN"
Priority 2: Tables with ambiguous names
Examples: data_v2, temp_analytics, staging_table
Priority 3: Domain-specific tables
Finance, HR, PII-containing tables
Maintain Consistency
Use standard format:
- Start with purpose (one sentence)
- Explain grain (what's a row?)
- Note refresh frequency
- List key use cases
Consistent terminology:
- Use organizational vocabulary
- Match business definitions
- Align with data dictionary
Keep Current
Update when:
- Schema changes
- Business logic changes
- New tables added
- Tables deprecated
Review schedule:
- Monthly: High-impact tables
- Quarterly: All tables
- Ad-hoc: When users report issues
Encourage Contribution
Enable self-service:
- Train users to add metadata
- Provide metadata templates
- Make UI easy to use
Recognize contributors:
- Track metadata additions
- Highlight top contributors
- Reward quality documentation
Troubleshooting
Metadata Not Improving Recommendations
Check:
- Metadata is descriptive enough (not just table name)
- Metadata includes relevant keywords
- Metadata is semantically rich
Improve:
❌ Bad: "Customer table"
✓ Good: "Active customers with purchase history - use for customer segmentation and lifetime value analysis"
Cannot Add Metadata
Error: "Permission denied" or "Write access required"
Cause: READONLY users cannot add metadata.
Solution: Upgrade to USER role.
Metadata Not Showing in Catalog
Check:
- Metadata was saved successfully
- User has access to the table
- Browser cache (clear and refresh)
Additional Resources
- Schema Boosting - User-level preferences
- RBAC - User roles and permissions
- Database Connectors - Connect data sources
- API Reference - Complete API documentation
Support
For metadata issues:
- Verify metadata is saved
- Check user permissions (USER or ADMIN required)
- Review metadata format and content
- Test with example queries
- Check FAISS indexing status (admin)