Supported data sources, warehouses, and metadata
Audience: Anyone choosing connectors, planning ingestion, or understanding how schema inventory is built.
Native database connections (/db-connections)
The connection form supports the base types below; your workspace admin may restrict which types appear.
Relational and MPP
PostgreSQL, MySQL, Oracle, Amazon Redshift, Teradata, Hive.
Cloud warehouses
Snowflake, Google BigQuery.
Streaming, SaaS, and object storage
Apache Kafka, Salesforce, Amazon S3, Google Cloud Storage, Azure Blob Storage.
Lake: Apache Iceberg
Iceberg connections support catalog modes such as REST, AWS Glue, Hive Metastore, and Nessie, with storage on S3, GCS, or Azure. Pick the combination that matches your deployment.
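As a rough illustration of "pick the combination that matches your deployment," the sketch below checks a catalog-mode / storage pairing. The mode and scheme names are assumptions inferred from the options listed above, not Dagen's actual form values:

```python
# Illustrative only: the mode and scheme identifiers below are assumptions,
# not Dagen's real field values.
CATALOG_MODES = {"rest", "glue", "hive-metastore", "nessie"}
STORAGE_SCHEMES = {"s3", "gs", "abfss"}  # S3, GCS, Azure (ADLS Gen2)

def validate_iceberg_connection(catalog_mode: str, warehouse_uri: str) -> bool:
    """Return True if the catalog mode is known and the warehouse URI
    uses a supported object-storage scheme."""
    scheme = warehouse_uri.split("://", 1)[0].lower()
    return catalog_mode.lower() in CATALOG_MODES and scheme in STORAGE_SCHEMES

ok = validate_iceberg_connection("rest", "s3://lake/warehouse")    # valid pair
bad = validate_iceberg_connection("jdbc", "s3://lake/warehouse")   # unknown mode
```

A real connection form would also validate credentials per storage backend; this only covers the mode/storage pairing.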
Other
Apache Ozone.
Use Test Connection when available to validate credentials and reachability before saving.
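One thing a Test Connection check typically covers is plain network reachability. A minimal stand-in (not Dagen's implementation, which also validates credentials) can be sketched with a TCP probe:

```python
import socket

def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Probe TCP reachability -- one part of what a connection test verifies.
    Credential and permission checks would come after this."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo against a local listener standing in for a database host.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]
reachable = is_reachable("127.0.0.1", port)
srv.close()
unreachable = is_reachable("127.0.0.1", port)  # listener gone -> refused
```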
For a single table of every native type with required / typical fields, open Database Connections and scroll to Supported database types — required / typical fields.
Data Ingestion (Airbyte-backed)
Data Ingestion uses the Airbyte connector catalog: hundreds of sources and destinations (exact pairs are catalog- and version-dependent). Common patterns include relational databases, Redshift, Kafka, Iceberg, cloud warehouses (BigQuery, Snowflake, Databricks, etc.), and object stores (S3, GCS, Azure Blob), plus many SaaS systems.
Workspace database connections often appear as ingestion sources when the connector mapping supports them, so you can reuse credentials you already manage in Dagen.
Metadata extraction
From the Database Connections UI
On each saved connection card you can:
- Test Connection — verify the link is healthy.
- Extract Metadata — scan schemas and objects into Dagen’s inventory for modeling, agents, and downstream features.
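The inventory that an Extract Metadata scan builds can be sketched against any database's system catalog. Here is a minimal, hypothetical version using SQLite's catalog (Dagen's real extractors are connection-type-specific and capture more, such as keys and statistics):

```python
import sqlite3

def extract_metadata(conn: sqlite3.Connection) -> dict[str, list[tuple[str, str]]]:
    """Scan the catalog and return {table_name: [(column, type), ...]}."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    return {
        # Table names come from the catalog itself, so interpolation is safe here.
        t: [(col[1], col[2]) for col in conn.execute(f"PRAGMA table_info({t})")]
        for t in tables
    }

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
inventory = extract_metadata(conn)
# inventory maps each table to its column names and declared types
```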
How agents use metadata
Agents follow a progressive disclosure pattern:
- Lightweight discovery — list schemas and tables before pulling full column statistics. This keeps latency low on large catalogs.
- Structured extractors — for common warehouses and databases, the platform uses optimized paths (for example Snowflake, Redshift, PostgreSQL, Oracle) tied to your connection id, so agents get reliable column and key information for modeling and SQL.
- Broader extraction — deeper scans may run when you explicitly ask for profiling, lineage-style questions, or when a tool runs a full metadata extraction flow.
- Downstream use — results feed Data Modeling, Database Explorer, Data Insights, Knowledge Base graph enrichment where enabled, and ingestion planning.
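The progressive disclosure pattern above can be sketched as a two-stage browser: a cheap table listing up front, with per-table column details fetched and cached only on demand. This is an illustration, not Dagen's implementation:

```python
import sqlite3

class CatalogBrowser:
    """Illustrative progressive disclosure: names first, details on demand."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self._columns: dict[str, list[str]] = {}  # lazy per-table cache

    def list_tables(self) -> list[str]:
        # Stage 1: lightweight discovery -- no column details pulled,
        # so latency stays low even on large catalogs.
        return [r[0] for r in self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]

    def columns(self, table: str) -> list[str]:
        # Stage 2: deeper extraction, run only when asked and then cached.
        if table not in self._columns:
            self._columns[table] = [
                row[1] for row in self.conn.execute(f"PRAGMA table_info({table})")]
        return self._columns[table]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
browser = CatalogBrowser(conn)
tables = browser.list_tables()        # cheap listing of the whole catalog
user_cols = browser.columns("users")  # columns fetched for this table only
```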
Practical guidance
- If agents “don’t see” a new table, run Extract Metadata on the connection card first.
- For Iceberg / Kafka / object storage, behavior varies by connector type; confirm Test Connection passes before expecting rich schema cards.
- For “what can I sync?” combine this native reference with the Airbyte catalog inside Data Ingestion.
Quick routing (what to tell users)
| Need | Point to |
|---|---|
| OLTP / classic warehouses | PostgreSQL, MySQL, Oracle, Redshift, Hive, Teradata |
| Cloud DW | Snowflake, BigQuery, Redshift |
| Streaming | Kafka |
| Lake tables | Iceberg + catalog + object store |
| “What can I sync?” | Airbyte catalog in Data Ingestion + the native types above |
Related
- Database Connections — Add and manage connections
- Data Ingestion — Syncs, schedules, runtimes
- Data Modeling — Schemas and AI-assisted design