Supported data sources, warehouses, and metadata
Audience: Anyone choosing connectors, planning ingestion, or understanding how schema inventory is built.
Native database connections (/db-connections)
The connection form supports the base types below; your workspace admin may restrict which types appear.
Relational and MPP
PostgreSQL, MySQL, Oracle, Amazon Redshift, Teradata, Hive.
Cloud warehouses
Snowflake, Google BigQuery.
Streaming, SaaS, and object storage
Apache Kafka, Salesforce, Amazon S3, Google Cloud Storage, Azure Blob Storage.
Lake: Apache Iceberg
Iceberg connections support catalog modes such as REST, AWS Glue, Hive Metastore, and Nessie, with storage on S3, GCS, or Azure. Pick the combination that matches your deployment.
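As a rough illustration of "pick the combination that matches your deployment," the sketch below checks a catalog-mode / storage pairing. The mode and scheme names are assumptions inferred from the options listed above, not Dagen's actual form values:

```python
# Illustrative only: the mode and scheme identifiers below are assumptions,
# not Dagen's real field values.
CATALOG_MODES = {"rest", "glue", "hive-metastore", "nessie"}
STORAGE_SCHEMES = {"s3", "gs", "abfss"}  # S3, GCS, Azure (ADLS Gen2)

def validate_iceberg_connection(catalog_mode: str, warehouse_uri: str) -> bool:
    """Return True if the catalog mode is known and the warehouse URI
    uses a supported object-storage scheme."""
    scheme = warehouse_uri.split("://", 1)[0].lower()
    return catalog_mode.lower() in CATALOG_MODES and scheme in STORAGE_SCHEMES

ok = validate_iceberg_connection("rest", "s3://lake/warehouse")    # valid pair
bad = validate_iceberg_connection("jdbc", "s3://lake/warehouse")   # unknown mode
```

A real connection form would also validate credentials per storage backend; this only covers the mode/storage pairing.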
Other
Apache Ozone.
Use Test Connection when available to validate credentials and reachability before saving.
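One thing a Test Connection check typically covers is plain network reachability. A minimal stand-in (not Dagen's implementation, which also validates credentials) can be sketched with a TCP probe:

```python
import socket

def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Probe TCP reachability -- one part of what a connection test verifies.
    Credential and permission checks would come after this."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo against a local listener standing in for a database host.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]
reachable = is_reachable("127.0.0.1", port)
srv.close()
unreachable = is_reachable("127.0.0.1", port)  # listener gone -> refused
```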
For a single table of every native type with required / typical fields, open Database Connections and scroll to Supported database types — required / typical fields.
Data Ingestion (Airbyte-backed)
Data Ingestion uses the Airbyte connector catalog: hundreds of sources and destinations (exact pairs are catalog- and version-dependent). Common patterns include relational databases, Redshift, Kafka, Iceberg, cloud warehouses (BigQuery, Snowflake, Databricks, etc.), and object stores (S3, GCS, Azure Blob), plus many SaaS systems.
Workspace database connections often appear as ingestion sources when the connector mapping supports them, so you can reuse credentials you already manage in Dagen.
Metadata extraction
From the Database Connections UI
On each saved connection card you can:
- Test Connection — verify the link is healthy.
- Extract Metadata — scan schemas and objects into Dagen’s inventory for modeling, agents, and downstream features.
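The inventory that an Extract Metadata scan builds can be sketched against any database's system catalog. Here is a minimal, hypothetical version using SQLite's catalog (Dagen's real extractors are connection-type-specific and capture more, such as keys and statistics):

```python
import sqlite3

def extract_metadata(conn: sqlite3.Connection) -> dict[str, list[tuple[str, str]]]:
    """Scan the catalog and return {table_name: [(column, type), ...]}."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    return {
        # Table names come from the catalog itself, so interpolation is safe here.
        t: [(col[1], col[2]) for col in conn.execute(f"PRAGMA table_info({t})")]
        for t in tables
    }

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
inventory = extract_metadata(conn)
# inventory maps each table to its column names and declared types
```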
How agents use metadata
Agents follow a progressive disclosure pattern:
- Lightweight discovery — list schemas and tables before pulling full column statistics. This keeps latency low on large catalogs.
- Structured extractors — for common warehouses and databases, the platform uses optimized paths (for example Snowflake, Redshift, PostgreSQL, Oracle) tied to your connection id, so agents get reliable column and key information for modeling and SQL.
- Broader extraction — deeper scans may run when you explicitly ask for profiling, lineage-style questions, or when a tool runs a full metadata extraction flow.
- Downstream use — results feed Data Modeling, Database Explorer, Data Insights, Knowledge Base graph enrichment where enabled, and ingestion planning.
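The progressive disclosure pattern above can be sketched as a two-stage browser: a cheap table listing up front, with per-table column details fetched and cached only on demand. This is an illustration, not Dagen's implementation:

```python
import sqlite3

class CatalogBrowser:
    """Illustrative progressive disclosure: names first, details on demand."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self._columns: dict[str, list[str]] = {}  # lazy per-table cache

    def list_tables(self) -> list[str]:
        # Stage 1: lightweight discovery -- no column details pulled,
        # so latency stays low even on large catalogs.
        return [r[0] for r in self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]

    def columns(self, table: str) -> list[str]:
        # Stage 2: deeper extraction, run only when asked and then cached.
        if table not in self._columns:
            self._columns[table] = [
                row[1] for row in self.conn.execute(f"PRAGMA table_info({table})")]
        return self._columns[table]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
browser = CatalogBrowser(conn)
tables = browser.list_tables()        # cheap listing of the whole catalog
user_cols = browser.columns("users")  # columns fetched for this table only
```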
Practical guidance
- If agents “don’t see” a new table, run Extract Metadata on the connection card first.
- For Iceberg / Kafka / object storage, behavior varies by connector type; confirm Test Connection passes before expecting rich schema cards.
- For “what can I sync?” combine this native reference with the Airbyte catalog inside Data Ingestion.
Quick routing (what to tell users)
| Need | Point to |
|---|---|
| OLTP / classic warehouses | PostgreSQL, MySQL, Oracle, Redshift, Hive, Teradata |
| Cloud DW | Snowflake, BigQuery, Redshift |
| Streaming | Kafka |
| Lake tables | Iceberg + catalog + object store |
| “What can I sync?” | Airbyte catalog in Data Ingestion + the native types above |
Related
- Database Connections — Add and manage connections
- Data Ingestion — Syncs, schedules, runtimes
- Data Modeling — Schemas and AI-assisted design