Data Ingestion — User Guide

Audience: Dagen users who want to set up automated data pipelines to move data between sources and destinations using Airbyte connectors.


Overview

The Data Ingestion page (/airbyte-ingestion) lets you create, run, schedule, and monitor data ingestion pipelines. Dagen uses an Airbyte-class connector catalog with hundreds of sources and destinations. The exact connector list and image mappings are catalog- and version-dependent; common categories include relational databases (Postgres, MySQL), warehouses (BigQuery, Snowflake, Redshift, Databricks), streaming platforms (Kafka), table formats (Iceberg), object stores (S3, GCS, Azure Blob), and many SaaS systems.

Workspace database connections appear as candidate sources (and sometimes destinations) when the connector supports your connection type—reuse credentials you already manage in Dagen. For native connection types and metadata behaviour, see Supported data sources.

Feature | Description
Source catalog | Browse Airbyte connectors plus your workspace database connections where compatible
Destination catalog | Configure where data lands; options depend on the live catalog (warehouses, lakes, DBs, object storage, etc.)
Sync monitoring | Real-time progress with record counts, table-level detail, and CDC streaming
Scheduling | Set up recurring syncs
Runtime selection | Choose which Kubernetes cluster runs your ingestion jobs
AI Chat assistant | Toggle the AI panel for help with pipeline setup
Import/Export | Transfer pipeline configurations between workspaces

Page Layout

The page has four main tabs:

Tab | Icon | Contents
Active Ingestions | Pipe | Your running and configured pipelines
Connections | Connection | Airbyte connection objects with sync status
Sources | Database import | Available source connectors and your database connections
Destinations | Database export | Available destination connectors

Creating an Ingestion Pipeline

  1. Click Create Ingestion in the header.
  2. Walk through the multi-step dialog:
    • Basic Info — Name your pipeline.
    • Source — Pick a source (your database connections appear first, followed by Airbyte connectors).
    • Destination — Pick a destination.
    • Select Data — Choose databases, schemas, and tables to sync.
    • Configure — Set sync details and options.
  3. Click Create to finalize.

Using Your Database Connections

In the Sources tab, the Your Database Connections section shows connections already configured in your workspace. Click any connection card to use it as a source — no additional configuration needed.

Browsing the Airbyte Connector Catalog

Below your database connections, the Available Airbyte Connectors section lists external data sources and APIs. Use the search field ("Search sources...") to filter, and paginate through the catalog with the page controls at the bottom.


Selecting an Ingestion Runtime

All ingestions in a workspace use a shared runtime. To change it:

  1. Click the runtime button in the header (shows the current runtime name).
  2. In the runtime menu, select a different runtime from the list.
  3. Click Test Connection to verify the runtime is reachable.
  4. To add or edit runtimes, click Manage Runtimes (opens the Runtime Environments page).

Monitoring Active Ingestions

Each pipeline card on the Active Ingestions tab shows:

  • Pipeline name and status chip (syncing, running, error, failed, etc.).
  • Source → Destination flow with connector icons.
  • Sync Configuration Details — expand to see selected databases, schemas, tables, and destination settings.
  • Sync Statistics — last sync time, records synced, sync frequency.

During a Sync

While a pipeline is running, the card shows:

  • Status indicator and current table being synced.
  • Records synced count.
  • Table-level progress (expand for details).
  • Progress bar with percentage.

When Errors Occur

If a sync fails, the card displays the error message along with:

  • Configure CDC with AI button — opens the AI chat to help troubleshoot Change Data Capture issues.
  • Retry Creation button — retries the pipeline setup.

Pipeline Actions

From the Card Buttons

Button | Action
Run Ingestion (play icon) | Starts a new sync (disabled while a sync is already running)
Schedule Ingestion (calendar icon) | Opens the schedule configuration
Delete Ingestion (trash icon) | Deletes the pipeline (with confirmation)

From the Three-Dot Menu

Action | Description
Copy Pipeline ID | Copies the Airbyte pipeline ID to the clipboard
Refresh Status | Forces a status refresh
Stop Sync | Stops a running or streaming sync
View Metadata | Shows pipeline configuration details
View Runs & Logs | Opens the run history dialog
Export Configuration | Downloads the pipeline config as JSON
Rename Ingestion | Changes the pipeline display name
Create Schedule | Sets up a recurring sync schedule

Viewing Runs and Logs

  1. Click View Runs & Logs from the pipeline menu.
  2. The left panel lists all runs with timestamps, status, duration, and record counts.
  3. Click a run to see its logs in the right panel — each log line shows timestamp, level, stage, and message.

Stopping a Sync

  1. Click Stop Sync from the pipeline menu (or the three-dot menu).
  2. Confirm in the dialog. The warning explains what will happen:
    • For CDC streaming: "Stop the CDC streaming process." You can restart later.
    • For full refresh: "Cancel the current sync operation. Progress will be lost."

Import and Export

Exporting

  1. Click Import/Export in the header.
  2. Select Export All to download all pipeline configurations as JSON.

Importing

  1. Click Import/Export, then select Import Configurations.
  2. Select a JSON file previously exported from Dagen.
  3. The pipelines are recreated in your workspace.
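Before importing a file, a quick structural check can catch a wrong or truncated export. The sketch below assumes the export is a JSON list of pipeline objects, each carrying a name; the field names are assumptions about the export format, not a documented schema:

```python
import json

def check_export(path: str) -> list:
    """Lightly validate an exported configuration file before import.
    Assumes (hypothetically) a JSON list of pipeline objects with a 'name' key."""
    with open(path) as fh:
        data = json.load(fh)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of pipeline configurations")
    missing = [i for i, p in enumerate(data) if "name" not in p]
    if missing:
        raise ValueError(f"pipelines at positions {missing} lack a 'name'")
    return [p["name"] for p in data]

# Example: check_export("dagen-pipelines.json") returns the pipeline names
```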

Using the AI Chat Assistant

Toggle the AI Chat button in the header to open a side panel. The AI assistant can help you:

  • Choose the right source and destination connectors.
  • Configure CDC settings.
  • Troubleshoot sync failures.
  • Set up schedules.

Troubleshooting

Symptom | Cause | Fix
"No Active Ingestion" | No pipelines created yet | Click Create Ingestion to set up your first pipeline
"Sign in required" chip in header | Not authenticated | Click Sign In to authenticate
Pipeline stuck in "Creating" | Schema discovery is taking time or the runtime is unreachable | Check the runtime status; wait for discovery to complete or retry
Sync failed with connection error | Source or destination is not reachable from the runtime | Verify network access and credentials in the source/destination config
"Creating pipeline... (schema discovery in progress)" persists | Large source with many tables | Schema discovery can take minutes for large databases; wait or reduce the table selection
Records show 0 after sync completes | Source tables are empty or a filter excludes all rows | Check the source data and sync configuration
Stop Sync is unavailable | Pipeline is not currently running | The action is enabled only while a sync is in progress

Natural language and AI-assisted ingestion

Besides the visual wizard, you can describe data movements in plain English: in the ingestion AI Chat panel, in AI Chat with ingestion context attached, or in historical flows that expose a side chat. The agent will typically:

  1. Resolve source and destination against your connections and the Airbyte catalog.
  2. Inspect schemas and propose column / type mapping.
  3. Propose full vs incremental sync and CDC where supported.
  4. Surface schedules and runtime implications.

Example asks

  • Transfer all data from my PostgreSQL customers table to BigQuery dataset sales_data
  • Ingest orders, customers, and products from MySQL into Snowflake schema PROD_DATA
  • Transfer only active users from users where status='active' into BigQuery
  • Set up incremental ingestion for transactions based on updated_at
  • Schedule daily ingestion of sales at 2:00 UTC from PostgreSQL to BigQuery
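Incremental asks like "based on updated_at" typically compile down to a cursor-based extract: only rows whose cursor column advanced past the last synced value are pulled. A minimal sketch of that shape (table and column names come from the examples above; the generated SQL is an illustration, not Dagen's actual output):

```python
def incremental_extract_sql(table: str, cursor_col: str, last_cursor: str) -> str:
    """Build a cursor-based incremental extract: rows whose cursor column
    is past the last synced value, ordered so the new cursor is the max."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {cursor_col} > '{last_cursor}' "
        f"ORDER BY {cursor_col}"
    )

# A first full sync records max(updated_at); later syncs resume from it.
print(incremental_extract_sql("transactions", "updated_at", "2024-05-01T00:00:00Z"))
```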

Progress and status language

During runs you may see states such as Initializing → Analyzing → Transferring → Completed, plus row counts and throughput. Failed runs often include actionable errors on the pipeline card; use Configure CDC with AI or attach the ingestion job in AI Chat for remediation.


Scheduling (concepts)

Mode | When to use
One-off | Backfill, proof of concept, or ad hoc copy
Recurring | Hourly, daily, weekly, or custom cron (where exposed in the UI)
CDC / streaming | Near-real-time or log-based change capture when the connector supports it; watch for stop/restart semantics in the Stop Sync dialog

Schedules may be created from the Schedule / Create Schedule actions on the card or during pipeline setup—labels vary by version.
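A recurring mode reduces to computing the next run time. As a sketch of the "daily at 2:00 UTC" example from the asks above, modelled with the standard library (this is illustrative, not Dagen's scheduler):

```python
from datetime import datetime, timedelta, timezone

def next_daily_run(hour_utc: int, now: datetime) -> datetime:
    """Next occurrence of a daily schedule firing at hour_utc:00 UTC."""
    candidate = now.replace(hour=hour_utc, minute=0, second=0, microsecond=0)
    if candidate <= now:          # today's slot already passed
        candidate += timedelta(days=1)
    return candidate

now = datetime(2024, 5, 1, 14, 30, tzinfo=timezone.utc)
print(next_daily_run(2, now))  # 2024-05-02 02:00:00+00:00
```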


Schema mapping and advanced options

  • Automatic type mapping between source and destination types.
  • Custom column mapping when names or types differ.
  • Batch sizing and performance tuning for large tables (balance source load vs throughput).
  • Notifications — pair with workflow or Slack channels for operational alerts (Slack Integration).
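Automatic type mapping can be pictured as a lookup from source to destination types, with a safe fallback for anything unrecognized. The pairs below are a hypothetical Postgres-to-BigQuery mapping for illustration only; actual mappings are connector- and version-dependent:

```python
# Hypothetical Postgres -> BigQuery type map; real mappings are connector-dependent.
PG_TO_BQ = {
    "integer": "INT64",
    "bigint": "INT64",
    "numeric": "NUMERIC",
    "text": "STRING",
    "varchar": "STRING",
    "boolean": "BOOL",
    "timestamp with time zone": "TIMESTAMP",
}

def map_columns(source_schema: dict) -> dict:
    """Map {column: source_type} to {column: destination_type},
    falling back to STRING for unknown source types."""
    return {col: PG_TO_BQ.get(t.lower(), "STRING") for col, t in source_schema.items()}

print(map_columns({"id": "bigint", "email": "text", "payload": "jsonb"}))
# {'id': 'INT64', 'email': 'STRING', 'payload': 'STRING'}
```

Custom column mapping is the escape hatch for cases this lookup gets wrong, such as a numeric column that should land as a string identifier.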

Best practices

  1. Pilot with a small table or date-bounded extract before full history.
  2. Prefer incremental patterns on large facts with a reliable cursor column.
  3. Monitor row counts and duration; investigate drift early.
  4. Validate counts and spot-check values after major loads.
  5. Ensure the ingestion runtime (Runtime Environments) is healthy before blaming the connector.
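Practice 4 (validating counts) is easy to automate with a small helper; the callables below stand in for whatever client you use to count rows on each side, and the names are illustrative:

```python
from typing import Callable, Tuple

def counts_match(src_count: Callable[[], int],
                 dst_count: Callable[[], int],
                 tolerance: int = 0) -> Tuple[bool, int, int]:
    """Compare source and destination row counts after a load.
    tolerance > 0 allows for in-flight changes on a live source."""
    src, dst = src_count(), dst_count()
    return abs(src - dst) <= tolerance, src, dst

# e.g. wire the lambdas to SELECT COUNT(*) on each system
ok, src, dst = counts_match(lambda: 10_000, lambda: 9_998, tolerance=5)
print(ok)  # True
```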

Optional programmatic integration

Some deployments expose REST helpers for automation (exact paths may vary by version):

Start ingestion (illustrative)

POST /api/data-ingestion/start
Content-Type: application/json

{
  "source": { "type": "postgresql", "connection_id": "conn_123", "table": "customers" },
  "destination": { "type": "bigquery", "connection_id": "conn_456", "dataset": "sales_data", "table": "customers" },
  "options": { "mode": "full", "batch_size": 10000 }
}

Status (illustrative)

GET /api/data-ingestion/status/{job_id}

Prefer the External API and workspace API Keys for cross-system automation where the v1 API is enabled.
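Where these helpers are exposed, a thin client can drive them. The sketch below builds the request from the illustrative payload above; the host, auth header, and field names are assumptions to verify against your deployment. Only build_start_payload executes here, while start_ingestion shows the call shape:

```python
import json
import urllib.request

BASE_URL = "https://dagen.example.com"   # hypothetical host
API_KEY = "YOUR_WORKSPACE_API_KEY"       # assumed bearer-token auth

def build_start_payload(source_conn: str, table: str,
                        dest_conn: str, dataset: str,
                        mode: str = "full", batch_size: int = 10000) -> dict:
    """Assemble the body for POST /api/data-ingestion/start (illustrative)."""
    return {
        "source": {"type": "postgresql", "connection_id": source_conn, "table": table},
        "destination": {"type": "bigquery", "connection_id": dest_conn,
                        "dataset": dataset, "table": table},
        "options": {"mode": mode, "batch_size": batch_size},
    }

def start_ingestion(payload: dict) -> dict:
    """POST the payload and return the parsed response (e.g. a job_id)."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/data-ingestion/start",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {API_KEY}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_start_payload("conn_123", "customers", "conn_456", "sales_data")
print(payload["options"]["mode"])  # full
```

Poll GET /api/data-ingestion/status/{job_id} with the returned job_id to track the run.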


Security and compliance

  • Transfers use encrypted connections where drivers allow.
  • Credentials stay in the workspace secret model—not pasted into chat.
  • RBAC governs who can create, run, or delete ingestion jobs.
  • Audit trails may record operations for compliance (edition-dependent).

Route note

The primary Airbyte-style ingestion UI is commonly at /airbyte-ingestion. Older docs or deep links may mention /data-ingestion; use the path shown in your app’s sidebar if they differ.


Related surfaces (runs & troubleshooting)

Area | Route (typical) | Notes
Ingestion UI | /airbyte-ingestion | Sync stats on cards; AI side panel
Runtimes | /runtime-environments | Manage Runtimes from the ingestion header
Job history | /job-history | Agent and tool traces for chat-driven fixes
AI Chat | /chat | Attach Data Ingestion context and error excerpts

See Building Pipelines for schedules, Git, workflows, and cross-feature troubleshooting patterns.