Data Ingestion — User Guide

Audience: Dagen users who want to set up automated data pipelines to move data between sources and destinations using Airbyte connectors.


Overview

The Data Ingestion page (/airbyte-ingestion) lets you create, run, schedule, and monitor data ingestion pipelines. Dagen uses an Airbyte-class connector catalog with hundreds of sources and destinations. The exact connector list and image mappings are catalog- and version-dependent; common categories include relational databases (Postgres, MySQL), warehouses (BigQuery, Snowflake, Redshift, Databricks), streaming platforms (Kafka), table formats (Iceberg), object stores (S3, GCS, Azure Blob), and many SaaS systems.

Workspace database connections appear as candidate sources (and sometimes destinations) when the connector supports your connection type—reuse credentials you already manage in Dagen. For native connection types and metadata behaviour, see Supported data sources.

Feature | Description
Source catalog | Browse Airbyte connectors plus your workspace database connections where compatible
Destination catalog | Configure where data lands; options depend on the live catalog (warehouses, lakes, DBs, object storage, etc.)
Sync monitoring | Real-time progress with record counts, table-level detail, and CDC streaming
Scheduling | Set up recurring syncs
Runtime selection | Choose which Kubernetes cluster runs your ingestion jobs
AI Chat assistant | Toggle the AI panel for help with pipeline setup
Import/Export | Transfer pipeline configurations between workspaces

Page Layout

The page has four main tabs:

Tab | Icon | Contents
Active Ingestions | Pipe | Your running and configured pipelines
Connections | Connection | Airbyte connection objects with sync status
Sources | Database import | Available source connectors and your database connections
Destinations | Database export | Available destination connectors

Creating an Ingestion Pipeline

  1. Click Create Ingestion in the header.
  2. Walk through the multi-step dialog:
    • Basic Info — Name your pipeline.
    • Source — Pick a source (your database connections appear first, followed by Airbyte connectors).
    • Destination — Pick a destination.
    • Select Data — Choose databases, schemas, and tables to sync.
    • Configure — Set sync details and options.
  3. Click Create to finalize.

Using Your Database Connections

In the Sources tab, the Your Database Connections section shows connections already configured in your workspace. Click any connection card to use it as a source — no additional configuration needed.

Browsing the Airbyte Connector Catalog

Below your database connections, the Available Airbyte Connectors section lists external data sources and APIs. Use the search field ("Search sources...") to filter, and paginate through the catalog with the page controls at the bottom.


Selecting an Ingestion Runtime

All ingestions in a workspace use a shared runtime. To change it:

  1. Click the runtime button in the header (shows the current runtime name).
  2. In the runtime menu, select a different runtime from the list.
  3. Click Test Connection to verify the runtime is reachable.
  4. To add or edit runtimes, click Manage Runtimes (opens the Runtime Environments page).

Monitoring Active Ingestions

Each pipeline card on the Active Ingestions tab shows:

  • Pipeline name and status chip (syncing, running, error, failed, etc.).
  • Source → Destination flow with connector icons.
  • Sync Configuration Details — expand to see selected databases, schemas, tables, and destination settings.
  • Sync Statistics — last sync time, records synced, sync frequency.

During a Sync

While a pipeline is running, the card shows:

  • Status indicator and current table being synced.
  • Records synced count.
  • Table-level progress (expand for details).
  • Progress bar with percentage.

When Errors Occur

If a sync fails, the card displays the error message along with:

  • Configure CDC with AI button — opens the AI chat to help troubleshoot Change Data Capture issues.
  • Retry Creation button — retries the pipeline setup.

Pipeline Actions

From the Card Buttons

Button | Action
Run Ingestion (play icon) | Starts a new sync (disabled while a sync is already running)
Schedule Ingestion (calendar icon) | Opens the schedule configuration
Delete Ingestion (trash icon) | Deletes the pipeline (with confirmation)

From the Three-Dot Menu

Action | Description
Copy Pipeline ID | Copies the Airbyte pipeline ID to the clipboard
Refresh Status | Forces a status refresh
Stop Sync | Stops a running or streaming sync
View Metadata | Shows pipeline configuration details
View Runs & Logs | Opens the run history dialog
Export Configuration | Downloads the pipeline config as JSON
Rename Ingestion | Changes the pipeline display name
Create Schedule | Sets up a recurring sync schedule

Viewing Runs and Logs

  1. Click View Runs & Logs from the pipeline menu.
  2. The left panel lists all runs with timestamps, status, duration, and record counts.
  3. Click a run to see its logs in the right panel — each log line shows timestamp, level, stage, and message.

Stopping a Sync

  1. Click Stop Sync from the pipeline menu (or the three-dot menu).
  2. Confirm in the dialog. The warning explains what will happen:
    • For CDC streaming: "Stop the CDC streaming process." You can restart later.
    • For full refresh: "Cancel the current sync operation. Progress will be lost."

Import and Export

Exporting

  1. Click Import/Export in the header.
  2. Select Export All to download all pipeline configurations as JSON.

Importing

  1. Click Import/Export, then select Import Configurations.
  2. Select a JSON file previously exported from Dagen.
  3. The pipelines are recreated in your workspace.
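Before importing a file, a quick structural check can catch a wrong or truncated export. The sketch below assumes the export is a JSON list of pipeline objects, each carrying a name; the field names are assumptions about the export format, not a documented schema:

```python
import json

def check_export(path: str) -> list:
    """Lightly validate an exported configuration file before import.
    Assumes (hypothetically) a JSON list of pipeline objects with a 'name' key."""
    with open(path) as fh:
        data = json.load(fh)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of pipeline configurations")
    missing = [i for i, p in enumerate(data) if "name" not in p]
    if missing:
        raise ValueError(f"pipelines at positions {missing} lack a 'name'")
    return [p["name"] for p in data]

# Example: check_export("dagen-pipelines.json") returns the pipeline names
```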

Using the AI Chat Assistant

Toggle the AI Chat button in the header to open a side panel. The AI assistant can help you:

  • Choose the right source and destination connectors.
  • Configure CDC settings.
  • Troubleshoot sync failures.
  • Set up schedules.

Troubleshooting

Symptom | Cause | Fix
"No Active Ingestion" | No pipelines created yet | Click Create Ingestion to set up your first pipeline
"Sign in required" chip in header | Not authenticated | Click Sign In to authenticate
Pipeline stuck in "Creating" | Schema discovery is taking time or the runtime is unreachable | Check the runtime status; wait for discovery to complete or retry
Sync failed with connection error | Source or destination is not reachable from the runtime | Verify network access and credentials in the source/destination config
"Creating pipeline... (schema discovery in progress)" persists | Large source with many tables | Schema discovery can take minutes for large databases; wait or reduce the table selection
Records show 0 after sync completes | Source tables are empty or a filter excludes all rows | Check the source data and sync configuration
Stop Sync is unavailable | Pipeline is not currently running | The action is enabled only while a sync is in progress

Natural language and AI-assisted ingestion

Besides the visual wizard, you can describe data movements in plain English: in the ingestion AI Chat panel, in AI Chat with ingestion context attached, or in historical flows that expose a side chat. The agent will typically:

  1. Resolve source and destination against your connections and the Airbyte catalog.
  2. Inspect schemas and propose column / type mapping.
  3. Propose full vs incremental sync and CDC where supported.
  4. Surface schedules and runtime implications.

Example asks

  • Transfer all data from my PostgreSQL customers table to BigQuery dataset sales_data
  • Ingest orders, customers, and products from MySQL into Snowflake schema PROD_DATA
  • Transfer only active users from users where status='active' into BigQuery
  • Set up incremental ingestion for transactions based on updated_at
  • Schedule daily ingestion of sales at 2:00 UTC from PostgreSQL to BigQuery
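Incremental asks like "based on updated_at" typically compile down to a cursor-based extract: only rows whose cursor column advanced past the last synced value are pulled. A minimal sketch of that shape (table and column names come from the examples above; the generated SQL is an illustration, not Dagen's actual output):

```python
def incremental_extract_sql(table: str, cursor_col: str, last_cursor: str) -> str:
    """Build a cursor-based incremental extract: rows whose cursor column
    is past the last synced value, ordered so the new cursor is the max."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {cursor_col} > '{last_cursor}' "
        f"ORDER BY {cursor_col}"
    )

# A first full sync records max(updated_at); later syncs resume from it.
print(incremental_extract_sql("transactions", "updated_at", "2024-05-01T00:00:00Z"))
```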

Progress and status language

During runs you may see states such as Initializing → Analyzing → Transferring → Completed, plus row counts and throughput. Failed runs often include actionable errors on the pipeline card; use Configure CDC with AI or attach the ingestion job in AI Chat for remediation.


Scheduling (concepts)

Mode | When to use
One-off | Backfill, proof of concept, or ad hoc copy
Recurring | Hourly, daily, weekly, or custom cron (where exposed in the UI)
CDC / streaming | Near-real-time or log-based change capture when the connector supports it; watch for stop/restart semantics in the Stop Sync dialog

Schedules may be created from the Schedule / Create Schedule actions on the card or during pipeline setup—labels vary by version.
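A recurring mode reduces to computing the next run time. As a sketch of the "daily at 2:00 UTC" example from the asks above, modelled with the standard library (this is illustrative, not Dagen's scheduler):

```python
from datetime import datetime, timedelta, timezone

def next_daily_run(hour_utc: int, now: datetime) -> datetime:
    """Next occurrence of a daily schedule firing at hour_utc:00 UTC."""
    candidate = now.replace(hour=hour_utc, minute=0, second=0, microsecond=0)
    if candidate <= now:          # today's slot already passed
        candidate += timedelta(days=1)
    return candidate

now = datetime(2024, 5, 1, 14, 30, tzinfo=timezone.utc)
print(next_daily_run(2, now))  # 2024-05-02 02:00:00+00:00
```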


Schema mapping and advanced options

  • Automatic type mapping between source and destination types.
  • Custom column mapping when names or types differ.
  • Batch sizing and performance tuning for large tables (balance source load vs throughput).
  • Notifications — pair with workflow or Slack channels for operational alerts (Slack Integration).
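Automatic type mapping can be pictured as a lookup from source to destination types, with a safe fallback for anything unrecognized. The pairs below are a hypothetical Postgres-to-BigQuery mapping for illustration only; actual mappings are connector- and version-dependent:

```python
# Hypothetical Postgres -> BigQuery type map; real mappings are connector-dependent.
PG_TO_BQ = {
    "integer": "INT64",
    "bigint": "INT64",
    "numeric": "NUMERIC",
    "text": "STRING",
    "varchar": "STRING",
    "boolean": "BOOL",
    "timestamp with time zone": "TIMESTAMP",
}

def map_columns(source_schema: dict) -> dict:
    """Map {column: source_type} to {column: destination_type},
    falling back to STRING for unknown source types."""
    return {col: PG_TO_BQ.get(t.lower(), "STRING") for col, t in source_schema.items()}

print(map_columns({"id": "bigint", "email": "text", "payload": "jsonb"}))
# {'id': 'INT64', 'email': 'STRING', 'payload': 'STRING'}
```

Custom column mapping is the escape hatch for cases this lookup gets wrong, such as a numeric column that should land as a string identifier.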

Best practices

  1. Pilot with a small table or date-bounded extract before full history.
  2. Prefer incremental patterns on large facts with a reliable cursor column.
  3. Monitor row counts and duration; investigate drift early.
  4. Validate counts and spot-check values after major loads.
  5. Ensure the ingestion runtime (Runtime Environments) is healthy before blaming the connector.
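Practice 4 (validating counts) is easy to automate with a small helper; the callables below stand in for whatever client you use to count rows on each side, and the names are illustrative:

```python
from typing import Callable, Tuple

def counts_match(src_count: Callable[[], int],
                 dst_count: Callable[[], int],
                 tolerance: int = 0) -> Tuple[bool, int, int]:
    """Compare source and destination row counts after a load.
    tolerance > 0 allows for in-flight changes on a live source."""
    src, dst = src_count(), dst_count()
    return abs(src - dst) <= tolerance, src, dst

# e.g. wire the lambdas to SELECT COUNT(*) on each system
ok, src, dst = counts_match(lambda: 10_000, lambda: 9_998, tolerance=5)
print(ok)  # True
```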

Optional programmatic integration

Some deployments expose REST helpers for automation (exact paths may vary by version):

Start ingestion (illustrative)

POST /api/data-ingestion/start
Content-Type: application/json

{
  "source": { "type": "postgresql", "connection_id": "conn_123", "table": "customers" },
  "destination": { "type": "bigquery", "connection_id": "conn_456", "dataset": "sales_data", "table": "customers" },
  "options": { "mode": "full", "batch_size": 10000 }
}

Status (illustrative)

GET /api/data-ingestion/status/{job_id}

Prefer the External API and workspace API Keys for cross-system automation where the v1 API is enabled.
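Where these helpers are exposed, a thin client can drive them. The sketch below builds the request from the illustrative payload above; the host, auth header, and field names are assumptions to verify against your deployment. Only build_start_payload executes here, while start_ingestion shows the call shape:

```python
import json
import urllib.request

BASE_URL = "https://dagen.example.com"   # hypothetical host
API_KEY = "YOUR_WORKSPACE_API_KEY"       # assumed bearer-token auth

def build_start_payload(source_conn: str, table: str,
                        dest_conn: str, dataset: str,
                        mode: str = "full", batch_size: int = 10000) -> dict:
    """Assemble the body for POST /api/data-ingestion/start (illustrative)."""
    return {
        "source": {"type": "postgresql", "connection_id": source_conn, "table": table},
        "destination": {"type": "bigquery", "connection_id": dest_conn,
                        "dataset": dataset, "table": table},
        "options": {"mode": mode, "batch_size": batch_size},
    }

def start_ingestion(payload: dict) -> dict:
    """POST the payload and return the parsed response (e.g. a job_id)."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/data-ingestion/start",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {API_KEY}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_start_payload("conn_123", "customers", "conn_456", "sales_data")
print(payload["options"]["mode"])  # full
```

Poll GET /api/data-ingestion/status/{job_id} with the returned job_id to track the run.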


Security and compliance

  • Transfers use encrypted connections where drivers allow.
  • Credentials stay in the workspace secret model—not pasted into chat.
  • RBAC governs who can create, run, or delete ingestion jobs.
  • Audit trails may record operations for compliance (edition-dependent).

Route note

The primary Airbyte-style ingestion UI is commonly at /airbyte-ingestion. Older docs or deep links may mention /data-ingestion; use the path shown in your app’s sidebar if they differ.


Related surfaces (runs & troubleshooting)

Area | Route (typical) | Notes
Ingestion UI | /airbyte-ingestion | Sync stats on cards; AI side panel
Runtimes | /runtime-environments | Manage Runtimes from the ingestion header
Job history | /job-history | Agent and tool traces for chat-driven fixes
AI Chat | /chat | Attach Data Ingestion context and error excerpts

See Building Pipelines for schedules, Git, workflows, and cross-feature troubleshooting patterns.