Data Ingestion — User Guide
Audience: Dagen users who want to set up automated data pipelines to move data between sources and destinations using Airbyte connectors.
Overview
The Data Ingestion page (/airbyte-ingestion) lets you create, run, schedule, and monitor data ingestion pipelines. Dagen uses an Airbyte-class connector catalog with hundreds of sources and destinations. The exact connector list and image mappings are catalog- and version-dependent; common patterns include relational databases (Postgres, MySQL), warehouses (BigQuery, Snowflake, Redshift, Databricks), object stores (S3, GCS, Azure Blob), streaming and lake systems (Kafka, Iceberg), and many SaaS systems.
Workspace database connections appear as candidate sources (and sometimes destinations) when the connector supports your connection type—reuse credentials you already manage in Dagen. For native connection types and metadata behaviour, see Supported data sources.
| Feature | Description |
|---|---|
| Source catalog | Browse Airbyte connectors plus your workspace database connections where compatible |
| Destination catalog | Configure where data lands—options depend on the live catalog (warehouses, lakes, DBs, object storage, etc.) |
| Sync monitoring | Real-time progress with record counts, table-level detail, and CDC streaming |
| Scheduling | Set up recurring syncs |
| Runtime selection | Choose which Kubernetes cluster runs your ingestion jobs |
| AI Chat assistant | Toggle the AI panel for help with pipeline setup |
| Import/Export | Transfer pipeline configurations between workspaces |
Page Layout
The page has four main tabs:
| Tab | Icon | Contents |
|---|---|---|
| Active Ingestions | Pipe | Your running and configured pipelines |
| Connections | Connection | Airbyte connection objects with sync status |
| Sources | Database import | Available source connectors and your database connections |
| Destinations | Database export | Available destination connectors |
Creating an Ingestion Pipeline
- Click Create Ingestion in the header.
- Walk through the multi-step dialog:
  - Basic Info — Name your pipeline.
  - Source — Pick a source (your database connections appear first, followed by Airbyte connectors).
  - Destination — Pick a destination.
  - Select Data — Choose databases, schemas, and tables to sync.
  - Configure — Set sync details and options.
- Click Create to finalize.
Using Your Database Connections
In the Sources tab, the Your Database Connections section shows connections already configured in your workspace. Click any connection card to use it as a source — no additional configuration needed.
Browsing the Airbyte Connector Catalog
Below your database connections, the Available Airbyte Connectors section lists external data sources and APIs. Use the search field ("Search sources...") to filter, and paginate through the catalog with the page controls at the bottom.
Selecting an Ingestion Runtime
All ingestions in a workspace use a shared runtime. To change it:
- Click the runtime button in the header (shows the current runtime name).
- In the runtime menu, select a different runtime from the list.
- Click Test Connection to verify the runtime is reachable.
- To add or edit runtimes, click Manage Runtimes (opens the Runtime Environments page).
Monitoring Active Ingestions
Each pipeline card on the Active Ingestions tab shows:
- Pipeline name and status chip (syncing, running, error, failed, etc.).
- Source → Destination flow with connector icons.
- Sync Configuration Details — expand to see selected databases, schemas, tables, and destination settings.
- Sync Statistics — last sync time, records synced, sync frequency.
During a Sync
While a pipeline is running, the card shows:
- Status indicator and current table being synced.
- Records synced count.
- Table-level progress (expand for details).
- Progress bar with percentage.
When Errors Occur
If a sync fails, the card displays the error message along with:
- Configure CDC with AI button — opens the AI chat to help troubleshoot Change Data Capture issues.
- Retry Creation button — retries the pipeline setup.
Pipeline Actions
From the Card Buttons
| Button | Action |
|---|---|
| Run Ingestion (play icon) | Starts a new sync (disabled while already running) |
| Schedule Ingestion (calendar icon) | Opens the schedule configuration |
| Delete Ingestion (trash icon) | Deletes the pipeline (with confirmation) |
From the Three-Dot Menu
| Action | Description |
|---|---|
| Copy Pipeline ID | Copies the Airbyte pipeline ID to clipboard |
| Refresh Status | Forces a status refresh |
| Stop Sync | Stops a running or streaming sync |
| View Metadata | Shows pipeline configuration details |
| View Runs & Logs | Opens the run history dialog |
| Export Configuration | Downloads the pipeline config as JSON |
| Rename Ingestion | Changes the pipeline display name |
| Create Schedule | Sets up a recurring sync schedule |
Viewing Runs and Logs
- Click View Runs & Logs from the pipeline menu.
- The left panel lists all runs with timestamps, status, duration, and record counts.
- Click a run to see its logs in the right panel — each log line shows timestamp, level, stage, and message.
Stopping a Sync
- Click Stop Sync in the pipeline's three-dot menu.
- Confirm in the dialog. The warning explains what will happen:
  - For CDC streaming: "Stop the CDC streaming process." You can restart later.
  - For full refresh: "Cancel the current sync operation. Progress will be lost."
Import and Export
Exporting
- Click Import/Export in the header.
- Select Export All to download all pipeline configurations as JSON.
Importing
- Click Import/Export → Import Configurations.
- Select a JSON file previously exported from Dagen.
- The pipelines are recreated in your workspace.
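As a rough illustration, an exported configuration can be inspected or filtered before re-import. The field names below are hypothetical and depend on your Dagen version; inspect your own export first:

```python
import json

# Hypothetical shape of an exported pipeline list; real field names
# are version-dependent -- check an actual export from your workspace.
exported = json.loads("""
[
  {"name": "pg_to_bq",
   "source": {"type": "postgresql"},
   "destination": {"type": "bigquery"}},
  {"name": "mysql_to_snowflake",
   "source": {"type": "mysql"},
   "destination": {"type": "snowflake"}}
]
""")

# Keep only pipelines that land in BigQuery before importing elsewhere.
bigquery_only = [p for p in exported
                 if p["destination"]["type"] == "bigquery"]

print([p["name"] for p in bigquery_only])  # -> ['pg_to_bq']
```

The same pattern works in reverse: edit the JSON (for example, to rename pipelines or swap connection IDs) and re-import the filtered file.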
Using the AI Chat Assistant
Toggle the AI Chat button in the header to open a side panel. The AI assistant can help you:
- Choose the right source and destination connectors.
- Configure CDC settings.
- Troubleshoot sync failures.
- Set up schedules.
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| "No Active Ingestion" | No pipelines created yet | Click Create Ingestion to set up your first pipeline |
| "Sign in required" chip in header | Not authenticated | Click Sign In to authenticate |
| Pipeline stuck in "Creating" | Schema discovery is taking time or the runtime is unreachable | Check the runtime status; wait for discovery to complete or retry |
| Sync failed with connection error | Source or destination is not reachable from the runtime | Verify network access and credentials in the source/destination config |
| "Creating pipeline... (schema discovery in progress)" persists | Large source with many tables | Schema discovery can take minutes for large databases; wait or reduce table selection |
| Records show 0 after sync completes | Source tables are empty or filter excludes all rows | Check source data and sync configuration |
| Stop Sync action is disabled | Pipeline is not currently running | The action is only enabled while a sync is in progress |
Natural language and AI-assisted ingestion
Besides the visual wizard, you can describe ingestion tasks in plain English—in the ingestion AI Chat panel, in AI Chat with ingestion context attached, or in other flows that expose a side chat. The agent will typically:
- Resolve source and destination against your connections and the Airbyte catalog.
- Inspect schemas and propose column / type mapping.
- Propose full vs incremental sync and CDC where supported.
- Surface schedules and runtime implications.
Example asks
- "Transfer all data from my PostgreSQL customers table to BigQuery dataset sales_data"
- "Ingest orders, customers, and products from MySQL into Snowflake schema PROD_DATA"
- "Transfer only active users from users where status='active' into BigQuery"
- "Set up incremental ingestion for transactions based on updated_at"
- "Schedule daily ingestion of sales at 2:00 UTC from PostgreSQL to BigQuery"
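Incremental ingestion "based on updated_at" reduces to remembering a high-water-mark cursor and fetching only newer rows. A minimal in-memory sketch of that logic (the `transactions` example above; `fetch_rows` stands in for a real source query, and a connector would persist the cursor rather than keep it in memory):

```python
# Sample source rows; in practice these come from the source database.
rows = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-02-01"},
    {"id": 3, "updated_at": "2024-03-01"},
]

def fetch_rows(since: str) -> list[dict]:
    # Equivalent to: SELECT * FROM transactions WHERE updated_at > :since
    return [r for r in rows if r["updated_at"] > since]

cursor = "2024-01-15"                          # last synced high-water mark
batch = fetch_rows(cursor)                     # only rows changed since then
cursor = max(r["updated_at"] for r in batch)   # advance the cursor

print([r["id"] for r in batch])  # -> [2, 3]
```

This is why the best practices below recommend a reliable cursor column: rows updated without touching `updated_at` would be silently skipped.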
Progress and status language
During runs you may see states such as Initializing → Analyzing → Transferring → Completed, plus row counts and throughput. Failed runs often include actionable errors on the pipeline card; use Configure CDC with AI or attach the ingestion job in AI Chat for remediation.
Scheduling (concepts)
| Mode | When to use |
|---|---|
| One-off | Backfill, proof of concept, or ad hoc copy. |
| Recurring | Hourly, daily, weekly, or custom cron (where exposed in UI). |
| CDC / streaming | Near-real-time or log-based change capture when the connector supports it—watch for stop/restart semantics in the Stop Sync dialog. |
Schedules may be created from the Schedule / Create Schedule actions on the card or during pipeline setup—labels vary by version.
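For a daily recurring schedule, the semantics reduce to "next occurrence of HH:00 UTC" (cron `0 2 * * *` for the 02:00 case). A stdlib sketch of that calculation, useful for sanity-checking what a schedule will actually fire next:

```python
from datetime import datetime, timedelta, timezone

def next_daily_run(now: datetime, hour: int = 2) -> datetime:
    """Next occurrence of a daily HH:00 UTC schedule (cron '0 HOUR * * *')."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:          # today's slot already passed
        candidate += timedelta(days=1)
    return candidate

now = datetime(2024, 5, 1, 3, 30, tzinfo=timezone.utc)
print(next_daily_run(now))  # -> 2024-05-02 02:00:00+00:00
```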
Schema mapping and advanced options
- Automatic type mapping between source and destination types.
- Custom column mapping when names or types differ.
- Batch sizing and performance tuning for large tables (balance source load vs throughput).
- Notifications — pair with workflow or Slack channels for operational alerts (Slack Integration).
Best practices
- Pilot with a small table or date-bounded extract before full history.
- Prefer incremental patterns on large facts with a reliable cursor column.
- Monitor row counts and duration; investigate drift early.
- Validate counts and spot-check values after major loads.
- Ensure the ingestion runtime (Runtime Environments) is healthy before blaming the connector.
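A minimal post-load count check along the lines of the practices above. The two dictionaries stand in for `COUNT(*)` results from the source and destination; wire in your own database clients:

```python
# Per-table row counts from source and destination (placeholder data;
# in practice these come from COUNT(*) queries on each side).
source_counts = {"customers": 120_000, "orders": 450_312}
dest_counts = {"customers": 120_000, "orders": 450_311}

# Tables whose destination count differs from the source, with the delta.
drift = {
    table: dest_counts.get(table, 0) - n
    for table, n in source_counts.items()
    if dest_counts.get(table, 0) != n
}

for table, delta in sorted(drift.items()):
    print(f"count mismatch on {table}: {delta:+d} rows in destination")
```

Counts alone will not catch value-level corruption, which is why the practices above also recommend spot-checking values after major loads.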
Optional programmatic integration
Some deployments expose REST helpers for automation (exact paths may vary by version):
Start ingestion (illustrative)
```
POST /api/data-ingestion/start
Content-Type: application/json

{
  "source": { "type": "postgresql", "connection_id": "conn_123", "table": "customers" },
  "destination": { "type": "bigquery", "connection_id": "conn_456", "dataset": "sales_data", "table": "customers" },
  "options": { "mode": "full", "batch_size": 10000 }
}
```
Status (illustrative)
```
GET /api/data-ingestion/status/{job_id}
```
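A hedged Python sketch driving these illustrative endpoints with only the standard library. The base URL, auth header scheme, and response fields (`job_id`, `state`) are assumptions, not a documented contract; verify them against your deployment:

```python
import json
import time
import urllib.request

BASE = "https://dagen.example.com"  # hypothetical base URL
HEADERS = {
    "Authorization": "Bearer <API_KEY>",  # workspace API key (assumed scheme)
    "Content-Type": "application/json",
}

TERMINAL_STATES = {"completed", "failed"}

def is_terminal(state: str) -> bool:
    """True once a run has finished, successfully or not."""
    return state in TERMINAL_STATES

def start_ingestion(payload: dict) -> str:
    """POST the start request; assumes the response carries a job_id."""
    req = urllib.request.Request(
        f"{BASE}/api/data-ingestion/start",
        data=json.dumps(payload).encode(),
        headers=HEADERS,
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["job_id"]

def wait_for_completion(job_id: str, poll_seconds: int = 15) -> dict:
    """Poll the status endpoint until the run reaches a terminal state."""
    while True:
        req = urllib.request.Request(
            f"{BASE}/api/data-ingestion/status/{job_id}", headers=HEADERS)
        with urllib.request.urlopen(req) as resp:
            status = json.load(resp)
        if is_terminal(status.get("state", "")):
            return status
        time.sleep(poll_seconds)
```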
Prefer External API and workspace API Keys for cross-system automation where v1 is enabled.
Security and compliance
- Transfers use encrypted connections where drivers allow.
- Credentials stay in the workspace secret model—not pasted into chat.
- RBAC governs who can create, run, or delete ingestion jobs.
- Audit trails may record operations for compliance (edition-dependent).
Route note
The primary Airbyte-style ingestion UI is commonly at /airbyte-ingestion. Older docs or deep links may mention /data-ingestion; use the path shown in your app’s sidebar if they differ.
Related surfaces (runs & troubleshooting)
| Area | Route (typical) | Notes |
|---|---|---|
| Ingestion UI | /airbyte-ingestion | Sync stats on cards; AI side panel |
| Runtimes | /runtime-environments | Manage Runtimes from ingestion header |
| Job history | /job-history | Agent and tool traces for chat-driven fixes |
| AI Chat | /chat | Attach Data Ingestion context + error excerpts |
See Building Pipelines for schedules, Git, workflows, and cross-feature troubleshooting patterns.