Welcome to Dagen

Dagen introduces the agentic pipeline: data infrastructure where pipelines are intent-driven—they learn, adapt, and improve. Specialist AI agents design, build, monitor, and refine the stack from ingestion to business-ready KPIs, alongside your existing SQL, Spark, and warehouse investments.

Highlights you should not miss

  • Agent Intelligence — Workspace-wide instructions, skills (loaded on demand via read_skill), rules, and lessons so agents behave like your data team. Docs: Agent Intelligence & Skills.
  • Skills — Packaged expertise with a trigger and a body; saves context versus pasting huge prompts every time. Docs: the same guide plus Custom Agents & Tools.
  • Git Reviews — AI review on GitHub pull requests (SQL, dbt, PySpark, YAML) with webhooks and optional auto-post. Docs: Git Reviews (AI on PRs).
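Webhook-driven review means the platform receives GitHub's pull-request events; verifying the payload signature against GitHub's documented X-Hub-Signature-256 header is the standard first step. The sketch below is generic webhook hygiene, not Dagen-specific code:

```python
import hashlib
import hmac

def verify_github_signature(secret: bytes, payload: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header against the raw request body.
    Any webhook receiver (an AI PR reviewer included) should do this before
    trusting the event payload."""
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)

# Illustrative values; in practice the secret is the one configured on the webhook.
secret = b"shared-webhook-secret"
payload = b'{"action": "opened", "pull_request": {"number": 7}}'
header = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
assert verify_github_signature(secret, payload, header)
```

Using `hmac.compare_digest` rather than `==` avoids timing side channels when comparing signatures.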

Core design principle: intent-awareness

Every pipeline should understand why it exists: the business outcome, who consumes the data, which decisions depend on it, and what quality means for that use case—not only which tasks it runs.


Key features

Intent-driven agentic pipelines

  • Describe pipelines by purpose and outcome, not only by technical steps
  • A Super Agent coordinates specialist agents across the lifecycle
  • Three autonomy levels in AI Chat: Guided, Semi-Autonomous, Autonomous
  • Natural language for architecture goals (for example medallion or layered designs)
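To make "purpose over steps" concrete, here is a purely illustrative intent declaration — the field names below are hypothetical, not Dagen's actual configuration schema:

```python
# Hypothetical intent declaration: field names are illustrative,
# not Dagen's actual configuration schema.
pipeline_intent = {
    "purpose": "Track weekly revenue by region for the sales leadership team",
    "consumers": ["sales-leadership-dashboard"],
    "decisions": ["quota adjustments", "territory planning"],
    "quality": {"freshness": "daily by 06:00 UTC", "completeness": "no missing regions"},
    "architecture_goal": "medallion",  # a goal the agents translate into technical steps
}

def summarize_intent(intent: dict) -> str:
    """Render the declared intent as a one-line summary an agent could act on."""
    return f"{intent['purpose']} (goal: {intent['architecture_goal']})"

print(summarize_intent(pipeline_intent))
```

The point of the shape is that everything the agents need — consumers, decisions, quality expectations — lives beside the purpose, not buried in task code.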

Specialist agent hierarchy

  • Data Ingestion — Broad connector coverage, rate limits, retries, CDC
  • dbt — SQL transformations, tests, documentation aligned to intent
  • Metadata Discovery — Schema profiling, semantics, knowledge enrichment
  • Data Model Generation — Dimensional models, facts, medallion-style layouts
  • Data Cleansing — Pipeline-specific quality rules
  • Test Data Generation — Synthetic data for validation
  • Orchestration — Scheduling, coordination, monitoring
  • Spark Developer — PySpark and large-scale processing
  • Internet Search — External enrichment and public datasets

Tri-layer memory

  • Working memory — active session context and decisions in flight
  • Episodic memory — structured history of runs, fixes, and outcomes
  • Institutional knowledge — skills (on-demand playbooks), Knowledge Base documents, and lessons that compound over time

Configure institutional behavior explicitly in Agent Intelligence & Skills (/agent-intelligence).
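One way to picture the three layers and how they interact at the end of a session — a minimal sketch whose class and field names are illustrative, not Dagen internals:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AgentMemory:
    """Illustrative tri-layer memory; names are hypothetical, not Dagen internals."""
    working: list = field(default_factory=list)        # active session context, cleared per session
    episodic: list = field(default_factory=list)       # structured history of runs and outcomes
    institutional: dict = field(default_factory=dict)  # skills, Knowledge Base docs, lessons

    def end_session(self, outcome: str, lesson: Optional[str] = None) -> None:
        """Promote what was learned: the run goes into episodic history, and a
        durable lesson (if any) joins institutional knowledge."""
        self.episodic.append({"context": list(self.working), "outcome": outcome})
        if lesson:
            self.institutional.setdefault("lessons", []).append(lesson)
        self.working.clear()

memory = AgentMemory()
memory.working.append("user asked to rebuild the revenue pipeline")
memory.end_session("success", lesson="revenue source needs a dedupe step")
```

The key property is the direction of flow: working memory is transient, episodic memory accumulates per run, and only distilled lessons compound into the institutional layer.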

Self-healing pipelines

Dagen watches for schema drift, quality and volume anomalies, and data that misses freshness expectations, then remediates — with a human in the loop where needed — via execution logs, Fix with Agent, and chat with pipeline or database context. Pair this with Agent Intelligence rules and lessons so repeated failure modes gain automatic guardrails, and shift bad SQL left with Git Reviews before merge.
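The schema-drift signal reduces to a diff between the expected schema and what a run actually observed. This is a sketch under our own assumptions, not Dagen's detector:

```python
def detect_schema_drift(expected: dict, observed: dict) -> dict:
    """Compare an expected schema (column -> type) against an observed one.
    Returns added, removed, and retyped columns; all-empty means no drift."""
    added = {c: t for c, t in observed.items() if c not in expected}
    removed = {c: t for c, t in expected.items() if c not in observed}
    retyped = {c: (expected[c], observed[c])
               for c in expected.keys() & observed.keys()
               if expected[c] != observed[c]}
    return {"added": added, "removed": removed, "retyped": retyped}

expected = {"order_id": "bigint", "amount": "numeric", "region": "text"}
observed = {"order_id": "bigint", "amount": "text", "country": "text"}
drift = detect_schema_drift(expected, observed)
# A non-empty diff is the signal that triggers remediation or a human review.
```

In this example `region` disappeared, `country` appeared, and `amount` changed type — exactly the cases where a pipeline should pause or self-heal rather than silently load bad data.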

Architecture support

Medallion (Bronze → Silver → Gold), star schema, Data Vault 2.0, and AI/RAG-ready, semantically rich outputs.
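In medallion terms, each layer is a progressively refined view of the same records. A toy sketch in plain Python with illustrative field names — real implementations would use dbt models or Spark jobs:

```python
# Toy medallion flow: Bronze keeps raw records, Silver cleans and conforms,
# Gold aggregates into a business-ready KPI. Field names are illustrative.
bronze = [
    {"order_id": "1", "region": "EMEA ", "amount": "100.0"},
    {"order_id": "2", "region": "emea", "amount": "50.5"},
    {"order_id": "2", "region": "emea", "amount": "50.5"},  # duplicate delivery
]

# Silver: typed, trimmed, and deduplicated on the business key (order_id).
silver = {r["order_id"]: {"region": r["region"].strip().upper(),
                          "amount": float(r["amount"])}
          for r in bronze}.values()

# Gold: revenue per region, the KPI the business actually consumes.
gold: dict = {}
for row in silver:
    gold[row["region"]] = gold.get(row["region"], 0.0) + row["amount"]

print(gold)  # {'EMEA': 150.5}
```

The layering matters because each stage is independently testable: Silver can assert types and uniqueness, and Gold can assert the KPI's completeness against declared intent.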

Pipeline modernization (conceptual phases)

  1. Discovery and cataloging
  2. Intent reconstruction
  3. Agentic rebuild
  4. Activation and continuous improvement

Platform architecture (summary)

  • Client experience — responsive UI with real-time updates for long-running work
  • APIs — REST and streaming patterns, workspace isolation, RBAC
  • Metadata and knowledge — stored platform state, semantic search, and lineage / graph capabilities for impact and discovery
  • Integrations — 500+ ingestion connectors (Airbyte family), Git, dbt, Spark, Dataform, and configurable LLMs (Anthropic, OpenAI, Google, open source, and more)

Security and compliance

Architecture-aware sovereignty (including GDPR, NIS2, EU AI Act considerations), data residency tracking per pipeline, encrypted credentials with audit logging, compliance-oriented records, and SOC 2–ready postures—exact obligations depend on your edition and deployment.


Key capabilities (docs)

AI-powered agents & intelligence

Custom Agents & Tools, Agent Intelligence & Skills, AI Chat.

Data movement and systems

Supported data sources and metadata, Database Connections, Data Ingestion.

Modeling and exploration

Data Modeling, Database Explorer.

Operations

Platform Dashboard, Building Pipelines, Git Reviews, API Keys, External API.


Use cases Dagen is built for

  • Lift-and-shift modernization — Discovery and cataloging, intent reconstruction, then agentic rebuild (see Pipeline modernization above).
  • Medallion / layered warehouses — Natural language plus the Data Model and dbt agents for Bronze → Silver → Gold patterns.
  • Operational analytics — Data Insights for conversational KPIs and charts; Database Explorer for SQL.
  • Data quality at scale — Cleansing agent, dbt tests, self-healing signals, and Agent Intelligence & Skills rules and lessons.
  • Team scale-up — Workspace sharing, RBAC, Administration, and the Knowledge Base for tribal knowledge.
  • Automation & CI — External API, API Keys, Git Reviews, Slack.

Documentation map (full product)

  • Sign-in, SSO — Authentication
  • Databases, warehouses, lakes, object storage — Database Connections, Supported data sources
  • GitHub / repos — Source Repositories
  • Home / overview cards, recent tasks — Platform Dashboard
  • Move data (Airbyte UI) — Data Ingestion
  • Schemas, DDL, test data — Data Modeling
  • Agent Builder, tools, Knowledge Base — Custom Agents & Tools
  • Agent Intelligence, skills, rules, lessons — Agent Intelligence & Skills
  • GitHub AI PR review — Git Reviews (AI on PRs)
  • Main agent UI — AI Chat
  • SQL browse / console — Database Explorer
  • Dashboards from chat — Data Insights
  • dbt, Spark, workflows — Building Pipelines
  • Jobs, usage, runtimes, team — Administration
  • LLM configuration — Model Settings
  • REST / A2A — External API, API Keys
  • In-app help — Magical Guide
  • Slack — Slack Integration
  • On-prem / AMI — Self-Hosted

Getting started

  1. Authentication — Sign in (email, Google, GitHub).
  2. Connect your ecosystem — Database connections and source repositories.
  3. Declare intent — Describe the business purpose of your pipeline.
  4. Let agents build — Specialists design, build, and test.
  5. Choose autonomy — Guided, Semi, or Autonomous in AI Chat; iterate with self-healing and reviews.

Need help?

Use the Magical Guide in the application for interactive, context-aware assistance.