Welcome to Dagen
Dagen introduces the agentic pipeline: data infrastructure where pipelines are intent-driven—they learn, adapt, and improve. Specialist AI agents design, build, monitor, and refine the stack from ingestion to business-ready KPIs, alongside your existing SQL, Spark, and warehouse investments.
Highlights you should not miss
| Capability | What it is | Documentation |
|---|---|---|
| Agent Intelligence | Workspace-wide instructions, skills (loaded on demand via read_skill), rules, and lessons so agents behave like your data team. | Agent Intelligence & Skills |
| Skills | Packaged expertise with a trigger and body; saves context versus pasting large prompts every time. | Agent Intelligence & Skills, Custom Agents & Tools |
| Git Reviews | AI review on GitHub pull requests (SQL, dbt, PySpark, YAML) with webhooks and optional auto-post. | Git Reviews (AI on PRs) |
Core design principle: intent-awareness
Every pipeline should understand why it exists: the business outcome, who consumes the data, which decisions depend on it, and what quality means for that use case—not only which tasks it runs.
Key features
Intent-driven agentic pipelines
- Describe pipelines by purpose and outcome, not only by technical steps
- A Super Agent coordinates specialist agents across the lifecycle
- Three autonomy levels in AI Chat: Guided, Semi-Autonomous, Autonomous
- Natural language for architecture goals (for example, medallion or layered designs)
Specialist agent hierarchy
| Agent | Focus |
|---|---|
| Data Ingestion | Broad connector coverage, rate limits, retries, CDC |
| dbt | SQL transformations, tests, documentation aligned to intent |
| Metadata Discovery | Schema profiling, semantics, knowledge enrichment |
| Data Model Generation | Dimensional models, facts, medallion-style layouts |
| Data Cleansing | Pipeline-specific quality rules |
| Test Data Generation | Synthetic data for validation |
| Orchestration | Scheduling, coordination, monitoring |
| Spark Developer | PySpark and large-scale processing |
| Internet Search | External enrichment and public datasets |
Tri-layer memory
- Working memory — active session context and decisions in flight
- Episodic memory — structured history of runs, fixes, and outcomes
- Institutional knowledge — skills (on-demand playbooks), Knowledge Base documents, and lessons that compound over time
Configure institutional behaviour explicitly in Agent Intelligence & Skills (/agent-intelligence).
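The three layers above can be sketched in plain Python. This is illustrative only: the class and method names are assumptions for the sketch, not Dagen's API, though read_skill mirrors the on-demand skill loading described earlier.

```python
from dataclasses import dataclass, field

@dataclass
class WorkingMemory:
    """Active session context and in-flight decisions; discarded after the session."""
    context: list[str] = field(default_factory=list)

@dataclass
class EpisodicMemory:
    """Structured history of runs, fixes, and outcomes."""
    episodes: list[dict] = field(default_factory=list)

    def record(self, run_id: str, outcome: str) -> None:
        self.episodes.append({"run_id": run_id, "outcome": outcome})

@dataclass
class InstitutionalKnowledge:
    """Skills, Knowledge Base documents, and lessons that compound over time."""
    skills: dict[str, str] = field(default_factory=dict)  # skill name -> playbook body
    lessons: list[str] = field(default_factory=list)

    def read_skill(self, name: str) -> str:
        # Skills are loaded on demand rather than pasted into every prompt
        return self.skills[name]
```

The split matters because only the institutional layer persists and compounds: working memory is per-session, episodic memory is per-run history.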
Self-healing pipelines
Pipelines watch for schema drift, quality and volume signals, and missed freshness expectations, then remediate with a human in the loop where needed, typically via execution logs, Fix with Agent, and chat with pipeline or database context. Pair this with Agent Intelligence rules and lessons so repeated failure modes gain automatic guardrails, and shift bad SQL left with Git Reviews before merge.
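A minimal sketch of the schema-drift side of this, assuming a simple column-to-type mapping. Dagen's internals are not shown in this document, so the function name and the "additive-only drift is auto-fixable" rule are assumptions for illustration, not the platform's actual policy.

```python
def detect_schema_drift(expected: dict[str, str], observed: dict[str, str]) -> dict:
    """Compare an expected column->type mapping against what a run observed."""
    added = sorted(set(observed) - set(expected))
    removed = sorted(set(expected) - set(observed))
    retyped = sorted(c for c in set(expected) & set(observed)
                     if expected[c] != observed[c])
    # Example guardrail: purely additive drift can be remediated automatically,
    # while destructive changes (dropped or retyped columns) go to a human.
    auto_fixable = bool(added) and not removed and not retyped
    return {"added": added, "removed": removed, "retyped": retyped,
            "auto_fixable": auto_fixable}

drift = detect_schema_drift(
    {"id": "int", "email": "text"},
    {"id": "int", "email": "text", "signup_ts": "timestamp"},
)
```

A new nullable column passes the guardrail; a dropped or retyped column would set auto_fixable to False and trigger the human-in-the-loop path.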
Architecture support
Medallion (Bronze → Silver → Gold), star schema, Data Vault 2.0, and AI/RAG-ready, semantically rich outputs.
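To make the Bronze to Silver step concrete, here is a minimal sketch in plain Python standing in for a dbt or PySpark model: deduplicate raw records and standardize fields. The table shape and field names are illustrative assumptions, not a schema Dagen prescribes.

```python
def to_silver(bronze_rows: list[dict]) -> list[dict]:
    """Promote raw Bronze rows to Silver: latest record per key, normalized text."""
    latest: dict[str, dict] = {}
    for row in sorted(bronze_rows, key=lambda r: r["loaded_at"]):
        latest[row["customer_id"]] = row  # later load wins
    return [
        {"customer_id": key,
         "email": row["email"].strip().lower(),  # standardize on cleaned lowercase
         "loaded_at": row["loaded_at"]}
        for key, row in sorted(latest.items())
    ]

silver = to_silver([
    {"customer_id": "c1", "email": " Ada@Example.com ", "loaded_at": 1},
    {"customer_id": "c1", "email": "ada@example.com", "loaded_at": 2},
    {"customer_id": "c2", "email": "Grace@Example.com", "loaded_at": 1},
])
```

A Gold layer would then aggregate Silver into business-ready KPIs; in practice the dbt agent would express each step as a tested model rather than a Python function.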
Pipeline modernization (conceptual phases)
- Discovery and cataloging
- Intent reconstruction
- Agentic rebuild
- Activation and continuous improvement
Platform architecture (summary)
- Client experience — responsive UI with real-time updates for long-running work
- APIs — REST and streaming patterns, workspace isolation, RBAC
- Metadata and knowledge — stored platform state, semantic search, and lineage / graph capabilities for impact and discovery
- Integrations — 500+ ingestion connectors (Airbyte family), Git, dbt, Spark, Dataform, and configurable LLMs (Anthropic, OpenAI, Google, open source, and more)
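For the API layer above, a hedged sketch of an authenticated call using only the standard library. The base URL, endpoint path, and Bearer-token header are hypothetical; consult the External API and API Keys documentation for the real contract.

```python
import urllib.request

BASE_URL = "https://dagen.example.com/api/v1"  # placeholder host, not a real endpoint

def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Construct an authenticated GET request (built but not sent here)."""
    req = urllib.request.Request(f"{BASE_URL}{path}")
    req.add_header("Authorization", f"Bearer {api_key}")  # assumed auth scheme
    req.add_header("Accept", "application/json")
    return req

req = build_request("/pipelines", api_key="dgn_example_key")
```

Sending the request would then be urllib.request.urlopen(req) (or any HTTP client), with workspace isolation and RBAC enforced server-side per the architecture notes above.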
Security and compliance
Architecture-aware sovereignty (including GDPR, NIS2, EU AI Act considerations), data residency tracking per pipeline, encrypted credentials with audit logging, compliance-oriented records, and SOC 2–ready postures—exact obligations depend on your edition and deployment.
Key capabilities (docs)
AI-powered agents & intelligence
- Custom Agents & Tools — Agent Builder, templates, Python tools, Knowledge Base, graph.
- Agent Intelligence & Skills — Instructions, skills (read_skill), rules, lessons, templates, import/export (/agent-intelligence).
- Git Reviews (AI on PRs) — Automated PR review on connected GitHub repos.
Data movement and systems
Supported data sources and metadata, Database Connections, Data Ingestion.
Modeling and exploration
Data Modeling, Database Explorer.
Operations
Platform Dashboard, Building Pipelines, Git Reviews, API Keys, External API.
Use cases Dagen is built for
| Use case | How Dagen helps |
|---|---|
| Lift-and-shift modernization | Discovery and cataloging, intent reconstruction, then agentic rebuild (see Pipeline modernization above). |
| Medallion / layered warehouses | Natural language + Data Model + dbt agents for Bronze → Silver → Gold patterns. |
| Operational analytics | Data Insights for conversational KPIs and charts; Database Explorer for SQL. |
| Data quality at scale | Cleansing agent, tests in dbt, self-healing signals, Agent Intelligence & Skills rules and lessons. |
| Team scale-up | Workspace sharing, RBAC, Administration, Knowledge Base for tribal knowledge. |
| Automation & CI | External API, API Keys, Git Reviews, Slack. |
Documentation map (full product)
| Topic | Documentation |
|---|---|
| Sign-in, SSO | Authentication |
| DBs, warehouses, lakes, object storage | Database Connections, Supported data sources |
| GitHub / repos | Source Repositories |
| Home / overview cards, recent tasks | Platform Dashboard |
| Move data (Airbyte UI) | Data Ingestion |
| Schemas, DDL, test data | Data Modeling |
| Agent Builder, tools, Knowledge Base | Custom Agents & Tools |
| Agent Intelligence, skills, rules, lessons | Agent Intelligence & Skills |
| GitHub AI PR review | Git Reviews (AI on PRs) |
| Main agent UI | AI Chat |
| SQL browse / console | Database Explorer |
| Dashboards from chat | Data Insights |
| dbt, Spark, workflows | Building Pipelines |
| Jobs, usage, runtimes, team | Administration |
| LLM configuration | Model Settings |
| REST / A2A | External API, API Keys |
| In-app help | Magical Guide |
| Slack | Slack Integration |
| On-prem / AMI | Self-Hosted |
Getting started
- Authentication — Sign in (email, Google, GitHub).
- Connect your ecosystem — Database connections and source repositories.
- Declare intent — Describe the business purpose of your pipeline.
- Let agents build — Specialists design, build, and test.
- Choose autonomy — Guided, Semi, or Autonomous in AI Chat; iterate with self-healing and reviews.
Core concepts
- Custom Agents & Tools
- Agent Intelligence & Skills — instructions, skills, rules, lessons (/agent-intelligence)
- Data Ingestion
- Data Modeling
Features
- Platform Dashboard
- AI Chat
- Database Explorer
- Building Pipelines
- Git Reviews (AI on PRs)
- Data Insights
Guides
Reference
Deployment
Need help?
Use the Magical Guide in the application for interactive, context-aware assistance.