Resource Hub

Explore the open-source ecosystem powering the agentic data stack. Discover components, their agent integrations, and real-world patterns. Learn what makes a data component truly agent-ready in our 6-Principle Evaluation Framework.

Agent Integration Landscape

The modern data stack is evolving to support autonomous agents. Components across every layer are adding MCP servers, CLI interfaces, and agent-specific skills.

25 MCP-Enabled Components
28 CLI-Enabled Components
28 Total Components Tracked

Common Questions About Agentic Data Stack Tools

Everything you need to know about building with open-source agentic components.

Which tools does the directory cover?

We catalog 27+ open-source tools across 8 categories: Catalog Services (Unity Catalog, Apache Polaris, Gravitino), Lake Formats (Apache Iceberg, Delta Lake, Apache Hudi), SQL Engines (Trino, DuckDB, Apache Spark), Semantic Layers (Cube, dbt Semantic Layer), ETL/ELT Tools (Airbyte, dbt, Apache Flink), BI Tools (Apache Superset, Metabase), Schedulers (Apache Airflow, Dagster), and Data Agents (DatusAI, WrenAI).
How do I evaluate whether a tool is agent-ready?

Use our 6-Principle Evaluation Framework to assess tools: (1) API-First Design: programmatic access for agents, (2) Declarative Configuration: YAML/JSON over imperative code, (3) Semantic Awareness: business logic understanding, (4) Versioning & Time-Travel: rollback capabilities, (5) Observable & Explainable: audit trails for agent actions, and (6) Self-Service Ready: minimal human intervention required.
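As an illustration only, a self-assessment against the six principles can be scripted as a simple checklist. The principle names mirror the framework above; the boolean checklist, scoring function, and pass fraction are our own assumptions, not part of the framework itself:

```python
# Hypothetical self-assessment sketch for the 6-Principle Evaluation
# Framework. Principle names come from the framework; the scoring
# scheme is an illustrative assumption.

PRINCIPLES = [
    "API-First Design",
    "Declarative Configuration",
    "Semantic Awareness",
    "Versioning & Time-Travel",
    "Observable & Explainable",
    "Self-Service Ready",
]

def agent_readiness(checklist: dict) -> float:
    """Return the fraction of the six principles a tool satisfies."""
    return sum(bool(checklist.get(p)) for p in PRINCIPLES) / len(PRINCIPLES)

# Example: a hypothetical tool meeting 4 of the 6 principles.
tool = {
    "API-First Design": True,
    "Declarative Configuration": True,
    "Semantic Awareness": False,
    "Versioning & Time-Travel": True,
    "Observable & Explainable": True,
    "Self-Service Ready": False,
}
print(f"agent-readiness score: {agent_readiness(tool):.2f}")  # 0.67
```

A weighted variant (e.g. giving Observable & Explainable extra weight for regulated environments) is a natural extension of the same idea.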
Are these tools production-ready?

Yes, all tools in our directory are production-grade and battle-tested. Tools like Apache Iceberg, Trino, dbt, and Airbyte are used by thousands of companies, including Netflix, Uber, Apple, and Airbnb. We focus on mature projects with active communities, not experimental frameworks.
Can I mix open-source components with managed cloud services?

Absolutely. Most tools support hybrid architectures. For example, Trino and DuckDB can query Snowflake/BigQuery directly, Apache Iceberg works with S3/GCS/ADLS, and dbt supports 30+ data platforms. You can mix cloud-native services with open-source components based on your needs.
Where should I start when building an agentic data stack?

Start with three core components: (1) Choose a lake format (Apache Iceberg or Delta Lake) for versioned storage, (2) Set up a catalog service (Unity Catalog or Apache Polaris) for metadata management, and (3) Deploy a SQL engine (Trino or DuckDB) for flexible querying. Then layer on transformation tools (dbt), orchestration (Airflow), and data agents as your needs grow.
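The build-out order above can be captured as data. The layer names and example tools come from the text; the structure and helper function are purely an illustrative sketch:

```python
# Illustrative sketch of the bootstrap order described above: three
# core layers first, then optional layers as needs grow.

BOOTSTRAP_ORDER = [
    ("lake format", ["Apache Iceberg", "Delta Lake"]),
    ("catalog service", ["Unity Catalog", "Apache Polaris"]),
    ("SQL engine", ["Trino", "DuckDB"]),
    # Layered on later:
    ("transformation", ["dbt"]),
    ("orchestration", ["Apache Airflow"]),
    ("data agents", ["DatusAI", "WrenAI"]),
]

def next_layer(installed):
    """Return the first layer not yet in place, or None when done."""
    for layer, _options in BOOTSTRAP_ORDER:
        if layer not in installed:
            return layer
    return None

print(next_layer({"lake format"}))  # catalog service
```

Encoding the order as data rather than prose makes it easy for a provisioning script, or an agent, to report what is still missing from a stack.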