Apache Iceberg

Open table format for huge analytic datasets with agent-friendly branching and time travel.


About Apache Iceberg

Apache Iceberg is an open table format for huge analytic datasets. Branching lets agents write to an isolated version of a table, and time travel lets them query or roll back to earlier snapshots, which is what makes the format a fit for autonomous data operations in the agentic data stack.

Key Features

ACID transactions on cloud object storage ensuring data integrity
Schema evolution without table rewrites (add, rename, reorder columns)
Hidden partitioning that decouples physical layout from query filters
Time travel and version rollback for reproducible queries and recovery
Multi-engine compatibility (Spark, Trino, Flink, Presto, Hive, Impala)
Partition evolution allowing partition schemes to change without rewriting data
Scalable metadata handling for tables with tens of petabytes of data
File-level statistics enabling query engines to skip irrelevant data files
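As an illustration of the last feature, here is a minimal sketch of how per-file column bounds let a planner skip files that cannot match a filter. It is a toy model in plain Python, not Iceberg's manifest format or any engine's planner; `DataFile`, `lower`, `upper`, and `prune` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Toy stand-in for a manifest entry: a path plus min/max bounds for one column."""
    path: str
    lower: int  # smallest value of the column in this file
    upper: int  # largest value of the column in this file


def prune(files, lo, hi):
    """Keep only files whose [lower, upper] range overlaps the query range [lo, hi]."""
    return [f for f in files if f.upper >= lo and f.lower <= hi]


files = [
    DataFile("data/a.parquet", lower=1, upper=100),
    DataFile("data/b.parquet", lower=101, upper=200),
    DataFile("data/c.parquet", lower=201, upper=300),
]

# A filter such as `id BETWEEN 150 AND 210` can only match rows in b and c,
# so a.parquet is never opened.
selected = prune(files, 150, 210)
print([f.path for f in selected])
```

The decision uses only the bounds, never the file contents, which is why planning stays cheap even when a table has very many files.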

Agent Integration

MCP Server

cloudera/iceberg-mcp-server

CLI — pyiceberg

$ pip install "pyiceberg[cli]"

CLI Documentation

External Links

Cloudera Iceberg MCP Server

MCP server providing read-only access to Iceberg tables via Impala with LangChain/OpenAI SDK integration

REST Catalog Spec

Official REST API specification for Iceberg catalogs: the standardized HTTP interface that agents and engines use to discover and load tables
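To make the REST catalog less abstract, the sketch below builds the URL a client would request to load one table, following the path layout `/v1/{prefix}/namespaces/{namespace}/tables/{table}`. It makes no network call. The helper name `table_url`, the host `catalog.example.com`, and the way the optional prefix is joined are assumptions made for illustration; the unit-separator encoding for multi-level namespaces is the spec's default as far as I know, so check the spec version your catalog implements.

```python
from urllib.parse import quote

# Multi-level namespaces are joined with the unit separator (0x1F) in URL paths.
NAMESPACE_SEPARATOR = "\x1f"


def table_url(base_url, namespace, table, prefix=""):
    """Build a load-table URL: /v1/{prefix}/namespaces/{namespace}/tables/{table}."""
    # Percent-encode the joined namespace so the separator becomes %1F.
    ns = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    parts = [base_url.rstrip("/"), "v1"]
    if prefix:
        # Assumption: the prefix is used as-is, minus surrounding slashes.
        parts.append(prefix.strip("/"))
    parts += ["namespaces", ns, "tables", quote(table, safe="")]
    return "/".join(parts)


url = table_url("https://catalog.example.com", ("analytics", "events"), "clicks")
print(url)
```

An agent would send a GET request to this URL, with whatever authentication the catalog requires, and receive the table's metadata in the response.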

PyIceberg (Python SDK)

Native Python library for programmatic access to Iceberg table metadata and data, no Spark/JVM required
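A minimal sketch of the kind of metadata access described here, assuming a catalog has already been configured for PyIceberg. The helper `summarize_table`, the catalog name `default`, and the table identifier `analytics.clicks` are placeholders for illustration; the PyIceberg calls used are `load_catalog`, `load_table`, `schema()`, and `current_snapshot()`.

```python
def summarize_table(catalog, identifier):
    """Return column names and the current snapshot id for one table.

    `catalog` is expected to behave like a PyIceberg catalog, for example the
    object returned by `pyiceberg.catalog.load_catalog(...)`.
    """
    table = catalog.load_table(identifier)
    snapshot = table.current_snapshot()  # None for a table with no snapshots yet
    return {
        "columns": [field.name for field in table.schema().fields],
        "snapshot_id": snapshot.snapshot_id if snapshot else None,
    }


def main():
    # Imported lazily so the helper above can be read without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    # "default" and "analytics.clicks" are placeholder names; the catalog itself
    # must be configured separately (for example in ~/.pyiceberg.yaml).
    catalog = load_catalog("default")
    print(summarize_table(catalog, "analytics.clicks"))
```

Calling `main()` requires a reachable catalog. The helper takes the catalog as an argument so it can be exercised with a stand-in object in tests.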

Awesome Apache Iceberg

Curated list of Apache Iceberg resources, tools, and ecosystem projects

Branching and Tagging Docs

Documentation for git-like branching, tagging, and fast-forward merge on Iceberg tables
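The git-like model can be sketched in a few lines: snapshots form a parent chain, and branches and tags are named references to snapshot ids. The code below is a toy model in plain Python, not Iceberg's API or metadata layout; `ToyTable` and its methods are hypothetical names, and snapshot ids are small integers for readability.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy model: snapshots form a parent chain, refs are names pointing at snapshot ids."""
    parents: dict = field(default_factory=dict)  # snapshot id -> parent id (None for the root)
    refs: dict = field(default_factory=dict)     # branch or tag name -> snapshot id

    def commit(self, branch, snapshot_id):
        """Append a snapshot on top of the branch head and advance the branch."""
        self.parents[snapshot_id] = self.refs.get(branch)
        self.refs[branch] = snapshot_id

    def is_ancestor(self, ancestor, descendant):
        """Walk the parent chain from `descendant` looking for `ancestor`."""
        current = descendant
        while current is not None:
            if current == ancestor:
                return True
            current = self.parents[current]
        return False

    def fast_forward(self, target, source):
        """Move `target` to the head of `source`, only if no history would be lost."""
        if not self.is_ancestor(self.refs[target], self.refs[source]):
            raise ValueError(f"{target} has diverged from {source}")
        self.refs[target] = self.refs[source]


table = ToyTable()
table.commit("main", 1)
table.refs["audit"] = table.refs["main"]  # create a branch at the current head
table.refs["v1"] = table.refs["main"]     # a tag is a ref that is never advanced
table.commit("audit", 2)                  # write to the branch; main is untouched
table.fast_forward("main", "audit")       # publish the audited changes
print(table.refs)
```

The fast-forward check is what keeps the merge safe: if `main` had received its own commit in the meantime, the two branches would have diverged and the call would raise instead of discarding history.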
