Apache Paimon
Streaming data lake platform supporting high-speed ingestion and changelog tracking.
About Apache Paimon
Apache Paimon is a streaming data lake platform built for high-speed ingestion and changelog tracking. Explore how it integrates with the agentic data stack ecosystem and supports autonomous data operations.
Key Features
- Real-time streaming updates via LSM-tree structure with sub-minute query latency
- Flexible merge engines (deduplicate, partial-update, aggregate, first-row)
- Unified batch and streaming read/write with automatic changelog generation
- Primary key tables for upserts and append-only tables for ordered stream reads
- Full schema evolution support (add, rename, reorder columns)
- Native Flink CDC integration for MySQL, PostgreSQL, MongoDB change capture
- Multi-engine read support (Spark, Flink, StarRocks, Doris, Hive, Trino)
- Branch and tag management for table versioning and data experimentation
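The merge engines listed above define how multiple writes to the same primary key collapse into one row. As a minimal illustration of those semantics (this is not the Paimon API; the function and record shapes here are invented for the sketch):

```python
# Illustrative simulation of Paimon-style merge engines on a primary-key table.
# NOT the Paimon API -- just a sketch of the merge semantics described above.

def merge_rows(rows, engine):
    """Merge a stream of (key, record) upserts according to the named engine."""
    merged = {}
    for key, record in rows:
        if key not in merged:
            merged[key] = dict(record)           # first row for this key
        elif engine == "deduplicate":
            merged[key] = dict(record)           # latest row wins
        elif engine == "first-row":
            pass                                 # first row wins; ignore later ones
        elif engine == "partial-update":
            # non-null fields of the new row overwrite the stored ones
            merged[key].update({k: v for k, v in record.items() if v is not None})
    return merged

upserts = [
    (1, {"name": "a", "cnt": 1}),
    (1, {"name": None, "cnt": 2}),
]
print(merge_rows(upserts, "deduplicate"))     # {1: {'name': None, 'cnt': 2}}
print(merge_rows(upserts, "partial-update"))  # {1: {'name': 'a', 'cnt': 2}}
print(merge_rows(upserts, "first-row"))       # {1: {'name': 'a', 'cnt': 1}}
```

In real Paimon tables the merge engine is a table property chosen at creation time, and the merge happens inside LSM compaction rather than in application code.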
Agent Integration
CLI — Flink Action JARs / Spark CALL
$ Download the paimon-flink-action JAR from Maven Central
External Links
Apache Paimon GitHub
Main repository — a lake format for building a Realtime Lakehouse Architecture with Flink and Spark
REST Catalog Overview
REST API documentation for Paimon's catalog service — the primary programmatic access point
PyPaimon (Python SDK)
Pure-Python implementation for Paimon catalog access and for reading and writing tables, no JDK required
Changelog Producer Docs
Documentation for changelog streaming modes (input, lookup, full-compaction)
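A changelog producer turns upserts on a primary-key table into row-kind-tagged change events that downstream streaming reads can consume. As a rough sketch of that output shape, using the Flink/Paimon row-kind convention (+I insert, -U update-before, +U update-after, -D delete; the function below is an invented illustration, not the Paimon API):

```python
# Sketch of changelog generation from an upsert stream on a primary-key table.
# Row kinds follow the Flink/Paimon convention: +I insert, -U update-before,
# +U update-after, -D delete. Illustration only, not the Paimon API.

def to_changelog(upserts):
    """Turn (key, value-or-None) upserts into (row_kind, key, value) events."""
    state, log = {}, []
    for key, value in upserts:
        if value is None:                        # delete marker
            if key in state:
                log.append(("-D", key, state.pop(key)))
        elif key not in state:
            log.append(("+I", key, value))
            state[key] = value
        else:
            log.append(("-U", key, state[key]))  # retract the old value
            log.append(("+U", key, value))       # emit the new value
            state[key] = value
    return log

events = to_changelog([("k1", 1), ("k1", 2), ("k1", None)])
print(events)
# [('+I', 'k1', 1), ('-U', 'k1', 1), ('+U', 'k1', 2), ('-D', 'k1', 2)]
```

The modes in the docs differ in where this work happens: `input` trusts the source to already carry complete change events, while `lookup` and `full-compaction` derive the before/after pairs inside Paimon at write or compaction time.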
Python API Docs
Official Python API documentation with catalog, read, and write interfaces