Apache Paimon
Streaming data lake platform supporting high-speed ingestion and changelog tracking.
About Apache Paimon
Apache Paimon is a streaming data lake platform built for high-speed ingestion and changelog tracking. Explore how it integrates with the agentic data stack ecosystem and supports autonomous data operations.
Key Features
- Real-time streaming updates via LSM-tree structure with sub-minute query latency
- Flexible merge engines (deduplicate, partial-update, aggregate, first-row)
- Unified batch and streaming read/write with automatic changelog generation
- Primary key tables for upserts and append-only tables for ordered stream reads
- Full schema evolution support (add, rename, reorder columns)
- Native Flink CDC integration for MySQL, PostgreSQL, MongoDB change capture
- Multi-engine read support (Spark, Flink, StarRocks, Doris, Hive, Trino)
- Branch and tag management for table versioning and data experimentation
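The merge engines listed above define how multiple writes to the same primary key collapse into one row. As a minimal illustration of those semantics (this is not the Paimon API; the function and record shapes here are invented for the sketch):

```python
# Illustrative simulation of Paimon-style merge engines on a primary-key table.
# NOT the Paimon API -- just a sketch of the merge semantics described above.

def merge_rows(rows, engine):
    """Merge a stream of (key, record) upserts according to the named engine."""
    merged = {}
    for key, record in rows:
        if key not in merged:
            merged[key] = dict(record)           # first row for this key
        elif engine == "deduplicate":
            merged[key] = dict(record)           # latest row wins
        elif engine == "first-row":
            pass                                 # first row wins; ignore later ones
        elif engine == "partial-update":
            # non-null fields of the new row overwrite the stored ones
            merged[key].update({k: v for k, v in record.items() if v is not None})
    return merged

upserts = [
    (1, {"name": "a", "cnt": 1}),
    (1, {"name": None, "cnt": 2}),
]
print(merge_rows(upserts, "deduplicate"))     # {1: {'name': None, 'cnt': 2}}
print(merge_rows(upserts, "partial-update"))  # {1: {'name': 'a', 'cnt': 2}}
print(merge_rows(upserts, "first-row"))       # {1: {'name': 'a', 'cnt': 1}}
```

In real Paimon tables the merge engine is a table property chosen at creation time, and the merge happens inside LSM compaction rather than in application code.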
Agent Integration
CLI — Flink Action JARs / Spark CALL
$ Download the paimon-flink-action JAR from Maven Central
External Links
Apache Paimon GitHub
Main repository — a lake format for building a Realtime Lakehouse Architecture with Flink and Spark
REST Catalog Overview
REST API documentation for Paimon's catalog service — the primary programmatic access point
PyPaimon (Python SDK)
Pure-Python implementation for Paimon catalog access and for reading and writing tables, no JDK required
Changelog Producer Docs
Documentation for changelog streaming modes (input, lookup, full-compaction)
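A changelog producer turns upserts on a primary-key table into row-kind-tagged change events that downstream streaming reads can consume. As a rough sketch of that output shape, using the Flink/Paimon row-kind convention (+I insert, -U update-before, +U update-after, -D delete; the function below is an invented illustration, not the Paimon API):

```python
# Sketch of changelog generation from an upsert stream on a primary-key table.
# Row kinds follow the Flink/Paimon convention: +I insert, -U update-before,
# +U update-after, -D delete. Illustration only, not the Paimon API.

def to_changelog(upserts):
    """Turn (key, value-or-None) upserts into (row_kind, key, value) events."""
    state, log = {}, []
    for key, value in upserts:
        if value is None:                        # delete marker
            if key in state:
                log.append(("-D", key, state.pop(key)))
        elif key not in state:
            log.append(("+I", key, value))
            state[key] = value
        else:
            log.append(("-U", key, state[key]))  # retract the old value
            log.append(("+U", key, value))       # emit the new value
            state[key] = value
    return log

events = to_changelog([("k1", 1), ("k1", 2), ("k1", None)])
print(events)
# [('+I', 'k1', 1), ('-U', 'k1', 1), ('+U', 'k1', 2), ('-D', 'k1', 2)]
```

The modes in the docs differ in where this work happens: `input` trusts the source to already carry complete change events, while `lookup` and `full-compaction` derive the before/after pairs inside Paimon at write or compaction time.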
Python API Docs
Official Python API documentation with catalog, read, and write interfaces