Apache Paimon

Streaming data lake platform supporting high-speed ingestion and changelog tracking.

About Apache Paimon

Apache Paimon is a streaming data lake platform built for high-speed ingestion and changelog tracking. This page covers how it integrates with the agentic data stack ecosystem and supports autonomous data operations.

Key Features

  • Real-time streaming updates via an LSM-tree structure, with sub-minute query latency
  • Flexible merge engines (deduplicate, partial-update, aggregate, first-row)
  • Unified batch and streaming read/write with automatic changelog generation
  • Primary key tables for upserts and append-only tables for ordered stream reads
  • Full schema evolution support (add, rename, reorder columns)
  • Native Flink CDC integration for change capture from MySQL, PostgreSQL, and MongoDB
  • Multi-engine read support (Spark, Flink, StarRocks, Doris, Hive, Trino)
  • Branch and tag management for table versioning and data experimentation
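
To make the four merge engines listed above more concrete, here is a minimal Python sketch of what each one might do with several rows that share a primary key. It is an illustration only, not Paimon's implementation: the `merge_rows` function, the `Row` alias, and the column names are hypothetical, and the aggregate case is simplified to summing a single field while keeping the latest value of every other column.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]  # hypothetical row shape: column name -> value


def merge_rows(rows: List[Row], engine: str, agg_field: Optional[str] = None) -> Row:
    """Merge rows that share one primary key, given in arrival order."""
    if not rows:
        raise ValueError("need at least one row")
    if engine == "deduplicate":
        # Keep only the latest row for the key.
        return dict(rows[-1])
    if engine == "first-row":
        # Keep only the first row seen for the key.
        return dict(rows[0])
    if engine == "partial-update":
        # Later non-null values overwrite earlier ones, column by column.
        merged: Row = {}
        for row in rows:
            for col, val in row.items():
                if val is not None:
                    merged[col] = val
                else:
                    merged.setdefault(col, None)
        return merged
    if engine == "aggregate":
        # Simplification: sum one field, take the latest value elsewhere.
        if agg_field is None:
            raise ValueError("aggregate needs agg_field")
        merged = dict(rows[-1])
        merged[agg_field] = sum(r[agg_field] for r in rows)
        return merged
    raise ValueError(f"unknown merge engine: {engine}")


rows = [
    {"id": 1, "name": "a", "qty": 2},
    {"id": 1, "name": None, "qty": 3},
]
latest = merge_rows(rows, "deduplicate")
first = merge_rows(rows, "first-row")
partial = merge_rows(rows, "partial-update")
total = merge_rows(rows, "aggregate", agg_field="qty")
```

In this sketch, `partial` keeps the earlier non-null `name` while taking the newer `qty`, which is the behavior that distinguishes partial-update from deduplicate.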

Agent Integration

CLI: Flink Action JARs / Spark CALL

Download the paimon-flink-action JAR from Maven Central.
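
As a rough sketch of how an agent might drive these two entry points, the Python helpers below assemble a `flink run` command for the `compact` action and the equivalent Spark `CALL sys.compact` statement. The helper names, paths, database, and table are hypothetical placeholders, and the flags shown should be checked against the Paimon documentation for the version in use.

```python
from typing import List


def build_flink_compact_command(
    flink_bin: str, action_jar: str, warehouse: str, database: str, table: str
) -> List[str]:
    """Build an argv list that runs Paimon's compact action via `flink run`."""
    return [
        flink_bin, "run", action_jar,
        "compact",
        "--warehouse", warehouse,
        "--database", database,
        "--table", table,
    ]


def build_spark_compact_call(database: str, table: str) -> str:
    """Build the Spark SQL statement that calls the compact procedure."""
    return f"CALL sys.compact(table => '{database}.{table}')"


# Hypothetical locations and names, for illustration only.
cmd = build_flink_compact_command(
    "bin/flink", "paimon-flink-action.jar", "/tmp/warehouse", "demo_db", "orders"
)
sql = build_spark_compact_call("demo_db", "orders")
```

The argv list can be passed to `subprocess.run(cmd, check=True)` on a host with Flink installed, and the SQL string can be submitted through any Spark session configured with a Paimon catalog.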