Apache Hudi

Streaming data lakehouse platform with incremental processing and record-level updates.


About Apache Hudi

Explore how Apache Hudi integrates with the agentic data stack ecosystem and supports autonomous data operations.

Key Features

  • ACID transactions with record-level upserts and deletes on data lakes
  • Two table types: Copy-on-Write (read-optimized) and Merge-on-Read (write-optimized)
  • Incremental and snapshot queries plus Change Data Capture query support
  • Built-in DeltaStreamer ingestion tool for Kafka, DFS, and database CDC sources
  • Advanced indexing (bloom filters, HFile, bucket index) for fast record lookups
  • Automatic file sizing, clustering, and compaction for performance optimization
  • Multi-engine compatibility (Spark, Flink, Presto, Trino, Hive)
  • Comprehensive admin CLI with 40+ commands for table management and diagnostics
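The trade-off between the two table types above can be sketched in plain Python. This is a conceptual illustration only, not Hudi's implementation (the function and field names here are invented for the sketch): Copy-on-Write rewrites the base file on every upsert so reads stay cheap, while Merge-on-Read appends updates to a delta log and makes the reader merge them.

```python
# Conceptual sketch (NOT Hudi's actual file formats or APIs).
# Records are dicts keyed by a record key, mirroring record-level upserts.

def cow_upsert(base_file, records):
    """Copy-on-Write: merge updates into a newly rewritten base file."""
    merged = {r["key"]: r for r in base_file}
    for r in records:
        merged[r["key"]] = r          # record-level upsert by key
    return list(merged.values())      # the new version of the base file

def mor_upsert(delta_log, records):
    """Merge-on-Read: writes are cheap -- just append to the delta log."""
    delta_log.extend(records)
    return delta_log

def mor_read(base_file, delta_log):
    """The reader pays the merge cost: base file + delta log, latest wins."""
    merged = {r["key"]: r for r in base_file}
    for r in delta_log:
        merged[r["key"]] = r
    return list(merged.values())

base = [{"key": 1, "val": "a"}, {"key": 2, "val": "b"}]
update = [{"key": 2, "val": "b2"}, {"key": 3, "val": "c"}]

cow_file = cow_upsert(base, update)   # rewritten base file, 3 records
log = mor_upsert([], update)          # delta log holds only the 2 updates
mor_view = mor_read(base, log)        # same 3 records, merged at read time
```

In this analogy, Hudi's compaction (listed below for Merge-on-Read tables) is roughly the step that folds the delta log into a fresh base file so later reads skip the merge.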

Agent Integration

CLI: hudi-cli

Download the hudi-cli-bundle JAR and the hudi-cli-with-bundle.sh launch script from Maven/GitHub.