Hive Metastore
The foundational metadata service for the Hadoop ecosystem, still widely used for table and schema management.
CLI
About Hive Metastore
Hive Metastore is the foundational metadata service for the Hadoop ecosystem and is still widely used for table and schema management. Explore how it integrates with the agentic data stack and supports autonomous data operations.
Key Features
- Central metadata repository via Thrift interface for tables, partitions, databases, and functions
- Industry-standard metastore protocol supported by Spark, Trino, Presto, and Impala
- Standalone metastore mode for running independently of the full Hive execution engine
- RDBMS-backed persistence via DataNucleus ORM (Derby, MySQL, PostgreSQL, Oracle)
- High availability through stateless architecture with multiple metastore instances
- Schema management tooling (schematool) for initialization and upgrades
- Bulk metadata operations (metatool) for updating filesystem locations after a NameNode migration and for running JDOQL queries
- Multi-catalog support (Hive 3.0+) for logical metadata separation
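As a rough illustration of the RDBMS-backed persistence and multi-instance features above, the following sketch generates a minimal hive-site.xml. The property names (javax.jdo.option.* and hive.metastore.uris) are standard Hive configuration keys and 9083 is the default metastore Thrift port. The hostnames, database name, and user name are hypothetical placeholders, and the helper functions are invented for this example rather than taken from any Hive tooling.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def metastore_uris(hosts, port=9083):
    """Join several metastore hosts into one hive.metastore.uris value.

    Clients receive the full list and can fall back to another instance,
    which is what makes running multiple stateless metastores useful.
    """
    return ",".join(f"thrift://{host}:{port}" for host in hosts)


def build_hive_site(properties):
    """Render a dict of property names and values as hive-site.xml text."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    raw = ET.tostring(root, encoding="unicode")
    # Pretty-print so the result is readable when written to disk.
    return minidom.parseString(raw).toprettyxml(indent="  ")


# Hypothetical hosts and credentials, for illustration only.
properties = {
    "javax.jdo.option.ConnectionURL":
        "jdbc:postgresql://db.example.internal:5432/metastore",
    "javax.jdo.option.ConnectionDriverName": "org.postgresql.Driver",
    "javax.jdo.option.ConnectionUserName": "hive",
    "hive.metastore.uris": metastore_uris(
        ["hms-1.example.internal", "hms-2.example.internal"]
    ),
}

hive_site_xml = build_hive_site(properties)
print(hive_site_xml)
```

A real deployment would also need a connection password, ideally supplied through a credential provider rather than in plain text, and the matching JDBC driver on the metastore classpath.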
Agent Integration
CLI — schematool / beeline
$ Download the Apache Hive binary tarball, set HIVE_HOME, then run schematool -dbType <db type> -initSchema
External Links
Metastore Administration Guide
Official admin docs for deploying, configuring, and managing the Hive Metastore service
Apache Hive GitHub
Main source repo including standalone metastore server and Thrift API definitions
Thrift API Definition
The Thrift IDL file defining all metastore API operations
PyHive — Python Interface
Python DB-API and SQLAlchemy interface for Hive by Dropbox
AWS Glue Data Catalog Client for HMS
AWS open-source client enabling Glue Data Catalog as a drop-in HMS replacement