Hive Metastore

The foundational metadata service for the Hadoop ecosystem, still widely used for table and schema management.

CLI

About Hive Metastore

Hive Metastore is the foundational metadata service for the Hadoop ecosystem and is still widely used for table and schema management. This entry covers how Hive Metastore integrates with the agentic data stack and supports autonomous data operations.

Key Features

Central metadata repository via Thrift interface for tables, partitions, databases, and functions
Industry-standard metastore protocol supported by Spark, Trino, Presto, Impala
Standalone metastore mode for running independently of the full Hive execution engine
RDBMS-backed persistence via DataNucleus ORM (Derby, MySQL, PostgreSQL, Oracle)
High availability through stateless architecture with multiple metastore instances
Schema management tooling (schematool) for initialization and upgrades
Bulk metadata operations (metatool) for NameNode migration and JDOQL queries
Multi-catalog support (Hive 3.0+) for logical metadata separation
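
The RDBMS-backed persistence and stateless high-availability features listed above come down to a handful of configuration properties. The following Python snippet is a minimal sketch that renders a hive-site.xml fragment. The host names, database name, and user are hypothetical placeholders, PostgreSQL is assumed as the backing store, and the property names (javax.jdo.option.* and hive.metastore.uris) are the commonly documented ones, so check them against the Hive version in use.

```python
import xml.etree.ElementTree as ET


def render_hive_site(properties):
    """Render a dict of Hadoop-style properties as a hive-site.xml string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical hosts, for illustration only.
metastore_hosts = ["hms-1.example.internal", "hms-2.example.internal"]

properties = {
    # Backing database reached through DataNucleus (PostgreSQL assumed here).
    "javax.jdo.option.ConnectionURL": "jdbc:postgresql://db.example.internal:5432/metastore",
    "javax.jdo.option.ConnectionDriverName": "org.postgresql.Driver",
    "javax.jdo.option.ConnectionUserName": "hive",
    # Clients list every metastore instance; 9083 is the usual Thrift port.
    "hive.metastore.uris": ",".join(f"thrift://{h}:9083" for h in metastore_hosts),
}

hive_site_xml = render_hive_site(properties)
print(hive_site_xml)
```

Because each metastore instance keeps its state in the shared database, adding another instance in this sketch only means appending a host to metastore_hosts.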

Agent Integration

CLI — schematool / beeline

Download the Apache Hive binary tarball, set HIVE_HOME, then run schematool -dbType <db type> -initSchema
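
For an agent that drives this setup step programmatically, the schematool call can be sketched in Python as below. This is a sketch under assumptions: the /opt/hive install path is hypothetical, the helper names build_schematool_cmd and run_schematool are illustrative, and the list of database types reflects the values schematool commonly accepts for -dbType, which may differ by Hive version.

```python
import shlex
import subprocess

# Values commonly accepted by schematool's -dbType option (assumed; verify per version).
SUPPORTED_DB_TYPES = {"derby", "mysql", "postgres", "oracle", "mssql"}


def build_schematool_cmd(hive_home, db_type, action="-initSchema"):
    """Return the argv list for a schematool invocation under hive_home."""
    if db_type not in SUPPORTED_DB_TYPES:
        raise ValueError(f"unsupported dbType: {db_type}")
    executable = f"{hive_home.rstrip('/')}/bin/schematool"
    return [executable, "-dbType", db_type, action]


def run_schematool(hive_home, db_type, action="-initSchema"):
    """Execute schematool and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(build_schematool_cmd(hive_home, db_type, action), check=True)


# Hypothetical install location; only the command line is printed here.
cmd = build_schematool_cmd("/opt/hive", "postgres")
print(shlex.join(cmd))
```

Building the argument list separately from running it lets the command be logged or reviewed before anything touches the metastore database. The same helper covers -info and -upgradeSchema by passing a different action.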

CLI Documentation

External Links

Metastore Administration Guide

Official admin docs for deploying, configuring, and managing the Hive Metastore service

Apache Hive GitHub

Main source repo including standalone metastore server and Thrift API definitions

Thrift API Definition

The Thrift IDL file defining all metastore API operations

PyHive — Python Interface

Python DB-API and SQLAlchemy interface for Hive by Dropbox

AWS Glue Data Catalog Client for HMS

AWS open-source client enabling Glue Data Catalog as a drop-in HMS replacement
