Understanding HBase Connection Management and Best Practices

The article explains why HBase client connections should not be pooled, describes common misuse patterns, and details how the heavyweight, thread‑safe Connection object internally manages connections to HMaster, RegionServers, and ZooKeeper, recommending a single shared Connection per application.

Big Data Technology & Architecture

This short article grew out of a question in a technical chat group, which prompted a closer look at how the HBase client handles connections.

The answer is simple yet often misunderstood: the HBase client does not require a connection pool because the Connection object already manages connections. Common mistakes include creating a connection per thread, per operation, or implementing a custom pool.

Creating an HBase Connection is expensive, and too many Connection instances can lead servers to reject new connections (ZooKeeper, for example, caps the number of connections it accepts from a single client host). The most effective approach is to keep one shared Connection for the whole application process, typically as a singleton, and close it only when the application exits.
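The single shared instance pattern can be sketched with the JDK alone. This is only an illustration under stated assumptions: SharedResource and FakeConnection are hypothetical names, and FakeConnection stands in for the real HBase Connection so that the example runs without HBase on the classpath. In a real application the supplier would wrap a call such as ConnectionFactory.createConnection(conf).

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class SharedResourceDemo {
    public static void main(String[] args) throws Exception {
        AtomicInteger created = new AtomicInteger();
        SharedResource<FakeConnection> shared =
            new SharedResource<FakeConnection>(() -> new FakeConnection(created));

        // Close the shared instance only when the process exits.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                shared.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }));

        // Many tasks ask for the connection; all of them get the same object.
        ExecutorService workers = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 16; i++) {
            workers.submit(() -> { shared.get(); });
        }
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("connections created: " + created.get());
    }
}

/** Hypothetical holder: one lazily created instance of an expensive Closeable. */
final class SharedResource<T extends Closeable> implements Closeable {
    private final Supplier<T> factory;
    private T instance;      // guarded by "this"
    private boolean closed;  // guarded by "this"

    SharedResource(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared instance, creating it on first use. */
    synchronized T get() {
        if (closed) {
            throw new IllegalStateException("shared resource already closed");
        }
        if (instance == null) {
            instance = factory.get();  // the expensive step runs once
        }
        return instance;
    }

    /** Closes the shared instance; safe to call more than once. */
    @Override
    public synchronized void close() throws IOException {
        closed = true;
        if (instance != null) {
            instance.close();
            instance = null;
        }
    }
}

/** Hypothetical stand-in for an expensive connection; counts each creation. */
final class FakeConnection implements Closeable {
    private volatile boolean closed;

    FakeConnection(AtomicInteger created) {
        created.incrementAndGet();
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}
```

The holder is synchronized rather than lock-free because get() is called far less often than the operations performed on the connection itself, so the simpler form is usually enough.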

Below is a closer look at how the Connection interface maintains connections, which differs significantly from typical JDBC connections. The source of org.apache.hadoop.hbase.client.Connection is shown:

/**
 * A cluster connection encapsulating lower level individual connections to actual servers and
 * a connection to zookeeper. Connections are instantiated through the {@link ConnectionFactory}
 * class. The lifecycle of the connection is managed by the caller, who has to {@link #close()}
 * the connection to release the resources.
 *
 * <p> The connection object contains logic to find the master, locate regions out on the cluster,
 * keeps a cache of locations and then knows how to re-calibrate after they move. The individual
 * connections to servers, meta cache, zookeeper connection, etc are all shared by the
 * {@link Table} and {@link Admin} instances obtained from this connection.
 *
 * <p> Connection creation is a heavy-weight operation. Connection implementations are thread-safe,
 * so that the client can create a connection once, and share it with different threads.
 * {@link Table} and {@link Admin} instances, on the other hand, are light-weight and are not
 * thread-safe.  Typically, a single connection per client application is instantiated and every
 * thread will obtain its own Table instance. Caching or pooling of {@link Table} and {@link Admin}
 * is not recommended.
 *
 * <p>This class replaces {@link HConnection}, which is now deprecated.
 * @see ConnectionFactory
 * @since 0.99.0
 */
@InterfaceAudience.Public
@InterfaceStability.Evolving
public interface Connection extends Abortable, Closeable {
  Configuration getConfiguration();

  Table getTable(TableName tableName) throws IOException;
  Table getTable(TableName tableName, ExecutorService pool) throws IOException;

  public BufferedMutator getBufferedMutator(TableName tableName) throws IOException;
  public BufferedMutator getBufferedMutator(BufferedMutatorParams params) throws IOException;

  public RegionLocator getRegionLocator(TableName tableName) throws IOException;

  Admin getAdmin() throws IOException;

  @Override
  public void close() throws IOException;

  boolean isClosed();
}

From the JavaDoc we can draw three conclusions:

The Connection must know how to locate the HMaster, RegionServers, and ZooKeeper, so it maintains connections to all of them.

Creating a Connection is heavyweight, but the object itself is thread‑safe.

The Table and Admin objects obtained from a Connection are lightweight and not thread‑safe, so each thread should obtain its own instance when needed and close it after use rather than caching or sharing it.
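The division of labour in these three points, one heavyweight shared object and many short-lived handles, can be modelled without HBase. In the sketch below, ClusterLink and TableHandle are hypothetical stand-ins for Connection and Table; with the real client the handle would come from connection.getTable(tableName) and be closed the same way, with try-with-resources.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class TableHandleDemo {
    public static void main(String[] args) throws InterruptedException {
        ClusterLink link = new ClusterLink();  // created once, shared by all threads
        ExecutorService workers = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 8; i++) {
            final int row = i;
            workers.submit(() -> {
                // Each task opens its own handle and closes it when done.
                try (TableHandle table = link.openTable("demo_table")) {
                    table.put("row-" + row);
                }
            });
        }
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("writes through shared link: " + link.writes());
        System.out.println("handles still open: " + link.openHandles());
    }
}

/** Hypothetical stand-in for the heavyweight, thread-safe Connection. */
final class ClusterLink {
    private final AtomicInteger writes = new AtomicInteger();
    private final AtomicInteger openHandles = new AtomicInteger();

    /** Creating a handle is cheap: it only records a reference to this link. */
    TableHandle openTable(String name) {
        openHandles.incrementAndGet();
        return new TableHandle(this, name);
    }

    void write(String table, String row) {
        writes.incrementAndGet();
    }

    void released() {
        openHandles.decrementAndGet();
    }

    int writes() {
        return writes.get();
    }

    int openHandles() {
        return openHandles.get();
    }
}

/** Hypothetical stand-in for the lightweight, non-thread-safe Table. */
final class TableHandle implements AutoCloseable {
    private final ClusterLink link;
    private final String name;
    private boolean closed;  // plain field: a handle is meant for one thread only

    TableHandle(ClusterLink link, String name) {
        this.link = link;
        this.name = name;
    }

    void put(String row) {
        if (closed) {
            throw new IllegalStateException("handle closed");
        }
        link.write(name, row);
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            link.released();
        }
    }
}
```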

We usually create a connection via ConnectionFactory.createConnection(), which has several overloads. The core overload is shown below:

static Connection createConnection(final Configuration conf, final boolean managed,
      final ExecutorService pool, final User user) throws IOException {
    String className = conf.get(HConnection.HBASE_CLIENT_CONNECTION_IMPL,
      ConnectionManager.HConnectionImplementation.class.getName());
    Class<?> clazz = null;
    try {
      clazz = Class.forName(className);
    } catch (ClassNotFoundException e) {
      throw new IOException(e);
    }
    try {
      Constructor<?> constructor =
        clazz.getDeclaredConstructor(Configuration.class,
          boolean.class, ExecutorService.class, User.class);
      constructor.setAccessible(true);
      return (Connection) constructor.newInstance(conf, managed, pool, user);
    } catch (Exception e) {
      throw new IOException(e);
    }
  }

This method uses reflection to instantiate a connection class: by default HConnectionImplementation, an inner class of ConnectionManager, unless the configuration names a different implementation. The code behind it is more involved, so only the key logic is shown.
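The same reflective technique, reduced to a runnable toy: read a class name from configuration, fall back to a default, and call a non-public constructor. Every name here (DemoConnection, DefaultDemoConnection, ReflectiveFactory, the key demo.connection.impl) is hypothetical and exists only for this sketch; java.util.Properties stands in for the Hadoop Configuration.

```java
import java.io.IOException;
import java.lang.reflect.Constructor;
import java.util.Properties;

class ReflectiveFactoryDemo {
    public static void main(String[] args) throws IOException {
        Properties conf = new Properties();
        // The key is not set, so the factory falls back to the default class.
        DemoConnection conn = ReflectiveFactory.create(conf, "alice");
        System.out.println(conn.describe());
    }
}

/** Hypothetical connection type used only for this example. */
interface DemoConnection {
    String describe();
}

/** Default implementation with a private constructor, reachable via reflection. */
final class DefaultDemoConnection implements DemoConnection {
    private final String user;

    private DefaultDemoConnection(Properties conf, String user) {
        this.user = user;
    }

    @Override
    public String describe() {
        return "DefaultDemoConnection for user " + user;
    }
}

final class ReflectiveFactory {
    /** Hypothetical configuration key naming the implementation class. */
    static final String IMPL_KEY = "demo.connection.impl";

    static DemoConnection create(Properties conf, String user) throws IOException {
        String className =
            conf.getProperty(IMPL_KEY, DefaultDemoConnection.class.getName());
        try {
            Class<?> clazz = Class.forName(className);
            Constructor<?> ctor =
                clazz.getDeclaredConstructor(Properties.class, String.class);
            ctor.setAccessible(true);  // the constructor is not public
            return (DemoConnection) ctor.newInstance(conf, user);
        } catch (ReflectiveOperationException e) {
            // Wrap reflection failures the way the excerpt does: as IOException.
            throw new IOException("cannot instantiate " + className, e);
        }
    }
}
```

Keeping the implementation behind a configuration key lets the concrete class be swapped without changing calling code, at the cost of errors that surface only at run time.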

The RPC client in HBase is created by RpcClientFactory.createClient() inside the constructor of HConnectionImplementation:

// Field of HConnectionImplementation
private RpcClient rpcClient;
// Assigned in the constructor
this.rpcClient = RpcClientFactory.createClient(this.conf, this.clusterId, this.metrics);

The client created here is a BlockingRpcClient (the concrete class can differ by HBase version and configuration), a subclass of the abstract class AbstractRpcClient. AbstractRpcClient uses a PoolMap to map each ConnectionId to the actual connection objects:

// Field of AbstractRpcClient
protected final PoolMap<ConnectionId, T> connections;
// Assigned in the constructor
this.connections = new PoolMap<>(getPoolType(conf), getPoolSize(conf));

The pool type is set by the configuration key hbase.client.ipc.pool.type and selects one of RoundRobinPool, ThreadLocalPool, or ReusablePool (RoundRobinPool by default). The pool size is set by hbase.client.ipc.pool.size and defaults to 1, so by default each ConnectionId maps to a single underlying connection. ConnectionId is not a simple identifier; it bundles the server address, the user ticket, and the service name:

public ConnectionId(User ticket, String serviceName, InetSocketAddress address) {
  this.address = address;
  this.ticket = ticket;
  this.serviceName = serviceName;
}
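Because ConnectionId serves as a map key, two ids built from the same three values have to compare equal for a cached connection to be found again. The sketch below shows that idea with a hypothetical DemoConnectionId, in which a plain String replaces the User ticket. The equals and hashCode shown are an assumption about how such a key is typically written, not a copy of the HBase source, and the host and service names are illustrative.

```java
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class CompositeKeyDemo {
    public static void main(String[] args) {
        InetSocketAddress rs1 = InetSocketAddress.createUnresolved("rs1.example.com", 16020);
        Map<DemoConnectionId, String> connections = new HashMap<>();
        connections.put(new DemoConnectionId("alice", "ClientService", rs1), "conn-A");

        // Same user, service, and address: an equal key, so the entry is found.
        System.out.println(connections.get(new DemoConnectionId("alice", "ClientService", rs1)));
        // Different service on the same server: a different key, so no entry.
        System.out.println(connections.get(new DemoConnectionId("alice", "AdminService", rs1)));
    }
}

/** Hypothetical composite key: user, service name, and server address. */
final class DemoConnectionId {
    private final String user;
    private final String serviceName;
    private final InetSocketAddress address;

    DemoConnectionId(String user, String serviceName, InetSocketAddress address) {
        this.user = user;
        this.serviceName = serviceName;
        this.address = address;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof DemoConnectionId)) {
            return false;
        }
        DemoConnectionId that = (DemoConnectionId) other;
        return Objects.equals(user, that.user)
            && Objects.equals(serviceName, that.serviceName)
            && Objects.equals(address, that.address);
    }

    @Override
    public int hashCode() {
        return Objects.hash(user, serviceName, address);
    }
}
```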

The method AbstractRpcClient.getConnection() retrieves or creates a connection for a given ConnectionId:

private T getConnection(ConnectionId remoteId) throws IOException {
  if (failedServers.isFailedServer(remoteId.getAddress())) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Not trying to connect to " + remoteId.address + " this server is in the failed servers list");
    }
    throw new FailedServerException("This server is in the failed servers list: " + remoteId.address);
  }
  T conn;
  synchronized (connections) {
    if (!running) {
      throw new StoppedRpcClientException();
    }
    conn = connections.get(remoteId);
    if (conn == null) {
      conn = createConnection(remoteId);
      connections.put(remoteId, conn);
    }
    conn.setLastTouched(EnvironmentEdgeManager.currentTime());
  }
  return conn;
}

This method checks whether a connection for the given ID already exists in the connections map; if not, it creates a new one (a BlockingRpcConnection), stores it, and returns it. Thus, the Connection object indeed maintains all underlying connections.
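The get-or-create logic can be reproduced in a few lines to see its effect under concurrency. This is a simplified model rather than the HBase implementation: ConnectionCache and DemoRpcConnection are hypothetical, a String replaces ConnectionId, a plain HashMap replaces PoolMap (which corresponds to a pool size of 1), and System.currentTimeMillis() replaces EnvironmentEdgeManager.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ConnectionCacheDemo {
    public static void main(String[] args) throws InterruptedException {
        ConnectionCache cache = new ConnectionCache();
        String[] servers = {"rs1.example.com:16020", "rs2.example.com:16020"};
        ExecutorService workers = Executors.newFixedThreadPool(8);
        // 100 requests from 8 threads, spread over two servers.
        for (int i = 0; i < 100; i++) {
            final String server = servers[i % servers.length];
            workers.submit(() -> { cache.getConnection(server); });
        }
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("requests: 100, connections created: " + cache.createdCount());
    }
}

/** Hypothetical stand-in for a low-level RPC connection to one server. */
final class DemoRpcConnection {
    final String remoteId;
    volatile long lastTouched;

    DemoRpcConnection(String remoteId) {
        this.remoteId = remoteId;
    }
}

/** Simplified get-or-create cache keyed by remote id. */
final class ConnectionCache {
    private final Map<String, DemoRpcConnection> connections = new HashMap<>();
    private int created;  // guarded by "connections"

    DemoRpcConnection getConnection(String remoteId) {
        synchronized (connections) {
            DemoRpcConnection conn = connections.get(remoteId);
            if (conn == null) {
                // First request for this server: create and remember it.
                conn = new DemoRpcConnection(remoteId);
                created++;
                connections.put(remoteId, conn);
            }
            conn.lastTouched = System.currentTimeMillis();
            return conn;
        }
    }

    int createdCount() {
        synchronized (connections) {
            return created;
        }
    }
}
```

Because lookup and creation happen under one lock, concurrent callers asking for the same server cannot each create their own connection.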

From the analysis above, we can conclude that a single Connection instance is sufficient for an application, as it internally handles pooling and reuse of the necessary network resources.

[Figure: Connection diagram]

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
