Amoro Lakehouse Management System: Deployment Practices and AWS Integration for Apache Iceberg
This article introduces Amoro, a lakehouse management platform built on Apache Iceberg, explains why Webex adopted it to overcome Hive limitations, details its AWS GlueCatalog and S3 integration with DynamoDB lock management, and provides step‑by‑step Helm‑based deployment instructions on Kubernetes.
Amoro is a lakehouse management system built on open‑source table formats such as Apache Iceberg, offering plug‑in data self‑optimization mechanisms and management services for an out‑of‑the‑box lakehouse experience.
Author: Bai Xu, Software Engineer at Cisco Webex Data Platform, responsible for lakehouse‑integrated development and optimization.
Why choose Amoro
Webex originally used Hive for storage, but Hive's table format made data correction and backfill inefficient and imposed high maintenance overhead. Migrating to Apache Iceberg reduced operational costs and improved core business efficiency. However, Iceberg V2's row‑level updates rely on Merge‑on‑Read (MOR), and once many delete files accumulated, query latency became unacceptable.
Initial attempts to merge small files using Spark compaction procedures resulted in high resource consumption (over 40 cores and 300 GB memory per job), long execution times, low fault tolerance, and difficult maintenance when a single table failure halted the entire job.
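For reference, Spark-based compaction of this kind is typically driven by Iceberg's `rewrite_data_files` stored procedure; a minimal sketch (catalog, table name, and target size are illustrative):

```sql
-- Rewrite small data files into larger ones; heavy jobs like this
-- motivated the move to Amoro's continuous optimization.
CALL glue_catalog.system.rewrite_data_files(
  table => 'pda.orders',
  options => map('target-file-size-bytes', '134217728')  -- 128 MB target
);
```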
Amoro addresses these pain points by registering an external Flink optimizer, pulling optimization tasks from the Amoro Management Service (AMS), and enabling snapshot expiration and data expiration to reduce storage pressure.
Benefits of Amoro
Higher resource utilization: Flink optimizer reduces resource usage by about 70% compared to Spark.
Improved fault tolerance: Failed optimization tasks are automatically retried on the next scan.
Timeliness: Continuous compaction keeps Iceberg query performance within a controllable range.
Self‑management: Optimization can be toggled per table via table properties.
Visualization: WebUI displays optimization status and table metadata.
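The per‑table toggle mentioned above is driven by table properties; a hedged sketch using Amoro's `self-optimizing.*` properties through Spark SQL (the group name is illustrative):

```sql
-- Enable continuous self-optimizing for one table and route it
-- to a named optimizer group.
ALTER TABLE pda.orders SET TBLPROPERTIES (
  'self-optimizing.enabled' = 'true',
  'self-optimizing.group' = 'aws-flink-group'
);
```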
Usage in Webex
Amoro has been deployed across multiple data centers and clusters (up to seven data centers), in both Hadoop and AWS environments, managing over 1,000 Iceberg tables.
Amoro on AWS
Key challenges include integrating Iceberg with AWS services (Catalog and FileSystem) and adapting AMS. The migration switched from HiveCatalog to GlueCatalog and from HDFS to S3, leveraging S3’s fine‑grained IAM permissions and eliminating hardware maintenance costs.
GlueCatalog reduces the need for a separate Hive Metastore service and MySQL metadata storage, avoiding issues such as MySQL connection limits.
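On the engine side, switching from HiveCatalog to GlueCatalog is mostly configuration; a sketch of the Spark catalog properties using Iceberg's AWS module (catalog name and bucket are illustrative):

```properties
spark.sql.catalog.glue=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.glue.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
spark.sql.catalog.glue.warehouse=s3://wap-bucket/warehouse
spark.sql.catalog.glue.io-impl=org.apache.iceberg.aws.s3.S3FileIO
```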
LockManager
Because S3 does not provide atomic writes or file locks, Iceberg uses DynamoDbLockManager to ensure metadata consistency. The commit workflow is: attempt to acquire the lock, retry on contention, write the new metadata file (e.g. v2.metadata.json), and release the lock after a successful commit.
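The acquire‑retry‑commit‑release cycle can be modeled with a toy in‑memory lock manager. This is a sketch of the semantics only, not the real `DynamoDbLockManager` API; in production, DynamoDB enforces the conditional write server‑side:

```python
import time
import uuid

class InMemoryLockManager:
    """Toy model of DynamoDB-style lease locking (illustrative, not Iceberg's API)."""

    def __init__(self, lease_ms=15000):
        self.lease_ms = lease_ms
        self.locks = {}  # entity_id -> (owner_id, version, expires_at_ms)

    def try_acquire(self, entity_id, owner_id):
        # Conditional put: succeed only if no unexpired lock entry exists.
        now_ms = time.monotonic() * 1000
        entry = self.locks.get(entity_id)
        if entry is not None and entry[2] > now_ms:
            return False  # another writer holds an unexpired lease
        self.locks[entity_id] = (owner_id, str(uuid.uuid4()), now_ms + self.lease_ms)
        return True

    def release(self, entity_id, owner_id):
        # Only the current owner may delete its lock entry.
        entry = self.locks.get(entity_id)
        if entry is not None and entry[0] == owner_id:
            del self.locks[entity_id]
            return True
        return False

def commit_with_lock(mgr, table, owner, write_metadata, retries=3, backoff_s=0.01):
    """Acquire the lock, write the new metadata file, release; retry on contention."""
    for _ in range(retries):
        if mgr.try_acquire(table, owner):
            try:
                write_metadata()
            finally:
                mgr.release(table, owner)
            return True
        time.sleep(backoff_s)
    return False
```

The key property is that two concurrent committers can never both pass the conditional write, so only one metadata file becomes the table's current version.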
The DynamoDB lock table stores entries such as (the primary key doubles as the lock entity ID):

| Primary Key (Lock Entity ID) | Lease Duration (ms) | Version | Lock Owner ID |
| --- | --- | --- | --- |
| pda.orders | 15000 | d3b9b4ec-6c02-4e7e-9570-927ba1bafa67 | s3://wap-bucket/orders/metadata/d3b9b4ec-6c02-4e7e-9570-927ba1bafa67-metadata.json |
| pda.customers | 15000 | 0f50e24d-e7da-4c8b-aa4b-1b95a50c7f38 | s3://wap-bucket/customers/metadata/0f50e24d-e7da-4c8b-aa4b-1b95a50c7f38-metadata.json |
| pda.products | 15000 | 2dab53a2-7c63-4b95-8fe1-567f73e58d6c | s3://wap-bucket/products/metadata/2dab53a2-7c63-4b95-8fe1-567f73e58d6c-metadata.json |
Using DynamoDB for lock management avoids stale locks that can block Spark jobs in Hive Metastore.
Permission Control
AWS IAM allows fine‑grained permissions on S3, Glue, and DynamoDB. Each team receives a dedicated IAM account, and Kubernetes namespaces isolate the IAM credentials, enabling table‑level access control.
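As an illustration, a per‑team policy might scope S3 access to that team's table prefixes. The bucket and prefix below are taken from the lock‑table examples and are illustrative:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::wap-bucket/orders/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::wap-bucket",
      "Condition": { "StringLike": { "s3:prefix": ["orders/*"] } }
    }
  ]
}
```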
S3 Intelligent‑Tiering
Setting the Iceberg storage‑class to S3 Intelligent‑Tiering automatically moves objects between frequent, infrequent, and archive access tiers, reducing storage costs by up to 68%.
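Assuming an Iceberg release where S3FileIO exposes the `s3.write.storage-class` property, the tier can be set as a catalog property so that newly written objects land in Intelligent‑Tiering; a sketch:

```properties
# Write new data files with the Intelligent-Tiering storage class
s3.write.storage-class=INTELLIGENT_TIERING
```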
AMS AWS Adaptations
AMS was adapted to run on AWS by using a custom catalog that creates a GlueCatalog and refactoring Arctic’s FileIO to support object storage. Future versions will expose GlueCatalog as a distinct catalog type with IAM configuration.
Credential Management
Credentials are supplied as environment variables in the Kubernetes pods, where the AWS SDK's DefaultAWSCredentialsProviderChain picks them up. Example snippet:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: ams
  name: ams
spec:
  replicas: {{ .Values.replicas }}
  template:
    spec:
      containers:
        - env:
            - name: AWS_ACCESS_KEY_ID
              value: AKIXXXXXXXXXXXXXXXX
            - name: AWS_SECRET_ACCESS_KEY
              value: fjHyrM1wTJ8CLP13+GU1bCGG1RGlL1xT1lXyBb11
          image: {{ include "udp.amoro.image.fullname" . }}
```

IAM Roles for Service Accounts (IRSA) can replace static keys, providing secure, token‑based authentication.
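With IRSA, the static keys disappear: the pod's ServiceAccount is annotated with a role ARN, and the AWS SDK exchanges the projected token for temporary credentials. A sketch (account ID and role name are illustrative):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ams
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/amoro-ams-role
```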
Deployment Practice
The article demonstrates deploying Amoro 0.6.0 using Helm charts on Kubernetes. The process starts with building and pushing the Docker image:

```shell
mvn clean install -DskipTests -am -e -pl dist
docker build docker/ams/ --platform amd64 -t xxx/amoro && docker push xxx/amoro
```

Helm templates define helpers, pod mounts, volumes, Deployments, Services, ServiceAccounts, Secrets, Ingress, and PodMonitors. Example helper definition:
```yaml
{{- define "udp.amoro.image.fullname" -}}
{{ .Values.image.repository }}/{{ .Values.image.component }}:{{ .Values.image.tag | default .Chart.AppVersion }}
{{- end -}}
```

Deploy with:
```shell
helm upgrade --install amoro ./ --namespace amoro
```

Additional configuration includes registering the GlueCatalog and setting warehouse, lock-impl, lock.table, and client.credentials-provider for IRSA.
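Putting the pieces together, registering the Glue‑backed catalog involves properties along these lines. The class names are from Iceberg's AWS module; the bucket, lock table name, and credentials provider choice are illustrative:

```properties
catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
io-impl=org.apache.iceberg.aws.s3.S3FileIO
warehouse=s3://wap-bucket/warehouse
lock-impl=org.apache.iceberg.aws.dynamodb.DynamoDbLockManager
lock.table=amoro-iceberg-locks
client.credentials-provider=software.amazon.awssdk.auth.credentials.WebIdentityTokenFileCredentialsProvider
```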
Future Plans
Incremental SORT/ZORDER for data skipping and clustering.
Enhanced monitoring and alerting for table health and optimization latency.
A Kubernetes‑native optimizer to replace the external Flink optimizer, improving resource elasticity.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.