Build a Decoupled Storage‑Compute Data Platform with StarRocks and MinIO

This step‑by‑step tutorial shows how to deploy StarRocks and MinIO in a decoupled storage‑compute architecture using Docker Compose and Kubernetes, configure local caching, create storage volumes, load public datasets, and run SQL queries to explore the combined data.

Overview

The decoupled storage‑compute architecture separates compute from storage, so each can scale independently, costs go down, and resources are used more efficiently. StarRocks 3.0 and later support this model (called shared-data mode), and MinIO provides an open‑source, S3‑compatible object store that works for both local testing and private deployments.

Advantages of Decoupled Architecture

Cost control: Scale compute and storage independently.

Flexible deployment: Mix and match compute and storage systems.

Elastic scaling: Add or remove nodes as the workload changes.

Query performance: A local cache on the compute nodes reduces the latency of reading from remote storage.

Maintainability: Workloads can be moved between resource pools.

Clear resource isolation: Supports multi‑tenant scenarios.
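The "Query performance" advantage rests on a simple idea: compute nodes keep recently read data locally, so repeated reads skip the round trip to object storage. The sketch below is a toy model of that read-through pattern with least-recently-used (LRU) eviction. `FakeObjectStore` and `ReadThroughCache` are hypothetical names, and this is not how the StarRocks data cache is actually implemented or configured.

```python
from collections import OrderedDict


class FakeObjectStore:
    """Hypothetical stand-in for a remote S3-compatible store such as MinIO."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self.remote_reads = 0  # counts simulated round trips to remote storage

    def get(self, key):
        self.remote_reads += 1
        return self._objects[key]


class ReadThroughCache:
    """Keeps recently read objects on the compute side; evicts the least recently used."""

    def __init__(self, store, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._store = store
        self._capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        value = self._store.get(key)  # slow path: fetch from remote storage
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value


if __name__ == "__main__":
    store = FakeObjectStore({f"segment-{i}": f"data-{i}" for i in range(4)})
    cache = ReadThroughCache(store, capacity=2)
    for key in ["segment-0", "segment-1", "segment-0", "segment-2", "segment-0"]:
        cache.get(key)
    print("hits:", cache.hits, "misses:", cache.misses, "remote reads:", store.remote_reads)
```

Only misses reach the remote store, which is why a warm cache hides most of the object-storage latency for repeated scans of the same data.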

Prerequisites

curl: Downloads the Docker Compose file and the sample data files.

Docker Compose: Install it via Docker Desktop, which includes Docker Engine and Compose. Verify the installation:

docker compose version

SQL client: DBeaver or the MySQL CLI.

StarRocks speaks the MySQL protocol, so the MySQL CLI bundled in the starrocks-fe container is enough for this tutorial; a GUI client is optional.
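If you prefer to script the prerequisite check, the following optional sketch looks for curl and docker on PATH and runs `docker compose version`. The helper names `find_missing_tools` and `compose_version` are hypothetical; only the `docker compose version` command comes from the tutorial.

```python
import shutil
import subprocess


def find_missing_tools(tools=("curl", "docker")):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def compose_version():
    """Return the output of `docker compose version`, or None if it cannot be run."""
    if shutil.which("docker") is None:
        return None
    try:
        result = subprocess.run(
            ["docker", "compose", "version"],
            capture_output=True, text=True, timeout=30, check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    missing = find_missing_tools()
    print("missing tools:", ", ".join(missing) if missing else "none")
    print("docker compose:", compose_version() or "not available")
```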

Quick Start Steps

Create work directory and download Docker Compose file

mkdir sr-quickstart
cd sr-quickstart
curl -O https://raw.githubusercontent.com/StarRocks/demo/master/documentation-samples/quickstart/docker-compose.yml

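Start the containers in the background

docker compose up -d

Configure MinIO

Open the MinIO console at http://localhost:9001/access-keys and sign in (MinIO's defaults are minioadmin / minioadmin unless the Compose file sets MINIO_ROOT_USER and MINIO_ROOT_PASSWORD). Create an access key and keep both the access key and the secret key; the storage volume definition needs them.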
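`docker compose up -d` returns once the containers are started, which can be before the services inside them accept requests. A minimal sketch of waiting for MinIO polls its liveness endpoint `/minio/health/live` until it answers HTTP 200. It assumes the Compose file also publishes MinIO's S3 API on localhost:9000 (the tutorial only shows the console on port 9001), and `wait_until_ready` is a hypothetical helper name.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url, timeout_s=60.0, interval_s=2.0):
    """Poll `url` until it answers HTTP 200 or `timeout_s` elapses; return True on success."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not up yet: connection refused, reset, timed out, or non-2xx status
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_until_ready("http://localhost:9000/minio/health/live")` returns True once MinIO is live, or False after the default 60 seconds.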

Connect an SQL client (DBeaver example): Host: localhost, Port: 9030, User: root, Password: (empty).

Alternatively, use the MySQL CLI inside the starrocks-fe container:

docker compose exec starrocks-fe mysql -P9030 -h127.0.0.1 -uroot --prompt="StarRocks > "
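Before configuring DBeaver or opening the MySQL CLI, it can help to confirm that something is listening on the FE's MySQL port. This sketch only attempts a TCP connection; it does not prove the FE is ready to run queries. `port_is_open` is a hypothetical helper.

```python
import socket


def port_is_open(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

`port_is_open("localhost", 9030)` should return True once the FE container is up; the same check works for the HTTP port 8030 that the Stream Load commands use.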

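Create a storage volume in StarRocks. Replace the two placeholders with the MinIO access key and secret key. The endpoint host minio resolves only inside the Docker Compose network, and the starrocks bucket named in LOCATIONS should already exist in MinIO (create it in the console if the Compose setup has not).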

CREATE STORAGE VOLUME shared
TYPE = S3
LOCATIONS = ("s3://starrocks/shared/")
PROPERTIES (
  "enabled" = "true",
  "aws.s3.endpoint" = "http://minio:9000",
  "aws.s3.use_aws_sdk_default_behavior" = "false",
  "aws.s3.enable_ssl" = "false",
  "aws.s3.use_instance_profile" = "false",
  "aws.s3.access_key" = "{your Access Key}",
  "aws.s3.secret_key" = "{your Secret Key}"
);
SET shared AS DEFAULT STORAGE VOLUME;
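Pasting keys into the SQL by hand is error-prone. Below is a sketch that renders the same two statements from environment variables. `MINIO_ACCESS_KEY` and `MINIO_SECRET_KEY` are hypothetical variable names, and `storage_volume_sql` is a hypothetical helper. It rejects values containing double quotes or backslashes rather than guessing StarRocks's escaping rules. The property list is copied from the tutorial's statement, not from an exhaustive reference.

```python
import os
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _quote(value):
    """Wrap value in double quotes; refuse characters this sketch does not escape."""
    text = str(value)
    if '"' in text or "\\" in text:
        raise ValueError(f"unsupported character in value: {text!r}")
    return f'"{text}"'


def storage_volume_sql(name, location, endpoint, access_key, secret_key):
    """Render the tutorial's CREATE STORAGE VOLUME and SET ... AS DEFAULT statements."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"invalid volume name: {name!r}")
    props = {
        "enabled": "true",
        "aws.s3.endpoint": endpoint,
        "aws.s3.use_aws_sdk_default_behavior": "false",
        "aws.s3.enable_ssl": "false",
        "aws.s3.use_instance_profile": "false",
        "aws.s3.access_key": access_key,
        "aws.s3.secret_key": secret_key,
    }
    body = ",\n".join(f"  {_quote(k)} = {_quote(v)}" for k, v in props.items())
    return (
        f"CREATE STORAGE VOLUME {name}\n"
        "TYPE = S3\n"
        f"LOCATIONS = ({_quote(location)})\n"
        f"PROPERTIES (\n{body}\n);\n"
        f"SET {name} AS DEFAULT STORAGE VOLUME;"
    )


if __name__ == "__main__":
    # Hypothetical environment variable names; placeholders are printed if they are unset.
    print(storage_volume_sql(
        name="shared",
        location="s3://starrocks/shared/",
        endpoint="http://minio:9000",
        access_key=os.environ.get("MINIO_ACCESS_KEY", "{your Access Key}"),
        secret_key=os.environ.get("MINIO_SECRET_KEY", "{your Secret Key}"),
    ))
```

Saved as, say, make_volume_sql.py (a hypothetical file name), its output can be piped into the CLI shown earlier: `python make_volume_sql.py | docker compose exec -T starrocks-fe mysql -P9030 -h127.0.0.1 -uroot`. The `-T` flag disables TTY allocation so the pipe works.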

Download sample datasets

curl -O https://raw.githubusercontent.com/StarRocks/demo/master/documentation-samples/quickstart/datasets/NYPD_Crash_Data.csv
curl -O https://raw.githubusercontent.com/StarRocks/demo/master/documentation-samples/quickstart/datasets/72505394728.csv
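Before loading, a quick check that each download really is a CSV with the expected header (not, for example, an HTML error page) can save a confusing Stream Load failure. This sketch parses each file with Python's csv module, which honours double-quoted fields as the loads' `enclose:"` setting does. It reports the column count and the number of data rows after the single header line that `skip_header:1` drops. `csv_summary` is a hypothetical helper.

```python
import csv
import os


def csv_summary(path):
    """Return (header, data_row_count) for a CSV file whose first record is a header."""
    with open(path, newline="", encoding="utf-8", errors="replace") as f:
        reader = csv.reader(f)  # honours double-quoted fields, including embedded commas
        header = next(reader, [])
        data_rows = sum(1 for _ in reader)
    return header, data_rows


if __name__ == "__main__":
    # File names from the download step above; run from the sr-quickstart directory.
    for name in ("NYPD_Crash_Data.csv", "72505394728.csv"):
        if not os.path.exists(name):
            print(f"{name}: missing")
            continue
        header, data_rows = csv_summary(name)
        print(f"{name}: {len(header)} columns, {data_rows} data rows, first column {header[:1]}")
```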

Create database and tables

CREATE DATABASE IF NOT EXISTS quickstart;
USE quickstart;
CREATE TABLE IF NOT EXISTS crashdata (
  CRASH_DATE DATETIME,
  BOROUGH STRING,
  ZIP_CODE STRING,
  LATITUDE DOUBLE,
  LONGITUDE DOUBLE,
  LOCATION STRING,
  ON_STREET_NAME STRING,
  CROSS_STREET_NAME STRING,
  OFF_STREET_NAME STRING,
  CONTRIBUTING_FACTOR_VEHICLE_1 STRING,
  CONTRIBUTING_FACTOR_VEHICLE_2 STRING,
  COLLISION_ID INT,
  VEHICLE_TYPE_CODE_1 STRING,
  VEHICLE_TYPE_CODE_2 STRING
);
CREATE TABLE IF NOT EXISTS weatherdata (
  DATE DATETIME,
  NAME STRING,
  HourlyDewPointTemperature STRING,
  HourlyDryBulbTemperature STRING,
  HourlyPrecipitation STRING,
  HourlyPresentWeatherType STRING,
  HourlyPressureChange STRING,
  HourlyPressureTendency STRING,
  HourlyRelativeHumidity STRING,
  HourlySkyConditions STRING,
  HourlyVisibility STRING,
  HourlyWetBulbTemperature STRING,
  HourlyWindDirection STRING,
  HourlyWindGustSpeed STRING,
  HourlyWindSpeed STRING
);

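Load data into the tables via StarRocks Stream Load

curl prompts for the root password; press Enter, since root has no password by default. The --location-trusted flag lets curl follow the FE's redirect to a backend node and resend the credentials there, and max_filter_ratio:1 lets the load succeed even when rows fail to parse. A label can be used only once per database, so change it if you re-run a load.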

curl --location-trusted -u root \
    -T ./NYPD_Crash_Data.csv \
    -H "label:crashdata-0" \
    -H "column_separator:," \
    -H "skip_header:1" \
    -H "enclose:\"" \
    -H "max_filter_ratio:1" \
    -H "columns:tmp_CRASH_DATE, tmp_CRASH_TIME, CRASH_DATE=str_to_date(concat_ws(' ', tmp_CRASH_DATE, tmp_CRASH_TIME), '%m/%d/%Y %H:%i'),BOROUGH,ZIP_CODE,LATITUDE,LONGITUDE,LOCATION,ON_STREET_NAME,CROSS_STREET_NAME,OFF_STREET_NAME,NUMBER_OF_PERSONS_INJURED,NUMBER_OF_PERSONS_KILLED,NUMBER_OF_PEDESTRIANS_INJURED,NUMBER_OF_PEDESTRIANS_KILLED,NUMBER_OF_CYCLIST_INJURED,NUMBER_OF_CYCLIST_KILLED,NUMBER_OF_MOTORIST_INJURED,NUMBER_OF_MOTORIST_KILLED,CONTRIBUTING_FACTOR_VEHICLE_1,CONTRIBUTING_FACTOR_VEHICLE_2,CONTRIBUTING_FACTOR_VEHICLE_3,CONTRIBUTING_FACTOR_VEHICLE_4,CONTRIBUTING_FACTOR_VEHICLE_5,COLLISION_ID,VEHICLE_TYPE_CODE_1,VEHICLE_TYPE_CODE_2,VEHICLE_TYPE_CODE_3,VEHICLE_TYPE_CODE_4,VEHICLE_TYPE_CODE_5" \
    -XPUT http://localhost:8030/api/quickstart/crashdata/_stream_load
curl --location-trusted -u root \
    -T ./72505394728.csv \
    -H "label:weather-0" \
    -H "column_separator:," \
    -H "skip_header:1" \
    -H "enclose:\"" \
    -H "max_filter_ratio:1" \
    -H "columns: STATION, DATE, LATITUDE, LONGITUDE, ELEVATION, NAME, REPORT_TYPE, SOURCE, HourlyAltimeterSetting, HourlyDewPointTemperature, HourlyDryBulbTemperature, HourlyPrecipitation, HourlyPresentWeatherType, HourlyPressureChange, HourlyPressureTendency, HourlyRelativeHumidity, HourlySkyConditions, HourlySeaLevelPressure, HourlyStationPressure, HourlyVisibility, HourlyWetBulbTemperature, HourlyWindDirection, HourlyWindGustSpeed, HourlyWindSpeed, Sunrise, Sunset, DailyAverageDewPointTemperature, DailyAverageDryBulbTemperature, DailyAverageRelativeHumidity, DailyAverageSeaLevelPressure, DailyAverageStationPressure, DailyAverageWetBulbTemperature, DailyAverageWindSpeed, DailyCoolingDegreeDays, DailyDepartureFromNormalAverageTemperature, DailyHeatingDegreeDays, DailyMaximumDryBulbTemperature, DailyMinimumDryBulbTemperature, DailyPeakWindDirection, DailyPeakWindSpeed, DailyPrecipitation, DailySnowDepth, DailySnowfall, DailySustainedWindDirection, DailySustainedWindSpeed, DailyWeather, MonthlyAverageRH, MonthlyDaysWithGT001Precip, MonthlyDaysWithGT010Precip, MonthlyDaysWithGT32Temp, MonthlyDaysWithGT90Temp, MonthlyDaysWithLT0Temp, MonthlyDaysWithLT32Temp, MonthlyDepartureFromNormalAverageTemperature, MonthlyDepartureFromNormalCoolingDegreeDays, MonthlyDepartureFromNormalHeatingDegreeDays, MonthlyDepartureFromNormalMaximumTemperature, MonthlyDepartureFromNormalMinimumTemperature, MonthlyDepartureFromNormalPrecipitation, MonthlyDewpointTemperature, MonthlyGreatestPrecip, MonthlyGreatestPrecipDate, MonthlyGreatestSnowDepth, MonthlyGreatestSnowDepthDate, MonthlyGreatestSnowfall, MonthlyGreatestSnowfallDate, MonthlyMaxSeaLevelPressureValue, MonthlyMaxSeaLevelPressureValueDate, MonthlyMaxSeaLevelPressureValueTime, MonthlyMaximumTemperature, MonthlyMeanTemperature, MonthlyMinSeaLevelPressureValue, MonthlyMinSeaLevelPressureValueDate, MonthlyMinSeaLevelPressureValueTime, MonthlyMinimumTemperature, MonthlySeaLevelPressure, MonthlyStationPressure, MonthlyTotalLiquidPrecipitation, MonthlyTotalSnowfall, MonthlyWetBulb, AWND, CDSD, CLDD, DSNW, HDSD, HTDD, NormalsCoolingDegreeDay, NormalsHeatingDegreeDay, ShortDurationEndDate005, ShortDurationEndDate010, ShortDurationEndDate015, ShortDurationEndDate020, ShortDurationEndDate030, ShortDurationEndDate045, ShortDurationEndDate060, ShortDurationEndDate080, ShortDurationEndDate100, ShortDurationEndDate120, ShortDurationEndDate150, ShortDurationEndDate180, ShortDurationPrecipitationValue005, ShortDurationPrecipitationValue010, ShortDurationPrecipitationValue015, ShortDurationPrecipitationValue020, ShortDurationPrecipitationValue030, ShortDurationPrecipitationValue045, ShortDurationPrecipitationValue060, ShortDurationPrecipitationValue080, ShortDurationPrecipitationValue100, ShortDurationPrecipitationValue120, ShortDurationPrecipitationValue150, ShortDurationPrecipitationValue180, REM, BackupDirection, BackupDistance, BackupDistanceUnit, BackupElements, BackupElevation, BackupEquipment, BackupLatitude, BackupLongitude, BackupName, WindEquipmentChangeDate" \
    -XPUT http://localhost:8030/api/quickstart/weatherdata/_stream_load
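The curl commands above can also be expressed in Python with only the standard library, which makes it easier to check the result programmatically. This is a sketch under assumptions: `build_stream_load_request`, `send_stream_load`, and `check_load_result` are hypothetical names; the redirect handling assumes the FE answers with HTTP 307 (or 308) pointing at a backend node; and the response fields read here (`Status`, `Message`, `NumberTotalRows`, `NumberFilteredRows`, `ErrorURL`) are those StarRocks returns in its Stream Load JSON response. Because the tutorial sets `max_filter_ratio:1`, the server never rejects a load for bad rows, so `check_load_result` applies a threshold of your choosing on the client side.

```python
import base64
import json
import urllib.error
import urllib.parse
import urllib.request


def build_stream_load_request(host, port, db, table, data, label, columns,
                              user="root", password=""):
    """Build (but do not send) a Stream Load PUT mirroring the tutorial's curl flags."""
    url = f"http://{host}:{port}/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",  # like curl -u root with an empty password
        "label": label,                     # unique per database; reuse is rejected
        "column_separator": ",",
        "skip_header": "1",
        "enclose": '"',
        "max_filter_ratio": "1",
        "columns": columns,
    }
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


def send_stream_load(request, timeout_s=600):
    """Send the request and return the parsed JSON response.

    urllib will not follow a redirect for PUT, so a 307/308 from the FE is followed
    by hand, re-sending the same body and headers (including Authorization), which
    is what curl's --location-trusted does.
    """
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as err:
        if err.code not in (307, 308):
            raise
        target = urllib.parse.urljoin(request.full_url, err.headers["Location"])
    redirected = urllib.request.Request(
        target, data=request.data, headers=dict(request.headers), method="PUT")
    with urllib.request.urlopen(redirected, timeout=timeout_s) as resp:
        return json.loads(resp.read())


def check_load_result(result, max_filtered_ratio=0.0):
    """Raise unless the load succeeded and the filtered-row ratio is within the limit."""
    if result.get("Status") != "Success":
        raise RuntimeError(f"load failed: {result.get('Status')}: {result.get('Message')}")
    total = int(result.get("NumberTotalRows", 0))
    filtered = int(result.get("NumberFilteredRows", 0))
    ratio = filtered / total if total else 0.0
    if ratio > max_filtered_ratio:
        raise RuntimeError(f"{filtered}/{total} rows filtered; see {result.get('ErrorURL')}")
    return ratio
```

For example, read NYPD_Crash_Data.csv in binary mode, pass its bytes with the columns string from the first curl command and a fresh label, then call `check_load_result(send_stream_load(request), max_filtered_ratio=0.05)` with whatever threshold suits your data.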

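Query the loaded data

The query joins each crash to the weather observations from the same hour and keeps only weekday crashes: DAYOFWEEK returns 1 for Sunday through 7 for Saturday, so BETWEEN 2 AND 6 selects Monday to Friday. Because HourlyPrecipitation is a STRING column, MAX compares its values as text.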

SELECT COUNT(DISTINCT c.COLLISION_ID) AS Crashes,
       TRUNCATE(AVG(w.HourlyDryBulbTemperature), 1) AS Temp_F,
       MAX(w.HourlyPrecipitation) AS Precipitation,
       DATE_FORMAT(c.CRASH_DATE, '%d %b %Y %H:00') AS Hour
FROM crashdata c
LEFT JOIN weatherdata w
  ON DATE_FORMAT(c.CRASH_DATE, '%Y-%m-%d %H:00:00') = DATE_FORMAT(w.DATE, '%Y-%m-%d %H:00:00')
WHERE DAYOFWEEK(c.CRASH_DATE) BETWEEN 2 AND 6
GROUP BY Hour
ORDER BY Crashes DESC
LIMIT 200;
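To make the join and the weekday filter concrete, here is the same hourly bucketing in plain Python over in-memory rows. It is an illustration, not how StarRocks executes the query. It assumes temperatures are already parsed to numbers (the table stores them as STRING) and leaves out precipitation, TRUNCATE, and ORDER BY/LIMIT. `weekday_crashes_by_hour` is a hypothetical name. Averaging the weather rows per hour gives the same value as the SQL AVG over the joined rows, because every crash in an hour joins with the same set of weather rows.

```python
from collections import defaultdict
from datetime import datetime


def hour_key(ts):
    """Truncate a timestamp to the hour, like DATE_FORMAT(ts, '%Y-%m-%d %H:00:00')."""
    return ts.replace(minute=0, second=0, microsecond=0)


def weekday_crashes_by_hour(crashes, weather):
    """crashes: [(collision_id, crash_time)]; weather: [(obs_time, dry_bulb_f)].

    Returns {hour: (distinct_crash_count, avg_temp_or_None)} for Monday-Friday crashes.
    """
    temps_by_hour = defaultdict(list)
    for obs_time, temp in weather:
        temps_by_hour[hour_key(obs_time)].append(temp)

    ids_by_hour = defaultdict(set)  # a set mirrors COUNT(DISTINCT COLLISION_ID)
    for collision_id, crash_time in crashes:
        # SQL DAYOFWEEK 2..6 (Monday..Friday) is Python's weekday() 0..4.
        if crash_time.weekday() < 5:
            ids_by_hour[hour_key(crash_time)].add(collision_id)

    result = {}
    for hour, ids in ids_by_hour.items():
        temps = temps_by_hour.get(hour)
        # LEFT JOIN semantics: an hour with no weather rows gets None, like SQL NULL.
        result[hour] = (len(ids), sum(temps) / len(temps) if temps else None)
    return result


if __name__ == "__main__":
    crashes = [(1, datetime(2014, 1, 6, 8, 15)), (2, datetime(2014, 1, 6, 8, 45))]
    weather = [(datetime(2014, 1, 6, 8, 51), 30.0)]
    print(weekday_crashes_by_hour(crashes, weather))
```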

Conclusion

Integrating StarRocks with MinIO provides a flexible, scalable, and cost‑effective data platform. Decoupling compute from storage lets each layer scale independently, relies on local caching to keep query latency low despite remote storage, simplifies operations, and offers clear resource isolation for multi‑tenant workloads, making it a solid foundation for modern cloud‑native analytics.

Written by

StarRocks

StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
