Designing a Resumable Large‑File Upload API for Private Enterprise
This in‑depth guide walks through the challenges of enterprise‑grade large‑file uploads: chunked transfer, resumable uploads, security, and audit trails. It covers a complete set of RESTful endpoints together with the database schema, state‑machine handling, and both local and cloud storage integration for AI‑driven document processing.
Background
Private AI deployments for government and enterprise customers often need to ingest massive document collections (Word, PDF, PPT, Markdown) that can reach tens of gigabytes. Uploads occur inside LANs or completely offline environments, must not pass through public cloud storage, and require full audit trails (who uploaded what and when). A simple "single upload endpoint + cloud storage" approach fails because it cannot handle resumable uploads, cluster‑wide chunk merging, or strict security and compliance requirements.
Frontend upload techniques
Instant‑upload check (hash‑based deduplication).
Chunked upload – fixed‑size slices (e.g., 5 MB or 10 MB).
Breakpoint resume – the client records which chunks have been uploaded and only sends missing ones after a network interruption.
Concurrent chunk upload – typically 3‑5 chunks in parallel to improve throughput.
Real‑time progress display.
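A real frontend would slice a File object in the browser (e.g. with Blob.slice in JavaScript), but the hashing and slicing logic can be sketched in a few lines of Python. `plan_upload` and `CHUNK_SIZE` are illustrative names, not part of any described API:

```python
import hashlib

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB, one of the fixed slice sizes mentioned above

def plan_upload(data, chunk_size=CHUNK_SIZE):
    """Compute the whole-file hash (for the instant-upload check) and
    split the payload into fixed-size slices (for chunked upload)."""
    file_hash = hashlib.md5(data).hexdigest()
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return file_hash, chunks

# Small in-memory payload standing in for a real multi-GB file:
payload = b"x" * (2 * CHUNK_SIZE + 1024)   # two full chunks plus a 1 KB tail
file_hash, chunks = plan_upload(payload)
print(len(chunks))  # 3
```

The last slice is simply shorter than the rest; the server only needs `totalChunks` and each chunk's index to reassemble the file.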
Backend API design
The backend provides a set of focused endpoints that match the frontend workflow.
/upload/check – Instant‑upload check
POST /api/upload/check
Request:
{
  "fileHash": "md5_abc123def456",
  "fileName": "training-docs.zip",
  "fileSize": 5342245120
}

Response:
{
  "success": true,
  "data": { "exists": false }
}

If exists is true, the file is already stored and the client can skip the upload.
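Server-side, the check is a single lookup keyed by the whole-file hash. A minimal sketch, using an in-memory dict in place of the file_info table described later (all names illustrative):

```python
# Stand-in for the file_info table: file_hash -> final storage path
file_info = {"md5_existing111": "/data/uploads/archived.zip"}

def check_upload(file_hash):
    """Return the /upload/check response body for a whole-file hash."""
    return {"success": True, "data": {"exists": file_hash in file_info}}

print(check_upload("md5_abc123def456"))  # {'success': True, 'data': {'exists': False}}
```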
/upload/init – Initialise upload task
POST /api/upload/init
Request:
{
  "fileHash": "md5_abc123def456",
  "fileName": "training-docs.zip",
  "totalChunks": 320,
  "chunkSize": 5242880
}

Response:
{
  "success": true,
  "data": {
    "uploadId": "b4f8e3a7-1a0c-4a1d-88af-61e98d91a49b",
    "uploadedChunks": []
  }
}

The returned uploadId uniquely identifies the upload session and is used for all subsequent calls.
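The init handler's job is to mint a session ID and persist the task metadata. A sketch with a dict standing in for the upload_task table (names illustrative):

```python
import uuid

upload_tasks = {}  # upload_id -> task record; stands in for the upload_task table

def init_upload(file_hash, file_name, total_chunks, chunk_size):
    """Create an upload session and return the /upload/init response body.
    If a session for this file already exists, a real server would return
    it instead of creating a new one (enabling resume)."""
    upload_id = str(uuid.uuid4())
    upload_tasks[upload_id] = {
        "file_hash": file_hash,
        "file_name": file_name,
        "total_chunks": total_chunks,
        "chunk_size": chunk_size,
        "uploaded_chunks": set(),   # chunk indices received so far
        "status": "WAITING",
    }
    return {"success": True, "data": {"uploadId": upload_id, "uploadedChunks": []}}
```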
/upload/chunk – Upload a single chunk
POST /api/upload/chunk
Content-Type: multipart/form-data
Request (formData):
  uploadId: b4f8e3a7-1a0c-4a1d-88af-61e98d91a49b
  chunkIndex: 0
  chunkSize: 5242880
  chunkHash: md5_001
  file: (binary data)

Response:
{
  "success": true,
  "data": {
    "uploadId": "b4f8e3a7-1a0c-4a1d-88af-61e98d91a49b",
    "chunkIndex": 0,
    "chunkSize": 5242880
  }
}

Each successful upload creates a record in upload_chunk and increments uploaded_chunks in the corresponding upload_task.
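The persistence step of the chunk handler can be sketched as below, assuming chunkHash (when present) is the hex MD5 of the slice; the function and the `{index}.part` file-naming scheme are illustrative:

```python
import hashlib
import os
import tempfile

def save_chunk(task, chunk_dir, chunk_index, data, chunk_hash=None):
    """Persist one slice to disk and record it on the in-memory task record."""
    if chunk_hash is not None and hashlib.md5(data).hexdigest() != chunk_hash:
        return {"success": False, "message": "chunk hash mismatch"}
    with open(os.path.join(chunk_dir, f"{chunk_index}.part"), "wb") as f:
        f.write(data)
    task["uploaded_chunks"].add(chunk_index)  # a set, so retries stay idempotent
    return {"success": True,
            "data": {"chunkIndex": chunk_index, "chunkSize": len(data)}}

task = {"uploaded_chunks": set()}
chunk_dir = tempfile.mkdtemp()
resp = save_chunk(task, chunk_dir, 0, b"hello")
print(resp["success"], task["uploaded_chunks"])  # True {0}
```

Storing indices in a set means a re-sent chunk (after a retry) overwrites its file and leaves the counter consistent.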
/upload/merge – Merge all chunks
POST /api/upload/merge
Request:
{
  "uploadId": "b4f8e3a7-1a0c-4a1d-88af-61e98d91a49b",
  "fileHash": "md5_abc123def456"
}

Response:
{
  "success": true,
  "message": "File merged successfully",
  "data": { "storagePath": "/data/uploads/training-docs.zip" }
}

The server validates that all chunks are present, performs the merge (local concatenation or cloud‑side multipart‑complete), verifies the final MD5 against fileHash, and marks the task as COMPLETED.
/upload/pause – Pause a task
POST /api/upload/pause
Request:
{ "uploadId": "b4f8e3a7-1a0c-4a1d-88af-61e98d91a49b" }

Response:
{ "success": true, "message": "Task paused" }

/upload/cancel – Cancel a task
POST /api/upload/cancel

Request:
{ "uploadId": "b4f8e3a7-1a0c-4a1d-88af-61e98d91a49b" }

Response:
{ "success": true, "message": "Task canceled" }

/upload/list – List upload tasks (admin view)
GET /api/upload/list

Response:
{
"success": true,
"data": [
{
"uploadId": "b4f8e3a7-1a0c-4a1d-88af-61e98d91a49b",
"fileName": "training-docs.zip",
"status": "COMPLETED",
"uploadedChunks": 320,
"totalChunks": 320,
"uploader": "admin",
"createdAt": "2025-10-20 14:30:12"
}
]
}

Database schema
Three core tables store the complete lifecycle.
upload_task – one row per upload session; key fields include upload_id, file_hash, file_name, file_size, chunk_size, total_chunks, uploaded_chunks, status (0‑7), storage_type, storage_url, local_path, uploader, timestamps.
upload_chunk – one row per chunk; fields: upload_id (FK), chunk_index, chunk_size, optional chunk_hash, status (0‑2), local_path, timestamps.
file_info – final file metadata after successful merge; fields: file_hash, file_name, file_size, storage_type, storage_url, uploader, status, timestamps.
Relationships: upload_task → upload_chunk (one‑to‑many) and upload_task → file_info (one‑to‑one after merge).
Upload state machine
WAITING (0) – task created, no chunks uploaded.
UPLOADING (1) – chunks are being received; uploaded_chunks is updated.
PAUSED (7) – user‑initiated pause; chunks remain on disk.
CANCELED (4) – user aborts; temporary files may be deleted.
MERGING (2) – all chunks present, the server is merging.
CHUNK_MERGED (6) – merge succeeded, optional post‑processing.
COMPLETED (3) – file merged, hash verified, final path stored.
FAILED (5) – any error during upload, merge, or verification.
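The numeric codes above map naturally onto constants plus an allowed-transition table. The exact transition set below is an assumption reconstructed from the lifecycle described, not a published spec:

```python
# Numeric codes follow the article; the order of names fixes the values 0-7.
WAITING, UPLOADING, MERGING, COMPLETED, CANCELED, FAILED, CHUNK_MERGED, PAUSED = range(8)

# Illustrative transition map inferred from the state descriptions above
ALLOWED = {
    WAITING:      {UPLOADING, CANCELED},
    UPLOADING:    {UPLOADING, PAUSED, CANCELED, MERGING, FAILED},
    PAUSED:       {UPLOADING, CANCELED},
    MERGING:      {CHUNK_MERGED, FAILED},
    CHUNK_MERGED: {COMPLETED, FAILED},
    COMPLETED:    set(),                 # terminal
    CANCELED:     set(),                 # terminal
    FAILED:       {UPLOADING},           # allow automatic retry from persisted state
}

def transition(state, new_state):
    """Reject any state change the lifecycle does not permit."""
    if new_state not in ALLOWED[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state
```

Enforcing transitions in one place keeps concurrent chunk uploads, pauses, and merge workers from corrupting a task's status.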
Recovery workflow:
Client computes file hash and calls /upload/check.
If a file_info record exists, the client performs an instant upload.
If a task exists in upload_task, the client retrieves uploadId and queries upload_chunk for already uploaded indices.
The client resumes only the missing chunks.
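Step 4 reduces to a set difference between all indices and the ones already recorded in upload_chunk; the function name is illustrative:

```python
def missing_chunks(total_chunks, uploaded_indices):
    """Indices the client still has to send after resuming an interrupted upload."""
    done = set(uploaded_indices)
    return [i for i in range(total_chunks) if i not in done]

print(missing_chunks(5, [0, 1, 3]))  # [2, 4]
```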
Merge and integrity verification
Local merge
When storage_type=local, the server opens the target file and streams each chunk in chunk_index order. After concatenation it recomputes the MD5 and compares it with the original fileHash. A mismatch marks the task FAILED and logs an error.
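A minimal sketch of that local merge, assuming chunks are stored as `0.part` … `N-1.part` in a per-task directory (the naming scheme and function are illustrative):

```python
import hashlib
import os
import tempfile

def merge_local(chunk_dir, total_chunks, target_path, expected_md5):
    """Concatenate chunks in index order, streaming, then verify the final MD5."""
    parts = [os.path.join(chunk_dir, f"{i}.part") for i in range(total_chunks)]
    missing = [p for p in parts if not os.path.exists(p)]
    if missing:
        raise FileNotFoundError(f"{len(missing)} chunk(s) missing; cannot merge")
    digest = hashlib.md5()
    with open(target_path, "wb") as out:
        for p in parts:
            with open(p, "rb") as f:
                while True:
                    buf = f.read(1024 * 1024)  # stream; never hold a whole chunk set
                    if not buf:
                        break
                    out.write(buf)
                    digest.update(buf)
    if digest.hexdigest() != expected_md5:
        raise ValueError("merged file hash mismatch; mark the task FAILED")
    return target_path

# Demo with three tiny chunks standing in for 5 MB slices
chunk_dir = tempfile.mkdtemp()
for i, piece in enumerate([b"aaa", b"bbb", b"c"]):
    with open(os.path.join(chunk_dir, f"{i}.part"), "wb") as f:
        f.write(piece)
merged = merge_local(chunk_dir, 3, os.path.join(chunk_dir, "out.bin"),
                     hashlib.md5(b"aaabbbc").hexdigest())
```

Hashing the stream while writing avoids a second full read of a multi-gigabyte file just to verify it.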
Cloud merge
For object storage (OSS, COS, MinIO, etc.) the server calls the provider’s multipart‑complete API (e.g., completeMultipartUpload). The provider guarantees correct ordering and integrity; the server only records the resulting storage_url.
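With an S3-compatible SDK, the server's remaining job is assembling the part manifest from the ETags it recorded per chunk. `build_completion_manifest` is an illustrative helper; the commented boto3 call shows where it would be used (S3-style APIs number parts from 1, in ascending order):

```python
def build_completion_manifest(etags):
    """Build the Parts list for a multipart-complete call.
    `etags` maps 0-based chunk_index -> the ETag returned when that part
    was uploaded; part numbers must be 1-based and ascending."""
    return {"Parts": [{"PartNumber": i + 1, "ETag": etags[i]}
                      for i in sorted(etags)]}

# With a boto3 S3 client (MinIO/OSS/COS expose compatible APIs):
# s3.complete_multipart_upload(
#     Bucket=bucket, Key=key, UploadId=upload_id,
#     MultipartUpload=build_completion_manifest(etags))
```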
Cluster deployment strategies
Shared storage – all nodes write chunks to a common NFS/NAS path (e.g., /data/uploads) so any node can perform the merge.
Cloud‑side merge – chunks are stored directly in object storage; merge is performed by the cloud service.
Dedicated merge node – a scheduler assigns a specific node to pull chunks from other nodes via internal RPC and execute the merge.
Private‑cloud environments typically use the shared‑storage approach for performance and security.
Asynchronous processing & performance optimisation
In production the merge, hash verification, and downstream AI processing (document parsing, paging, vectorisation) are off‑loaded to background workers or task queues. The upload endpoints remain lightweight and return immediately after receiving a chunk, preventing front‑end time‑outs and reducing peak I/O pressure. Failed tasks can be retried automatically from the persisted database state.
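The hand-off can be sketched with an in-process queue and a background worker; in production this would be a durable task queue (e.g. Redis- or database-backed), and all names here are illustrative:

```python
import queue
import threading

merge_queue = queue.Queue()  # upload endpoints enqueue; workers merge in background

def merge_worker(handler):
    """Drain the queue, calling handler(upload_id) per task; None shuts down."""
    while True:
        upload_id = merge_queue.get()
        if upload_id is None:
            break
        try:
            handler(upload_id)
        except Exception:
            pass  # in production: mark the task FAILED and retry from DB state
        finally:
            merge_queue.task_done()

done = []
worker = threading.Thread(target=merge_worker, args=(done.append,))
worker.start()
merge_queue.put("task-1")   # /upload/merge returns right after this enqueue
merge_queue.put(None)       # shutdown signal for the demo
worker.join()
print(done)  # ['task-1']
```

Because every task's state lives in upload_task, a crashed worker can be restarted and simply re-enqueue any task still marked MERGING or FAILED.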
Summary
The design transforms a simple single‑endpoint upload into a robust, auditable, resumable large‑file upload service suitable for private, AI‑driven enterprise environments. By delegating slicing, progress, and resume to the frontend and handling storage, verification, merging, and audit trails on the backend, the system remains extensible, cluster‑ready, and compliant with strict security policies.
Rare Earth Juejin Tech Community
Juejin, a tech community that helps developers grow.
