
Practical Implementation and Optimization of Large File Chunked Upload (Frontend Part)

This article presents a complete frontend solution for large file chunked uploading, covering requirement analysis, slice implementation, hash calculation with Web Workers, MerkleTree-based file identification, a PromisePool for concurrent uploads, real‑time progress tracking, and performance optimizations with detailed code examples.


The article starts with a requirement analysis for a large‑file upload system, describing features such as instant‑transfer (秒传) when a file already exists, and breakpoint‑resume for interrupted uploads.

Key optimizations include a progress bar based on real upload progress, a PromisePool that limits the number of concurrent promises, and a PromisePool-based WorkerPool that reuses Web Workers to compute chunk hashes in parallel, achieving a 6-7× speedup.

```typescript
/**
 * Slice a file into chunks
 * @param file
 * @param baseSize chunk size in MB, 1 MB by default
 */
function sliceFile(file: File, baseSize = 1): Blob[] {
  const chunkSize = baseSize * 1024 * 1024; // MB -> bytes
  const chunks: Blob[] = [];
  let startPos = 0;
  while (startPos < file.size) {
    chunks.push(file.slice(startPos, startPos + chunkSize));
    startPos += chunkSize;
  }
  return chunks;
}
```

Two approaches for converting Blob to ArrayBuffer are provided, one using FileReader and the other using the native blob.arrayBuffer() method.
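The two conversion paths can be sketched as follows. The FileReader variant only works in a browser (or with a DOM shim), while `blob.arrayBuffer()` is also available in modern Node; function names here are illustrative, not the article's exact API.

```typescript
// Approach 1: FileReader (browser-only) — wrap the event-based API in a Promise.
function blobToArrayBufferViaReader(blob: Blob): Promise<ArrayBuffer> {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result as ArrayBuffer);
    reader.onerror = () => reject(reader.error);
    reader.readAsArrayBuffer(blob);
  });
}

// Approach 2: the native Blob.arrayBuffer() method (browsers and Node >= 18).
function blobToArrayBuffer(blob: Blob): Promise<ArrayBuffer> {
  return blob.arrayBuffer();
}
```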

Hash calculation is off‑loaded to Web Workers. Example of an MD5 worker:

```typescript
import SparkMD5 from 'spark-md5';
import { WorkerMessage } from './util/worker-message';
import { WorkerLabelsEnum } from './types/worker-labels.enum';

addEventListener('message', ({ data }: { data: ArrayBuffer }) => {
  const hash = SparkMD5.ArrayBuffer.hash(data);
  // Pass [data] as a transfer list so the buffer is moved back to the
  // main thread instead of being copied.
  postMessage(
    new WorkerMessage(WorkerLabelsEnum.DONE, { result: hash, chunk: data }),
    [data],
  );
});
```

A WorkerPool manages a pool of WorkerWrapper instances, tracks running workers with an RxJS BehaviorSubject, and distributes hash tasks while respecting the maximum worker count.
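The scheduling logic can be sketched without the RxJS and Web Worker machinery. In this simplified stand-in, a plain counter and a task queue replace the BehaviorSubject, and `exec` stands in for posting a payload to a real WorkerWrapper and resolving on its message event; all names are illustrative.

```typescript
type Task<T, R> = { payload: T; resolve: (r: R) => void; reject: (e: unknown) => void };

class WorkerPool<T, R> {
  private queue: Task<T, R>[] = [];
  private running = 0; // stand-in for the article's BehaviorSubject count

  constructor(
    private readonly maxWorkers: number,
    // stand-in for WorkerWrapper.run(): the real pool posts the payload to a
    // Web Worker and resolves when the worker replies
    private readonly exec: (payload: T) => Promise<R>,
  ) {}

  run(payload: T): Promise<R> {
    return new Promise<R>((resolve, reject) => {
      this.queue.push({ payload, resolve, reject });
      this.drain();
    });
  }

  // Start queued tasks while free worker slots remain.
  private drain() {
    while (this.running < this.maxWorkers && this.queue.length > 0) {
      const task = this.queue.shift()!;
      this.running++;
      this.exec(task.payload)
        .then(task.resolve, task.reject)
        .finally(() => {
          this.running--;
          this.drain();
        });
    }
  }
}
```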

The article also introduces a MerkleTree implementation to derive a fast file‑level hash from chunk hashes, reducing the need to hash the entire file.
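The core of a MerkleTree over chunk hashes is folding them pairwise until one root remains. This sketch makes the hash function injectable (the article pairs it with MD5 via spark-md5, later hash-wasm); promoting an unpaired node unchanged is one common convention and not necessarily the article's exact choice.

```typescript
// Fold leaf hashes pairwise up to a single root hash.
function merkleRoot(leaves: string[], hashFn: (s: string) => string): string {
  if (leaves.length === 0) throw new Error('no leaves');
  let level = leaves.slice();
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      // An odd trailing node is promoted unchanged (one common convention).
      next.push(i + 1 < level.length ? hashFn(level[i] + level[i + 1]) : level[i]);
    }
    level = next;
  }
  return level[0];
}
```

Because the leaves are per-chunk hashes, the root changes whenever the chunk size changes, which is exactly the limitation noted later in the article.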

Concurrent chunk uploading is controlled by a custom PromisePool that ensures only a limited number of HTTP requests are pending at any time, preventing overload of network connections.
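A minimal PromisePool can be sketched as a fixed number of "runner" loops pulling task factories off a shared index; the class shape and names here are illustrative, not the article's exact API.

```typescript
// Run task factories with at most `limit` promises in flight,
// preserving result order by index.
class PromisePool<T> {
  constructor(private readonly limit: number) {}

  async all(factories: Array<() => Promise<T>>): Promise<T[]> {
    const results: T[] = new Array(factories.length);
    let next = 0;
    // Each runner repeatedly claims the next unstarted task.
    const runner = async () => {
      while (next < factories.length) {
        const i = next++;
        results[i] = await factories[i]();
      }
    };
    await Promise.all(
      Array.from({ length: Math.min(this.limit, factories.length) }, runner),
    );
    return results;
  }
}
```

For chunk uploads, each factory would wrap one HTTP request, so at most `limit` requests are pending at any time.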

Real‑time upload progress is calculated by aggregating the uploaded bytes of each active request and updating the UI at a throttled interval.
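One way to implement this: each active request reports its loaded byte count (e.g. from an XMLHttpRequest `onprogress` event), the tracker sums the per-chunk totals, and the UI callback is rate-limited. Names and the throttling scheme are illustrative.

```typescript
// Aggregate per-chunk loaded bytes and throttle UI updates.
function createProgressTracker(
  totalBytes: number,
  onUpdate: (percent: number) => void,
  intervalMs = 200,
) {
  const loadedPerChunk = new Map<number, number>();
  let lastEmit = 0;
  return {
    // Called from each request's progress event with cumulative loaded bytes.
    report(chunkIndex: number, loadedBytes: number) {
      loadedPerChunk.set(chunkIndex, loadedBytes);
      const now = Date.now();
      if (now - lastEmit >= intervalMs) {
        lastEmit = now;
        this.flush();
      }
    },
    // Emit the current aggregate percentage immediately.
    flush() {
      let loaded = 0;
      for (const v of loadedPerChunk.values()) loaded += v;
      onUpdate(Math.min(100, (loaded / totalBytes) * 100));
    },
  };
}
```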

Performance tests on a Ryzen 9 5900HX show that an 8-thread worker pool speeds up MD5 calculation by 670% compared to single-thread execution, and a 12-thread pool reaches a 776% improvement.

The final upload workflow includes metadata extraction, chunk slicing, hash computation, existence check, fetching missing chunks, uploading with progress callbacks, verification, and server‑side merging.
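The overall flow can be sketched as an orchestration over a hypothetical server API (all names here are stand-ins); the real implementation uploads missing chunks concurrently through the PromisePool and reports progress, which is elided here for brevity.

```typescript
// Hypothetical server API; the article's actual endpoints may differ.
interface UploadApi {
  checkFile(hash: string): Promise<{ exists: boolean; missing: number[] }>;
  uploadChunk(hash: string, index: number, chunk: Blob): Promise<void>;
  merge(hash: string): Promise<void>;
}

async function uploadFile(
  chunks: Blob[],
  fileHash: string,
  api: UploadApi,
): Promise<'instant' | 'uploaded'> {
  const { exists, missing } = await api.checkFile(fileHash);
  if (exists) return 'instant'; // instant-transfer: server already has the file
  // Breakpoint-resume: only the chunks the server reports as missing are sent.
  // (Sequential here for brevity; the article runs these through a PromisePool.)
  for (const i of missing) {
    await api.uploadChunk(fileHash, i, chunks[i]);
  }
  await api.merge(fileHash); // server-side merge of the uploaded chunks
  return 'uploaded';
}
```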

Known limitations include the MerkleTree hash's dependence on chunk size and the memory overhead of holding all chunk buffers at once; recent updates address memory usage by processing chunks in batches and switch to hash-wasm for faster hashing.

Tags: TypeScript, File Upload, Web Workers, Chunked Upload, Hash Calculation
Written by

Rare Earth Juejin Tech Community

Juejin, a tech community that helps developers grow.
