Large File Chunking and Web Worker Optimization in JavaScript
This article demonstrates how to split large files into 5 MB chunks, compute MD5 hashes, and accelerate processing with Web Workers by dynamically allocating threads based on the browser's hardware concurrency, achieving up to ten‑fold speed improvements.
Large File Chunking
Hello everyone, I am sharing a tutorial on large‑file chunking combined with Web Workers. By writing this article I also learned how to obtain the number of CPU threads via JavaScript, so let’s dive straight in.
1. Initialization, Setting Up the Scaffold
<!DOCTYPE html>
<html lang="zh-CN">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta http-equiv="X-UA-Compatible" content="ie=edge" />
    <title>Large File Chunking</title>
    <style></style>
  </head>
  <body>
    <input type="file" id="fileRef" />
    <script type="module" src="./main.js"></script>
  </body>
</html>
2. Reading a Single File Chunk and Hashing It
We use SparkMD5 for hashing; you can download and import it yourself.
import { createChunks } from './createChunks.js';
// Define the size of each slice (5 MB)
const CHUNK_SIZE = 1024 * 1024 * 5;
export async function cutFile(file) {
  // Generate one slice (index 1, the second 5 MB block); reading and hashing take time, so the call is asynchronous
  const chunk = await createChunks(file, 1, CHUNK_SIZE);
  // After slicing completes, we can access the slice information
  console.log(chunk);
}
Here cutFile creates a single slice and logs it; createChunks resolves with that slice's start, end, index, and hash value.
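The article imports createChunks without showing its body. What follows is only a minimal sketch of what such a helper might look like, not the author's implementation. It assumes a SparkMD5 binding is in scope and uses its SparkMD5.ArrayBuffer.hash method; the fourth parameter hashFn and the returned blob field are hypothetical additions, included so the hashing step can be swapped out or tested.

```javascript
// Hypothetical sketch of createChunks; the article's real helper is not shown.
// hashFn is an assumed extra parameter. When it is omitted, the default
// expects a SparkMD5 binding to be in scope.
function createChunks(file, index, chunkSize, hashFn = (buf) => SparkMD5.ArrayBuffer.hash(buf)) {
  const start = index * chunkSize;
  const end = Math.min(start + chunkSize, file.size);
  const blob = file.slice(start, end);
  // Blob.prototype.arrayBuffer() reads the slice without a FileReader.
  return blob.arrayBuffer().then((buffer) => ({
    start,
    end,
    index,
    hash: hashFn(buffer), // MD5 of this slice only, not of the whole file
    blob,
  }));
}
```

With SparkMD5 available, the fourth argument can be left out, so createChunks(file, i, CHUNK_SIZE) keeps the call shape used in the article.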
3. Splitting the Entire File into Chunks
After calculating the total number of slices, we loop through them and store each result in an array.
export async function cutFile(file) {
  const result = [];
  // Calculate total number of slices
  const chunks = Math.ceil(file.size / CHUNK_SIZE);
  // Generate the slices one after another, awaiting each before starting the next
  for (let i = 0; i < chunks; i++) {
    const chunk = await createChunks(file, i, CHUNK_SIZE);
    result.push(chunk);
  }
  return result;
}
In the author's test, a file of roughly 500 MB was divided into 103 slices in about 2.3 seconds (103 slices of 5 MB means the file was between 510 and 515 MB).
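The slice count comes from a ceiling division, and each slice's byte range follows from its index. Here is a small sketch of that arithmetic; sliceRanges is a hypothetical helper name, and the 512 MB size is chosen only as an illustration that happens to yield 103 slices.

```javascript
const CHUNK_SIZE = 1024 * 1024 * 5; // 5 MB, as in the article

// Hypothetical helper: byte ranges [start, end) for every slice of a file.
function sliceRanges(fileSize, chunkSize = CHUNK_SIZE) {
  const count = Math.ceil(fileSize / chunkSize);
  const ranges = [];
  for (let i = 0; i < count; i++) {
    const start = i * chunkSize;
    // The final slice is clamped to the end of the file.
    ranges.push({ index: i, start, end: Math.min(start + chunkSize, fileSize) });
  }
  return ranges;
}

const ranges = sliceRanges(512 * 1024 * 1024);
console.log(ranges.length); // 103
console.log(ranges[102].end - ranges[102].start); // 2097152 (the last slice is only 2 MB)
```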
4. Analyzing Optimization Opportunities
When uploading multi‑gigabyte files, MD5 hashing becomes the bottleneck: the hashes are computed on the main thread, which stays blocked for a long time. The key to optimization is to move that work off the main thread, which Web Workers make possible.
According to MDN, a Web Worker runs scripts in a background thread separate from the main UI thread, allowing heavy computations without freezing the interface.
5. Optimizing with Web Workers
1. Setting Up Workers
First, define the number of worker threads and create them.
// Define thread count
const THREAD_COUNT = 4; // 4 workers
export function cutFile(file) {
  return new Promise((resolve, reject) => {
    const result = [];
    // Total number of slices, and how many each worker handles
    const chunks = Math.ceil(file.size / CHUNK_SIZE);
    const workerChunkCount = Math.ceil(chunks / THREAD_COUNT);
    for (let i = 0; i < THREAD_COUNT; i++) {
      const worker = new Worker('./worker.js', { type: 'module' });
      // This worker handles the slices in [startIndex, endIndex)
      const startIndex = i * workerChunkCount;
      const endIndex = Math.min(startIndex + workerChunkCount, chunks);
      worker.onmessage = e => {
        // Copy the returned chunks into their positions in the overall result
        for (let j = startIndex; j < endIndex; j++) {
          result[j] = e.data[j - startIndex];
        }
        worker.terminate();
        // Resolving once every worker has finished is added in step 3
      };
      // Surface worker failures instead of leaving the promise pending
      worker.onerror = reject;
      worker.postMessage({ file, CHUNK_SIZE, startIndex, endIndex });
    }
  });
}
2. Calculating the Number of Slices per Worker
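Each worker processes at most Math.ceil(chunks / THREAD_COUNT) slices. Worker i starts at i * workerChunkCount and stops before startIndex + workerChunkCount, clamped to chunks so that the last worker does not run past the end of the file.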
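One edge case is worth seeing in numbers: when the slice count does not divide evenly, rounding up can leave the last workers with nothing to do. With 103 slices and 16 workers, each worker takes 7 slices, so the 16th would start at index 105, past the end. The sketch below uses a hypothetical workerRanges helper that drops such empty ranges; it is an illustration of the arithmetic, not the article's code.

```javascript
// Hypothetical helper: split `chunks` slice indices across at most `threadCount` workers.
function workerRanges(chunks, threadCount) {
  const perWorker = Math.ceil(chunks / threadCount);
  const ranges = [];
  for (let i = 0; i < threadCount; i++) {
    const startIndex = i * perWorker;
    if (startIndex >= chunks) break; // nothing left for this worker
    ranges.push({ startIndex, endIndex: Math.min(startIndex + perWorker, chunks) });
  }
  return ranges;
}

console.log(workerRanges(103, 4)); // four ranges: [0, 26), [26, 52), [52, 78), [78, 103)
console.log(workerRanges(103, 16).length); // 15
```

If empty ranges are skipped like this, the completion check has to compare against the number of workers actually started rather than THREAD_COUNT.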
3. Receiving Messages from Workers
let finishCount = 0; // Track completed workers (declared once, outside the worker loop)
worker.onmessage = e => {
  for (let i = startIndex; i < endIndex; i++) {
    result[i] = e.data[i - startIndex];
  }
  worker.terminate();
  finishCount++;
  // Resolve once every worker has reported back
  if (finishCount === THREAD_COUNT) {
    resolve(result);
  }
};
4. Worker Script (worker.js)
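The worker receives the file, chunk size, and slice range, then starts all of its chunks at once and waits for them together with Promise.all. Within a single worker the reads can overlap, while the real parallelism comes from running several workers side by side.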
import { createChunks } from "./createChunks.js";
onmessage = async e => {
  const arr = [];
  const { file, CHUNK_SIZE, startIndex, endIndex } = e.data;
  // Start every slice in this worker's range without awaiting in between
  for (let i = startIndex; i < endIndex; i++) {
    arr.push(createChunks(file, i, CHUNK_SIZE));
  }
  // Promise.all preserves order, so chunks[k] is slice startIndex + k
  const chunks = await Promise.all(arr);
  postMessage(chunks);
};
Running the optimized version reduces processing time from over 2 seconds to about 0.2 seconds, roughly a ten‑fold speedup.
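The pieces of this section can also be assembled differently. The sketch below swaps the article's finishCount counter for one promise per worker combined with Promise.all, which is a different technique from the one shown above. The names runWorker and cutFileWithWorkers, and the createWorker factory parameter, are hypothetical; the factory exists only so the function can be exercised without a real browser Worker.

```javascript
// Hypothetical helper: run one worker to completion and resolve with its reply.
function runWorker(createWorker, message) {
  return new Promise((resolve, reject) => {
    const worker = createWorker();
    worker.onmessage = (e) => {
      worker.terminate();
      resolve(e.data);
    };
    worker.onerror = (err) => {
      worker.terminate();
      reject(err);
    };
    worker.postMessage(message);
  });
}

// Hypothetical variant of cutFile: one promise per worker, joined with Promise.all.
async function cutFileWithWorkers(file, chunkSize, threadCount, createWorker) {
  const chunks = Math.ceil(file.size / chunkSize);
  const perWorker = Math.ceil(chunks / threadCount);
  const jobs = [];
  // Stepping by perWorker starts only as many workers as there are non-empty ranges.
  for (let startIndex = 0; startIndex < chunks; startIndex += perWorker) {
    const endIndex = Math.min(startIndex + perWorker, chunks);
    jobs.push(runWorker(createWorker, { file, CHUNK_SIZE: chunkSize, startIndex, endIndex }));
  }
  // Promise.all keeps job order, so flattening restores slice order.
  return (await Promise.all(jobs)).flat();
}
```

In the browser, the call might look like cutFileWithWorkers(file, CHUNK_SIZE, THREAD_COUNT, () => new Worker('./worker.js', { type: 'module' })). A failing worker then rejects the returned promise instead of leaving it pending.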
6. Obtaining the Number of CPU Threads in JavaScript
We can query navigator.hardwareConcurrency to get the maximum number of logical processors, falling back to 4 if unavailable, and adjust THREAD_COUNT accordingly.
// Get the number of logical CPU threads
const THREAD_COUNT = navigator.hardwareConcurrency || 4;
console.log('CPU threads:', navigator.hardwareConcurrency);
Using the actual hardware concurrency further halves the processing time.
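A slightly more defensive way to choose the worker count is sketched below. The helper name resolveThreadCount and its clamping rule are assumptions, not something the article prescribes: it applies the same fallback of 4 and additionally avoids starting more workers than there are slices.

```javascript
// Hypothetical helper: pick a worker count from the reported core count.
function resolveThreadCount(totalChunks, reportedCores, fallback = 4) {
  const cores = Number.isInteger(reportedCores) && reportedCores > 0 ? reportedCores : fallback;
  // Never start more workers than there are slices, and always at least one.
  return Math.max(1, Math.min(cores, totalChunks));
}

// In a browser this reads navigator.hardwareConcurrency; it stays undefined where unsupported.
const reported = typeof navigator !== 'undefined' ? navigator.hardwareConcurrency : undefined;
console.log(resolveThreadCount(103, reported));
```

For a small file with only three slices, this returns 3 even on a 16-core machine, so no worker is created just to receive an empty range.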
Source code repository: https://gitee.com/tcwty123/large-file-sharding
Rare Earth Juejin Tech Community
