How ByteDance’s Front‑End Team Built High‑Performance Shape Word Clouds
ByteDance’s data platform front‑end team surveyed academic, commercial, and open‑source word‑cloud solutions, identified gaps in geo‑ and shape‑based clouds, and engineered a performant front‑end layout algorithm that generates customizable shape word clouds for diverse business scenarios.
ByteDance’s data platform front‑end team surveyed academic, commercial, and open‑source word‑cloud products, summarizing algorithms from top to bottom to help readers quickly understand the development of word‑cloud algorithms and outline the roadmap for ByteDance’s data platform word‑clouds.
By Orange, from ByteDance Data Platform Front‑End Team
Preface
The previous article introduced the current state of word‑cloud development and the interaction experience of some commercial/open‑source projects. This part focuses on academic algorithm research and commercial product summaries, sharing insights and practices for ByteDance’s data platform word‑cloud development.
Development Direction Discussion
Geo Word Cloud
There is currently no usable geographic word‑cloud generation tool in the industry or open‑source space, leaving a gap.
Potential issues include:
Value: Geographic word clouds take coordinates and tags as input, so they are useful only where a suitable business scenario exists, such as GIS projects.
Algorithm efficiency: The layout computation relies on K‑means clustering, PCA, and additional geographic constraints. A 2016 Python implementation takes about 30 minutes on large datasets, so the algorithm would need to be simplified and optimized.
High input requirements: Sparse or low‑density geographic points produce sparse, unattractive clouds.
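To make the clustering cost mentioned above more concrete, here is a minimal K‑means sketch in TypeScript. It is an illustration under stated assumptions, not the 2016 implementation referenced above: coordinates are treated as planar (x, y) values, centroids are seeded from the first k points, and the PCA step is left out. The names `GeoPoint` and `kMeans` are hypothetical.

```typescript
// Hypothetical type: a tagged location projected onto a plane.
type GeoPoint = { x: number; y: number };

// Minimal K-means sketch. Assumption: centroids are seeded from the first
// k points, which keeps the result deterministic but is not a robust seeding.
function kMeans(
  points: GeoPoint[],
  k: number,
  maxIter = 50
): { centroids: GeoPoint[]; labels: number[] } {
  if (k < 1 || k > points.length) {
    throw new Error("k must be between 1 and points.length");
  }
  let centroids: GeoPoint[] = points.slice(0, k).map((p) => ({ ...p }));
  const labels: number[] = points.map(() => -1);

  for (let iter = 0; iter < maxIter; iter++) {
    // Assignment step: attach each point to its nearest centroid.
    let changed = false;
    points.forEach((p, i) => {
      let best = 0;
      let bestDist = Infinity;
      centroids.forEach((c, j) => {
        const d = (p.x - c.x) ** 2 + (p.y - c.y) ** 2;
        if (d < bestDist) {
          bestDist = d;
          best = j;
        }
      });
      if (labels[i] !== best) {
        labels[i] = best;
        changed = true;
      }
    });
    if (!changed) break; // converged: no point switched cluster

    // Update step: move each centroid to the mean of its members.
    centroids = centroids.map((c, j) => {
      const members = points.filter((_, i) => labels[i] === j);
      if (members.length === 0) return c; // keep an empty cluster in place
      const sx = members.reduce((s, m) => s + m.x, 0);
      const sy = members.reduce((s, m) => s + m.y, 0);
      return { x: sx / members.length, y: sy / members.length };
    });
  }
  return { centroids, labels };
}
```

Each iteration costs on the order of points × k distance computations, which is one reason a large geographic dataset becomes slow without simplification.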
Shape Word Cloud
The open‑source ecosystem does not offer an effective, easy‑to‑use shape word‑cloud library.
We can implement a simplified version of the Shape Wordle algorithm:
Discard the computationally heavy distance‑field‑based shape‑aware spiral algorithm and use a simple spiral algorithm.
Preserve pure front‑end graphic segmentation, applying the spiral algorithm independently to each segmented shape to improve aesthetics.
Retain the secondary fill algorithm to enhance shape perception after the core word layout.
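The simple spiral step in the list above can be sketched as follows. This is a minimal TypeScript illustration under stated assumptions, not the team's production algorithm: the shape is given as a boolean grid (`true` means inside the shape), each word is a pre‑measured box in grid cells, and words are passed in descending weight order. The names `WordBox`, `fitsInShape`, and `layoutInShape` are hypothetical. For a segmented shape, the function would be called once per segment with that segment's own mask and start point; segmentation and the secondary fill are out of scope here.

```typescript
// Hypothetical types: a word measured in grid cells, and its placed position.
type WordBox = { text: string; w: number; h: number };
type PlacedWord = WordBox & { x: number; y: number }; // x, y = top-left cell

// True if the box lies fully inside the shape and touches no occupied cell.
function fitsInShape(
  mask: boolean[][], occupied: boolean[][],
  x: number, y: number, w: number, h: number
): boolean {
  const rows = mask.length;
  const cols = mask[0]?.length ?? 0;
  if (x < 0 || y < 0 || y + h > rows || x + w > cols) return false;
  for (let r = y; r < y + h; r++) {
    for (let c = x; c < x + w; c++) {
      if (!mask[r][c] || occupied[r][c]) return false;
    }
  }
  return true;
}

// Walk an Archimedean spiral outward from (cx, cy) for each word and take
// the first position where the box fits. Words that never fit are skipped.
function layoutInShape(
  mask: boolean[][], words: WordBox[], cx: number, cy: number
): PlacedWord[] {
  const rows = mask.length;
  const cols = mask[0]?.length ?? 0;
  const occupied = mask.map((row) => row.map(() => false));
  const maxRadius = Math.hypot(rows, cols);
  const placed: PlacedWord[] = [];

  for (const word of words) {
    for (let t = 0; ; t += 0.1) {
      const radius = 0.5 * t; // spiral: radius grows linearly with angle
      if (radius > maxRadius) break; // left the grid: give up on this word
      const x = Math.round(cx + radius * Math.cos(t) - word.w / 2);
      const y = Math.round(cy + radius * Math.sin(t) - word.h / 2);
      if (fitsInShape(mask, occupied, x, y, word.w, word.h)) {
        for (let r = y; r < y + word.h; r++) {
          for (let c = x; c < x + word.w; c++) occupied[r][c] = true;
        }
        placed.push({ ...word, x, y });
        break;
      }
    }
  }
  return placed;
}
```

The fixed angular step keeps the sketch short; at large radii it can skip candidate cells, so a real implementation would likely shrink the step as the radius grows.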
Drawbacks:
The secondary fill algorithm, while improving visual quality, introduces efficiency problems, especially on low‑performance mobile devices.
Finding a front‑end graphic segmentation library that balances efficiency and effect is challenging.
Word Cloud Creation Tool
Neither open‑source nor commercial tools offer a convenient, visually appealing shape word‑cloud editor. Existing tools such as WordArt and micro‑word‑cloud only support fixing a few words in place, which makes complex editing difficult.
Potential problems:
Pure front‑end tokenization and part‑of‑speech restoration: English text requires POS tagging, and Chinese text requires effective word segmentation. Both are hard to do well in a pure front‑end environment.
Other algorithmic issues are similar to those encountered in Shape Word Cloud.
Image Cloud Direction
Perfectly stitched image clouds generally require complex graphics calculations. Recent academic work (e.g., “Pyramid of Arclength Descriptor for Generating Collage of Shapes”) can achieve near‑designer quality but needs GPU acceleration and over 60 minutes to render, which is impractical for pure front‑end libraries.
Proposed initial directions:
Treat images as simple rectangles or polygons and apply a spiral algorithm directly. This yields acceptable speed but may lack compactness.
Introduce force‑directed layout: start with a spiral layout, then apply force‑directed adjustments to tighten the cloud.
Both methods can provide a basic image cloud with reasonable performance.
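The two directions above can be combined in a short TypeScript sketch. It rests on several assumptions that are not from the original work: every image is reduced to an axis‑aligned rectangle, the layout is centered on the origin, and the "force‑directed" step is simplified to a single attraction force that pulls each rectangle toward the center as long as it does not collide. The names `Rect`, `spiralPlace`, and `tighten` are hypothetical.

```typescript
// Hypothetical type: an image reduced to its bounding box (x, y = top-left).
type Rect = { x: number; y: number; w: number; h: number };

// Strict overlap test: rectangles that only touch at an edge do not overlap.
function overlaps(a: Rect, b: Rect): boolean {
  return a.x < b.x + b.w && b.x < a.x + a.w &&
         a.y < b.y + b.h && b.y < a.y + a.h;
}

// Step 1: place each rectangle at the first free spot on a spiral
// that starts at the origin.
function spiralPlace(sizes: { w: number; h: number }[]): Rect[] {
  const placed: Rect[] = [];
  for (const s of sizes) {
    for (let t = 0; ; t += 0.2) {
      const radius = 2 * t;
      const cand: Rect = {
        x: radius * Math.cos(t) - s.w / 2,
        y: radius * Math.sin(t) - s.h / 2,
        w: s.w,
        h: s.h,
      };
      if (!placed.some((p) => overlaps(p, cand))) {
        placed.push(cand);
        break; // the plane is unbounded, so a free spot always exists
      }
    }
  }
  return placed;
}

// Step 2: simplified attraction-only pass. Each rectangle moves one step
// toward the origin per pass and stays put if the move would collide.
function tighten(rects: Rect[], step = 1, maxPasses = 100): Rect[] {
  const out = rects.map((r) => ({ ...r }));
  for (let pass = 0; pass < maxPasses; pass++) {
    let moved = false;
    for (let i = 0; i < out.length; i++) {
      const r = out[i];
      const cx = r.x + r.w / 2;
      const cy = r.y + r.h / 2;
      const dist = Math.hypot(cx, cy);
      if (dist < step) continue; // already at the center
      const cand: Rect = {
        ...r,
        x: r.x - (cx / dist) * step,
        y: r.y - (cy / dist) * step,
      };
      if (!out.some((o, j) => j !== i && overlaps(o, cand))) {
        out[i] = cand;
        moved = true;
      }
    }
    if (!moved) break; // layout is stable
  }
  return out;
}
```

A fuller force‑directed layout would also add repulsion and sliding along obstacles; the attraction‑only pass shown here just illustrates how a post‑processing step can tighten a spiral layout without introducing overlaps.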
Data Platform Shape Word Cloud Practice
Based on the research, we explored applying shape word clouds within ByteDance’s data platform. Considering front‑end performance constraints, we balanced algorithm complexity and visual effect, designing and implementing a self‑developed shape word‑cloud layout algorithm.
This algorithm is already deployed in internal ByteDance data platform products. For example, in the Douyin Movie Index scenario, word clouds visualize comment data for each movie, conveying public opinion.
The basic cloud shows word frequency but lacks movie‑specific visual features. By using a silhouette of a related figure as the shape, we can highlight movie characteristics.
Our shape word cloud still conveys word frequency accurately while automatically filling the target shape in a visually appealing way. It suits a range of scenarios, including complex outlines and detailed silhouettes.
We provide a rich API that gives internal users room to customize the result. Thanks to automation, users only need to supply three properties (data, key, shape image) to generate a high‑quality shape word cloud.
Automation also reduces configuration burden in scenarios like the movie index, where the system can automatically select suitable parameters.
We have refined animation effects, including entrance translations and gradient transitions that adjust font size and opacity.
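A transition that adjusts font size and opacity together can be sketched as a simple interpolation. This TypeScript snippet is only an illustration: the easing curve and the names `WordStyle`, `easeOutCubic`, and `interpolateStyle` are assumptions, not the platform's actual animation code.

```typescript
// Hypothetical style state for one word at one moment of the animation.
type WordStyle = { fontSize: number; opacity: number };

// Ease-out cubic: fast at the start, slowing toward the end. t is in [0, 1].
function easeOutCubic(t: number): number {
  return 1 - Math.pow(1 - t, 3);
}

// Blend two styles. progress is clamped to [0, 1] before easing.
function interpolateStyle(
  from: WordStyle, to: WordStyle, progress: number
): WordStyle {
  const t = easeOutCubic(Math.min(1, Math.max(0, progress)));
  return {
    fontSize: from.fontSize + (to.fontSize - from.fontSize) * t,
    opacity: from.opacity + (to.opacity - from.opacity) * t,
  };
}
```

In a browser, `interpolateStyle` would typically be called once per animation frame, with `progress` derived from the elapsed time divided by the transition duration.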
Future work will continue to optimize shape word‑cloud algorithms and explore practical deployments of image‑cloud techniques within acceptable performance limits.
ByteDance Data Platform
The ByteDance Data Platform team empowers all ByteDance business lines by lowering data‑application barriers, aiming to build data‑driven intelligent enterprises, enable digital transformation across industries, and create greater social value. Internally it supports most ByteDance units; externally it delivers data‑intelligence products under the Volcano Engine brand to enterprise customers.