Tagged articles
4 articles
Page 1 of 1
ITPUB
ITPUB
Feb 10, 2024 · Backend Development

How to Warm Up Your Cache to Boost High‑Concurrency System Performance

Cache warming, a technique used in high‑concurrency systems, involves preloading frequently accessed data into memory before traffic spikes to improve hit rates, reduce cold‑start latency, prevent cache breakdowns, and lessen backend load, with various strategies such as startup loading, scheduled jobs, manual triggers, Redis tools, and Caffeine loaders demonstrated through Spring Boot code examples.

BackendCacheCaffeine
0 likes · 10 min read
How to Warm Up Your Cache to Boost High‑Concurrency System Performance
Baobao Algorithm Notes
Baobao Algorithm Notes
Jan 14, 2022 · Artificial Intelligence

BERT Interview Q&A: Decoding CLS, Masks, Complexity, and More

An in‑depth Q&A breaks down core BERT concepts—from the purpose of the [CLS] token and masking strategies to self‑attention complexity, sparse attention tricks, subword handling of OOV words, warm‑up learning rates, GPT’s unidirectional nature, and ALBERT’s parameter sharing—providing concise explanations for each.

BERTMaskingSelf-Attention
0 likes · 7 min read
BERT Interview Q&A: Decoding CLS, Masks, Complexity, and More
iQIYI Technical Product Team
iQIYI Technical Product Team
Nov 27, 2020 · Artificial Intelligence

Optimizing TensorFlow Serving Model Hot‑Update to Eliminate Latency Spikes in CTR Recommendation Systems

By adding model warm‑up files, separating load/unload threads, switching to the Jemalloc allocator, and isolating TensorFlow’s parameter memory from RPC request buffers, iQIYI’s engineers reduced TensorFlow Serving hot‑update latency spikes in high‑throughput CTR recommendation services from over 120 ms to about 2 ms, eliminating jitter.

Model Hot UpdateTensorFlow ServingWarmup
0 likes · 11 min read
Optimizing TensorFlow Serving Model Hot‑Update to Eliminate Latency Spikes in CTR Recommendation Systems