Xiaohongshu Search Engine Innovations Presented at SIGIR-AP 2023
At SIGIR-AP 2023 in Beijing, Xiaohongshu's technical team presented four key innovations for its platform of 260 million monthly active users: advanced user-intent analysis via multi-stage LLM pre-training, multimodal vector retrieval, generative inverted-index enhancements, and a three-stage relevance-ranking pipeline with knowledge distillation, all aimed at high multi-intent, long-tail, and multimodal search challenges.
From November 26-28, 2023, the ACM-sponsored SIGIR-AP conference was held in Beijing, jointly organized by Tsinghua University and the University of Melbourne. As the first regional Information Retrieval (IR) conference held in China and a CCF A-class event, it drew more than 100 researchers from academia and industry to discuss frontier IR technologies and trends.
Prominent speakers included Maarten de Rijke (member of the Royal Netherlands Academy of Arts and Sciences), Gerard Salton Award laureate ChengXiang Zhai, Charles Clarke, Tetsuya Sakai, and many other distinguished scholars from institutions such as Tsinghua University, Renmin University of China, the Chinese Academy of Sciences, Waseda University, and NUS.
Representing industry, the Xiaohongshu technical team delivered a talk titled “Xiaohongshu Search Engine Innovation Practice.” With over 260 million monthly active users and nearly 300 million daily search queries, Xiaohongshu’s search engine aims to provide “ordinary‑person perspective, experienced insight” and “useful intelligence.” The presentation highlighted challenges such as high multi‑intent query ratios, long‑tail queries lacking statistical signals, and the need to understand multimodal content (text, images, video, live streams).
To address these challenges, the team described four major technical advances:
1. User Intent Analysis – Short‑text understanding is enhanced through multi‑stage continuous pre‑training of large language models, including unsupervised domain pre‑training, weakly supervised log pre‑training, and fine‑tuning on manually annotated data. For long‑tail queries, knowledge‑base and entity‑linking techniques enrich the model; for head queries, log mining and system simulation provide posterior data.
2. Vector Retrieval – Multimodal representation learning is applied to both long‑tail and head queries. Cross‑modal alignment aligns note images and text with query embeddings; multimodal fusion incorporates Masked Language Modeling (MLM) and Masked Image Modeling (MIM); hard negative samples are constructed by masking, rewriting, and replacing query‑image pairs.
3. Inverted Index – Three practices improve traditional recall: (a) generating queries for low‑exposure notes using generative models; (b) converting video content to transcribed text for indexing; (c) extracting chapter‑level tags from notes, filtering irrelevant hashtags, and using weak‑supervised training to enhance multimodal understanding.
4. Relevance Ranking – A three‑stage training pipeline (unsupervised pre‑training on internal text, continuous supervised training on search logs, fine‑tuning on annotated data) is employed. Model efficiency is boosted via knowledge distillation (48‑layer BERT → 12‑layer/4‑layer), Faster Transformer, dynamic padding, and dynamic summarization. Multimodal relevance is further modeled with contrastive losses on queries vs. core words and notes vs. similar notes.
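The head vs. long-tail split in step 1 can be sketched as a simple routing decision: frequent queries lean on posterior log statistics, while rare queries fall back to knowledge-base entity linking. This is a minimal, hypothetical illustration; the query log, threshold, and knowledge base here are toy stand-ins, not Xiaohongshu's actual data or APIs.

```python
# Hypothetical sketch of head vs. long-tail intent routing.
# QUERY_LOG, HEAD_THRESHOLD, and KNOWLEDGE_BASE are all assumed toy values.
from collections import Counter

QUERY_LOG = ["lipstick swatch", "lipstick swatch", "lipstick swatch",
             "beijing brunch", "hiking trail gansu"]
FREQ = Counter(QUERY_LOG)
HEAD_THRESHOLD = 2  # assumed cutoff; a real system tunes this on traffic stats

KNOWLEDGE_BASE = {"gansu": "province", "beijing": "city"}  # toy entity store

def analyze_intent(query: str) -> dict:
    """Head queries use posterior log statistics; long-tail queries
    fall back to knowledge-base entity linking."""
    if FREQ[query] >= HEAD_THRESHOLD:
        return {"query": query, "strategy": "log_mining",
                "support": FREQ[query]}
    entities = [(tok, KNOWLEDGE_BASE[tok]) for tok in query.split()
                if tok in KNOWLEDGE_BASE]
    return {"query": query, "strategy": "entity_linking",
            "entities": entities}
```

In production the "log mining" branch would consume aggregated click and conversion signals rather than a raw counter, but the control flow is the same.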
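The hard-negative training described in step 2 typically optimizes a contrastive objective such as InfoNCE: the query embedding is pulled toward its matching note embedding and pushed away from the constructed negatives. The sketch below assumes that setup; it is a generic NumPy illustration, not the team's published loss.

```python
import numpy as np

def info_nce(query, pos, negs, temperature=0.07):
    """InfoNCE loss for one query: pull the positive note embedding
    close, push (hard) negatives away. Vectors are L2-normalized."""
    def norm(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    q, p, n = norm(query), norm(pos), norm(negs)
    logits = np.concatenate([[q @ p], n @ q]) / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # positive sits at index 0
```

Negatives built by masking or rewriting the query-image pair land close to the positive in embedding space, which raises this loss and forces the encoder to learn finer distinctions than random negatives would.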
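Practice (a) in step 3 amounts to enriching a note's posting lists with terms from model-generated queries, so low-exposure notes become reachable for searches their own text never mentions. Here is a minimal sketch under that assumption; the note text, generated queries, and whitespace tokenizer are illustrative stand-ins.

```python
# Minimal inverted-index sketch: low-exposure notes additionally get
# terms from model-generated queries (precomputed here as a toy dict).
from collections import defaultdict

def build_index(notes, generated_queries):
    """Map each term to the set of note ids containing it, including
    terms drawn from generated queries for the note."""
    index = defaultdict(set)
    for note_id, text in notes.items():
        for term in text.lower().split():
            index[term].add(note_id)
        for gq in generated_queries.get(note_id, []):
            for term in gq.lower().split():
                index[term].add(note_id)
    return index

notes = {"n1": "homemade hotpot recipe"}
gen = {"n1": ["sichuan dinner ideas"]}   # queries a generative model might emit
index = build_index(notes, gen)
```

The same mechanism covers practice (b): transcribed video text is just another text field whose terms are folded into the note's posting lists.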
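The 48-layer-to-12/4-layer distillation in step 4 is conventionally trained with a KL-divergence loss between temperature-softened teacher and student distributions. The following is a generic sketch of that standard loss, not the team's exact training code.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax; higher T flattens the distribution."""
    z = np.asarray(z, dtype=float) / T
    z -= z.max()                  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    as in standard knowledge distillation."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * np.log(p / q)))
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge; in practice it is mixed with the hard-label relevance loss when fine-tuning the smaller model.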
The conference also featured a keynote by Maarten de Rijke on “Simulation for Recommendations in Dynamic and Interactive Environments,” discussing how simulators can help evaluate recommendation systems under changing user preferences, bias, and noise.
SIGIR‑AP 2023 concluded successfully, fostering academic‑industry exchange and setting the stage for future research and innovation in information retrieval and AI‑driven search technologies.
Xiaohongshu Tech REDtech
The official account of the Xiaohongshu technology team, sharing technical innovations and engineering insights.