Tagged articles

NLP

530 articles · Page 3 of 6

Sep 8, 2022 · Artificial Intelligence

Concept Tag Mining for Recommendation Systems: Methods, Challenges, and Solutions

This article presents a comprehensive overview of concept tag mining for recommendation systems, describing unsupervised pattern‑matching, semi‑supervised AutoPhrase, and supervised NER approaches, analyzing their advantages and drawbacks, and offering practical solutions to tag duplication and quality issues.

NER · NLP · Semi-supervised Learning

0 likes · 11 min read

Youzan Coder

Sep 5, 2022 · Artificial Intelligence

Inside Youzan’s Query Parser: Architecture, Plugins, and Real‑World Impact

This article explains the role of Youzan’s Query Parser (QP) in search, walks through its overall and layered architecture, details each algorithmic plugin—from preprocessing to synonym handling—and shows concrete code examples and results that improve search relevance across multiple retail scenarios.

NLP · Search Engine · Semantic Segmentation

0 likes · 12 min read

Snowball Engineer Team

Sep 1, 2022 · Databases

Snowball Knowledge Graph Construction, Applications, and Industrial Deployment

The article details Snowball's large‑scale financial knowledge graph, covering its background challenges, two‑layer ontology and data design, data sourcing and pipeline, graph database selection, search and NLP services, domain‑specific pre‑training models, and future industrial considerations.

NLP · Search · financial data

0 likes · 18 min read

Programmer DD

Aug 30, 2022 · Artificial Intelligence

How to Build a Custom HanLP Analyzer Plugin for Elasticsearch with Nginx

This guide walks through setting up a Java GraalVM 17 environment, installing Nginx to serve static dictionary files, configuring a HanLP‑based Elasticsearch analyzer plugin, packaging and deploying it, and testing the analyzer with JUnit5 and curl commands.

Elasticsearch · HanLP · Java

0 likes · 14 min read

DataFunSummit

Jul 27, 2022 · Artificial Intelligence

DataFun 2022 Natural Language Processing Summit – Leading Experts Discuss Large‑Scale Language Models, Multimodal Understanding, Dialogue Systems and AI Applications

The DataFun 2022 NLP Summit, held on July 30, brings together top researchers and industry leaders from Alibaba, Baidu, Microsoft, Amazon, and more to present the latest advances in large‑scale pre‑training, multimodal perception, information extraction, dialogue interaction, machine translation, and practical AI deployments, with live streaming and free registration via QR code.

AI · Dialogue Systems · Large Language Models

0 likes · 44 min read

NetEase Smart Enterprise Tech+

Jul 19, 2022 · Artificial Intelligence

How NER Dominated NLPCC 2022: Techniques Behind the Winning Model

This article reviews the recent NLPCC 2022 NER competition, explains the evolution of named entity recognition, details the five major modeling paradigms, and describes the winning team’s relation‑classification approach, data‑augmentation strategy, experimental results, and its practical deployment in NetEase Cloud Commerce services.

Artificial Intelligence · Deep Learning · NLP

0 likes · 13 min read

Bitu Technology

Jul 8, 2022 · Artificial Intelligence

Applying NLP and Machine Learning to Classify Tubi User Feedback

This article explains how Tubi leverages natural‑language processing, sentence embeddings (USE and BERT), and LightGBM models to automatically categorize large volumes of Net Promoter Score comments and customer‑support tickets, enabling data‑driven product decisions and workflow automation.

LightGBM · NLP · Tubi

0 likes · 11 min read

DataFunTalk

Jul 8, 2022 · Artificial Intelligence

Civil Aviation QA Competition (CCL2022‑DQAB): Task Description, Data, Evaluation Metrics, and Prizes

The CCL2022‑DQAB competition, organized by Beihang University and AVIC Mobile Technology, invites participants to develop reading‑comprehension models for extracting accurate question‑answer pairs from civil aviation texts, offering detailed task definitions, evaluation criteria, dataset statistics, a prize structure, and a competition schedule.

AI · Civil Aviation · Evaluation Metrics

0 likes · 5 min read

AntTech

Jul 7, 2022 · Artificial Intelligence

Ant Group Insurance Technology Wins First Place in Fine‑Grained Dialogue Social Bias Detection at NLPCC 2022

Ant Group's insurance technology team secured the top spot in the fine‑grained dialogue social bias detection task at the 11th CCF NLPCC conference, showcasing their AI‑driven bias‑mitigation methods, a proprietary pre‑trained model AntInsBert, and a claim‑automation system that boosts insurance service fairness and efficiency.

AntInsBert · Bias Detection · NLP

0 likes · 3 min read

政采云技术

Jul 5, 2022 · Artificial Intelligence

Overview of Natural Language Processing Techniques and Their Evolution

This article provides a comprehensive overview of natural language processing, covering its definition, historical development from one‑hot encoding to modern models such as word2vec, ELMo, GPT, and BERT, and discusses the advantages, limitations, and key concepts of each technique.

Artificial Intelligence · Language Models · NLP

0 likes · 23 min read

Airbnb Technology Team

Jul 4, 2022 · Artificial Intelligence

Intelligent Customer Service Product: Overview, History, Architecture, and Future Trends

The article outlines the evolution, architecture, and core value of intelligent customer service systems—detailing their GUI‑based chatbot interface, triage and dialogue modes, knowledge‑base management, and operator benefits—while highlighting future trends such as richer human‑like interactions, 5G‑enabled channels, and continuous feedback‑driven improvement.

AI · Automation · Chatbot

0 likes · 12 min read

DataFunTalk

Jun 30, 2022 · Artificial Intelligence

OBERT: A Billion‑Parameter Pretrained Language Model for Large‑Scale NLP Applications

The OPPO XiaoBu team introduced OBERT, a series of 100M‑, 300M‑, and 1B‑parameter pretrained language models that leverage massive TB‑scale corpora, multi‑granular masking, retrieval‑augmented training, and distributed acceleration to achieve state‑of‑the‑art results on CLUE and KgCLUE benchmarks while enabling efficient industrial deployment.

Knowledge augmentation · Large Language Model · NLP

0 likes · 12 min read

DataFunSummit

Jun 26, 2022 · Artificial Intelligence

Applying Knowledge Graphs to Recruitment: Construction, Tag Mining, and Recommendation at 58.com

A senior NLP engineer at 58.com explains how a recruitment knowledge graph is built through multi‑dimensional tag systems, tag mining, and relation extraction, and how it enhances bidirectional matching and recommendation efficiency, addressing challenges such as weak expression, cold start, and supply‑demand imbalance.

AI · NLP · data augmentation

0 likes · 17 min read

AntTech

Jun 21, 2022 · Artificial Intelligence

FinQA Competition Winning Model by Ant Risk AI: Architecture, Dataset, and Experimental Results

Ant Risk AI’s team won the FinQA competition with a model that combines a retriever and a program generator, supported by detailed dataset analysis, a domain-specific language design, and extensive experiments demonstrating superior execution and program accuracy on financial numerical reasoning tasks.

Dataset Analysis · FinQA · NLP

0 likes · 16 min read

JD Retail Technology

Jun 16, 2022 · Artificial Intelligence

2022 Global AI Technology Innovation Competition – Algorithm Challenge: Connecting AI with E‑commerce

The 2022 Global AI Technology Innovation Competition – Algorithm Challenge, co‑hosted by JD Retail and academic partners, brought together 12 finalist teams from over 3,000 entrants to tackle e‑commerce‑focused AI problems such as multimodal image‑text matching and product‑title entity recognition, highlighting real‑world business impact and fostering talent exchange.

AI competition · E‑Commerce · JD Retail

0 likes · 8 min read

Ctrip Technology

Jun 16, 2022 · Artificial Intelligence

Entity Linking System for Travel Knowledge Graph at Ctrip AI R&D

The article presents Ctrip's travel AI team's end‑to‑end entity linking solution built on a large‑scale tourism knowledge graph, detailing its background, technical architecture, core modules (mention detection, candidate generation, and disambiguation using BERT and prefix‑tree techniques), and real‑world applications such as search, intelligent customer service, and POI data maintenance.

BERT · NLP · entity linking

0 likes · 18 min read

Baidu Geek Talk

Jun 15, 2022 · Artificial Intelligence

CCL2022 Video Highlight Extraction Challenge Overview

The article describes the CCL2022 Video Highlight Extraction Challenge, a competition at the 21st China Conference on Computational Linguistics organized by Baidu, inviting participants worldwide to generate timestamped concise summaries of video segments, with registration details, eligibility, task description, example inputs/outputs, and evaluation metrics based on timing accuracy and ROUGE-L.

CCL2022 · Evaluation Metrics · NLP

0 likes · 6 min read

DataFunTalk

Jun 13, 2022 · Artificial Intelligence

JD Technology Financial Causal Knowledge Graph: Construction, Causal Extraction, and Alignment Techniques

This article presents JD Technology's recent research on financial causal knowledge graphs, detailing the overall knowledge‑graph architecture, data layers, causal relation extraction, argument extraction, and graph‑alignment methods, and discusses their applications in finance, intelligent research reports, and industry‑leader recommendation.

Graph Alignment · NLP · Semantic Role Labeling

0 likes · 18 min read

Meituan Technology Team

Jun 9, 2022 · Artificial Intelligence

FSL++: A Few-Shot Learning Model for Chinese Language Understanding that Tops the FewCLUE Benchmark

FSL++—a RoBERTa‑large‑based few‑shot model enhanced with domain‑adaptive pre‑training, prompt learning, diverse embedding‑level augmentations, and ensemble self‑training—topped the Chinese FewCLUE benchmark, beating human accuracy on news and scientific classification tasks and delivering measurable gains across multiple Meituan product scenarios.

Chinese language understanding · Ensemble · NLP

0 likes · 23 min read

Alimama Tech

Jun 8, 2022 · Artificial Intelligence

CTR-Driven Advertising Text Generation and Bundle Creative Optimization (CREATER & CONNA)

Alibaba’s advertising team introduces CREATER, a CTR‑driven text generator that leverages user reviews, aspect control codes, and contrastive fine‑tuning, and CONNA, a non‑autoregressive bundle creator that predicts heterogeneous ad elements with set‑based loss, both delivering substantial online CTR gains and CPC reductions through dynamic creative optimization.

CTR · Dynamic creative optimization · NLP

0 likes · 25 min read

Python Programming Learning Circle

Jun 8, 2022 · Artificial Intelligence

Leveraging PaddleNLP UIE for Zero‑Shot Logistics Parcel Information Extraction

This article explains how PaddleNLP's Universal Information Extraction (UIE) model can dramatically reduce labeling effort and improve accuracy for logistics parcel data extraction, showcasing a five‑sample experiment that boosts F1 by 18 points to 93% and providing a zero‑shot Python example.

NLP · PaddleNLP · Python

0 likes · 5 min read

Meituan Technology Team

May 26, 2022 · Information Security

Building and Deploying Software Composition Analysis (SCA) for Enterprise Security

The article analyzes the rising threat of open‑source components, explains Software Composition Analysis (SCA) and SBOM generation, outlines the three‑stage process for building an in‑house SCA capability, discusses practical challenges such as data quality and integration, and looks ahead to future standards and open‑source tools.

DevSecOps · NLP · Open-source

0 likes · 37 min read

Tencent Cloud Developer

May 19, 2022 · Industry Insights

What Does the Future Hold for AI? Insights from Industry Leaders

In a TVP forum hosted by Li Kaifu and Shen Chunhua, experts trace AI’s 70‑year journey, discuss the origins of the book “AI Future in Progress,” analyze investment stages, AI‑cloud synergy, NLP breakthroughs, medical applications, societal impacts, data privacy, and the challenges facing traditional enterprises.

AI · AI cloud · Healthcare AI

0 likes · 23 min read

Code DAO

May 19, 2022 · Artificial Intelligence

Semi‑Supervised Training Methods for Transformers

This article explains an end‑to‑end semi‑supervised training pipeline for Transformer‑based NLP models, detailing the unsupervised language‑model pre‑training, supervised fine‑tuning, and the internal architecture of embeddings, encoder layers, and downstream tasks such as text classification and NER.

BERT · Masked Language Model · NLP

0 likes · 9 min read

Sohu Tech Products

May 18, 2022 · Artificial Intelligence

Design and Implementation of the Internal Intelligent QA Chatbot “Jarvis”

This article describes the motivation, micro‑service architecture, code implementation, V1.0 browser‑based NLP prototype, V2.0 AI‑enhanced version with BM25 and BERT, integration with ChatUI, DingTalk bot, command‑based automation, and future plans for the internal intelligent QA chatbot named Jarvis.

AI · Automation · Chatbot

0 likes · 18 min read

DataFunTalk

May 7, 2022 · Artificial Intelligence

Intelligent Recommendation Selling Point Generation: Architecture, Core AI Techniques, Model Development, and Product Impact

This article explains how JD's intelligent recommendation selling point system leverages NLP, BERT, Transformer and pointer‑generator models to automatically create short, personalized product highlights, describing the technical background, system architecture, model training pipeline, online/offline monitoring, and the resulting business benefits.

BERT · E‑Commerce · NLP

0 likes · 13 min read

DataFunTalk

May 5, 2022 · Artificial Intelligence

NLP Evolution: Symbolic Deep Parsing vs Neural Pre‑trained Models, Low‑Code Trends, and Semi‑Automated Applications

The article reviews the history and current state of NLP, compares symbolic deep‑parsing and neural pre‑trained approaches, discusses the knowledge‑bottleneck and low‑code trend, and illustrates semi‑automated, low‑code NLP deployment in the financial domain while pondering future integration of symbolic and neural methods.

Knowledge Engineering · NLP · Semi-Automated

0 likes · 23 min read

DataFunTalk

May 1, 2022 · Artificial Intelligence

Graph Deep Learning for Natural Language Processing: Methods, Models, and the Graph4NLP Library

This talk introduces graph deep learning techniques for natural language processing, covering the motivation for graph representations, traditional graph-based NLP methods, fundamentals of graph neural networks, static and dynamic graph construction, representation learning, and showcases the open‑source Graph4NLP Python library with example applications.

Graph Neural Networks · Graph Representation · Graph4NLP

0 likes · 16 min read

Tencent Tech

Apr 29, 2022 · Artificial Intelligence

Tencent’s Hunyuan AI Model Tops CLUE Leaderboard with Record Score

Tencent’s Hunyuan AI large model shattered records by scoring 80.888 to claim first place on the CLUE benchmark, showcasing its advanced natural language processing, multimodal abilities, curriculum‑learning training approach, and real‑world deployments in WeChat Search and advertising.

AI · CLUE · Hunyuan

0 likes · 3 min read

Code DAO

Apr 18, 2022 · Artificial Intelligence

Transformer‑Based Denoising AutoEncoder (TSDAE) for Job Description Embeddings (Job2Vec)

This article explains how TSDAE, a transformer‑based denoising auto‑encoder, converts noisy job description sentences into robust vector embeddings, details its training process, loss function, dataset preparation, and demonstrates using FAISS for similarity search on the resulting Job2Vec representations.

Autoencoder · FAISS · NLP

0 likes · 4 min read

Zuoyebang Tech Team

Apr 15, 2022 · Artificial Intelligence

Zuoyebang’s NLP Platforms: Boosting Online Education with AI

In this interview, Zuoyebang’s NLP lead explains how the company built self‑developed platforms like IQC and FTP to automate text quality inspection and intelligent labeling, outlines their architecture, shares practical deep‑learning applications such as translation and grammar correction, and discusses future research directions in large‑scale multi‑label classification, few‑shot learning, and multimodal models.

AI Platforms · NLP · machine learning

0 likes · 11 min read

TAL Education Technology

Apr 14, 2022 · Artificial Intelligence

Intelligent Call Recording Quality Inspection Using Dual‑Mode Detection

This article proposes a dual‑mode detection solution for call‑recording quality inspection that combines rule‑based semantic similarity matching with BERT‑based sentence segmentation and RoBERTa multi‑label classification to achieve high accuracy, fast task adaptation, and strong generalization for customer‑service scenarios.

BERT · NLP · RoBERTa

0 likes · 7 min read

政采云技术

Apr 12, 2022 · Artificial Intelligence

Design and Implementation of the Internal Intelligent QA Chatbot “Jarvis”

This article describes the end‑to‑end design, architecture, code implementation, and deployment steps for an internal intelligent QA chatbot named “Jarvis”, covering its V1.0 browser‑based prototype, V2.0 AI‑enhanced version, DingTalk integration, automation features, and future roadmap.

AI · Automation · Chatbot

0 likes · 19 min read

DaTaobao Tech

Apr 12, 2022 · Artificial Intelligence

ArcCSE: Angular Margin Contrastive Learning for Self‑Supervised Text Representation

ArcCSE introduces an angular‑margin contrastive loss and both pairwise (dropout‑augmented) and triple‑wise (span‑masked) relationship modeling to self‑supervise text embeddings, yielding tighter decision boundaries, higher alignment and uniformity, and superior performance on unsupervised STS, SentEval, and Alibaba’s retrieval and recommendation systems.

NLP · angular margin · contrastive learning

0 likes · 8 min read

Code DAO

Apr 10, 2022 · Artificial Intelligence

A Comprehensive Overview of Relation Extraction Techniques

This article surveys relation extraction, defining the task, categorizing its five main forms, and detailing key approaches such as entity position encoding, dependency‑tree methods like shortest dependency path and BRCNN, as well as distant supervision with multi‑instance learning and selective attention.

NLP · dependency parsing · distant supervision

0 likes · 12 min read

Xueersi Online School Tech Team

Apr 2, 2022 · Artificial Intelligence

Design and Implementation of the Xiaosi Intelligent Customer Service Bot for Internal IM

This article details the design, architecture, key technologies, and deployment of Xiaosi, an AI‑powered intelligent customer service chatbot built on the internal IM platform, highlighting its problem‑diagnosis accuracy, manpower savings, adapter mechanism, high‑availability backend, and practical use cases.

AI Chatbot · Django · NGINX

0 likes · 13 min read

Youku Technology

Apr 2, 2022 · Artificial Intelligence

Constrained Sequence-to-Tree Generation for Hierarchical Text Classification

At SIGIR 2022, the authors present a constrained Seq2Tree model that transforms hierarchical label taxonomies into preorder sequences and applies dynamic‑dictionary decoding to ensure label consistency, achieving superior hierarchical text classification performance on benchmark datasets and real‑world deployment within Alibaba Entertainment’s AI Brain.

Artificial Intelligence · Encoder-Decoder · Hierarchical Text Classification

0 likes · 5 min read

Yuewen Technology

Apr 1, 2022 · Artificial Intelligence

Detecting Emerging Terms in Web Novels: PMI, Entropy, and TF‑IDF Methods

This article explores how to automatically discover new words in Chinese web novels by combining n‑gram statistics, pointwise mutual information, information entropy, and TF‑IDF filtering, presenting a practical, unsupervised pipeline that improves tokenization and search recall without manual labeling.

Chinese text mining · NLP · PMI

0 likes · 14 min read

DaTaobao Tech

Mar 31, 2022 · Artificial Intelligence

Intelligent Copy Generation for Taobao Push: Design, Implementation, and Evaluation

The 2021 Taobao Push project introduced an AI‑driven copy‑generation platform that combines template extraction and fine‑tuned UniLM models with diverse beam search, producing varied, high‑quality push messages, cutting manual costs, and delivering a 10% click‑through lift and higher material adoption.

AI · Copy Generation · NLP

0 likes · 18 min read

JD.com Experience Design Center

Mar 25, 2022 · Product Management

Unlocking User Insights: Leveraging Online Reviews with NLP for Product Research

This article explains how product researchers can tap into massive user comment data using NLP tools to identify pain points, sentiment, and opportunities, offering practical scenarios and tool recommendations for e‑commerce, offline retail, and app platforms.

NLP · Product Management · Sentiment Analysis

0 likes · 8 min read

JD Cloud Developers

Mar 21, 2022 · Artificial Intelligence

How JD’s AI Generates Multimodal Product Summaries to Boost E‑Commerce

The article explains how rapid internet growth created information overload, leading to concise summary services, and how recent AI advances—especially large language models like GPT‑3—enable platforms such as JD.com to automatically generate high‑quality, multimodal product copy that drives sales and supports diverse creative tasks.

AI · NLP · Text Generation

0 likes · 4 min read

Meituan Technology Team

Mar 17, 2022 · Artificial Intelligence

Tsinghua University & Meituan Digital Life Joint Research Institute Academic Salon: Large Model Technologies and Challenges

The Tsinghua‑Meituan Digital Life Joint Research Institute’s Academic Salon on March 23 featured Associate Professor Liu Zhiyuan presenting the latest advances and ten key challenges in large‑model technologies, aiming to foster industry‑academia collaboration and drive innovation in representation learning, knowledge graphs, and social computing.

Academic Seminar · Artificial Intelligence · Large Language Models

0 likes · 4 min read

DataFunTalk

Mar 17, 2022 · Artificial Intelligence

A Survey of Text Classification and Intent Recognition: Industrial and Research Perspectives

This article reviews recent developments in text classification and intent recognition, comparing industrial practices such as business‑coupled feature engineering with research trends like pretrained language models, and provides references and practical insights for building effective NLP solutions.

Intent Recognition · NLP · Pretrained Models

0 likes · 13 min read

ELab Team

Mar 16, 2022 · Artificial Intelligence

Reverse Dictionary Made Easy: Harness WantWords and Hugging Face for Quick NLP Model Integration

This article introduces the open‑source WantWords reverse‑dictionary project, explains its token‑based processing pipeline, walks through Python implementation and model invocation with Hugging Face’s Transformers, reviews NLP’s historical evolution, and shows how front‑end developers can quickly integrate NLP models into products.

Artificial Intelligence · BERT · Hugging Face

0 likes · 13 min read

Baobao Algorithm Notes

Mar 16, 2022 · Artificial Intelligence

How to Boost Kaggle NLP Scores with BERT, Tree Models, and Smart Post‑Processing

The article analyzes a recent Kaggle essay‑segmentation competition, explains why standard BERT‑based models plateau, and shows how a two‑stage pipeline that combines coarse BERT filtering with a feature‑rich tree model and post‑processing scaling can push scores well beyond the 70‑point barrier.

BERT · Kaggle · NLP

0 likes · 5 min read

Python Programming Learning Circle

Mar 10, 2022 · Artificial Intelligence

Top 7 Python Libraries and Packages of the Year for Data Science and AI

This article reviews the seven most notable Python libraries and packages of 2018 for data scientists and AI practitioners, including AdaNet, TPOT, SHAP, Optimus, spaCy, Jupytext, and Chartify, with descriptions, installation commands, and usage examples.

AutoML · NLP · data cleaning

0 likes · 15 min read

Baobao Algorithm Notes

Feb 10, 2022 · Artificial Intelligence

Winning Kaggle’s Jigsaw Toxicity Challenge with Transfer Learning and Zero‑Shot Classification

This article breaks down the evolution of Kaggle’s Jigsaw toxic comment competitions and presents a three‑step solution—training on historic data, using a genetic algorithm to weight multi‑label predictions, and ensembling fifteen models—to achieve high‑accuracy zero‑shot text classification.

Artificial Intelligence · Kaggle · NLP

0 likes · 5 min read

Baobao Algorithm Notes

Jan 28, 2022 · Artificial Intelligence

How Pre‑Training Evolved: From word2vec to MAE Across NLP and CV

This article traces the history of deep‑learning pre‑training techniques, comparing the parallel developments in natural‑language processing and computer vision—from early word2vec and bag‑of‑words models through ELMo and BERT to recent transformer‑based vision models like iGPT, ViT, BEiT and MAE—highlighting key innovations, challenges, and the convergence of the two fields.

Deep Learning · MAE · NLP

0 likes · 20 min read

DataFunTalk

Jan 26, 2022 · Artificial Intelligence

Exploring and Practicing Generative Chat in OPPO's XiaoBu Assistant

This article presents a comprehensive overview of OPPO's XiaoBu Assistant, detailing its research background, chat skill architecture, evolution from retrieval and rule‑based methods to generative models, industry model comparisons, decoding and ranking strategies, safety mechanisms, performance optimizations, and evaluation results.

Chatbot · Dialogue Systems · Generative AI

0 likes · 20 min read

Code DAO

Jan 15, 2022 · Artificial Intelligence

Compressing Unsupervised fastText Models 300× Smaller with Near‑Identical NLP Performance

This article shows how the compress‑fasttext Python library can shrink a 7 GB fastText word‑embedding model to about 21 MB—a 300‑fold reduction—while preserving almost the same accuracy on downstream NLP tasks, and explains the underlying compression techniques, usage examples, and evaluation results.

NLP · compress-fasttext · fastText

0 likes · 9 min read

Baobao Algorithm Notes

Jan 14, 2022 · Artificial Intelligence

Boosting BERT Text Classification with Label Embedding: How It Works

The paper proposes a simple yet effective method that fuses label embeddings into BERT, enhancing text‑classification performance without increasing computational cost, and validates the approach across six benchmark datasets, also exploring tf‑idf‑based label augmentation and the impact of using [SEP] versus no‑[SEP] inputs.

BERT · Deep Learning · NLP

0 likes · 8 min read

Baobao Algorithm Notes

Jan 14, 2022 · Artificial Intelligence

Visualize Transformer Attention with BertViz: Install and Example Walkthrough

This guide introduces BertViz, an interactive visualization tool for transformer models such as BERT, GPT‑2 and T5, explains how to install it via pip along with required dependencies, and demonstrates head, model, and neuron view visualizations with code examples in Jupyter.

Attention Visualization · BertViz · NLP

0 likes · 6 min read

Beike Product & Technology

Jan 7, 2022 · Artificial Intelligence

Beike Real Estate NLP Team Wins First Place in CCIR Cup 2021 Intelligent Human‑Computer Interaction Track

The Beike Real Estate NLP team secured first place in the CCIR Cup 2021 Intelligent Human‑Computer Interaction track by applying semi‑supervised and transfer learning techniques to small‑sample intent recognition and slot filling, and also presented the large‑scale Mandarin dialect speech benchmark KeSpeech at NeurIPS 2021.

AI competition · BERT · NLP

0 likes · 5 min read

JD Cloud Developers

Jan 4, 2022 · Artificial Intelligence

How JD’s Vega v1 Model Dominated GLUE Benchmark, Surpassing Human Performance

JD Explore’s Vega v1 model topped the GLUE benchmark with an average score of 91.3, ahead of submissions from Microsoft, Facebook, and Stanford across multiple NLP tasks. It also reported the first human‑level results on sentiment analysis and coreference, which the article presents as evidence of JD’s leading position in deep‑learning research.

AI research · Deep Learning · GLUE benchmark

0 likes · 3 min read

DataFunTalk

Dec 31, 2021 · Artificial Intelligence

Knowledge‑Enhanced Semantic Understanding with Baidu ERNIE: Techniques, Progress, and Applications

This article reviews Baidu's knowledge‑enhanced semantic understanding models, detailing the evolution from early semantic techniques to ERNIE 1.0, 2.0 and the large‑scale ERNIE 3.0, its architecture, training strategies, performance benchmarks, and real‑world applications across industry.

AI · ERNIE · NLP

0 likes · 19 min read

Ctrip Technology

Dec 30, 2021 · Artificial Intelligence

Semantic Matching Techniques for Intelligent Customer Service at Ctrip

This article presents Ctrip's intelligent customer service system, detailing the evolution of semantic matching methods from traditional lexical models to deep learning approaches such as BERT and ESIM, and describing multi‑stage retrieval, multilingual transfer learning, and KBQA techniques for improving query understanding and response accuracy.

BERT · Multilingual · NLP

0 likes · 16 min read

Python Crawling & Data Mining

Dec 29, 2021 · Artificial Intelligence

Boost Chinese Sentiment Analysis: Master Jieba Segmentation and SnowNLP

This tutorial walks through Chinese text tokenization with Jieba, optimizes the token list using stop‑words and part‑of‑speech filtering, visualises word frequencies, and applies SnowNLP to perform sentiment analysis on Weibo comments, complete with code examples and result charts.

NLP · text segmentation

0 likes · 8 min read

Baobao Algorithm Notes

Dec 23, 2021 · Artificial Intelligence

How Pre‑Training Evolved: From word2vec to MAE Across NLP & Vision

This article traces the evolution of deep‑learning pre‑training techniques, starting with word2vec in NLP, moving through ELMo and BERT, then shifting to computer‑vision models such as iGPT, ViT, BEiT, and MAE, highlighting key innovations, challenges, and the convergence of NLP and CV paradigms.

BERT · MAE · NLP

0 likes · 21 min read

DataFunSummit

Dec 21, 2021 · Artificial Intelligence

Large‑Scale Pretrained Model Compression and Distillation: AdaBERT, L2A, and Meta‑KD

This talk presents Alibaba DAMO Academy’s recent work on compressing large pretrained language models, covering task‑adaptive AdaBERT, data‑augmented L2A, and meta‑knowledge distillation Meta‑KD, describing their motivations, architectures, NAS‑based search, loss designs, and experimental results across multiple NLP tasks.

Knowledge Distillation · NLP · Neural Architecture Search

0 likes · 13 min read

Python Programming Learning Circle

Dec 16, 2021 · Artificial Intelligence

Part-of-Speech Tagging with Jieba in Python

This article explains how to perform Chinese part-of-speech tagging in Python with the jieba.posseg module: loading stop words, extracting article content with Newspaper3k, segmenting in precise mode, filtering the tokens, and presenting the results in a pandas DataFrame.

NLP · POS tagging · Python

0 likes · 3 min read

Code DAO

Dec 12, 2021 · Artificial Intelligence

How to Boost Text Analysis Accuracy on a 2‑Billion‑Word Corpus

This article explains practical techniques for improving NLP model accuracy on massive corpora, covering the challenges of multi‑field text, word‑embedding choices, a fastText‑based regression demo with book‑review data, feature‑engineering tricks, and a comparison with tf‑idf + LASSO.

NLP · Python · Regression

0 likes · 13 min read

Laiye Technology Team

Dec 10, 2021 · Artificial Intelligence

Best Practices for Building an Entity‑Relationship Annotation Tool at Laiye AI R&D Center

This article details Laiye Technology’s AI R&D team’s end‑to‑end approach to designing and optimizing a custom entity‑relationship annotation tool, covering data‑labeling challenges, shortcomings of Excel and off‑the‑shelf solutions, architectural requirements, line‑breaking and mark‑position algorithms, performance improvements, and real‑world results.

JavaScript · NLP · Performance Optimization

0 likes · 12 min read

Meituan Technology Team

Dec 9, 2021 · Artificial Intelligence

Fine-Grained Aspect-Based Sentiment Analysis for Meituan's To‑Restaurant Business

To support user decision‑making and merchant quality monitoring, Meituan’s to‑restaurant platform applies fine‑grained aspect‑based sentiment analysis that extracts dish, attribute, opinion, and polarity tuples from reviews. It uses both a BERT‑CRF pipeline and a joint Dual‑MRC model, which raise F1 from 0.61 to 0.68, and the models are deployed in dashboards and review‑management tools. Future work targets efficiency and broader four‑tuple extraction.

ABSA · BERT · NLP

0 likes · 28 min read

Code DAO

Dec 7, 2021 · Artificial Intelligence

How to Cluster Text with TF‑IDF, KMeans and PCA in Python

This article walks through a complete Python workflow that loads the 20 Newsgroups dataset, preprocesses the documents, vectorizes them with TF‑IDF, groups them using KMeans, reduces dimensions with PCA, and visualizes the resulting clusters, illustrating each step with code and plots.

KMeans · NLP · PCA

0 likes · 13 min read

Alibaba Cloud Native

Dec 7, 2021 · Operations

How Information Entropy Powers AI‑Driven Alert Noise Reduction in Cloud‑Native Operations

This article explains how Shannon's information entropy and NLP are combined in Alibaba Cloud's ARMS intelligent noise reduction to quantify alert uncertainty, filter redundant notifications, and automatically prioritize critical incidents, offering a practical, self‑learning solution for modern monitoring environments.

Alert Noise Reduction · NLP · information entropy

0 likes · 11 min read

DataFunTalk

Nov 29, 2021 · Artificial Intelligence

Text Mining for User Research: Architecture, Labeling, and Application Cases at JD.com

The presentation explains how JD.com leverages large‑scale text mining and NLP techniques—including data cleaning, multi‑level labeling, sentiment classification with models such as TextCNN, RoBERTa, and USE—to transform unstructured customer feedback into actionable product insights across various e‑commerce scenarios.

AI · E‑Commerce · NLP

0 likes · 18 min read

DataFunTalk

Nov 26, 2021 · Artificial Intelligence

Solving Model Prediction Errors: A Comprehensive Bad‑Case Treatment Methodology

This article presents a step‑by‑step methodology for diagnosing and fixing model prediction errors—especially bad cases—in NLP and search systems, covering sample bias, threshold selection, preprocessing, post‑processing, validation cycles, and guidance on when to replace the model.

NLP · PostProcessing · Preprocessing

0 likes · 11 min read

Programmer DD

Nov 26, 2021 · Artificial Intelligence

Leverage DDParser for COVID‑19 Vaccine Data Extraction and Open‑Source Tools

This article introduces several pandemic‑related open‑source resources—including a nationwide COVID‑19 vaccine record lookup, a tracker for ineffective vaccine distribution, and Baidu's DDParser NLP tool—detailing their purpose, usage, and installation to help developers build better vaccine‑related applications.

COVID-19 · DDParser · NLP

0 likes · 5 min read

Baidu Geek Talk

Nov 17, 2021 · Artificial Intelligence

Fast Video Editing: Architecture and AI‑Powered Subtitle & Redundant Segment Detection

Baidu’s Fast Editing tool automates video trimming by using NLP to recognize subtitles, tone markers, and duplicate sentences, then aligning them with the timeline for one‑click removal. It uses character‑level, Levenshtein, and cosine similarity measures within a three‑module architecture (Plugin, Window, Caption), and the team plans on‑device PaddlePaddle analysis to cut latency and cost.

AI · NLP · Swift

0 likes · 11 min read

Kuaishou Tech

Nov 16, 2021 · Artificial Intelligence

KuaiSearch's PERKS Pre‑trained Language Model Sets New Record on the CLUE Benchmark

The KuaiSearch research team introduced PERKS, a large‑scale Chinese pre‑trained language model that achieved an 80.618 score on the CLUE 1.1 language classification task, narrowing the gap to human annotation and demonstrating significant advances in multi‑stage training, model optimization, and real‑world search applications.

CLUE benchmark · KuaiSearch · NLP

0 likes · 7 min read

DataFunSummit

Nov 14, 2021 · Artificial Intelligence

Overview of Pre‑training Models and the UER‑py Framework for Natural Language Processing

This article introduces the importance of pre‑training in natural language processing, reviews classic pre‑training models such as Skip‑thoughts, BERT, GPT‑2 and T5, presents the modular UER‑py framework and its Chinese resources, compares it with Hugging Face Transformers, and outlines practical deployment steps in industry.

Language Models · NLP · UER-py

0 likes · 22 min read

DataFunTalk

Nov 12, 2021 · Artificial Intelligence

Xiaomi Xiao AI Intelligent Question‑Answering System: Architecture, Techniques, and Applications

This article presents a comprehensive overview of Xiaomi's Xiao AI intelligent QA system, detailing its background, three core answering modules—knowledge‑graph QA, retrieval‑based FAQ, and reading‑comprehension—and the underlying methods such as template matching, cross‑domain semantic parsing, path‑based reasoning, semantic retrieval, and neural matching, while also discussing performance results and practical trade‑offs.

AI · NLP · question answering

0 likes · 18 min read

NetEase Smart Enterprise Tech+

Nov 11, 2021 · Artificial Intelligence

Transforming B2B Customer Service: Table QA via Multi‑Turn Dialogue

This article explores how table‑based question answering can be integrated into B2B intelligent customer service by converting table queries into entity‑attribute recognition and multi‑turn dialogue, comparing end‑to‑end NL2SQL and slot‑filling approaches, and presenting NetEase Qiyu's practical implementation with its benefits and use cases.

NL2SQL · NLP · attribute extraction

0 likes · 10 min read

Meituan Technology Team

Nov 4, 2021 · Artificial Intelligence

Knowledge-based Question Answering (KBQA) System at Meituan: Design, Challenges, and Solutions

Meituan’s knowledge‑based question answering system tackles diverse, constraint‑rich, multi‑hop queries across pre‑sale, in‑sale and post‑sale scenarios by integrating fine‑grained query understanding, relation recognition, sub‑graph retrieval and answer ranking. It employs optimized BERT models, pre‑training tasks, and domain‑specific enhancements to improve response speed, conversion rates, and benchmark performance, while long‑tail and complex queries remain open challenges.

KBQA · Meituan · NLP

0 likes · 24 min read

Meituan Technology Team

Oct 21, 2021 · Artificial Intelligence

Meituan's End-to-End Sentiment Analysis Technology and the ASAP Dataset

Meituan’s NLP Center introduced the ASAP dataset, the largest real‑world Chinese attribute‑level sentiment corpus to date. The article traces the progression from document‑level regression models upgraded with MT‑BERT, through multi‑task attribute‑level ABSA and opinion‑triplet extraction, to scalable real‑time and batch services, and outlines future transfer‑learning and few‑shot research.

Meituan · NLP · Pretrained Models

0 likes · 25 min read

DataFunTalk

Oct 20, 2021 · Artificial Intelligence

Building an Industry Chain Knowledge Graph: Theory, Architecture, and Key Methods

This article presents a comprehensive overview of constructing an industry‑chain knowledge graph for the financial sector, covering its theoretical background, architectural design, automated building pipeline, key NLP techniques, and practical applications such as visualization, IPO review, and investment analysis.

Industry Chain · NLP · financial technology

0 likes · 22 min read

Yuewen Technology

Oct 15, 2021 · Artificial Intelligence

How Yuedu's TTS Platform Automates High‑Quality Audiobook Production

This article explains how Yuedu's TTS synthesis platform tackles the booming audiobook market by using AI‑driven text preprocessing, role graph construction, content structuring, emotion and effect recognition, and a streamlined post‑processing workflow to efficiently generate multi‑character, emotionally rich audio books at scale.

Emotion Recognition · NLP · TTS

0 likes · 13 min read

Amap Tech

Oct 14, 2021 · Artificial Intelligence

CCF Big Data & Computing Intelligence Contest – POI Name Generation Challenge

The 9th CCF Big Data & Computing Intelligence Contest partners with Gaode Map to launch a POI Name Generation challenge, requiring participants to fuse image, signboard detection, and OCR data to automatically produce accurate, fluent place names, with a ¥50,000 prize pool, weekly vouchers, and recruitment opportunities for global teams.

AI · NLP · POI

0 likes · 7 min read

DataFunTalk

Oct 12, 2021 · Artificial Intelligence

Intelligent Grading: Technical Exploration and Practice in AI‑Powered Education

This presentation by Tencent senior researcher Li Chao outlines the background, typical challenges, and multi‑layer technical solutions for intelligent grading in education, covering AI‑driven classroom, homework, review, and exam scenarios, multimodal spell‑checking, essay evaluation, and adaptive learning pipelines.

AI · Education Technology · Essay Evaluation

0 likes · 25 min read

DataFunTalk

Oct 12, 2021 · Artificial Intelligence

PaddleNLP v2.1 Release: Taskflow One‑Click NLP, Few‑Shot Learning Enhancements, and 28× Text Generation Acceleration

PaddleNLP v2.1 introduces an industrial‑grade Taskflow for eight NLP scenarios, a three‑line few‑shot learning paradigm that boosts small‑sample performance, and a FasterTransformer‑based inference engine that delivers up to 28‑fold speedup for text generation, all backed by extensive model and algorithm integrations.

NLP · PaddleNLP · few-shot learning

0 likes · 7 min read

DataFunSummit

Oct 12, 2021 · Artificial Intelligence

Intelligent Grading: Technical Exploration and Practice in AI‑Powered Education

This article presents a comprehensive overview of AI‑driven intelligent grading technologies, covering background, typical educational challenges, multimodal NLP solutions for essay, spelling and grammar correction, adaptive learning, and related research, illustrating how deep learning and multimodal models improve automated assessment across K‑12 scenarios.

AI · Education Technology · Essay Scoring

0 likes · 24 min read

58 Tech

Oct 12, 2021 · Artificial Intelligence

Seq2Seq Approaches for Phone Number Extraction from Two‑Speaker Voice Dialogues

This article presents a practical study of extracting phone numbers from two‑speaker voice dialogues using Seq2Seq models—including LSTM, GRU with attention and feature fusion, and Transformer—detailing data characteristics, model architectures, training strategies, experimental results, and comparative analysis showing the GRU‑Attention approach achieving the best performance.

GRU · LSTM · NLP

0 likes · 13 min read

DataFunSummit

Oct 2, 2021 · Artificial Intelligence

Joint Entity and Relation Extraction: Methods and Document‑Level Approaches

This presentation reviews the importance of entity‑relation extraction for knowledge‑graph construction, compares sentence‑level and complex contexts, and surveys joint extraction techniques—including sequence labeling, table filling, and seq2seq models—as well as document‑level graph‑based methods and future research directions.

NLP · document-level · entity-relation extraction

0 likes · 15 min read

DataFunSummit

Sep 26, 2021 · Artificial Intelligence

Contrastive Learning and Its Applications in Weibo Content Representation

This article explains the fundamentals of contrastive learning, reviews typical models such as SimCLR, MoCo, SwAV, BYOL, SimSiam and Barlow Twins, and demonstrates how these methods are applied to Weibo text and multimodal (text‑image) representation tasks like hashtag generation and image‑text matching.

Multimodal · NLP · Weibo

0 likes · 18 min read

DataFunTalk

Sep 12, 2021 · Artificial Intelligence

Overview of Pretraining Models and the UER‑py Framework for Natural Language Processing

This article reviews the background and evolution of pre‑training models in NLP, introduces classic models such as Skip‑thoughts, BERT, and T5, and details the modular UER‑py framework, its comparison with Hugging Face Transformers, available Chinese pre‑trained weights, and practical deployment workflows.

Language Models · NLP · Transformer

0 likes · 21 min read

Sohu Tech Products

Sep 1, 2021 · Artificial Intelligence

2021 Sohu Text Matching Competition: Model Design, Tricks, and Performance Analysis

This article details the authors' approach to the 2021 Sohu Text Matching competition, describing the task definition, data splits, model architectures (cross‑encoder and bi‑encoder), pretrained language models used, various training tricks, ensemble strategies, and the resulting evaluation scores.

AI · NLP · Pretrained Models

0 likes · 8 min read

Ctrip Technology

Aug 26, 2021 · Artificial Intelligence

Applying Snorkel Weak Supervision to Automate Event Summaries in Ctrip Customer Service

The article explains how Ctrip’s hotel customer‑service team uses the Snorkel weak‑supervision framework to generate large‑scale labeled data for training models that automatically produce structured event summaries, detailing the workflow, labeling functions, generative and discriminative model training, and performance improvements.

Labeling Functions · NLP · Snorkel

0 likes · 14 min read

DataFunSummit

Aug 21, 2021 · Artificial Intelligence

My Journey in Text2SQL Research: From Paper Reading to Winning a Global Competition

This article recounts the author's six‑month Text2SQL research experience, detailing how systematic paper reading, leveraging existing engineering solutions, and fully utilizing academic, human, and hardware resources led to a successful thesis, a patent, a paper, and a second‑place finish in Yale's global Text2SQL competition.

AI · NLP · Paper Reading

0 likes · 9 min read

Meituan Technology Team

Aug 19, 2021 · Artificial Intelligence

Few-Shot Learning Methods and Applications in Meituan NLP

Meituan’s NLP team leverages few‑shot learning—using data‑augmentation, semi‑supervised, ensemble/self‑training, and domain‑adaptation techniques—to cut annotation costs, achieving 1–2 percentage‑point accuracy gains on internal benchmarks and deploying high‑performing models for tasks such as topic classification, fake‑review detection, and sentiment analysis, while planning broader platform and model extensions.

Active Learning · NLP · Semi-supervised Learning

0 likes · 29 min read

Meituan Technology Team

Aug 5, 2021 · Artificial Intelligence

Overview of Meituan's ACL 2021 Accepted Papers

Meituan’s 2021 ACL contributions comprise seven accepted papers—six long and one short—introducing novel approaches to event argument decoding, cross‑domain slot transfer, contrastive out‑of‑domain detection, novel slot discovery, self‑supervised sentence representation, unsupervised semantic parsing, and pseudo‑query‑enhanced dense retrieval, inviting further research and collaboration.

ACL · Event Extraction · Meituan

0 likes · 22 min read

Sohu Tech Products

Aug 4, 2021 · Artificial Intelligence

Technical Summary of the 2021 Sohu Campus Text Matching Algorithm Competition

This article presents a comprehensive technical summary of the 2021 Sohu Campus Text Matching Algorithm Competition, detailing data characteristics, preprocessing strategies, tokenization choices, positional encoding methods, model architectures using relative encodings such as WoBERT and RoFormer, experimental results, and reflections on future improvements.

Model Design · Multi-Task Learning · NLP

0 likes · 9 min read

DataFunSummit

Aug 3, 2021 · Artificial Intelligence

Content Understanding for Personalized Recommendation: Interest Graph, Concept Mining, and Semantic Matching at Tencent

The article explains how Tencent addresses the limitations of traditional content understanding methods in personalized recommendation by introducing an interest‑graph framework that combines classification, concept, entity, and event layers, and details the associated mining, matching, and online evaluation techniques.

Embedding · NLP · content understanding

0 likes · 13 min read

DataFunTalk

Jul 30, 2021 · Artificial Intelligence

Fundamentals of Natural Language Processing: Language Models, Smoothing, and Basic Tasks

This article provides a comprehensive overview of natural language processing fundamentals. It covers the challenges of language modeling, N‑gram models and the Markov assumption, smoothing techniques such as discounting and add‑one, and evaluation via perplexity; basic tasks such as Chinese word segmentation, subword tokenization, POS tagging, and syntactic and semantic parsing; and downstream applications including information extraction, sentiment analysis, question answering, machine translation, and dialogue systems.

AI · Language Model · NLP

0 likes · 29 min read

iQIYI Technical Product Team

Jul 30, 2021 · Artificial Intelligence

iQIYI Search Ranking Algorithm Practice – NLP and Search Integration

At iQIYI’s iTech Conference, Zhang Zhigang detailed a full‑stack search ranking system that combines NLP‑driven query analysis, hierarchical indexing, multi‑stage coarse‑to‑fine ranking, Transformer‑based re‑ranking, sparse‑feature DNN enhancements and LIME/SE‑Block explainability, delivering measurable gains in CTR and NDCG for the platform’s video search.

Information Retrieval · NLP · iQIYI

0 likes · 20 min read

Ctrip Technology

Jul 29, 2021 · Artificial Intelligence

NLP Techniques for Classifying Ctrip Ticket Customer Service Conversations

This article presents the background, problem analysis, data preprocessing, modeling approaches and optimization results of applying various NLP methods—including statistical models, word embeddings, attention mechanisms and pretrained language models such as BERT—to improve the accuracy of classifying Ctrip ticket customer service dialogues.

BERT · Deep Learning · NLP

0 likes · 13 min read

DataFunTalk

Jul 22, 2021 · Artificial Intelligence

Joint Entity and Relation Extraction: Methods, Challenges, and Document‑Level Approaches

This article reviews the fundamentals of entity‑relation extraction, surveys joint extraction techniques such as sequence labeling, table‑filling and seq2seq models, discusses document‑level graph‑based methods, highlights experimental findings, and outlines future research directions in knowledge‑graph construction.

Graph Neural Networks · NLP · document-level

0 likes · 17 min read

Meituan Technology Team

Jul 15, 2021 · Artificial Intelligence

Local Life Comprehensive Demand Knowledge Graph: Design, Algorithms, and Applications

The Local Life Comprehensive Demand Knowledge Graph (GENE) reorients Meituan’s supply‑demand matching by building a multi‑layer, user‑centric graph that captures intent and consideration, employing BERT, Word2Vec, ELECTRA, and reinforcement‑learning models to generate concrete and scene‑based demand nodes, now powering parent‑child, leisure, medical‑beauty, and education services.

AI · Demand Modeling · NLP

0 likes · 34 min read

DataFunTalk

Jul 13, 2021 · Artificial Intelligence

NLP‑Driven Scenario Tagging and Experience Management Platform for Douyin App

This article describes how Douyin built an AI‑powered feedback management platform that uses NLP to automatically tag and cluster user comments, maps them to business scenarios, defines quantitative experience metrics, and creates a closed‑loop system for rapid problem discovery and product improvement.

AI · Douyin · NLP

0 likes · 15 min read

Python Crawling & Data Mining

Jul 13, 2021 · Artificial Intelligence

What Python Reveals About Public Reaction to “Chinese Doctor” – A Data‑Driven Review Analysis

Using Python, the author collects and visualizes Douban and Weibo comments on the film “Chinese Doctor,” showing rating distributions, keyword word clouds, and fan sentiment, and compares the 2021 movie with the 2019 documentary to illustrate how audiences perceive the pandemic drama.

NLP · Python · data analysis

0 likes · 5 min read

DataFunTalk

Jul 8, 2021 · Artificial Intelligence

Baidu ERNIE 3.0: Knowledge‑Enhanced 10B‑Parameter Model Sets New Chinese NLP Benchmarks and Tops SuperGLUE

Baidu's ERNIE 3.0 is a 10‑billion‑parameter, knowledge‑graph‑augmented language model that sets new records on 54 Chinese NLP benchmarks, reaches human‑level performance on SuperGLUE, and demonstrates strong generation and zero‑shot capabilities. It is now available for public demo and research.

Baidu · ERNIE 3.0 · Large Language Model

0 likes · 7 min read
