Tagged articles

Chinese NLP

23 articles · Page 1 of 1

Apr 26, 2026 · Artificial Intelligence

Embedding Explained: How Vectorization Turns Text into Numbers for RAG

This article walks through why traditional keyword matching fails for RAG, explains the evolution from one‑hot encoding to Word2Vec and BERT, details sentence‑level embeddings and similarity metrics, compares leading Chinese and multilingual embedding models using the C‑MTEB benchmark, and provides practical LangChain code, deployment tips, and common pitfalls.

Chinese NLP · Embedding · LangChain

0 likes · 18 min read

Lao Guo's Learning Space

Mar 29, 2026 · Artificial Intelligence

Top Free Large Language Models for OpenClaw (March 2026) – Ranked by Cost, Chinese Support, Stability, and API Ease

This guide evaluates and ranks the most useful free large language models as of March 2026, comparing domestic and international options on free quota, Chinese capability, stability, and API friendliness, and provides ready‑to‑copy OpenClaw configuration commands with practical usage tips.

API Configuration · Chinese NLP · Domestic Models

0 likes · 10 min read

DataFunSummit

Oct 27, 2023 · Artificial Intelligence

ChatGPT Technology, Localization Attempts, and Open‑Source Large Models

This article reviews the evolution and challenges of ChatGPT technology, describes the authors' efforts to localize and commercialize the model for the Chinese market, and introduces their open‑source Chinese large‑model initiative, including training methods, performance gaps, and future improvement directions.

ChatGPT · Chinese NLP · Large Language Models

0 likes · 11 min read

Model Perspective

Sep 11, 2023 · Artificial Intelligence

Why Chinese Word Segmentation Matters: Techniques, Challenges, and Python Demo

This article explores Chinese word segmentation, illustrating its linguistic nuances with a humorous example, explains key methods—including dictionary‑based, statistical, and deep‑learning approaches—and provides Python code using a simple dictionary algorithm and the popular jieba library to demonstrate practical implementation.

Chinese NLP · Python · Word Segmentation

0 likes · 6 min read

Baidu Tech Salon

Aug 8, 2023 · Artificial Intelligence

Tsinghua University Report Ranks Baidu Wenxin Yiyan First Among Chinese Large Language Models

A Tsinghua University evaluation of seven large language models ranked Baidu’s Wenxin Yiyan first among domestic models, with the highest overall score across 20 metrics, most notably in Chinese semantic understanding and safety, surpassing ChatGPT and tying GPT‑4. The report also highlights its fast training and inference and its broad industry adoption.

AI evaluation · Baidu Wenxin · Chinese NLP

0 likes · 4 min read

Alibaba Cloud Big Data AI Platform

Jul 13, 2023 · Artificial Intelligence

Rapid Diffusion: Fast, Domain‑Specific Text‑to‑Image Generation for Chinese

Rapid Diffusion introduces a knowledge‑enhanced, high‑speed Chinese text‑to‑image diffusion model with one‑click deployment, achieving superior image quality and up to 1.73× faster inference through FlashAttention and BladeDISC optimizations, and demonstrates strong performance across e‑commerce, traditional painting, and food datasets.

Chinese NLP · diffusion model · fast inference

0 likes · 12 min read

21CTO

Apr 24, 2023 · Artificial Intelligence

Inside MOSS 003: Fudan University's Open-Source Large Language Model

This article details the evolution of Fudan University's open‑source MOSS series—from the early OpenChat 001 prototype to the current MOSS 003—covering data collection, multilingual capabilities, plugin architecture, model releases on HuggingFace, and how developers can start using the models.

AI · Chinese NLP · MOSS

0 likes · 10 min read

Smart Era Software Development

Mar 17, 2023 · Artificial Intelligence

Wenxin Yiyan vs GPT-4: Live Demo Shows Baidu’s New AI Model in Action

The article presents a side‑by‑side demonstration of Baidu’s newly released Wenxin Yiyan and OpenAI’s GPT‑4 across literary creation, business copywriting, mathematical reasoning, Chinese idiom interpretation, acrostic poetry, and multimodal generation, then explains the six core technologies behind the model and Baidu’s hardware‑cloud strategy, and reports audience reactions.

Chinese NLP · ERNIE · GPT-4

0 likes · 11 min read

Alibaba Cloud Big Data AI Platform

Oct 19, 2022 · Artificial Intelligence

How CKBERT Boosts Chinese NLP with Knowledge‑Enhanced Pretraining

CKBERT, a Chinese knowledge‑enhanced BERT developed by Alibaba’s EasyNLP team, integrates external knowledge graphs and internal linguistic cues through novel pre‑training tasks, offers three model sizes compatible with HuggingFace and PAI, and demonstrates superior performance on CLUE and NER benchmarks while providing easy deployment on cloud platforms.

CKBERT · Chinese NLP · EasyNLP

0 likes · 40 min read

Alibaba Cloud Big Data AI Platform

Jul 29, 2022 · Artificial Intelligence

Unlock Chinese Text-to-Image Generation with EasyNLP’s Open‑Source Models

This article introduces EasyNLP’s newly integrated Chinese text‑to‑image generation framework, explains the underlying Transformer‑VQGAN architecture, provides model specifications, code snippets, performance benchmarks on multiple datasets, and step‑by‑step tutorials for fine‑tuning and inference using open‑source checkpoints.

AI generation · Chinese NLP · EasyNLP

0 likes · 20 min read

DataFunSummit

May 9, 2022 · Artificial Intelligence

TextToKnowledge (解语): Zero‑Shot Chinese Text Knowledge Annotation and Mining Framework

The article introduces TextToKnowledge, an open‑source Baidu platform that provides a unified Chinese term taxonomy (TermTree) and two annotation tools (WordTag and NPTag) to enable zero‑shot text labeling, term linking, and downstream knowledge‑mining applications for various NLP tasks.

Chinese NLP · PaddleNLP · TermTree

0 likes · 25 min read

DataFunTalk

Jun 8, 2021 · Artificial Intelligence

CCKS 2021 Life Service Domain Knowledge Graph Question Answering Competition

The CCKS 2021 competition invites researchers to develop Chinese Knowledge Base Question Answering systems that leverage a life‑service knowledge graph from Meituan, offering detailed task description, dataset information, registration procedures, timelines, and prize incentives.

CCKS2021 · Chinese NLP · KBQA

0 likes · 5 min read

DataFunSummit

Mar 30, 2021 · Artificial Intelligence

Chinese Short‑Text Entity Linking: Model Design, Multitask Learning, and Experimental Results on the Qianyan Dataset

This article presents a comprehensive approach to Chinese short‑text entity linking, describing the Qianyan dataset, pipeline and end‑to‑end task formulations, sample construction, a multitask model that jointly performs entity ranking and NIL classification, various optimization techniques including confidence learning and adversarial training, and detailed experimental analysis showing state‑of‑the‑art performance.

Chinese NLP · adversarial training · confidence learning

0 likes · 13 min read

Tencent Cloud Developer

Jul 8, 2020 · Artificial Intelligence

Graph-Based Chinese Word Embedding (AlphaEmbedding) for Improved Text Matching

AlphaEmbedding builds a weighted graph linking Chinese words, sub‑words, characters and pinyin, then uses random‑walk‑based node2vec training to produce embeddings that capture orthographic and phonetic similarity, markedly improving recall and ranking for homophones, typos and OOV terms in enterprise search.

Chinese NLP · graph computing · semantic similarity

0 likes · 17 min read

Xueersi Online School Tech Team

Jan 17, 2020 · Artificial Intelligence

Fine‑tuning BERT for Sentence Pair Similarity in an Online Education Platform

This article describes how a BERT‑based model is fine‑tuned to compute sentence‑pair similarity for improving recommendation accuracy in an online school, detailing the architecture, training mechanisms, code implementation, experimental results, and future extensions such as sentiment analysis.

BERT · Chinese NLP · Deep Learning

0 likes · 20 min read

Tencent Cloud Developer

Apr 24, 2019 · Artificial Intelligence

Chinese Text Sentiment Classification Using Multi‑layer LSTM: Data Preparation, Model Architecture, and Business Applications

The article details a practical workflow for Chinese sentiment classification in Tencent’s Goose Man product, covering data preparation, word‑segmentation challenges, a six‑layer multi‑LSTM architecture with word embeddings, training results achieving roughly 96% accuracy, and its deployment for automatic detection of misleading and high‑impact user reviews.

Chinese NLP · Deep Learning · Keras

0 likes · 23 min read

DataFunTalk

Nov 24, 2018 · Artificial Intelligence

Comprehensive Guide to Fine‑Tuning BERT on Chinese Datasets

This article provides a step‑by‑step guide for fine‑tuning Google’s open‑source BERT on Chinese datasets, covering model download, processor customization, code examples, training commands, and insights into the underlying TensorFlow estimator architecture and deployment considerations.

BERT · Chinese NLP · TensorFlow

0 likes · 11 min read

Alibaba Cloud Developer

Oct 16, 2018 · Artificial Intelligence

Boosting Chinese NER Accuracy with Crowdsourced Data and Adversarial Learning

This paper proposes a Chinese named entity recognition method that leverages noisy crowdsourced annotations through adversarial training with dual BiLSTM modules, demonstrating consistent F1 improvements on dialogue and e‑commerce datasets.

BiLSTM · CRF · Chinese NLP

0 likes · 8 min read

MaGe Linux Operations

Oct 1, 2018 · Fundamentals

Turning 1,000 Douban Movie Reviews into a Chinese Word Cloud with MongoDB & Jieba

This article demonstrates how to extract 1,000 short movie reviews stored in MongoDB, apply Chinese word segmentation using Jieba, select the top 50 terms, generate a visual word cloud, and perform additional analyses such as top‑liked comments and 15‑day comment volume trends.

Chinese NLP · MongoDB · word cloud

0 likes · 7 min read

Alibaba Cloud Developer

Apr 25, 2018 · Artificial Intelligence

How cw2vec Beats Word2Vec: Leveraging Chinese Stroke N‑grams for Superior Word Embeddings

This article introduces cw2vec, a novel Chinese word‑embedding algorithm that exploits stroke‑level subword information, outlines its theoretical foundations, compares it with word2vec, GloVe, CWE and other models on multiple benchmarks, and demonstrates its superior performance across word similarity, analogy, text classification and named‑entity recognition tasks.

Chinese NLP · Deep Learning · cw2vec

0 likes · 14 min read

AntTech

Jan 18, 2018 · Artificial Intelligence

cw2vec: Learning Chinese Word Embeddings with Stroke n-grams

The cw2vec paper, presented at AAAI 2018, introduces a Chinese word embedding method that leverages stroke n‑grams to capture character semantics, proposes a novel loss function, demonstrates consistent improvements over existing models across similarity, analogy, classification and NER tasks, and discusses real‑world AI applications.

AAAI 2018 · AI research · Chinese NLP

0 likes · 7 min read

MaGe Linux Operations

Apr 9, 2017 · Artificial Intelligence

How to Install and Fix WordCloud in Python for Chinese Text Visualization

This guide walks you through installing the Python WordCloud library, resolving common compilation errors, handling Chinese font encoding issues, and creating basic and image‑masked word clouds, complete with code snippets and troubleshooting tips for smooth visualization of Chinese text data.

Chinese NLP · Python · jieba

0 likes · 4 min read

Meituan Technology Team

Dec 18, 2014 · Artificial Intelligence

Auto-Label Missing POI Categories Using Naive Bayes and Feature Selection

This article details a step‑by‑step machine‑learning pipeline that transforms over one million calibrated POI records into feature vectors, selects discriminative terms via information‑gain and domain rules, trains a Naive Bayes classifier, and achieves 91% accuracy with 84% coverage on unseen POI data.

Chinese NLP · Naive Bayes · POI classification

0 likes · 12 min read
