Tagged articles
22 articles
AI Architect Hub
Apr 26, 2026 · Artificial Intelligence

Embedding Explained: How Vectorization Turns Text into Numbers for RAG

This article walks through why traditional keyword matching fails for RAG, explains the evolution from one‑hot encoding to Word2Vec and BERT, details sentence‑level embeddings and similarity metrics, compares leading Chinese and multilingual embedding models using the C‑MTEB benchmark, and provides practical LangChain code, deployment tips, and common pitfalls.

Chinese NLP · Embedding · LangChain
0 likes · 18 min read
Lao Guo's Learning Space
Mar 29, 2026 · Artificial Intelligence

Top Free Large Language Models for OpenClaw (March 2026) – Ranked by Cost, Chinese Support, Stability, and API Ease

This guide evaluates and ranks the most useful free large language models as of March 2026. It compares Chinese (domestic) and international options on free quota, Chinese-language capability, stability, and API friendliness, and provides ready-to-copy OpenClaw configuration commands with practical usage tips.

API Configuration · Chinese NLP · Domestic Models
0 likes · 10 min read
DataFunSummit
Oct 27, 2023 · Artificial Intelligence

ChatGPT Technology, Localization Attempts in China, and Open‑Source Large Models

This article reviews the evolution and challenges of ChatGPT technology and describes the authors' efforts to build and commercialize ChatGPT-style models for the Chinese market. It also introduces their open-source Chinese large-model initiative, covering training methods, current performance gaps, and directions for future improvement.

ChatGPT · Chinese NLP · Large Language Models
0 likes · 11 min read
Model Perspective
Sep 11, 2023 · Artificial Intelligence

Why Chinese Word Segmentation Matters: Techniques, Challenges, and Python Demo

This article explores Chinese word segmentation and illustrates its linguistic nuances with a humorous example. It explains the main methods (dictionary-based, statistical, and deep-learning approaches) and provides Python code that demonstrates a simple dictionary-based algorithm and the popular jieba library.

Chinese NLP · Python · jieba
0 likes · 6 min read
Baidu Tech Salon
Aug 8, 2023 · Artificial Intelligence

Tsinghua University Report Ranks Baidu Wenxin Yiyan First Among Chinese Large Language Models

A Tsinghua University evaluation of seven large language models ranked Baidu’s Wenxin Yiyan first among domestic models, with the highest overall score across 20 metrics, led by Chinese semantic understanding and safety. The report places it above ChatGPT and on par with GPT‑4, and also highlights its training and inference speed and its broad industry adoption.

AI Evaluation · Baidu Wenxin · Chinese NLP
0 likes · 4 min read
Alibaba Cloud Big Data AI Platform
Jul 13, 2023 · Artificial Intelligence

Rapid Diffusion: Fast, Domain‑Specific Text‑to‑Image Generation for Chinese

Rapid Diffusion introduces a knowledge‑enhanced, high‑speed Chinese text‑to‑image diffusion model with one‑click deployment, achieving superior image quality and up to 1.73× faster inference through FlashAttention and BladeDISC optimizations, and demonstrates strong performance across e‑commerce, traditional painting, and food datasets.

Chinese NLP · Knowledge Enhancement · diffusion model
0 likes · 12 min read
21CTO
Apr 24, 2023 · Artificial Intelligence

Inside MOSS 003: Fudan University's Open-Source Large Language Model

This article details the evolution of Fudan University's open‑source MOSS series—from the early OpenChat 001 prototype to the current MOSS 003—covering data collection, multilingual capabilities, plugin architecture, model releases on HuggingFace, and how developers can start using the models.

AI · Chinese NLP · MOSS
0 likes · 10 min read
Alibaba Cloud Big Data AI Platform
Oct 19, 2022 · Artificial Intelligence

How CKBERT Boosts Chinese NLP with Knowledge‑Enhanced Pretraining

CKBERT, a Chinese knowledge‑enhanced BERT developed by Alibaba’s EasyNLP team, integrates external knowledge graphs and internal linguistic cues through novel pre‑training tasks, offers three model sizes compatible with HuggingFace and PAI, and demonstrates superior performance on CLUE and NER benchmarks while providing easy deployment on cloud platforms.

CKBERT · Chinese NLP · EasyNLP
0 likes · 40 min read
Alibaba Cloud Big Data AI Platform
Jul 29, 2022 · Artificial Intelligence

Unlock Chinese Text-to-Image Generation with EasyNLP’s Open‑Source Models

This article introduces EasyNLP’s newly integrated Chinese text‑to‑image generation framework, explains the underlying Transformer‑VQGAN architecture, provides model specifications, code snippets, performance benchmarks on multiple datasets, and step‑by‑step tutorials for fine‑tuning and inference using open‑source checkpoints.

AI Generation · Chinese NLP · EasyNLP
0 likes · 20 min read
DataFunTalk
Jun 8, 2021 · Artificial Intelligence

CCKS 2021 Life Service Domain Knowledge Graph Question Answering Competition

The CCKS 2021 competition invites researchers to build Chinese knowledge base question answering (KBQA) systems over a life-service knowledge graph from Meituan. The announcement covers the task description, dataset, registration procedure, timeline, and prizes.

CCKS2021 · Chinese NLP · KBQA
0 likes · 5 min read
DataFunSummit
Mar 30, 2021 · Artificial Intelligence

Chinese Short‑Text Entity Linking: Model Design, Multitask Learning, and Experimental Results on the Qianyan Dataset

This article presents a comprehensive approach to Chinese short‑text entity linking, describing the Qianyan dataset, pipeline and end‑to‑end task formulations, sample construction, a multitask model that jointly performs entity ranking and NIL classification, various optimization techniques including confidence learning and adversarial training, and detailed experimental analysis showing state‑of‑the‑art performance.

Chinese NLP · adversarial training · confidence learning
0 likes · 13 min read
Tencent Cloud Developer
Jul 8, 2020 · Artificial Intelligence

Graph-Based Chinese Word Embedding (AlphaEmbedding) for Improved Text Matching

AlphaEmbedding builds a weighted graph linking Chinese words, sub‑words, characters and pinyin, then uses random‑walk‑based node2vec training to produce embeddings that capture orthographic and phonetic similarity, markedly improving recall and ranking for homophones, typos and OOV terms in enterprise search.

Chinese NLP · graph computing · semantic similarity
0 likes · 17 min read
Xueersi Online School Tech Team
Jan 17, 2020 · Artificial Intelligence

Fine‑tuning BERT for Sentence Pair Similarity in an Online Education Platform

This article describes how a BERT-based model is fine-tuned to compute sentence-pair similarity and thereby improve recommendation accuracy on an online education platform. It details the architecture, training mechanism, code implementation, and experimental results, and discusses future extensions such as sentiment analysis.

BERT · Chinese NLP · Deep Learning
0 likes · 20 min read
Tencent Cloud Developer
Apr 24, 2019 · Artificial Intelligence

Chinese Text Sentiment Classification Using Multi‑layer LSTM: Data Preparation, Model Architecture, and Business Applications

The article details a practical workflow for Chinese sentiment classification in Tencent’s Goose Man product. It covers data preparation, word-segmentation challenges, and a six-layer LSTM-based architecture with word embeddings that reaches roughly 96% accuracy in training. It also describes deployment of the model for automatically detecting misleading and high-impact user reviews.

Chinese NLP · Deep Learning · Keras
0 likes · 23 min read
DataFunTalk
Nov 24, 2018 · Artificial Intelligence

Comprehensive Guide to Fine‑Tuning BERT on Chinese Datasets

This article provides a step‑by‑step guide for fine‑tuning Google’s open‑source BERT on Chinese datasets, covering model download, processor customization, code examples, training commands, and insights into the underlying TensorFlow estimator architecture and deployment considerations.

BERT · Chinese NLP · Fine-tuning
0 likes · 11 min read
Alibaba Cloud Developer
Apr 25, 2018 · Artificial Intelligence

How cw2vec Beats Word2Vec: Leveraging Chinese Stroke N‑grams for Superior Word Embeddings

This article introduces cw2vec, a novel Chinese word‑embedding algorithm that exploits stroke‑level subword information, outlines its theoretical foundations, compares it with word2vec, GloVe, CWE and other models on multiple benchmarks, and demonstrates its superior performance across word similarity, analogy, text classification and named‑entity recognition tasks.

Chinese NLP · Deep Learning · Unsupervised Learning
0 likes · 14 min read
AntTech
Jan 18, 2018 · Artificial Intelligence

cw2vec: Learning Chinese Word Embeddings with Stroke n-grams

The cw2vec paper, presented at AAAI 2018, introduces a Chinese word embedding method that leverages stroke n‑grams to capture character semantics, proposes a novel loss function, demonstrates consistent improvements over existing models across similarity, analogy, classification and NER tasks, and discusses real‑world AI applications.

AAAI 2018 · AI research · Chinese NLP
0 likes · 7 min read
MaGe Linux Operations
Apr 9, 2017 · Artificial Intelligence

How to Install and Fix WordCloud in Python for Chinese Text Visualization

This guide walks you through installing the Python WordCloud library, resolving common compilation errors, handling Chinese font encoding issues, and creating basic and image‑masked word clouds, complete with code snippets and troubleshooting tips for smooth visualization of Chinese text data.

Chinese NLP · Python · jieba
0 likes · 4 min read
Meituan Technology Team
Dec 18, 2014 · Artificial Intelligence

Auto-Label Missing POI Categories Using Naive Bayes and Feature Selection

This article details a step‑by‑step machine‑learning pipeline that transforms over one million calibrated POI records into feature vectors, selects discriminative terms via information‑gain and domain rules, trains a Naive Bayes classifier, and achieves 91% accuracy with 84% coverage on unseen POI data.

Chinese NLP · Naive Bayes · POI classification
0 likes · 12 min read