Tag

Chinese NLP

DataFunSummit
Oct 27, 2023 · Artificial Intelligence

ChatGPT Technology, Localization Attempts, and Open‑Source Large Models

This article reviews the evolution and challenges of ChatGPT technology, describes the authors' efforts to localize and commercialize the model for the Chinese market, and introduces their open‑source Chinese large‑model initiative, including training methods, performance gaps, and future improvement directions.

ChatGPT · Chinese NLP · Model Localization
11 min read
Model Perspective
Sep 11, 2023 · Artificial Intelligence

Why Chinese Word Segmentation Matters: Techniques, Challenges, and Python Demo

This article explores Chinese word segmentation, illustrates its linguistic nuances with a humorous example, explains the main methods (dictionary‑based, statistical, and deep‑learning approaches), and demonstrates a practical implementation in Python using a simple dictionary algorithm and the popular jieba library.

Chinese NLP · Natural Language Processing · jieba
6 min read
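
The dictionary‑based approach named in the segmentation entry above can be illustrated with forward maximum matching, one common dictionary method. This is a minimal sketch and not necessarily the algorithm used in that article: the function name `forward_max_match`, the toy dictionary `vocab`, and the sample sentence are all assumptions made for illustration.

```python
def forward_max_match(text, vocab, max_len=None):
    """Greedy dictionary segmentation: at each position take the longest
    dictionary word; fall back to a single character when nothing matches."""
    if max_len is None:
        max_len = max((len(word) for word in vocab), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
vocab = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
print(forward_max_match("南京市长江大桥", vocab))  # ['南京市', '长江大桥']
```

The greedy rule is simple and fast, but it cannot resolve ambiguity from context, which is why statistical and neural segmenters are usually preferred in practice.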
Baidu Tech Salon
Aug 8, 2023 · Artificial Intelligence

Tsinghua University Report Ranks Baidu Wenxin Yiyan First Among Chinese Large Language Models

A Tsinghua University evaluation of seven large language models found Baidu’s Wenxin Yiyan topping the domestic rankings with the highest overall score across 20 metrics—especially Chinese semantic understanding and safety—surpassing ChatGPT and tying GPT‑4, while also demonstrating rapid training, inference speed, and broad industry adoption.

AI evaluation · Baidu Wenxin · Chinese NLP
4 min read
DataFunSummit
May 9, 2022 · Artificial Intelligence

TextToKnowledge (解语): Zero‑Shot Chinese Text Knowledge Annotation and Mining Framework

The article introduces TextToKnowledge, an open‑source Baidu platform that provides a unified Chinese term taxonomy (TermTree) and two annotation tools (WordTag and NPTag) to enable zero‑shot text labeling, term linking, and downstream knowledge‑mining applications for various NLP tasks.

Chinese NLP · Knowledge Graph · PaddleNLP
25 min read
DataFunTalk
Jun 8, 2021 · Artificial Intelligence

CCKS 2021 Life Service Domain Knowledge Graph Question Answering Competition

The CCKS 2021 competition invites researchers to develop Chinese Knowledge Base Question Answering (KBQA) systems that leverage a life‑service knowledge graph from Meituan. The announcement covers the task description, dataset information, registration procedure, timeline, and prizes.

CCKS2021 · Chinese NLP · KBQA
5 min read
DataFunSummit
Mar 30, 2021 · Artificial Intelligence

Chinese Short‑Text Entity Linking: Model Design, Multitask Learning, and Experimental Results on the Qianyan Dataset

This article presents a comprehensive approach to Chinese short‑text entity linking, describing the Qianyan dataset, pipeline and end‑to‑end task formulations, sample construction, a multitask model that jointly performs entity ranking and NIL classification, various optimization techniques including confidence learning and adversarial training, and detailed experimental analysis showing state‑of‑the‑art performance.

Chinese NLP · adversarial training · confidence learning
13 min read
Tencent Cloud Developer
Jul 8, 2020 · Artificial Intelligence

Graph-Based Chinese Word Embedding (AlphaEmbedding) for Improved Text Matching

AlphaEmbedding builds a weighted graph linking Chinese words, sub‑words, characters and pinyin, then uses random‑walk‑based node2vec training to produce embeddings that capture orthographic and phonetic similarity, markedly improving recall and ranking for homophones, typos and OOV terms in enterprise search.

Chinese NLP · Graph Computing · Text Matching
17 min read
Xueersi Online School Tech Team
Jan 17, 2020 · Artificial Intelligence

Fine‑tuning BERT for Sentence Pair Similarity in an Online Education Platform

This article describes how a BERT‑based model is fine‑tuned to compute sentence‑pair similarity for improving recommendation accuracy in an online school, detailing the architecture, training mechanisms, code implementation, experimental results, and future extensions such as sentiment analysis.

BERT · Chinese NLP · Fine-tuning
20 min read
Tencent Cloud Developer
Apr 24, 2019 · Artificial Intelligence

Chinese Text Sentiment Classification Using Multi‑layer LSTM: Data Preparation, Model Architecture, and Business Applications

The article details a practical workflow for Chinese sentiment classification in Tencent’s Goose Man product, covering data preparation, word‑segmentation challenges, a six‑layer multi‑LSTM architecture with word embeddings, training results achieving roughly 96% accuracy, and its deployment for automatic detection of misleading and high‑impact user reviews.

Chinese NLP · Keras · LSTM
23 min read
DataFunTalk
Nov 24, 2018 · Artificial Intelligence

Comprehensive Guide to Fine‑Tuning BERT on Chinese Datasets

This article provides a step‑by‑step guide for fine‑tuning Google’s open‑source BERT on Chinese datasets, covering model download, processor customization, code examples, training commands, and insights into the underlying TensorFlow estimator architecture and deployment considerations.

BERT · Chinese NLP · Fine-tuning
11 min read
AntTech
Jan 18, 2018 · Artificial Intelligence

cw2vec: Learning Chinese Word Embeddings with Stroke n-grams

The cw2vec paper, presented at AAAI 2018, introduces a Chinese word embedding method that leverages stroke n‑grams to capture character semantics, proposes a novel loss function, demonstrates consistent improvements over existing models across similarity, analogy, classification and NER tasks, and discusses real‑world AI applications.

AAAI 2018 · AI research · Chinese NLP
7 min read