Tagged articles
7 articles
Tech Musings
Feb 7, 2026 · Fundamentals

How to Clean and Convert a Chinese Poetry Dataset for RAG Projects

This guide explains how to clean a Chinese poetry corpus by removing special characters, filtering out short entries, and converting Traditional Chinese characters to Simplified Chinese. It uses Python validation functions, batch file processing, and OpenCC conversion run under WSL, then persists the results as JSON.

JSON · RAG · data cleaning
12 min read
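
The cleaning steps named in the summary above (stripping special characters, dropping short entries, saving as JSON) could look roughly like the sketch below. This is not the article's code: the allowed character set, the `min_chars` threshold, and the function names `clean_line`, `clean_corpus`, and `to_json` are all assumptions for illustration, and the Traditional-to-Simplified conversion step with OpenCC is left out.

```python
import json
import re

# Assumption: keep CJK ideographs plus a few common Chinese punctuation marks
# and drop everything else (markup, placeholder boxes, Latin text, whitespace).
_DISALLOWED = re.compile(r"[^\u4e00-\u9fff，。？！、；：]")


def clean_line(text):
    """Remove every character outside the allowed set."""
    return _DISALLOWED.sub("", text)


def clean_corpus(poems, min_chars=10):
    """Clean each entry and keep only those with at least min_chars characters."""
    cleaned = []
    for poem in poems:
        text = clean_line(poem)
        if len(text) >= min_chars:
            cleaned.append(text)
    return cleaned


def to_json(poems):
    """Serialize the cleaned entries; ensure_ascii=False keeps Chinese readable."""
    return json.dumps(poems, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    raw = ["床前明月光，□疑是地上霜。<br>", "春眠"]
    print(to_json(clean_corpus(raw, min_chars=5)))
```

Filtering after cleaning, rather than before, means an entry that consists mostly of debris is judged by its usable length.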
JavaEdge
Mar 15, 2025 · Artificial Intelligence

Boost NLP Model Performance with n-gram Feature Engineering

This article explains why feature engineering is crucial for NLP tasks, introduces n‑gram enhancements, provides Python implementations for generating bi‑gram and higher‑order features, demonstrates dynamic padding for text length standardization, and offers practical deployment tips such as feature dimension control and monitoring.

Deep Learning · N-gram · NLP
7 min read
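
As a rough illustration of the two ideas in the summary above, n-gram generation and length standardization, here is a minimal sketch. The helper names `ngrams` and `pad_sequence`, the padding value `0`, and the choice to truncate from the right are assumptions, not taken from the article.

```python
def ngrams(tokens, n=2):
    """Return the contiguous n-grams of a token list as tuples, in order."""
    # zip over n shifted views of the list: tokens[0:], tokens[1:], ...
    return list(zip(*(tokens[i:] for i in range(n))))


def pad_sequence(seq, max_len, pad_value=0):
    """Truncate or right-pad a sequence so it has exactly max_len items."""
    trimmed = list(seq[:max_len])
    return trimmed + [pad_value] * (max_len - len(trimmed))


if __name__ == "__main__":
    token_ids = [1, 3, 2, 1, 5, 3]
    print(ngrams(token_ids, 2))        # bi-grams
    print(ngrams(token_ids, 3))        # tri-grams
    print(pad_sequence(token_ids, 8))  # padded to a fixed length
```

In practice each distinct n-gram is usually mapped to its own integer ID above the base vocabulary, which is where the feature dimension control mentioned in the summary comes in.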
Code DAO
Dec 21, 2021 · Artificial Intelligence

Four Keras Techniques for Preprocessing Text for Deep Learning

This article explains four Keras utilities: text_to_word_sequence, hashing_trick, one_hot, and Tokenizer. It shows how they convert raw text into token lists, hash indices, integer encodings, and document matrices, with code examples and sample outputs.

Keras · Tokenizer · hashing_trick
6 min read
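
To make the kind of conversion described above concrete without depending on a particular Keras version, the sketch below approximates two of the steps in plain Python: splitting text into lowercase tokens and assigning integer indices by word frequency. It is a standard-library stand-in, not the Keras API; the names `to_word_sequence`, `build_word_index`, and `encode` are hypothetical, and the punctuation filter is an assumption modeled on typical defaults.

```python
from collections import Counter

# Assumption: punctuation and whitespace characters treated as separators.
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def to_word_sequence(text, filters=FILTERS, lower=True, split=" "):
    """Lowercase the text, replace filtered characters, and split into tokens."""
    if lower:
        text = text.lower()
    table = str.maketrans({ch: split for ch in filters})
    return [tok for tok in text.translate(table).split(split) if tok]


def build_word_index(texts):
    """Map each word to an integer, most frequent word first, starting at 1."""
    counts = Counter()
    for text in texts:
        counts.update(to_word_sequence(text))
    # sorted() is stable, so ties keep first-seen order.
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return {word: i for i, (word, _) in enumerate(ranked, start=1)}


def encode(text, word_index):
    """Convert text to a list of integer IDs, skipping unknown words."""
    return [word_index[w] for w in to_word_sequence(text) if w in word_index]


if __name__ == "__main__":
    docs = ["The quick brown fox, jumped!", "The lazy dog."]
    index = build_word_index(docs)
    print(to_word_sequence(docs[0]))
    print(encode("the dog jumped", index))
```

Index 0 is left unused here so that it stays available as a padding value.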
Yuewen Technology
Oct 15, 2021 · Artificial Intelligence

How Yuedu's TTS Platform Automates High‑Quality Audiobook Production

This article explains how Yuedu's TTS synthesis platform serves the fast-growing audiobook market. It combines AI-driven text preprocessing, role graph construction, content structuring, and emotion and sound-effect recognition with a streamlined post-processing workflow to generate multi-character, emotionally rich audiobooks at scale.

Audio Synthesis · Emotion Recognition · NLP
13 min read
Yanxuan Tech Team
Apr 20, 2020 · Artificial Intelligence

How AI-Driven Clustering Boosts Smart Customer Service Knowledge Bases

This article outlines an AI-powered workflow for building and enriching a business knowledge base for intelligent customer service. It covers preprocessing, intent detection, deep and shallow semantic feature engineering, hierarchical bucket clustering, and automated summary extraction, with the goal of improving FAQ coverage and reducing manual workload.

AI · Knowledge Base · NLP
15 min read