Tagged articles
10 articles
Network Intelligence Research Center (NIRC)
May 10, 2023 · Artificial Intelligence

How LLaMA Preprocesses Training Data with CCNet Before Model Training

Before training large language models such as LLaMA, Meta AI runs crawled web data (Common Crawl text stored in WET format) through a multi-stage CCNet pipeline. The pipeline deduplicates paragraphs, identifies languages with a fastText classifier and filters by language, and then filters for quality using similarity to Wikipedia text and a linear model trained to recognize pages cited as Wikipedia references.

CCNet · LLaMA · data preprocessing
7 min read
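As a rough illustration of the paragraph-deduplication step mentioned in the CCNet summary, the sketch below hashes each normalized paragraph and drops any paragraph whose hash has already been seen. It assumes a normalization in the spirit of the CCNet paper (lowercasing, replacing digits with 0, stripping punctuation and accents) and keeps a 64-bit SHA-1 prefix per paragraph. The function names are hypothetical, and keeping the first occurrence in a single in-memory set is a simplification, not CCNet's actual sharded implementation.

```python
import hashlib
import re
import unicodedata


def normalize(paragraph: str) -> str:
    """Hypothetical helper approximating CCNet-style normalization."""
    # Lowercase, then decompose accented characters (e.g. "é" -> "e" + combining mark).
    text = unicodedata.normalize("NFD", paragraph.lower())
    # Drop combining marks (Unicode category "Mn"), which removes accents.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    # Replace every digit with a 0 placeholder.
    text = re.sub(r"\d", "0", text)
    # Drop Unicode punctuation (categories starting with "P").
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    # Collapse runs of whitespace.
    return " ".join(text.split())


def paragraph_hash(paragraph: str) -> bytes:
    """First 8 bytes (64 bits) of the SHA-1 digest of the normalized paragraph."""
    return hashlib.sha1(normalize(paragraph).encode("utf-8")).digest()[:8]


def dedup_documents(documents: list[str]) -> list[str]:
    """Remove paragraphs (newline-separated) already seen in earlier text; keep the first copy."""
    seen: set[bytes] = set()
    result = []
    for doc in documents:
        kept = []
        for para in doc.split("\n"):
            if not para.strip():
                continue  # skip blank lines
            h = paragraph_hash(para)
            if h in seen:
                continue  # duplicate of an earlier paragraph (after normalization)
            seen.add(h)
            kept.append(para)
        result.append("\n".join(kept))
    return result


if __name__ == "__main__":
    docs = [
        "Hello, World!\nUnique line 1",
        "hello world\nAnother 2023 line",
        "another 1999 line!",
    ]
    print(dedup_documents(docs))
```

Because digits and punctuation are normalized away, "Another 2023 line" and "another 1999 line!" collide and only the first survives, which shows how aggressive this kind of normalization is.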
Code DAO
Jan 15, 2022 · Artificial Intelligence

Compressing Unsupervised fastText Models 300× Smaller with Near‑Identical NLP Performance

This article shows how the compress-fasttext Python library can shrink a 7 GB fastText word-embedding model to about 21 MB (roughly a 300-fold reduction) while preserving nearly the same accuracy on downstream NLP tasks. It also explains the underlying compression techniques and walks through usage examples and evaluation results.

NLP · compress-fasttext · fastText
9 min read
Code DAO
Dec 12, 2021 · Artificial Intelligence

How to Boost Text Analysis Accuracy on a 2‑Billion‑Word Corpus

This article explains practical techniques for improving NLP model accuracy on massive corpora. It covers the challenges of multi-field text, the choice of word embeddings, a fastText-based regression demo on book-review data, feature-engineering tricks, and a comparison with TF-IDF + LASSO.

NLP · Python · Word2Vec
13 min read
58 Tech
Mar 1, 2021 · Artificial Intelligence

Intelligent QABot for 58.com: Classification and Retrieval Model Exploration

This article describes how 58.com's AI Lab built and continuously improved its QABot intelligent customer-service system. The team designed classification and retrieval models, evaluated fastText, LSTM-DSSM, BERT, and an in-house SPTM framework, and finally fused these models to raise the answer rate and improve the user experience.

AI chatbot · BERT · Model Fusion
9 min read
58 Tech
Jan 27, 2021 · Artificial Intelligence

Model Iteration and Architecture of the BangBang Intelligent Customer Service QABot

This article details the BangBang intelligent customer-service system's overall architecture, core capabilities, and knowledge-base construction, along with its successive model upgrades from fastText to TextCNN, Bi-LSTM, and model fusion. It shows how each iteration improved accuracy, recall, and F1 score until performance stabilized at around 95%.

LSTM · TextCNN · ai
12 min read
Tongcheng Travel Technology Center
Nov 1, 2019 · Artificial Intelligence

Improving International Hotel Room‑Type Merging with Text Similarity and Machine‑Learning Models

This article describes how a large-scale international hotel platform reduced room-type merging errors and user complaints. It standardizes and merges heterogeneous room-type data using rule-based methods, text-similarity algorithms (Jaccard, LCS, and N-gram), and supervised machine-learning classifiers such as fastText.

N-gram · fastText · hotel
9 min read
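To make the three text-similarity measures named in the hotel room-type summary concrete, here is a minimal sketch of token Jaccard, character N-gram Jaccard, and longest-common-subsequence (LCS) similarity. The bigram size, the lowercasing, and the normalization of LCS by average string length are assumptions for illustration; the article's actual features, thresholds, and room-type names are not known here, and the example pairs are hypothetical.

```python
def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Set of character n-grams of a lowercased string."""
    text = text.lower()
    if len(text) < n:
        # Too short for any n-gram: fall back to the whole string (or nothing).
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def token_jaccard(s1: str, s2: str) -> float:
    """Jaccard over lowercased whitespace tokens, so word order is ignored."""
    return jaccard(set(s1.lower().split()), set(s2.lower().split()))


def ngram_similarity(s1: str, s2: str, n: int = 2) -> float:
    """Jaccard over character n-grams, tolerant of small spelling differences."""
    return jaccard(char_ngrams(s1, n), char_ngrams(s2, n))


def lcs_length(s1: str, s2: str) -> int:
    """Longest common subsequence length via dynamic programming with one rolling row."""
    prev = [0] * (len(s2) + 1)
    for ch1 in s1:
        curr = [0]
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(s1: str, s2: str) -> float:
    """LCS length scaled to [0, 1] by the average string length (assumed normalization)."""
    s1, s2 = s1.lower(), s2.lower()
    if not s1 and not s2:
        return 1.0
    return 2 * lcs_length(s1, s2) / (len(s1) + len(s2))


if __name__ == "__main__":
    # Hypothetical room-type names from two suppliers.
    pairs = [
        ("Deluxe Double Room", "Double Room Deluxe"),
        ("Deluxe Double Room", "Deluxe Twin Room"),
    ]
    for x, y in pairs:
        print(x, "|", y,
              round(token_jaccard(x, y), 2),
              round(ngram_similarity(x, y), 2),
              round(lcs_similarity(x, y), 2))
```

Token Jaccard ignores word order but misses spelling variants, N-gram overlap tolerates small typos, and LCS rewards shared character order. Scores like these would typically be compared against a tuned threshold or fed as features into a classifier such as the fastText model the summary mentions.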
Alibaba Cloud Developer
Jun 18, 2019 · Artificial Intelligence

From Word2Vec to Quick-Thought: A Complete Guide to Modern Embeddings

This article reviews the evolution of word and sentence embeddings. It covers foundational theory such as vector semantics and the distributional hypothesis; practical models including Word2Vec, GloVe, fastText, Skip-Thought, and Quick-Thought; and evaluation techniques, along with implementation tips and real-world use cases.

GloVe · NLP · Word2Vec
21 min read
Alibaba Cloud Developer
Jan 16, 2019 · Artificial Intelligence

How Machine Learning Can Clean Up Low‑Quality E‑Commerce Product Materials

This article explains a machine-learning-driven system that automatically detects and classifies poor-quality e-commerce product materials, such as misleading titles, exaggerated claims of benefits, and excessive promotion. The system protects consumers, reduces platform risk, and improves conversion rates during major sales events.

TF-IDF · ai · content moderation
13 min read
Beike Product & Technology
Dec 6, 2018 · Artificial Intelligence

Designing and Deploying a Real‑Estate Dialogue System: Architecture, Challenges, and Practices

The talk outlines how Beike built a conversational AI platform for real estate. It covers the market need for dialogue systems, five technical challenges, data-driven intent and slot extraction, model choices such as fastText and Bi-LSTM-CRF, a three-layer system architecture, and multi-intent handling, and closes with future directions such as 4D viewing and an internal AI dialogue platform.

BILSTM-CRF · NLP · dialogue system
26 min read