
Building and Deploying Language Models for Text Quality Evaluation and Generation

This article explains the concepts, training pipeline, deployment formats, and practical applications of language models—particularly LSTM‑based models—for evaluating and generating text quality in a real‑world rental listing platform, highlighting data preparation, model training, and online serving techniques.

58 Tech

Language models are fundamental probabilistic tools in natural language processing, widely used in tasks such as speech recognition, sentiment analysis, and machine translation, and they can improve the ranking of rental and second‑hand housing posts by assessing textual quality.

The development of language models has progressed from n‑gram approaches to recurrent neural networks (RNN), long short‑term memory (LSTM), gated recurrent units (GRU) and their variants, as illustrated in the evolution diagram.

Training a language model involves three main steps:

1. Data preparation: collect relevant posts; clean the text by removing irrelevant content, punctuation, stop-words, and HTML tags; encode words as one-hot or embedding vectors.
2. Model training: minimize cross-entropy loss with an optimizer such as Adam, tune hyper-parameters (batch_size, num_steps, hidden_size, dropout), apply early stopping, and evaluate with perplexity.
3. Model export: save the trained model for deployment.
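The data-preparation step can be sketched as follows. This is a minimal illustration, not the article's actual pipeline: the cleaning rules, the `<unk>` convention, and the helper names are assumptions.

```python
import re
from collections import Counter

def clean_post(text, stopwords=frozenset()):
    """Strip HTML tags and punctuation, then drop stop-words (illustrative rules)."""
    text = re.sub(r"<[^>]+>", " ", text)   # remove HTML tags
    text = re.sub(r"[^\w\s]", " ", text)   # remove punctuation
    return [t for t in text.lower().split() if t not in stopwords]

def build_vocab(corpus_tokens, min_count=1):
    """Map each word to an integer id; id 0 is reserved for <unk>."""
    counts = Counter(t for doc in corpus_tokens for t in doc)
    vocab = {"<unk>": 0}
    for word, c in counts.most_common():
        if c >= min_count:
            vocab[word] = len(vocab)
    return vocab

def encode(tokens, vocab):
    """Replace words with ids (the step that precedes one-hot or embedding lookup);
    unseen words map to <unk>."""
    return [vocab.get(t, 0) for t in tokens]
```

The integer ids produced by `encode` are what the training step feeds to an embedding layer (or expands into one-hot vectors).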

For deployment, TensorFlow provides several export formats: checkpoint (weights only), GraphDef (graph structure without weights), and SavedModel (graph and weights combined). The SavedModel format is the one used with TensorFlow Serving and the SCF service to deliver online predictions.
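Once a SavedModel is loaded by TensorFlow Serving, clients call it over the standard REST predict API. The sketch below builds such a request; the model name `lm`, the port, and the `input_ids` tensor name are assumptions about the deployment, not details from the article (the `{"instances": [...]}` envelope is TensorFlow Serving's documented format).

```python
import json
import urllib.request

def make_predict_request(token_ids, host="localhost", port=8501, model="lm"):
    """Build a REST request against TensorFlow Serving's predict endpoint
    for a SavedModel exported under `model`. Model name, port, and input
    tensor name are hypothetical; the URL path and JSON envelope follow
    the standard TF Serving REST API."""
    url = f"http://{host}:{port}/v1/models/{model}:predict"
    body = json.dumps({"instances": [{"input_ids": token_ids}]}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
```

In production the same request shape would be issued by the SCF service layer that sits in front of the model.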

In practice, the deployed model is applied to two tasks: (1) text‑quality evaluation, where perplexity serves as a quality score that is combined with manual ratings to influence post ranking, and (2) text generation, where a seed word and top‑K sampling produce diverse continuations to assist search query correction and suggestion.
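The two mechanisms behind those tasks are compact enough to show directly: perplexity is the exponential of the average negative log-likelihood the model assigns to a sequence (lower means more fluent text), and top-K sampling draws the next token only from the K highest-scoring candidates. A minimal numpy sketch, with function names chosen here for illustration:

```python
import numpy as np

def perplexity(token_probs):
    """Perplexity of a sequence given the model's per-token probabilities:
    exp of the average negative log-likelihood. Used as a quality score."""
    return float(np.exp(-np.mean(np.log(token_probs))))

def top_k_sample(logits, k, rng):
    """Sample the next token id from the k highest-scoring candidates,
    renormalising their probabilities; the randomness is what yields
    diverse continuations from the same seed word."""
    top = np.argsort(logits)[-k:]                  # indices of the k best logits
    p = np.exp(logits[top] - logits[top].max())    # stable softmax over top-k
    p /= p.sum()
    return int(rng.choice(top, p=p))
```

A model that is maximally uncertain over a vocabulary of size V yields perplexity V, which is why lower perplexity reads as "the model found this text predictable", i.e. well-formed.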

Overall, LSTM‑based language models remain effective for Chinese text applications, and ongoing advances are expected to further improve performance in quality assessment and generation tasks.

Tags: Deployment, TensorFlow, language model, LSTM, perplexity, text quality
Written by 58 Tech, the official tech channel of 58, a platform for tech innovation, sharing, and communication.
