Intelligent Writing Assistant: TexSmart and Effidit Systems, Multi‑Level Unsupervised Text Rewriting, and the New ParaScore Evaluation Metric
This article presents Tencent AI Lab's intelligent writing assistant, detailing the TexSmart text‑understanding platform, the Effidit writing‑assistant features, a multi‑level controllable unsupervised text‑rewriting method, and a novel ParaScore metric that jointly measures semantic similarity and diversity for paraphrase evaluation.
Jiang Haiyun, a senior researcher at Tencent AI Lab, introduces the intelligent writing assistant in a four‑part talk: an overview of the TexSmart text‑understanding system, the Effidit (WenYong) writing assistant, a multi‑level controllable unsupervised text‑rewriting method, and a new evaluation metric for paraphrase generation.
TexSmart provides core NLP capabilities such as tokenization, POS tagging, fine‑grained NER (over 1,000 categories with hierarchical depth up to seven), semantic association, syntactic parsing, semantic role labeling, text matching, and a text‑graph that stores common relations (synonyms, similar words, hypernyms, etc.). The platform offers both lightweight models (CRF, DNN) for speed‑critical scenarios and BERT‑based models for high‑accuracy needs, balancing industrial and academic requirements.
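The text graph mentioned above can be illustrated with a minimal sketch. The relation names, entries, and class design below are invented for illustration; they are not TexSmart's actual data, storage format, or API:

```python
# Minimal sketch of a text graph that stores word relations
# (synonym, hypernym, ...) and supports a hypernym-chain query.
# All entries and names here are illustrative, not TexSmart's data.
from collections import defaultdict

class TextGraph:
    def __init__(self):
        # relation name -> word -> set of related words
        self.relations = defaultdict(lambda: defaultdict(set))

    def add(self, relation, word, related):
        self.relations[relation][word].add(related)

    def related(self, relation, word):
        return sorted(self.relations[relation][word])

    def hypernym_chain(self, word):
        """Follow 'hypernym' edges upward until no parent exists."""
        chain, seen, current = [], {word}, word
        while True:
            parents = self.relations["hypernym"].get(current)
            if not parents:
                break
            parent = sorted(parents)[0]  # pick deterministically
            if parent in seen:
                break  # guard against accidental cycles
            chain.append(parent)
            seen.add(parent)
            current = parent
        return chain

graph = TextGraph()
graph.add("synonym", "car", "automobile")
graph.add("hypernym", "car", "vehicle")
graph.add("hypernym", "vehicle", "artifact")

print(graph.related("synonym", "car"))  # ['automobile']
print(graph.hypernym_chain("car"))      # ['vehicle', 'artifact']
```

A deep hypernym chain of this kind is what makes hierarchical, fine‑grained entity types (the seven‑level depth mentioned above) queryable in practice.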
The Effidit writing assistant (also called WenYong) builds on TexSmart and offers six main functions: text error correction (deletion, insertion, substitution), phrase and sentence completion (retrieval‑based and generative), phrase polishing and sentence rewriting/expansion, example‑sentence recommendation, a bilingual cloud input method, and an academic version that supports cross‑language example retrieval and paper search.
The multi‑level controllable unsupervised rewriting framework (MCPG) manipulates three aspects of a generated paraphrase: global semantics (via dropout‑perturbed semantic vectors), local lexical constraints (preserving keywords identified by NER), and overall style (controlled by style‑transfer vectors such as sci‑fi, military, wuxia, or officialdom). Experiments show that dropout magnitude directly influences output diversity and similarity.
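The dropout‑perturbation idea behind the global‑semantics control can be sketched generically. The vector values and dropout mechanics below are a stand‑alone illustration of the principle (a larger dropout rate perturbs the semantic vector more, which the talk links to more diverse, less similar outputs), not the MCPG implementation:

```python
# Sketch of dropout-based perturbation of a sentence's semantic
# vector. A higher dropout rate p zeroes more dimensions, pushing
# the perturbed vector further from the original; in MCPG-style
# systems this translates into more diverse paraphrases.
import math
import random

def dropout_perturb(vec, p, seed=None):
    """Zero each dimension with probability p; rescale survivors
    by 1/(1-p) (inverted dropout) so the expectation is unchanged."""
    rng = random.Random(seed)
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in vec]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

vec = [0.2, -0.5, 0.1, 0.7, -0.3, 0.4, 0.6, -0.1]  # toy embedding
mild = dropout_perturb(vec, p=0.1, seed=0)
strong = dropout_perturb(vec, p=0.5, seed=0)

# Stronger dropout moves the vector further from the original,
# i.e. lower cosine similarity -> higher expected output diversity.
print(cosine(vec, mild), cosine(vec, strong))
```

Tuning `p` thus acts as a single knob trading off semantic fidelity against diversity, matching the experimental observation above.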
Existing paraphrase metrics (BLEU, ROUGE, etc.) focus on similarity and ignore diversity, leading to poor correlation with human judgments. The authors propose ParaScore, which combines a similarity component (Sim) with a diversity component (DS). ParaScore adapts between reference‑free and reference‑based modes depending on the distance between candidate, reference, and source sentences, achieving higher Pearson and Spearman correlations on Twitter‑Para and BQ‑Para datasets.
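A simplified, reference‑based variant of this idea can be sketched as follows. The real ParaScore uses a pretrained similarity model (e.g., BERTScore‑style embeddings) for Sim; here `difflib`'s character‑overlap ratio is a crude stand‑in, the DS component is a normalized lexical distance from the source, and the weight `w` is an arbitrary choice, so the numbers are illustrative only:

```python
# Sketch of a ParaScore-style metric: reward candidates that are
# close to the reference (Sim) AND lexically far from the source
# (DS). difflib's ratio is a crude stand-in for the learned
# similarity model used by the real metric; w is arbitrary here.
import difflib

def sim(a, b):
    """Stand-in similarity; ParaScore proper uses an embedding model."""
    return difflib.SequenceMatcher(None, a, b).ratio()

def diversity(candidate, source):
    """Normalized lexical distance of the candidate from the source."""
    return 1.0 - difflib.SequenceMatcher(None, candidate, source).ratio()

def parascore(candidate, reference, source, w=0.5):
    return sim(candidate, reference) + w * diversity(candidate, source)

source = "the movie was really good"
reference = "the film was excellent"
copy_cand = "the movie was really good"      # parrots the source
para_cand = "the film was truly excellent"   # genuine rewording

# A verbatim copy earns zero diversity credit, so a real paraphrase
# close to the reference outscores it - the behavior BLEU/ROUGE,
# which look only at similarity, fail to capture.
print(parascore(copy_cand, reference, source))
print(parascore(para_cand, reference, source))
```

Punishing source copies is exactly the failure mode of similarity‑only metrics that motivates the DS term.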
Overall, the talk demonstrates how a robust NLP backbone (TexSmart) can power a versatile writing assistant (Effidit), how controllable unsupervised rewriting can be achieved, and why new evaluation metrics like ParaScore are essential for future paraphrase research.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.