Intelligent Writing Assistant: TexSmart and Effidit Systems, Multi‑Level Unsupervised Text Rewriting, and the New ParaScore Evaluation Metric
This article presents Tencent AI Lab's intelligent writing assistant, detailing the TexSmart text‑understanding platform, the Effidit writing‑assistant features, a multi‑level controllable unsupervised text‑rewriting method, and a novel ParaScore metric that jointly measures semantic similarity and diversity for paraphrase evaluation.
Jiang Haiyun, a senior researcher at Tencent AI Lab, introduces the intelligent writing assistant in a four‑part talk: an overview of the TexSmart text‑understanding system, the Effidit (WenYong) writing assistant, a multi‑level controllable unsupervised text‑rewriting method, and a new evaluation metric for paraphrase generation.
TexSmart provides core NLP capabilities such as tokenization, POS tagging, fine‑grained NER (over 1,000 categories with hierarchical depth up to seven), semantic association, syntactic parsing, semantic role labeling, text matching, and a text‑graph that stores common relations (synonyms, similar words, hypernyms, etc.). The platform offers both lightweight models (CRF, DNN) for speed‑critical scenarios and BERT‑based models for high‑accuracy needs, balancing industrial and academic requirements.
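The text graph mentioned above can be illustrated with a minimal sketch. The relation names, entries, and class design below are invented for illustration; they are not TexSmart's actual data, storage format, or API:

```python
# Minimal sketch of a text graph that stores word relations
# (synonym, hypernym, ...) and supports a hypernym-chain query.
# All entries and names here are illustrative, not TexSmart's data.
from collections import defaultdict

class TextGraph:
    def __init__(self):
        # relation name -> word -> set of related words
        self.relations = defaultdict(lambda: defaultdict(set))

    def add(self, relation, word, related):
        self.relations[relation][word].add(related)

    def related(self, relation, word):
        return sorted(self.relations[relation][word])

    def hypernym_chain(self, word):
        """Follow 'hypernym' edges upward until no parent exists."""
        chain, seen, current = [], {word}, word
        while True:
            parents = self.relations["hypernym"].get(current)
            if not parents:
                break
            parent = sorted(parents)[0]  # pick deterministically
            if parent in seen:
                break  # guard against accidental cycles
            chain.append(parent)
            seen.add(parent)
            current = parent
        return chain

graph = TextGraph()
graph.add("synonym", "car", "automobile")
graph.add("hypernym", "car", "vehicle")
graph.add("hypernym", "vehicle", "artifact")

print(graph.related("synonym", "car"))  # ['automobile']
print(graph.hypernym_chain("car"))      # ['vehicle', 'artifact']
```

A deep hypernym chain of this kind is what makes hierarchical, fine‑grained entity types (the seven‑level depth mentioned above) queryable in practice.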
The Effidit writing assistant (also called WenYong) builds on TexSmart and offers six main functions: text error correction (deletion, insertion, substitution), phrase and sentence completion (retrieval‑based and generative), phrase polishing and sentence rewriting/expansion, example‑sentence recommendation, a bilingual cloud input method, and an academic version that supports cross‑language example retrieval and paper search.
The multi‑level controllable unsupervised rewriting framework (MCPG) manipulates three aspects of a generated paraphrase: global semantics (via dropout‑perturbed semantic vectors), local lexical constraints (preserving keywords identified by NER), and overall style (controlled by style‑transfer vectors such as sci‑fi, military, wuxia, or officialdom). Experiments show that dropout magnitude directly influences output diversity and similarity.
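The dropout‑perturbation idea behind the global‑semantics control can be sketched generically. The vector values and dropout mechanics below are a stand‑alone illustration of the principle (a larger dropout rate perturbs the semantic vector more, which the talk links to more diverse, less similar outputs), not the MCPG implementation:

```python
# Sketch of dropout-based perturbation of a sentence's semantic
# vector. A higher dropout rate p zeroes more dimensions, pushing
# the perturbed vector further from the original; in MCPG-style
# systems this translates into more diverse paraphrases.
import math
import random

def dropout_perturb(vec, p, seed=None):
    """Zero each dimension with probability p; rescale survivors
    by 1/(1-p) (inverted dropout) so the expectation is unchanged."""
    rng = random.Random(seed)
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in vec]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

vec = [0.2, -0.5, 0.1, 0.7, -0.3, 0.4, 0.6, -0.1]  # toy embedding
mild = dropout_perturb(vec, p=0.1, seed=0)
strong = dropout_perturb(vec, p=0.5, seed=0)

# Stronger dropout moves the vector further from the original,
# i.e. lower cosine similarity -> higher expected output diversity.
print(cosine(vec, mild), cosine(vec, strong))
```

Tuning `p` thus acts as a single knob trading off semantic fidelity against diversity, matching the experimental observation above.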
Existing paraphrase metrics (BLEU, ROUGE, etc.) focus on similarity and ignore diversity, leading to poor correlation with human judgments. The authors propose ParaScore, which combines a similarity component (Sim) with a diversity component (DS). ParaScore adapts between reference‑free and reference‑based modes depending on the distance between candidate, reference, and source sentences, achieving higher Pearson and Spearman correlations on Twitter‑Para and BQ‑Para datasets.
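A simplified, reference‑based variant of this idea can be sketched as follows. The real ParaScore uses a pretrained similarity model (e.g., BERTScore‑style embeddings) for Sim; here `difflib`'s character‑overlap ratio is a crude stand‑in, the DS component is a normalized lexical distance from the source, and the weight `w` is an arbitrary choice, so the numbers are illustrative only:

```python
# Sketch of a ParaScore-style metric: reward candidates that are
# close to the reference (Sim) AND lexically far from the source
# (DS). difflib's ratio is a crude stand-in for the learned
# similarity model used by the real metric; w is arbitrary here.
import difflib

def sim(a, b):
    """Stand-in similarity; ParaScore proper uses an embedding model."""
    return difflib.SequenceMatcher(None, a, b).ratio()

def diversity(candidate, source):
    """Normalized lexical distance of the candidate from the source."""
    return 1.0 - difflib.SequenceMatcher(None, candidate, source).ratio()

def parascore(candidate, reference, source, w=0.5):
    return sim(candidate, reference) + w * diversity(candidate, source)

source = "the movie was really good"
reference = "the film was excellent"
copy_cand = "the movie was really good"      # parrots the source
para_cand = "the film was truly excellent"   # genuine rewording

# A verbatim copy earns zero diversity credit, so a real paraphrase
# close to the reference outscores it - the behavior BLEU/ROUGE,
# which look only at similarity, fail to capture.
print(parascore(copy_cand, reference, source))
print(parascore(para_cand, reference, source))
```

Punishing source copies is exactly the failure mode of similarity‑only metrics that motivates the DS term.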
Overall, the talk demonstrates how a robust NLP backbone (TexSmart) can power a versatile writing assistant (Effidit), how controllable unsupervised rewriting can be achieved, and why new evaluation metrics like ParaScore are essential for future paraphrase research.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.