Detecting COVID‑19 Public Sentiment with Chinese BERT: Competition Walkthrough
This article outlines the COVID‑19 public sentiment detection competition, detailing the three‑class classification task, data cleaning and exploratory analysis, a Chinese BERT baseline that reaches a 0.726 macro‑F1 score, submission pitfalls, and recommended further reading.
Competition Overview
The “Pandemic Public Sentiment Identification” challenge (https://www.datafountain.cn/competitions/423) was organized by the Beijing Economic and Information Technology Bureau and the China Computer Federation’s Big Data Committee. The goal is to support epidemic control and post‑pandemic recovery by applying big data, AI, and cloud‑computing techniques to social‑media data.
Task Description
Participants must classify each social‑media post into one of three sentiment polarities: -1 (negative), 0 (neutral), or 1 (positive). The official evaluation metric is the macro‑averaged F1 score.
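To make the metric concrete, here is a minimal pure-Python sketch of macro-averaged F1 for the three classes. It is not the organizers' scoring code; the function name `macro_f1` and the toy labels are illustrative assumptions.

```python
def macro_f1(y_true, y_pred, labels=(-1, 0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true and no predicted samples contributes 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, made up for illustration only.
y_true = [-1, 0, 0, 1, 1, 1]
y_pred = [-1, 0, 1, 1, 1, 0]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7222
```

Because every class carries equal weight in the average, a model that neglects a minority class is penalized more heavily than it would be under plain accuracy.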
Data Exploration
Initial inspection revealed noisy label values—many unexpected symbols appeared alongside the three valid classes. After removing these corrupt entries, the cleaned label column contains only -1, 0, and 1. A temporal analysis shows a rapid increase in posting activity from 2020‑01‑01, with a peak around the Chinese New Year and the Dr. Li Wenliang incident (approximately 2020‑02‑02 to 2020‑02‑10).
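The cleaning and daily-count steps could be sketched as follows. The field names (`id`, `date`, `text`, `label`), the date format, and the sample rows are all assumptions made for illustration, not the competition's actual schema. Stripping whitespace before validating a label is likewise a design choice of this sketch.

```python
from collections import Counter

VALID_LABELS = {"-1", "0", "1"}


def clean_rows(rows, label_key="label"):
    """Keep rows whose label, after stripping whitespace, is a valid class."""
    cleaned = []
    for row in rows:
        label = str(row.get(label_key, "")).strip()
        if label in VALID_LABELS:
            cleaned.append({**row, label_key: int(label)})
    return cleaned


def posts_per_day(rows, date_key="date"):
    """Count posts per calendar day, assuming dates start with YYYY-MM-DD."""
    return Counter(row[date_key][:10] for row in rows)


# Made-up rows: one has a corrupt label and should be dropped.
rows = [
    {"id": "1", "date": "2020-02-06 21:10", "text": "...", "label": "0"},
    {"id": "2", "date": "2020-02-07 08:30", "text": "...", "label": "-1"},
    {"id": "3", "date": "2020-02-07 09:00", "text": "...", "label": "#"},
    {"id": "4", "date": "2020-02-07 10:15", "text": "...", "label": " 1 "},
]
cleaned = clean_rows(rows)
daily = posts_per_day(cleaned)
print(len(cleaned), sorted(daily.items()))
```

Plotting the resulting per-day counts is one way to look for the kind of posting peak described above.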
Baseline Model
A baseline was built using the chinese-bert-base transformer model with 5‑fold cross‑validation. The resulting macro‑F1 score is 0.726. The full training and inference script can be obtained by replying with the keyword “疫情代码” (“epidemic code”) to the competition backend.
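The article does not show how the folds were built or combined, so the following is only a sketch of one common arrangement. It makes two assumptions: the split is stratified by label, and the five models' class probabilities are averaged at inference time. Both helper names are hypothetical, and the BERT training step itself is left as a comment.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, n_splits=5, seed=42):
    """Yield (train_idx, valid_idx) pairs with class proportions roughly preserved."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    for k in range(n_splits):
        valid_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, valid_idx


def average_fold_probs(fold_probs, classes=(-1, 0, 1)):
    """Average per-fold class probabilities and pick the best class per sample."""
    n_folds = len(fold_probs)
    predictions = []
    for sample_probs in zip(*fold_probs):
        mean = [sum(p[c] for p in sample_probs) / n_folds for c in range(len(classes))]
        predictions.append(classes[mean.index(max(mean))])
    return predictions


# Made-up label distribution, for illustration only.
labels = [-1] * 10 + [0] * 25 + [1] * 15
for train_idx, valid_idx in stratified_kfold(labels):
    # Fine-tune one model on train_idx and score it on valid_idx here.
    print(len(train_idx), len(valid_idx))
```

Averaging probabilities rather than taking a majority vote over hard labels keeps each model's confidence in play, which can matter when the classes are imbalanced.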
Submission Tip
When generating the submission file, append a trailing space after each sample ID. Omitting this space triggers a platform error and causes the submission to be rejected.
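A minimal sketch of writing such a file with Python's `csv` module follows. The header names `id` and `y`, the helper name `write_submission`, and the sample IDs are assumptions for illustration. Check the competition's sample submission for the actual column names.

```python
import csv
import io


def write_submission(handle, ids, predictions):
    """Write one row per sample, appending a single trailing space to each ID."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["id", "y"])  # header names are an assumption
    for sample_id, label in zip(ids, predictions):
        # Strip first so an ID never ends up with two trailing spaces.
        writer.writerow([f"{str(sample_id).strip()} ", label])


# Made-up IDs and predictions, for illustration only.
buffer = io.StringIO()
write_submission(buffer, ["1001", "1002"], [0, -1])
print(repr(buffer.getvalue()))  # 'id,y\n1001 ,0\n1002 ,-1\n'
```

To write to disk instead, pass a file opened with `open(path, "w", newline="", encoding="utf-8")` in place of the `StringIO` buffer.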
Further Reading
Two methodological articles are frequently cited for competition strategies:
Kaggle GM qrfaction: Data competition methodology – https://mp.weixin.qq.com/s?__biz=MzIwNDY1NTU5Mg==&mid=2247483819&idx=1&sn=82e64dbd3f66a9a52accc3f8063aa071
Kaggle GM spongebob: Advanced tricks and top‑10 domestic competition patterns – https://zhuanlan.zhihu.com/p/71609765
Baobao Algorithm Notes
Author of the BaiMian large model, offering technology and industry insights.
