Detecting COVID‑19 Public Sentiment with Chinese BERT: Competition Walkthrough

This article outlines the COVID‑19 public sentiment detection competition, detailing the three‑class classification task, data cleaning and exploratory analysis, a Chinese BERT baseline that reaches a 0.726 macro‑F1 score, submission pitfalls, and recommended further reading.

Baobao Algorithm Notes

Competition Overview

The “Pandemic Public Sentiment Identification” challenge (https://www.datafountain.cn/competitions/423) was organized by the Beijing Economic and Information Technology Bureau and the China Computer Federation’s Big Data Committee. The goal is to support epidemic control and post‑pandemic recovery by applying big data, AI, and cloud‑computing techniques to social‑media data.

Task Description

Participants must classify each social‑media post into one of three sentiment polarities: -1 (negative), 0 (neutral), or 1 (positive). The official evaluation metric is the macro‑averaged F1 score.
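Macro-F1 is the unweighted mean of the per-class F1 scores, so each of the three classes counts equally no matter how many posts it has. Below is a minimal pure-Python sketch for the -1/0/1 labels. The function name `macro_f1` is our own. On these labels it should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")`.

```python
from typing import Sequence

LABELS = (-1, 0, 1)  # negative, neutral, positive


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], labels=LABELS) -> float:
    """Unweighted mean of the per-class F1 scores over `labels`."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall; 0 when both are 0.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)
```

For example, `macro_f1([-1, 0, 1, 1, 0, -1], [-1, 0, 1, 0, 0, 1])` averages per-class F1 scores of 2/3, 4/5 and 1/2, giving 59/90 ≈ 0.656.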

Data Exploration

Initial inspection revealed noisy label values: besides the three valid classes, the label column contained many unexpected symbols. After these corrupt rows are removed, the cleaned label column contains only -1, 0, and 1. A temporal analysis shows posting activity rising rapidly from 2020‑01‑01 and peaking around the Chinese New Year and the Dr. Li Wenliang incident (approximately 2020‑02‑02 to 2020‑02‑10).
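The label-cleaning step can be sketched as a simple filter. This assumes each row is a dict with a raw string `label` field. That layout is hypothetical, and the competition files may name the column differently. The sketch keeps only values that are exactly -1, 0, or 1 after trimming whitespace, and it counts how many rows it drops.

```python
# Hypothetical mapping from the raw label text to the integer class.
VALID_LABELS = {"-1": -1, "0": 0, "1": 1}


def clean_labels(rows):
    """Keep rows whose trimmed raw label is -1, 0 or 1; return (kept_rows, n_dropped)."""
    cleaned = []
    dropped = 0
    for row in rows:
        raw = str(row.get("label", "")).strip()
        if raw in VALID_LABELS:
            # Copy the row so the input is not mutated, replacing the label with an int.
            cleaned.append({**row, "label": VALID_LABELS[raw]})
        else:
            dropped += 1
    return cleaned, dropped
```

Reporting `n_dropped` makes it easy to check how much of the training set the noisy labels actually cost.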

[Figure: sentiment over time]
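A per-day view like the one in the chart can be approximated by counting posts per date and label. This sketch assumes rows with an integer `label` and a hypothetical `date` field in YYYY-MM-DD form. With that format, sorting the date strings alphabetically also puts them in chronological order.

```python
from collections import Counter, defaultdict


def daily_label_counts(rows):
    """Return {date: Counter({label: n_posts})}, ordered by date string."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["date"]][row["label"]] += 1
    return dict(sorted(counts.items()))


def busiest_day(counts):
    """Return the date with the most posts across all labels."""
    return max(counts, key=lambda day: sum(counts[day].values()))
```

Plotting each label's series from `daily_label_counts` gives the sentiment-over-time curve, and `busiest_day` locates the peak.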

Baseline Model

A baseline was built using the chinese-bert-base transformer model with 5‑fold cross‑validation, reaching a macro‑F1 score of 0.726. The full training and inference script can be obtained by replying “疫情代码” (“epidemic code”) to the author's WeChat official account.
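The 5-fold protocol can be sketched without the model itself. The BERT fine-tuning step is omitted here. A stratified split keeps the -1/0/1 proportions similar in every fold, though we do not know whether the baseline stratifies. At inference time, one common approach averages the five fold models' class probabilities. This is an assumption and is not confirmed for this baseline. `stratified_kfold`, `average_fold_probs`, `probs_to_labels`, and the column order `CLASS_ORDER` are hypothetical names.

```python
import random
from collections import defaultdict

CLASS_ORDER = (-1, 0, 1)  # hypothetical: probability column index -> label


def stratified_kfold(labels, n_splits=5, seed=42):
    """Yield (train_idx, val_idx) pairs whose folds share the overall label mix."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    start = 0  # carry the round-robin position across classes to balance fold sizes
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            fold_of[i] = (start + j) % n_splits
        start = (start + len(idx)) % n_splits
    for k in range(n_splits):
        val_idx = [i for i, f in enumerate(fold_of) if f == k]
        train_idx = [i for i, f in enumerate(fold_of) if f != k]
        yield train_idx, val_idx


def average_fold_probs(fold_probs):
    """Average class probabilities over fold models; fold_probs[model][row][class]."""
    n_models = len(fold_probs)
    return [
        [sum(model[r][c] for model in fold_probs) / n_models for c in range(len(fold_probs[0][r]))]
        for r in range(len(fold_probs[0]))
    ]


def probs_to_labels(probs):
    """Pick the highest-probability column per row and map it back to -1/0/1."""
    return [CLASS_ORDER[max(range(len(row)), key=row.__getitem__)] for row in probs]
```

Each fold's model would be trained on `train_idx` and scored with macro-F1 on `val_idx`. The per-fold test-set probabilities would then be averaged before taking the argmax.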

Submission Tip

When generating the submission file, append a trailing space after each sample ID. Omitting this space triggers a platform error and causes the submission to be rejected.
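A simple way to guarantee the trailing space is to format each row by hand instead of relying on a CSV library's quoting rules. The `id,y` header and the function name are assumptions for illustration. Check the official sample submission for the exact header. The sketch also assumes IDs contain no commas.

```python
def format_submission(ids, preds, header=("id", "y")):
    """Build CSV text with a trailing space after every sample ID (hypothetical header)."""
    if len(ids) != len(preds):
        raise ValueError("ids and preds must have the same length")
    lines = [",".join(header)]
    for sample_id, label in zip(ids, preds):
        # Append the space right after the ID, before the comma separator.
        lines.append(f"{sample_id} ,{label}")
    return "\n".join(lines) + "\n"
```

Write the returned string to a file opened with `open("submission.csv", "w", encoding="utf-8", newline="")`. This stops Python from translating the `\n` line endings.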

Further Reading

Two recommended articles on competition methodology:

Kaggle GM qrfaction: Data competition methodology – https://mp.weixin.qq.com/s?__biz=MzIwNDY1NTU5Mg==&mid=2247483819&idx=1&sn=82e64dbd3f66a9a52accc3f8063aa071

Kaggle GM spongebob: Advanced tricks and top‑10 domestic competition patterns – https://zhuanlan.zhihu.com/p/71609765

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Sentiment Analysis, EDA, data competition, COVID-19, Chinese BERT, macro-F1
Written by Baobao Algorithm Notes

Author of the BaiMian large model, offering technology and industry insights.