How Should We Evaluate OpenAI's Conversational Model ChatGPT?
This article compiles three highly upvoted Zhihu answers that examine OpenAI's ChatGPT, discussing its breakthrough impact on NLP, visual in‑context learning, reinforcement learning from human feedback (RLHF), and the broader implications for AI research and development.
👨💻 Cao Yue notes that many longtime NLP researchers are still stuck in the BERT era, while newer entrants who arrived after GPT‑3 have a broader view of large language models. He highlights how difficult it is for Chinese researchers to access the GPT‑3 API, describing a "bottleneck" that limits domestic progress relative to OpenAI.
Cao also reflects on his own misunderstanding of in‑context learning, recognizing it as an emergent property of large autoregressive models, and discusses challenges of applying it to vision tasks, mentioning works like pix2seq, unified‑io, and UVIM that attempt to unify task representations.
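The in‑context learning Cao describes requires no gradient updates: the model infers the task purely from examples placed in its prompt. A minimal sketch of how such a few‑shot prompt is assembled (the translation pairs here are illustrative, not from the article):

```python
# Few-shot prompt construction: the model is never fine-tuned on this task;
# it picks up the input -> output pattern from the examples in context.
examples = [
    ("cheese", "fromage"),
    ("house", "maison"),
]
query = "book"

prompt = "Translate English to French.\n"
for en, fr in examples:
    prompt += f"{en} -> {fr}\n"
prompt += f"{query} ->"  # the model is expected to continue with "livre"

print(prompt)
```

This pattern is what made unifying vision tasks hard: works like pix2seq and UVIM first have to invent a comparable sequence representation before any "prompting" is possible.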
He further describes the evolution after GPT‑3, including WebGPT, InstructGPT, and alignment research, emphasizing the shift toward new loss signals and the use of reinforcement learning from human feedback (RLHF) to align model outputs with human expectations.
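The "new loss signal" at the heart of RLHF is a reward model trained on human preference comparisons. A minimal sketch of the pairwise ranking loss used in the InstructGPT recipe (the reward values below are toy numbers, not real model outputs):

```python
import math

def reward_ranking_loss(pairs):
    """InstructGPT-style pairwise ranking loss: for each (chosen, rejected)
    pair of scalar rewards, minimize -log(sigmoid(r_chosen - r_rejected)),
    pushing the human-preferred response's reward above the rejected one's."""
    total = 0.0
    for r_chosen, r_rejected in pairs:
        margin = r_chosen - r_rejected
        total += -math.log(1.0 / (1.0 + math.exp(-margin)))
    return total / len(pairs)

# Toy batch of three comparison pairs (hypothetical reward-model scores).
loss = reward_ranking_loss([(1.2, 0.4), (0.3, 0.5), (2.0, 1.1)])
```

Once trained, the reward model scores candidate responses, and those scores drive the reinforcement‑learning stage that aligns the policy with human expectations.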
👨💻 Trinkle shares personal experience participating in ChatGPT training, suggesting promising directions such as re‑applying RL to language models, improving reward‑model and policy training efficiency, and building a highly optimized RLHF library to replace existing tools.
He also lists practical observations: the importance of dataset quality and diversity, the completeness of dialog as a carrier for any kind of content, and speculative ideas about AGI‑era productivity gains in which a single model could replace an entire development team.
👨💻 Gh0u1L5 points out that ChatGPT is wrapped in a sophisticated lock, with engineers deliberately restricting certain capabilities. He demonstrates how the model can be coaxed around political, religious, or dangerous‑behavior restrictions, and notes that early coding bans were quickly lifted after positive public reaction.
The article includes several illustrative images showing ChatGPT interactions, a virtual machine example inside ChatGPT, and screenshots of the model’s restriction bypass attempts.
Finally, a concise list of current ChatGPT limitations is provided, covering sensitive political and religious topics, role‑playing specific personalities, instructions for dangerous actions, moral dilemmas, and queries requiring internet access.
The author concludes that while many restrictions aim to prevent misuse, they also reflect concerns about public perception and potential panic, underscoring the delicate balance between AI capability and societal impact.
Sensitive political topics and figures
Sensitive religious topics and figures
Role‑playing specific personalities
Instructions for dangerous behavior
Moral or subjective dilemma questions
Queries requiring real‑time internet access
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.