How a Chinese Developer Recreated ChatGPT with Google’s PaLM and RLHF
A Chinese engineer has published an open-source, ChatGPT-style implementation that pairs Google's PaLM architecture with reinforcement learning from human feedback. This article covers the technical steps, the challenges, and the community reaction to the project.
A Chinese developer, Phillip Wang, has recreated a ChatGPT‑like system by leveraging Google’s PaLM architecture and reinforcement learning from human feedback (RLHF). The project has quickly attracted attention, earning over 1.7k stars on GitHub.
Core Technology: PaLM and RLHF
PaLM (Pathways Language Model) is Google's 540-billion-parameter general-purpose language model, announced in April 2022 and trained with the Pathways system. Its authors include Jacob Devlin, lead author of BERT. PaLM performs well at code generation, conversation, and language understanding, and achieved state-of-the-art few-shot results on many tasks.
RLHF, the technique OpenAI used to train InstructGPT, aligns model responses with human expectations and reduces harmful outputs. As applied to InstructGPT, the process consists of three steps:
Collect human‑written demonstration answers to fine‑tune a baseline GPT‑3 model.
Gather multiple model outputs for the same prompts, have humans rank them, and train a reward model on this data.
Use the reward model as a reward function and apply Proximal Policy Optimization (PPO) to fine‑tune the GPT‑3 policy, maximizing the reward.
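The objectives behind steps two and three can be sketched in a few lines. This is a minimal illustration under stated assumptions, not OpenAI's or the project's code: the function names are hypothetical, the rewards and log-probabilities are plain floats standing in for model outputs, and the KL penalty that InstructGPT adds to the PPO reward is omitted.

```python
import math


def pairwise_ranking_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Reward-model loss for one human comparison.

    Computes -log(sigmoid(r_preferred - r_rejected)), which is small when the
    reward model scores the human-preferred answer higher than the rejected one.
    """
    diff = reward_preferred - reward_rejected
    # -log(sigmoid(diff)) == log(1 + exp(-diff)); branch to avoid overflow.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def ppo_clipped_objective(
    logp_new: float, logp_old: float, advantage: float, clip_eps: float = 0.2
) -> float:
    """PPO clipped surrogate objective for a single action (to be maximized).

    The probability ratio between the updated and the old policy is clipped to
    [1 - clip_eps, 1 + clip_eps] so one update cannot move the policy too far.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Reward model agrees with the human ranking: low loss.
    print(pairwise_ranking_loss(2.0, 0.0))
    # Reward model disagrees with the human ranking: high loss.
    print(pairwise_ranking_loss(0.0, 2.0))
    # The ratio is e^1, but the objective is capped at (1 + 0.2) * advantage.
    print(ppo_clipped_objective(logp_new=1.0, logp_old=0.0, advantage=1.0))
```

In practice the advantage would be derived from the reward model's score for a sampled response, and both functions would run over batches of tensors rather than single floats.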
Recreating ChatGPT
The author combined the two ideas above, the PaLM architecture and RLHF, to build an open-source ChatGPT clone. The workflow has three main stages:
Train a PaLM‑style autoregressive transformer from scratch (a daunting computational task).
Train a lightweight reward model using LoRA (low-rank adaptation), a parameter-efficient method for fine-tuning large language models.
Combine the pretrained model and reward model, then apply RLHF to fine‑tune the system.
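To show why LoRA keeps the reward model lightweight, here is a minimal sketch of the idea in plain Python. It is an illustration under assumptions, not the repository's implementation: the helper names are hypothetical, and matrices are nested lists rather than tensors. LoRA freezes a weight matrix W (d x k) and learns only a low-rank update B·A, with B of shape d x r and A of shape r x k.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, lora_b, lora_a, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the LoRA rank.

    W stays frozen; only B (d x r) and A (r x k) would be trained.
    """
    rank = len(lora_a)  # A has r rows
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_trainable_params(d, k, rank):
    """Number of trainable parameters in B (d x r) plus A (r x k)."""
    return rank * (d + k)


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2 x 2 weight
    lora_b = [[1.0], [0.0]]       # 2 x 1, rank r = 1
    lora_a = [[0.0, 2.0]]         # 1 x 2
    print(lora_effective_weight(w, lora_b, lora_a, alpha=1.0))

    # Illustrative sizes only: a 4096 x 4096 layer with rank 8.
    print(4096 * 4096, "full parameters vs", lora_trainable_params(4096, 4096, 8))
```

B is normally initialized to zero, so the adapted layer starts out identical to the frozen one and only drifts as the low-rank factors are trained.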
The result of these steps is an open-source ChatGPT-style model. The project, however, provides only the architecture and code, not pretrained weights, so users must carry out the first stage themselves, which makes it especially challenging.
Challenges and Community Reaction
Key obstacles include massive compute requirements, the sheer size of the model, and difficulty acquiring high‑quality training data. Some observers doubt the practicality of the project, while others view it as a positive sign that major AI breakthroughs quickly spawn open‑source alternatives.
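A rough back-of-the-envelope calculation gives a sense of the model-size obstacle. The sketch below assumes 2 bytes per parameter (16-bit weights) and a hypothetical accelerator with 80 GB of memory. Both figures are illustrative assumptions rather than numbers from the project, and the estimate covers weights only, not optimizer state, gradients, or activations.

```python
import math

PALM_PARAMS = 540_000_000_000  # 540 billion parameters, as cited for PaLM


def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_devices(total_gb: float, device_gb: float) -> int:
    """Smallest number of devices whose combined memory holds the weights."""
    return math.ceil(total_gb / device_gb)


if __name__ == "__main__":
    weights_gb = weight_memory_gb(PALM_PARAMS, bytes_per_param=2)  # assumed 16-bit weights
    print(f"Weights alone: {weights_gb:.0f} GB")
    # Assumed 80 GB per device; training needs considerably more than this floor.
    print(f"Devices needed just to store them: {min_devices(weights_gb, 80.0)}")
```

Training adds memory on top of this for gradients and optimizer state, so the real requirement for the first stage is substantially higher than the storage floor computed here.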
Phillip Wang has a history of replicating high‑profile AI models such as DALL·E 2 and AlphaFold 2. Similar community efforts include LAION’s Open Assistant, which aims to develop an open‑source chat AI through crowdsourced contributions.
Resources
For those interested in exploring the code, the following repositories are available:
PaLM‑rlhf‑pytorch: https://github.com/lucidrains/PaLM-rlhf-pytorch
Open Assistant: https://github.com/LAION-AI/Open-Assistant
These resources provide a starting point for experimenting with large‑scale language models and RLHF techniques.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
