
How a Chinese Developer Recreated ChatGPT with Google’s PaLM and RLHF

A Chinese engineer set out to replicate ChatGPT by combining Google’s PaLM architecture with reinforcement learning from human feedback. This article covers the technical steps, the challenges, and the community reactions to this ambitious open-source AI project.

21CTO

Dec 30, 2022


A Chinese developer, Phillip Wang, has recreated a ChatGPT‑like system by leveraging Google’s PaLM architecture and reinforcement learning from human feedback (RLHF). The project has quickly attracted attention, earning over 1.7k stars on GitHub.

Core Technology: PaLM and RLHF

PaLM (Pathways Language Model) is Google’s 540-billion-parameter general-purpose language model, announced in April 2022 and trained with the Pathways system. Its contributors include Jacob Devlin, known for BERT. PaLM performs well at code generation, conversation, and language understanding, and it achieved state-of-the-art few-shot results on many tasks.

RLHF, the technique OpenAI used to train InstructGPT, aligns a model’s responses with human expectations and reduces harmful outputs. As applied to InstructGPT, the process consists of three steps:

1. Collect human-written demonstration answers and use them to fine-tune a baseline GPT-3 model.

2. Gather multiple model outputs for the same prompts, have humans rank them, and train a reward model on this data.

3. Use the reward model as a reward function and apply Proximal Policy Optimization (PPO) to fine-tune the GPT-3 policy, maximizing the reward.
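The second and third steps above each rest on a short formula, sketched below for a single example. This is an illustration under stated assumptions, not code from InstructGPT or from the project discussed here: the function names are hypothetical, the reward model is reduced to two scalar scores, and the advantage is taken as a given number derived from the reward model’s score.

```python
import math


def pairwise_ranking_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Reward-model loss for one human comparison: -log(sigmoid(r_pref - r_rej)).

    Written as softplus(-(r_pref - r_rej)) in a numerically stable form.
    The loss shrinks as the preferred answer is scored further above the rejected one.
    """
    diff = reward_preferred - reward_rejected
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def ppo_clipped_objective(
    logp_new: float,
    logp_old: float,
    advantage: float,
    clip_eps: float = 0.2,
) -> float:
    """PPO clipped surrogate objective for one sampled token or response.

    ratio = pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
    Clipping the ratio to [1 - eps, 1 + eps] limits how far one update
    can move the policy away from the one that generated the sample.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Reward model agrees with the human ranking: low loss.
    print(pairwise_ranking_loss(2.0, 0.0))
    # Reward model disagrees with the human ranking: high loss.
    print(pairwise_ranking_loss(0.0, 2.0))
    # The policy raised the probability of a good sample; the gain is capped by clipping.
    print(ppo_clipped_objective(logp_new=-1.0, logp_old=-2.0, advantage=1.0))
```

In a full pipeline these quantities are averaged over batches and maximized or minimized by gradient descent, and the reward usually also includes a penalty for drifting too far from the supervised model; that penalty is omitted here to keep the sketch short.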

Recreating ChatGPT

Wang combined these two ideas, the PaLM architecture and RLHF, to build an open-source ChatGPT clone. The workflow involves three main stages:

1. Train a PaLM-style autoregressive transformer from scratch (a daunting computational task).

2. Train a lightweight reward model using LoRA (low-rank adaptation), a parameter-efficient method for fine-tuning large language models.

3. Combine the pretrained model and the reward model, then apply RLHF to fine-tune the system.
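To show why LoRA keeps the reward model lightweight, here is a minimal sketch of the general technique on one linear layer, written with plain Python lists. It is not taken from the project’s code: the class name `LoRALinear` and its parameters are hypothetical, and a real implementation would use a tensor library and train `A` and `B` by gradient descent.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update: y = W x + (alpha / r) * B (A x)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # pretrained weights, kept frozen
        # A starts as small random values and B as zeros, so the layer
        # initially behaves exactly like the pretrained one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        """Only A and B are trained: r * d_in + d_out * r values."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    d_out, d_in, rank = 64, 64, 4
    frozen = [[0.0] * d_in for _ in range(d_out)]
    layer = LoRALinear(frozen, rank=rank)
    print("full weight values:", d_out * d_in)
    print("LoRA trainable values:", layer.trainable_params())
```

For a 64 by 64 layer at rank 4, the adapter trains 512 values instead of 4,096, and the saving grows with layer size. That is what allows a reward model to reuse a large pretrained network while training only a small number of additional parameters.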

Completing these stages would yield an open-source counterpart to ChatGPT. However, the project provides only the architecture and training code, not pretrained weights, so anyone using it must carry out the first stage themselves, which is the most demanding part.

Challenges and Community Reaction

Key obstacles include massive compute requirements, the sheer size of the model, and difficulty acquiring high‑quality training data. Some observers doubt the practicality of the project, while others view it as a positive sign that major AI breakthroughs quickly spawn open‑source alternatives.
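A back-of-the-envelope calculation gives a sense of the scale involved. The sketch below uses the common approximation that training a dense transformer costs about 6 floating-point operations per parameter per token. The 780-billion-token figure comes from Google’s PaLM paper rather than from this article, and the function names are hypothetical.

```python
def training_flops(num_params: float, num_tokens: float) -> float:
    """Approximate training cost: about 6 FLOPs per parameter per token."""
    return 6.0 * num_params * num_tokens


def weight_memory_bytes(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights (2 bytes each for 16-bit floats)."""
    return num_params * bytes_per_param


PALM_PARAMS = 540e9  # parameter count of the largest PaLM model
PALM_TOKENS = 780e9  # training tokens reported in the PaLM paper (assumption, not from this article)

if __name__ == "__main__":
    flops = training_flops(PALM_PARAMS, PALM_TOKENS)
    weights_tb = weight_memory_bytes(PALM_PARAMS) / 1e12
    print(f"training compute: {flops:.2e} FLOPs")
    print(f"16-bit weights:   {weights_tb:.2f} TB")
```

Under these assumptions, a full-size run needs roughly 2.5e24 floating-point operations, and the 16-bit weights alone occupy about 1.08 TB before counting optimizer state or activations. Smaller PaLM-style models cost proportionally less, but the estimate shows why pretraining is the main barrier.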

Phillip Wang has a history of replicating high‑profile AI models such as DALL·E 2 and AlphaFold 2. Similar community efforts include LAION’s Open Assistant, which aims to develop an open‑source chat AI through crowdsourced contributions.

Resources

For those interested in exploring the code, the following repositories are available:

PaLM‑rlhf‑pytorch: https://github.com/lucidrains/PaLM-rlhf-pytorch

Open Assistant: https://github.com/LAION-AI/Open-Assistant

These resources provide a starting point for experimenting with large‑scale language models and RLHF techniques.
