Demystifying ChatGPT: From Transformer Basics to Business Applications

This article offers a clear overview of large language models for engineers without an algorithms background: it explains ChatGPT's Generative Pre‑trained Transformer foundation, core mechanisms such as attention, practical prompt‑engineering tips, and how enterprises can integrate LLMs into data analysis, intelligent customer service, and other business workflows, while noting the associated risks.

Alipay Experience Technology

Preface

Large Language Models (LLMs) like ChatGPT are becoming ubiquitous in both business and daily life. Even if you are not an algorithm specialist, understanding the basic principles and how they can be linked to business scenarios is essential technical literacy.

What is ChatGPT?

ChatGPT stands for Generative Pre‑trained Transformer. Generative means the model creates new content token by token, based on patterns learned from historical data. Pre‑trained refers to first training the model on massive generic data and then fine‑tuning it for specific tasks, which greatly reduces the cost of training separate models from scratch. Transformer is the neural‑network architecture that powers ChatGPT.

Core Task of ChatGPT

ChatGPT's core task is next‑token prediction: given the text so far, it statistically predicts the next character or word, drawing on massive corpora of web pages, books, and other texts, so that the generated continuation reads like natural human writing.

Token‑by‑Token Prediction Example

Input "湖" (lake) → possible next tokens "泊", "人", "水"; "人" has the highest probability, giving "湖人" (Lakers).

Input "湖人" → possible next tokens "总", "真", "牛"; "总" has the highest probability.

Input "湖人总" → possible next tokens "冠", "赢", "经"; "冠" has the highest probability.

Input "湖人总冠" → possible next tokens "名", "王", "军"; "军" has the highest probability, completing "湖人总冠军" ("Lakers, champions").
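The walkthrough above can be sketched as greedy decoding over a toy probability table (the probabilities here are made up for illustration, not real model outputs):

```python
# Toy next-token table: context string → candidate tokens with probabilities.
next_token_probs = {
    "湖":      {"人": 0.6, "泊": 0.3, "水": 0.1},
    "湖人":    {"总": 0.5, "真": 0.3, "牛": 0.2},
    "湖人总":  {"冠": 0.7, "赢": 0.2, "经": 0.1},
    "湖人总冠": {"军": 0.8, "名": 0.1, "王": 0.1},
}

def generate(context: str, steps: int) -> str:
    """Greedy decoding: repeatedly append the most probable next token."""
    for _ in range(steps):
        candidates = next_token_probs.get(context)
        if candidates is None:
            break
        context += max(candidates, key=candidates.get)
    return context

print(generate("湖", 4))  # 湖人总冠军
```

Real models sample from the distribution rather than always taking the maximum, which is why the same prompt can yield different answers.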

Fundamental AI Concepts

Machine Learning

Machine Learning (ML) learns general patterns from limited observation data and applies them to unseen samples. The goal is to build models with good generalization ability.

Parameters / Weights

All AI models have parameters (weights). A simple linear model y = wx + b, for example, has just two parameters. GPT‑3 contains 175 billion parameters, and GPT‑4 is reported to have even more, making training extremely resource‑intensive.
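To make "parameters" concrete, here is a minimal sketch of fitting the two parameters of a linear model with least squares (toy data, NumPy only):

```python
import numpy as np

# A linear model y = w*x + b has exactly two parameters: weight w and bias b.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0            # data generated from w=2, b=1

# Solve for the two parameters with ordinary least squares.
A = np.stack([x, np.ones_like(x)], axis=1)   # design matrix [x, 1]
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(w, b)  # ≈ 2.0 1.0
```

A 175‑billion‑parameter model is trained the same way in spirit, just with vastly more weights and gradient descent instead of a closed‑form solve.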

Supervised vs. Unsupervised Learning

Supervised learning uses labeled data to train a model to fit a function. Unsupervised learning lets the model discover structure from unlabeled data, e.g., clustering ripe vs. unripe watermelons.
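The watermelon clustering idea can be sketched as a two‑cluster k‑means on a single feature (hypothetical sugar‑content values); note that no labels are supplied, the structure emerges from the data:

```python
import numpy as np

# Unlabeled 1-D feature, e.g. measured sugar content of six melons.
sugar = np.array([2.1, 2.4, 2.0, 8.9, 9.3, 9.0])
centers = np.array([sugar.min(), sugar.max()])  # crude initialization

for _ in range(10):
    # Assignment step: each sample joins its nearest center.
    labels = np.abs(sugar[:, None] - centers[None, :]).argmin(axis=1)
    # Update step: each center moves to the mean of its members.
    centers = np.array([sugar[labels == k].mean() for k in range(2)])

print(labels)  # two groups: low-sugar (unripe) vs high-sugar (ripe)
```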

Overfitting / Underfitting

Overfitting occurs when a model fits training data too closely and fails to generalize; underfitting occurs when the model is too simple to capture underlying patterns.

Overfitting vs Underfitting illustration

Supervised Fine‑Tuning (SFT)

SFT adjusts a pre‑trained model on a specific dataset using supervised learning, allowing rapid adaptation without retraining the entire network.

Reinforcement Learning with Human Feedback (RLHF)

RLHF trains GPT‑3.5‑series models in three steps: (1) supervised fine‑tuning, (2) training a reward model from human‑generated preference data, and (3) using Proximal Policy Optimization (PPO) to fine‑tune the model against the reward model.

Neural Networks

Artificial neural networks consist of an input layer, hidden layers, and an output layer, analogous to how neurons process information in the human brain.

Input layer: entry point for data.

Hidden layer: processes information.

Output layer: decides the next action.
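The three layers above can be sketched as a tiny fully connected network in NumPy (random toy weights, not a trained model):

```python
import numpy as np

# A minimal network: 3 input features → 4 hidden units (ReLU) → 2 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # input → hidden weights
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)   # hidden → output weights

def forward(x):
    h = np.maximum(0.0, x @ W1 + b1)   # hidden layer processes the input
    return h @ W2 + b2                 # output layer produces the scores

out = forward(np.array([1.0, 0.5, -0.2]))
print(out.shape)  # (2,)
```

Training consists of adjusting W1, b1, W2, b2 so the outputs match the desired targets.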

Neural network layer diagram

Transformer Basics

Step 1: Embedding

Embedding converts each token into a high‑dimensional vector (e.g., 768‑dim for GPT‑2, 12288‑dim for GPT‑3). Token value vectors and positional vectors are summed to form the final embedding sequence.
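The "token vector plus positional vector" step can be sketched as two lookup tables whose rows are summed (toy dimensions; real GPT‑2 uses 768, GPT‑3 uses 12288):

```python
import numpy as np

vocab_size, max_len, d_model = 10, 8, 4
rng = np.random.default_rng(0)
token_emb = rng.normal(size=(vocab_size, d_model))  # one learned row per token id
pos_emb = rng.normal(size=(max_len, d_model))       # one learned row per position

token_ids = np.array([3, 1, 4])                      # a 3-token input sequence
# Final embedding = token value vector + positional vector, per slot.
x = token_emb[token_ids] + pos_emb[: len(token_ids)]
print(x.shape)  # (3, 4): one d_model vector per token
```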

Token and position embeddings

Step 2: Attention

Attention mechanisms allow the model to weigh the relevance of each token in the context of others. Multi‑head attention uses several parallel attention heads to capture different relational aspects.
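A minimal sketch of the multi‑head idea: the embedding is split across several heads, each head attends independently, and the outputs are concatenated (here Q = K = V = x for brevity, with no learned projections):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

seq_len, d_model, n_heads = 5, 8, 2
d_head = d_model // n_heads
rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model))

# Split the embedding into (n_heads, seq_len, d_head).
heads = x.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
# Each head computes its own token-to-token relevance scores...
scores = heads @ heads.transpose(0, 2, 1) / np.sqrt(d_head)
# ...and its own weighted mix of the tokens.
out = softmax(scores) @ heads
# Concatenate the head outputs back into one vector per token.
out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
print(out.shape)  # (5, 8)
```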

Attention example

Step 3: From Vectors to Probabilities

The raw attention scores are passed through a softmax to obtain probabilities that sum to 1. These probabilities then weight the value vectors (V) to produce the final context‑aware representation.
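Numerically, the step looks like this (toy scores and 2‑dimensional value vectors):

```python
import numpy as np

# Raw attention scores for three context tokens.
scores = np.array([2.0, 1.0, 0.1])
exp = np.exp(scores - scores.max())   # subtract max for numerical stability
probs = exp / exp.sum()               # softmax: non-negative, sums to 1
print(round(probs.sum(), 6))  # 1.0

V = np.array([[1.0, 0.0],   # value vector of token 1
              [0.0, 1.0],   # token 2
              [1.0, 1.0]])  # token 3
context = probs @ V         # probability-weighted sum of value vectors
print(context)
```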

SoftMax weighting

Training Data and Process

ChatGPT is trained on roughly 45 TB of data from sources such as Wikipedia, books, journals, Reddit, Common Crawl, and other datasets, amounting to about 1 trillion tokens.

The training pipeline consists of three steps:

Supervised Fine‑Tuning (SFT) on curated datasets.

Training a Reward Model (RM) from human‑annotated comparison data.

Using Proximal Policy Optimization (PPO) to fine‑tune the SFT model with the RM as the reward signal.
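Step 2 can be illustrated in miniature: the reward model is trained so the human‑preferred answer scores higher than the rejected one, commonly via the pairwise loss −log σ(r_chosen − r_rejected). The scores below are made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy reward-model scores for a human-preferred vs rejected answer.
r_chosen, r_rejected = 1.8, 0.3
loss = -np.log(sigmoid(r_chosen - r_rejected))
print(loss)  # small when the preferred answer is ranked clearly higher
```

PPO then fine‑tunes the SFT model to maximize this learned reward while staying close to its original behavior.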

Training pipeline diagram

Prompt Engineering Tips

A prompt is a piece of instruction that guides the model. Clear, specific prompts yield better results. Examples:

Instead of "Please suggest me some essential words for IELTS", ask "Please suggest 10 essential words for IELTS" — specifying the number makes the request concrete.

When asking for names, provide an example: "Help me generate three male names, e.g., 龙傲天".

Additional techniques:

Append a guiding token (e.g., start the completion with "SELECT") when you want the model to generate SQL or other code.

Separate instructions from the text to be processed using triple quotes (""") to improve accuracy.
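The delimiter tip can be sketched as a small helper that keeps the instruction and the text to be processed clearly separated (a hypothetical helper, not a library API):

```python
def build_prompt(instruction: str, payload: str) -> str:
    """Wrap the payload in triple quotes so the model cannot
    confuse the text to be processed with the instruction itself."""
    return f'{instruction}\n"""\n{payload}\n"""'

prompt = build_prompt(
    "Summarize the text below in one sentence.",
    "Large language models predict text one token at a time...",
)
print(prompt)
```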

Integrating Large Models into Business

Prompt Platform Construction

Build a centralized prompt‑management platform that stores templates for data analysis, business summarization, intelligent customer service, etc., and integrates with internal data platforms like ODPS or DataPhin.
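At its simplest, such a platform is a named template store with parameter substitution; the sketch below uses hypothetical template names and fields, not an actual Alipay platform API:

```python
from string import Template

# Central registry of reusable prompt templates, one per business scenario.
TEMPLATES = {
    "data_analysis": Template(
        "You are a data analyst. Using the table $table, "
        "answer the question: $question"
    ),
    "customer_service": Template(
        "You are a support agent for $product. Reply politely to: $message"
    ),
}

def render(name: str, **fields) -> str:
    """Fill a stored template with scenario-specific fields."""
    return TEMPLATES[name].substitute(**fields)

print(render("data_analysis", table="orders_daily", question="Why did GMV drop?"))
```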

Data Analysis + LLM

Use LLMs for conversational chart generation and trend prediction, lowering the skill barrier for non‑technical analysts.

Intelligent Customer Service

Deploy LLM‑powered chatbots to handle high‑volume queries across domains such as finance, healthcare, and legal services.

Code Generation & Review

Leverage LLMs for code snippets, SQL generation, and automated code review (e.g., detecting syntax issues during pull‑request checks).

Risks and Precautions

Hallucination

LLMs can produce confident but incorrect statements; users must verify outputs, especially in unfamiliar domains.

Data Security

Using external LLM services may expose proprietary data. Many countries have issued guidelines restricting the use of public AI services for sensitive workloads.

Other Threats

OWASP has published a Top 10 list of vulnerabilities specific to large language model applications.

Conclusion

We are in an era of rapid AI advancement. While large models offer powerful capabilities, professionals should understand their strengths, integrate them thoughtfully into business processes, and remain vigilant about risks.

