Tagged articles
1023 articles
DataFunSummit
Feb 7, 2023 · Artificial Intelligence

How to Evaluate OpenAI's Super Conversational Model ChatGPT?

This article compiles three highly upvoted Zhihu answers on OpenAI's ChatGPT, discussing its breakthrough impact on NLP, visual in‑context learning, reinforcement learning from human feedback (RLHF), and the broader implications for AI research and development.

AI research · ChatGPT · In-Context Learning
10 min read
Architects' Tech Alliance
Feb 6, 2023 · Artificial Intelligence

What Makes ChatGPT Tick? A Deep Dive into Its Architecture, Limits, and Market Impact

This article provides a comprehensive analysis of ChatGPT, covering its origins within the OpenAI GPT family, core technical features such as RLHF training and model compression, current limitations, future improvement directions, and the broader industry and investment opportunities created by large language models.

AI industry · ChatGPT · Reinforcement Learning from Human Feedback
20 min read
DataFunTalk
Jan 15, 2023 · Artificial Intelligence

Advances in Dialogue Systems: Baidu PLATO Large‑Scale Conversational Models

This article reviews the evolution of dialogue systems from modular task‑oriented designs to end‑to‑end large‑scale models, detailing Baidu's PLATO series, their technical innovations, real‑world deployments, challenges such as inference efficiency and safety, and future research directions in conversational AI.

AI Safety · Conversational AI · Dialogue Systems
13 min read
DataFunTalk
Jan 10, 2023 · Artificial Intelligence

Paradigm Shifts in Large Language Model Research and Future Directions

The article reviews the evolution of large language models from the pre‑GPT‑3 era to the present, analyzes the conceptual and technical gaps between Chinese and global research, and outlines key future research directions such as scaling laws, prompting techniques, multimodal training, and efficient model architectures.

AI research · ChatGPT · In-Context Learning
73 min read
Alibaba Cloud Big Data AI Platform
Jan 10, 2023 · Artificial Intelligence

How GPT‑MoE Cuts Training Costs: Sparse Transformer Techniques and Performance Insights

This article examines the use of Mixture‑of‑Experts (MoE) sparse training for GPT models, detailing the architecture, training and inference efficiency gains, experimental comparisons with dense models, custom routing algorithms, and step‑by‑step deployment on Alibaba Cloud AI platforms.

AI efficiency · GPT-MoE · Model Training
26 min read
Xiaohongshu Tech REDtech
Jan 3, 2023 · Artificial Intelligence

Insights into ChatGPT: Capabilities, Limitations, and Implications for AI Research

During Xiaohongshu’s REDtech livestream, AI researchers examined ChatGPT’s rapid adoption, its versatility across tasks, and the large‑scale pre‑training and in‑context learning behind it. They also highlighted persistent hallucinations, weak reasoning, high costs, and its limited potential to replace search engines, and emphasized the importance of human feedback, as used in RLHF, for future multimodal AI research.

AI research · ChatGPT · RLHF
14 min read
21CTO
Dec 15, 2022 · Artificial Intelligence

Sam Altman & Reid Hoffman on AI’s Future: Business, Multimodal Models, Society

In a candid conversation, Sam Altman and Reid Hoffman explore the next stage of AI, discussing commercial opportunities of large language models, the rise of AI‑plus applications in science and the metaverse, future directions such as multimodal and continuously learning models, and the societal challenges of AGI, wealth distribution and universal basic income.

AGI · AI · AI commercialization
16 min read
DataFunTalk
Aug 28, 2022 · Artificial Intelligence

Emerging Paths Toward General AI: Trends in Large‑Scale Pretrained Models

The article reviews how the Transformer breakthrough, the rapid scaling of large language models such as GPT‑3, Switch Transformer, and Alibaba's AliceMind and M6, together with multimodal research, are shaping the next phase of artificial intelligence toward more general, collaborative, and open AI systems.

AI trends · artificial intelligence · large language models
5 min read
DataFunSummit
Jul 27, 2022 · Artificial Intelligence

DataFun 2022 Natural Language Processing Summit – Leading Experts Discuss Large‑Scale Language Models, Multimodal Understanding, Dialogue Systems and AI Applications

The DataFun 2022 NLP Summit, scheduled for July 30, brings together top researchers and industry leaders from Alibaba, Baidu, Microsoft, Amazon, and others to present the latest advances in large‑scale pre‑training, multimodal perception, information extraction, dialogue interaction, machine translation, and practical AI deployments, with live streaming and free registration via QR code.

AI · Dialogue Systems · Information Extraction
44 min read
DataFunSummit
Apr 25, 2022 · Artificial Intelligence

Token‑Level Pipeline Parallelism for Transformer‑based Language Models (TeraPipe)

The article introduces a token‑level pipeline parallelism strategy that splits the sequence‑length dimension of Transformer‑based language models, explains why this approach is feasible, presents a dynamic‑programming formulation for optimal slicing, discusses engineering challenges, and evaluates its performance on large GPT models.

Performance Optimization · Pipeline Parallelism · Token-level
13 min read
Meituan Technology Team
Mar 17, 2022 · Artificial Intelligence

Tsinghua University & Meituan Digital Life Joint Research Institute Academic Salon: Large Model Technologies and Challenges

The Tsinghua‑Meituan Digital Life Joint Research Institute’s Academic Salon, scheduled for March 23, features Associate Professor Liu Zhiyuan presenting the latest advances and ten key challenges in large‑model technologies. The event aims to foster industry‑academia collaboration and drive innovation in representation learning, knowledge graphs, and social computing.

Academic Seminar · NLP · Tsinghua University
4 min read
Volcano Engine Developer Services
Mar 16, 2022 · Artificial Intelligence

How veGiantModel Boosts Large Language Model Training Up to 6.9× Faster

The article introduces Volcano Engine's veGiantModel, a high‑performance large‑model training framework built on PyTorch, Megatron and DeepSpeed, details its distributed parallel strategies, hardware setups, benchmark results showing up to 6.9× speedup over Megatron and DeepSpeed, and provides open‑source links for further use.

ByteCCL · Distributed Training · large language models
6 min read
DataFunTalk
Mar 16, 2022 · Artificial Intelligence

Parameter-Efficient Sparsity Training for the PLUG Large-Scale Language Model

This article presents PLUG, a 27‑billion‑parameter Chinese language model, and introduces a parameter‑efficient sparsity training (PST) framework that combines unstructured and structured pruning with low‑rank decomposition to dramatically reduce model size while preserving downstream performance.

Deep Learning · PLUG · Parameter-Efficient Training
13 min read
DataFunTalk
Jan 3, 2022 · Artificial Intelligence

Top AI Stories of 2021: Large‑Scale Pretrained Models, Transformers, Multimodal AI, and Emerging Challenges

The article reviews the 2021 AI landscape, highlighting the race for ever‑larger pretrained models, the dominance of Transformers across modalities, the promise and limits of large models, the rise of multimodal systems, regulatory considerations, and the still‑nascent progress in reinforcement learning.

AI Governance · AI industry · Multimodal AI
12 min read
DataFunTalk
Dec 24, 2021 · Artificial Intelligence

Large-Scale Pretrained Model Compression and Distillation: AdaBERT, L2A, and Meta‑KD

This article reviews three consecutive works from Alibaba DAMO Academy on compressing and distilling large pretrained language models—AdaBERT, L2A, and Meta‑KD—detailing their motivations, neural‑architecture‑search‑based designs, loss formulations, experimental results, and insights from a Q&A session.

AI · Neural Architecture Search · knowledge distillation
10 min read
Programmer DD
Aug 29, 2021 · Artificial Intelligence

When AI Code Assistants Leak Fake IDs: What GitHub Copilot’s Slip Reveals

GitHub Copilot, powered by the Codex model, recently generated a seemingly real Chinese ID number for Bilibili CEO Chen Rui, sparking concerns about privacy leaks, model training data, and the broader risks of AI code assistants inadvertently exposing personal information.

AI code generation · GitHub Copilot · large language models
6 min read
DataFunTalk
Jul 1, 2021 · Artificial Intelligence

Pre‑Trained Models: Past, Present, and Future – A Comprehensive Survey

This article surveys the evolution of pre‑trained models, covering the origins of transfer and self‑supervised learning, the rise of transformer‑based PTMs such as BERT and GPT, efficient architecture designs, multimodal and multilingual extensions, theoretical analyses, and future research directions for scalable and robust AI systems.

AI research · efficient training · large language models
27 min read
DataFunTalk
Jul 7, 2020 · Artificial Intelligence

Optimizing Pretrained Language Model Inference: Lessons from the NLPCC Small Model Competition and Deployment at Xiaomi

This article shares the Xiaomi AI Lab NLP team's experience in the NLPCC lightweight language model competition, discusses efficiency challenges of large pretrained models like BERT, and details practical inference optimizations—including model distillation, batching, FP16 quantization, and FasterTransformer integration—that dramatically reduce latency and hardware costs in production.

AI · BERT · Inference Optimization
15 min read