Tag: large language models


ByteFE
Jun 13, 2025 · Artificial Intelligence

How AI Coding Powered a 3‑Day English Learning App: Insights from ByteDance’s TRAE

In a three‑day sprint, ByteDance’s VP Hong Dingkun built an English‑learning app using the AI‑coding platform TRAE, illustrating how large‑model‑driven code completion, natural‑language programming, and AI‑enhanced development can dramatically boost productivity, democratize coding, and push the limits of software intelligence.

AI coding · ByteDance · Software Development
0 likes · 14 min read
Architects' Tech Alliance
Jun 11, 2025 · Artificial Intelligence

From Transformers to DeepSeek‑R1: The 2017‑2025 Evolution of Large Language Models

This article chronicles the rapid development of large language models from the 2017 Transformer breakthrough through the rise of BERT, GPT‑3, ChatGPT, multimodal systems like GPT‑4V/o, and the recent cost‑efficient DeepSeek‑R1, highlighting key architectural innovations, scaling trends, alignment techniques, and their transformative impact on AI research and industry.

AI alignment · BERT · GPT
0 likes · 26 min read
DataFunTalk
Jun 9, 2025 · Artificial Intelligence

Can AI Models Pass the Chinese Math Gaokao? A Fair, Objective Test

The author conducts a transparent, objective assessment of several large language models on the 2025 Chinese national math exam, converting all questions to LaTeX, applying strict Gaokao scoring rules, and revealing each model's strengths and weaknesses across single‑choice, multiple‑choice, and fill‑in‑the‑blank items.

AI benchmarking · Gaokao · large language models
0 likes · 7 min read
DataFunSummit
Jun 8, 2025 · Artificial Intelligence

Mastering LLM Applications: Practical Agent Design and Implementation Strategies

This comprehensive guide explores the core implementation paths for large language model (LLM) applications, focusing on agent design, workflow orchestration, tool integration, memory management, multi‑agent architectures, and future trends, providing actionable methodologies and real‑world examples for practitioners.

AI Agent · Agent Design · Automation
0 likes · 25 min read
DataFunSummit
Jun 6, 2025 · Artificial Intelligence

Automating High‑Quality NL2SQL Data Synthesis with Intermediate Representations

This work tackles the difficulty of incorporating extensive domain knowledge into in‑domain NL2SQL tasks by proposing an intermediate‑representation‑based data synthesis method that decouples knowledge compliance from SQL generation, enabling automated creation of high‑quality training data at 60× the efficiency of manual annotation and with over 97% accuracy.

NL2SQL · SQL generation · data synthesis
0 likes · 2 min read
IT Services Circle
Jun 6, 2025 · Artificial Intelligence

Master Retrieval‑Augmented Generation (RAG): From Basics to Advanced Practices

This article introduces Retrieval‑Augmented Generation (RAG), explains its core components—knowledge embedding, retriever, and generator—covers practical system construction, optimization techniques, evaluation metrics, and advanced paradigms such as GraphRAG and Multi‑Modal RAG, while highlighting a comprehensive guidebook for hands‑on implementation.

AI · RAG · Retrieval-Augmented Generation
0 likes · 12 min read
Code Mala Tang
Jun 5, 2025 · Artificial Intelligence

Mastering LLM Prompts: Proven Techniques to Get Precise Answers

By rethinking how we interact with large language models—using role‑play, task decomposition, chain‑of‑thought, ReAct, and other advanced prompting strategies—readers can transform generic ChatGPT answers into precise, context‑aware responses, leveraging pattern recognition and context windows for superior AI assistance.

AI reasoning · Chain-of-Thought · LLM techniques
0 likes · 21 min read
Kuaishou Large Model
Jun 5, 2025 · Artificial Intelligence

7 Kuaishou Papers Accepted at ACL 2025 Reveal Cutting‑Edge AI Advances

Kuaishou's foundational large‑model team secured seven papers at the prestigious ACL 2025 conference, covering alignment bias during model training, safety in inference, decoding strategies, fine‑grained video‑temporal understanding, and new evaluation benchmarks that push the frontier of multimodal large language models.

ACL 2025 · benchmark · large language models
0 likes · 16 min read
Kuaishou Tech
Jun 5, 2025 · Artificial Intelligence

7 Kuaishou AI Papers Accepted at ACL 2025: Video Understanding & Safe LLM Decoding

Kuaishou’s foundational large-model team has secured seven papers at ACL 2025, spanning alignment bias in training, safety defenses during inference, decoding strategies, fine-grained video-temporal understanding, reward fairness in RLHF, multimodal captioning benchmarks, and methods to curb hallucinations in vision-language models.

ACL · AI safety · benchmark
0 likes · 13 min read
AntTech
Jun 4, 2025 · Artificial Intelligence

LLaDA and LLaDA‑V: Large Language Diffusion Models and Their Multimodal Extensions

This article presents the LLaDA series of diffusion‑based large language models, explains how their generative‑modeling principle yields language intelligence comparable to autoregressive models, and details the multimodal LLaDA‑V architecture, training methods, experimental results, and broader implications for AI research.

diffusion models · generative modeling · instruction following
0 likes · 10 min read
DataFunTalk
Jun 3, 2025 · Artificial Intelligence

Meta‑Capability Alignment: Psychologically Inspired Training to Endow Large Language Models with Stable Reasoning

Researchers from NUS, Tsinghua and Salesforce AI Research introduce a meta‑capability alignment framework that integrates deductive, inductive and abductive reasoning via a psychology‑based triple, automatically generates and validates training data, and demonstrates over 10% accuracy gains on math, coding and scientific benchmarks for 7B and 32B models.

Artificial Intelligence · Meta‑Capability Alignment · large language models
0 likes · 8 min read
AntTech
May 31, 2025 · Artificial Intelligence

Machine Reasoning and Deep Thinking: Insights from Ant Financial’s NLP Lead Wu Wei

The article explores how DeepSeek‑R1 and long‑thinking chains have revived interest in machine reasoning, tracing the evolution of natural‑language models, defining reasoning as logical knowledge composition, and outlining future research directions in efficient reasoning architectures and deep‑thinking applications.

AI research · deep thinking · efficient reasoning
0 likes · 8 min read
AntTech
May 30, 2025 · Artificial Intelligence

Insights from Ant Group’s 10th Technical Open Day: Multimodal, Embodied, and Future Model Architectures for AGI

Ant Group’s 10th Technical Open Day gathered leading AI experts who examined the current state and future directions of multimodal large models, embodied AI, world models, transformer architectures, and vertical applications, offering a comprehensive view of the challenges and opportunities on the path toward AGI.

AGI · AI safety · Multimodal Models
0 likes · 16 min read
Model Perspective
May 30, 2025 · Artificial Intelligence

Why Large Language Models Are Just Mathematical Functions: A Rational Perspective

The article argues that large language models are fundamentally mathematical functions that model human language, emphasizing their role as simplified representations, explaining their structural nature, sources of errors, the importance of prompts as boundary conditions, and the need for clear usage assumptions to avoid anthropomorphic misconceptions.

AI fundamentals · large language models · mathematical modeling
0 likes · 11 min read
DevOps
May 28, 2025 · Artificial Intelligence

Google Proposes a “Sufficient Context” Framework to Strengthen Enterprise Retrieval‑Augmented Generation Systems

Google researchers introduce a “sufficient context” framework that classifies retrieved passages as adequate or inadequate for answering a query, enabling large language models in enterprise RAG systems to decide when to answer, refuse, or request more information, thereby improving accuracy and reducing hallucinations.

AI Reliability · Context Evaluation · RAG
0 likes · 9 min read
JD Tech Talk
May 27, 2025 · Artificial Intelligence

Solving Real-World AI Challenges at JD Retail: Reward Model Ensembles, Query Expansion, and Model Pruning

This article recounts how JD Retail's young algorithm engineers tackled diverse AI problems—optimizing reward‑model ensembles for ad image generation, building large‑language‑model‑based query expansion, and pruning diffusion models with FFT and RDP—while sharing their technical approaches, code snippets, and growth reflections.

AI · algorithm engineering · large language models
0 likes · 14 min read
Efficient Ops
May 26, 2025 · Artificial Intelligence

How AI Agents Are Revolutionizing AIOps: Boosting Automation and Efficiency

This article explains how AI agents enhance large‑model capabilities for AIOps, detailing single‑agent use cases like knowledge retrieval, tool guidance, and fault diagnosis, as well as multi‑agent collaborations, required skills, and future prospects for autonomous operations.

AI · AIOps · Automation
0 likes · 7 min read
Architect
May 26, 2025 · Artificial Intelligence

Parallelism Strategies for Large-Scale Model Training: Data, Tensor, Pipeline, Sequence, and Expert Parallelism

This article explains the memory limits of a single GPU and systematically introduces data parallelism, tensor parallelism, pipeline parallelism, sequence parallelism, and expert parallelism, describing their communication costs, advantages, drawbacks, and practical implementation details for training large AI models.

AI training · Data Parallelism · expert parallelism
0 likes · 14 min read
JD Tech
May 26, 2025 · Artificial Intelligence

Solving Technical Challenges at JD Retail: Multi‑Reward Models, LLM‑Based Query Expansion, Model Pruning, and Reinforcement Learning

This article details how JD Retail's young algorithm engineers tackled a series of AI engineering problems—including advertising image quality assessment with multi‑reward models, large‑language‑model‑driven query expansion, FFT‑and‑RDP‑based model pruning, and agent‑centric reinforcement learning—while sharing practical growth insights and code snippets.

AI · Computer Vision · large language models
0 likes · 15 min read
DataFunTalk
May 24, 2025 · Artificial Intelligence

Why Apple and WeChat’s AI Rollouts Are Slower Than Expected

The article analyses how privacy concerns, data‑security priorities and an application‑first strategy cause both Apple’s Apple Intelligence and WeChat’s AI features to lag behind hype, examining product decisions, technical constraints, and the potential future of AI agents within these ecosystems.

AI integration · Apple · WeChat
0 likes · 13 min read