Tagged articles
1025 articles
JD Tech Talk
Jun 25, 2024 · Artificial Intelligence

Understanding Large Language Models: From Parameters to Transformer Architecture

This article explains the fundamental concepts behind large language models, including their two-file structure, training process, neural network basics, perceptron examples, weight and threshold calculations, the TensorFlow Playground, and a detailed walkthrough of the Transformer architecture with tokenization, positional encoding, self‑attention, normalization, and feed‑forward layers.

Large Language Models · Neural Networks · Self-Attention
0 likes · 20 min read
NewBeeNLP
Jun 24, 2024 · Artificial Intelligence

How Domain Large Models Are Shaping the Future of AI: Challenges and Solutions

This article reviews Fudan University's Knowledge Factory Lab research on domain large models, covering background, three major deployment challenges, data‑selection strategies, ability‑enhancement techniques, collaborative workflows, and retrieval‑augmented generation methods that aim to make large models practical for real‑world tasks.

Large Language Models · Model Alignment · domain adaptation
0 likes · 18 min read
DataFunSummit
Jun 23, 2024 · Artificial Intelligence

Tongyi Xingchen Personalized Large Model: Technical Overview and Applications

This article summarizes the development background of large language models, Alibaba's progression in foundational and personalized AI, the design and capabilities of the Tongyi Xingchen personalized model, its multimodal and agent-based architecture, various industry use cases, and the safety and responsibility measures applied to ensure trustworthy AI deployment.

AI Safety · Large Language Models · Multimodal AI
0 likes · 13 min read
Baidu Intelligent Cloud Tech Hub
Jun 21, 2024 · Databases

How Vector Databases and Large Models are Transforming AI-Driven Database Operations

This article reviews the evolution of databases and large models, explains the role of vector databases and Retrieval‑Augmented Generation (RAG) in AI‑enhanced data management, and showcases Baidu Cloud's VectorDB and DBSC solutions for intelligent database operations and knowledge‑driven services.

AI4DB · Database operations · Large Language Models
0 likes · 15 min read
Xiaohongshu Tech REDtech
Jun 20, 2024 · Artificial Intelligence

Xiaohongshu 2024 Large Model Frontier Paper Sharing Live Event

On June 27, 2024, Xiaohongshu’s technical team will livestream a two‑hour session across WeChat Channels, Bilibili, Douyin and Xiaohongshu, showcasing six top‑conference papers on large‑model advances—including early‑stopping and fine‑grained self‑consistency, novel evaluation methods, negative‑sample‑assisted distillation, and LLM‑based note recommendation—followed by a Q&A and recruitment briefing.

AI research · Large Language Models · Model Evaluation
0 likes · 12 min read
JD Tech Talk
Jun 20, 2024 · Artificial Intelligence

Applying Large Language Models to Courier Operations: Intelligent Operations, Q&A, Prompting, and Agents

This article describes how large language models such as ChatGPT are integrated into courier terminal systems to automate tasks, enhance intelligent voice operations, enable retrieval‑augmented question answering, generate smart prompts, and explore agent‑based workflows, supported by code examples for data extraction, splitting, and embedding.

AI for logistics · Intelligent Operations · Large Language Models
0 likes · 14 min read
DataFunSummit
Jun 18, 2024 · Artificial Intelligence

Conditional and Multimodal Knowledge Graph Construction, Extraction, and Integration with Large Models

This article presents a comprehensive overview of conditional and multimodal knowledge graphs, covering their background, construction pipelines, extraction techniques, dataset creation, semi‑supervised learning strategies, and how they can be fused with large language models for enhanced reasoning and application in tasks such as intelligent QA and video scene graph generation.

Information Extraction · Large Language Models · ai
0 likes · 23 min read
DataFunTalk
Jun 15, 2024 · Artificial Intelligence

Research on Domain Large Models by Fudan University Knowledge Factory Lab

This article presents Fudan University's Knowledge Factory Lab research on domain large models, covering background, challenges, data selection, source‑enhanced tagging, capability improvements, self‑correction, collaborative workflows, and retrieval‑augmented generation for practical AI deployment.

AI research · Large Language Models · domain adaptation
0 likes · 16 min read
Baidu Tech Salon
Jun 14, 2024 · Artificial Intelligence

Why Large Models Signal the Dawn of General AI: Insights from Baidu’s CTO

In a keynote at the 2024 Beijing Zhiyuan Conference, Baidu’s CTO Wang Haifeng explained how large‑model universality and comprehensive capabilities are driving artificial general intelligence forward, highlighting scaling laws, multimodal advances, agent technologies, and the industrial‑scale production of AI.

AI industrialization · AI trends · Deep Learning
0 likes · 7 min read
DataFunTalk
Jun 14, 2024 · Artificial Intelligence

Shopee's E‑commerce Knowledge Graph Construction and Integration with Large Models

This article presents Shopee's comprehensive exploration of building an e‑commerce knowledge graph, detailing its challenges, construction pipeline, AI‑driven extraction and fusion techniques, multilingual and multimodal modeling, and practical applications ranging from search and recommendation to AI assistants and real‑time updates.

AI applications · Information Extraction · Large Language Models
0 likes · 21 min read
AntTech
Jun 13, 2024 · Artificial Intelligence

Exploring Multi‑Agent Applications in Financial Scenarios and the agentUniverse Framework

The article reviews the evolution from large language models to stateful agents, discusses the specific challenges of information‑dense, knowledge‑dense, and decision‑dense financial tasks, and introduces the open‑source agentUniverse multi‑agent framework with its PEER collaboration model and real‑world investment‑research applications.

AI Research Assistant · Financial AI · Large Language Models
0 likes · 18 min read
AntTech
Jun 6, 2024 · Information Security

AIGC Era Trends in Next‑Generation Identity Recognition: DeepFake Risks, AIGC as a New Production Force, and Cross‑Terminal Interaction

The talk at the 18th Security Identification Technology Expo and Summit outlines three emerging trends for identity verification in the AIGC era: the surge of deep‑fake attacks, the use of generative AI as a new data‑production engine, and the shift toward cross‑device, agent‑based authentication paradigms.

AIGC · Biometrics · Identity verification
0 likes · 10 min read
Baobao Algorithm Notes
Jun 3, 2024 · Artificial Intelligence

Can Adversarial Training Make Retrieval‑Augmented Generators More Robust?

Recent arXiv work introduces ATM, an adversarially‑tuned multi‑agent system that iteratively pits a fake‑knowledge attacker against a generator, dramatically improving retrieval‑augmented language models’ resistance to hallucinated content and boosting performance on knowledge‑intensive benchmarks, even with noisy or irrelevant documents.

Large Language Models · RAG · adversarial training
0 likes · 12 min read
58 Tech
Jun 3, 2024 · Artificial Intelligence

Parameter-Efficient Fine-Tuning (PEFT) Methods for Large Language Models: LoRA, QLoRA, AdaLoRA, SoRA, and Training Acceleration with Unsloth

This article systematically analyzes popular parameter‑efficient fine‑tuning (PEFT) techniques for large language models—including Adapter Tuning, Prefix Tuning, LoRA, QLoRA, AdaLoRA, and SoRA—detailing their principles, implementation code, experimental results on NLU tasks, and practical acceleration using the Unsloth library.

AdaLoRA · Large Language Models · LoRA
0 likes · 39 min read
DataFunSummit
Jun 1, 2024 · Artificial Intelligence

Graph Foundation Models: Concepts, Progress, and Future Directions

This article provides a comprehensive overview of Graph Foundation Models (GFMs), covering their definition, key characteristics, historical development of graph machine learning, recent research trends such as PT‑HGNN, Specformer, and GraphTranslator, and discusses future challenges and research directions.

Large Language Models · foundation-models · graph neural networks
0 likes · 23 min read
DataFunTalk
May 31, 2024 · Artificial Intelligence

The Role of Knowledge Graphs in Industry: Importance, Product Forms, and Practical Cases

This article explains why knowledge graphs are crucial for industrial applications, describes the main product forms and architectural considerations, and shares real‑world case studies illustrating how AI, large models, and graph databases can be combined to improve knowledge management and decision‑making.

Graph Database · Industrial Applications · Large Language Models
0 likes · 20 min read
Kuaishou Tech
May 27, 2024 · Artificial Intelligence

What Kuaishou’s Four ACL Papers Reveal About the Future of Large Language Models

The 62nd ACL conference accepted four papers from Kuaishou that explore multi‑turn instruction following, self‑agreement reasoning, fine‑grained reinforcement learning, and dynamic routing in Mixture‑of‑Experts models, each with detailed methods, experimental results, author lists, and public arXiv links.

ACL 2024 · Kuaishou Research · Large Language Models
0 likes · 11 min read
Baidu Intelligent Cloud Tech Hub
May 27, 2024 · Databases

Baidu’s Enterprise Vector Database: Architecture, Performance, and RAG Secrets

An exclusive interview with Baidu’s senior database architects reveals the motivations behind building a dedicated enterprise vector database, details its novel column‑store engine, C++‑based retrieval stack, performance gains over open‑source solutions, multi‑modal support, RAG integration, and future research directions.

Large Language Models · RAG · Storage Engine
0 likes · 28 min read
Baidu Tech Salon
May 20, 2024 · Artificial Intelligence

Boosting Ad Efficiency with Baidu’s Multi‑Agent AI Architecture

In the AI‑native era, Baidu's ad platform adopts a multi‑agent architecture that combines large and small LLMs, SOP‑driven workflows, long‑term memory, and vector databases to achieve high query accuracy, low latency, and significant business gains while tackling challenges such as hallucination, planning, execution, and personalization.

AI agents · LLM optimization · Large Language Models
0 likes · 18 min read
NewBeeNLP
May 16, 2024 · Artificial Intelligence

How Large Language Models Transform Advertising Copy Generation

This article examines the adoption of large language models for intelligent advertising copy creation, detailing business challenges, model selection criteria, training data preparation, fine‑tuning methods, performance evaluation, and deployment results, while highlighting the trade‑offs between model size, cost, and output quality.

AI marketing · Fine-tuning · Large Language Models
0 likes · 20 min read
DeWu Technology
May 15, 2024 · Artificial Intelligence

Accelerating Large Language Model Inference: Techniques and Framework Recommendations

Deploying a dedicated inference cluster and applying four key optimizations—FlashAttention‑based attention computation, PagedAttention KV‑cache management, Mixture‑of‑Experts parameter reduction, and tensor parallelism—can accelerate large language model inference by up to 50% for models as large as 70 B parameters while cutting deployment costs.

FlashAttention · Inference Acceleration · Large Language Models
0 likes · 17 min read
Baidu Geek Talk
May 15, 2024 · Artificial Intelligence

Accelerating Large Model Training and Inference with Baidu Baige AIAK‑LLM: Challenges, Techniques, and Optimizations

The talk outlines how Baidu’s Baige AIAK‑LLM suite tackles the exploding compute demands of trillion‑parameter models by boosting Model FLOPS Utilization through advanced parallelism, memory‑saving recompute, zero‑offload, adaptive scheduling, and cross‑chip orchestration, delivering 30‑60% training and inference speedups and a unified cloud product.

AI Infrastructure · Baidu · Inference Optimization
0 likes · 25 min read
NewBeeNLP
May 15, 2024 · Artificial Intelligence

How Large Language Models and Knowledge Graphs Can Boost Each Other

This talk reviews recent advances in large language models, compares them with knowledge graphs, explores how LLMs enhance knowledge extraction and completion, examines how knowledge graphs aid LLM evaluation and safe deployment, and outlines future interactive integration between the two technologies.

AI research · Knowledge Graphs · Large Language Models
0 likes · 13 min read
21CTO
May 11, 2024 · Artificial Intelligence

Will U.S. AI Export Controls Stall Global Large‑Model Development?

The United States is drafting a bipartisan bill to impose export controls on advanced proprietary AI models, aiming to shield its technology from China, Russia, North Korea and Iran, while confronting challenges of open‑source model regulation and potential geopolitical retaliation.

AI policy · China · Export controls
0 likes · 7 min read
Baidu Tech Salon
May 10, 2024 · Artificial Intelligence

Baidu Comate: Core Capabilities of Intelligent Code Assistant

The article surveys Baidu Comate, an AI‑powered code assistant built on the Wenxin (ERNIE) large model, tracing software development from the 1950s crisis through the internet and open‑source era to today’s AI‑driven tools, and highlights its features and demonstration at a global development conference.

AI Coding · Baidu Comate · IDE plugin
0 likes · 7 min read
Architects' Tech Alliance
May 9, 2024 · Artificial Intelligence

AI Servers: Market Opportunities, Architecture, and Future Demand Driven by Generative AI

The article examines how the surge of generative AI (AIGC) is fueling rapid growth in AI server demand, detailing the emerging AIGC ecosystem, server hardware composition, model scaling, heterogeneous computing, training vs. inference workloads, market size forecasts, and the competitive landscape of AI server manufacturers.

AI Infrastructure · AI servers · GPU
0 likes · 15 min read
DataFunTalk
May 7, 2024 · Artificial Intelligence

Large Language Models and Knowledge Graphs: Recent Advances, Synergies, and Future Directions

This article reviews the rapid progress of large language models, compares them with knowledge graphs, explores how LLMs can aid knowledge extraction and completion, discusses how knowledge graphs can evaluate and enhance LLMs, and outlines future interactive integration between the two technologies.

Information Extraction · Knowledge Graphs · Large Language Models
0 likes · 12 min read
Rare Earth Juejin Tech Community
May 2, 2024 · Artificial Intelligence

Understanding Large Language Models: Principles, Training, Risks, and Application Security

This article provides a comprehensive overview of large language models (LLMs), explaining their core concepts, transformer architecture, training stages, known shortcomings such as hallucination and reversal curse, and highlights emerging security threats like prompt injection and jailbreaking, offering guidance for safe deployment.

AI Safety · LLM · Large Language Models
0 likes · 21 min read
21CTO
Apr 28, 2024 · Artificial Intelligence

5 Transformative Business Use Cases for Conversational AI

This article explores how conversational AI, powered by large language models, is reshaping enterprise operations across five key scenarios—from customer support assistants and AI‑driven data interfaces to HR bots, unstructured data processing, and multi‑agent digital assistants—highlighting benefits, implementation considerations, and privacy challenges.

Conversational AI · Data Integration · Large Language Models
0 likes · 13 min read
DataFunTalk
Apr 26, 2024 · Artificial Intelligence

Large Language Models in the Automotive Industry: Overview, Impact, and Practical Exploration

This article examines how large language models such as GPT and Transformer‑based architectures are reshaping the automotive sector by enhancing in‑vehicle intelligence, streamlining product development, improving customer service, and redefining data analyst roles, while also presenting practical experiments, deployment challenges, and future directions.

Automotive AI · GPT · LLM applications
0 likes · 18 min read
Sohu Tech Products
Apr 24, 2024 · Artificial Intelligence

Evolution, Architecture, Training Data, Methods, and Performance of Meta's Llama Series (Llama 1, 2, 3)

Meta's Llama series has progressed from the 7‑65B Llama‑1 in early 2023 to the 8B and 70B Llama‑3 in 2024, scaling token counts from 1 T to over 15 T, adopting decoder‑only Transformers with RMSNorm, SwiGLU, RoPE and GQA, and adding supervised fine‑tuning, RLHF and DPO, resulting in state‑of‑the‑art benchmark performance and a vibrant open‑source ecosystem.

LLaMA · Large Language Models · Model architecture
0 likes · 25 min read
21CTO
Apr 23, 2024 · Artificial Intelligence

Deploy Large Language Models with vLLM and Quantization for Low Latency

This guide explains how to deploy open‑source large language models using vLLM, benchmark latency and throughput, and apply 8‑bit/4‑bit quantization techniques such as bitsandbytes and NF4 to achieve faster inference on limited‑GPU hardware.

LLM deployment · Large Language Models · Python
0 likes · 13 min read
MoonWebTeam
Apr 23, 2024 · Artificial Intelligence

Exploring Devika AI: An Open‑Source AI Programmer’s Capabilities and Limits

Devika AI, an open‑source AI programmer from Stition AI, is examined for its architecture, supported actions, installation steps, and real‑world performance across tasks such as building a Snake game, Conway’s Game of Life, Vue3 components, and unit‑test generation, highlighting strengths, weaknesses, and future potential.

Devika AI · Large Language Models · tool evaluation
0 likes · 21 min read
NewBeeNLP
Apr 22, 2024 · Artificial Intelligence

Why LLAMA‑3’s Scaling Laws Signal the Next AI Frontier

The article analyzes LLAMA‑3’s architectural tweaks, massive data expansion, scaling‑law implications, open‑source versus closed‑source dynamics, and the critical role of synthetic data in sustaining large‑model progress beyond 2025.

LLAMA-3 · Large Language Models · open-source AI
0 likes · 10 min read
Baobao Algorithm Notes
Apr 21, 2024 · Artificial Intelligence

Why Llama 3’s Open‑Source Release Could Redefine Large‑Model Scaling and Synthetic Data

The article analyzes Llama 3’s architecture, training data expansion, model variants, Meta’s open‑source strategy, the evolving gap between open and closed models, and how future breakthroughs in synthetic data will shape scaling laws and large‑model progress through 2025 and beyond.

AI trends · Large Language Models · Llama3
0 likes · 12 min read
Xiaohe Frontend Team
Apr 21, 2024 · Artificial Intelligence

What’s New in Generative AI? VASA‑1, Llama‑3, Stable Diffusion 3 & More

The article reviews the latest breakthroughs in generative AI, including Microsoft’s VASA‑1 video synthesis model, Meta’s open‑source Llama‑3 large language model, Stability AI’s Stable Diffusion 3 API, Adobe’s integration of third‑party AI video tools into Premiere Pro, and a free image‑style‑recreation platform from Freepik, highlighting their technical details and potential applications.

AI tools · Diffusion Models · Large Language Models
0 likes · 13 min read
AntTech
Apr 19, 2024 · Artificial Intelligence

OneKE: Open-Source Bilingual Knowledge Extraction Framework for Large Language Models

OneKE, an open‑source bilingual (Chinese‑English) knowledge extraction framework jointly developed by Ant Group and Zhejiang University, enables efficient extraction of entities, relations, and events to build domain knowledge graphs that enhance large language models’ reasoning, reduce hallucinations, and support applications in medical, financial, and governmental sectors.

Artificial Intelligence · Knowledge Graphs · Large Language Models
0 likes · 5 min read
DevOps
Apr 17, 2024 · Artificial Intelligence

Engineering Capabilities for Enterprise Large Model Applications: Prompt Engineering, RAG, and Fine‑Tuning

The article explores how enterprises can build and improve large‑model applications by combining prompt engineering, retrieval‑augmented generation (RAG), and fine‑tuning, discusses their relationships, optimization dimensions, testing challenges, and provides practical guidance for SE4AI implementation.

AI Engineering · Enterprise AI · Fine-tuning
0 likes · 20 min read
AntTech
Apr 17, 2024 · Artificial Intelligence

LLMRG: Improving Recommendations through Large Language Model Reasoning Graphs

LLMRG introduces a novel framework that leverages large language models to construct personalized reasoning graphs, integrating chain reasoning, self‑verification, divergent extension, and knowledge‑base self‑improvement, thereby enhancing recommendation accuracy, interpretability, and performance across multiple benchmark datasets without additional user or item information.

Interpretability · Large Language Models · Recommendation Systems
0 likes · 9 min read
360 Tech Engineering
Apr 15, 2024 · Artificial Intelligence

Fine‑Tuning Large Language Models: A Practical Guide Using Qwen‑14B on the 360AI Platform

This article explains the concept, motivations, and step‑by‑step workflow for fine‑tuning large language models—specifically Qwen‑14B—covering data preparation, training commands with DeepSpeed, hyper‑parameter settings, evaluation, and deployment via FastChat, all illustrated with code snippets and configuration details.

DeepSpeed · FastChat · Fine-tuning
0 likes · 10 min read
DataFunSummit
Apr 13, 2024 · Artificial Intelligence

Understanding and Mitigating Hallucinations in Large Language Model Industry Q&A with Knowledge Graphs

This article examines why large language models often produce hallucinations in industry question‑answering, defines the phenomenon, explores its data and training origins, proposes evaluation metrics, and presents practical strategies—including high‑quality fine‑tuning data, honest refusal mechanisms, advanced decoding methods, and external knowledge‑graph augmentation—to reduce hallucinations and improve reliability.

AI Evaluation · Large Language Models · hallucination
0 likes · 21 min read
NewBeeNLP
Apr 13, 2024 · Artificial Intelligence

How a Multimodal ‘Joke‑King’ Model Beats GPT‑4 at Humor Generation

A research team from Sun Yat‑sen University, Sea AI Lab and Harvard built a multimodal large model that learns to generate creative jokes and memes by training on the Oogiri‑GO dataset, introducing a Leap‑of‑Thought (LoT) paradigm and CLoT fine‑tuning, which outperforms GPT‑4 and other state‑of‑the‑art models in humor tasks.

CLoT · Large Language Models · Leap-of-Thought
0 likes · 9 min read
Data Thinking Notes
Apr 11, 2024 · Artificial Intelligence

How Financial Institutions Are Building Their Own Large Language Models

This article explores how the finance sector is creating specialized large language models—covering the shift from generic to domain‑specific models, training innovations, evaluation methods, and real‑world applications such as marketing, customer service, risk control, and operational analytics.

Applications · Large Language Models · Model Training
0 likes · 16 min read
Cloud Native Technology Community
Apr 11, 2024 · Cloud Native

Why Kubernetes Is the Ideal Platform for Deploying Large Language Models

Deploying large language models demands massive compute, flexible scaling, and robust resource management, and this article explains how Kubernetes’s auto‑scaling, portability, cloud‑native features, observability tools, and multi‑tenant isolation make it the optimal platform for training, serving, and iterating LLM workloads.

Cloud Native · Distributed Training · Kubernetes
0 likes · 17 min read
DataFunSummit
Apr 9, 2024 · Artificial Intelligence

Knowledge Map for Large Model Application Development

This article outlines a comprehensive knowledge map for building large‑model applications, detailing a four‑layer technical architecture, development lifecycle, core elements such as prompt engineering and fine‑tuning, evaluation methods, and real‑world case studies across various AI use cases.

AI application development · Large Language Models · model fine-tuning
0 likes · 12 min read
NewBeeNLP
Apr 8, 2024 · Artificial Intelligence

What Will Recommendation Systems Look Like in 2026? Emerging Trends and Challenges

This article analyzes the current bottlenecks of conventional recommendation systems and outlines ten forward‑looking research directions for 2026, including retention improvement, user growth, content ecosystem, multi‑objective Pareto optimization, long‑term value estimation, site‑wide optimization, interactive recommendation, personalized modeling, decision‑theoretic framing, and the integration of large language models via the OneRec framework.

Large Language Models · User Retention · interactive recommendation
0 likes · 18 min read
DataFunTalk
Apr 4, 2024 · Artificial Intelligence

Enhancing Interactive Agents with Large Language Models: The SwiftSage Framework

This article reviews the challenges of text‑only large language model interaction, introduces benchmark environments such as ALFWorld and ScienceWorld, compares baseline reinforcement‑learning approaches, and presents SwiftSage—a hybrid system that combines a fast T5‑based small model with a powerful LLM for planning and grounding, demonstrating superior performance, efficiency, and cost‑effectiveness while outlining current limitations and future research directions.

Large Language Models · Reinforcement Learning · SwiftSage
0 likes · 22 min read
DataFunTalk
Apr 3, 2024 · Artificial Intelligence

Future Directions of Recommendation Systems: Retention, User Growth, Content Ecosystem, Multi‑Objective Optimization, and Large‑Model Fusion

This presentation outlines the current bottlenecks of conventional recommendation pipelines and proposes a 2026 roadmap that includes retention improvement, user‑growth strategies, content‑ecosystem metrics, Pareto‑optimal multi‑objective optimization, long‑term value modeling, site‑wide spatial optimization, interactive recommendation, personalized modeling, and the integration of large‑model fusion through the OneRec framework.

Large Language Models · Recommendation Systems · User Retention
0 likes · 18 min read
DataFunTalk
Apr 2, 2024 · Artificial Intelligence

User Portrait Algorithms: From Ontology‑Based Methods to Deep Learning and Future Directions

This article provides a comprehensive overview of user portrait algorithms, covering their historical development, ontology‑based traditional approaches, deep‑learning enhancements, representation‑learning techniques such as lookalike, active‑learning driven iteration, and the integration of large‑model world knowledge, while also discussing current challenges and future research directions.

Deep Learning · Large Language Models · Ontology
0 likes · 26 min read
DataFunSummit
Mar 31, 2024 · Artificial Intelligence

Challenges and Techniques in Distributed Training of Large Language Models

This article reviews the rapid development of large language models since 2019, outlines the historical background, identifies key challenges such as massive compute demand, memory constraints, and system complexity, and then details distributed training technologies—including data parallelism, pipeline parallelism, and advanced optimization strategies—while also discussing future research directions and answering common questions.

AI Infrastructure · Data Parallelism · DeepSpeed
0 likes · 23 min read
DaTaobao Tech
Mar 29, 2024 · Artificial Intelligence

Text-to-SQL with Large Language Models: DIN-SQL Approach

The DIN‑SQL approach enhances Text‑to‑SQL performance by using large language models in a decomposed in‑context learning framework with schema linking, query classification, SQL generation, and self‑correction modules, achieving state‑of‑the‑art 85.3% execution accuracy on the Spider benchmark by breaking complex queries into manageable sub‑tasks.

AI research · Database Querying · Large Language Models
0 likes · 34 min read
Sohu Tech Products
Mar 27, 2024 · Artificial Intelligence

NVIDIA NeMo Framework, TensorRT‑LLM, and RAG for Large Language Model Solutions

NVIDIA’s comprehensive LLM ecosystem combines the full‑stack NeMo Framework for data curation, distributed training, and fine‑tuning with inference acceleration through TensorRT‑LLM and Triton, plus Retrieval‑Augmented Generation and Guardrails, enabling efficient, low‑latency, knowledge‑grounded model deployment across clusters.

AI acceleration · Large Language Models · Model Training
0 likes · 16 min read
Alibaba Cloud Big Data AI Platform
Mar 26, 2024 · Artificial Intelligence

MoE LLMs: How Alibaba Cloud & NVIDIA Megatron-Core Accelerate Training

This article reviews the evolution of Mixture-of-Experts (MoE) models, details Alibaba Cloud’s collaboration with NVIDIA’s Megatron-Core to build a high-performance MoE framework, and presents extensive training optimizations, benchmark results, conversion tools, and best-practice guidelines for large-scale LLM development and deployment.

Alibaba Cloud · Large Language Models · Megatron-Core
0 likes · 18 min read
NewBeeNLP
Mar 21, 2024 · Artificial Intelligence

Mastering Large Language Model Training: Key Challenges and Optimization Strategies

This article examines the resource and efficiency challenges of scaling large language model training, explains data, model, pipeline, and tensor parallelism, and provides practical I/O, communication, and stability optimization techniques—including high‑availability storage, RDMA networking, NCCL tuning, and fault‑tolerant recovery—to improve throughput and reliability.

AI Engineering · Distributed Training · I/O optimization
0 likes · 15 min read
TAL Education Technology
Mar 20, 2024 · Artificial Intelligence

Understanding AI: From Brain Differences to Data Science Practices and Large Model Applications

This article explains why current AI cannot achieve self‑awareness, outlines data‑science steps for large models—including preprocessing, exploratory analysis, modeling, and evaluation—then surveys general and vertical applications of large language models and details a complete machine‑learning workflow with transformer fine‑tuning techniques.

Applications · Data Science · Fine-tuning
0 likes · 14 min read
DataFunTalk
Mar 20, 2024 · Artificial Intelligence

Challenges and Optimization Techniques for Large Language Model Training

The article outlines the resource and efficiency challenges of scaling large language models, explains data and model parallelism strategies, and details practical I/O, communication, and stability optimizations—including high‑availability storage, RDMA networking, and fault‑tolerance measures—to improve training throughput and reliability.

AI Engineering · I/O optimization · Large Language Models
0 likes · 13 min read
DataFunTalk
Mar 17, 2024 · Artificial Intelligence

Leveraging Large Language Models to Enhance Comprehensive Graph Learning Capabilities

In this talk, researcher Jiang Zhuoren from Zhejiang University reviews the current state of large language models applied to graph learning, discusses their roles across various graph scenarios, and outlines promising research directions for unified cross‑domain graph learning.

Artificial Intelligence · Large Language Models · cross-domain learning
0 likes · 3 min read
Model Perspective
Mar 16, 2024 · Artificial Intelligence

What Watching a TV Drama Reveals About AI Model Training and Learning Strategies

The article draws parallels between expert viewers dissecting the drama "The Legend of Zhen Huan," efficient paper‑reading techniques, and the active‑prediction plus contrast‑learning approach that underpins modern AI model training, highlighting how proactive thinking boosts both personal and machine learning outcomes.

AI training · Large Language Models · Prediction
0 likes · 8 min read
DataFunSummit
Mar 14, 2024 · Artificial Intelligence

Multi‑Level Efficiency Challenges and Emerging Paradigms for Large AI Models

The article examines how large AI models are moving toward a unified, low‑knowledge‑density paradigm that raises computational efficiency challenges across model, algorithm, framework, and infrastructure layers, while also highlighting NVIDIA's GTC 2024 China AI Day sessions that showcase practical solutions and upcoming training opportunities.

AI Infrastructure · AI conferences · Large Language Models
0 likes · 10 min read
21CTO
Mar 12, 2024 · Artificial Intelligence

How Google’s ‘Social Learning’ AI Framework Boosts Privacy‑Safe Model Training

Google’s newly unveiled “Social Learning” AI framework lets large models teach each other via natural language, improving task performance while avoiding direct use of sensitive data, and uses teacher‑student interactions, synthetic data, and instruction generation to enhance privacy‑preserving model training.

Large Language Models · ai · privacy
0 likes · 4 min read
DataFunTalk
Mar 10, 2024 · Artificial Intelligence

Aligning Graph Models with Large Language Models for Open-Task Scenarios

This talk presents GraphTranslator, a framework that bridges pretrained graph models and large language models to enable unified handling of both predefined and open-ended graph analysis tasks by translating node representations into language tokens and training an alignment producer for node‑text pairs.

AI research · Large Language Models · Model Alignment
0 likes · 3 min read
NewBeeNLP
Mar 10, 2024 · Industry Insights

What WWW'24 Papers Reveal About LLMs in Search & Recommendation

This overview summarizes six WWW 2024 industry papers that apply large language models to e‑commerce search, personalized query suggestion, article recommendation, collaborative filtering, and lifelong sequential behavior understanding, highlighting their methods, experimental results, deployment status, and emerging trends in LLM‑driven search and recommendation.

LLM · Large Language Models · Search
0 likes · 16 min read
DataFunTalk
Mar 7, 2024 · Artificial Intelligence

Enhancing Interactive Agents with Large Language Models: The SwiftSage Framework and Benchmark Analysis

This article reviews recent advances in using large language models for interactive embodied agents, introduces the SwiftSage dual‑model framework that combines a fast T5‑based small model with a powerful LLM for planning, evaluates it on benchmarks such as ALFWorld and ScienceWorld, and discusses efficiency, cost‑effectiveness, limitations, and future research directions.

Large Language Models · Reinforcement Learning · SwiftSage
0 likes · 23 min read
Model Perspective
Mar 6, 2024 · Fundamentals

Why Managing a City Is Like Designing a Spaceship: Exploring Complex Systems

An insightful look at how both spacecraft design and city governance exemplify complex systems, distinguishing closed versus open systems, outlining characteristics of complex and mega-complex systems, and linking these concepts to system engineering pioneers like Qian Xuesen and modern large language models.

Large Language Models · Qian Xuesen · open vs closed systems
0 likes · 9 min read
DataFunSummit
Mar 6, 2024 · Artificial Intelligence

Document Intelligence: Background, Technology, Large Models, and Enterprise Applications

This article presents a comprehensive overview of document intelligence, covering its background, technical evolution, large‑model advancements, and practical enterprise digital transformation use cases, with a focus on multimodal processing, unified document representation, and industry‑specific applications such as legal contract automation.

Document Intelligence · Enterprise Automation · Large Language Models
0 likes · 14 min read
Efficient Ops
Feb 27, 2024 · Artificial Intelligence

Can Large Language Models Truly Elevate Software Engineering? Insights and Roadmap

This article reviews the 2023 surge of large language models in software engineering, evaluates their current code generation, testing, and knowledge‑query capabilities, highlights persistent challenges in design and maintenance, and proposes concrete recommendations for advancing toward higher‑level intelligent development.

Large Language Models · code-generation · digital twins
0 likes · 21 min read
NewBeeNLP
Feb 17, 2024 · Artificial Intelligence

How Sora Highlights the Next Leap Toward AGI and Shifts AI Competition

The article analyzes OpenAI's Sora video model, arguing that its integration of large‑language‑model reasoning with diffusion techniques marks a major step toward true world understanding, reshapes creative workflows, widens the AI talent gap, and accelerates the path to artificial general intelligence.

AGI · AI trends · Large Language Models
0 likes · 7 min read
NewBeeNLP
Feb 11, 2024 · Industry Insights

What 2023 Taught Us About LLMs and AI‑Guided Optimization

The author reviews a year of rapid progress in large language models, highlighting breakthrough papers such as Positional Interpolation, StreamingLLM, Deja Vu, and RLCD, and discusses how AI‑guided optimization techniques like SurCo, LANCER, and GenCo are reshaping research and industry applications.

AI Optimization · LLM · Large Language Models
0 likes · 13 min read
DataFunTalk
Feb 10, 2024 · Artificial Intelligence

Mitigating Hallucinations in Large Language Model Applications with Knowledge Graphs

This article examines the challenges of using large language models for industry Q&A, defines hallucination phenomena, evaluates their causes and impact, and proposes a set of strategies—including high‑quality fine‑tuning data, honest alignment, advanced decoding, and external knowledge‑graph augmentation—to reduce hallucinations and improve answer reliability.

Large Language Models · Model Evaluation · hallucination
0 likes · 21 min read
Cloud Native Technology Community
Feb 8, 2024 · Artificial Intelligence

How Retrieval‑Augmented Generation Boosts LLM Accuracy and Trust

Retrieval‑augmented generation (RAG) enhances large language models by fetching up‑to‑date, authoritative information from external sources, addressing hallucinations, outdated knowledge, and lack of citations, while offering cost‑effective implementation, improved relevance, user trust, and greater developer control through vector databases, semantic search, and prompt engineering.

Large Language Models · Prompt engineering · RAG
0 likes · 10 min read
DataFunSummit
Feb 5, 2024 · Artificial Intelligence

Ant Group's Knowledge Graph: Overview, Construction, Applications, and Integration with Large Models

Ant Group shares its comprehensive knowledge graph initiatives, detailing the fundamentals, construction pipeline, fusion techniques, cognitive representations, diverse business applications, and the emerging synergy between knowledge graphs and large language models, illustrating how graph-based AI enhances accuracy, interpretability, and downstream services.

Artificial Intelligence · Data Integration · Graph Fusion
0 likes · 14 min read
MaGe Linux Operations
Jan 31, 2024 · Artificial Intelligence

Does Gemini Pro Really Outperform GPT‑4? A Deep Comparative Review

This article critically examines Google’s Gemini Pro against OpenAI’s GPT‑4 across reasoning, vision, token limits, benchmark data, and real‑world tasks, revealing where Gemini excels, where it falls short, and what to expect from the upcoming Gemini Ultra.

AI model comparison · GPT-4 · Gemini Pro
0 likes · 13 min read
DataFunTalk
Jan 31, 2024 · Artificial Intelligence

Industry Trends and Challenges of Large Language Models in Enterprise Applications (2023 Review)

The article reviews the rapid development of large language models in enterprise settings, covering internal collaboration tools, AI assistants for development and marketing, multimodal generation, inference speed bottlenecks, resource constraints, and future directions such as open‑source models and academic‑industry cooperation.

AI assistants · AI in marketing · Enterprise AI
0 likes · 8 min read
Alibaba Cloud Big Data AI Platform
Jan 29, 2024 · Artificial Intelligence

Unlocking Sparse MoE Large Model Training with Megatron-Core on Alibaba Cloud

This article explains how Alibaba Cloud's PAI platform and NVIDIA's Megatron-Core enable efficient training of sparse Mixture-of-Experts (MoE) large language models, covering algorithm basics, the Megatron-Core MoE framework, weight conversion pipelines, and performance results on Mixtral‑8x7B.

Large Language Models · Megatron-Core · Mixture of Experts
0 likes · 18 min read
ZhongAn Tech Team
Jan 22, 2024 · Artificial Intelligence

Weekly Tech Overview: Major Industry Updates and AI Insights

This weekly tech overview summarizes major industry developments, including Huawei's HarmonyOS NEXT release, SenseTime's open‑source large language model InternLM2, the Apple‑Epic App Store dispute resolution, Xiaomi's 5G satellite terminal approval, Microsoft overtaking Apple in market value, and recent AI energy consumption concerns.

HarmonyOS · Industry Updates · Large Language Models
0 likes · 10 min read
Xiaohongshu Tech REDtech
Jan 20, 2024 · Artificial Intelligence

Decoding Xiaohongshu’s Recommendation System: How Ordinary Users Gain Visibility

Xiaohongshu’s recommendation system uses large‑scale multimodal embeddings, dual‑tower and graph models, and diversity techniques like DPP and SSD to quickly surface high‑quality user‑generated content, enabling ordinary users to gain visibility while balancing personalization, exploration, and efficient LLM‑augmented pipelines.

Large Language Models · Multimodal AI · Xiaohongshu
0 likes · 15 min read
Cognitive Technology Team
Jan 17, 2024 · Artificial Intelligence

Redis Founder antirez Reflects on Large Language Models in 2024

In his first 2024 blog post, Redis founder antirez shares a programmer's perspective on large language models, sharply critiques Google's search engine, evaluates current AIGC as both foolish and historically knowledgeable, and argues that generative AI mainly amplifies the abilities of already strong developers.

AI Commentary · Large Language Models · redis
0 likes · 2 min read
21CTO
Jan 14, 2024 · Artificial Intelligence

Can Large Language Models Really Boost Programming Productivity? Insights from Redis Founder

The article reflects on the Redis founder's 2024 blog about large language models, examining their strengths and limits in software development, illustrating how they can accelerate coding for experienced programmers while highlighting challenges in system programming and the need for careful prompt engineering.

AI programming · Large Language Models · productivity
0 likes · 19 min read
Rare Earth Juejin Tech Community
Jan 3, 2024 · Artificial Intelligence

Llama 2: Open Foundation and Fine‑Tuned Chat Models – Ghost Attention, RLHF Results, and Safety Evaluation

This article summarizes the Llama 2 series, describing the Ghost Attention technique for maintaining system‑message consistency across multi‑turn dialogs, presenting RLHF and human evaluation results, and discussing extensive safety pre‑training, benchmark assessments, and model release details.

AI Evaluation · Ghost Attention · Large Language Models
0 likes · 20 min read
OPPO Kernel Craftsman
Dec 29, 2023 · Information Security

OPPO Releases White Paper on Mobile Application Trustworthy Technology at CAICT ICT+ Deep Observation Conference

At the CAICT ICT+ Deep Observation Conference, OPPO unveiled a white paper on mobile application trustworthy technology, analyzing lifecycle security risks, policy and patent developments, and the role of large‑model AI in intelligent terminals, while urging standardized security practices and accelerated AI‑driven vulnerability detection tools.

CAICT · Intelligent Terminals · Large Language Models
0 likes · 4 min read
OPPO Amber Lab
Dec 29, 2023 · Information Security

Large Models Transform Mobile App Security – Key Takeaways from OPPO’s White Paper

The 2024 China Academy of ICT deep‑observation summit in Shanghai unveiled OPPO’s new white paper on trustworthy mobile application technology, highlighting how large language models enhance smart terminal security, surveying industry trends, and outlining future directions for secure, intelligent mobile ecosystems.

Large Language Models · OPPO · Software Security
0 likes · 6 min read
DataFunTalk
Dec 25, 2023 · Artificial Intelligence

Tool Learning with Foundation Models: Frameworks, Datasets, and Open‑Source Toolkits

This article reviews the emerging field of tool learning for large foundation models, outlining its background, categorization, core framework components, training strategies, and applications such as WebCPM, BMTools, and ToolBench, while highlighting recent research results and open‑source resources.

AI tools · Large Language Models · Web Search
0 likes · 21 min read
Java High-Performance Architecture
Dec 22, 2023 · Artificial Intelligence

Is Google Gemini Echoing Baidu? A Deep Dive into Model Contamination

The article investigates recent tests showing that Google Gemini sometimes claims to be Baidu's AI, reproduces Baidu‑related responses, and appears to have its Chinese and English corpora contaminated with competitor data, highlighting the challenges of data provenance in large language models.

AI model contamination · AI testing · Baidu Wenxin
0 likes · 6 min read
DataFunTalk
Dec 21, 2023 · Artificial Intelligence

Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning – Best Long Paper at EMNLP 2023

At EMNLP 2023, the joint WeChat AI and Peking University paper 'Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning' won the Best Long Paper award, revealing that label tokens act as anchors driving information aggregation in shallow layers and prediction flow in deep layers, and proposing methods to improve and diagnose in‑context learning.

AI research · In-Context Learning · Information Flow
0 likes · 13 min read
DataFunTalk
Dec 19, 2023 · Artificial Intelligence

Enterprise Large‑Model Deployment and Data Governance: Insights from Deepexi’s President

The article examines how enterprises can adopt domain‑specific large models by balancing demand‑side cost‑reduction needs with supply‑side mature training techniques, discusses team composition, fine‑tuning methods, data governance for unstructured data, and outlines Deepexi’s product ecosystem designed to improve efficiency, performance, and user experience.

AI deployment · Enterprise AI · Large Language Models
0 likes · 13 min read
21CTO
Dec 17, 2023 · Artificial Intelligence

Why AI‑Native Apps Matter: Insights from Baidu, ByteDance Ban, and New PHP Server

The article examines Baidu CEO Li Yanhong’s call to focus on AI‑native applications, reports ByteDance’s suspension by OpenAI for misusing GPT, outlines Google’s phased removal of third‑party cookies, and announces the release of the Go‑based PHP server FrankenPHP 1.0.

AI-native applications · Large Language Models · PHP server
0 likes · 7 min read
DataFunSummit
Dec 14, 2023 · Artificial Intelligence

Enterprise Large‑Model Deployment: Data Governance, Fine‑Tuning Strategies, and Cost Economics

The article examines how enterprises can adopt domain‑specific large language models by addressing data governance, model fine‑tuning techniques, dataset balance, and product architecture to achieve cost‑effective, high‑performance AI solutions across various business scenarios.

Large Language Models · Model Fine‑tuning · cost efficiency
0 likes · 14 min read
Huawei Cloud Developer Alliance
Dec 14, 2023 · Artificial Intelligence

Unlocking LLaMA: Key Innovations, Architecture Insights, and MindSpore Inference Guide

This article reviews the LLaMA large‑language‑model series, covering its background, architectural innovations such as Add&Norm, SwiGLU, and RoPE, a known reversal‑curse bug, and provides step‑by‑step MindSpore Transformers code for model configuration, inference, and pipeline usage while previewing the upcoming LLaMA‑2 session.

Inference · LLaMA · Large Language Models
0 likes · 6 min read
DataFunTalk
Dec 12, 2023 · Artificial Intelligence

Challenges and Considerations of Recommendation Systems: Evaluation, Data Leakage, and the Role of Large Models

This article examines recommendation system problem definitions, differences between academia and industry, offline evaluation pitfalls and data leakage issues, data construction challenges with datasets like MovieLens, and evaluates whether large language models can serve as effective solutions for modern recommendation tasks.

Large Language Models · Recommendation Systems · data leakage
0 likes · 20 min read
21CTO
Dec 7, 2023 · Artificial Intelligence

Google Gemini vs GPT‑4: Can the New AI Model Outperform ChatGPT?

Google's Gemini AI suite, unveiled in December, brings three model sizes—Nano, Pro, and Ultra—to power Bard and other services, claims superior performance over GPT‑4 across most benchmarks, and introduces multimodal capabilities that signal a major shift in the AI landscape.

AI language model · GPT-4 comparison · Google Gemini
0 likes · 6 min read
JD Tech
Nov 30, 2023 · Artificial Intelligence

Understanding ChatGPT: Mechanisms, Attention, Emergence, and the Chinese Room

This article examines the principles behind ChatGPT, detailing its continuation-based operation, the role of attention mechanisms and transformer architecture, the scaling of neural networks that leads to emergent abilities, and interprets these phenomena through the lenses of compression theory and the Chinese Room thought experiment.

Attention Mechanism · ChatGPT · Emergence
0 likes · 27 min read
AntTech
Nov 24, 2023 · Artificial Intelligence

Code Model Evaluation Framework and the CodeFuseEval Benchmark Overview

This article presents a comprehensive overview of code large‑model evaluation, describing the need for multi‑dimensional benchmarks, the CodeFuseEval benchmark suite, dataset construction, evaluation methods, framework architecture, result visualisation, and future directions for enterprise‑grade code generation models.

CodeFuseEval · Large Language Models · ai
0 likes · 12 min read
Ant R&D Efficiency
Nov 24, 2023 · Artificial Intelligence

CodeFuseEval: An Enterprise‑Level Multi‑Task Benchmark for Evaluating Code Large Models

CodeFuseEval is an enterprise‑grade, multi‑task benchmark that evaluates code‑generation large models across six languages and thousands of real‑world tasks using both objective metrics (pass@k, BLEU, CodeBLEU) and expert human review, with an open‑source framework, continuous dataset expansion, and a focus on correctness, efficiency, robustness, and service‑level quality.

Large Language Models · ai · benchmark
0 likes · 12 min read
DataFunTalk
Nov 21, 2023 · Artificial Intelligence

Improving Efficiency of Large-Scale Distributed Training for Large Language Models

Recent advances in large language models have dramatically increased model size and training data, leading to soaring computational costs; this article examines the scaling trends, hardware utilization challenges, distributed training techniques, and ethical considerations, highlighting methods to improve efficiency, reduce costs, and mitigate environmental impact.

AI ethics · Distributed Training · Large Language Models
0 likes · 29 min read
Baobao Algorithm Notes
Nov 21, 2023 · Artificial Intelligence

How Much Data Do You Need for a 10B LLM? Decoding Scaling Laws

This article explains how scaling laws can answer common LLM development questions—such as the data required for a 10B model, the model size achievable with 1 TB of data, and the optimal compute‑data‑model trade‑off for a fixed GPU budget—by presenting core formulas, practical derivations, and insights from OpenAI, DeepMind and Google.

Compute Efficiency · Data Requirements · Large Language Models
0 likes · 12 min read
360 Smart Cloud
Nov 20, 2023 · Artificial Intelligence

Overview of Recent Open‑Source AI Models and Tools (November 2023)

This article summarizes a collection of newly released open‑source AI projects covering natural‑language processing, multimodal processing, intelligent agents, recommendation systems, and model training acceleration, providing brief descriptions, key capabilities, and links to their repositories.

Large Language Models · Multimodal · Recommendation Systems
0 likes · 9 min read