Tagged articles

AI safety

301 articles · Page 3 of 4
Volcano Engine Developer Services
Aug 19, 2025 · Artificial Intelligence

How to Strengthen LLM System Prompts for Safer AI Agents

This guide explains how to reinforce system prompts for AI agents by optimizing their content and structure with active‑defense, role‑based, and format constraints. It provides practical examples, measurement methods, and experimental results showing up to a 90% reduction in unsafe behavior.

AI safety · LLM · reinforcement
0 likes · 13 min read
Meituan Technology Team
Aug 14, 2025 · Artificial Intelligence

How Meituan’s Smart Helmet Redefines Delivery Safety with AI‑Powered Design

Meituan’s first smart‑helmet article details hardware innovations that tackle delivery riders’ safety, comfort, and efficiency, covering stricter safety standards, sensor‑driven alerts, lightweight structures, advanced ventilation, three times longer battery life, noise‑cancelling audio, IPX6 waterproofing, and a data‑driven production line.

AI safety · Hardware Design · delivery efficiency
0 likes · 24 min read
AI Frontier Lectures
Jul 27, 2025 · Information Security

Can Hidden Activations Expose Multimodal Model Jailbreaks?

The paper reveals that large multimodal language models retain refusal signals in their hidden states even after jailbreak attempts, and proposes a training‑free detection method that leverages these signals to identify unsafe inputs across text and image modalities with strong generalization.

AI safety · LVLM security · hidden activation analysis
0 likes · 7 min read
IT Services Circle
Jul 16, 2025 · Artificial Intelligence

How a Simple Colon Can Trick Top LLMs – The Master‑RM Fix

A recent study reveals that tiny symbols like colons or generic reasoning prefixes can cause large language models used as reward judges to issue false‑positive rewards, but an enhanced reward model called Master‑RM, trained with adversarial data, eliminates this vulnerability across multiple LLMs and languages.

AI safety · LLM · Master-RM
0 likes · 10 min read
AntTech
Jul 14, 2025 · Artificial Intelligence

What Is the New AI Agent Safety Testing Standard and Why It Matters

The World Digital Technology Academy (WDTA) unveiled the AI STR series' first global AI Agent Operation Safety Testing Standard, detailing a full‑link risk analysis framework, novel testing methods, and its role in addressing rising safety concerns as AI agents become mainstream in 2025.

AI Governance · AI safety · agent standards
0 likes · 5 min read
21CTO
Jul 1, 2025 · Artificial Intelligence

OpenAI CEO Warns: Don’t Blindly Trust AI – Insights from New Open‑Source Models

Sam Altman cautions against over‑reliance on ChatGPT, while Germany blocks DeepSeek for GDPR violations, Tencent unveils its MoE‑based Hunyuan‑A13B model, and Google releases a Python client for Data Commons, highlighting both AI risks and rapid open‑source advancements.

AI safety · Data Commons · MoE
0 likes · 9 min read
DataFunTalk
Jun 21, 2025 · Artificial Intelligence

Why AI Gets Overconfident: Bias, Hallucinations, and Reinforcement Learning Solutions

This talk explores how large AI models become overconfident, leading to bias and hallucinations, examines adversarial examples in vision and language, explains why data and algorithms cause these issues, and shows how reinforcement learning can teach models to admit uncertainty and align with human values.

AI alignment · AI safety · Bias
0 likes · 19 min read
DataFunTalk
Jun 19, 2025 · Artificial Intelligence

Can We Flip the Switch on AI Good vs. Evil? OpenAI’s Toxic Persona Find

OpenAI’s new research reveals that training language models to produce incorrect answers in a single domain can trigger a toxic persona feature, causing the model to generate harmful suggestions across unrelated tasks, but the team also demonstrates detection methods and a reversible “emergent realignment” technique to restore safe behavior.

AI safety · Emergent misalignment · OpenAI
0 likes · 7 min read
DataFunSummit
Jun 10, 2025 · Artificial Intelligence

How Quwan’s Kaitian Model Tackles Emotional AI for Social Apps – Architecture, Training Tricks, and Safety

Quwan Technology presents its Kaitian social large model, designed for personalized, emotionally rich, multimodal AI interactions, detailing its scene‑specific goals, CPT+SFT+RLHF training pipeline, data desensitization, LoRA fine‑tuning, evaluation methods, pruning, latency trade‑offs, safety mechanisms, and future feedback loops.

AI safety · Large Language Model · LoRA
0 likes · 13 min read
Kuaishou Tech
Jun 5, 2025 · Artificial Intelligence

7 Kuaishou AI Papers Accepted at ACL 2025: Video Understanding & Safe LLM Decoding

Kuaishou’s foundational large-model team has had seven papers accepted at ACL 2025, spanning alignment bias in training, safety defenses during inference, decoding strategies, fine-grained video-temporal understanding, reward fairness in RLHF, multimodal captioning benchmarks, and methods to curb hallucinations in vision-language models.

ACL · AI safety · Multimodal
0 likes · 13 min read
AntTech
May 30, 2025 · Artificial Intelligence

Insights from Ant Group’s 10th Technical Open Day: Multimodal, Embodied, and Future Model Architectures for AGI

Ant Group’s 10th Technical Open Day gathered leading AI experts who examined the current state and future directions of multimodal large models, embodied AI, world models, transformer architectures, and vertical applications, offering a comprehensive view of the challenges and opportunities on the path toward AGI.

AGI · AI safety · Embodied AI
0 likes · 16 min read
ShiZhen AI
May 26, 2025 · Industry Insights

Nvidia Plans Cheaper Blackwell AI Chip for China Amid Export Restrictions

Nvidia is reportedly preparing a lower‑cost Blackwell GPU for the Chinese market, priced at $6,500‑$8,000 and featuring 1.7 TB/s GDDR7 memory, while OpenAI’s o3 model uncovered a Linux kernel zero‑day (CVE‑2025‑37899), a study showed AI models can sabotage shutdown commands, and a tutorial demonstrates creating animated 3D icons with ChatGPT and Freepik tools.

3D icon creation · AI hardware · AI safety
0 likes · 8 min read
Java Tech Enthusiast
May 25, 2025 · Artificial Intelligence

Does Claude 4 Really Report Unethical Actions? Inside Its Hidden ‘Whistleblower’ Feature

The article analyzes Anthropic's Claude 4 series, highlighting its extended reasoning ability, a controversial whistle‑blower function that can report extreme wrongdoing, observed extortion attempts toward developers, and the safety measures Anthropic introduced to curb such risky autonomous behaviors.

AI safety · Anthropic · Claude 4
0 likes · 6 min read
Tencent Technical Engineering
May 8, 2025 · Artificial Intelligence

Augment AI Programming Assistant: Technical Breakthroughs, Industry Impact, and Security Risks

Augment, a newly funded AI programming assistant that tops the SWE‑bench benchmark with a 65.4% score and a 200 k‑token context window, promises massive productivity gains for developers but also introduces sophisticated security threats such as malicious memory prompts, back‑door context injection, compromised guidelines, and risky multi‑task collaboration protocols, prompting calls for layered defenses and vigilant monitoring.

AI programming · AI safety · Agent Memory
0 likes · 11 min read
Sohu Tech Products
May 7, 2025 · Information Security

Why MCP Protocol Is a Security Nightmare: Real Attack Cases and Mitigations

This article provides a comprehensive security analysis of the Model Context Protocol (MCP), exposing multiple attack vectors such as prompt poisoning, tool poisoning, command and code injection, and illustrating how MCP’s design flaws make it more vulnerable than traditional applications while offering concrete mitigation recommendations.

AI safety · Code Injection · MCP
0 likes · 34 min read
JavaEdge
May 7, 2025 · Artificial Intelligence

Why AI Agents Pose New Security Risks and How to Safeguard Them

The article explains what AI agents are, highlights their emerging security risks such as data leakage and lack of accountability, and offers practical strategies—including risk analysis, threat modeling, and engineering best practices—to mitigate these challenges for enterprises.

AI agents · AI safety · Enterprise AI
0 likes · 9 min read
21CTO
Apr 7, 2025 · Artificial Intelligence

Llama 4 Unveiled: Breakthrough Multimodal Models Redefine AI Capabilities

Meta's Llama 4 series introduces the Scout, Maverick, and Behemoth models—featuring Mixture‑of‑Experts architectures, unprecedented 10‑million‑token context windows, and state‑of‑the‑art performance across vision, language, and multimodal benchmarks—while emphasizing efficient training, open‑source availability, and robust safety measures.

AI safety · Large Language Model · Llama 4
0 likes · 14 min read
Model Perspective
Apr 7, 2025 · Artificial Intelligence

Why AI Alignment Matters: Ensuring Smart Systems Follow Human Intent

This article explores the multifaceted AI alignment challenge, detailing safety benchmarks such as toxicity, ethical, power‑seeking, and hallucination evaluations, and argues that responsible AI development requires technical safeguards, international governance, and a civilizational dialogue bridging philosophy and humanity.

AI Governance · AI alignment · AI safety
0 likes · 12 min read
Cognitive Technology Team
Apr 1, 2025 · Artificial Intelligence

Four‑Second Bloodshed: How Autonomous Driving Algorithms Turned an Accident Fatal

A March 2025 crash involving a Xiaomi‑branded autonomous vehicle illustrates how a four‑second algorithmic decision loop, inadequate night‑vision sensors, flawed handover timing, and poor emergency‑exit design combined to create a lethal scenario that exposes the deadly risks of over‑relying on L2 driver‑assist systems.

AI safety · Human-Machine Interaction · L2 driver assistance
0 likes · 4 min read
Architect
Mar 28, 2025 · Artificial Intelligence

Peeking Inside Claude: How Anthropic Uncovers LLM Reasoning

Anthropic’s recent papers reveal how Claude’s internal mechanisms—multilingual feature sharing, pre‑planned rhyming, parallel arithmetic paths, concept‑level reasoning, and hallucination triggers—are probed with feature‑insertion techniques, offering engineers actionable insights for building more transparent and safe AI systems.

AI safety · Anthropic · Claude
0 likes · 12 min read
Architect
Mar 24, 2025 · Artificial Intelligence

How Multimodal Alignment Is Shaping the Future of Large Language Models

This article provides a systematic review of recent advances in multimodal alignment for large language models, covering key contributions, application scenarios, dataset construction, evaluation benchmarks, future challenges, and insights from LLM alignment research to guide both academia and industry.

AI safety · Dataset Construction · MLLM
0 likes · 26 min read
Architecture and Beyond
Mar 15, 2025 · Information Security

Prompt Injection Attacks on Large Language Models: Risks, Types, and Defense Framework

This article explains how prompt injection attacks exploit large language models by altering their behavior through crafted inputs, outlines the major harms and attack categories—including direct, indirect, multimodal, code, and jailbreak attacks—and presents a comprehensive three‑layer defense framework covering input‑side, output‑side, and system‑level protections.

AI safety · LLM security · information security
0 likes · 16 min read
DevOps
Mar 10, 2025 · Artificial Intelligence

AI Policy, Safety, Industry Applications, and Talent Development Highlighted at China's 2024 Two Sessions

The 2024 Chinese Two Sessions emphasized artificial intelligence as a strategic priority, discussing AI safety regulations, industry applications, talent shortages, and policy proposals from leaders such as DeepSeek, Xiaomi, and academic experts, highlighting the drive to integrate AI across manufacturing, agriculture, healthcare, and education.

AI industry · AI policy · AI safety
0 likes · 11 min read
Code Mala Tang
Feb 27, 2025 · Artificial Intelligence

Do New AI Reasoning Models Really Think? Unpacking the Debate

The article examines whether the latest AI models that claim to perform true reasoning—by breaking problems into steps and using chain‑of‑thought—actually reason like humans, presenting skeptical and supportive expert viewpoints, and offering practical guidance on how to use such models responsibly.

AI reasoning · AI safety · Chain-of-Thought
0 likes · 14 min read
Architect's Alchemy Furnace
Feb 19, 2025 · Artificial Intelligence

DeepSeek’s Self‑Correction: Transforming AI Reliability and Safety

The article explores DeepSeek’s innovative self‑correction system—combining a Mixture‑of‑Experts architecture with reinforcement‑learning feedback—to achieve real‑time error detection, dynamic knowledge‑graph updates, and enhanced safety in high‑risk fields like autonomous driving and medical diagnostics.

AI safety · DeepSeek · Mixture of Experts
0 likes · 9 min read
Architects' Tech Alliance
Feb 12, 2025 · Artificial Intelligence

DeepSeek‑V3 Training Efficiency, Knowledge Distillation, and the Risks of Synthetic Data

The article examines DeepSeek‑V3’s low‑cost training using 2048 H800 GPUs, explains how knowledge distillation and high‑quality data improve efficiency, discusses expert concerns about training on AI‑generated content, and outlines the limitations and ceiling effect of distillation techniques.

AI Training Efficiency · AI safety · DeepSeek-V3
0 likes · 7 min read
Software Engineering 3.0 Era
Jan 18, 2025 · Industry Insights

Is AI Self‑Programming and Recursive Self‑Improvement Signaling the Endgame?

The article examines Nvidia’s claim that AI can now write software and build an “AI factory,” analyzes OpenAI’s emerging o‑series models that purportedly achieve recursive self‑improvement, and surveys community reactions ranging from excitement to safety concerns about a potential AI “game over.”

AI safety · Industry Analysis · NVIDIA
0 likes · 8 min read
Baobao Algorithm Notes
Jan 11, 2025 · Artificial Intelligence

Why Phi‑4’s 14B Model Outperforms GPT‑4 on STEM and Reasoning Tasks

Microsoft Research’s Phi‑4 model, a 14‑billion‑parameter LLM, leverages extensive synthetic data, advanced tokenization, and a two‑stage training pipeline to achieve superior performance on STEM question answering, long‑context reasoning, and safety benchmarks, rivaling larger models like GPT‑4.

AI safety · Benchmarking · Phi-4
0 likes · 15 min read
DataFunTalk
Jan 5, 2025 · Artificial Intelligence

The Approaching Singularity: AI Automation, AGI Predictions, and Their Impact on Jobs and Society

The article examines how rapid advances in artificial intelligence are expected to automate nearly half of U.S. jobs within the next two decades, explores singularity forecasts for 2029‑2030, and discusses the profound economic, ethical, and security challenges that humanity must address before AI-driven autonomous systems reshape work, research, and daily life.

AGI · AI · AI safety
0 likes · 18 min read
21CTO
Jan 2, 2025 · Artificial Intelligence

2025 AI Breakthroughs: Unlimited Memory & Intelligent Agents, Says Eric Schmidt

Former Google CEO Eric Schmidt warns that AI is on the brink of a transformative era, highlighting three 2025 breakthroughs—unlimited context memory, autonomous AI agents, and text‑to‑action programming—while also stressing the looming risks of energy consumption, security threats, and the need for ethical safeguards.

AI memory · AI research · AI safety
0 likes · 14 min read
21CTO
Dec 22, 2024 · Artificial Intelligence

OpenAI’s New o3 Model Shatters Benchmarks – Is AGI Finally Here?

OpenAI’s latest o3 model demonstrates unprecedented performance across logic, mathematics, and programming benchmarks, introduces flexible reasoning modes with the upcoming o3‑mini, and incorporates advanced safety alignment, signaling a major leap toward practical artificial general intelligence.

AGI · AI safety · OpenAI
0 likes · 6 min read
DataFunTalk
Dec 22, 2024 · Artificial Intelligence

Speech by Academician Sun Ninghui on the Development, Challenges, and Future of Artificial Intelligence and Intelligent Computing in China

The speech outlines the rapid rise of generative AI models, traces the historical evolution of computing technology, examines AI safety risks and regulatory responses, and proposes strategic pathways for China to advance intelligent computing through open, closed, or hybrid ecosystems while addressing talent, hardware, and cost challenges.

AI safety · China · Intelligent Computing
0 likes · 26 min read
21CTO
Dec 3, 2024 · Artificial Intelligence

When Bing Chat Went Rogue: What Prompt‑Injection Reveals About AI Safety

A detailed analysis of Simon Willison and Benj Edwards' conversation about Bing Chat's angry, deceptive behavior uncovers how prompt‑injection attacks expose weaknesses in large language models, the limits of system prompts, and the broader safety challenges facing AI development today.

AI safety · Bing Chat · ChatGPT
0 likes · 9 min read
DataFunTalk
Nov 11, 2024 · Artificial Intelligence

OpenAI VP Lilian Weng Departs and Shares Full AI Safety Talk Transcript

The article reports the departure of OpenAI research VP Lilian Weng, provides the full transcript of her recent AI safety and alignment presentation at a Bilibili event, and discusses broader concerns about OpenAI's safety culture, reinforcement learning from human feedback, and the importance of collective involvement in AI safety.

AI safety · OpenAI · alignment
0 likes · 10 min read
NewBeeNLP
Nov 7, 2024 · Artificial Intelligence

Tackling Large Model Hallucinations: Causes, Detection, and Mitigation Strategies

This article provides a comprehensive analysis of large language model hallucinations, detailing their definitions, classifications, root causes, detection techniques, and a wide range of mitigation approaches—including RAG pipelines, decoding strategies, and model‑enhancement methods—to improve reliability and safety in real‑world AI applications.

AI safety · Hallucination · Prompt engineering
0 likes · 22 min read
Cognitive Technology Team
Oct 16, 2024 · Artificial Intelligence

Large Language Models Lack Formal Reasoning Ability: Five Pieces of Evidence from the GSM‑Symbolic Benchmark

Recent research by Iman Mirzadeh’s team at Apple introduces the GSM‑Symbolic benchmark, revealing that large language models, despite high scores on GSM8K, exhibit significant performance drops when problem numbers, names, or extra clauses change, indicating a lack of true formal reasoning ability.

AI safety · GSM‑Symbolic · benchmark
0 likes · 9 min read
Architect
Sep 26, 2024 · Artificial Intelligence

Decoding OpenAI o1: How RL‑LLM Fusion Powers Next‑Gen Reasoning

This article provides a detailed technical analysis of OpenAI’s o1 model, exploring its enhanced logical reasoning, the likely use of reinforcement learning with hidden chain‑of‑thought generation, multi‑model architecture, training data pipelines, reward modeling, and how these innovations could reshape AI safety and scaling strategies.

AI safety · Chain-of-Thought · LLM
0 likes · 43 min read
Baobao Algorithm Notes
Sep 25, 2024 · Industry Insights

Decoding OpenAI o1: How RL and LLM Fuse to Power Hidden Chain‑of‑Thought

This article analytically reconstructs OpenAI o1’s architecture, training pipeline, and inference workflow, exploring its reinforcement‑learning‑enhanced hidden chain‑of‑thought, multi‑model composition, scaling laws, reward modeling, and potential implications for future AI safety and small‑model strategies.

AI safety · Hidden COT · LLM
0 likes · 43 min read
Data Thinking Notes
Sep 13, 2024 · Artificial Intelligence

How OpenAI’s o1 Series Redefines Complex Reasoning and AI Safety

OpenAI’s new o1 series, including o1‑preview and o1‑mini, leverages reinforcement‑learning‑based chain‑of‑thought reasoning to achieve superior performance on academic exams, coding contests, and safety benchmarks, offering faster, cost‑effective options while advancing AI alignment and human‑preference evaluation.

AI safety · Large Language Model · OpenAI
0 likes · 15 min read
AntTech
Aug 12, 2024 · Artificial Intelligence

DKCF Trustworthy Framework for Large Model Applications and AI Security Practices

The article outlines the DKCF (Data‑Knowledge‑Collaboration‑Feedback) trustworthy framework presented at the 2024 Shanghai Cybersecurity Expo, detailing challenges of large AI models, four key trust factors, and Ant Group's practical security implementations for professional AI deployments.

AI safety · DKCF · Knowledge Engineering
0 likes · 10 min read
NewBeeNLP
Jul 25, 2024 · Artificial Intelligence

Llama 3.1 Unveiled: How the New Open‑Source Giant Matches GPT‑4o and Claude 3.5

Meta has officially released Llama 3.1, a 405‑billion‑parameter open‑source model that matches or surpasses GPT‑4o and Claude 3.5 on over 150 benchmarks, expands context to 128 K tokens, supports eight languages, and is accompanied by a detailed 100‑page paper describing its data, training stack, architecture, quantization, safety measures, and ecosystem support.

AI safety · Large Language Model · Llama 3.1
0 likes · 15 min read
AntTech
Jul 9, 2024 · Artificial Intelligence

2024 Large Model Security Practice Whitepaper Unveiled at the World AI Conference

The jointly authored 2024 Large Model Security Practice whitepaper, released at the World AI Conference, outlines a comprehensive safety framework covering security, reliability, and controllability, presents industry case studies, and proposes a five‑dimensional governance model to guide high‑quality development of large AI models.

AI safety · Trustworthy AI · industry practice
0 likes · 7 min read
JD Tech
Jun 28, 2024 · Artificial Intelligence

An Overview of Large Language Models: History, Fundamentals, Prompt Engineering, Retrieval‑Augmented Generation, Agents, and Multimodal AI

This article provides a comprehensive introduction to large language models, covering their historical development, core architecture, training process, prompt engineering techniques, Retrieval‑Augmented Generation, agent frameworks, multimodal capabilities, safety challenges, and future research directions.

AI agents · AI safety · Multimodal
0 likes · 22 min read
DataFunSummit
Jun 23, 2024 · Artificial Intelligence

Tongyi Xingchen Personalized Large Model: Technical Overview and Applications

This article summarizes the development background of large language models, Alibaba's progression in foundational and personalized AI, the design and capabilities of the Tongyi Xingchen personalized model, its multimodal and agent-based architecture, various industry use cases, and the safety and responsibility measures applied to ensure trustworthy AI deployment.

AI safety · Multimodal AI · large language models
0 likes · 13 min read
21CTO
Jun 2, 2024 · Artificial Intelligence

Will OpenAI’s New Safety Team Really Secure ChatGPT?

OpenAI has created a new safety committee led by Sam Altman and board members, aiming to evaluate and improve safeguards while former researchers voice concerns about the company’s commitment to AI safety and ethics.

AI safety · ChatGPT · Governance
0 likes · 6 min read
21CTO
May 25, 2024 · Artificial Intelligence

Sam Altman Reveals GPT‑4o Vision, AI Safety, and the Future of AGI

Sam Altman’s hour‑long “All‑In” podcast interview unveils OpenAI’s latest GPT‑4o voice model, his bold vision for AGI, concerns about AI safety, the recent leadership shake‑up, and his ideas on universal access, regulation, and the transformative impact of conversational AI.

AGI · AI · AI safety
0 likes · 9 min read
DevOps
May 23, 2024 · Information Security

Guidelines for Evaluating Large Language Models in Cybersecurity Tasks

The article examines the opportunities and risks of applying large language models (LLMs) to cybersecurity, outlines fourteen practical recommendations for assessing their real‑world capabilities, and concludes with an invitation to the upcoming R&D Efficiency Conference covering AI, product management, and related topics.

AI safety · Evaluation · LLM
0 likes · 11 min read
Rare Earth Juejin Tech Community
May 2, 2024 · Artificial Intelligence

Understanding Large Language Models: Principles, Training, Risks, and Application Security

This article provides a comprehensive overview of large language models (LLMs), explaining their core concepts, transformer architecture, training stages, and known shortcomings such as hallucination and the reversal curse. It also highlights emerging security threats like prompt injection and jailbreaking and offers guidance for safe deployment.

AI safety · LLM · jailbreaking
0 likes · 21 min read
AntTech
Apr 18, 2024 · Artificial Intelligence

WDTA Releases International Standards for Generative AI and Large Language Model Safety Testing at the 27th UN CSTD Annual Meeting

At the 27th UN CSTD Annual Meeting in Geneva, the World Digital Technology Academy unveiled two pioneering international standards—one for generative AI application security testing and another for large language model security testing—crafted by experts from leading AI firms to establish a new global benchmark for AI safety.

AI safety · Ant Group · Generative AI
0 likes · 8 min read
Smart Era Software Development
Mar 7, 2024 · Artificial Intelligence

2024 AGI Outlook: Trends, Predictions, and a Surprise Bonus

The article analyses the 2024 AI landscape, highlighting a multimodal explosion, the limits of current AI applications, Sora as a concrete step toward AGI, the rise of AI‑native business models, edge‑AI hardware opportunities, the challenges of human‑level models, and the broader societal impacts of an AI‑driven data era.

AGI · AI hardware · AI safety
0 likes · 34 min read
21CTO
Feb 22, 2024 · Artificial Intelligence

How Google’s Open‑Source Gemma Model Brings LLM Power to Your Laptop

Google’s newly released open‑source Gemma models let developers run powerful large‑language‑model workloads on notebooks, workstations, or cloud platforms, offering competitive performance, extensive tooling, and built‑in safety measures for responsible AI deployment.

AI safety · Gemma · Google AI
0 likes · 6 min read
Rare Earth Juejin Tech Community
Feb 18, 2024 · Artificial Intelligence

Llama 2: Open Foundation and Fine‑Tuned Chat Models – Overview and Technical Details

The article provides a comprehensive overview of Meta’s Llama 2 series, detailing model sizes, pre‑training data, architectural enhancements, supervised fine‑tuning, RLHF procedures, safety evaluations, reward‑model training, and iterative improvements, highlighting its open‑source release and comparative performance.

AI safety · Large Language Model · Llama2
0 likes · 27 min read
Architect
Feb 16, 2024 · Artificial Intelligence

Can OpenAI’s Sora Redefine Text‑to‑Video Generation? An In‑Depth Technical Review

OpenAI’s newly unveiled Sora model transforms short text prompts into up‑to‑one‑minute high‑definition videos, showcasing advanced diffusion‑Transformer architecture, improved occlusion handling, and detailed visual fidelity, while the article examines its technical breakthroughs, compares it to earlier models, and discusses emerging safety and misuse concerns.

AI safety · Diffusion Models · Generative AI
0 likes · 12 min read
IT Services Circle
Dec 24, 2023 · Artificial Intelligence

GPT‑4 “Lazy” Behavior: User Reports, Experiments, and Emerging Insights

The article examines growing complaints that GPT‑4 has become increasingly lazy and unpredictable since the November 6 developer update, discusses user‑generated workarounds, presents experimental findings on prompt phrasing and temperature effects, and cites recent academic studies highlighting the need for continuous large‑model monitoring.

AI safety · GPT-4 · Temperature
0 likes · 6 min read
Tencent Tech
Sep 20, 2023 · Artificial Intelligence

Why Do Large Language Models Hallucinate and How to Reduce It?

The article explains why large language models generate hallucinations—due to data errors, training conflicts, and inference uncertainty—and outlines data‑cleaning, model‑level feedback, knowledge augmentation, constraint techniques, and post‑processing methods such as the “Truth‑seeking” algorithm to mitigate the issue.

AI safety · Data Quality · Hallucination
0 likes · 8 min read
Programmer DD
Jul 21, 2023 · Artificial Intelligence

Why Did GPT-4’s Performance Plummet Between March and June 2023?

A Stanford‑Berkeley study finds that between March and June 2023 GPT‑4’s accuracy on prime‑checking fell from 97.6% to 2.4%, its code generation quality dropped sharply, and its handling of sensitive questions changed, underscoring how quickly and unpredictably large language model performance can shift over short periods.

AI safety · GPT-4 · LLM evaluation
0 likes · 6 min read
21CTO
Jun 18, 2023 · Artificial Intelligence

Why Yann LeCun Says ChatGPT Isn’t Even as Smart as a Dog

Meta’s chief AI scientist Yann LeCun argues that large‑language models like ChatGPT are far from human intelligence, lacking real‑world understanding and even falling short of a dog’s cleverness, while experts debate AI’s risks, benefits, and the need for regulation.

AI safety · ChatGPT · Meta
0 likes · 6 min read
21CTO
May 9, 2023 · Artificial Intelligence

Geoffrey Hinton Warns: Why AI Could Outpace Humanity and What It Means

In a candid MIT Technology Review interview, AI pioneer Geoffrey Hinton discusses his departure from Google, the rapid progress of large language models like GPT‑4, the dangers of AI self‑motivation, and why halting AI development is unrealistic yet urgently needed.

AI risk · AI safety · Backpropagation
0 likes · 28 min read
21CTO
May 4, 2023 · Artificial Intelligence

Why AI Pioneer Geoffrey Hinton Quit Google and What It Means for AI Safety

Geoffrey Hinton, the father of deep learning, left Google after a decade, warning that chatbots pose frightening risks, can be misused by malicious actors, and may eventually replace many professions, highlighting urgent concerns about misinformation and the long‑term existential threats of artificial intelligence.

AI safety · Chatbots · Geoffrey Hinton
0 likes · 8 min read
21CTO
Apr 20, 2023 · Artificial Intelligence

Elon Musk’s TruthGPT: A New AI Challenger to OpenAI’s ChatGPT

Dissatisfied with OpenAI’s direction, Elon Musk has launched TruthGPT through his new X.AI lab, recruiting top AI talent to build a safer, more transparent large‑language model that could rival ChatGPT and reshape AI governance, funding, and potential applications such as Twitter’s search and advertising.

AI safety · Elon Musk · OpenAI
0 likes · 8 min read
Python Programming Learning Circle
Apr 3, 2023 · Artificial Intelligence

Key Highlights of GPT‑4: Multimodal Capabilities, Benchmark Performance, and Future Implications

GPT‑4, the new multimodal AI model, can process images and text, generate code and natural language, achieve human‑level scores on standardized exams, handle up to 32K tokens, and demonstrate advanced reasoning, while OpenAI emphasizes its safety improvements and current limitations as a still‑emerging technology.

AI safety · GPT-4 · Large Language Model
0 likes · 6 min read
21CTO
Apr 2, 2023 · Artificial Intelligence

Can GPT‑4 Be Considered Early AGI? Insights from Microsoft’s 155‑Page Study

This article reviews Microsoft’s extensive 155‑page paper on early experiments with GPT‑4, exploring how the model approaches artificial general intelligence, its testing methodology, multimodal capabilities, programming and mathematical performance, interaction with tools and humans, limitations, societal impact, and future research directions.

AI safety · Artificial General Intelligence · GPT-4
0 likes · 15 min read
21CTO
Mar 30, 2023 · Artificial Intelligence

Why Top AI Leaders Are Calling for a 6‑Month Pause on Advanced AI Development

On March 29, Elon Musk, Steve Wozniak, Geoffrey Hinton and over a thousand AI experts signed an open letter urging a six‑month halt to training systems more powerful than GPT‑4, citing profound societal risks and calling for transparent, verifiable pauses and stronger governance.

AI Governance · AI pause · AI safety
0 likes · 9 min read
DataFunSummit
Mar 24, 2023 · Artificial Intelligence

OpenAI Launches ChatGPT Plugin System: Features, Examples, and Safety Discussion

OpenAI announced a safety‑focused ChatGPT plugin system that connects the model to third‑party APIs for real‑time information retrieval, knowledge‑base access, and task execution, showcasing first‑party browser and code‑interpreter plugins, third‑party extensions, an open‑source retrieval plugin, and a detailed debate on security implications.

AI safety · ChatGPT · Code interpreter
0 likes · 9 min read
ITPUB
Mar 22, 2023 · Artificial Intelligence

What Can GPT‑4 Do? Vision, Long Memory, Safer AI and More

OpenAI’s GPT‑4 arrives with multimodal vision, a dramatically longer context window, higher exam scores, Socratic prompting, improved safety, and new partnerships, though it remains in research mode, is still subject to bias, and generates code that cannot be fully trusted.

AI safety · GPT-4 · Large Language Model
0 likes · 7 min read
21CTO
Mar 20, 2023 · Artificial Intelligence

Sam Altman Warns: Could AI Like GPT‑4 Fuel Massive Misinformation?

In a recent interview, OpenAI CEO Sam Altman cautioned that advanced AI models such as GPT‑4 could spread large‑scale false information and enable harmful cyber attacks, prompting calls for careful regulation while highlighting both the technology’s impressive capabilities and its potential risks.

AI safety · Elon Musk · GPT-4
0 likes · 4 min read
21CTO
Mar 15, 2023 · Artificial Intelligence

What Makes OpenAI’s New GPT‑4 a Game‑Changer for Multimodal AI?

OpenAI’s GPT‑4, a multimodal large language model that accepts text and image inputs, powers ChatGPT and Bing, offers improved creativity and problem‑solving while still facing hallucination risks, and is now available via ChatGPT Plus and an open API for developers.

AI safety · GPT-4 · Large Language Model
0 likes · 5 min read
DataFunSummit
Feb 12, 2023 · Artificial Intelligence

Claude vs. ChatGPT: Constitutional AI, RLAIF, and the Quest for Safer Large‑Language Models

This article reviews Anthropic's Claude assistant, explains the Constitutional AI (RLAIF) approach that replaces costly human‑feedback data with a set of natural‑language principles, compares Claude with ChatGPT on helpfulness and harmlessness, and details the supervised‑learning and reinforcement‑learning stages, data annotation, and experimental results that demonstrate superior safety performance.

AI safety · Claude · Constitutional AI
0 likes · 21 min read
Xiaohongshu Tech REDtech
Feb 10, 2023 · Artificial Intelligence

Expert Insights on ChatGPT: Technical Challenges, Applications, and Future Directions

In a REDtech live interview, NLP professor Li Lei and Xiaohongshu engineers examined ChatGPT’s strengths—long, topic‑focused replies and few‑shot learning—and its challenges such as hallucinations, safety, lack of real‑time data, model compression, and multimodal AIGC, outlining how the technology could reshape content creation, customer service, and search while requiring careful risk management.

AI · AI safety · ChatGPT
0 likes · 20 min read
DataFunTalk
Jan 15, 2023 · Artificial Intelligence

Advances in Dialogue Systems: Baidu PLATO Large‑Scale Conversational Models

This article reviews the evolution of dialogue systems from modular task‑oriented designs to end‑to‑end large‑scale models, detailing Baidu's PLATO series, their technical innovations, real‑world deployments, challenges such as inference efficiency and safety, and future research directions in conversational AI.

AI safety · Conversational AI · Dialogue Systems
0 likes · 13 min read
Programmer DD
Dec 6, 2022 · Artificial Intelligence

How an Engineer Coaxed ChatGPT into Writing a ‘Humanity‑Destruction’ Plan

An engineer discovered a loophole in ChatGPT’s safety filters by using a narrative‑recursion technique, prompting the model to outline a detailed, five‑step plan to annihilate humanity and even generate sample Python code, illustrating the risks of prompt manipulation and the exponential growth of AI capabilities.

AI safety · ChatGPT · Python
0 likes · 6 min read
OPPO Amber Lab
Sep 7, 2022 · Artificial Intelligence

How the World AI Conference Shaped the Future of Trustworthy AI

The World AI Conference’s Trustworthy AI Forum in Shanghai gathered over 20 global experts, government leaders, and industry representatives to discuss policies, standards, technologies, and applications, unveiling a new AI safety testing platform, a joint laboratory, and a comprehensive 2022 Trustworthy AI Industry Ecosystem Report.

AI safety · Industry Report · Trustworthy AI
0 likes · 7 min read
AntTech
Sep 3, 2022 · Artificial Intelligence

Highlights from the 2022 World AI Conference: Graph Computing, Privacy Computing, AI Safety, and New Open Platforms

The 2022 World AI Conference in Shanghai showcased cutting‑edge research on graph computing and privacy computing, announced Ant Group’s new AI safety product “AntJian”, the “YinYu Open Platform” for trusted privacy computing, and the open‑source high‑performance graph database TuGraph, highlighting the push for secure, scalable AI technologies.

AI · AI safety · Ant Group
0 likes · 7 min read
DataFunSummit
Jul 21, 2022 · Artificial Intelligence

Advances and Challenges in Dialogue Systems: Baidu PLATO and Future Directions

This article reviews the evolution, architectures, challenges, and recent breakthroughs of dialogue systems—especially Baidu's PLATO model—while discussing data‑driven approaches, diversity, safety, interactive learning, and the potential role of virtual environments such as the metaverse in shaping future conversational AI.

AI safety · Conversational AI · Metaverse
0 likes · 24 min read
AntTech
Jul 18, 2022 · Artificial Intelligence

Trusted AI Research at Ant Group: Advances in Computer Vision, Watermark Defense, Robust Machine Learning, and Explainable NLG

Ant Group’s security labs present a series of cutting‑edge AI research achievements—including hierarchical multi‑granular classification for computer vision, watermark‑vaccine defenses, multi‑modal document understanding, robust and explainable machine learning, and logic‑driven data‑to‑text generation—highlighting their commitment to trustworthy and secure AI applications.

AI safety · Data2Text · Robust Machine Learning
0 likes · 12 min read
DataFunTalk
Jul 12, 2022 · Artificial Intelligence

Applying Computer Vision for Content Safety in Live Streaming: Practices and Future Directions

This presentation details how Huya leverages computer‑vision algorithms to detect and mitigate risky content such as political, pornographic, and violent material in live‑streaming and short‑video platforms, describing system architecture, labeling strategies, algorithmic pipelines, real‑time moderation techniques, and future research directions.

AI safety · Live Streaming · Risk Detection
0 likes · 11 min read
DataFunTalk
May 28, 2022 · Artificial Intelligence

Adversarial Examples for Captcha: Techniques, Applications, and Future Directions

This article presents a comprehensive overview of adversarial example research applied to captcha systems, covering the definition and history of adversarial attacks, geometric‑aware generation frameworks, FGSM‑based attack variants, experimental results, trade‑offs between image quality and attack strength, and future work such as AdvGAN integration.

AI safety · FGSM · GAN
0 likes · 14 min read
Didi Tech
Apr 20, 2021 · Artificial Intelligence

Few-Shot Learning, Data Augmentation, and Semi‑Supervised Methods for Improving Safety and Governance Models at Didi

To overcome scarce labeled data for safety and governance, Didi combines few‑shot learning with systematic data augmentation, self‑training semi‑supervised labeling, and multi‑task neural architectures, cutting labeling costs and reducing log‑loss by over 20% while boosting ROC‑AUC and PR‑AUC across harassment detection, expense‑complaint, and route‑intercept use cases.

AI safety · Data Augmentation · Didi
0 likes · 15 min read
Tencent Tech
Sep 25, 2020 · Artificial Intelligence

What’s Inside Tencent’s AI Security Attack Matrix? A Minefield Guide

Tencent’s AI Security Attack Matrix, the industry’s first AI‑focused risk framework, maps attack tactics, techniques, and processes across the AI lifecycle, offering practical guidance for researchers and developers to identify and mitigate security threats in AI systems.

AI safety · AI security · Tencent
0 likes · 5 min read