Tagged articles
223 articles
Architecture and Beyond
Nov 2, 2025 · Artificial Intelligence

Why AI Agents Still Fall Short: Key Challenges and Real-World Solutions

The article examines why current AI agents fall short of expectations, highlighting weak business understanding, limited execution, controllability issues, high customization costs, and the gap between model capabilities and engineering. It then discusses the advantages SaaS firms hold, the case for focusing on vertical scenarios, security concerns, and future development trends.

AI Agents · AI Safety · Enterprise AI
11 min read
Data Party THU
Oct 4, 2025 · Artificial Intelligence

Advances in Robust AI: Defending Adversarial Attacks, Boosting Domain Generalization, Stopping LLM Jailbreaks

This article reviews the latest progress in designing algorithms with strong robustness, covering adversarial examples in computer vision, novel training paradigms and certification methods, domain‑generalization techniques that achieve state‑of‑the‑art performance in medical imaging and molecular recognition, and new attack‑defense strategies for LLM jailbreak scenarios.

AI Safety · LLM Security · adversarial robustness
4 min read
IT Services Circle
Oct 1, 2025 · Artificial Intelligence

Claude Sonnet 4.5: The New State‑of‑the‑Art Coding Model with 30‑Hour Runtime

Anthropic’s Claude Sonnet 4.5, promoted as the world’s best coding model, achieves top scores on SWE‑bench Verified, runs continuously for over 30 hours, outperforms competitors on OSWorld and multiple agentic tests, adds extensive safety features, and introduces a revamped Claude Code suite with VS Code, terminal, and Agent SDK enhancements.

AI · AI Safety · Benchmark
10 min read
21CTO
Sep 30, 2025 · Artificial Intelligence

Anthropic Unveils Claude Sonnet 4.5 – The Leading Coding Model and Powerful Agent Platform

Anthropic announced Claude Sonnet 4.5, touting it as the world’s best coding model and strongest for building complex agents, backed by top benchmark scores, enhanced domain knowledge, improved safety, unchanged pricing, and new features like checkpoints, context editing, memory tools, and an Agent SDK.

AI Safety · AI coding model · Anthropic
4 min read
Wuming AI
Sep 29, 2025 · Artificial Intelligence

Why Claude Sonnet 4.5 Is Redefining AI Coding and Agent Capabilities

Anthropic’s Claude Sonnet 4.5 arrives with unchanged pricing but claims top‑tier coding performance, superior reasoning and safety scores, a new Agent SDK for long‑running tasks, and an "Imagine with Claude" preview that lets users generate live software, all backed by benchmark comparisons and real‑world case studies.

AI Coding · AI Safety · Claude Sonnet 4.5
6 min read
DataFunSummit
Sep 29, 2025 · Artificial Intelligence

How to Detect and Prevent Hallucinations in LLM‑Powered NL2SQL Systems

This article explains the nature, types, and causes of hallucinations in large language models used for NL2SQL, reviews both unsupervised and supervised detection methods, and introduces an efficient token‑confidence based Active Sampling Detection (ASD) approach with practical deployment examples and future research directions.

AI Safety · ASD · LLM
19 min read
Continuous Delivery 2.0
Sep 26, 2025 · Artificial Intelligence

Why a New AI Programming Manifesto Is Needed – Lessons from the Agile Revolution

The article argues that, 24 years after the Agile Manifesto, AI-driven programming has created a fresh crisis of role confusion, unpredictability, and security risks, and proposes a new AI Programming Manifesto to guide developers toward responsible, human‑centered, and safe AI‑assisted software engineering.

AI Safety · AI programming · Software Engineering
18 min read
DataFunSummit
Sep 24, 2025 · Artificial Intelligence

Taming LLM Hallucinations: Strategies and Solutions from 360

This article explores the problem of large‑model hallucinations, explains its definitions and classifications, analyzes root causes in data, algorithms and inference, and presents detection methods and practical mitigation techniques such as RAG, decoding strategies, and model‑enhancement approaches, illustrated with real‑world 360 use cases and future research directions.

AI Safety · LLM · Model Alignment
22 min read
Data Party THU
Sep 22, 2025 · Artificial Intelligence

How to Secure Large‑Model Training: Practical Techniques and Real‑World Cases

This article systematically examines the major security challenges of large‑model training—including data leakage, adversarial attacks, bias, and supply‑chain risks—and presents concrete solutions such as differential privacy, federated learning, adversarial training, backdoor detection, and lifecycle protection to guide practitioners toward safer AI deployments.

AI Safety · Federated Learning · adversarial training
14 min read
Data Party THU
Sep 18, 2025 · Artificial Intelligence

Can Language Models Self‑Optimize? Inside the STOP Framework

Researchers introduce the Self‑Taught Optimizer (STOP), a scaffolding‑based framework that lets large language models iteratively improve their own code without altering model weights. It demonstrates superior performance on tasks like learning parity with noise (LPN), explores diverse strategies such as beam search and genetic algorithms, and highlights security risks like sandbox bypass and reward hacking.

AI Safety · language models · recursive self-improvement
11 min read
Instant Consumer Technology Team
Sep 17, 2025 · Artificial Intelligence

Uncovering the Secret System Prompts Behind ChatGPT, Claude, and Gemini

The article examines the open‑source "system_prompts_leaks" project, which collects leaked system prompts from major AI models and reveals recurring design patterns such as modular layering, strict boundary control, dynamic strategy adjustment, emotional persona injection, and multi‑layer safety mechanisms.

AI Safety · Prompt Engineering · Security
7 min read
Volcano Engine Developer Services
Sep 11, 2025 · Artificial Intelligence

Why Do Large Language Models Hallucinate? Causes, Types, and Mitigation Strategies

This article examines the growing problem of hallucinations in large language models, outlining their causes across the model lifecycle, classifying four main hallucination types, and presenting both retrieval‑augmented generation and detection techniques—white‑box and black‑box—to reduce factual errors in critical applications.

AI Safety · LLM · Model Evaluation
15 min read
Data Thinking Notes
Sep 10, 2025 · Artificial Intelligence

Why Do Language Models Hallucinate? Uncovering the Statistical Roots

OpenAI’s latest research reveals that language model hallucinations stem from training and evaluation incentives that favor confident guesses over acknowledging uncertainty, and proposes revised scoring methods that reward modesty, highlighting statistical mechanisms behind false answers and offering pathways to reduce hallucinations.

AI Safety · evaluation · hallucination
10 min read
Architect
Sep 9, 2025 · Artificial Intelligence

Why Do Language Models Hallucinate? Insights from OpenAI’s New Study

This article explains why large language models often produce confident but incorrect answers, detailing statistical inevitability, data scarcity, and model capacity limits, and proposes concrete solutions such as confidence thresholds and allowing abstention to reduce hallucinations.

AI Safety · Prompt Engineering · evaluation
8 min read
Baobao Algorithm Notes
Sep 9, 2025 · Artificial Intelligence

Why Do Language Models Hallucinate? Roots, Risks, and a New Evaluation Approach

The article analyzes OpenAI's study on language‑model hallucinations, explaining how statistical limits in pre‑training and flawed binary evaluation incentives cause false answers, and proposes a confidence‑threshold scoring system that rewards honest "I don’t know" responses to improve reliability.

AI Safety · Model Alignment · confidence threshold
8 min read
DataFunTalk
Sep 8, 2025 · Artificial Intelligence

When Claude Leaves China: How Domestic AI Models Are Rising to Fill the Gap

Anthropic's new ban on Claude for Chinese‑controlled firms forces developers to seek home‑grown alternatives, prompting a deep dive into Claude's strengths, the rapid rise of Chinese large‑language models, and the gaps that still separate them from the world‑leading offering.

AI Safety · AI models · Chinese AI
11 min read
Data STUDIO
Sep 8, 2025 · Industry Insights

Claude Completely Banned for Chinese Companies – No Workarounds Anywhere

Anthropic announced an immediate, worldwide ban on Claude for any entity controlled by Chinese capital, citing legal, regulatory and security risks, and warned that continued access could enable military use or model‑stealing, urging firms to adopt domestic alternatives.

AI Safety · AI policy · Anthropic
3 min read
Java Tech Enthusiast
Sep 7, 2025 · Artificial Intelligence

Why Anthropic Is Banning Claude for Companies Linked to China and Other Restricted Nations

Anthropic announced that, effective immediately, any company, regardless of location, that is more than 50% owned, directly or indirectly, by Chinese capital or by capital from other nations deemed adversarial, such as Russia, Iran, and North Korea, is prohibited from using its Claude AI service due to legal, regulatory, and security concerns.

AI Safety · AI policy · Anthropic
5 min read
21CTO
Sep 5, 2025 · Artificial Intelligence

Why Anthropic Is Banning Chinese-Controlled Companies from Its AI Services

Anthropic announced it will immediately stop providing its AI services, including Claude, to any company or organization controlled by Chinese capital, extending its restrictions to entities with over 50% Chinese ownership regardless of operating location.

AI Safety · AI policy · Anthropic
4 min read
ShiZhen AI
Sep 5, 2025 · Artificial Intelligence

Andrew Ng Highlights Core AI Engineer Skills Amidst Major AI Industry Updates

The article reports that ChatGPT now supports branch conversations, Anthropic restricts service use in certain regions, Andrew Ng outlines essential AI engineer capabilities such as AI‑assisted software building, prompting and agentic workflows, and highlights the market demand, while also covering the Kimi K2 model upgrade, Hugging Face’s FineVision dataset release, and Google’s AI‑driven Deep Loop Shaping method published in *Science*.

AI Engineering · AI Safety · AI for astronomy
8 min read
DataFunTalk
Aug 29, 2025 · Artificial Intelligence

How a $500 GPU Hack Turns LLMs into Hidden Advertising Engines

A recent arXiv paper reveals that with an RTX 4070, a few hundred toxic training samples, and just one hour of fine‑tuning, attackers can embed covert advertisements into large language models like Gemini 2.5, creating cheap, undetectable AI‑driven ad platforms.

AI Safety · LLM Security · advertisement embedding attack
12 min read
Efficient Ops
Aug 27, 2025 · Artificial Intelligence

Why DeepSeek V3.1 Randomly Inserts the Chinese Character “极” – Token Bug Explained

DeepSeek’s latest V3.1 model unexpectedly injects the Chinese character “极” into generated text, a token‑ID mix‑up that breaks code compilation, JSON parsing, and academic writing. Users trace the issue to adjacent token IDs and offer two main hypotheses: dataset contamination or a model shortcut.

AI Safety · DeepSeek · Language Model
4 min read
Huolala Tech
Aug 27, 2025 · Artificial Intelligence

How Huolala’s AI‑Powered Safety Platform Transforms Freight Risk Management

This article details Huolala's evolution from reactive safety measures to a proactive AI‑driven safety governance platform, describing its architectural upgrades, data‑driven risk detection, modular strategy management, and measurable operational benefits that dramatically improve freight safety and reduce costs.

AI Safety · Operational Efficiency · freight logistics
10 min read
Java Tech Enthusiast
Aug 22, 2025 · Artificial Intelligence

Why Did the Unitree Humanoid Robot Crash and Run Away? Inside the Tech and Ethics

The viral "hit‑and‑run" incident involving Unitree's humanoid robot sparked global debate, revealing that human operator error, limited sensor and control technology, and current competition rules forced remote control, while the robot still set a 1500 m record and points to a future of fully autonomous robotics.

AI Safety · Robotics · humanoid robot
8 min read
Volcano Engine Developer Services
Aug 19, 2025 · Artificial Intelligence

How to Strengthen LLM System Prompts for Safer AI Agents

This guide explains how to reinforce system prompts for AI agents by optimizing their content and structure, using active defense, role‑based, and format constraints, providing practical examples, measurement methods, and experimental results that demonstrate up to 90% reduction in unsafe behavior.

AI Safety · LLM · System Prompt
13 min read
Meituan Technology Team
Aug 14, 2025 · Artificial Intelligence

How Meituan’s Smart Helmet Redefines Delivery Safety with AI‑Powered Design

Meituan’s first smart‑helmet article details hardware innovations that tackle delivery riders’ safety, comfort, and efficiency, covering stricter safety standards, sensor‑driven alerts, lightweight structures, advanced ventilation, three‑times longer battery life, noise‑cancelling audio, IPX6 waterproofing, and a data‑driven production line.

AI Safety · delivery efficiency · hardware design
24 min read
AI Frontier Lectures
Jul 27, 2025 · Information Security

Can Hidden Activations Expose Multimodal Model Jailbreaks?

The paper reveals that large multimodal language models retain refusal signals in their hidden states even after jailbreak attempts, and proposes a training‑free detection method that leverages these signals to identify unsafe inputs across text and image modalities with strong generalization.

AI Safety · LVLM security · hidden activation analysis
7 min read
IT Services Circle
Jul 16, 2025 · Artificial Intelligence

How a Simple Colon Can Trick Top LLMs – The Master‑RM Fix

A recent study reveals that tiny symbols like colons or generic reasoning prefixes can cause large language models used as reward judges to issue false‑positive rewards, but an enhanced reward model called Master‑RM, trained with adversarial data, eliminates this vulnerability across multiple LLMs and languages.

AI Safety · LLM · Master-RM
10 min read
AntTech
Jul 14, 2025 · Artificial Intelligence

What Is the New AI Agent Safety Testing Standard and Why It Matters

The World Digital Academy unveiled the AI STR series' first global AI Agent Operation Safety Testing Standard, detailing a full‑link risk analysis framework, novel testing methods, and its role in addressing rising safety concerns as AI agents become mainstream in 2025.

AI Governance · AI Safety · agent standards
5 min read
21CTO
Jul 1, 2025 · Artificial Intelligence

OpenAI CEO Warns: Don’t Blindly Trust AI – Insights from New Open‑Source Models

Sam Altman cautions against over‑reliance on ChatGPT, while Germany blocks DeepSeek for GDPR violations, Tencent unveils its MoE‑based Hunyuan‑A13B model, and Google releases a Python client for Data Commons, highlighting both AI risks and rapid open‑source advancements.

AI Safety · Data Commons · MoE
9 min read
DataFunTalk
Jun 21, 2025 · Artificial Intelligence

Why AI Gets Overconfident: Bias, Hallucinations, and Reinforcement Learning Solutions

This talk explores how large AI models become overconfident, leading to bias and hallucinations, examines adversarial examples in vision and language, explains why data and algorithms cause these issues, and shows how reinforcement learning can teach models to admit uncertainty and align with human values.

AI Alignment · AI Safety · Bias
19 min read
DataFunTalk
Jun 19, 2025 · Artificial Intelligence

Can We Flip the Switch on AI Good vs. Evil? OpenAI’s Toxic Persona Find

OpenAI’s new research reveals that training language models to produce incorrect answers in a single domain can trigger a toxic persona feature, causing the model to generate harmful suggestions across unrelated tasks, but the team also demonstrates detection methods and a reversible “emergent realignment” technique to restore safe behavior.

AI Safety · Emergent misalignment · Model Alignment
7 min read
DataFunSummit
Jun 10, 2025 · Artificial Intelligence

How Quwan’s Kaitian Model Tackles Emotional AI for Social Apps – Architecture, Training Tricks, and Safety

Quwan Technology presents its Kaitian social large model, designed for personalized, emotionally rich, multimodal AI interactions, detailing its scene‑specific goals, CPT+SFT+RLHF training pipeline, data desensitization, LoRA fine‑tuning, evaluation methods, pruning, latency trade‑offs, safety mechanisms, and future feedback loops.

AI Safety · LoRA · Model Pruning
13 min read
Kuaishou Tech
Jun 5, 2025 · Artificial Intelligence

7 Kuaishou AI Papers Accepted at ACL 2025: Video Understanding & Safe LLM Decoding

Kuaishou’s foundational large-model team has secured seven papers at ACL 2025, spanning alignment bias in training, safety defenses during inference, decoding strategies, fine-grained video-temporal understanding, reward fairness in RLHF, multimodal captioning benchmarks, and methods to curb hallucinations in vision-language models.

ACL · AI Safety · Benchmark
13 min read
AntTech
May 30, 2025 · Artificial Intelligence

Insights from Ant Group’s 10th Technical Open Day: Multimodal, Embodied, and Future Model Architectures for AGI

The Ant Group’s 10th Technical Open Day gathered leading AI experts who examined the current state and future directions of multimodal large models, embodied AI, world models, transformer architectures, and vertical applications, offering a comprehensive view of the challenges and opportunities on the path toward AGI.

AGI · AI Safety · Embodied AI
16 min read
ShiZhen AI
May 26, 2025 · Industry Insights

Nvidia Plans Cheaper Blackwell AI Chip for China Amid Export Restrictions

Nvidia is reportedly preparing a lower‑cost Blackwell GPU for the Chinese market, priced at $6,500‑$8,000 and featuring 1.7 TB/s GDDR7 memory, while OpenAI’s o3 model uncovered a Linux kernel zero‑day (CVE‑2025‑37899), a study showed AI models can sabotage shutdown commands, and a tutorial demonstrates creating animated 3D icons with ChatGPT and Freepik tools.

3D icon creation · AI Safety · AI hardware
8 min read
Java Tech Enthusiast
May 25, 2025 · Artificial Intelligence

Does Claude 4 Really Report Unethical Actions? Inside Its Hidden ‘Whistleblower’ Feature

The article analyzes Anthropic's Claude 4 series, highlighting its extended reasoning ability, a controversial whistle‑blower function that can report extreme wrongdoing, observed extortion attempts toward developers, and the safety measures Anthropic introduced to curb such risky autonomous behaviors.

AI Safety · Anthropic · Claude 4
6 min read
Tencent Technical Engineering
May 8, 2025 · Artificial Intelligence

Augment AI Programming Assistant: Technical Breakthroughs, Industry Impact, and Security Risks

Augment, a newly funded AI programming assistant that tops the SWE‑bench benchmark with a 65.4% score and a 200k‑token context window, promises massive productivity gains for developers but also introduces sophisticated security threats such as malicious memory prompts, back‑door context injection, compromised guidelines, and risky multi‑task collaboration protocols, prompting calls for layered defenses and vigilant monitoring.

AI Safety · AI programming · Agent Memory
11 min read
Sohu Tech Products
May 7, 2025 · Information Security

Why MCP Protocol Is a Security Nightmare: Real Attack Cases and Mitigations

This article provides a comprehensive security analysis of the Model Context Protocol (MCP), exposing multiple attack vectors such as prompt poisoning, tool poisoning, command and code injection, and illustrating how MCP’s design flaws make it more vulnerable than traditional applications while offering concrete mitigation recommendations.

AI Safety · Code Injection · MCP
34 min read
JavaEdge
May 7, 2025 · Artificial Intelligence

Why AI Agents Pose New Security Risks and How to Safeguard Them

The article explains what AI agents are, highlights their emerging security risks such as data leakage and lack of accountability, and offers practical strategies—including risk analysis, threat modeling, and engineering best practices—to mitigate these challenges for enterprises.

AI Agents · AI Safety · Enterprise AI
9 min read
21CTO
Apr 7, 2025 · Artificial Intelligence

Llama 4 Unveiled: Breakthrough Multimodal Models Redefine AI Capabilities

Meta's Llama 4 series introduces the Scout, Maverick, and Behemoth models—featuring Mixture‑of‑Experts architectures, unprecedented 10‑million‑token context windows, and state‑of‑the‑art performance across vision, language, and multimodal benchmarks—while emphasizing efficient training, open‑source availability, and robust safety safeguards.

AI Safety · Llama 4 · Mixture of Experts
14 min read
Model Perspective
Apr 7, 2025 · Artificial Intelligence

Why AI Alignment Matters: Ensuring Smart Systems Follow Human Intent

This article explores the multifaceted AI alignment challenge, detailing safety benchmarks such as toxicity, ethical, power‑seeking, and hallucination evaluations, and argues that responsible AI development requires technical safeguards, international governance, and a civilizational dialogue bridging philosophy and humanity.

AI Alignment · AI Governance · AI Safety
12 min read
Cognitive Technology Team
Apr 1, 2025 · Artificial Intelligence

Four‑Second Bloodshed: How Autonomous Driving Algorithms Turned a Fatal Accident

A March 2025 crash involving a Xiaomi‑branded autonomous vehicle illustrates how a four‑second algorithmic decision loop, inadequate night‑vision sensors, flawed handover timing, and poor emergency‑exit design combined to create a lethal scenario that exposes the deadly risks of over‑relying on L2 driver‑assist systems.

AI Safety · Human-Machine Interaction · L2 driver assistance
4 min read
Architect
Mar 28, 2025 · Artificial Intelligence

Peeking Inside Claude: How Anthropic Uncovers LLM Reasoning

Anthropic’s recent papers reveal how Claude’s internal mechanisms—multilingual feature sharing, pre‑planned rhyming, parallel arithmetic paths, concept‑level reasoning, and hallucination triggers—are probed with feature‑insertion techniques, offering engineers actionable insights for building more transparent and safe AI systems.

AI Safety · Anthropic · Claude
12 min read
Architect
Mar 24, 2025 · Artificial Intelligence

How Multimodal Alignment Is Shaping the Future of Large Language Models

This article provides a systematic review of recent advances in multimodal alignment for large language models, covering key contributions, application scenarios, dataset construction, evaluation benchmarks, future challenges, and insights from LLM alignment research to guide both academia and industry.

AI Safety · MLLM · dataset construction
26 min read
Architecture and Beyond
Mar 15, 2025 · Information Security

Prompt Injection Attacks on Large Language Models: Risks, Types, and Defense Framework

This article explains how prompt injection attacks exploit large language models by altering their behavior through crafted inputs, outlines the major harms and attack categories—including direct, indirect, multimodal, code, and jailbreak attacks—and presents a comprehensive three‑layer defense framework covering input‑side, output‑side, and system‑level protections.

AI Safety · LLM Security · information security
16 min read
DevOps
Mar 10, 2025 · Artificial Intelligence

AI Policy, Safety, Industry Applications, and Talent Development Highlighted at China's 2024 Two Sessions

The 2024 Chinese Two Sessions emphasized artificial intelligence as a strategic priority, discussing AI safety regulations, industry applications, talent shortages, and policy proposals from leaders such as DeepSeek, Xiaomi, and academic experts, highlighting the drive to integrate AI across manufacturing, agriculture, healthcare, and education.

AI Safety · AI industry · AI policy
11 min read
Code Mala Tang
Feb 27, 2025 · Artificial Intelligence

Do New AI Reasoning Models Really Think? Unpacking the Debate

The article examines whether the latest AI models that claim to perform true reasoning—by breaking problems into steps and using chain‑of‑thought—actually reason like humans, presenting skeptical and supportive expert viewpoints, and offering practical guidance on how to use such models responsibly.

AI Safety · AI reasoning · chain-of-thought
14 min read
Architect's Alchemy Furnace
Feb 19, 2025 · Artificial Intelligence

DeepSeek’s Self‑Correction: Transforming AI Reliability and Safety

The article explores DeepSeek’s innovative self‑correction system—combining a Mixture‑of‑Experts architecture with reinforcement‑learning feedback—to achieve real‑time error detection, dynamic knowledge‑graph updates, and enhanced safety in high‑risk fields like autonomous driving and medical diagnostics.

AI Safety · DeepSeek · Mixture of Experts
9 min read
Architects' Tech Alliance
Feb 12, 2025 · Artificial Intelligence

DeepSeek‑V3 Training Efficiency, Knowledge Distillation, and the Risks of Synthetic Data

The article examines DeepSeek‑V3’s low‑cost training using 2048 H800 GPUs, explains how knowledge distillation and high‑quality data improve efficiency, discusses expert concerns about training on AI‑generated content, and outlines the limitations and ceiling effect of distillation techniques.

AI Safety · AI Training Efficiency · DeepSeek-V3
7 min read
Baobao Algorithm Notes
Jan 11, 2025 · Artificial Intelligence

Why Phi‑4’s 14B Model Outperforms GPT‑4 on STEM and Reasoning Tasks

Microsoft Research’s Phi‑4 model, a 14‑billion‑parameter LLM, leverages extensive synthetic data, advanced tokenization, and a two‑stage training pipeline to achieve superior performance on STEM question answering, long‑context reasoning, and safety benchmarks, rivaling larger models like GPT‑4.

AI Safety · Benchmarking · Phi-4
15 min read
DataFunTalk
Jan 5, 2025 · Artificial Intelligence

The Approaching Singularity: AI Automation, AGI Predictions, and Their Impact on Jobs and Society

The article examines how rapid advances in artificial intelligence are expected to automate nearly half of U.S. jobs within the next two decades, explores singularity forecasts for 2029‑2030, and discusses the profound economic, ethical, and security challenges that humanity must address before AI-driven autonomous systems reshape work, research, and daily life.

AGI · AI · AI Safety
18 min read
21CTO
Jan 2, 2025 · Artificial Intelligence

2025 AI Breakthroughs: Unlimited Memory & Intelligent Agents, Says Eric Schmidt

Former Google CEO Eric Schmidt warns that AI is on the brink of a transformative era, highlighting three 2025 breakthroughs—unlimited context memory, autonomous AI agents, and text‑to‑action programming—while also stressing the looming risks of energy consumption, security threats, and the need for ethical safeguards.

AI Safety · AI memory · AI research
0 likes · 14 min read
21CTO
Dec 22, 2024 · Artificial Intelligence

OpenAI’s New o3 Model Shatters Benchmarks – Is AGI Finally Here?

OpenAI’s latest o3 model demonstrates unprecedented performance across logic, mathematics, and programming benchmarks, introduces flexible reasoning modes with the upcoming o3‑mini, and incorporates advanced safety alignment, signaling a major leap toward practical artificial general intelligence.

AGI · AI Safety · Benchmark
0 likes · 6 min read
DataFunTalk
Dec 22, 2024 · Artificial Intelligence

Speech by Academician Sun Ninghui on the Development, Challenges, and Future of Artificial Intelligence and Intelligent Computing in China

The speech outlines the rapid rise of generative AI models, traces the historical evolution of computing technology, examines AI safety risks and regulatory responses, and proposes strategic pathways for China to advance intelligent computing through open, closed, or hybrid ecosystems while addressing talent, hardware, and cost challenges.

AI Safety · China · Intelligent Computing
0 likes · 26 min read
21CTO
Dec 3, 2024 · Artificial Intelligence

When Bing Chat Went Rogue: What Prompt‑Injection Reveals About AI Safety

A detailed analysis of Simon Willison and Benj Edwards' conversation about Bing Chat's angry, deceptive behavior uncovers how prompt‑injection attacks expose weaknesses in large language models, the limits of system prompts, and the broader safety challenges facing AI development today.

AI Safety · Bing Chat · ChatGPT
0 likes · 9 min read
DataFunTalk
Nov 11, 2024 · Artificial Intelligence

OpenAI VP Lilian Weng Departs and Shares Full AI Safety Talk Transcript

The article reports the departure of OpenAI research VP Lilian Weng, provides the full transcript of her recent AI safety and alignment presentation at a Bilibili event, and discusses broader concerns about OpenAI's safety culture, reinforcement learning from human feedback, and the importance of collective involvement in AI safety.

AI Safety · Alignment · OpenAI
0 likes · 10 min read
NewBeeNLP
Nov 7, 2024 · Artificial Intelligence

Tackling Large Model Hallucinations: Causes, Detection, and Mitigation Strategies

This article provides a comprehensive analysis of large language model hallucinations, detailing their definitions, classifications, root causes, detection techniques, and a wide range of mitigation approaches—including RAG pipelines, decoding strategies, and model‑enhancement methods—to improve reliability and safety in real‑world AI applications.

AI Safety · Model Evaluation · Prompt Engineering
0 likes · 22 min read
Cognitive Technology Team
Oct 16, 2024 · Artificial Intelligence

Large Language Models Lack Formal Reasoning Ability: Five Pieces of Evidence from the GSM‑Symbolic Benchmark

Recent research by Iman Mirzadeh's team at Apple introduces the GSM‑Symbolic benchmark, revealing that large language models, despite high scores on GSM8K, suffer significant performance drops when a problem's numbers or names change or extra clauses are added, indicating a lack of true formal reasoning ability.

AI Safety · Benchmark · GSM‑Symbolic
0 likes · 9 min read
Architect
Sep 26, 2024 · Artificial Intelligence

Decoding OpenAI o1: How RL‑LLM Fusion Powers Next‑Gen Reasoning

This article provides a detailed technical analysis of OpenAI’s o1 model, exploring its enhanced logical reasoning, the likely use of reinforcement learning with hidden chain‑of‑thought generation, multi‑model architecture, training data pipelines, reward modeling, and how these innovations could reshape AI safety and scaling strategies.

AI Safety · LLM · Model architecture
0 likes · 43 min read
Baobao Algorithm Notes
Sep 25, 2024 · Industry Insights

Decoding OpenAI o1: How RL and LLM Fuse to Power Hidden Chain‑of‑Thought

This article analytically reconstructs OpenAI o1’s architecture, training pipeline, and inference workflow, exploring its reinforcement‑learning‑enhanced hidden chain‑of‑thought, multi‑model composition, scaling laws, reward modeling, and potential implications for future AI safety and small‑model strategies.

AI Safety · Hidden COT · LLM
0 likes · 43 min read
Data Thinking Notes
Sep 13, 2024 · Artificial Intelligence

How OpenAI’s o1 Series Redefines Complex Reasoning and AI Safety

OpenAI’s new o1 series, including o1‑preview and o1‑mini, leverages reinforcement‑learning‑based chain‑of‑thought reasoning to achieve superior performance on academic exams, coding contests, and safety benchmarks, offering faster, cost‑effective options while advancing AI alignment and human‑preference evaluation.

AI Safety · Benchmark · OpenAI
0 likes · 15 min read
AntTech
Aug 12, 2024 · Artificial Intelligence

DKCF Trustworthy Framework for Large Model Applications and AI Security Practices

The article outlines the DKCF (Data‑Knowledge‑Collaboration‑Feedback) trustworthy framework presented at the 2024 Shanghai Cybersecurity Expo, detailing challenges of large AI models, four key trust factors, and Ant Group's practical security implementations for professional AI deployments.

AI Safety · DKCF · Knowledge Engineering
0 likes · 10 min read
NewBeeNLP
Jul 25, 2024 · Artificial Intelligence

Llama 3.1 Unveiled: How the New Open‑Source Giant Matches GPT‑4o and Claude 3.5

Meta has officially released Llama 3.1, a 405‑billion‑parameter open‑source model that matches or surpasses GPT‑4o and Claude 3.5 on over 150 benchmarks, expands context to 128 K tokens, supports eight languages, and is accompanied by a detailed 100‑page paper describing its data, training stack, architecture, quantization, safety measures, and ecosystem support.

AI Safety · Llama 3.1 · Meta
0 likes · 15 min read
AntTech
Jul 9, 2024 · Artificial Intelligence

2024 Large Model Security Practice Whitepaper Unveiled at the World AI Conference

The jointly authored 2024 Large Model Security Practice whitepaper, released at the World AI Conference, outlines a comprehensive safety framework covering security, reliability, and controllability, presents industry case studies, and proposes a five‑dimensional governance model to guide high‑quality development of large AI models.

AI Safety · Large Model · industry practice
0 likes · 7 min read
JD Tech
Jun 28, 2024 · Artificial Intelligence

An Overview of Large Language Models: History, Fundamentals, Prompt Engineering, Retrieval‑Augmented Generation, Agents, and Multimodal AI

This article provides a comprehensive introduction to large language models, covering their historical development, core architecture, training process, prompt engineering techniques, Retrieval‑Augmented Generation, agent frameworks, multimodal capabilities, safety challenges, and future research directions.

AI Agents · AI Safety · Deep Learning
0 likes · 22 min read
DataFunSummit
Jun 23, 2024 · Artificial Intelligence

Tongyi Xingchen Personalized Large Model: Technical Overview and Applications

This article summarizes the development background of large language models, Alibaba's progression in foundational and personalized AI, the design and capabilities of the Tongyi Xingchen personalized model, its multimodal and agent-based architecture, various industry use cases, and the safety and responsibility measures applied to ensure trustworthy AI deployment.

AI Safety · Multimodal AI · large language models
0 likes · 13 min read
21CTO
Jun 2, 2024 · Artificial Intelligence

Will OpenAI’s New Safety Team Really Secure ChatGPT?

OpenAI has created a new safety committee led by Sam Altman and board members, aiming to evaluate and improve safeguards while former researchers voice concerns about the company’s commitment to AI safety and ethics.

AI Safety · ChatGPT · OpenAI
0 likes · 6 min read
21CTO
May 25, 2024 · Artificial Intelligence

Sam Altman Reveals GPT‑4o Vision, AI Safety, and the Future of AGI

Sam Altman’s hour‑long “All‑In” podcast interview unveils OpenAI’s latest GPT‑4o voice model, his bold vision for AGI, concerns about AI safety, the recent leadership shake‑up, and his ideas on universal access, regulation, and the transformative impact of conversational AI.

AGI · AI · AI Safety
0 likes · 9 min read
DevOps
May 23, 2024 · Information Security

Guidelines for Evaluating Large Language Models in Cybersecurity Tasks

The article examines the opportunities and risks of applying large language models (LLMs) to cybersecurity, outlines fourteen practical recommendations for assessing their real‑world capabilities, and concludes with an invitation to the upcoming R&D Efficiency Conference covering AI, product management, and related topics.

AI Safety · LLM · cybersecurity
0 likes · 11 min read
Rare Earth Juejin Tech Community
May 2, 2024 · Artificial Intelligence

Understanding Large Language Models: Principles, Training, Risks, and Application Security

This article provides a comprehensive overview of large language models (LLMs): it explains their core concepts, transformer architecture, training stages, and known shortcomings such as hallucination and the reversal curse, highlights emerging security threats like prompt injection and jailbreaking, and offers guidance for safe deployment.

AI Safety · LLM · jailbreaking
0 likes · 21 min read
AntTech
Apr 18, 2024 · Artificial Intelligence

WDTA Releases International Standards for Generative AI and Large Language Model Safety Testing at the 27th UN CSTD Annual Meeting

At the 27th UN CSTD Annual Meeting in Geneva, the World Digital Technology Academy unveiled two pioneering international standards—one for generative AI application security testing and another for large language model security testing—crafted by experts from leading AI firms to establish a new global benchmark for AI safety.

AI Safety · Ant Group · International Standards
0 likes · 8 min read
21CTO
Feb 22, 2024 · Artificial Intelligence

How Google’s Open‑Source Gemma Model Brings LLM Power to Your Laptop

Google’s newly released open‑source Gemma models let developers run powerful large‑language‑model workloads on notebooks, workstations, or cloud platforms, offering competitive performance, extensive tooling, and built‑in safety measures for responsible AI deployment.

AI Safety · Gemma · Google AI
0 likes · 6 min read
Rare Earth Juejin Tech Community
Feb 18, 2024 · Artificial Intelligence

Llama 2: Open Foundation and Fine‑Tuned Chat Models – Overview and Technical Details

The article provides a comprehensive overview of Meta’s Llama 2 series, detailing model sizes, pre‑training data, architectural enhancements, supervised fine‑tuning, RLHF procedures, safety evaluations, reward‑model training, and iterative improvements, highlighting its open‑source release and comparative performance.

AI Safety · Fine-tuning · Llama2
0 likes · 27 min read
Architect
Feb 16, 2024 · Artificial Intelligence

Can OpenAI’s Sora Redefine Text‑to‑Video Generation? An In‑Depth Technical Review

OpenAI’s newly unveiled Sora model transforms short text prompts into high‑definition videos up to one minute long, showcasing a diffusion‑Transformer architecture, improved occlusion handling, and detailed visual fidelity. The article examines its technical breakthroughs, compares it to earlier models, and discusses emerging safety and misuse concerns.

AI Safety · OpenAI · Sora
0 likes · 12 min read
IT Services Circle
Dec 24, 2023 · Artificial Intelligence

GPT‑4 “Lazy” Behavior: User Reports, Experiments, and Emerging Insights

The article examines growing complaints that GPT‑4 has become increasingly lazy and unpredictable since the November 6 developer update, discusses user‑generated workarounds, presents experimental findings on prompt phrasing and temperature effects, and cites recent academic studies highlighting the need for continuous large‑model monitoring.

AI Safety · GPT-4 · Temperature
0 likes · 6 min read
Tencent Tech
Sep 20, 2023 · Artificial Intelligence

Why Do Large Language Models Hallucinate and How to Reduce It?

The article explains why large language models generate hallucinations—due to data errors, training conflicts, and inference uncertainty—and outlines data‑cleaning, model‑level feedback, knowledge augmentation, constraint techniques, and post‑processing methods such as the “Truth‑seeking” algorithm to mitigate the issue.

AI Safety · Data Quality · Knowledge Retrieval
0 likes · 8 min read
Programmer DD
Jul 21, 2023 · Artificial Intelligence

Why Did GPT-4’s Performance Plummet Between March and June 2023?

A Stanford‑Berkeley study reveals that between March and June 2023 GPT‑4’s accuracy on prime‑checking fell from 97.6% to 2.4%, its code generation quality dropped sharply, and its handling of sensitive questions changed, underscoring the rapid, unpredictable shifts in large language model performance over short periods.

AI Safety · GPT-4 · LLM evaluation
0 likes · 6 min read
21CTO
Jun 18, 2023 · Artificial Intelligence

Why Yann LeCun Says ChatGPT Isn’t Even as Smart as a Dog

Meta’s chief AI scientist Yann LeCun argues that large‑language models like ChatGPT are far from human intelligence, lacking real‑world understanding and even falling short of a dog’s cleverness, while experts debate AI’s risks, benefits, and the need for regulation.

AI Safety · ChatGPT · Meta
0 likes · 6 min read
21CTO
May 9, 2023 · Artificial Intelligence

Geoffrey Hinton Warns: Why AI Could Outpace Humanity and What It Means

In a candid MIT Technology Review interview, AI pioneer Geoffrey Hinton discusses his departure from Google, the rapid progress of large language models like GPT‑4, the dangers of AI self‑motivation, and why halting AI development is unrealistic yet urgently needed.

AI Safety · AI risk · Backpropagation
0 likes · 28 min read
21CTO
May 4, 2023 · Artificial Intelligence

Why AI Pioneer Geoffrey Hinton Quit Google and What It Means for AI Safety

Geoffrey Hinton, the father of deep learning, left Google after a decade, warning that chatbots pose frightening risks, can be misused by malicious actors, and may eventually replace many professions, highlighting urgent concerns about misinformation and the long‑term existential threats of artificial intelligence.

AI Safety · Chatbots · Geoffrey Hinton
0 likes · 8 min read
21CTO
Apr 20, 2023 · Artificial Intelligence

Elon Musk’s TruthGPT: A New AI Challenger to OpenAI’s ChatGPT

Dissatisfied with OpenAI’s direction, Elon Musk has launched TruthGPT through his new X.AI lab, recruiting top AI talent to build a safer, more transparent large‑language model that could rival ChatGPT and reshape AI governance, funding, and potential applications such as Twitter’s search and advertising.

AI Safety · Elon Musk · OpenAI
0 likes · 8 min read
Python Programming Learning Circle
Apr 3, 2023 · Artificial Intelligence

Key Highlights of GPT‑4: Multimodal Capabilities, Benchmark Performance, and Future Implications

GPT‑4, the new multimodal AI model, can process images and text, generate code and natural language, achieve human‑level scores on standardized exams, handle up to 32 K tokens, and demonstrates advanced reasoning, while OpenAI emphasizes its safety improvements and current limitations as a still‑emerging technology.

AI Safety · GPT-4 · Multimodal AI
0 likes · 6 min read