Tag

AI safety


DataFunSummit
Jun 10, 2025 · Artificial Intelligence

How Quwan’s Kaitian Model Tackles Emotional AI for Social Apps – Architecture, Training Tricks, and Safety

Quwan Technology presents its Kaitian social large model, designed for personalized, emotionally rich, multimodal AI interactions, detailing its scene‑specific goals, CPT+SFT+RLHF training pipeline, data desensitization, LoRA fine‑tuning, evaluation methods, pruning, latency trade‑offs, safety mechanisms, and future feedback loops.

AI safety · LoRA · RLHF
13 min read
Kuaishou Tech
Jun 5, 2025 · Artificial Intelligence

7 Kuaishou AI Papers Accepted at ACL 2025: Video Understanding & Safe LLM Decoding

Kuaishou’s foundational large-model team has secured seven papers at ACL 2025, spanning alignment bias in training, safety defenses during inference, decoding strategies, fine-grained video-temporal understanding, reward fairness in RLHF, multimodal captioning benchmarks, and methods to curb hallucinations in vision-language models.

ACL · AI safety · benchmark
13 min read
AntTech
May 30, 2025 · Artificial Intelligence

Insights from Ant Group’s 10th Technical Open Day: Multimodal, Embodied, and Future Model Architectures for AGI

Ant Group’s 10th Technical Open Day gathered leading AI experts to examine the current state and future directions of multimodal large models, embodied AI, world models, transformer architectures, and vertical applications, offering a comprehensive view of the challenges and opportunities on the path toward AGI.

AGI · AI safety · Multimodal Models
16 min read
Tencent Technical Engineering
May 8, 2025 · Artificial Intelligence

Augment AI Programming Assistant: Technical Breakthroughs, Industry Impact, and Security Risks

Augment, a newly funded AI programming assistant that tops the SWE‑bench leaderboard with a 65.4% score and a 200K‑token context window, promises major productivity gains for developers, but it also introduces sophisticated security threats such as malicious memory prompts, back‑door context injection, compromised guidelines, and risky multi‑task collaboration protocols, prompting calls for layered defenses and vigilant monitoring.

AI programming · AI safety · Augment
11 min read
Model Perspective
Apr 7, 2025 · Artificial Intelligence

Why AI Alignment Matters: Ensuring Smart Systems Follow Human Intent

This article explores the multifaceted AI alignment challenge, detailing safety benchmarks such as toxicity, ethical, power‑seeking, and hallucination evaluations, and argues that responsible AI development requires technical safeguards, international governance, and a civilizational dialogue bridging philosophy and humanity.

AI alignment · AI governance · AI safety
12 min read
Cognitive Technology Team
Apr 4, 2025 · Artificial Intelligence

Reasoning Models Do Not Always Reveal Their Thoughts: Evaluating Chain‑of‑Thought Fidelity

The article examines how modern reasoning models like Claude 3.7 Sonnet display chain‑of‑thought explanations, but often hide or distort their true reasoning, presenting challenges for AI safety and alignment, and evaluates methods to test and improve fidelity.

AI alignment · AI safety · Chain-of-Thought
13 min read
Cognitive Technology Team
Apr 1, 2025 · Artificial Intelligence

Four‑Second Bloodshed: How Autonomous Driving Algorithms Led to a Fatal Accident

A March 2025 crash involving a Xiaomi‑branded autonomous vehicle illustrates how a four‑second algorithmic decision loop, inadequate night‑vision sensors, flawed handover timing, and poor emergency‑exit design combined to create a lethal scenario that exposes the deadly risks of over‑relying on L2 driver‑assist systems.

AI safety · Human-Machine Interaction · L2 driver assistance
4 min read
Architecture and Beyond
Mar 15, 2025 · Information Security

Prompt Injection Attacks on Large Language Models: Risks, Types, and Defense Framework

This article explains how prompt injection attacks exploit large language models by altering their behavior through crafted inputs, outlines the major harms and attack categories—including direct, indirect, multimodal, code, and jailbreak attacks—and presents a comprehensive three‑layer defense framework covering input‑side, output‑side, and system‑level protections.

AI safety · LLM security · Prompt Injection
16 min read
DevOps
Mar 10, 2025 · Artificial Intelligence

AI Policy, Safety, Industry Applications, and Talent Development Highlighted at China's 2024 Two Sessions

The 2024 Chinese Two Sessions emphasized artificial intelligence as a strategic priority, discussing AI safety regulations, industry applications, talent shortages, and policy proposals from leaders such as DeepSeek, Xiaomi, and academic experts, highlighting the drive to integrate AI across manufacturing, agriculture, healthcare, and education.

AI Industry · AI policy · AI safety
11 min read
Code Mala Tang
Feb 27, 2025 · Artificial Intelligence

Do New AI Reasoning Models Really Think? Unpacking the Debate

The article examines whether the latest AI models that claim to perform true reasoning—by breaking problems into steps and using chain‑of‑thought—actually reason like humans, presenting skeptical and supportive expert viewpoints, and offering practical guidance on how to use such models responsibly.

AI reasoning · AI safety · Chain-of-Thought
14 min read
Architects' Tech Alliance
Feb 12, 2025 · Artificial Intelligence

DeepSeek‑V3 Training Efficiency, Knowledge Distillation, and the Risks of Synthetic Data

The article examines DeepSeek‑V3’s low‑cost training using 2048 H800 GPUs, explains how knowledge distillation and high‑quality data improve efficiency, discusses expert concerns about training on AI‑generated content, and outlines the limitations and ceiling effect of distillation techniques.

AI Training Efficiency · AI safety · DeepSeek-V3
7 min read
Top Architect
Feb 1, 2025 · Artificial Intelligence

OpenAI Launches o3-mini: A Fast, Cost‑Effective AI Model Optimized for STEM Reasoning

OpenAI unveiled the o3-mini family—low, medium, and high reasoning‑effort variants—offering a cheaper and faster reasoning model that matches or exceeds the performance of its predecessor o1 across STEM, coding, and general‑knowledge benchmarks while introducing search integration and enhanced safety features.

AI model · AI safety · o3-mini
8 min read
ZhongAn Tech Team
Jan 6, 2025 · Artificial Intelligence

Weekly Tech Digest: AI Breakthroughs, Robotics Trends, and Industry Shifts in Early 2025

This weekly technology digest highlights major industry developments, including OpenAI's 2025 product roadmap, DeepSeek's identity anomaly, Nvidia's robotics advancements, and Honor's IPO preparations, alongside expert perspectives on AI safety and market trends in operating systems and electric vehicles.

AI safety · Artificial Intelligence · Consumer Electronics
8 min read
DataFunTalk
Jan 5, 2025 · Artificial Intelligence

The Approaching Singularity: AI Automation, AGI Predictions, and Their Impact on Jobs and Society

The article examines how rapid advances in artificial intelligence are expected to automate nearly half of U.S. jobs within the next two decades, explores singularity forecasts for 2029‑2030, and discusses the profound economic, ethical, and security challenges that humanity must address before AI-driven autonomous systems reshape work, research, and daily life.

AGI · AI · AI safety
18 min read
DataFunTalk
Dec 22, 2024 · Artificial Intelligence

Speech by Academician Sun Ninghui on the Development, Challenges, and Future of Artificial Intelligence and Intelligent Computing in China

The speech outlines the rapid rise of generative AI models, traces the historical evolution of computing technology, examines AI safety risks and regulatory responses, and proposes strategic pathways for China to advance intelligent computing through open, closed, or hybrid ecosystems while addressing talent, hardware, and cost challenges.

AI safety · Artificial Intelligence · China
26 min read
DataFunTalk
Nov 11, 2024 · Artificial Intelligence

OpenAI VP Lilian Weng Departs and Shares Full AI Safety Talk Transcript

The article reports the departure of OpenAI research VP Lilian Weng, provides the full transcript of her recent AI safety and alignment presentation at a Bilibili event, and discusses broader concerns about OpenAI's safety culture, reinforcement learning from human feedback, and the importance of collective involvement in AI safety.

AI safety · OpenAI · alignment
10 min read
Cognitive Technology Team
Oct 16, 2024 · Artificial Intelligence

Large Language Models Lack Formal Reasoning Ability: Five Pieces of Evidence from the GSM‑Symbolic Benchmark

Recent research by Apple’s Iman Mirzadeh team introduces the GSM‑Symbolic benchmark, revealing that large language models, despite high scores on GSM8K, exhibit significant performance drops when problem numbers, names, or extra clauses change, indicating a lack of true formal reasoning ability.

AI safety · GSM‑Symbolic · benchmark
9 min read
AntTech
Aug 12, 2024 · Artificial Intelligence

DKCF Trustworthy Framework for Large Model Applications and AI Security Practices

The article outlines the DKCF (Data‑Knowledge‑Collaboration‑Feedback) trustworthy framework presented at the 2024 Shanghai Cybersecurity Expo, detailing challenges of large AI models, four key trust factors, and Ant Group's practical security implementations for professional AI deployments.

AI safety · DKCF · feedback loops
10 min read
AntTech
Jul 9, 2024 · Artificial Intelligence

2024 Large Model Security Practice Whitepaper Unveiled at the World AI Conference

The jointly authored 2024 Large Model Security Practice whitepaper, released at the World AI Conference, outlines a comprehensive safety framework covering security, reliability, and controllability, presents industry case studies, and proposes a five‑dimensional governance model to guide high‑quality development of large AI models.

AI safety · Whitepaper · industry practice
7 min read
JD Tech
Jun 28, 2024 · Artificial Intelligence

An Overview of Large Language Models: History, Fundamentals, Prompt Engineering, Retrieval‑Augmented Generation, Agents, and Multimodal AI

This article provides a comprehensive introduction to large language models, covering their historical development, core architecture, training process, prompt engineering techniques, Retrieval‑Augmented Generation, agent frameworks, multimodal capabilities, safety challenges, and future research directions.

AI agents · AI safety · RAG
22 min read