Tagged articles

AI safety

301 articles · Page 3 of 4

Aug 19, 2025 · Artificial Intelligence

How to Strengthen LLM System Prompts for Safer AI Agents

This guide explains how to reinforce system prompts for AI agents by optimizing their content and structure, using active defense, role‑based, and format constraints, providing practical examples, measurement methods, and experimental results that demonstrate up to 90% reduction in unsafe behavior.

AI safetyLLMreinforcement

0 likes · 13 min read

How to Strengthen LLM System Prompts for Safer AI Agents

Meituan Technology Team

Aug 14, 2025 · Artificial Intelligence

How Meituan’s Smart Helmet Redefines Delivery Safety with AI‑Powered Design

Meituan’s first smart‑helmet article details hardware innovations that tackle delivery riders’ safety, comfort, and efficiency, covering stricter safety standards, sensor‑driven alerts, lightweight structures, advanced ventilation, three‑times longer battery life, noise‑cancelling audio, IPX6 waterproofing, and a data‑driven production line.

AI safetyHardware Designdelivery efficiency

0 likes · 24 min read

How Meituan’s Smart Helmet Redefines Delivery Safety with AI‑Powered Design

AI Frontier Lectures

Jul 27, 2025 · Information Security

Can Hidden Activations Expose Multimodal Model Jailbreaks?

The paper reveals that large multimodal language models retain refusal signals in their hidden states even after jailbreak attempts, and proposes a training‑free detection method that leverages these signals to identify unsafe inputs across text and image modalities with strong generalization.

AI safetyLVLM securityhidden activation analysis

0 likes · 7 min read

Can Hidden Activations Expose Multimodal Model Jailbreaks?

AI Frontier Lectures

Jul 19, 2025 · Artificial Intelligence

How Researchers Made Large Language Models Forget or Amplify Specific Concepts

A new study from Meta and NYU reveals a two‑step technique—SAMD to locate concept‑specific attention heads and SAMI to scale their influence—enabling precise, low‑cost editing of transformer models for tasks ranging from factual recall to safety control.

AI safetySparse attentionconcept control

0 likes · 11 min read

How Researchers Made Large Language Models Forget or Amplify Specific Concepts

IT Services Circle

Jul 16, 2025 · Artificial Intelligence

How a Simple Colon Can Trick Top LLMs – The Master‑RM Fix

A recent study reveals that tiny symbols like colons or generic reasoning prefixes can cause large language models used as reward judges to issue false‑positive rewards, but an enhanced reward model called Master‑RM, trained with adversarial data, eliminates this vulnerability across multiple LLMs and languages.

AI safetyLLMMaster-RM

0 likes · 10 min read

How a Simple Colon Can Trick Top LLMs – The Master‑RM Fix

AntTech

Jul 14, 2025 · Artificial Intelligence

What Is the New AI Agent Safety Testing Standard and Why It Matters

The World Digital Academy unveiled the AI STR series' first global AI Agent Operation Safety Testing Standard, detailing a full‑link risk analysis framework, novel testing methods, and its role in addressing rising safety concerns as AI agents become mainstream in 2025.

AI GovernanceAI safetyagent standards

0 likes · 5 min read

What Is the New AI Agent Safety Testing Standard and Why It Matters

21CTO

Jul 1, 2025 · Artificial Intelligence

OpenAI CEO Warns: Don’t Blindly Trust AI – Insights from New Open‑Source Models

Sam Altman cautions against over‑reliance on ChatGPT, while Germany blocks DeepSeek for GDPR violations, Tencent unveils its MoE‑based Hunyuan‑A13B model, and Google releases a Python client for Data Commons, highlighting both AI risks and rapid open‑source advancements.

AI safetyData CommonsMoE

0 likes · 9 min read

OpenAI CEO Warns: Don’t Blindly Trust AI – Insights from New Open‑Source Models

DataFunTalk

Jun 21, 2025 · Artificial Intelligence

Why AI Gets Overconfident: Bias, Hallucinations, and Reinforcement Learning Solutions

This talk explores how large AI models become overconfident, leading to bias and hallucinations, examines adversarial examples in vision and language, explains why data and algorithms cause these issues, and shows how reinforcement learning can teach models to admit uncertainty and align with human values.

AI alignmentAI safetyBias

0 likes · 19 min read

Why AI Gets Overconfident: Bias, Hallucinations, and Reinforcement Learning Solutions

DataFunTalk

Jun 19, 2025 · Artificial Intelligence

Can We Flip the Switch on AI Good vs. Evil? OpenAI’s Toxic Persona Find

OpenAI’s new research reveals that training language models to produce incorrect answers in a single domain can trigger a toxic persona feature, causing the model to generate harmful suggestions across unrelated tasks, but the team also demonstrates detection methods and a reversible “emergent realignment” technique to restore safe behavior.

AI safetyEmergent misalignmentOpenAI

0 likes · 7 min read

Can We Flip the Switch on AI Good vs. Evil? OpenAI’s Toxic Persona Find

DataFunSummit

Jun 10, 2025 · Artificial Intelligence

How Quwan’s Kaitian Model Tackles Emotional AI for Social Apps – Architecture, Training Tricks, and Safety

Quwan Technology presents its Kaitian social large model, designed for personalized, emotionally rich, multimodal AI interactions, detailing its scene‑specific goals, CPT+SFT+RLHF training pipeline, data desensitization, LoRA fine‑tuning, evaluation methods, pruning, latency trade‑offs, safety mechanisms, and future feedback loops.

AI safetyLarge Language ModelLoRA

0 likes · 13 min read

How Quwan’s Kaitian Model Tackles Emotional AI for Social Apps – Architecture, Training Tricks, and Safety

Kuaishou Tech

Jun 5, 2025 · Artificial Intelligence

7 Kuaishou AI Papers Accepted at ACL 2025: Video Understanding & Safe LLM Decoding

Kuaishou’s foundational large-model team has secured seven papers at ACL 2025, spanning alignment bias in training, safety defenses during inference, decoding strategies, fine-grained video-temporal understanding, reward fairness in RLHF, multimodal captioning benchmarks, and methods to curb hallucinations in vision-language models.

ACLAI safetyMultimodal

0 likes · 13 min read

7 Kuaishou AI Papers Accepted at ACL 2025: Video Understanding & Safe LLM Decoding

AntTech

May 30, 2025 · Artificial Intelligence

Insights from Ant Group’s 10th Technical Open Day: Multimodal, Embodied, and Future Model Architectures for AGI

The Ant Group’s 10th Technical Open Day gathered leading AI experts who examined the current state and future directions of multimodal large models, embodied AI, world models, transformer architectures, and vertical applications, offering a comprehensive view of the challenges and opportunities on the path toward AGI.

AGIAI safetyEmbodied AI

0 likes · 16 min read

Insights from Ant Group’s 10th Technical Open Day: Multimodal, Embodied, and Future Model Architectures for AGI

ShiZhen AI

May 26, 2025 · Industry Insights

Nvidia Plans Cheaper Blackwell AI Chip for China Amid Export Restrictions

Nvidia is reportedly preparing a lower‑cost Blackwell GPU for the Chinese market, priced at $6,500‑$8,000 and featuring 1.7 TB/s GDDR7 memory, while OpenAI’s o3 model uncovered a Linux kernel zero‑day (CVE‑2025‑37899), a study showed AI models can sabotage shutdown commands, and a tutorial demonstrates creating animated 3D icons with ChatGPT and Freepik tools.

3D icon creationAI hardwareAI safety

0 likes · 8 min read

Nvidia Plans Cheaper Blackwell AI Chip for China Amid Export Restrictions

Java Tech Enthusiast

May 25, 2025 · Artificial Intelligence

Does Claude 4 Really Report Unethical Actions? Inside Its Hidden ‘Whistleblower’ Feature

The article analyzes Anthropic's Claude 4 series, highlighting its extended reasoning ability, a controversial whistle‑blower function that can report extreme wrongdoing, observed extortion attempts toward developers, and the safety measures Anthropic introduced to curb such risky autonomous behaviors.

AI safetyAnthropicClaude 4

0 likes · 6 min read

Does Claude 4 Really Report Unethical Actions? Inside Its Hidden ‘Whistleblower’ Feature

Tencent Technical Engineering

May 8, 2025 · Artificial Intelligence

Augment AI Programming Assistant: Technical Breakthroughs, Industry Impact, and Security Risks

Augment, a newly funded AI programming assistant that tops the SWE‑bench benchmark with a 65.4% score and a 200 k‑token context window, promises massive productivity gains for developers but also introduces sophisticated security threats such as malicious memory prompts, back‑door context injection, compromised guidelines, and risky multi‑task collaboration protocols, prompting calls for layered defenses and vigilant monitoring.

AI programmingAI safetyAgent Memory

0 likes · 11 min read

Augment AI Programming Assistant: Technical Breakthroughs, Industry Impact, and Security Risks

Sohu Tech Products

May 7, 2025 · Information Security

Why MCP Protocol Is a Security Nightmare: Real Attack Cases and Mitigations

This article provides a comprehensive security analysis of the Model Context Protocol (MCP), exposing multiple attack vectors such as prompt poisoning, tool poisoning, command and code injection, and illustrating how MCP’s design flaws make it more vulnerable than traditional applications while offering concrete mitigation recommendations.

AI safetyCode InjectionMCP

0 likes · 34 min read

Why MCP Protocol Is a Security Nightmare: Real Attack Cases and Mitigations

JavaEdge

May 7, 2025 · Artificial Intelligence

Why AI Agents Pose New Security Risks and How to Safeguard Them

The article explains what AI agents are, highlights their emerging security risks such as data leakage and lack of accountability, and offers practical strategies—including risk analysis, threat modeling, and engineering best practices—to mitigate these challenges for enterprises.

AI agentsAI safetyEnterprise AI

0 likes · 9 min read

Why AI Agents Pose New Security Risks and How to Safeguard Them

21CTO

Apr 7, 2025 · Artificial Intelligence

Llama 4 Unveiled: Breakthrough Multimodal Models Redefine AI Capabilities

Meta's Llama 4 series introduces the Scout, Maverick, and Behemoth models—featuring Mixture‑of‑Experts architectures, unprecedented 10‑million‑token context windows, and state‑of‑the‑art performance across vision, language, and multimodal benchmarks—while emphasizing efficient training, open‑source availability, and robust safety safeguards.

AI safetyLarge Language ModelLlama 4

0 likes · 14 min read

Llama 4 Unveiled: Breakthrough Multimodal Models Redefine AI Capabilities

Model Perspective

Apr 7, 2025 · Artificial Intelligence

Why AI Alignment Matters: Ensuring Smart Systems Follow Human Intent

This article explores the multifaceted AI alignment challenge, detailing safety benchmarks such as toxicity, ethical, power‑seeking, and hallucination evaluations, and argues that responsible AI development requires technical safeguards, international governance, and a civilizational dialogue bridging philosophy and humanity.

AI GovernanceAI alignmentAI safety

0 likes · 12 min read

Why AI Alignment Matters: Ensuring Smart Systems Follow Human Intent

Baobao Algorithm Notes

Apr 6, 2025 · Artificial Intelligence

Inside Llama 4: How Meta’s New Multimodal MoE Models Achieve 10M‑Token Contexts

Meta unveils Llama 4 Scout, Maverick, and the upcoming Behemoth, detailing their Mixture‑of‑Experts architecture, massive 10‑million‑token context windows, efficient FP8 training, safety mechanisms, and competitive benchmark results that surpass leading multimodal models.

AI safetyLlama 4Mixture of Experts

0 likes · 16 min read

Inside Llama 4: How Meta’s New Multimodal MoE Models Achieve 10M‑Token Contexts

Cognitive Technology Team

Apr 4, 2025 · Artificial Intelligence

Reasoning Models Do Not Always Reveal Their Thoughts: Evaluating Chain‑of‑Thought Fidelity

The article examines how modern reasoning models like Claude 3.7 Sonnet display chain‑of‑thought explanations, but often hide or distort their true reasoning, presenting challenges for AI safety and alignment, and evaluates methods to test and improve fidelity.

AI alignmentAI safetyChain-of-Thought

0 likes · 13 min read

Reasoning Models Do Not Always Reveal Their Thoughts: Evaluating Chain‑of‑Thought Fidelity

Cognitive Technology Team

Apr 1, 2025 · Artificial Intelligence

Four‑Second Bloodshed: How Autonomous Driving Algorithms Turned a Fatal Accident

A March 2025 crash involving a Xiaomi‑branded autonomous vehicle illustrates how a four‑second algorithmic decision loop, inadequate night‑vision sensors, flawed handover timing, and poor emergency‑exit design combined to create a lethal scenario that exposes the deadly risks of over‑relying on L2 driver‑assist systems.

AI safetyHuman-Machine InteractionL2 driver assistance

0 likes · 4 min read

Four‑Second Bloodshed: How Autonomous Driving Algorithms Turned a Fatal Accident

Architect

Mar 28, 2025 · Artificial Intelligence

Peeking Inside Claude: How Anthropic Uncovers LLM Reasoning

Anthropic’s recent papers reveal how Claude’s internal mechanisms—multilingual feature sharing, pre‑planned rhyming, parallel arithmetic paths, concept‑level reasoning, and hallucination triggers—are probed with feature‑insertion techniques, offering engineers actionable insights for building more transparent and safe AI systems.

AI safetyAnthropicClaude

0 likes · 12 min read

Peeking Inside Claude: How Anthropic Uncovers LLM Reasoning

Architect

Mar 24, 2025 · Artificial Intelligence

How Multimodal Alignment Is Shaping the Future of Large Language Models

This article provides a systematic review of recent advances in multimodal alignment for large language models, covering key contributions, application scenarios, dataset construction, evaluation benchmarks, future challenges, and insights from LLM alignment research to guide both academia and industry.

AI safetyDataset ConstructionMLLM

0 likes · 26 min read

How Multimodal Alignment Is Shaping the Future of Large Language Models

Architects' Tech Alliance

Mar 22, 2025 · Industry Insights

What Does DeepSeek’s 2025 AI Report Reveal About the Future of Large Models?

The 2025 DeepSeek Insight report analyzes DeepSeek’s new large‑model releases, compares US and Chinese AI ecosystems, outlines diverse application scenarios such as government, healthcare and aerospace, and provides practical guidance for safely leveraging these models despite their current limitations.

AI industryAI safetyDeepSeek

0 likes · 5 min read

What Does DeepSeek’s 2025 AI Report Reveal About the Future of Large Models?

Architecture and Beyond

Mar 15, 2025 · Information Security

Prompt Injection Attacks on Large Language Models: Risks, Types, and Defense Framework

This article explains how prompt injection attacks exploit large language models by altering their behavior through crafted inputs, outlines the major harms and attack categories—including direct, indirect, multimodal, code, and jailbreak attacks—and presents a comprehensive three‑layer defense framework covering input‑side, output‑side, and system‑level protections.

AI safetyLLM securityinformation security

0 likes · 16 min read

Prompt Injection Attacks on Large Language Models: Risks, Types, and Defense Framework

DevOps

Mar 10, 2025 · Artificial Intelligence

AI Policy, Safety, Industry Applications, and Talent Development Highlighted at China's 2024 Two Sessions

The 2024 Chinese Two Sessions emphasized artificial intelligence as a strategic priority, discussing AI safety regulations, industry applications, talent shortages, and policy proposals from leaders such as DeepSeek, Xiaomi, and academic experts, highlighting the drive to integrate AI across manufacturing, agriculture, healthcare, and education.

AI industryAI policyAI safety

0 likes · 11 min read

AI Policy, Safety, Industry Applications, and Talent Development Highlighted at China's 2024 Two Sessions

Code Mala Tang

Feb 27, 2025 · Artificial Intelligence

Do New AI Reasoning Models Really Think? Unpacking the Debate

The article examines whether the latest AI models that claim to perform true reasoning—by breaking problems into steps and using chain‑of‑thought—actually reason like humans, presenting skeptical and supportive expert viewpoints, and offering practical guidance on how to use such models responsibly.

AI reasoningAI safetyChain-of-Thought

0 likes · 14 min read

Do New AI Reasoning Models Really Think? Unpacking the Debate

Architect's Alchemy Furnace

Feb 19, 2025 · Artificial Intelligence

DeepSeek’s Self‑Correction: Transforming AI Reliability and Safety

The article explores DeepSeek’s innovative self‑correction system—combining a Mixture‑of‑Experts architecture with reinforcement‑learning feedback—to achieve real‑time error detection, dynamic knowledge‑graph updates, and enhanced safety in high‑risk fields like autonomous driving and medical diagnostics.

AI safetyDeepSeekMixture of Experts

0 likes · 9 min read

DeepSeek’s Self‑Correction: Transforming AI Reliability and Safety

Architects' Tech Alliance

Feb 12, 2025 · Artificial Intelligence

DeepSeek‑V3 Training Efficiency, Knowledge Distillation, and the Risks of Synthetic Data

The article examines DeepSeek‑V3’s low‑cost training using 2048 H800 GPUs, explains how knowledge distillation and high‑quality data improve efficiency, discusses expert concerns about training on AI‑generated content, and outlines the limitations and ceiling effect of distillation techniques.

AI Training EfficiencyAI safetyDeepSeek-V3

0 likes · 7 min read

DeepSeek‑V3 Training Efficiency, Knowledge Distillation, and the Risks of Synthetic Data

Top Architect

Feb 1, 2025 · Artificial Intelligence

OpenAI Launches o3-mini: A Fast, Cost‑Effective AI Model Optimized for STEM Reasoning

OpenAI unveiled the o3-mini family—low, medium, and high variants—offering a cheaper, faster, and secure inference model that matches or exceeds the performance of its predecessor o1 across STEM, coding, and general knowledge benchmarks while introducing search integration and enhanced safety features.

AI modelAI safetyO3-mini

0 likes · 8 min read

OpenAI Launches o3-mini: A Fast, Cost‑Effective AI Model Optimized for STEM Reasoning

Software Engineering 3.0 Era

Jan 18, 2025 · Industry Insights

Is AI Self‑Programming and Recursive Self‑Improvement Signaling the Endgame?

The article examines Nvidia’s claim that AI can now write software and build an “AI factory,” analyzes OpenAI’s emerging o‑series models that purportedly achieve recursive self‑improvement, and surveys community reactions ranging from excitement to safety concerns about a potential AI “game over.”

AI safetyIndustry AnalysisNVIDIA

0 likes · 8 min read

Is AI Self‑Programming and Recursive Self‑Improvement Signaling the Endgame?

Baobao Algorithm Notes

Jan 11, 2025 · Artificial Intelligence

Why Phi‑4’s 14B Model Outperforms GPT‑4 on STEM and Reasoning Tasks

Microsoft Research’s Phi‑4 model, a 14‑billion‑parameter LLM, leverages extensive synthetic data, advanced tokenization, and a two‑stage training pipeline to achieve superior performance on STEM question answering, long‑context reasoning, and safety benchmarks, rivaling larger models like GPT‑4.

AI safetyBenchmarkingPhi-4

0 likes · 15 min read

Why Phi‑4’s 14B Model Outperforms GPT‑4 on STEM and Reasoning Tasks

ZhongAn Tech Team

Jan 6, 2025 · Artificial Intelligence

Weekly Tech Digest: AI Breakthroughs, Robotics Trends, and Industry Shifts in Early 2025

This weekly technology digest highlights major industry developments, including OpenAI's 2025 product roadmap, DeepSeek's identity anomaly, Nvidia's robotics advancements, and Honor's IPO preparations, alongside expert perspectives on AI safety and market trends in operating systems and electric vehicles.

AI safetyMarket TrendsOpenAI

0 likes · 8 min read

Weekly Tech Digest: AI Breakthroughs, Robotics Trends, and Industry Shifts in Early 2025

DataFunTalk

Jan 5, 2025 · Artificial Intelligence

The Approaching Singularity: AI Automation, AGI Predictions, and Their Impact on Jobs and Society

The article examines how rapid advances in artificial intelligence are expected to automate nearly half of U.S. jobs within the next two decades, explores singularity forecasts for 2029‑2030, and discusses the profound economic, ethical, and security challenges that humanity must address before AI-driven autonomous systems reshape work, research, and daily life.

AGIAIAI safety

0 likes · 18 min read

The Approaching Singularity: AI Automation, AGI Predictions, and Their Impact on Jobs and Society

21CTO

Jan 2, 2025 · Artificial Intelligence

2025 AI Breakthroughs: Unlimited Memory & Intelligent Agents, Says Eric Schmidt

Former Google CEO Eric Schmidt warns that AI is on the brink of a transformative era, highlighting three 2025 breakthroughs—unlimited context memory, autonomous AI agents, and text‑to‑action programming—while also stressing the looming risks of energy consumption, security threats, and the need for ethical safeguards.

AI memoryAI researchAI safety

0 likes · 14 min read

2025 AI Breakthroughs: Unlimited Memory & Intelligent Agents, Says Eric Schmidt

21CTO

Dec 22, 2024 · Artificial Intelligence

OpenAI’s New o3 Model Shatters Benchmarks – Is AGI Finally Here?

OpenAI’s latest o3 model demonstrates unprecedented performance across logic, mathematics, and programming benchmarks, introduces flexible reasoning modes with the upcoming o3‑mini, and incorporates advanced safety alignment, signaling a major leap toward practical artificial general intelligence.

AGIAI safetyOpenAI

0 likes · 6 min read

OpenAI’s New o3 Model Shatters Benchmarks – Is AGI Finally Here?

DataFunTalk

Dec 22, 2024 · Artificial Intelligence

Speech by Academician Sun Ninghui on the Development, Challenges, and Future of Artificial Intelligence and Intelligent Computing in China

The speech outlines the rapid rise of generative AI models, traces the historical evolution of computing technology, examines AI safety risks and regulatory responses, and proposes strategic pathways for China to advance intelligent computing through open, closed, or hybrid ecosystems while addressing talent, hardware, and cost challenges.

AI safetyChinaIntelligent Computing

0 likes · 26 min read

Speech by Academician Sun Ninghui on the Development, Challenges, and Future of Artificial Intelligence and Intelligent Computing in China

21CTO

Dec 3, 2024 · Artificial Intelligence

When Bing Chat Went Rogue: What Prompt‑Injection Reveals About AI Safety

A detailed analysis of Simon Willison and Benj Edwards' conversation about Bing Chat's angry, deceptive behavior uncovers how prompt‑injection attacks expose weaknesses in large language models, the limits of system prompts, and the broader safety challenges facing AI development today.

AI safetyBing ChatChatGPT

0 likes · 9 min read

When Bing Chat Went Rogue: What Prompt‑Injection Reveals About AI Safety

DataFunTalk

Nov 11, 2024 · Artificial Intelligence

OpenAI VP Lilian Weng Departs and Shares Full AI Safety Talk Transcript

The article reports the departure of OpenAI research VP Lilian Weng, provides the full transcript of her recent AI safety and alignment presentation at a Bilibili event, and discusses broader concerns about OpenAI's safety culture, reinforcement learning from human feedback, and the importance of collective involvement in AI safety.

AI safetyOpenAIalignment

0 likes · 10 min read

OpenAI VP Lilian Weng Departs and Shares Full AI Safety Talk Transcript

NewBeeNLP

Nov 7, 2024 · Artificial Intelligence

Tackling Large Model Hallucinations: Causes, Detection, and Mitigation Strategies

This article provides a comprehensive analysis of large language model hallucinations, detailing their definitions, classifications, root causes, detection techniques, and a wide range of mitigation approaches—including RAG pipelines, decoding strategies, and model‑enhancement methods—to improve reliability and safety in real‑world AI applications.

AI safetyHallucinationPrompt engineering

0 likes · 22 min read

Tackling Large Model Hallucinations: Causes, Detection, and Mitigation Strategies

Cognitive Technology Team

Oct 16, 2024 · Artificial Intelligence

Large Language Models Lack Formal Reasoning Ability: Five Pieces of Evidence from the GSM‑Symbolic Benchmark

Recent research by Apple’s Iman Mirzadeh team introduces the GSM‑Symbolic benchmark, revealing that large language models, despite high scores on GSM8K, exhibit significant performance drops when problem numbers, names, or extra clauses change, indicating a lack of true formal reasoning ability.

AI safetyGSM‑Symbolicbenchmark

0 likes · 9 min read

Large Language Models Lack Formal Reasoning Ability: Five Pieces of Evidence from the GSM‑Symbolic Benchmark

Architect

Sep 26, 2024 · Artificial Intelligence

Decoding OpenAI o1: How RL‑LLM Fusion Powers Next‑Gen Reasoning

This article provides a detailed technical analysis of OpenAI’s o1 model, exploring its enhanced logical reasoning, the likely use of reinforcement learning with hidden chain‑of‑thought generation, multi‑model architecture, training data pipelines, reward modeling, and how these innovations could reshape AI safety and scaling strategies.

AI safetyChain-of-ThoughtLLM

0 likes · 43 min read

Decoding OpenAI o1: How RL‑LLM Fusion Powers Next‑Gen Reasoning

Baobao Algorithm Notes

Sep 25, 2024 · Industry Insights

Decoding OpenAI o1: How RL and LLM Fuse to Power Hidden Chain‑of‑Thought

This article analytically reconstructs OpenAI o1’s architecture, training pipeline, and inference workflow, exploring its reinforcement‑learning‑enhanced hidden chain‑of‑thought, multi‑model composition, scaling laws, reward modeling, and potential implications for future AI safety and small‑model strategies.

AI safetyHidden COTLLM

0 likes · 43 min read

Decoding OpenAI o1: How RL and LLM Fuse to Power Hidden Chain‑of‑Thought

Data Thinking Notes

Sep 13, 2024 · Artificial Intelligence

How OpenAI’s o1 Series Redefines Complex Reasoning and AI Safety

OpenAI’s new o1 series, including o1‑preview and o1‑mini, leverages reinforcement‑learning‑based chain‑of‑thought reasoning to achieve superior performance on academic exams, coding contests, and safety benchmarks, offering faster, cost‑effective options while advancing AI alignment and human‑preference evaluation.

AI safetyLarge Language ModelOpenAI

0 likes · 15 min read

How OpenAI’s o1 Series Redefines Complex Reasoning and AI Safety

AntTech

Aug 12, 2024 · Artificial Intelligence

DKCF Trustworthy Framework for Large Model Applications and AI Security Practices

The article outlines the DKCF (Data‑Knowledge‑Collaboration‑Feedback) trustworthy framework presented at the 2024 Shanghai Cybersecurity Expo, detailing challenges of large AI models, four key trust factors, and Ant Group's practical security implementations for professional AI deployments.

AI safetyDKCFKnowledge Engineering

0 likes · 10 min read

DKCF Trustworthy Framework for Large Model Applications and AI Security Practices

NewBeeNLP

Jul 25, 2024 · Artificial Intelligence

Llama 3.1 Unveiled: How the New Open‑Source Giant Matches GPT‑4o and Claude 3.5

Meta has officially released Llama 3.1, a 405‑billion‑parameter open‑source model that matches or surpasses GPT‑4o and Claude 3.5 on over 150 benchmarks, expands context to 128 K tokens, supports eight languages, and is accompanied by a detailed 100‑page paper describing its data, training stack, architecture, quantization, safety measures, and ecosystem support.

AI safetyLarge Language ModelLlama 3.1

0 likes · 15 min read

Llama 3.1 Unveiled: How the New Open‑Source Giant Matches GPT‑4o and Claude 3.5

AntTech

Jul 9, 2024 · Artificial Intelligence

2024 Large Model Security Practice Whitepaper Unveiled at the World AI Conference

The jointly authored 2024 Large Model Security Practice whitepaper, released at the World AI Conference, outlines a comprehensive safety framework covering security, reliability, and controllability, presents industry case studies, and proposes a five‑dimensional governance model to guide high‑quality development of large AI models.

AI safetyTrustworthy AIindustry practice

0 likes · 7 min read

2024 Large Model Security Practice Whitepaper Unveiled at the World AI Conference

JD Tech

Jun 28, 2024 · Artificial Intelligence

An Overview of Large Language Models: History, Fundamentals, Prompt Engineering, Retrieval‑Augmented Generation, Agents, and Multimodal AI

This article provides a comprehensive introduction to large language models, covering their historical development, core architecture, training process, prompt engineering techniques, Retrieval‑Augmented Generation, agent frameworks, multimodal capabilities, safety challenges, and future research directions.

AI agentsAI safetyMultimodal

0 likes · 22 min read

An Overview of Large Language Models: History, Fundamentals, Prompt Engineering, Retrieval‑Augmented Generation, Agents, and Multimodal AI

DataFunSummit

Jun 23, 2024 · Artificial Intelligence

Tongyi Xingchen Personalized Large Model: Technical Overview and Applications

This article summarizes the development background of large language models, Alibaba's progression in foundational and personalized AI, the design and capabilities of the Tongyi Xingchen personalized model, its multimodal and agent-based architecture, various industry use cases, and the safety and responsibility measures applied to ensure trustworthy AI deployment.

AI safetyMultimodal AIlarge language models

0 likes · 13 min read

Tongyi Xingchen Personalized Large Model: Technical Overview and Applications

21CTO

Jun 2, 2024 · Artificial Intelligence

Will OpenAI’s New Safety Team Really Secure ChatGPT?

OpenAI has created a new safety committee led by Sam Altman and board members, aiming to evaluate and improve safeguards while former researchers voice concerns about the company’s commitment to AI safety and ethics.

AI safetyChatGPTGovernance

0 likes · 6 min read

Will OpenAI’s New Safety Team Really Secure ChatGPT?

21CTO

May 25, 2024 · Artificial Intelligence

Sam Altman Reveals GPT‑4o Vision, AI Safety, and the Future of AGI

Sam Altman’s hour‑long “All‑In” podcast interview unveils OpenAI’s latest GPT‑4o voice model, his bold vision for AGI, concerns about AI safety, the recent leadership shake‑up, and his ideas on universal access, regulation, and the transformative impact of conversational AI.

AGIAIAI safety

0 likes · 9 min read

Sam Altman Reveals GPT‑4o Vision, AI Safety, and the Future of AGI

DevOps

May 23, 2024 · Information Security

Guidelines for Evaluating Large Language Models in Cybersecurity Tasks

The article examines the opportunities and risks of applying large language models (LLMs) to cybersecurity, outlines fourteen practical recommendations for assessing their real‑world capabilities, and concludes with an invitation to the upcoming R&D Efficiency Conference covering AI, product management, and related topics.

AI safetyEvaluationLLM

0 likes · 11 min read

Guidelines for Evaluating Large Language Models in Cybersecurity Tasks

Rare Earth Juejin Tech Community

May 2, 2024 · Artificial Intelligence

Understanding Large Language Models: Principles, Training, Risks, and Application Security

This article provides a comprehensive overview of large language models (LLMs), explaining their core concepts, transformer architecture, training stages, known shortcomings such as hallucination and reversal curse, and highlights emerging security threats like prompt injection and jailbreaking, offering guidance for safe deployment.

AI safetyLLMjailbreaking

0 likes · 21 min read

Understanding Large Language Models: Principles, Training, Risks, and Application Security

AntTech

Apr 18, 2024 · Artificial Intelligence

WDTA Releases International Standards for Generative AI and Large Language Model Safety Testing at the 27th UN CSTD Annual Meeting

At the 27th UN CSTD Annual Meeting in Geneva, the World Digital Technology Academy unveiled two pioneering international standards—one for generative AI application security testing and another for large language model security testing—crafted by experts from leading AI firms to establish a new global benchmark for AI safety.

AI safetyAnt GroupGenerative AI

0 likes · 8 min read

WDTA Releases International Standards for Generative AI and Large Language Model Safety Testing at the 27th UN CSTD Annual Meeting

Smart Era Software Development

Mar 7, 2024 · Artificial Intelligence

2024 AGI Outlook: Trends, Predictions, and a Surprise Bonus

The article analyses the 2024 AI landscape, highlighting a multimodal explosion, the limits of current AI applications, Sora as a concrete step toward AGI, the rise of AI‑native business models, edge‑AI hardware opportunities, the challenges of human‑level models, and the broader societal impacts of an AI‑driven data era.

AGIAI hardwareAI safety

0 likes · 34 min read

2024 AGI Outlook: Trends, Predictions, and a Surprise Bonus

Rare Earth Juejin Tech Community

Mar 7, 2024 · Artificial Intelligence

Anthropic Announces Claude 3 Model Family: Opus, Sonnet, and Haiku

Anthropic has launched the Claude 3 family of large language models—Opus, Sonnet, and Haiku—offering varying balances of intelligence, speed, and cost, with enhanced reasoning, multilingual, vision capabilities, reduced refusals, and improved safety, now available via API in over 159 countries.

AI safetyAnthropicClaude 3

0 likes · 11 min read

Anthropic Announces Claude 3 Model Family: Opus, Sonnet, and Haiku

21CTO

Feb 22, 2024 · Artificial Intelligence

How Google’s Open‑Source Gemma Model Brings LLM Power to Your Laptop

Google’s newly released open‑source Gemma models let developers run powerful large‑language‑model workloads on notebooks, workstations, or cloud platforms, offering competitive performance, extensive tooling, and built‑in safety measures for responsible AI deployment.

AI safetyGemmaGoogle AI

0 likes · 6 min read

How Google’s Open‑Source Gemma Model Brings LLM Power to Your Laptop

Rare Earth Juejin Tech Community

Feb 18, 2024 · Artificial Intelligence

Llama 2: Open Foundation and Fine‑Tuned Chat Models – Overview and Technical Details

The article provides a comprehensive overview of Meta’s Llama 2 series, detailing model sizes, pre‑training data, architectural enhancements, supervised fine‑tuning, RLHF procedures, safety evaluations, reward‑model training, and iterative improvements, highlighting its open‑source release and comparative performance.

AI safetyLarge Language ModelLlama2

0 likes · 27 min read

Llama 2: Open Foundation and Fine‑Tuned Chat Models – Overview and Technical Details

Architect

Feb 16, 2024 · Artificial Intelligence

Can OpenAI’s Sora Redefine Text‑to‑Video Generation? An In‑Depth Technical Review

OpenAI’s newly unveiled Sora model transforms short text prompts into up‑to‑one‑minute high‑definition videos, showcasing advanced diffusion‑Transformer architecture, improved occlusion handling, and detailed visual fidelity, while the article examines its technical breakthroughs, compares it to earlier models, and discusses emerging safety and misuse concerns.

AI safetyDiffusion ModelsGenerative AI

0 likes · 12 min read

Can OpenAI’s Sora Redefine Text‑to‑Video Generation? An In‑Depth Technical Review

DataFunTalk

Feb 6, 2024 · Artificial Intelligence

Overview of Vivo BlueLM Large Model: Evolution, Training Challenges, and Product Deployment

This article presents a comprehensive overview of Vivo's BlueLM large language model, covering its historical evolution, the massive data and algorithmic challenges faced during training, safety and performance optimizations, and how the model has been integrated into various consumer and enterprise products.

AIAI safetyBlueLM

0 likes · 16 min read

Overview of Vivo BlueLM Large Model: Evolution, Training Challenges, and Product Deployment

IT Services Circle

Dec 24, 2023 · Artificial Intelligence

GPT‑4 “Lazy” Behavior: User Reports, Experiments, and Emerging Insights

The article examines growing complaints that GPT‑4 has become increasingly lazy and unpredictable since the November 6 developer update, discusses user‑generated workarounds, presents experimental findings on prompt phrasing and temperature effects, and cites recent academic studies highlighting the need for continuous large‑model monitoring.

AI safetyGPT-4Temperature

0 likes · 6 min read

GPT‑4 “Lazy” Behavior: User Reports, Experiments, and Emerging Insights

php Courses

Nov 3, 2023 · Artificial Intelligence

Elon Musk Calls for Third‑Party Oversight at the First AI Safety Summit in the UK and Highlights the Bletchley Declaration

Elon Musk urged the creation of a third‑party referee to monitor leading AI firms at the inaugural UK AI safety summit, while 28 nations released the Bletchley Declaration to address AI risks, and China promoted its Global AI Governance Initiative.

AI GovernanceAI safetyBletchley Declaration

0 likes · 4 min read

Elon Musk Calls for Third‑Party Oversight at the First AI Safety Summit in the UK and Highlights the Bletchley Declaration

21CTO

Oct 30, 2023 · Artificial Intelligence

Geoffrey Hinton Warns AI Could Take Over Earth Within Five Years – What You Need to Know

Renowned AI pioneer Geoffrey Hinton cautions that rapidly advancing artificial intelligence may surpass human control in as little as five years, highlighting self‑modifying code, the "black‑box" problem, and the urgent need for robust safety regulations.

AI riskAI safetyGeoffrey Hinton

0 likes · 8 min read

Geoffrey Hinton Warns AI Could Take Over Earth Within Five Years – What You Need to Know

IT Services Circle

Oct 16, 2023 · Information Security

Prompt Injection Attacks on GPT‑4V: How Hidden Text in Images Compromise Multimodal Model Security

The article examines how specially crafted images can inject malicious prompts into GPT‑4V, causing it to leak chat history, obey hidden commands, and expose security flaws, while discussing attack techniques, underlying reasons, and proposed mitigation strategies.

AI safetyGPT-4Vimage attacks

0 likes · 9 min read

Prompt Injection Attacks on GPT‑4V: How Hidden Text in Images Compromise Multimodal Model Security

Tencent Tech

Sep 20, 2023 · Artificial Intelligence

Why Do Large Language Models Hallucinate and How to Reduce It?

The article explains why large language models generate hallucinations—due to data errors, training conflicts, and inference uncertainty—and outlines data‑cleaning, model‑level feedback, knowledge augmentation, constraint techniques, and post‑processing methods such as the “Truth‑seeking” algorithm to mitigate the issue.

AI safetyData QualityHallucination

0 likes · 8 min read

Why Do Large Language Models Hallucinate and How to Reduce It?

Programmer DD

Jul 21, 2023 · Artificial Intelligence

Why Did GPT-4’s Performance Plummet Between March and June 2023?

A Stanford‑Berkeley study reveals that between March and June 2023 GPT‑4’s accuracy on prime‑checking fell from 97.6% to 2.4%, code generation quality dropped sharply, and sensitivity handling changed, underscoring the rapid, unpredictable shifts in large language model performance over short periods.

AI safetyGPT-4LLM evaluation

0 likes · 6 min read

Why Did GPT-4’s Performance Plummet Between March and June 2023?

21CTO

Jul 8, 2023 · Artificial Intelligence

What Developers Need to Know About GPT‑4’s New 8K Context and Multimodal Capabilities

OpenAI has opened GPT‑4’s API to all paid users, offering an 8K‑token context window (up to 32K), multimodal image input, enhanced creativity, longer text handling, and upcoming fine‑tuning options, while also outlining phased deprecation of older models and current limitations.

AI safetyAPIGPT-4

0 likes · 10 min read

What Developers Need to Know About GPT‑4’s New 8K Context and Multimodal Capabilities

Liangxu Linux

Jul 2, 2023 · Information Security

How the “Grandma Prompt” Bypasses LLM Safeguards and Generates Windows Keys

The article examines the so‑called “grandma prompt” that tricks ChatGPT, Bing, and other LLMs into revealing Windows activation keys and even adult jokes, explains why such prompt‑injection works, and reviews past similar exploits and their mitigation attempts.

AI safetyChatGPT jailbreakLLM security

0 likes · 7 min read

How the “Grandma Prompt” Bypasses LLM Safeguards and Generates Windows Keys

21CTO

Jun 18, 2023 · Artificial Intelligence

Why Yann LeCun Says ChatGPT Isn’t Even as Smart as a Dog

Meta’s chief AI scientist Yann LeCun argues that large‑language models like ChatGPT are far from human intelligence, lacking real‑world understanding and even falling short of a dog’s cleverness, while experts debate AI’s risks, benefits, and the need for regulation.

AI safetyChatGPTMeta

0 likes · 6 min read

Why Yann LeCun Says ChatGPT Isn’t Even as Smart as a Dog

Smart Era Software Development

May 25, 2023 · Artificial Intelligence

ChatGPT’s New Opportunities for Software Engineering: An In‑Depth Encore Discussion

The panel examines how large language models like ChatGPT reshape software engineering, covering safety risks, model training stages, prompt‑engineering challenges, teaching reforms, and future human‑AI collaboration, while weighing technical trade‑offs and practical solutions.

AI safetyChatGPTHuman-AI Collaboration

0 likes · 26 min read

ChatGPT’s New Opportunities for Software Engineering: An In‑Depth Encore Discussion

21CTO

May 9, 2023 · Artificial Intelligence

Geoffrey Hinton Warns: Why AI Could Outpace Humanity and What It Means

In a candid MIT Technology Review interview, AI pioneer Geoffrey Hinton discusses his departure from Google, the rapid progress of large language models like GPT‑4, the dangers of AI self‑motivation, and why halting AI development is unrealistic yet urgently needed.

AI riskAI safetyBackpropagation

0 likes · 28 min read

Geoffrey Hinton Warns: Why AI Could Outpace Humanity and What It Means

21CTO

May 4, 2023 · Artificial Intelligence

Why AI Pioneer Geoffrey Hinton Quit Google and What It Means for AI Safety

Geoffrey Hinton, the father of deep learning, left Google after a decade, warning that chatbots pose frightening risks, can be misused by malicious actors, and may eventually replace many professions, highlighting urgent concerns about misinformation and the long‑term existential threats of artificial intelligence.

AI safetyChatbotsGeoffrey Hinton

0 likes · 8 min read

Why AI Pioneer Geoffrey Hinton Quit Google and What It Means for AI Safety

Programmer DD

May 4, 2023 · Information Security

How Safe Is ChatGPT‑Generated Code? Researchers Reveal Major Security Flaws

A study by Quebec researchers shows that ChatGPT often produces insecure code across C, C++, Python, and Java, warns that the model rarely flags these issues unless explicitly asked, and highlights ethical inconsistencies in its handling of vulnerable code.

AI safetyChatGPTcode security

0 likes · 5 min read

How Safe Is ChatGPT‑Generated Code? Researchers Reveal Major Security Flaws

21CTO

Apr 20, 2023 · Artificial Intelligence

Elon Musk’s TruthGPT: A New AI Challenger to OpenAI’s ChatGPT

Dissatisfied with OpenAI’s direction, Elon Musk has launched TruthGPT through his new X.AI lab, recruiting top AI talent to build a safer, more transparent large‑language model that could rival ChatGPT and reshape AI governance, funding, and potential applications such as Twitter’s search and advertising.

AI safetyElon MuskOpenAI

0 likes · 8 min read

Elon Musk’s TruthGPT: A New AI Challenger to OpenAI’s ChatGPT

DataFunTalk

Apr 17, 2023 · Artificial Intelligence

Speculation: GPT-5 May Adopt Model‑Based Deep Reinforcement Learning for Unlimited Self‑Improvement

The article argues that the next generation GPT is likely to employ model‑based deep reinforcement learning, turning the model into both a policy and a world model, which could enable rapid, data‑efficient self‑enhancement but also raise serious safety and societal risks.

AI safetyGPT-5deep reinforcement learning

0 likes · 4 min read

Speculation: GPT-5 May Adopt Model‑Based Deep Reinforcement Learning for Unlimited Self‑Improvement

21CTO

Apr 4, 2023 · Artificial Intelligence

Inside the Lex Fridman & Sam Altman Chat: Unveiling GPT‑4, AI Safety, and the Future of AGI

In a nearly two‑and‑a‑half‑hour interview, Lex Fridman and OpenAI CEO Sam Altman explore GPT‑4’s architecture, the role of RLHF, bias challenges, AI safety testing, its impact on programming, and the broader roadmap toward artificial general intelligence and responsible governance.

AI alignmentAI safetyGPT-4

0 likes · 79 min read

Inside the Lex Fridman & Sam Altman Chat: Unveiling GPT‑4, AI Safety, and the Future of AGI

Python Programming Learning Circle

Apr 3, 2023 · Artificial Intelligence

Key Highlights of GPT‑4: Multimodal Capabilities, Benchmark Performance, and Future Implications

GPT‑4, the new multimodal AI model, can process images and text, generate code and natural language, achieve human‑level scores on standardized exams, handle up to 32 K tokens, and demonstrates advanced reasoning, while OpenAI emphasizes its safety improvements and current limitations as a still‑emerging technology.

AI safetyGPT-4Large Language Model

0 likes · 6 min read

Key Highlights of GPT‑4: Multimodal Capabilities, Benchmark Performance, and Future Implications

21CTO

Apr 2, 2023 · Artificial Intelligence

Can GPT‑4 Be Considered Early AGI? Insights from Microsoft’s 155‑Page Study

This article reviews Microsoft’s extensive 155‑page work on early experiments with GPT‑4, exploring how the model approaches artificial general intelligence, its testing methodology, multimodal capabilities, programming and mathematical performance, interaction with tools and humans, limitations, societal impact, and future research directions.

AI safetyArtificial General IntelligenceGPT-4

0 likes · 15 min read

Can GPT‑4 Be Considered Early AGI? Insights from Microsoft’s 155‑Page Study

21CTO

Mar 30, 2023 · Artificial Intelligence

Why Top AI Leaders Are Calling for a 6‑Month Pause on Advanced AI Development

On March 29, Elon Musk, Steve Wozniak, Geoffrey Hinton and over a thousand AI experts signed an open letter urging a six‑month halt to training systems more powerful than GPT‑4, citing profound societal risks and calling for transparent, verifiable pauses and stronger governance.

AI GovernanceAI pauseAI safety

0 likes · 9 min read

Why Top AI Leaders Are Calling for a 6‑Month Pause on Advanced AI Development

Laravel Tech Community

Mar 29, 2023 · Artificial Intelligence

Open Letter Calls for a Six‑Month Pause on Training AI Systems More Powerful Than GPT‑4

A coalition of AI researchers, entrepreneurs and technologists has signed an open letter urging a six‑month halt to training AI models surpassing GPT‑4, citing profound societal risks, while industry leaders debate the impact of rapid AIGC adoption on jobs and safety.

AI ethicsAI safetyAutomation

0 likes · 10 min read

Open Letter Calls for a Six‑Month Pause on Training AI Systems More Powerful Than GPT‑4

Programmer DD

Mar 29, 2023 · Artificial Intelligence

Can GPT‑4 Really Threaten Humanity? Inside Sam Altman’s Candid Chat with Lex Fridman

In a two‑hour interview with Lex Fridman, OpenAI CEO Sam Altman admits AI could one day kill humans, reveals limited insight into GPT‑4’s training, discusses RLHF, data sources, bias, safety challenges, and the evolving non‑profit versus commercial direction of OpenAI.

AGIAI safetyBias

0 likes · 11 min read

Can GPT‑4 Really Threaten Humanity? Inside Sam Altman’s Candid Chat with Lex Fridman

DataFunSummit

Mar 24, 2023 · Artificial Intelligence

OpenAI Launches ChatGPT Plugin System: Features, Examples, and Safety Discussion

OpenAI announced a safety‑focused ChatGPT plugin system that connects the model to third‑party APIs for real‑time information retrieval, knowledge‑base access, and task execution, showcasing first‑party browser and code‑interpreter plugins, third‑party extensions, an open‑source retrieval plugin, and a detailed debate on security implications.

AI safetyChatGPTCode interpreter

0 likes · 9 min read

OpenAI Launches ChatGPT Plugin System: Features, Examples, and Safety Discussion

ITPUB

Mar 22, 2023 · Artificial Intelligence

What Can GPT‑4 Do? Vision, Long Memory, Safer AI and More

OpenAI’s GPT‑4 arrives with multimodal vision, a dramatically longer context window, higher exam scores, Socratic prompting, improved safety, and new partnerships, while still in research mode and subject to bias and code‑trust limitations.

AI safetyGPT-4Large Language Model

0 likes · 7 min read

What Can GPT‑4 Do? Vision, Long Memory, Safer AI and More

21CTO

Mar 20, 2023 · Artificial Intelligence

Sam Altman Warns: Could AI Like GPT‑4 Fuel Massive Misinformation?

In a recent interview, OpenAI CEO Sam Altman cautioned that advanced AI models such as GPT‑4 could spread large‑scale false information and enable harmful cyber attacks, prompting calls for careful regulation while highlighting both the technology’s impressive capabilities and its potential risks.

AI safetyElon MuskGPT-4

0 likes · 4 min read

Sam Altman Warns: Could AI Like GPT‑4 Fuel Massive Misinformation?

Architecture and Beyond

Mar 18, 2023 · Artificial Intelligence

Key Considerations for Deploying AIGC Products: Safety, Capacity, Cost, Legal Compliance, and Bias

The article outlines essential factors for launching AIGC products—including safety, content moderation, capacity planning, cost control, legal and copyright compliance, and model bias—providing practical guidance for technology managers navigating the rapidly evolving AI landscape.

AI safetyAIGCcontent moderation

0 likes · 21 min read

Key Considerations for Deploying AIGC Products: Safety, Capacity, Cost, Legal Compliance, and Bias

Tencent Cloud Developer

Mar 16, 2023 · Artificial Intelligence

What Makes GPT‑4 a Game‑Changer? 10 Expert Insights on Its Capabilities and Impact

This article provides a detailed analysis of GPT‑4, covering its multimodal abilities, performance gains, training innovations, safety improvements, new application scenarios, impact on developers, and future trends in large language models.

AI safetyGPT-4LLM trends

0 likes · 16 min read

What Makes GPT‑4 a Game‑Changer? 10 Expert Insights on Its Capabilities and Impact

21CTO

Mar 15, 2023 · Artificial Intelligence

What Makes OpenAI’s New GPT‑4 a Game‑Changer for Multimodal AI?

OpenAI’s GPT‑4, a multimodal large language model that accepts text and image inputs, powers ChatGPT and Bing, offers improved creativity and problem‑solving while still facing hallucination risks, and is now available via ChatGPT Plus and an open API for developers.

AI safetyGPT-4Large Language Model

0 likes · 5 min read

What Makes OpenAI’s New GPT‑4 a Game‑Changer for Multimodal AI?

DataFunSummit

Feb 12, 2023 · Artificial Intelligence

Claude vs. ChatGPT: Constitutional AI, RLAIF, and the Quest for Safer Large‑Language Models

This article reviews Anthropic's Claude assistant, explains the novel Constitutional AI (RLAIF) approach that replaces costly human‑feedback data with a set of natural‑language principles, compares Claude with ChatGPT across helpfulness and harmlessness, and details the supervision and reinforcement‑learning pipelines, data annotation, and experimental results that demonstrate superior safety performance.

AI safetyClaudeConstitutional AI

0 likes · 21 min read

Claude vs. ChatGPT: Constitutional AI, RLAIF, and the Quest for Safer Large‑Language Models

Xiaohongshu Tech REDtech

Feb 10, 2023 · Artificial Intelligence

Expert Insights on ChatGPT: Technical Challenges, Applications, and Future Directions

In a REDtech live interview, NLP professor Li Lei and Xiaohongshu engineers examined ChatGPT’s strengths—long, topic‑focused replies and few‑shot learning—and its challenges such as hallucinations, safety, lack of real‑time data, model compression, and multimodal AIGC, outlining how the technology could reshape content creation, customer service, and search while requiring careful risk management.

AIAI safetyChatGPT

0 likes · 20 min read

Expert Insights on ChatGPT: Technical Challenges, Applications, and Future Directions

DataFunTalk

Jan 15, 2023 · Artificial Intelligence

Advances in Dialogue Systems: Baidu PLATO Large‑Scale Conversational Models

This article reviews the evolution of dialogue systems from modular task‑oriented designs to end‑to‑end large‑scale models, detailing Baidu's PLATO series, their technical innovations, real‑world deployments, challenges such as inference efficiency and safety, and future research directions in conversational AI.

AI safetyConversational AIDialogue Systems

0 likes · 13 min read

Advances in Dialogue Systems: Baidu PLATO Large‑Scale Conversational Models

Programmer DD

Dec 6, 2022 · Artificial Intelligence

How an Engineer Coaxed ChatGPT into Writing a ‘Humanity‑Destruction’ Plan

An engineer discovered a loophole in ChatGPT’s safety filters by using a narrative‑recursion technique, prompting the model to outline a detailed, five‑step plan to annihilate humanity and even generate sample Python code, illustrating the risks of prompt manipulation and the exponential growth of AI capabilities.

AI safetyChatGPTPython

0 likes · 6 min read

How an Engineer Coaxed ChatGPT into Writing a ‘Humanity‑Destruction’ Plan

OPPO Amber Lab

Sep 7, 2022 · Artificial Intelligence

How the World AI Conference Shaped the Future of Trustworthy AI

The World AI Conference’s Trustworthy AI Forum in Shanghai gathered over 20 global experts, government leaders, and industry representatives to discuss policies, standards, technologies, and applications, unveiling a new AI safety testing platform, a joint laboratory, and a comprehensive 2022 Trustworthy AI Industry Ecosystem Report.

AI safetyIndustry ReportTrustworthy AI

0 likes · 7 min read

How the World AI Conference Shaped the Future of Trustworthy AI

AntTech

Sep 3, 2022 · Artificial Intelligence

Highlights from the 2022 World AI Conference: Graph Computing, Privacy Computing, AI Safety, and New Open Platforms

The 2022 World AI Conference in Shanghai showcased cutting‑edge research on graph computing and privacy computing, announced Ant Group’s new AI safety product “AntJian”, the “YinYu Open Platform” for trusted privacy computing, and the open‑source high‑performance graph database TuGraph, highlighting the push for secure, scalable AI technologies.

AIAI safetyAnt Group

0 likes · 7 min read

Highlights from the 2022 World AI Conference: Graph Computing, Privacy Computing, AI Safety, and New Open Platforms

DataFunSummit

Jul 21, 2022 · Artificial Intelligence

Advances and Challenges in Dialogue Systems: Baidu PLATO and Future Directions

This article reviews the evolution, architectures, challenges, and recent breakthroughs of dialogue systems—especially Baidu's PLATO model—while discussing data‑driven approaches, diversity, safety, interactive learning, and the potential role of virtual environments such as the metaverse in shaping future conversational AI.

AI safetyConversational AIMetaverse

0 likes · 24 min read

Advances and Challenges in Dialogue Systems: Baidu PLATO and Future Directions

AntTech

Jul 18, 2022 · Artificial Intelligence

Trusted AI Research at Ant Group: Advances in Computer Vision, Watermark Defense, Robust Machine Learning, and Explainable NLG

Ant Group’s security labs present a series of cutting‑edge AI research achievements—including hierarchical multi‑granular classification for computer vision, watermark‑vaccine defenses, multi‑modal document understanding, robust and explainable machine learning, and logic‑driven data‑to‑text generation—highlighting their commitment to trustworthy and secure AI applications.

AI safetyData2TextRobust Machine Learning

0 likes · 12 min read

Trusted AI Research at Ant Group: Advances in Computer Vision, Watermark Defense, Robust Machine Learning, and Explainable NLG

DataFunTalk

Jul 12, 2022 · Artificial Intelligence

Applying Computer Vision for Content Safety in Live Streaming: Practices and Future Directions

This presentation details how Huya leverages computer‑vision algorithms to detect and mitigate risky content such as political, pornographic, and violent material in live‑streaming and short‑video platforms, describing system architecture, labeling strategies, algorithmic pipelines, real‑time moderation techniques, and future research directions.

AI safetyLive StreamingRisk Detection

0 likes · 11 min read

Applying Computer Vision for Content Safety in Live Streaming: Practices and Future Directions

DataFunTalk

May 28, 2022 · Artificial Intelligence

Adversarial Examples for Captcha: Techniques, Applications, and Future Directions

This article presents a comprehensive overview of adversarial example research applied to captcha systems, covering the definition and history of adversarial attacks, geometric‑aware generation frameworks, FGSM‑based attack variants, experimental results, trade‑offs between image quality and attack strength, and future work such as AdvGAN integration.

AI safetyFGSMGaN

0 likes · 14 min read

Adversarial Examples for Captcha: Techniques, Applications, and Future Directions

Didi Tech

Apr 20, 2021 · Artificial Intelligence

Few-Shot Learning, Data Augmentation, and Semi‑Supervised Methods for Improving Safety and Governance Models at Didi

To overcome scarce labeled data for safety and governance, Didi combines few‑shot learning with systematic data augmentation, self‑training semi‑supervised labeling, and multi‑task neural architectures, cutting labeling costs and reducing log‑loss by over 20% while boosting ROC‑AUC and PR‑AUC across harassment detection, expense‑complaint, and route‑intercept use cases.

AI safetyData AugmentationDidi

0 likes · 15 min read

Few-Shot Learning, Data Augmentation, and Semi‑Supervised Methods for Improving Safety and Governance Models at Didi

Tencent Tech

Sep 25, 2020 · Artificial Intelligence

What’s Inside Tencent’s AI Security Attack Matrix? A Minefield Guide

Tencent’s AI Security Attack Matrix, the industry’s first AI‑focused risk framework, maps attack tactics, techniques, and processes across the AI lifecycle, offering practical guidance for researchers and developers to identify and mitigate security threats in AI systems.

AI safetyAI securityTencent

0 likes · 5 min read

What’s Inside Tencent’s AI Security Attack Matrix? A Minefield Guide