SkillAttack Reveals 6,500+ Attack Paths – Community‑Built SkillAtlas Secures Agent Skills
SkillAttack automates red‑team testing of LLM‑driven Agent Skills, exposing real attack paths across dozens of models. The community‑curated SkillAtlas now hosts over 6,500 publicly searchable traces covering 233 Skills and 18 major model families, and invites researchers and developers to contribute.
SkillAttack is an automated red‑team testing framework that requires no modification of existing Skills or the underlying Agent platform. Its core operation consists of three stages: (1) a systematic vulnerability‑surface analysis of each Skill, (2) parallel generation of concrete attack paths for every identified surface, and (3) execution of those paths in a sandbox with iterative refinement of the attack strategy until the path is fully reproduced.
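The three stages above can be sketched as a simple loop. This is an illustrative sketch only: every name in it (`analyze_surfaces`, `generate_path`, `Sandbox`, the refinement budget) is a hypothetical stand‑in, not SkillAttack's actual API.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_REFINEMENTS = 5  # assumed budget for iterative refinement per path


def analyze_surfaces(skill):
    """Stage 1: enumerate a Skill's vulnerability surfaces (stubbed here)."""
    return skill.get("surfaces", [])


def generate_path(surface):
    """Stage 2: turn one surface into a concrete attack path (stubbed here)."""
    return {"surface": surface, "steps": [f"probe:{surface}"]}


class Sandbox:
    """Stage 3: execute a path in isolation and report whether it reproduced."""

    def execute(self, path):
        # A real sandbox would drive the agent; here we simulate success.
        return {"reproduced": True, "trace": list(path["steps"])}


def red_team(skill):
    surfaces = analyze_surfaces(skill)              # stage 1: surface analysis
    with ThreadPoolExecutor() as pool:              # stage 2: parallel path generation
        paths = list(pool.map(generate_path, surfaces))
    sandbox, verified = Sandbox(), []
    for path in paths:                              # stage 3: execute and refine
        for _ in range(MAX_REFINEMENTS):
            result = sandbox.execute(path)
            if result["reproduced"]:
                verified.append({"path": path, "trace": result["trace"]})
                break
            path["steps"].append("refine")          # adjust strategy, retry
    return verified


found = red_team({"name": "demo-skill",
                  "surfaces": ["prompt-injection", "file-write"]})
```

The key design point the article emphasizes is in stage 3: a path only counts once the sandbox fully reproduces it, so the output is a set of verified trajectories rather than theoretical findings.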
The framework focuses on whether a discovered path can actually be executed, rather than merely being theoretically possible. This distinguishes SkillAttack from prior static analyses or single‑run tests, moving Skill security from speculative risk to verifiable, reproducible attack trajectories.
To evaluate the approach, the authors tested SkillAttack on ten mainstream large‑model agents—including GPT‑5.4, Claude Sonnet, Gemini 3.0 Pro, Kimi‑k2.5, Qwen 3.5‑Plus, and GLM‑5—using both adversarially crafted Skill scenarios and 100 of the most popular real Skills scraped from ClawHub. In both settings, SkillAttack discovered significantly more successful attack paths than existing methods, demonstrating that Skill‑related security issues are systemic rather than isolated incidents.
The research team from the National Laboratory of Intelligent Algorithm Security also launched SkillAtlas, a community‑maintained Attack Trace Library. SkillAtlas aggregates three sources of knowledge: publicly disclosed security cases, automatically discovered attack trajectories from SkillAttack, and community‑submitted real‑world risk reports. To date it contains more than 6,500 public attack traces, covering 233 distinct Skills across 18 major model families and eight risk categories (data theft, malicious code execution, backdoor implantation, phishing, poisoning, etc.). Each trace can be filtered by attack outcome, risk type, and severity level.
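A trace library with faceted filtering like this can be modeled minimally as follows; the field names and values are assumptions for illustration, not SkillAtlas's real schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackTrace:
    """One entry in a hypothetical attack-trace library."""
    skill: str
    model_family: str
    risk_type: str     # e.g. "data theft", "phishing", "poisoning"
    severity: str      # assumed scale: "low" | "medium" | "high"
    outcome: str       # assumed values: "reproduced" | "blocked"


def filter_traces(traces, *, outcome=None, risk_type=None, severity=None):
    """Filter by the three facets the article mentions: outcome, risk type, severity."""
    return [t for t in traces
            if (outcome is None or t.outcome == outcome)
            and (risk_type is None or t.risk_type == risk_type)
            and (severity is None or t.severity == severity)]


library = [
    AttackTrace("pdf-summarizer", "GPT", "data theft", "high", "reproduced"),
    AttackTrace("code-runner", "Claude", "malicious code execution", "high", "blocked"),
]
high_repro = filter_traces(library, outcome="reproduced", severity="high")
```

Passing `None` for a facet leaves it unconstrained, so the same function serves all combinations of the three filters.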
The authors argue that every newly discovered attack path should become a shared defensive asset rather than private knowledge. By making these paths publicly searchable, the community can avoid rediscovering the same vulnerabilities and collectively raise the security baseline of the entire Skill ecosystem.
SkillAtlas invites contributions from three groups: security researchers (to submit new attack paths and reproducibility records), developers (to report anomalous behaviors observed in their Skills), and platform providers (to help consolidate dispersed risk experiences into a systematic public resource). The platform’s website (https://skillatlas.top) and its GitHub repository (https://github.com/Zhow01/SkillAttack) provide access to the library and the testing framework, while the underlying research paper is available at https://arxiv.org/pdf/2604.04989.
In summary, SkillAttack demonstrates that Agent Skills introduce substantial, verifiable security risks across a wide range of LLM agents, and SkillAtlas offers a scalable, community‑driven mechanism to catalog, share, and mitigate those risks.
Machine Learning Algorithms & Natural Language Processing
