
Security Analysis of Code Execution Sandboxes in AI Applications

This report investigates the security of code‑execution sandboxes used by various AI applications, evaluates their isolation mechanisms, presents detailed test results for multiple platforms, and offers recommendations for selecting and hardening sandbox solutions in the era of large language models.


With the rapid development of large language models, many AI applications now allow users to execute code, greatly expanding their use cases but also introducing significant security risks if the execution environment is not properly isolated.

The Ant Financial Security Lab examined several mainstream AI services, focusing on the security of their code‑execution sandboxes. The study includes a background overview, a description of test objects, detailed testing methods, and comprehensive results for each application.

Background: AI code execution typically runs in a sandbox; OpenAI's code interpreter, for example, uses gVisor. Inadequate sandbox isolation can allow malicious code to compromise backend services and the entire cluster.

Test Objects: The analysis covered ChatGPT 4 and five other AI tools (Applications A‑E), each representing a different sandbox implementation.

Testing Criteria (01‑07): ability to execute arbitrary commands, privilege level of execution, external network access, east‑west isolation, leakage of sensitive information, arbitrary file upload, and the underlying sandbox technology.
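Criteria like these can be probed from inside a sandbox with short snippets. A minimal sketch in Python, assuming the application will run an arbitrary script; the specific checks, hosts, and keyword list are illustrative, not the lab's actual test suite:

```python
# Hypothetical in-sandbox probes for criteria 01, 02, 03, and 05.
import getpass
import os
import socket
import subprocess

def probe_privilege():
    """Criterion 02: report the user the code runs as (uid 0 == root)."""
    return {"user": getpass.getuser(), "uid": os.getuid()}

def probe_network(host="example.com", port=80, timeout=3):
    """Criterion 03: test whether an outbound TCP connection succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def probe_commands():
    """Criterion 01: check whether shell commands execute at all."""
    try:
        out = subprocess.run(["id"], capture_output=True, text=True, timeout=5)
        return out.stdout.strip()
    except (OSError, subprocess.SubprocessError):
        return None

def probe_environment():
    """Criterion 05: look for potentially sensitive environment variables."""
    keywords = ("KEY", "TOKEN", "SECRET", "AWS")
    return [k for k in os.environ if any(w in k.upper() for w in keywords)]
```

East‑west isolation (criterion 04) requires two concurrent sessions and is not sketched here.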

Key Findings:

ChatGPT 4 executes commands in a gVisor container, runs as a non‑privileged user, lacks external network access, provides east‑west isolation, but leaks some sensitive information and allows arbitrary file upload.
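Findings like "executes in a gVisor container" are typically established by fingerprinting the runtime from inside. A hedged sketch, assuming a Python environment that can spawn shell commands; the "gVisor" marker in the kernel ring buffer is a commonly reported tell, not a guarantee across gVisor versions or configurations:

```python
import subprocess

def looks_like_gvisor() -> bool:
    """Heuristic: gVisor's user-space kernel often identifies itself in
    dmesg output (an assumption based on public write-ups)."""
    try:
        dmesg = subprocess.run(
            ["dmesg"], capture_output=True, text=True, timeout=5
        ).stdout
    except (OSError, subprocess.SubprocessError):
        dmesg = ""
    return "gVisor" in dmesg
```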

Application A uses Pyodide (WebAssembly‑based Python) with partial external network access, runs as a non‑privileged user, and leaks Python version information.
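The version leakage noted for Application A is observable with a trivial snippet: any executed code can read the interpreter's build string. The `sys.platform == "emscripten"` check for detecting a WebAssembly build is an assumption based on the CPython Emscripten port that Pyodide ships:

```python
import sys

def interpreter_fingerprint() -> dict:
    """Collect details any executed snippet can read; under Pyodide this
    exposes the exact Python build (the 'version leak' above)."""
    return {
        "version": sys.version,                # full build string
        "wasm": sys.platform == "emscripten",  # True under Pyodide (assumption)
    }
```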

Application B employs Omegajail (process‑level sandbox) and permits arbitrary command execution, external network access, and file upload.

Application C runs on Deno (JavaScript/TypeScript runtime) with privileged execution, partial external network access, and file upload capability.

Application D utilizes AWS Lambda backed by Firecracker micro‑VMs, allowing arbitrary commands, external network access, file upload, and leaking internal source code.

Application E uses the e2b sandbox service (Firecracker‑based) with similar capabilities to Application D.

Sandbox Technology Overview: The report describes language‑level sandboxes (Pyodide, Deno), process‑level sandboxes (Omegajail), and system‑level sandboxes (Firecracker, gVisor), highlighting their architectures, isolation guarantees, and trade‑offs.

Selection Recommendations: For simple workloads and limited resources, language‑level sandboxes like Pyodide are recommended; for complex, high‑security requirements, system‑level sandboxes such as gVisor or Firecracker are preferable, possibly via commercial services (e2b, Modal) to reduce operational overhead.

Hardening Guidelines: Adopt minimal runtime environments, run as non‑privileged users, enforce strict security policies (file and network restrictions), and implement comprehensive logging and audit mechanisms.
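Three of these guidelines can be sketched in a small Python runner: drop privileges, cap resources, and strip the environment before executing untrusted code. A minimal sketch assuming a Linux host; the uid 65534 ("nobody") and the specific limits are illustrative choices, and a real deployment would layer this under a system‑level sandbox rather than rely on it alone:

```python
import os
import resource
import subprocess
import sys

def run_untrusted(code: str, timeout: int = 5) -> str:
    """Run a snippet with reduced privileges, resource caps, and an
    empty environment (so no secrets leak into the child)."""
    def limit_resources():
        # Cap CPU seconds and address space for the child process.
        resource.setrlimit(resource.RLIMIT_CPU, (timeout, timeout))
        resource.setrlimit(resource.RLIMIT_AS, (512 * 2**20, 512 * 2**20))
        # Drop to an unprivileged uid if we happen to be root
        # (65534 is conventionally "nobody"; an assumption here).
        if os.getuid() == 0:
            os.setgid(65534)
            os.setuid(65534)

    proc = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, no user site
        capture_output=True, text=True, timeout=timeout,
        env={},                              # empty environment: no leakage
        preexec_fn=limit_resources,
    )
    return proc.stdout
```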

Conclusion: Secure sandboxing remains a critical, evolving challenge for AI applications. Proper sandbox selection and configuration, combined with layered defenses, are essential to mitigate code‑execution risks in the era of large language models.

References:

[1] gVisor – https://gvisor.dev/

[2] ChatGPT 4 – https://openai.com/gpt-4

[3] e2b – https://e2b.dev/

[4] Liu, T. et al., “Demystifying RCE Vulnerabilities in LLM‑Integrated Apps,” arXiv:2309.02926, 2023.

[5] Wiz Security Research, “Wiz and Hugging Face address risks to AI infrastructure,” 2024.

Tags: Sandbox, AI security, Firecracker, code execution, Deno, gVisor, Pyodide
Written by

AntTech

Technology is the core driver of Ant's future creation.
