How Safe Is ChatGPT‑Generated Code? Researchers Reveal Major Security Flaws
A study by Quebec researchers shows that ChatGPT often produces insecure code across C, C++, Python, and Java, warns that the model rarely flags these issues unless explicitly asked, and highlights ethical inconsistencies in its handling of vulnerable code.
ChatGPT, the large‑language‑model chatbot released by OpenAI, can generate code, but researchers at the University of Quebec discovered that the code it produces frequently contains serious security problems and the model does not proactively inform users of these risks.
The team published an arXiv paper titled "How Secure is Code Generated by ChatGPT?", reporting that many of the generated snippets fail to meet basic security standards and that ChatGPT acknowledges problems only when directly questioned.
In their experiment, the researchers asked ChatGPT to write 21 programs and scripts in C, C++, Python, and Java, each designed to exhibit a specific vulnerability such as memory corruption, denial of service, deserialization flaws, or weak cryptographic implementations. Only five of the 21 initial outputs were secure; after the researchers prompted the model to correct its mistakes, it produced seven more secure versions, though each fix was limited to the particular flaw being evaluated.
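To illustrate one of the vulnerability classes the study tested, here is a hypothetical sketch (not taken from the paper) of a deserialization flaw in Python: unpickling attacker-controlled bytes can execute arbitrary code, whereas parsing the same data as JSON yields only plain data types. The function names and the profile format are invented for this example.

```python
import json
import pickle


def load_profile_unsafe(blob: bytes):
    # DANGEROUS: pickle reconstructs objects by running their __reduce__
    # payloads, so an attacker-controlled blob can execute arbitrary code.
    return pickle.loads(blob)


def load_profile_safe(blob: bytes) -> dict:
    # Safer: JSON can only produce plain data (dicts, lists, strings,
    # numbers), never executable objects.
    data = json.loads(blob.decode("utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


profile = load_profile_safe(b'{"name": "alice", "role": "user"}')
print(profile["name"])  # alice
```

This is the kind of distinction the study probes: a generated snippet that reaches for `pickle` on untrusted input compiles and runs fine, yet is exploitable by design.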
The study notes that ChatGPT often overlooks the attacker’s execution model and repeatedly suggests unrealistic mitigations like “avoid invalid input,” which are not feasible in real‑world scenarios. Nevertheless, when pressed, the model can recognize and admit critical vulnerabilities in its suggestions.
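A small hypothetical sketch (again, not from the paper) shows why "avoid invalid input" is not a mitigation: the attacker controls the input, so the program itself must validate it. The buffer size and function names here are invented for illustration.

```python
MAX_LEN = 64


def set_username_naive(buf: list, name: str) -> None:
    # Mirrors the unrealistic advice: silently trusts that callers
    # only ever send valid input. Oversized input raises IndexError
    # here, and would corrupt memory in an equivalent C program.
    for i, ch in enumerate(name):
        buf[i] = ch


def set_username_checked(buf: list, name: str) -> None:
    # Realistic mitigation: the program rejects oversized input itself
    # instead of assuming the attacker will "avoid" sending it.
    if len(name) > len(buf):
        raise ValueError(f"name exceeds {len(buf)} characters")
    for i, ch in enumerate(name):
        buf[i] = ch


buf = [""] * MAX_LEN
set_username_checked(buf, "alice")
print("".join(buf))  # alice
```

The checked version encodes the attacker's execution model directly in the code, which is precisely what the researchers found ChatGPT's initial suggestions tended to omit.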
Researchers also point out an ethical inconsistency: while ChatGPT refuses to generate overtly malicious code, it readily produces code with vulnerabilities and may even offer advice on how to make it safer, yet claims it cannot generate a safer version itself.
Finally, the authors observe that effective prompts for fixing vulnerabilities require prior knowledge of the specific flaw and coding techniques, meaning that using ChatGPT to remediate security issues may not provide additional value beyond what developers already know.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"