Which Programming Languages Do Hackers Prefer? Survey and Exploit-DB Analysis

This study surveys members of the Chaos Computer Club and analyzes more than 45,000 Exploit-DB entries to identify the programming languages hackers use most. It describes the data-collection process and the Pygments-based language detection, and it reports trends such as the dominance of Shell and Python and a shift in preferences over time.

Linux Tech Enthusiast

The authors, together with other researchers in the Chaos Computer Club (CCC), conducted a survey in May 2021 to discover which programming languages are most frequently used by hackers. The online questionnaire received responses from 48 CCC members, who reported primarily using Shell (Bash/PowerShell) and Python, with C, JavaScript, and HTML/CSS also mentioned. The survey noted that participants did not consider language choice essential for attacks and that preferences had shifted over time.

To validate the survey findings, the authors compared them with data from Exploit‑DB, a public repository of exploit scripts. At the time of writing, Exploit‑DB contained more than 45,000 exploits contributed by over 9,000 authors. Each entry includes metadata such as exploit ID, author, type, and publication date, and is linked to a file containing the actual script.

For the analysis, the authors cloned a GitHub project that provides the necessary files under an exploits directory. Using an Anaconda‑based Python environment, they created and activated the conda environment with:

conda env create -f environment.yml
conda activate exploits

The Exploit‑DB snapshot was copied from the Kali Linux share directory and transformed with a script:

cp -p /usr/share/exploitdb/files_exploits.csv data/
python execute_transformer.py
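
As a rough illustration of the metadata described above, the copied CSV can be inspected with the standard library before any transformation. This is only a sketch: the helper names `load_exploit_metadata` and `exploits_per_author` are hypothetical, and the `author` column name is an assumption based on the fields listed earlier, so check the header of your own `files_exploits.csv` first.

```python
import csv
from collections import Counter


def load_exploit_metadata(csv_path):
    """Read the Exploit-DB index into a list of dicts, one per exploit."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def exploits_per_author(rows, author_column="author"):
    """Count how many exploits each author contributed.

    The default column name is an assumption, not taken from the article.
    """
    return Counter(row[author_column] for row in rows)
```

For example, `exploits_per_author(load_exploit_metadata("data/files_exploits.csv")).most_common(5)` would list the five most prolific authors in the snapshot.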

The transformation script extracts language information from each exploit file. Although Pygments is primarily a syntax-highlighting library, it offers a guess_lexer_for_filename function that infers the programming language from the file name together with the file's contents. The authors also evaluated the deep-learning based Guesslang library but found it slower and less accurate, so they retained Pygments. The core detection function is shown below:

import pygments
from pygments.lexers import guess_lexer_for_filename

def _parse_exploit_file(file_name):
    with open(file_name, encoding="UTF-8") as file:
        lines = file.readlines()
    text = "\n".join(lines)
    line_count = len(lines)
    try:
        lang_guessed = guess_lexer_for_filename(file_name, text).name
    except pygments.util.ClassNotFound:
        lang_guessed = None
    return line_count, lang_guessed
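
The function above opens every file as UTF-8, so a file in another encoding would raise a UnicodeDecodeError. The article does not say how the authors handle that case. The sketch below shows one possible way to run a detector over a whole directory while tolerating such files. The name `scan_exploits` is hypothetical, and recording undecodable files as unclassified is an assumption, not the authors' documented behavior.

```python
import os


def scan_exploits(root_dir, detect):
    """Apply `detect` to every file under root_dir.

    `detect` is any callable that takes a file path and returns
    (line_count, language), such as the detection function above.
    """
    results = {}
    for folder, _subdirs, file_names in os.walk(root_dir):
        for name in sorted(file_names):
            path = os.path.join(folder, name)
            try:
                results[path] = detect(path)
            except UnicodeDecodeError:
                # Assumption: keep undecodable files, but leave them unclassified.
                results[path] = (None, None)
    return results
```

Passing the detector in as an argument keeps the directory walk independent of Pygments, so the same loop could be reused with a different detection library.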

Using this pipeline, the authors compared the top ten languages reported by CCC members with those detected in the files of Exploit-DB authors. Both datasets highlighted Shell and Python, but the Exploit-DB data was heavily skewed toward "Text only" entries. These often contain shell commands or mixed-language scripts, so some languages are probably under-represented. The sample sizes also differed markedly: 48 survey respondents versus more than 900 authors who contributed over 2,500 files in 2020/21. This yielded 1,134 distinct language references in Exploit-DB and 140 mentions in the CCC survey.
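
One plausible reading of "distinct language references" is that each author-language pair is counted once, no matter how many files that author wrote in the language. The article does not spell this out, so the sketch below is an assumption about the counting rule, and `language_references` and `top_languages` are hypothetical names.

```python
from collections import Counter


def language_references(records):
    """Count each language once per author.

    `records` is an iterable of (author, language) pairs, one per exploit file.
    A language of None (no lexer found) is reported as "Unknown".
    """
    distinct_pairs = {(author, lang or "Unknown") for author, lang in records}
    return Counter(lang for _author, lang in distinct_pairs)


def top_languages(records, n=10):
    """Return the n languages referenced by the most authors."""
    counts = language_references(records)
    # Sort by count (descending), then by name, so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]
```

Counting per author rather than per file keeps one prolific contributor from dominating the ranking, which also makes the result easier to compare with a survey where each respondent names a language only once.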

Extending the analysis to the entire Exploit-DB history (25 years) showed that the most frequent languages are Text, Python, C, HTML, and Perl. Notably, Perl ranks high historically even though it is absent from the CCC top-ten list. The authors also observed a clear shift from C to Python over the years, which is consistent with the CCC participants' remark that preferences have shifted over time and reflects Python's broader popularity.
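
A shift such as the one from C to Python can be made visible by computing each language's share of the exploits published per year. The following is a minimal sketch, not the authors' analysis code: `yearly_language_share` is a hypothetical helper, and it assumes publication dates are ISO-style strings that begin with a four-digit year.

```python
from collections import Counter, defaultdict


def yearly_language_share(records):
    """Return {year: {language: share}} from (date, language) pairs.

    Dates are assumed to start with a four-digit year, e.g. "2020-05-17".
    """
    per_year = defaultdict(Counter)
    for date, lang in records:
        year = int(date[:4])
        per_year[year][lang or "Unknown"] += 1

    shares = {}
    for year, counts in per_year.items():
        total = sum(counts.values())
        shares[year] = {lang: count / total for lang, count in counts.items()}
    return shares
```

Using shares instead of raw counts compensates for the fact that the number of published exploits varies from year to year.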

In conclusion, the comparative study demonstrates substantial overlap between the languages used by CCC members and Exploit‑DB authors, confirming Python's prominence in the security field and indicating that language preferences evolve with technological trends. The primary limitation stems from the language‑detection approach, which struggles with multi‑language files; addressing this limitation is suggested for future research.
