What Programming Languages Do Hackers Prefer? Survey and Exploit-DB Analysis

A 2021 survey of Chaos Computer Club (CCC) members and a large‑scale analysis of Exploit‑DB both show that hackers predominantly use Shell scripts and Python, and the two data sets overlap notably. Over time, language preferences shift toward Python and away from C. The analysis also highlights the limits of automatic language detection, which remain a direction for future work.

Linux Tech Enthusiast

Survey Overview

In May 2021 the Chaos Computer Club (CCC) sent an online questionnaire to its members. Forty‑eight respondents indicated the programming languages they used for hacking. The most frequently mentioned were Shell (Bash/PowerShell) and Python, followed by C, JavaScript, and HTML/CSS. Participants noted that language choice is not essential for attacks and that their preferences evolve over time.

Exploit‑DB Background

Exploit‑DB is a public archive of exploit scripts and vulnerable software, primarily accessed by penetration testers and security researchers via its website or Kali Linux tools. At the time of writing the database contained over 45,000 exploits contributed by more than 9,000 authors. Each entry includes metadata (ID, author, type, date) and a file with the actual exploit code.
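To make the metadata concrete, the following minimal sketch counts exploits and distinct authors in a CSV export. The column names and the sample rows are assumptions chosen for illustration; the real files_exploits.csv may use different headers, so adjust the keys accordingly.

```python
import csv
import io

# Hypothetical sample rows; the column names are assumed, not taken from the real file.
SAMPLE_CSV = """id,file,author,type,date
1,exploits/linux/local/1.c,alice,local,2003-04-01
2,exploits/php/webapps/2.py,bob,webapps,2015-06-12
3,exploits/windows/remote/3.py,alice,remote,2020-11-30
"""


def summarize(csv_text):
    """Return (number of exploits, number of distinct authors)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    authors = {row["author"] for row in rows}
    return len(rows), len(authors)


print(summarize(SAMPLE_CSV))
```

For a file on disk, the same function could be fed the text returned by reading data/files_exploits.csv.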

Setup and Data Transformation

To reproduce the analysis, clone the GitHub project that contains the analysis scripts. The environment is based on Anaconda Python and can be created with:

conda env create -f environment.yml
conda activate exploits

The CSV snapshot of Exploit‑DB (files_exploits.csv) is copied into the data/ directory and transformed with execute_transformer.py:

cp -p /usr/share/exploitdb/files_exploits.csv data/
python execute_transformer.py

The transformer extracts the programming language of each exploit file using the Pygments library, which provides a guess_lexer_for_filename function. An alternative deep‑learning detector (Guesslang) was evaluated but proved slower and did not yield superior results, so Pygments was retained.

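from pygments.lexers import guess_lexer_for_filename
from pygments.util import ClassNotFound

def _parse_exploit_file(file_name):
    """Return (line_count, guessed_language) for one exploit file."""
    with open(file_name, encoding="UTF-8") as file:
        lines = file.readlines()
    # readlines() keeps the line endings, so join without a separator.
    text = "".join(lines)
    line_count = len(lines)
    try:
        # Pygments narrows the candidates by file name, then inspects the text.
        lang_guessed = guess_lexer_for_filename(file_name, text).name
    except ClassNotFound:
        # No lexer matches this file name.
        lang_guessed = None
    return line_count, lang_guessed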

This function reads a file, counts its lines, and returns the line count together with the language that Pygments guesses from the file name and the file content, or None when no lexer matches.
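Applied to every file in the archive, the per-file result can be aggregated into language counts. The sketch below is one possible way to do that and is not taken from the original transformer: the function name tally_languages, the "unknown" and "unreadable" buckets, and the stand-in detector fake_parse are all assumptions. The detector is passed in as a parameter so the sketch runs without Pygments installed.

```python
from collections import Counter


def tally_languages(file_names, parse_file):
    """Count detected languages over many files.

    parse_file is any callable returning (line_count, language_or_None),
    for example the _parse_exploit_file function shown above.
    """
    counts = Counter()
    for file_name in file_names:
        try:
            _, language = parse_file(file_name)
        except (OSError, UnicodeDecodeError):
            # Assumption of this sketch: unreadable or non-UTF-8 files are
            # counted in their own bucket instead of aborting the run.
            language = "unreadable"
        counts[language or "unknown"] += 1
    return counts


def fake_parse(file_name):
    """Hypothetical stand-in detector that only looks at the extension."""
    if file_name.endswith(".bin"):
        raise UnicodeDecodeError("utf-8", b"\xff", 0, 1, "invalid start byte")
    extension = file_name.rsplit(".", 1)[-1]
    return 1, {"py": "Python", "c": "C"}.get(extension)


result = tally_languages(["a.py", "b.py", "c.c", "d.xyz", "e.bin"], fake_parse)
print(result)
```

Catching UnicodeDecodeError matters here because opening a file with a fixed UTF-8 encoding raises that error on binary or differently encoded content.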

Result Discussion

Figure 2 (not shown) compares the top ten languages reported by CCC members with those detected in Exploit‑DB. The CCC sample (48 participants) contributed 140 language mentions, while the Exploit‑DB snapshot yielded more than 1,134 distinct language references (900+ authors, 2,500+ files). The two datasets overlap substantially: Shell and Python appear at the top, followed by C, JavaScript, and HTML/CSS. Approximately 60 % of the languages appear in both lists, and Python ranks second in both.
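An overlap figure of this kind can be computed as the share of one ranking's entries that also occur in the other. The sketch below shows the arithmetic only. The two short lists are placeholders for illustration and are not the survey or Exploit‑DB rankings, and the function name overlap_share is hypothetical.

```python
def overlap_share(list_a, list_b):
    """Fraction of entries in list_a that also occur in list_b (case-insensitive)."""
    set_a = {name.lower() for name in list_a}
    set_b = {name.lower() for name in list_b}
    return len(set_a & set_b) / len(set_a)


# Placeholder rankings for illustration only; not the study's data.
survey_top = ["Shell", "Python", "C", "JavaScript"]
archive_top = ["Text", "Python", "C", "Perl"]

share = overlap_share(survey_top, archive_top)
print(f"{share:.0%}")
```

Normalizing names before comparing is a design choice of this sketch; in practice the two sources may label the same language differently (for example "Bash" versus "Shell"), which would need a mapping table.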

Exploit‑DB shows a pronounced imbalance: more than half of the entries are classified as “Text only” by Pygments. These files often contain descriptive text together with embedded shell commands or scripts, causing under‑representation of certain languages, especially Shell scripts, which rank first in the CCC survey.
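One conceivable way to spot such mixed files is to measure how many lines of a text file look like shell commands. The heuristic below is a naive sketch and not part of the original analysis: the pattern list, the function name shell_line_share, and the sample text are assumptions, and real advisories would need a far more careful rule set.

```python
import re

# Assumed indicators of shell content at the start of a line.
SHELL_HINTS = re.compile(r"^\s*(?:\$ |#!|sudo |curl |wget |chmod )")


def shell_line_share(text):
    """Return the fraction of non-empty lines that look like shell commands."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return 0.0
    hits = sum(1 for line in lines if SHELL_HINTS.match(line))
    return hits / len(lines)


# Made-up sample resembling a prose advisory with embedded commands.
sample = """Title: Example advisory
The service accepts unauthenticated requests.

$ curl http://target.example/admin
$ id
"""

print(shell_line_share(sample))
```

A file classified as text but with a high share could then be reviewed or recounted as Shell, at the cost of false positives on prose that merely quotes commands.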

Historical Perspective

Figure 3 (not shown) shows the top ten languages across the entire Exploit‑DB history, again based on Pygments detection. “Text” remains the most common, followed by Python, C, HTML, and Perl. Prolog appears unexpectedly, most likely because Perl and Prolog files share the “.pl” extension and some Perl scripts are misclassified as Prolog.

Figure 4 (not shown) visualizes the percentage share of each language over the past 25 years. The share of text files stays stable, while the share of C declines and Python rises sharply. This mirrors the CCC participants’ observation that language preferences evolve, with many attributing the rise of Python to its broader popularity rather than any intrinsic security advantage.
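The per-year percentages behind a chart like this can be derived from (year, language) pairs. The following is a minimal sketch of that calculation. The records are made up for illustration and the function name yearly_shares is hypothetical; the original project may compute the shares differently.

```python
from collections import Counter, defaultdict


def yearly_shares(records):
    """Map year -> {language: percentage of that year's files}."""
    per_year = defaultdict(Counter)
    for year, language in records:
        per_year[year][language] += 1
    shares = {}
    for year, counts in sorted(per_year.items()):
        total = sum(counts.values())
        shares[year] = {lang: 100 * n / total for lang, n in counts.items()}
    return shares


# Made-up records for illustration only; not Exploit-DB data.
records = [
    (2000, "C"), (2000, "C"), (2000, "Text"), (2000, "Python"),
    (2020, "Python"), (2020, "Python"), (2020, "Text"), (2020, "C"),
]

shares = yearly_shares(records)
print(shares)
```

Working with shares per year rather than raw counts keeps years with very different numbers of submissions comparable.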

Conclusion

The comparison confirms substantial overlap between the languages used by CCC members and those found in Exploit‑DB, reinforcing Python’s prominence in security research. Both sources indicate that language preferences shift over time, driven by broader technological trends. The study’s main limitation is the reliance on Pygments for language detection, which struggles with multi‑language files and may under‑count certain scripts. Addressing this limitation is a promising direction for future work. Overall, the analysis demonstrates that the Exploit‑DB dataset offers a rich resource for data‑driven security research.
