How to Build a GitHub Code Leak Detector with Python – Real‑World Security Monitoring

This tutorial walks you through creating a Python‑based GitHub monitoring tool that logs in, crawls code search results for sensitive keywords, extracts repository details, writes findings to CSV, and sends email alerts, providing a practical approach to detecting accidental source‑code leaks.

Programmer DD

0x00 Background

GitHub is a well-known target for both security teams and attackers, because many developers unintentionally publish code there that exposes their employer to security risks.

For example, code may contain sensitive information such as usernames, passwords, database credentials, internal IP addresses, or even personal details, so monitoring GitHub for leaked information is necessary. The author found the existing open‑source tools unsuitable and built a custom one.

0x01 Roll Up Your Sleeves

Life is short, I use Python!

Python’s powerful libraries, concise syntax and rapid development make it ideal for this project.

Principle and Steps

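This tool does not rely on a search API. Instead, it uses a web crawler against GitHub's search pages: it fetches each results page, parses the HTML, and extracts the needed information.

The tool has four parts: logging in to GitHub, querying the keyword results, sending the email alert, and reading the configuration file.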

Development Environment and Python Libraries

Environment: macOS 10.12.6, Python 3.6.5

Libraries: requests, lxml, csv, tqdm, email, smtplib, configparser, time

0x02 Step Analysis

1. Login to GitHub

Login requires a POST request to https://github.com/session with parameters including authenticity_token, login and password. The token is extracted from the login page using XPath.

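import requests
from lxml import etree


def login_github(username, password):
    """Log in to GitHub and return the session, or None on failure."""
    login_url = 'https://github.com/login'
    session_url = 'https://github.com/session'
    try:
        s = requests.session()
        # Fetch the login page to read the token embedded in the form
        resp = s.get(login_url).text
        dom_tree = etree.HTML(resp)
        # xpath() returns a list; the token is its first element
        key = dom_tree.xpath('//input[@name="authenticity_token"]/@value')[0]
        user_data = {
            'commit': 'Sign in',
            'utf8': '✓',
            'authenticity_token': key,
            'login': username,
            'password': password
        }
        s.post(session_url, data=user_data)
        # Request a page that is only available to a signed-in user
        s.get('https://github.com/settings/profile')
        return s
    except (requests.RequestException, IndexError):
        print('Exception, check network and credentials')
        return None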
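
The login function requests the settings page but does not inspect the response, so a failed login goes unnoticed. A minimal sketch of such a check follows. It assumes, which the article does not state, that GitHub redirects an unauthenticated request for that page to a URL whose path starts with /login. The helper name looks_logged_in is hypothetical.

```python
from urllib.parse import urlparse


def looks_logged_in(status_code, final_url):
    """Guess whether a request for an authenticated page succeeded.

    Assumption: an unauthenticated request ends up on a /login URL.
    """
    path = urlparse(final_url).path
    return status_code == 200 and not path.startswith('/login')
```

With requests, the two arguments would come from the response to the settings-page request: resp.status_code and resp.url (the URL after any redirects).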

2. Query Keyword and Render Results

After login, construct a search URL like https://github.com/search?p={page}&q={keyword}&type=Code, fetch the page, and parse repository URLs, usernames, upload times, and filenames using XPath.
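
Building the search URL by hand breaks as soon as the keyword contains spaces or other reserved characters. The sketch below shows one way to generate the per-page URLs with the standard library. The function name build_search_urls and the pages parameter are illustrative and do not come from the original tool.

```python
from urllib.parse import urlencode

SEARCH_URL = 'https://github.com/search'


def build_search_urls(keyword, pages):
    """Return one code-search URL per results page, starting at page 1."""
    urls = []
    for page in range(1, pages + 1):
        # urlencode escapes spaces and reserved characters in the keyword
        query = urlencode({'p': page, 'q': keyword, 'type': 'Code'})
        urls.append(f'{SEARCH_URL}?{query}')
    return urls
```

Each URL can then be fetched with the logged-in session and the response text handed to the XPath parsing step.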

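# Example snippet: extract the fields of each hit from the parsed results page
# href of the second link in each result header
Urls = dom_tree_code.xpath('//div[@class="d-inline-block col-10"]/a[2]/@href')
# Account names
users = dom_tree_code.xpath('//a[@class="text-bold"]/text()')
# Upload times as displayed on the page
datetime = dom_tree_code.xpath('//relative-time/text()')
# Display text of the same link, used as the filename
filename = dom_tree_code.xpath('//div[@class="d-inline-block col-10"]/a[2]/text()')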
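
The tool also writes its findings to CSV, but the article does not show that step. A minimal sketch, assuming the four lists from the XPath queries line up one entry per hit, might look like this. The function name write_findings and the column headers are assumptions.

```python
import csv


def write_findings(out, urls, users, dates, filenames):
    """Write one CSV row per search hit to the open text file `out`."""
    writer = csv.writer(out)
    writer.writerow(['url', 'user', 'uploaded', 'filename'])
    # zip() stops at the shortest list
    for row in zip(urls, users, dates, filenames):
        writer.writerow(row)
```

When writing to a real file, open it with newline='' as the csv module documentation recommends, for example open('findings.csv', 'w', newline='', encoding='utf-8'), where the file name is again hypothetical.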

3. Email Alert

The tool sends an email with the list of leaked repositories. The email body includes the matched payload, URL, and code snippet.

def send_warning(host, username, password, sender, receivers, content):
    # Build MIME message and send via SMTP
    ...
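
The body of send_warning is elided in the article. The sketch below is one possible shape for it, not the author's implementation. It keeps the same signature and splits message construction into a hypothetical build_warning helper. The subject line, port 587, and the use of STARTTLS are all assumptions that depend on the mail provider.

```python
import smtplib
from email.header import Header
from email.mime.text import MIMEText


def build_warning(sender, receivers, content):
    """Build a plain-text UTF-8 alert message (the subject is an assumption)."""
    msg = MIMEText(content, 'plain', 'utf-8')
    msg['Subject'] = Header('GitHub leak alert', 'utf-8')
    msg['From'] = sender
    msg['To'] = ', '.join(receivers)
    return msg


def send_warning(host, username, password, sender, receivers, content):
    """Send the alert over SMTP with STARTTLS on port 587 (assumptions)."""
    msg = build_warning(sender, receivers, content)
    with smtplib.SMTP(host, 587) as server:
        server.starttls()
        server.login(username, password)
        server.sendmail(sender, receivers, msg.as_string())
```

Keeping message construction separate from delivery means the email body can be checked without contacting a mail server.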

4. Configuration File Reading

A simple INI file stores the keyword, GitHub credentials, email settings, and custom payloads. The main function reads this file and passes the values to the hunter function.

[KEYWORD]
keyword = your main keyword here

[EMAIL]
host = smtp.example.com
user = your_email@example.com
password = your_password

[PAYLOADS]
p1 = password
p2 = username
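
Reading a file laid out like this takes only a few lines with configparser. The sketch below is an assumption about how the values could be loaded, not the original main function. The name load_config and the keys of the returned dictionary are hypothetical, and it reads only the three sections shown in the sample.

```python
import configparser


def load_config(text):
    """Parse INI text laid out like the sample into plain Python values."""
    # interpolation=None keeps a literal '%' in a password from raising
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        'keyword': parser['KEYWORD']['keyword'],
        'email_host': parser['EMAIL']['host'],
        'email_user': parser['EMAIL']['user'],
        'email_password': parser['EMAIL']['password'],
        # Every value in [PAYLOADS] is one term to look for
        'payloads': list(parser['PAYLOADS'].values()),
    }
```

To load from disk instead, call parser.read(path) in place of read_string. Reading the payloads by section, not by fixed key names, lets users add p3, p4, and so on without touching the code.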

0x03 Monitoring Result

1. Run Output

2. Email Alert

0x04 Summary

The tool first searches GitHub for a main keyword (for example, a company domain, email address, or name), then scans the results for user‑defined payloads such as password, username, or database. Scheduled with cron, it can run daily and send alerts. The full source code is available on GitHub.
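
The second stage, scanning each result for payloads, can be as simple as a substring test. The sketch below assumes case-insensitive matching, which the article does not specify. The function name match_payloads is hypothetical.

```python
def match_payloads(snippet, payloads):
    """Return the payloads that occur in `snippet`, ignoring case."""
    lowered = snippet.lower()
    return [p for p in payloads if p.lower() in lowered]
```

A hit is worth reporting only when this list is non-empty. The matched payloads can then go into the email body alongside the URL and the code snippet.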

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"
