How Fake GitHub Stars Are Bought, Detected, and Why Investors Care

GitHub star‑buying has become a black‑market industry, with prices ranging from under one yuan to six yuan per star. Because investors treat star counts as a traction metric, projects have an incentive to buy them, and researchers have built both simple rule‑based and more advanced clustering algorithms to detect fake stars and expose the practice.

Programmer DD

Fake Star Market

GitHub star‑buying has surfaced repeatedly, even involving well‑known open‑source projects from large companies. Prices vary: cheap services charge as little as 0.4–0.88 CNY per star, while stars from premium accounts with a year‑old history can cost up to 6 CNY each. One entrepreneur spent 20 EUR (≈156 CNY) on 25 high‑quality stars, only to see them disappear a month later when the accounts were banned. Low‑cost providers often offer free replacement stars after such removals.

Fake Star Detector

Fraser Marlow, a growth lead at Dagster, discovered the black market and worked with spam‑filter experts to build a fake‑star detector. The system uses two algorithms. The first is a simple rule‑based filter that catches obvious patterns, such as many accounts starring the same two projects while having no contribution history. The second is a more sophisticated unsupervised clustering algorithm that groups accounts with similar activity signatures: normal users exhibit diverse, sparse activity, whereas fake accounts cluster tightly. To validate the approach, the team created a target repository and purchased stars for it. The detector matched nearly 100% of the purchased stars in that test and reached 98% precision and 85% recall on real‑world data.
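To make the rule‑based filter and the precision/recall figures more concrete, here is a minimal Python sketch. It is not Dagster's implementation: the `Account` fields, the `looks_fake` thresholds, and the `precision_recall` helper are all hypothetical names chosen for illustration, and the rule simply assumes that a throwaway account has no followers, no repositories, and a handful of stars all given on the day it was created.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Account:
    """Hypothetical profile summary; the field names are illustrative only."""
    login: str
    created_on: date
    followers: int
    public_repos: int
    starred_repos: frozenset[str]
    star_dates: frozenset[date]


def looks_fake(account: Account, max_stars: int = 2) -> bool:
    """Assumed rule: no footprint, at most `max_stars` stars, all on creation day."""
    no_footprint = account.followers == 0 and account.public_repos == 0
    few_stars = 0 < len(account.starred_repos) <= max_stars
    single_day = account.star_dates == {account.created_on}
    return no_footprint and few_stars and single_day


def precision_recall(flagged: set[str], truly_fake: set[str]) -> tuple[float, float]:
    """Precision: share of flagged accounts that are fake. Recall: share of fakes caught."""
    true_positives = len(flagged & truly_fake)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(truly_fake) if truly_fake else 0.0
    return precision, recall
```

A rule this strict tends to favor precision over recall: it rarely flags a real user, but any purchased account with even a little history slips through, which is the gap the clustering step is meant to close.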

Applying the detector to the GitHub Archive dataset revealed extreme cases. The project okcash had 759 stars: the simple algorithm flagged only one of them as suspicious, while the clustering method identified 97% of them as fake. The analysis covered only stars added after 2022‑01‑01, so many older fraudulent stars remain undetected. Dagster’s own product and several peers showed low fake‑star rates, suggesting the data‑pipeline sector is relatively healthy.
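The gap between the two methods is easier to see with a toy version of the clustering idea. The sketch below is a simplified stand‑in, not the method used in the original analysis: it assumes each account's activity can be reduced to a set of hypothetical `(repository, date)` star events, links accounts whose sets have a high Jaccard similarity, and treats any sufficiently large linked group as suspicious. The `threshold` and `min_size` values are arbitrary illustrative defaults.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations
from typing import Tuple

Event = Tuple[str, str]  # hypothetical shape: (repository, ISO date of the star)


def jaccard(a: set, b: set) -> float:
    """Overlap of two sets: 1.0 means identical, 0.0 means disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suspicious_clusters(
    activity: dict[str, set[Event]], threshold: float = 0.8, min_size: int = 3
) -> list[set[str]]:
    """Group accounts with near-identical activity; keep groups of at least min_size."""
    parent = {login: login for login in activity}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(activity, 2):
        if jaccard(activity[a], activity[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = defaultdict(set)
    for login in activity:
        groups[find(login)].add(login)
    return [group for group in groups.values() if len(group) >= min_size]


def fake_share(stargazers: set[str], clusters: list[set[str]]) -> float:
    """Fraction of a repository's stargazers that fall inside a suspicious cluster."""
    flagged = set().union(*clusters)
    return len(stargazers & flagged) / len(stargazers) if stargazers else 0.0
```

The pairwise comparison is quadratic in the number of accounts, so this only suits small samples. It does show why accounts that individually pass a profile‑based rule can still be caught: what gives them away is that they behave like each other.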

Investors: We Love Stars

Open‑source teams often buy stars to attract venture capital. Investors such as Pratima Aiyagari treat star count as a quick proxy for project traction, and rankings such as the ROSS index order projects by star‑growth rate. But the more investors rely on this metric, the less reliable it becomes. Scholars such as Stuart Geiger note that such indicators become self‑defeating over time, echoing Campbell’s Law (the more an indicator is used for decisions, the more it invites manipulation) and Goodhart’s Law (once a measure becomes a target, it ceases to be a good measure). The practice extends beyond funding: job seekers point to star‑rich projects to impress recruiters, and many companies evaluate candidates on GitHub metrics.
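A small worked example shows how easily a growth‑based ranking can be moved. The sketch below is not the ROSS index formula; it assumes a plain relative growth rate over one period, with hypothetical project names and star counts, purely to illustrate the Goodhart effect described above.

```python
from __future__ import annotations


def growth_rate(stars_start: int, stars_end: int) -> float:
    """Relative star growth over a period (an assumed, simplified metric)."""
    if stars_start <= 0:
        raise ValueError("stars_start must be positive")
    return (stars_end - stars_start) / stars_start


def rank_by_growth(snapshots: dict[str, tuple[int, int]]) -> list[str]:
    """Order project names from fastest to slowest relative growth."""
    return sorted(snapshots, key=lambda name: growth_rate(*snapshots[name]), reverse=True)


# Hypothetical (start, end) star counts for two projects over the same period.
organic = {"alpha": (1000, 1300), "beta": (200, 240)}
# Same period, but "beta" has bought 100 extra stars.
with_purchase = {"alpha": (1000, 1300), "beta": (200, 340)}
```

With the organic numbers `rank_by_growth` puts `alpha` first (30% against 20%). After the purchase `beta` jumps to 70% and takes the top spot. At the prices quoted earlier in the article, 100 stars cost somewhere between a few dozen and a few hundred CNY, which is why a small project has the most to gain from gaming a relative metric.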

One More Thing

The rise of large AI models makes detecting fake accounts harder. Earlier fake‑star schemes relied on accounts with simple, easily recognized features, but with tools like ChatGPT, perpetrators can generate realistic, varied comments, blurring the line between genuine and synthetic activity.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"
