Which Tech Giant Dominates Open‑Source on GitHub? 2017 Contribution Analysis
This article examines 2017 GitHub data to compare how major tech companies like Microsoft, Google, Amazon, and others contribute to open‑source projects, revealing surprising rankings, star counts, and the methodology behind the analysis using Google BigQuery and custom SQL queries.
Background
Open‑source has become a major driver in the IT industry, and major tech companies now contribute large amounts of code. This analysis quantifies corporate contributions on GitHub for the year 2017.
Data source
All GitHub events for 2017 were obtained from the GitHub Archive ( https://www.githubarchive.org/). Only events from repositories that gained more than 20 stars were considered. Corporate affiliation was inferred from the email domain in commit records; users without a corporate email were excluded.
Processing
Google BigQuery ( https://cloud.google.com/bigquery/) was used to run a StandardSQL query that:
Extracts all events from the monthly tables `githubarchive.month.2017*`.
Identifies “WatchEvent” rows to compute the star count per repository and keeps repositories with >20 stars.
Joins those repositories with “PushEvent” rows, extracts the author email, and derives the domain (e.g., google.com).
Aggregates per domain the number of distinct contributors (“githubers”), the set of contributed repositories, and the total star count of those repositories.
Filters out personal email providers (gmail.com, yahoo.com, etc.) and keeps only domains with at least 30 contributors.
#standardSQL
WITH period AS (
SELECT *
FROM `githubarchive.month.2017*`
),
repo_stars AS (
SELECT repo.id,
COUNT(DISTINCT actor.login) AS stars,
APPROX_TOP_COUNT(repo.name, 1)[OFFSET(0)].value AS repo_name
FROM period
WHERE type='WatchEvent'
GROUP BY repo.id
HAVING stars > 20
),
pushers_guess_emails_and_top_projects AS (
SELECT *,
REGEXP_EXTRACT(email, r'@(.*)') AS domain
FROM (
SELECT actor.id,
APPROX_TOP_COUNT(actor.login,1)[OFFSET(0)].value AS login,
APPROX_TOP_COUNT(JSON_EXTRACT_SCALAR(payload, '$.commits[0].author.email'),1)[OFFSET(0)].value AS email,
COUNT(*) AS c,
ARRAY_AGG(DISTINCT TO_JSON_STRING(STRUCT(b.repo_name, stars))) AS repos
FROM period a
JOIN repo_stars b ON a.repo.id = b.id
WHERE type='PushEvent'
GROUP BY actor.id
HAVING c > 3
)
)
SELECT *
FROM (
SELECT domain,
githubers,
(SELECT COUNT(DISTINCT repo) FROM UNNEST(repos) repo) AS repos_contributed_to,
ARRAY(
SELECT AS STRUCT
JSON_EXTRACT_SCALAR(repo, '$.repo_name') AS repo_name,
CAST(JSON_EXTRACT_SCALAR(repo, '$.stars') AS INT64) AS stars,
COUNT(*) AS githubers_from_domain
FROM UNNEST(repos) repo
GROUP BY repo_name, stars
HAVING githubers_from_domain > 1
ORDER BY stars DESC
LIMIT 3
) AS top,
(SELECT SUM(CAST(JSON_EXTRACT_SCALAR(repo, '$.stars') AS INT64))
FROM (SELECT DISTINCT repo FROM UNNEST(repos) repo)) AS sum_stars_projects_contributed_to,
ARRAY_CONCAT_AGG(ARRAY(SELECT * FROM UNNEST(repos) repo)) AS repos
FROM pushers_guess_emails_and_top_projects
WHERE domain IN UNNEST(SPLIT('google.com|microsoft.com|amazon.com','|'))
AND domain NOT IN UNNEST(SPLIT('gmail.com|users.noreply.github.com|qq.com|hotmail.com|163.com|me.com|googlemail.com|outlook.com|yahoo.com|web.de|iki.fi|foxmail.com|yandex.ru','|'))
GROUP BY domain
HAVING githubers > 30
)
ORDER BY githubers DESC;Key findings
Microsoft: ~1,300 employees contributed to 825 repositories, accumulating about 260 k stars.
Google: ~900 active contributors pushed code to roughly 1,100 repositories, with a total of ~530 k stars (driven by TensorFlow, Angular, Kubernetes, etc.).
Amazon: only 134 contributors and 158 repositories, yielding ~27 k stars.
Star count is not proportional to contributor count; Google’s star impact is roughly double Microsoft’s despite only a 25 % larger repository set.
Other notable corporate contributors include Red Hat, IBM, Pivotal, Intel, and Facebook, each surpassing Amazon in star impact.
Additional companies such as Alibaba, Uber, Wix, GitHub, Apache, Tencent, Baidu, Apple, Mozilla, Oracle, Stanford, and MIT appear in lower‑ranked groups, with projects like Alibaba’s Ant Design.
Interpretation
Microsoft, once perceived as hostile to open‑source, now actively participates and holds platinum membership in the Linux Foundation and CNCF. Google leads in star popularity due to high‑impact projects. Amazon lags behind both in contributor count and star influence.
Reproducibility
The analysis pipeline can be reproduced by:
Downloading the 2017 GitHub Archive data.
Running the above StandardSQL query in BigQuery.
Adjusting the domain filter list to include or exclude additional email providers.
Full ranking tables are available in the original Medium article:
https://medium.freecodecamp.org/the-top-contributors-to-github-2017-be98ab854e87.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
