
What Drives Open‑Source Project Health in China? Insights from the 2018 Report

The 2018 China Open‑Source Annual Report analyzes GitHub data, introduces the Grank health index, highlights dominant companies and trending technologies such as AI, cloud, and blockchain, and reveals how activity and community decentralisation shape the vitality of Chinese open‑source projects.

Alibaba Cloud Native

Data Collection and Processing

The Guide Compass team harvested public repository metadata from GitHub using a combination of web crawlers and the official GitHub API. For each repository, the following attributes were stored locally:

Static fields: repository name, original source URL, creation date, primary language.

Dynamic fields: number of forks, stars, open issues, pull‑request count, commit count, list of contributors.

Raw JSON responses were cleaned, de‑duplicated and normalized into a relational schema. Aggregations (ranking, tag generation, statistical summaries) were then derived to support downstream classification, search and recommendation services built with machine‑learning, natural‑language‑processing and data‑analysis pipelines. An image‑database layer was added to enable high‑dimensional visual queries on repository‑level graphics.
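The cleaning and de‑duplication step can be sketched roughly as follows. This is a minimal illustration, not the team's actual pipeline: the `RepoRecord` class and the `normalize` and `dedupe` helpers are hypothetical names, and the input keys (`full_name`, `html_url`, `created_at`, `language`, `forks_count`, `stargazers_count`, `open_issues_count`) assume payloads shaped like GitHub REST API repository objects. Pull‑request counts, commit counts and contributor lists are left out because they come from separate requests.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RepoRecord:
    # Static fields
    name: str
    source_url: str
    created_at: datetime
    primary_language: str
    # Dynamic fields (a subset of those listed above)
    forks: int
    stars: int
    open_issues: int


def normalize(raw: dict) -> RepoRecord:
    """Map one raw repository payload onto a flat record (hypothetical helper)."""
    return RepoRecord(
        name=raw["full_name"].strip().lower(),
        source_url=raw["html_url"],
        created_at=datetime.strptime(raw["created_at"], "%Y-%m-%dT%H:%M:%SZ"),
        # The language field may be missing or null for some repositories.
        primary_language=raw.get("language") or "unknown",
        forks=int(raw.get("forks_count", 0)),
        stars=int(raw.get("stargazers_count", 0)),
        open_issues=int(raw.get("open_issues_count", 0)),
    )


def dedupe(raws: list[dict]) -> list[RepoRecord]:
    """Keep the last payload seen for each normalized repository name."""
    latest: dict[str, RepoRecord] = {}
    for raw in raws:
        record = normalize(raw)
        latest[record.name] = record
    return list(latest.values())
```

Keying on the lower‑cased repository name is one simple way to collapse the duplicates that arise when the same repository is reached through both the crawler and the API.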

Grank Model Overview

Grank is an index designed to quantify the health of an open‑source project or organization along two orthogonal dimensions: activity and community decentralisation.

Activity dimension

Activity is represented as a three‑dimensional vector (commits, pull‑requests, contributors). Weekly snapshots of the vector are taken; the Euclidean distance between consecutive snapshots yields a “movement speed” that reflects development velocity. This approach avoids raw‑count comparison across projects of different scales and focuses on the trend and magnitude of change.
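As a rough sketch of the movement‑speed idea, the snippet below computes the Euclidean distance between consecutive weekly snapshots. The function name `movement_speeds` and the sample numbers are illustrative assumptions, and the report does not say whether the three components are rescaled before the distance is taken, so none is applied here.

```python
import math

# One snapshot per week: (commits, pull requests, contributors).
Snapshot = tuple[float, float, float]


def movement_speeds(snapshots: list[Snapshot]) -> list[float]:
    """Euclidean distance between each pair of consecutive snapshots."""
    return [
        math.dist(prev, curr)
        for prev, curr in zip(snapshots, snapshots[1:])
    ]


# Hypothetical data: three weekly snapshots for one project.
weekly = [(10, 4, 3), (13, 8, 3), (13, 8, 3)]
print(movement_speeds(weekly))  # [5.0, 0.0]
```

A project whose vector stops changing yields a speed of zero regardless of how large its absolute counts are, which is what lets projects of different sizes be compared on trend rather than volume.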

Community decentralisation dimension

Each contributor’s identity attributes (e.g., email domain, organization affiliation) are extracted from commit metadata. These attributes are clustered (e.g., by domain) to compute a dispersion score. A high score indicates a broadly distributed contributor base, while a very low score signals centralisation. Both extremes can be risky, so the score is interpreted as a relative health indicator.
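The report does not spell out the dispersion formula, so the sketch below substitutes one plausible choice: the normalized Shannon entropy of contributor email domains. The function name `dispersion_score` and the entropy formulation are assumptions made for illustration, not Grank's documented method.

```python
import math
from collections import Counter


def dispersion_score(emails: list[str]) -> float:
    """Normalized Shannon entropy of email domains, in the range 0.0 to 1.0.

    Assumption: entropy stands in for the report's unspecified dispersion
    measure. 0.0 means every contributor shares one domain; 1.0 means
    contributors are spread evenly across the domains observed.
    """
    domains = Counter(
        email.rsplit("@", 1)[-1].lower() for email in emails if "@" in email
    )
    total = sum(domains.values())
    if total == 0 or len(domains) < 2:
        return 0.0
    entropy = -sum(
        (count / total) * math.log(count / total) for count in domains.values()
    )
    return entropy / math.log(len(domains))
```

Under this formulation a project staffed entirely from one company domain scores 0.0, while an even split across several domains approaches 1.0, matching the low‑versus‑high reading described above.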

Data Set for Grank Evaluation

Using the GitHub API, the team collected data for all active repositories belonging to major Chinese internet companies and to projects donated to the Apache Software Foundation. The sampled organizations (unordered) include:

Alibaba (Ant Design, PouchContainer, Ice, Angular Developers, Egg.js, etc.)

Huawei (CarbonData, Hadoop‑related projects)

Tencent (AlloyTeam, tarscloud)

Baidu (FEX, EFE, Doris, ECharts)

Ele.me (Element)

NetEase, Sohu, Qihoo 360, Vipshop, Douban, Dianping, Xiaomi, Meituan, Meili, Wandoujia, Dangdang, Youzan, Deep, DNSPod, Sina Weibo, Toutiao, Didi, eBay

Apache incubator projects: CarbonData, Eagle, Kylin, Hawq, RocketMQ, Dubbo, Weex, Doris, ECharts, Griffin, SkyWalking

In total, roughly 500 repositories spanning the period 2017‑10‑01 to 2018‑09‑30 were analysed.

Key Findings

Top‑5 projects by combined activity and community scores

Ant Design (Ant Financial) – 2,298 commits in the year, 122 commits in the busiest week, 350 new contributors, 1,057 PRs; community participation consistently above 95%.

PouchContainer (Alibaba) – Steady weekly commit and PR volume despite a modest contributor pool; the documentation is written entirely in English, indicating strong international outreach.

CarbonData (Huawei / Apache) – High commit/PR activity and a stable, highly distributed community; the only Apache‑governed project in the top‑five.

Ice (Alibaba) – Rapidly growing front‑end framework; activity remains high but community participation shows a gradual decline as the project matures.

Element (Ele.me) – Vue‑based component library; activity fluctuates with a recent downward trend, while community decentralisation stays high.

An additional noteworthy project is SkyWalking, an Apache incubator project led by an individual rather than a company. It is a distributed tracing system for micro‑services and containers and exhibits strong activity for a cloud‑native effort.

Programming language landscape (2017‑2018)

JavaScript remains the most popular language for web development in China.

Python rose to the third‑most‑used language in 2017, overtaking several European‑centric languages, driven by AI and data‑science adoption.

C/C++ fell to fourth place, reflecting a gradual shift away from traditional systems‑programming dominance.

GitHub growth and keyword trends

GitHub’s user base and repository count grew dramatically between 2017 and 2018, making its metrics a reliable proxy for open‑source activity in China. Keyword analysis of repository topics shows Machine Learning as the top‑searched term, followed by Game. Topic rankings are calculated by summing the star counts of all repositories tagged with a given label; this method can over‑represent projects that receive promotional stars.
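The star‑summing calculation can be illustrated with a short sketch. The `rank_topics` function and the `topics` and `stars` keys are hypothetical names chosen for this example, and the sample repositories are invented.

```python
from collections import defaultdict


def rank_topics(repos: list[dict]) -> list[tuple[str, int]]:
    """Sum star counts per topic label and sort from highest to lowest."""
    totals: dict[str, int] = defaultdict(int)
    for repo in repos:
        # set() guards against a label being listed twice on one repository.
        for topic in set(repo.get("topics", [])):
            totals[topic] += repo.get("stars", 0)
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Invented sample data.
sample = [
    {"topics": ["machine-learning", "python"], "stars": 900},
    {"topics": ["game"], "stars": 400},
    {"topics": ["machine-learning"], "stars": 150},
]
print(rank_topics(sample))
# [('machine-learning', 1050), ('python', 900), ('game', 400)]
```

Because each repository contributes its full star count to every label it carries, a single heavily promoted repository can lift a topic's rank on its own, which is the over‑representation risk noted above.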

Evaluation of star‑based rankings

Star counts and user‑defined tags provide flexible, community‑driven signals but may suffer from fairness issues. Ideal open‑source projects exhibit clear documentation, well‑structured architecture, and clean code, which tend to attract higher stars and forks. Many Chinese projects, however, still face challenges such as sparse documentation, excessive dependencies, or irregular update cycles.

Conclusions

The analysis confirms that Alibaba and its Ant Financial subsidiary dominate the Chinese open‑source ecosystem, contributing more than half of the top‑50 projects. Front‑end libraries are prevalent because of their rapid iteration cycles, while cloud‑native, AI‑related, and blockchain projects are gaining momentum. The Grank index offers a quantitative view of both development velocity and community health, providing a practical tool for assessing the vitality of open‑source initiatives.
