DevOps Continuous Integration Practices in Large-Scale Projects
This article outlines how large-scale projects can achieve reliable, fast releases by adopting DevOps continuous-integration practices: trunk-based development, comprehensive automated testing, versioned artifact storage, configuration-as-code pipelines, Docker layer caching, and staged builds. Together, these practices prioritize short tasks and parallel execution to minimize manual effort while maintaining quality.
DevOps, a combination of Development and Operations, refers to the process of connecting development, testing, and operations stages in software delivery through a toolchain, while reducing team time loss and achieving more efficient and stable product delivery through automated testing and monitoring. This article focuses on the capabilities required by DevOps during the Continuous Integration (CI) phase and provides a brief explanation of workflow design and pipeline optimization strategies.
As the Tencent Docs project continues to grow in scale, with increasing feature complexity and a growing number of maintainers, the tension between feature-delivery frequency and software quality has become increasingly pronounced. Implementing a comprehensive DevOps toolchain became a priority.
The author argues that every stage, from code integration and feature testing to deployment and infrastructure management, should have comprehensive automated monitoring and require minimal manual intervention. Only then can software balance quality and efficiency while remaining reliable as release frequency increases.
1. What We Discuss When We Talk About CI
CI (Continuous Integration) is the practice of integrating code into the main branch frequently, often several times a day. It covers both continuously merging code into the main branch and generating usable artifacts from the source code.
During the CI phase, the following stages need to be implemented:
Static Code Checking: This includes ESLint/TSLint static syntax checks, verifying that git commit messages conform to the team's conventions, and checking that submitted files have corresponding owners for review. These static checks don't require compilation and can be completed by scanning the source code directly.
Unit Testing/Integration Testing/E2E Testing: Automated testing is crucial for ensuring product quality. Test case coverage and quality directly determine the quality of build artifacts, making comprehensive and well-designed test cases essential for achieving continuous delivery.
Compiling and Organizing Artifacts: In small to medium-sized projects, this step is often skipped, with build artifacts directly handed to deployment. However, for large projects, frequent submissions generate numerous build artifacts that require proper management.
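As a minimal sketch of one of the static checks above, a commit message can be validated against a Conventional Commits style pattern before compilation ever starts. The pattern and the sample message here are illustrative, not the team's actual convention:

```shell
# Validate a commit message against a Conventional Commits style pattern.
# Both the pattern and the message are illustrative examples.
MSG="feat(editor): support comment anchors"
PATTERN='^(feat|fix|docs|chore|refactor|test)(\([a-z-]+\))?: .+'
if echo "$MSG" | grep -Eq "$PATTERN"; then
  echo "commit message ok"
else
  echo "commit message rejected"
  exit 1
fi
```

In a real pipeline this check would run against the actual commit message (e.g. via a git hook or a CI step reading `git log -1 --format=%s`).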
2. Workflow Design for Better Integration
Before formally implementing CI, we need to plan a new workflow to address potential issues arising from switching to high-frequency integration.
Pipeline Organization: We need a suitable organizational form to manage what tasks a CI pipeline should execute at what stages. Tools like Drone or Jenkins support Configuration as Code—managing pipelines through configuration files. This approach eliminates the need for a dedicated web page for pipeline management, reduces maintenance costs, and allows pipeline configuration to be integrated into the source repository, enabling version management and audit traceability through git.
Version Release Pattern Trade-offs: According to "Continuous Delivery 2.0," version release patterns have three elements: delivery time, feature quantity, and delivery quality. These three factors constrain each other. With fixed development resources, only two elements can be guaranteed.
Traditional project-based release patterns sacrifice delivery time, waiting for all features to be developed and undergo complete manual testing before releasing a new version. However, this extends the delivery cycle and increases uncontrollable risks during development, potentially causing delayed releases.
For continuous integration, when integration frequency is high enough and automated testing is mature and stable, features don't need to be bundled into a single release. Each completed feature can be automatically tested, merged, and queued for release; stable features can then be released automatically at fixed intervals.
Branching Strategy: The original development model used branch development with trunk release, adopting the industry-standard Git-Flow pattern. While this pattern addresses feature development, bug fixes, version releases, and even hotfixes, its complex structure makes it difficult to manage. For large requirements, merge time intervals can be long, potentially causing significant conflicts when merging into the trunk.
The team decided to adopt trunk-based development with trunk release. Team members are required to submit their branch code to the trunk daily. When release conditions are met, a release branch is pulled directly from the trunk. If defects are found, they're fixed directly on the trunk and cherry-picked to corresponding release branches as needed.
However, trunk-based development requires strong infrastructure support and long-term habit cultivation. The challenges include: 1) Comprehensive and fast automated testing with high coverage; 2) Owner-responsibility Code Review mechanism; 3) Significant infrastructure investment for high-frequency automated testing; 4) Fast, stable rollback capabilities and precise online and gray-level monitoring.
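The trunk-based release flow described above can be walked through with plain git commands. This is a toy demonstration; the repository path, branch names, and file contents are illustrative:

```shell
set -e
# Toy walkthrough of trunk-based release: cut a release branch from the
# trunk, fix a defect on the trunk, then cherry-pick it onto the release.
rm -rf /tmp/tbd-demo && mkdir -p /tmp/tbd-demo && cd /tmp/tbd-demo
git init -q
git checkout -qb main
git config user.email "ci@example.com"
git config user.name "ci"
echo "v1" > app.txt
git add . && git commit -qm "feat: initial feature"
git branch release/1.0          # release conditions met: cut from trunk
echo "v1-fixed" > app.txt       # defect is fixed on the trunk first
git add . && git commit -qm "fix: defect found after the cut"
FIX=$(git rev-parse HEAD)
git checkout -q release/1.0
git cherry-pick -x "$FIX"       # carry the fix to the affected release
cat app.txt
```

Note the order: the fix lands on the trunk first, so the trunk is always the source of truth, and release branches only receive cherry-picks.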
3. Establishing Build-to-Artifact Pipeline in Large Projects
For most projects, deployment after compilation means logging into the server and copying each generated artifact onto it. This approach makes auditing and traceability difficult and doesn't guarantee correctness.
Moreover, when a rollback is needed, historical versions aren't stored on the server, so rolling back means checking out the historical version, recompiling it, and regenerating its artifacts, which is far too slow.
The solution is to never overwrite files; all artifacts should be uploaded to persistent storage. A traffic distribution service can be added upstream to determine which version of HTML file each request should return.
For large projects, returned HTML files may be injected with channel identifiers, user customizations, and SSR required first-screen data, changing their code form. Therefore, the artifact provider for HTML should be a separate dynamic service that completes template HTML replacement through certain logic.
Summary of artifact generation after each compilation:
1. For static files like CSS and JS resources, they'll be published to cloud object storage and synchronized to CDN for access speed optimization.
2. For HTML artifacts, a rendering service is needed, packaged as a Docker image at the same level as backend microservice images, for upstream traffic distribution services (gateways) to select which services to invoke based on user requests.
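The template-replacement step that the rendering service performs can be sketched as text substitution. The CDN origin, version id, and placeholder names below are all illustrative:

```shell
set -e
# Toy version of the HTML rendering step: the template references versioned
# assets on a CDN and gets per-request data injected before being returned.
mkdir -p /tmp/render-demo && cd /tmp/render-demo
VERSION="20240101-abcdef"
CDN="https://cdn.example.com/static"
cat > template.html <<'EOF'
<script src="__CDN__/__VERSION__/main.js"></script>
<script>window.__DATA__ = null;</script>
EOF
# The rendering service fills in the template for each request.
sed -e "s|__CDN__|$CDN|" \
    -e "s|__VERSION__|$VERSION|" \
    -e "s|window.__DATA__ = null|window.__DATA__ = {\"user\":\"demo\"}|" \
    template.html > index.html
cat index.html
```

A real rendering service would do this per request (injecting channel identifiers and SSR data), which is why it ships as a Docker image rather than a static file.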
4. Speed is Efficiency: Pipeline Optimization Strategies
Under high-frequency trunk-based continuous integration, integration speed equals efficiency. Pipeline execution time is developers' primary concern and a decisive factor for pipeline usability.
1) Pipeline Task Orchestration: Pipeline stages should follow principles: tasks without prerequisites first, short-duration tasks first, and unrelated tasks in parallel. Through shortest-path dependency analysis of pipeline tasks, the earliest execution time for each task can be determined.
2) Leveraging Docker Cache: Docker caches each instruction in a Dockerfile as an image layer and reuses those layers in subsequent builds. We can use this feature to avoid repeating expensive steps, improving CI efficiency.
For example, npm install in frontend projects is typically the most time-consuming. Since dependency changes are infrequent in high-frequency integration, we can package the node_modules folder as an image for subsequent compilations:
FROM node:12 AS dependencies
WORKDIR /ci
COPY package.json package-lock.json ./
RUN npm install
ENV NODE_PATH=/ci/node_modules
A cache hit strategy is added: before the next compilation, check if the image cache exists. To ensure dependencies haven't changed, compare the md5 hash of package-lock.json between the current build and the cached image. If inconsistent, reinstall dependencies and cache a new image. If consistent, directly retrieve node_modules from the image, saving significant installation time.
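The cache-hit check can be sketched as deriving an image tag from the md5 hash of package-lock.json. The registry name here is hypothetical, and the docker commands are shown only as comments because they need a Docker daemon:

```shell
set -e
# Derive a cache-image tag from the md5 of package-lock.json; a matching
# tag in the registry means dependencies haven't changed since that build.
mkdir -p /tmp/cache-demo && cd /tmp/cache-demo
printf '{"lockfileVersion": 2}\n' > package-lock.json
LOCK_HASH=$(md5sum package-lock.json | cut -d ' ' -f 1)
CACHE_IMAGE="registry.example.com/ci/deps:$LOCK_HASH"
echo "$CACHE_IMAGE" > cache-image.txt
cat cache-image.txt
# In the real pipeline (requires a Docker daemon):
#   if docker pull "$CACHE_IMAGE"; then reuse node_modules from the image
#   else docker build -t "$CACHE_IMAGE" . && docker push "$CACHE_IMAGE"; fi
```

Tagging by content hash makes the check a single registry lookup: same lockfile, same tag, cache hit.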
In subsequent builds, the pipeline copies the cached folder out of that image:
COPY --from=dependencies /ci/node_modules ./node_modules
# Other steps follow
This feature can be extended to all low-update-frequency, long-generation-time tasks in CI, such as Linux environment dependency installation, unit test case pre-run caching, and copying folders with many static files.
3) Staged Build: Pipeline execution time inevitably increases with more tasks. In large projects, as various metric calculations are added and test cases increase, execution time will eventually become unbearable.
Staged build divides the CI pipeline into primary and secondary builds. The primary build must execute on every code submission and blocks progression if checks fail. The secondary build doesn't block the workflow, continuing to execute after code merge. However, if secondary build verification fails, the pipeline immediately sends notifications and blocks all other code merges until the issue is fixed.
Principles for whether a task should be in secondary build:
1) Secondary build includes long-running (over 15 minutes) and resource-intensive tasks like E2E automated tests.
2) Secondary build should include low-priority or low-failure-probability tasks, avoiding critical paths. If some automated test cases have high failure rates, consider adding related unit tests and moving them to the primary build.
3) If secondary build is still too long, consider splitting test cases appropriately for parallel testing.
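The splitting in point 3 can be as simple as round-robin assignment of test files to shards that run as parallel CI jobs. The file names and shard count below are made up:

```shell
set -e
# Round-robin split of a long test suite into three parallel shards.
mkdir -p /tmp/shard-demo && cd /tmp/shard-demo
rm -f shard-*.txt
SHARDS=3
i=0
for t in a.test.js b.test.js c.test.js d.test.js e.test.js f.test.js g.test.js; do
  echo "$t" >> "shard-$((i % SHARDS)).txt"
  i=$((i + 1))
done
wc -l shard-*.txt
# Each shard then runs in its own CI job, e.g. a runner could execute:
#   npx jest $(cat "shard-$SHARD_INDEX.txt")
```

Smarter schedulers weight shards by historical test duration rather than file count, but round-robin is a reasonable starting point.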
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.