
How to Generate Server‑Side PDFs with Puppeteer: A Step‑by‑Step Guide

This article explains how to use Puppeteer on a Node.js backend to render web pages as PDF files, covering installation, headless browser launch, page navigation, handling lazy‑loaded images, custom print CSS, authentication cookies, and Docker deployment with practical code snippets.

Aotu Lab

Jun 3, 2021


Background

Server‑side generation of PDF files from web pages is often required for downstream processing, such as uploading the PDF to a storage service and passing its URL to external APIs. Because the PDF is not displayed to the end user, generating it on the backend saves client resources.

Technology Choice

The solution uses Puppeteer, a Node.js library that provides a high‑level API for controlling Chrome/Chromium via the DevTools protocol. Puppeteer can capture screenshots, generate PDFs, crawl single‑page applications to produce pre‑rendered content, automate form submissions, and run UI tests.

Implementation Steps

Install the package. Use the full puppeteer package when a Chromium binary is needed, or puppeteer-core if a browser is already available.

$ npm install -g cnpm --registry=https://registry.npmmirror.com
$ cnpm i puppeteer --save

To skip the bundled Chromium download, set the environment variable PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1 or install puppeteer-core instead.
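With puppeteer-core, the launch call has to be told where the browser executable lives. The following is a minimal sketch, assuming a hypothetical CHROME_PATH environment variable (a name invented for this example, not something Puppeteer defines) that holds the path to an installed Chrome or Chromium:

```javascript
// Build launch options for puppeteer-core, which ships without a browser.
// CHROME_PATH is a made-up variable name for this sketch; use whatever your deployment provides.
function buildLaunchOptions(env = process.env) {
  if (!env.CHROME_PATH) {
    throw new Error('CHROME_PATH must point to a Chrome/Chromium executable');
  }
  return {
    headless: true,
    executablePath: env.CHROME_PATH,
  };
}

async function launchInstalledBrowser() {
  // Required lazily so buildLaunchOptions can be used without the package installed.
  const puppeteer = require('puppeteer-core');
  return puppeteer.launch(buildLaunchOptions());
}
```

Keeping the option building separate from the launch makes it easy to check the configuration in a unit test without starting a browser.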

Launch the browser in headless mode.

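const puppeteer = require('puppeteer');

// --no-sandbox disables Chromium's sandbox; use it only for trusted pages or inside a locked-down container.
const browser = await puppeteer.launch({
  headless: true,
  args: ['--no-sandbox', '--font-render-hinting=medium']
});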

During local debugging you may set headless: false to see the full browser window.

Open a new page and navigate to the target URL.

const page = await browser.newPage();
await page.goto(`${baseURL}/article/${id}`, {
  timeout: 60000,
  waitUntil: 'networkidle2'
});

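The timeout option (in milliseconds) can be increased for slow pages. waitUntil determines when navigation is considered complete. The accepted values are load (the load event has fired), domcontentloaded (the DOMContentLoaded event has fired), networkidle0 (no network connections for at least 500 ms), and networkidle2 (no more than two network connections for at least 500 ms).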
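A navigation that completes is not necessarily a navigation that succeeded: an error page also reaches the chosen waitUntil state and would be printed as-is. As a sketch (the openArticle name and its defaults are assumptions for this example), the response returned by page.goto can be checked before going further:

```javascript
// Navigate and fail fast if the server did not return a successful response.
async function openArticle(page, url, { timeout = 60000, waitUntil = 'networkidle2' } = {}) {
  const response = await page.goto(url, { timeout, waitUntil });
  // page.goto can resolve with null, for example on a same-URL hash navigation.
  if (response && !response.ok()) {
    throw new Error(`Navigation to ${url} failed with HTTP ${response.status()}`);
  }
  return response;
}
```

Called as `await openArticle(page, url)`, this stops the job early so that a 404 or 500 page does not end up stored as a PDF.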

Generate the PDF and write it to disk.

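const path = require('path');

// randomFilename, title, and config come from the surrounding application.
const ext = '.pdf';
const key = randomFilename(title, ext);
const _path = path.resolve(config.uploadDir, key);
await page.pdf({ path: _path, format: 'a4' });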

path specifies the output file location; format sets the paper size (A4 = 8.27" × 11.7").

Even when you do not need a permanent file, set a temporary path during debugging to inspect the PDF.
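The snippet above relies on a randomFilename helper that the article does not show. The sketch below is one guess at what such a helper could look like, not the original implementation, together with a variant of the PDF call that omits path so the bytes stay in memory (useful when the file is uploaded straight to storage). The printBackground option is an addition for this example.

```javascript
const crypto = require('crypto');

// Hypothetical stand-in for randomFilename(title, ext):
// a sanitized title plus a random suffix so concurrent jobs do not collide.
function randomFilename(title, ext) {
  const safeTitle = String(title)
    .trim()
    .replace(/[^\p{L}\p{N}]+/gu, '-') // keep letters and digits from any script
    .slice(0, 50)
    .replace(/^-+|-+$/g, '') || 'document';
  const suffix = crypto.randomBytes(6).toString('hex');
  return `${safeTitle}-${suffix}${ext}`;
}

// Without `path`, page.pdf() resolves with the PDF bytes instead of writing a file.
async function renderPdfBytes(page) {
  return page.pdf({ format: 'a4', printBackground: true });
}
```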

Close the browser to free resources.

await browser.close();
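If any earlier step throws, a bare browser.close() at the end is never reached and the Chromium process keeps running. One way to guard against that is a small wrapper with try/finally. This is a sketch; withBrowser is a name chosen for this example, and the launcher is passed in so the caller decides between puppeteer and puppeteer-core.

```javascript
// Run `task` with a fresh browser and always close it, even if the task throws.
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    await browser.close();
  }
}

// Possible usage (assumes `puppeteer` has been required by the caller):
// await withBrowser(() => puppeteer.launch({ headless: true }), async (browser) => {
//   const page = await browser.newPage();
//   /* navigate, then call page.pdf() */
// });
```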

Challenges and Solutions

1. Lazy‑loaded images

Pages that use lazy loading may render placeholder graphics in the PDF. After navigation, scroll the page to the bottom so that all images are fetched.

await autoScroll(page);
function autoScroll(page) {
  return page.evaluate(() => {
    return new Promise(resolve => {
      let totalHeight = 0;
      const distance = 100;
      const timer = setInterval(() => {
        const scrollHeight = document.body.scrollHeight;
        window.scrollBy(0, distance);
        totalHeight += distance;
        if (totalHeight >= scrollHeight) {
          clearInterval(timer);
          resolve();
        }
      }, 200);
    });
  });
}
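The loop above has no upper bound, so on a page that keeps growing as it is scrolled (infinite scroll) it might never resolve. A hedged variant with a step cap is sketched below; the names autoScrollCapped and scrollToBottom and the default of 300 steps are assumptions for this example.

```javascript
// Runs inside the page: scrolls in fixed steps until the bottom or maxSteps is reached.
// Resolves with the number of steps taken.
function scrollToBottom({ distance, delayMs, maxSteps }) {
  return new Promise((resolve) => {
    let steps = 0;
    const timer = setInterval(() => {
      window.scrollBy(0, distance);
      steps += 1;
      const reachedBottom = steps * distance >= document.body.scrollHeight;
      if (reachedBottom || steps >= maxSteps) {
        clearInterval(timer);
        resolve(steps);
      }
    }, delayMs);
  });
}

// Node side: page.evaluate serializes scrollToBottom and passes the settings as its argument.
function autoScrollCapped(page, options = {}) {
  const settings = { distance: 100, delayMs: 200, maxSteps: 300, ...options };
  return page.evaluate(scrollToBottom, settings);
}
```

Because the in-page function receives its settings as an argument instead of closing over outer variables, it survives being serialized into the browser context.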

2. Custom print CSS

By default, page.pdf() renders the page with print media CSS, so rules inside @media print apply to the output. Use such rules to hide UI elements that are irrelevant to the article (headers, footers, comments, side panels, etc.).

@media print {
  .other_info,
  .authors,
  .textDetail_comment,
  .detail_recTitle,
  .detail_rec,
  .SuspensePanel {
    display: none !important;
  }
  .Footer,
  .HeaderSuctionTop {
    display: none;
  }
}
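When the print rules cannot be added to the site's own stylesheet, they can be injected from the Puppeteer side after the page has loaded. The sketch below uses page.addStyleTag; the helper names buildPrintCss and hideInPrint are assumptions for this example.

```javascript
// Build an @media print rule that hides every selector in the list.
function buildPrintCss(selectors) {
  if (selectors.length === 0) return '';
  return `@media print {\n  ${selectors.join(',\n  ')} {\n    display: none !important;\n  }\n}`;
}

// Inject the rule into an already-loaded page before calling page.pdf().
async function hideInPrint(page, selectors) {
  const css = buildPrintCss(selectors);
  if (css) await page.addStyleTag({ content: css });
}
```

For instance, `await hideInPrint(page, ['.Footer', '.HeaderSuctionTop'])` would hide the same header and footer elements as the stylesheet above.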

3. Authentication cookies

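Protected articles require a logged‑in session, so the necessary cookies must be in place before the PDF is generated. Because the function below writes to document.cookie, it has to run after the page has already navigated to the target domain, and the page must then be reloaded so that the server receives the session cookies.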

async function simulateLogin(page, cookies, domain) {
  return await page.evaluate((sig, sess, domain) => {
    const date = new Date();
    date.setDate(date.getDate() + 1);
    const expires = `; expires=${date.toUTCString()}`;
    document.cookie = `koa:sess.sig=${sig}${expires}; domain=${domain}; path=/`;
    document.cookie = `koa:sess=${sess}${expires}; domain=${domain}; path=/`;
    document.cookie = `is_login=true${expires}; domain=${domain}; path=/`;
  }, cookies['koa:sess.sig'], cookies['koa:sess'], domain);
}
await simulateLogin(page, cookies, config.domain.split('//')[1]);
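An alternative that avoids the extra navigation is page.setCookie, which can install cookies before the first request is sent. This is a sketch of that approach, not the article's method; the helper names are assumptions, and since the cookie API has shifted between Puppeteer releases it is worth checking the documentation for the installed version.

```javascript
// Convert a { name: value } map into the objects page.setCookie() accepts.
function toCookieParams(cookies, origin, now = Date.now()) {
  const { hostname } = new URL(origin); // "https://example.com" -> "example.com"
  const expires = Math.floor(now / 1000) + 24 * 60 * 60; // Unix seconds, one day ahead
  return Object.entries(cookies).map(([name, value]) => ({
    name,
    value: String(value),
    domain: hostname,
    path: '/',
    expires,
  }));
}

// Call this before page.goto() so the first request is already authenticated.
async function loginWithCookies(page, cookies, origin) {
  await page.setCookie(...toCookieParams(cookies, origin));
}
```

Parsing the origin with URL also handles ports and trailing paths, which a plain split on '//' would leave attached to the domain.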

4. Docker deployment

When running Puppeteer inside Docker, install the system libraries that Chromium depends on, plus fonts for every script the pages use, and make sure the image's Node version satisfies Puppeteer’s requirement (Node ≥ 10.18.1 for Puppeteer v3.0.0 and later).

# Install Puppeteer dependencies on Ubuntu
RUN apt-get update && \
    apt-get install -y libgbm-dev gconf-service libasound2 libatk1.0-0 \
    libatk-bridge2.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 \
    libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 \
    libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 \
    libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 \
    libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 \
    ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release \
    xdg-utils wget build-essential libcairo2-dev libpango1.0-dev \
    libjpeg-dev libgif-dev librsvg2-dev && \
    apt-get install -y fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf --no-install-recommends

Note: Puppeteer v1.18.1‑v2.1.0 requires Node 8.9.0+, and v3.0.0+ requires Node 10.18.1+.
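A mismatched Node version inside the image tends to surface as an obscure runtime error. As a sketch, a startup check can make the requirement explicit; the helper names are assumptions, and the default minimum is the figure quoted above for Puppeteer v3.0.0+.

```javascript
// Compare two dotted version strings numerically; returns -1, 0, or 1.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i += 1) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return 0;
}

// Throw early if the runtime is older than the given minimum.
function assertNodeVersion(minimum = '10.18.1', current = process.versions.node) {
  if (compareVersions(current, minimum) < 0) {
    throw new Error(`Node ${current} is older than the required ${minimum}`);
  }
}
```

Calling `assertNodeVersion()` at the top of the service entry point turns a wrong base image into an immediate, readable failure.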

Conclusion

The complete workflow for generating PDFs on a Node.js backend with Puppeteer consists of:

Selecting the appropriate package (puppeteer vs. puppeteer-core) based on the deployment environment.

Launching a headless Chromium instance, navigating to the target page, and optionally scrolling to trigger lazy‑loaded resources.

Applying custom @media print CSS to hide unwanted UI elements.

Injecting authentication cookies when the content is protected.

Generating the PDF with page.pdf() and saving it to a desired path.

Closing the browser to release resources.

Reference implementation and demo code are available at:

https://github.com/jiaozitang/puppeteerPdfDemo


Written by

Aotu Lab

Aotu Lab, founded in October 2015, is a front-end engineering team serving multi-platform products. The articles in this public account are intended to share and discuss technology, reflecting only the personal views of Aotu Lab members and not the official stance of JD.com Technology.
