Building a ChatGPT‑Powered Markdown Documentation System with Embedbase and Nextra

This tutorial walks through creating an intelligent documentation site that stores markdown pages in Embedbase, retrieves semantically similar chunks for user queries, builds contextual prompts, and streams answers from ChatGPT using a custom Nextra theme and Node.js backend.

Architect's Guide
Architect's Guide
Architect's Guide
Building a ChatGPT‑Powered Markdown Documentation System with Embedbase and Nextra

In this guide we demonstrate how to turn a static markdown documentation site into an AI‑enhanced knowledge base that can answer user questions using ChatGPT. The solution combines OpenAI's ChatGPT, the Embedbase vector database, and the Nextra documentation framework built on Next.js.

Overview

We need to store document content in a database, accept user queries, search for the most similar passages, construct a context from the top‑5 results, and ask ChatGPT to answer based on that context.

Prerequisites

Embedbase API key

– provides semantic similarity search. OpenAI API key – for ChatGPT. Nextra and Node.js – the documentation framework.

Configure the keys in a .env file:

OPENAI_API_KEY="<YOUR KEY>"
EMBEDBASE_API_KEY="<YOUR KEY>"

Create Nextra Docs

Clone the official Nextra template from GitHub, install dependencies, and run the development server.

# we won't use "pnpm" here, rather the traditional "npm"
rm pnpm-lock.yaml
npm i
npm run dev

Prepare and Store Files

Write a scripts/sync.js script that reads all .mdx files, splits them into 100‑line chunks, and uploads the chunks to Embedbase.

const glob = require("glob");
const fs = require("fs");
const sync = async () => {
  // 1. read all files under pages/* with .mdx extension
  const documents = glob.sync("pages/**/*.mdx").map(path => ({
    id: path.replace("pages/", "/").replace("index.mdx", "").replace(".mdx", ""),
    data: fs.readFileSync(path, "utf-8")
  }));
  // 2. split documents into chunks of 100 lines
  const chunks = [];
  documents.forEach(document => {
    const lines = document.data.split("
");
    const chunkSize = 100;
    for (let i = 0; i < lines.length; i += chunkSize) {
      const chunk = lines.slice(i, i + chunkSize).join("
");
      chunks.push({ data: chunk });
    }
  });
};
sync();

Upload the chunks to Embedbase:

const fetch = require("node-fetch");
const apiKey = process.env.EMBEDBASE_API_KEY;
const response = await fetch("https://embedbase-hosted-usx5gpslaq-uc.a.run.app/v1/documentation", {
  method: "POST",
  headers: {
    "Authorization": "Bearer " + apiKey,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({ documents: chunks })
});
console.log(await response.json());

Build Contextual Prompt

Install tiktoken to count tokens and create a helper that searches Embedbase and assembles a prompt limited to 1800 tokens.

import { get_encoding } from "@dqbd/tiktoken";
const enc = get_encoding('cl100k_base');
const apiKey = process.env.EMBEDBASE_API_KEY;
const search = async (query) => {
  return fetch("https://embedbase-hosted-usx5gpslaq-uc.a.run.app/v1/documentation/search", {
    method: "POST",
    headers: { "Authorization": "Bearer " + apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({ query })
  }).then(r => r.json());
};
export default async function buildPrompt(req, res) {
  const prompt = req.body.prompt;
  const context = await createContext(prompt);
  const newPrompt = `Answer the question based on the context below, and if the question can't be answered based on the context, say "I don't know"

Context: ${context}

---

Question: ${prompt}
Answer:`;
  res.status(200).json({ prompt: newPrompt });
}

Streaming ChatGPT Calls

Implement an OpenAI streaming helper ( utils/OpenAIStream.ts) using eventsource-parser and expose an edge function pages/api/qa.ts that forwards the built prompt to the ChatGPT completion endpoint.

export async function OpenAIStream(payload) { /* … streaming logic … */ }
// pages/api/qa.ts
export const config = { runtime: "edge" };
export default async function handler(req, res) {
  const { prompt } = await req.json();
  const payload = { model: "gpt-3.5-turbo", messages: [{ role: "user", content: prompt }], stream: true };
  const stream = await OpenAIStream(payload);
  return new Response(stream);
}

Connect UI

Replace the default Nextra search bar with a modal that collects a question, calls /api/buildPrompt to get a contextual prompt, then streams the answer from /api/qa back to the UI.

// theme.config.tsx – Search component
const Search = () => {
  const [open, setOpen] = useState(false);
  const [question, setQuestion] = useState("");
  const [answer, setAnswer] = useState("");
  const answerQuestion = async (e) => {
    e.preventDefault();
    const promptRes = await fetch("/api/buildPrompt", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ prompt: question }) });
    const { prompt } = await promptRes.json();
    const resp = await fetch("/api/qa", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ prompt }) });
    const reader = resp.body.getReader();
    const decoder = new TextDecoder();
    let done = false;
    while (!done) {
      const { value, done: doneReading } = await reader.read();
      done = doneReading;
      setAnswer(prev => prev + decoder.decode(value));
    }
  };
  return (<>/* UI omitted for brevity */</>);
};

Conclusion

We created a Nextra documentation site, stored its content in Embedbase, built a semantic search API, generated a context‑aware prompt, streamed ChatGPT responses, and wired everything into a custom search modal.

Further Reading

Embedding enables semantic search, recommendation, classification, and generative retrieval across text, images, and multimodal data. Production considerations include storage infrastructure, cost optimization, user isolation, token limits, and integration with cloud services.

GitHub Action for Continuous Indexing

A simple workflow runs on every push to main, installs dependencies, and executes node scripts/sync.js to keep the Embedbase index up‑to‑date.

name: Index documentation
on:
  push:
    branches: [main]
jobs:
  index:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-node@v2
        with:
          node-version: 14
      - run: npm install
      - run: node scripts/sync.js
        env:
          EMBEDBASE_API_KEY: ${{ secrets.EMBEDBASE_API_KEY }}

With these pieces in place, the documentation site becomes an interactive knowledge base powered by ChatGPT.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Node.jsChatGPTEmbeddingAPINextra
Architect's Guide
Written by

Architect's Guide

Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.