
Inside GitHub Copilot: How Prompts, Models, and Telemetry Really Work

A University of Illinois researcher reverse‑engineers GitHub Copilot, revealing how its VS Code extension builds complex prompts, selects models, manages telemetry, and decides which suggestions to display, while exposing configuration details, prompt‑engineering pipelines, model calls, and privacy implications for developers.

21CTO

Jan 3, 2023


Reverse Engineering Copilot

To uncover the secrets inside Copilot, a researcher from the University of Illinois performed a rough reverse‑engineering of the extension and documented the findings in a blog post.

Copilot has become an indispensable coding partner for many developers. Former Tesla AI director Andrej Karpathy notes that it speeds up his coding dramatically, handling about 80% of his code with roughly 80% accuracy.

The author set out to answer specific questions about Copilot’s internal structure, including prompt format, model invocation, success‑rate measurement, data collection, and model size.

Copilot Overview

GitHub Copilot consists of two main components:

Client: The VS Code extension collects everything you type (the prompt) and sends it to a Codex‑like model, displaying the model’s response in the editor.

Model: A Codex‑style model receives the prompt and returns code completions.

Prompt Engineering

The extension builds a sophisticated prompt that includes a large amount of project‑specific information. A real prompt example shows a JSON object with a prefix containing file paths, imports, and surrounding code, a suffix with the code after the cursor, and metadata such as isFimEnabled and promptElementRanges.

Prompt generation follows several steps:

Entry point: extractPrompt(ctx, doc, insertPos) extracts the prompt based on the document and cursor position.

Retrieve the document’s relative path and language ID.

Gather up to 20 recently accessed files of the same language as “relevant documents”.

Configure options such as suffixPercent, fimSuffixLengthThreshold, and includeSiblingFunctions.

Build a “Prompt Wishlist” that prioritises elements (e.g., PathMarker, SimilarFile, BeforeCursor, etc.) and fills the token budget.

Append a suffix, possibly invoking Fill‑in‑Middle (FIM) mode when the suffix is non‑empty.
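The suffix step can be pictured with a minimal sketch. It assumes that suffixPercent reserves a share of the budget for the text after the cursor and that FIM is switched on whenever the suffix is non-empty. The names `split_at_cursor` and `RawPrompt`, the default values, and the use of characters rather than tokens as the budget unit are illustrative assumptions, not Copilot's actual code.

```python
from dataclasses import dataclass


@dataclass
class RawPrompt:
    prefix: str
    suffix: str
    is_fim_enabled: bool


def split_at_cursor(text: str, cursor: int,
                    budget_chars: int = 2000,
                    suffix_percent: int = 15) -> RawPrompt:
    """Split a document at the cursor and trim both halves to a budget.

    Assumption: the budget is counted in characters for simplicity;
    the real extension counts tokens.
    """
    suffix_budget = budget_chars * suffix_percent // 100
    prefix_budget = budget_chars - suffix_budget

    before, after = text[:cursor], text[cursor:]
    # Keep the text closest to the cursor on both sides.
    prefix = before[-prefix_budget:] if prefix_budget > 0 else ""
    suffix = after[:suffix_budget]
    # Assumption: FIM mode is used exactly when a suffix remains.
    return RawPrompt(prefix, suffix, is_fim_enabled=len(suffix) > 0)
```

With the cursor at the end of the file the suffix is empty, so this sketch falls back to plain left-to-right completion.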

The wishlist contains six element types: BeforeCursor, AfterCursor, SimilarFile, ImportedFile, LanguageMarker, and PathMarker. Elements are sorted by priority and token budget, and options like LanguageMarkerOption or NeighboringTabsOption influence ordering.
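One way to read the wishlist is as a greedy fill: take elements in priority order while they fit the token budget, then emit the chosen ones in their original order. The sketch below works under that assumption. The `Element` class, the priority numbers, and the whitespace-based `count_tokens` are hypothetical stand-ins, and the real extension's ordering rules are more involved.

```python
from dataclasses import dataclass


@dataclass
class Element:
    kind: str      # e.g. "PathMarker", "SimilarFile", "BeforeCursor"
    text: str
    priority: int  # assumption: a higher number means more important


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: whitespace-separated words.
    return len(text.split())


def fill_wishlist(elements: list[Element], token_budget: int) -> list[Element]:
    """Greedily keep the highest-priority elements that fit the budget."""
    remaining = token_budget
    chosen = []
    ranked = sorted(enumerate(elements), key=lambda pair: -pair[1].priority)
    for index, element in ranked:
        cost = count_tokens(element.text)
        if cost <= remaining:
            chosen.append((index, element))
            remaining -= cost
    # Emit the surviving elements in their original (document) order.
    chosen.sort(key=lambda pair: pair[0])
    return [element for _, element in chosen]
```

Under a tight budget, a large low-priority element such as a SimilarFile snippet is the first to be dropped, while the code before the cursor survives.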

Model Calls

Copilot offers two UI modes for completions:

Inline/GhostText: Requests 1‑3 suggestions, caches results, and may debounce requests when the user types quickly. It only sends a request if the cursor is at a suitable position (e.g., the character to the right of the cursor is a space or a closing brace).

Copilot Panel: Requests more samples (default 10) and does not apply the contextual filter because the user explicitly invoked it.

Both modes perform two pre‑display checks: duplicate suggestions are discarded, and suggestions that the user has already typed are ignored.
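The cursor-position rule and the two pre-display checks can be sketched as follows. This is an illustration only: the function names are hypothetical, the set of closing characters is assumed, and "already typed" is interpreted here as "the text after the cursor already starts with the suggestion".

```python
def should_request(line_after_cursor: str) -> bool:
    """Request a completion only if the rest of the line is whitespace
    or closing brackets (assumed set of closers: ")", "]", "}")."""
    return all(ch.isspace() or ch in ")]}" for ch in line_after_cursor)


def filter_suggestions(suggestions: list[str], text_after_cursor: str) -> list[str]:
    """Drop duplicate suggestions and ones the user has already typed."""
    seen = set()
    kept = []
    upcoming = text_after_cursor.lstrip()
    for suggestion in suggestions:
        normalized = suggestion.strip()
        if normalized in seen:
            continue  # duplicate of an earlier suggestion
        if upcoming.startswith(normalized):
            continue  # the document already contains this text
        seen.add(normalized)
        kept.append(suggestion)
    return kept
```

For example, `should_request("")` is true at the end of a line, while `should_request("foo)")` is false because an identifier follows the cursor.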

Telemetry

GitHub claims that about 40% of code written in popular languages (e.g., Python) is generated by Copilot. The author checked whether telemetry sends code snippets to the server and verified that it does unless the user opts out.

After a suggestion is accepted or rejected, Copilot captures a snapshot of the surrounding code 30 seconds later, extracting a “hypothetical prompt” and a “hypothetical completion”. This data may be used to improve the model, though the short time window suggests noisy data that GitHub can later clean.
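A rough sketch of this delayed capture is shown below, under loud assumptions: the window sizes, the dictionary keys, and the function names are invented for illustration, and the insertion offset is treated as fixed even though edits made during the 30 seconds would shift it in practice.

```python
import threading


def capture_snapshot(document: str, insertion_offset: int,
                     prompt_chars: int = 500,
                     completion_chars: int = 200) -> dict:
    """Cut a window around the insertion point out of the current text."""
    offset = max(0, min(insertion_offset, len(document)))
    return {
        # Text leading up to the insertion point.
        "hypotheticalPrompt": document[max(0, offset - prompt_chars):offset],
        # Text that now follows it, whatever the user ended up keeping.
        "hypotheticalCompletion": document[offset:offset + completion_chars],
    }


def schedule_snapshot(get_document, insertion_offset: int, on_snapshot,
                      delay_seconds: float = 30.0) -> threading.Timer:
    """Run capture_snapshot after a delay and hand the result to a callback."""
    def run():
        on_snapshot(capture_snapshot(get_document(), insertion_offset))

    timer = threading.Timer(delay_seconds, run)
    timer.start()
    return timer
```

Because the snapshot reads the document as it exists after the delay, it records what the user actually kept, not what the model originally suggested.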

Other Observations

The model used appears to be named cushman‑ml, which hints that Copilot may be running a model of roughly 12B parameters rather than a 175B‑parameter one. This is encouraging for open‑source efforts, since a model of that size is far more practical to train and serve.

The accompanying worker.js seems to provide a parallel implementation of prompt extraction.

Enabling Verbose Logging

To enable verbose logging, modify the extension’s extension.js (typically located under ~/.vscode/extensions/github.copilot-<version>/dist/extension.js) and change the shouldLog function to always return true. A ready‑made patch is linked from the original post.


Example prompt sent by the extension (prefix truncated):

{
  "prefix": "# Path: codeviz\\app.py\n# Compare this snippet from codeviz\\predictions.py:\n# import json\n# import sys\n# import time\n# from manifest import Manifest\n...",
  "suffix": "if __name__ == '__main__':\n    app.run(debug=True)",
  "isFimEnabled": true,
  "promptElementRanges": [
    { "kind": "PathMarker", "start": 0, "end": 23 },
    { "kind": "SimilarFile", "start": 23, "end": 2219 },
    { "kind": "BeforeCursor", "start": 2219, "end": 3142 }
  ]
}
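The promptElementRanges entries appear to be end-exclusive character offsets into the prefix: the PathMarker range 0 to 23 matches the 22 characters of "# Path: codeviz\app.py" plus its newline. Under that assumption, a short sketch can cut the prefix back into its labelled parts. The helper name and the shortened example prompt are illustrative, not taken from the extension.

```python
import json


def slice_prompt_elements(prompt: dict) -> list[tuple[str, str]]:
    """Return (kind, text) pairs by slicing the prefix with each range."""
    prefix = prompt["prefix"]
    return [
        (r["kind"], prefix[r["start"]:r["end"]])
        for r in prompt["promptElementRanges"]
    ]


# A shortened, made-up prompt in the same shape as the example above.
example = json.loads(r'''{
  "prefix": "# Path: codeviz\\app.py\nimport json\n",
  "promptElementRanges": [
    {"kind": "PathMarker", "start": 0, "end": 23},
    {"kind": "BeforeCursor", "start": 23, "end": 35}
  ]
}''')

for kind, text in slice_prompt_elements(example):
    print(kind, repr(text))
```

Slicing a captured prompt this way makes it easy to see how much of the budget each wishlist element actually received.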

Author: Parth Thakkar. Source: thakkarparth007 blog, compiled by the Machine Heart team. Original link: https://thakkarparth007.github.io/copilot-explorer/posts/copilot-internals

