
How VS Code’s Semantic Tokens API Boosts Syntax Highlighting Performance

This article explains VS Code’s evolution from TextMate‑based syntax highlighting to the new Semantic Tokens Provider API, detailing performance improvements, underlying token encoding, and how language servers implement semantic highlighting for faster, more accurate code coloring.

Taobao Frontend Technology

Mar 19, 2021

VS Code 1.44, released in March 2020, brought many improvements, including updates to accessibility and Remote Development. For many developers, though, the most exciting addition was the Semantic Tokens Provider API, which has been fully supported since version 1.45.

Syntax Highlighting

VS Code traditionally uses TextMate grammars: collections of regular expressions that split each line of code into tokens and assign them scopes. This works well for most files. However, matching is done one line at a time with the Oniguruma regex engine, so large minified files with extremely long lines (e.g., *.min.js, *.min.css) can be slow to highlight and may even freeze the editor. VS Code now offers an option to stop highlighting files with excessively long lines.
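The following is a deliberately simplified, hypothetical model of regex-based highlighting. It makes the contrast with semantic highlighting concrete: a few patterns are tried at each position of a single line, and each match is tagged with a scope. The rule set, the scope names, and the tokenizeLine function are illustrative assumptions. Real TextMate grammars also have begin/end rules, nested patterns, and state carried from one line to the next, none of which is modeled here.

```typescript
// Hypothetical, simplified model of regex-based (TextMate-style) highlighting.
// Not the real TextMate engine: no begin/end rules, no nesting, no line state.
interface Rule {
  regex: RegExp; // sticky (/y) so it only matches exactly at lastIndex
  scope: string;
}

interface ScopedToken {
  start: number; // inclusive offset within the line
  end: number;   // exclusive offset within the line
  scope: string;
}

// Rules are tried in order; the first one that matches at a position wins.
const rules: Rule[] = [
  { regex: /\b(?:const|let|function|return)\b/y, scope: "keyword" },
  { regex: /[A-Za-z_$][\w$]*/y, scope: "variable" },
  { regex: /\d+/y, scope: "constant.numeric" },
  { regex: /"[^"]*"/y, scope: "string" },
];

// Tokenize a single line. Characters no rule matches (spaces, punctuation)
// are skipped and get no scope.
function tokenizeLine(line: string): ScopedToken[] {
  const tokens: ScopedToken[] = [];
  let pos = 0;
  while (pos < line.length) {
    let matched = false;
    for (const rule of rules) {
      rule.regex.lastIndex = pos;
      const m = rule.regex.exec(line);
      if (m !== null && m[0].length > 0) {
        tokens.push({ start: pos, end: pos + m[0].length, scope: rule.scope });
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) pos++;
  }
  return tokens;
}
```

In tokenizeLine("const x = 42;"), the name x is tagged variable purely because of its shape. Nothing in this approach knows whether x is a parameter, a class, or a constant, and that is the gap semantic highlighting fills.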

In version 1.45, Oniguruma's memory usage was optimized, making highlighting up to three times faster. Highlighting minified files, however, became about four times slower, so disabling syntax highlighting for such files is recommended.

Performance comparison for version 1.45 (Before and After are timings, so lower is faster):

File                 Size      Lines    Before   After   Difference
checker.ts           2.2MB     38127    15990    5551    2.9x faster
sqlite3.c            7.7MB     228634   23241    12189   1.9x faster
bootstrap.min.css    152.1KB   7        1653     6900    4.1x slower
cpp.tmLanguage.json  477KB     16221    416      139     3x faster

[Figure: minimap rendering speed in VS Code 1.44, 1.45, and 1.53, showing a noticeable improvement]

Semantic Highlighting

Since VS Code became open source in 2015, the community has pushed for semantic highlighting, which relies on language-specific symbol tables rather than regular expressions. Implementing it typically requires a language extension plus a language server based on the Language Server Protocol (LSP). The server can supply token information after analyzing the whole project.

Before official support existed, extensions such as vscode-ccls used a custom LSP message, $ccls/publishSemanticHighlight, to receive semantic information from the ccls language server and apply their own colors.

Semantic Tokens Provider API

Semantic tokens are now part of LSP 3.16, which specifies how language servers return token data. In VS Code, an extension registers a DocumentSemanticTokensProvider with vscode.languages.registerDocumentSemanticTokensProvider. It also passes a SemanticTokensLegend that lists the token types and modifiers the provider uses.

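The provider has three members; only the first is required:

provideDocumentSemanticTokens – returns the encoded tokens for the whole document.

provideDocumentSemanticTokensEdits – returns incremental edits relative to a previous result (identified by its resultId) instead of a full set of tokens.

onDidChangeSemanticTokens – an event the provider fires to tell VS Code that its tokens have changed and should be requested again.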
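provideDocumentSemanticTokensEdits lets a provider answer with edits against the integer array it returned last time (the flat encoding described below) instead of resending everything. In VS Code's API, each edit is a SemanticTokensEdit with start, deleteCount, and optional data fields. The sketch below computes a single such edit by trimming the common prefix and suffix of the old and new arrays. The names diffTokens, applyEdits, and TokensEdit are hypothetical. The prefix/suffix strategy is just one simple approach, not necessarily what VS Code or any particular language server uses.

```typescript
// Hypothetical local mirror of the shape of vscode.SemanticTokensEdit.
interface TokensEdit {
  start: number;       // index into the previous integer array
  deleteCount: number; // how many integers to remove at `start`
  data: number[];      // integers to insert at `start`
}

// Compute at most one edit that turns `prev` into `next` by skipping
// their longest common prefix and longest common suffix.
function diffTokens(prev: ArrayLike<number>, next: ArrayLike<number>): TokensEdit[] {
  const minLen = Math.min(prev.length, next.length);
  let prefix = 0;
  while (prefix < minLen && prev[prefix] === next[prefix]) prefix++;
  if (prefix === prev.length && prefix === next.length) return []; // identical

  // The suffix may not overlap the prefix in either array.
  let suffix = 0;
  while (
    suffix < minLen - prefix &&
    prev[prev.length - 1 - suffix] === next[next.length - 1 - suffix]
  ) {
    suffix++;
  }

  const data: number[] = [];
  for (let i = prefix; i < next.length - suffix; i++) data.push(next[i]);
  return [{ start: prefix, deleteCount: prev.length - prefix - suffix, data }];
}

// Apply edits whose offsets refer to `prev` (assumed sorted, non-overlapping).
function applyEdits(prev: ArrayLike<number>, edits: TokensEdit[]): number[] {
  const out = Array.from(prev);
  // Apply from the last edit backwards so earlier offsets stay valid.
  for (const e of [...edits].reverse()) {
    out.splice(e.start, e.deleteCount, ...e.data);
  }
  return out;
}
```

For example, diffing [2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0] against the same array with a blank line inserted before the last token ([2,5,3,0,3, 0,5,4,1,0, 4,2,7,2,0]) yields one edit that replaces a single integer at index 10.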

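Each token has five properties: line, startChar, length, tokenType, and tokenModifiers. To save memory, VS Code does not keep tokens as objects. Instead, it encodes each one as five consecutive integers in a flat array of unsigned integers (a Uint32Array). For example, given this legend: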

{
  tokenTypes: ['property', 'type', 'class'],
  tokenModifiers: ['private', 'static']
}

Sample token list before encoding:

[
  { line: 2, startChar: 5, length: 3, tokenType: "property", tokenModifiers: ["private","static"] },
  { line: 2, startChar: 10, length: 4, tokenType: "type", tokenModifiers: [] },
  { line: 5, startChar: 2, length: 7, tokenType: "class", tokenModifiers: [] }
]

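Encoding works in three steps:
- Each token type is replaced with its index in the legend's tokenTypes.
- The token's modifiers are combined into a bitmask, where bit i is set for tokenModifiers[i]. For example, private + static = 0b11 = 3.
- Each position is stored relative to the previous token. deltaLine is the line difference. deltaStartChar is relative to the previous token's start when both tokens are on the same line; otherwise it is the absolute start character.

The result is a flat array:

[2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0]

Relative positions also keep updates small. When a line is inserted, only the deltaLine of the first token after the insertion point changes; every other integer stays the same.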
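The encoding steps above can be sketched in TypeScript as follows. The Legend and Token interfaces and the encodeTokens function are hypothetical names chosen for this sketch; they mirror the shapes in the example legend and sample tokens. The sketch assumes zero-based line and character positions, as VS Code uses.

```typescript
// Hypothetical types for this sketch; VS Code's own API uses
// vscode.SemanticTokensLegend and vscode.SemanticTokensBuilder instead.
interface Legend {
  tokenTypes: string[];
  tokenModifiers: string[];
}

interface Token {
  line: number;      // zero-based line number
  startChar: number; // zero-based character offset within the line
  length: number;
  tokenType: string;
  tokenModifiers: string[];
}

// Encode tokens as five integers each:
// [deltaLine, deltaStartChar, length, tokenTypeIndex, modifierBitmask]
function encodeTokens(tokens: Token[], legend: Legend): Uint32Array {
  // Deltas are only meaningful if tokens are in document order.
  const sorted = [...tokens].sort(
    (a, b) => a.line - b.line || a.startChar - b.startChar,
  );
  const data = new Uint32Array(sorted.length * 5);
  let prevLine = 0;
  let prevChar = 0;

  sorted.forEach((t, i) => {
    const typeIndex = legend.tokenTypes.indexOf(t.tokenType);
    if (typeIndex < 0) throw new Error(`unknown token type: ${t.tokenType}`);

    // Set bit n for the n-th modifier in the legend.
    let modifierBits = 0;
    for (const m of t.tokenModifiers) {
      const bit = legend.tokenModifiers.indexOf(m);
      if (bit < 0) throw new Error(`unknown token modifier: ${m}`);
      modifierBits |= 1 << bit;
    }

    const deltaLine = t.line - prevLine;
    // startChar is relative to the previous token only on the same line.
    const deltaStart = deltaLine === 0 ? t.startChar - prevChar : t.startChar;
    data.set([deltaLine, deltaStart, t.length, typeIndex, modifierBits], i * 5);

    prevLine = t.line;
    prevChar = t.startChar;
  });
  return data;
}
```

Running it on the example legend and the three sample tokens produces [2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0]. Moving the last token from line 5 to line 6 changes only the integer at index 10 (its deltaLine). In a real extension, vscode.SemanticTokensBuilder can take care of this bookkeeping.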

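To decode a modifier bitmask, test each bit against the legend's modifier list (this example uses a different, illustrative list):

const tokenModifiers = ['method', 'interface', 'async', 'static', 'class'];
// Return the name of every modifier whose bit is set in the bitmask.
function decodeModifiers(bitmask) {
  return tokenModifiers.filter((_, bit) => (bitmask & (1 << bit)) !== 0);
}
decodeModifiers(12); // 12 = 0b01100, bits 2 and 3 -> ['async', 'static']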
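Going the other way, a client can turn the flat array back into absolute tokens. It accumulates the deltas and applies the same bit test to the modifiers. decodeTokens and its types are hypothetical names for this sketch, which assumes the five-integers-per-token layout described above.

```typescript
// Hypothetical types for this sketch.
interface Legend {
  tokenTypes: string[];
  tokenModifiers: string[];
}

interface DecodedToken {
  line: number;
  startChar: number;
  length: number;
  tokenType: string;
  tokenModifiers: string[];
}

// Decode [deltaLine, deltaStartChar, length, typeIndex, modifierBits] groups
// back into tokens with absolute positions.
function decodeTokens(data: ArrayLike<number>, legend: Legend): DecodedToken[] {
  if (data.length % 5 !== 0) {
    throw new Error("token data length must be a multiple of 5");
  }
  const result: DecodedToken[] = [];
  let line = 0;
  let startChar = 0;

  for (let i = 0; i < data.length; i += 5) {
    const deltaLine = data[i];
    line += deltaLine;
    // On a new line the start offset is absolute; otherwise it is relative.
    startChar = deltaLine === 0 ? startChar + data[i + 1] : data[i + 1];

    const tokenType = legend.tokenTypes[data[i + 3]];
    if (tokenType === undefined) throw new Error(`bad type index ${data[i + 3]}`);

    const bits = data[i + 4];
    result.push({
      line,
      startChar,
      length: data[i + 2],
      tokenType,
      tokenModifiers: legend.tokenModifiers.filter((_, bit) => (bits & (1 << bit)) !== 0),
    });
  }
  return result;
}
```

Decoding [2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0] with the property/type/class legend reproduces the three-token sample list shown earlier.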

VS Code also offers DocumentRangeSemanticTokensProvider, which returns tokens for a specific range of a document, such as the part currently visible. If a language registers both a DocumentRangeSemanticTokensProvider and a DocumentSemanticTokensProvider, the range provider is used only until the document provider returns its first result. After that, the range tokens are discarded and only the document provider is used.

Conclusion

Since semantic highlighting was first proposed for LSP in 2016, support has grown to hundreds of language implementations across editors such as Eclipse, Sublime Text, Atom, Theia, Vim, and Emacs. Implementing LSP-based semantic highlighting is now standard practice when building modern IDEs and language extensions.



Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

