How VS Code’s Semantic Tokens API Boosts Syntax Highlighting Performance
This article explains VS Code’s evolution from TextMate‑based syntax highlighting to the new Semantic Tokens Provider API, detailing performance improvements, underlying token encoding, and how language servers implement semantic highlighting for faster, more accurate code coloring.
VS Code 1.44 (the March 2020 release) shipped many features, including accessibility and Remote Development improvements. For many developers, though, the most exciting addition was the Semantic Tokens Provider API, which became fully supported from version 1.45 onward.
Syntax Highlighting
VS Code has traditionally used TextMate grammars: collections of regular expressions that split each line of code into tokens and assign them scopes. This works well for most users. However, the Oniguruma regex engine evaluates these expressions one line at a time, and it can be slow on the very long lines found in minified files (e.g., *.min.js and *.min.css), to the point of freezing the editor. VS Code therefore skips tokenizing lines longer than a configurable limit (the editor.maxTokenizationLineLength setting).
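The line-by-line, regex-driven model can be illustrated with a toy tokenizer in the spirit of a TextMate grammar. This is a minimal sketch, not VS Code's engine, and it rests on several assumptions. JavaScript regular expressions stand in for Oniguruma. The rules and scope names are hypothetical. MAX_LINE_LENGTH is a made-up constant that only imitates the idea of a long-line cutoff.

```javascript
// Toy, TextMate-inspired tokenizer (hypothetical rules and scope names).
// Real TextMate grammars run on Oniguruma and support begin/end rules with
// state carried from line to line; this sketch only matches single tokens.
const RULES = [
  { scope: "comment.line", regex: /\/\/.*/y },
  { scope: "keyword.control", regex: /\b(?:if|else|return|const)\b/y },
  { scope: "constant.numeric", regex: /\d+/y },
  { scope: "string.quoted", regex: /"(?:[^"\\]|\\.)*"/y },
  { scope: "variable.other", regex: /[A-Za-z_$][\w$]*/y },
];

// Hypothetical cutoff imitating VS Code's long-line limit.
const MAX_LINE_LENGTH = 20000;

// Tokenize one line: at each position, the first rule that matches wins.
function tokenizeLine(line) {
  if (line.length > MAX_LINE_LENGTH) return []; // leave huge lines unhighlighted
  const tokens = [];
  let pos = 0;
  while (pos < line.length) {
    let matched = false;
    for (const { scope, regex } of RULES) {
      regex.lastIndex = pos; // the sticky (y) flag anchors the match at pos
      const m = regex.exec(line);
      if (m !== null && m[0].length > 0) {
        tokens.push({ start: pos, length: m[0].length, scope });
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) pos++; // whitespace or punctuation: skip one character
  }
  return tokens;
}

// Grammars are applied line by line, so a document is a list of lines.
function tokenizeDocument(text) {
  return text.split("\n").map(tokenizeLine);
}
```

Every identifier ends up as variable.other, whatever it actually refers to. Regular expressions see only the spelling of the text, and that is the gap semantic highlighting fills.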
In version 1.45, the way VS Code uses Oniguruma's memory was optimized, which made highlighting of typical files roughly two to three times faster. Highlighting minified files, however, became about four times slower, so disabling syntax highlighting for such files is recommended.
Performance comparison for version 1.45 (Before and After are measured highlighting times, so lower is better):
File                 Size      Lines   Before  After  Difference
checker.ts           2.2 MB    38127   15990   5551   2.9x faster
sqlite3.c            7.7 MB    228634  23241   12189  1.9x faster
bootstrap.min.css    152.1 KB  7       1653    6900   4.1x slower
cpp.tmLanguage.json  477 KB    16221   416     139    3x faster
The accompanying images compare minimap rendering speed in versions 1.44, 1.45, and 1.53, showing a noticeable improvement.
Semantic Highlighting
Since VS Code was open-sourced in 2015, the community has asked for semantic highlighting. Semantic highlighting colors code using a language's symbol information rather than regular expressions. Implementing it typically requires two pieces: a language extension on the client side, and a language server based on the Language Server Protocol (LSP) that can supply token information once it has analyzed the whole project.
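The difference from regex matching can be sketched with a hypothetical symbol table. Instead of guessing from an identifier's spelling, the classifier below looks each name up in a table. In a real language server, that table would come from parsing and type-checking the project. Here it is hand-written and keyed only by name, ignoring scopes, to keep the example short. The token fields follow the shape used later in this article (line, startChar, length, tokenType, tokenModifiers).

```javascript
// Hypothetical symbol table. A language server would build this from a full
// analysis of the project; it is hand-written here for illustration.
const symbolTable = new Map([
  ["Point", { kind: "class", modifiers: [] }],
  ["origin", { kind: "variable", modifiers: ["readonly"] }],
  ["distance", { kind: "function", modifiers: [] }],
]);

// Emit a semantic token for every identifier the symbol table knows about.
function classifyLine(line, lineNumber) {
  const tokens = [];
  for (const m of line.matchAll(/[A-Za-z_$][\w$]*/g)) {
    const symbol = symbolTable.get(m[0]);
    if (symbol === undefined) continue; // keywords and unknown names are skipped
    tokens.push({
      line: lineNumber,
      startChar: m.index,
      length: m[0].length,
      tokenType: symbol.kind,
      tokenModifiers: symbol.modifiers,
    });
  }
  return tokens;
}

const line = "const d = distance(origin, new Point(1, 2));";
console.log(classifyLine(line, 0).map((t) => t.tokenType).join(", "));
// prints "function, variable, class"
```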
Before official support existed, extensions such as vscode-ccls added a custom message, $ccls/publishSemanticHighlight, to obtain semantic information from the ccls server and apply custom colors.
Semantic Tokens Provider API
The semantic tokens protocol, now part of LSP 3.16, defines how language servers return token data. On the VS Code side, an extension registers a DocumentSemanticTokensProvider with vscode.languages.registerDocumentSemanticTokensProvider and supplies a SemanticTokensLegend listing the token types and modifiers it supports.
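The provider interface has three members, of which only the first is required:
provideDocumentSemanticTokens – returns the encoded tokens for the whole document.
provideDocumentSemanticTokensEdits – returns edits against a previous result (identified by its resultId), so that only the changes have to be transferred.
onDidChangeSemanticTokens – an event the provider fires to tell the editor that its tokens have changed and should be requested again.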
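A provider that implements provideDocumentSemanticTokensEdits has to describe how the new token array differs from the one it returned earlier. Here is a minimal sketch of one possible strategy: a common-prefix/common-suffix diff. This is not necessarily what VS Code or any particular server does. The diff produces a single splice in the { start, deleteCount, data } shape of LSP's SemanticTokensEdit. The helper names diffTokenData and applyEdits are hypothetical. The sample arrays use the relative encoding described later in this article.

```javascript
// Sketch of the work behind provideDocumentSemanticTokensEdits: describe the
// change from the previous token data to the new data as one splice in the
// { start, deleteCount, data } shape of an LSP SemanticTokensEdit.
// diffTokenData and applyEdits are hypothetical helpers.
function diffTokenData(previous, current) {
  const minLength = Math.min(previous.length, current.length);

  // Count the integers shared at the front.
  let start = 0;
  while (start < minLength && previous[start] === current[start]) start++;

  // Count the integers shared at the back, without overlapping the prefix.
  let suffix = 0;
  while (
    suffix < minLength - start &&
    previous[previous.length - 1 - suffix] === current[current.length - 1 - suffix]
  ) {
    suffix++;
  }

  if (start === previous.length && start === current.length) return []; // no change

  return [{
    start,
    deleteCount: previous.length - start - suffix,
    data: Array.from(current.slice(start, current.length - suffix)),
  }];
}

// Apply edits to the previous data, as the editor would on receiving them.
function applyEdits(previous, edits) {
  const result = Array.from(previous);
  // Work from the highest offset down so earlier offsets stay valid.
  const ordered = [...edits].sort((a, b) => b.start - a.start);
  for (const { start, deleteCount, data } of ordered) {
    result.splice(start, deleteCount, ...(data ?? []));
  }
  return result;
}

const before = [2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0];
const after  = [3,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0]; // e.g. one line inserted at the top
const edits = diffTokenData(before, after);
console.log(JSON.stringify(edits)); // prints [{"start":0,"deleteCount":1,"data":[3]}]
```

A single prefix/suffix splice is cheap to compute and is enough for typical edits, which touch one contiguous region of the token array. Scattered changes would simply produce one larger splice.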
Each token consists of five properties (line, startChar, length, tokenType, tokenModifiers). To save memory, VS Code does not keep tokens as objects. Instead, it encodes them into a flat array of integers (a Uint32Array in the extension API), five integers per token. Example legend:
<code>{
  tokenTypes: ['property', 'type', 'class'],
  tokenModifiers: ['private', 'static']
}</code>
Sample token list before encoding:
<code>[
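  { line: 2, startChar: 5, length: 3, tokenType: "property", tokenModifiers: ["private", "static"] },
  { line: 2, startChar: 10, length: 4, tokenType: "type", tokenModifiers: [] },
  { line: 5, startChar: 2, length: 7, tokenType: "class", tokenModifiers: [] }
]</code>
Encoding replaces each token type with its index in the legend (property = 0, type = 1, class = 2). It combines the modifiers into a bitmask in which bit i is set for the modifier at index i (private = 1, static = 2, so both together = 3). Positions are stored relative to the previous token. deltaLine is the difference in line numbers. deltaStartChar is the offset from the previous token's start on the same line, or from column 0 when the line changes. Only the numbers are kept, five per token, in a flat array:
<code>[2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0]</code>
Without relative positions, the same tokens would encode as [2,5,3,0,3, 2,10,4,1,0, 5,2,7,2,0]. The relative form pays off when lines are inserted or deleted: only the deltaLine of the first token after the edit changes, instead of the line number of every token below it.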
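The encoding steps can be checked with a short encoder. This is a sketch written for illustration, not VS Code's internal code. encodeTokens is a hypothetical helper, and it reuses the legend and token list from the example above.

```javascript
// Legend from the example above: a token type's array index is its id, and a
// modifier's index is its bit position in the bitmask.
const legend = {
  tokenTypes: ["property", "type", "class"],
  tokenModifiers: ["private", "static"],
};

// Encode tokens into the flat, relative format:
// [deltaLine, deltaStartChar, length, tokenType, tokenModifiers] per token.
function encodeTokens(tokens, { tokenTypes, tokenModifiers }) {
  // Relative offsets only make sense for tokens in document order.
  const sorted = [...tokens].sort((a, b) => a.line - b.line || a.startChar - b.startChar);
  const data = [];
  let prevLine = 0;
  let prevStart = 0;
  for (const t of sorted) {
    const typeIndex = tokenTypes.indexOf(t.tokenType);
    if (typeIndex < 0) throw new Error(`unknown token type: ${t.tokenType}`);
    let bits = 0;
    for (const mod of t.tokenModifiers) {
      const bit = tokenModifiers.indexOf(mod);
      if (bit < 0) throw new Error(`unknown modifier: ${mod}`);
      bits |= 1 << bit;
    }
    const deltaLine = t.line - prevLine;
    // Same line: start is relative to the previous token; new line: to column 0.
    const deltaStart = deltaLine === 0 ? t.startChar - prevStart : t.startChar;
    data.push(deltaLine, deltaStart, t.length, typeIndex, bits);
    prevLine = t.line;
    prevStart = t.startChar;
  }
  return data;
}

const tokens = [
  { line: 2, startChar: 5, length: 3, tokenType: "property", tokenModifiers: ["private", "static"] },
  { line: 2, startChar: 10, length: 4, tokenType: "type", tokenModifiers: [] },
  { line: 5, startChar: 2, length: 7, tokenType: "class", tokenModifiers: [] },
];
console.log(JSON.stringify(encodeTokens(tokens, legend)));
// prints [2,5,3,0,3,0,5,4,1,0,3,2,7,2,0]

// Insert a blank line at the top: every token moves down one line.
const shifted = tokens.map((t) => ({ ...t, line: t.line + 1 }));
console.log(JSON.stringify(encodeTokens(shifted, legend)));
// prints [3,5,3,0,3,0,5,4,1,0,3,2,7,2,0]
```

The second call shows the benefit of relative positions. All three tokens moved down a line, but only the first integer of the array changed.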
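Decoding reverses the bitmask: each bit that is set selects the modifier at that index in the legend. This example uses a separate, five-entry modifier legend:
<code>// Modifier legend for this example; the array index is the bit position.
const tokenModifiers = ['method', 'interface', 'async', 'static', 'class'];

// Returns the names of all modifiers whose bit is set in the bitmask `res`.
function decodeModifiers(res) {
  return tokenModifiers.filter((_, m) => (res & (1 << m)) !== 0);
}

// decodeModifiers(12) // ['async', 'static'] (12 = 0b1100, bits 2 and 3)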
</code>
VS Code also offers DocumentRangeSemanticTokensProvider, which computes tokens for a specific range of the document rather than the whole file. If a language registers both a DocumentRangeSemanticTokensProvider and a DocumentSemanticTokensProvider, the range provider is called only once. After that, highlighting comes from the document provider alone.
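Going the other way, the modifier decoding shown earlier extends to the whole token array. The sketch below rebuilds absolute positions by accumulating deltaLine and resetting the start column whenever the line changes. decodeTokens is a hypothetical helper, not part of the VS Code API, and the example uses the legend and data from the encoding example.

```javascript
// Sketch of a decoder for the flat, relative token array (decodeTokens is a
// hypothetical helper). The legend is the one from the encoding example.
const legend = {
  tokenTypes: ["property", "type", "class"],
  tokenModifiers: ["private", "static"],
};

function decodeTokens(data, { tokenTypes, tokenModifiers }) {
  if (data.length % 5 !== 0) throw new Error("data length must be a multiple of 5");
  const tokens = [];
  let line = 0;
  let startChar = 0;
  for (let i = 0; i < data.length; i += 5) {
    const [deltaLine, deltaStart, length, typeIndex, bits] = data.slice(i, i + 5);
    line += deltaLine;
    // deltaStart is relative to the previous token only on the same line.
    startChar = deltaLine === 0 ? startChar + deltaStart : deltaStart;
    tokens.push({
      line,
      startChar,
      length,
      tokenType: tokenTypes[typeIndex],
      // Same bit test as decodeModifiers: bit m set means modifier m applies.
      tokenModifiers: tokenModifiers.filter((_, m) => (bits & (1 << m)) !== 0),
    });
  }
  return tokens;
}

const decoded = decodeTokens([2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0], legend);
console.log(JSON.stringify(decoded[0]));
// prints {"line":2,"startChar":5,"length":3,"tokenType":"property","tokenModifiers":["private","static"]}
```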
Conclusion
Since semantic highlighting was first proposed for LSP in 2016, the protocol has grown to hundreds of language implementations, with support in editors such as Eclipse, Sublime Text, Atom, Theia, Vim, and Emacs. Implementing LSP-based semantic highlighting is now standard practice when building modern IDEs and language extensions.
References
2016-06 – Community Feature Request for LSP semantic highlighting support.
2018-08 – Theia proposal for semantic highlighting protocol extension.
2019-12 – VS Code announces Semantic Tokens API.
2020-01 – VS Code adds Semantic Tokens API implementation.
2020-03 – VS Code 1.44 officially supports Semantic Tokens API.
2020 – LSP 3.16 release adds Semantic Tokens protocol details.
Taobao Frontend Technology