
How to Build a Simple JSON Parser in Java: A Step‑by‑Step Guide

This article walks through the design and implementation of a lightweight JSON parser in Java. It covers lexical and syntax analysis, token definitions, the core parsing algorithm, testing, and a brief demonstration of JSON beautification, giving developers a clear picture of how JSON processing works.

Programmer DD

Background

JSON (JavaScript Object Notation) is a lightweight data interchange format. Compared with XML, JSON is usually easier to read and more compact, which has made it popular in web development. Understanding the format and how it is parsed is worthwhile for any developer who works with it.

JSON Parser Implementation Principles

A JSON parser is essentially a state machine built from the JSON grammar rules: it takes a JSON string as input and produces a JSON object (or array) as output. Parsing happens in two phases, lexical analysis and syntax analysis.

Example JSON string:

{
    "name": "小明",
    "age": 18
}

Lexical analysis yields tokens such as:

{, name, :, 小明, ,, age, :, 18, }

Lexical analyzer input/output diagram.

After tokenization, syntax analysis checks the token sequence against JSON grammar to ensure structural validity.

Lexical Analysis

Lexical analysis converts a JSON string into a stream of tokens according to construction rules. JSON defines the following token types:

BEGIN_OBJECT ({)

END_OBJECT (})

BEGIN_ARRAY ([)

END_ARRAY (])

NULL (null)

NUMBER

STRING

BOOLEAN (true/false)

SEP_COLON (:)

SEP_COMMA (,)

TokenType enum definition:

public enum TokenType {
    BEGIN_OBJECT(1),
    END_OBJECT(2),
    BEGIN_ARRAY(4),
    END_ARRAY(8),
    NULL(16),
    NUMBER(32),
    STRING(64),
    BOOLEAN(128),
    SEP_COLON(256),
    SEP_COMMA(512),
    END_DOCUMENT(1024);
    // constructor and getter omitted for brevity
}

Token class encapsulates type and literal value:

public class Token {
    private TokenType tokenType;
    private String value;
    // other code omitted
}

CharReader reads characters from a Reader:

public class CharReader {
    public char peek() throws IOException { /* ... */ }
    public char next() throws IOException { /* ... */ }
    public void back() { /* ... */ }
    public boolean hasMore() throws IOException { /* ... */ }
    // other methods omitted
}
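The reader's method bodies are omitted above. As a rough illustration of the contract only, here is a minimal sketch built on `java.io.PushbackReader`. The class name `SimpleCharReader` is hypothetical, and unlike the signature shown above, `back()` here can throw `IOException`; the original implementation may buffer input quite differently.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;

// Hypothetical stand-in for the article's CharReader, not the original implementation.
class SimpleCharReader {
    // Capacity 2 so that back() still works after a peek() has pushed one char back.
    private final PushbackReader in;
    private int last = -1; // most recently consumed character, or -1 if none

    SimpleCharReader(Reader reader) {
        this.in = new PushbackReader(reader, 2);
    }

    /** Returns true if at least one more character is available. */
    boolean hasMore() throws IOException {
        int c = in.read();
        if (c == -1) {
            return false;
        }
        in.unread(c);
        return true;
    }

    /** Returns the next character without consuming it. */
    char peek() throws IOException {
        int c = in.read();
        if (c == -1) {
            throw new IOException("Unexpected end of input");
        }
        in.unread(c);
        return (char) c;
    }

    /** Consumes and returns the next character. */
    char next() throws IOException {
        int c = in.read();
        if (c == -1) {
            throw new IOException("Unexpected end of input");
        }
        last = c;
        return (char) c;
    }

    /** Un-reads the character returned by the most recent next() call (one step only). */
    void back() throws IOException {
        if (last == -1) {
            throw new IllegalStateException("Nothing to push back");
        }
        in.unread(last);
        last = -1;
    }
}
```

A tokenizer typically calls `peek()` to decide which kind of token starts here, `next()` to consume characters, and `back()` when it has read one character too far (for example, the character that terminates a number).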

Tokenizer uses CharReader to produce a TokenList:

public class Tokenizer {
    public TokenList tokenize(CharReader charReader) throws IOException { /* ... */ }
    private Token start() throws IOException { /* ... */ }
    // other helper methods omitted
}

Key lexical rule: the tokenizer decides the token type from the first character it sees. For example, '{' → BEGIN_OBJECT, 'n' → NULL, 't' or 'f' → BOOLEAN, '"' → STRING, and a digit or '-' → NUMBER. It then reads the rest of the token accordingly.
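The first-character dispatch can be sketched as follows. This is a simplified illustration, not the article's Tokenizer: the class `MiniTokenizer` and its helpers are hypothetical, it reads from a `String` rather than a `CharReader`, keeps escape sequences verbatim, and does not validate number syntax.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified tokenizer used only to illustrate first-character dispatch.
class MiniTokenizer {
    enum Type { BEGIN_OBJECT, END_OBJECT, BEGIN_ARRAY, END_ARRAY, NULL, NUMBER,
                STRING, BOOLEAN, SEP_COLON, SEP_COMMA, END_DOCUMENT }

    static final class Token {
        final Type type;
        final String value;
        Token(Type type, String value) { this.type = type; this.value = value; }
        @Override public String toString() { return type + "(" + value + ")"; }
    }

    private final String src;
    private int pos = 0;

    MiniTokenizer(String src) { this.src = src; }

    List<Token> tokenize() {
        List<Token> tokens = new ArrayList<>();
        while (true) {
            while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
            if (pos >= src.length()) {
                tokens.add(new Token(Type.END_DOCUMENT, null));
                return tokens;
            }
            tokens.add(start());
        }
    }

    // Decide the token type from the first character, then read the rest of the token.
    private Token start() {
        char c = src.charAt(pos);
        switch (c) {
            case '{': return single(Type.BEGIN_OBJECT);
            case '}': return single(Type.END_OBJECT);
            case '[': return single(Type.BEGIN_ARRAY);
            case ']': return single(Type.END_ARRAY);
            case ':': return single(Type.SEP_COLON);
            case ',': return single(Type.SEP_COMMA);
            case 'n': return literal("null", Type.NULL);
            case 't': return literal("true", Type.BOOLEAN);
            case 'f': return literal("false", Type.BOOLEAN);
            case '"': return readString();
            default:
                if (c == '-' || (c >= '0' && c <= '9')) return readNumber();
                throw new IllegalArgumentException("Unexpected character '" + c + "' at " + pos);
        }
    }

    private Token single(Type type) {
        return new Token(type, String.valueOf(src.charAt(pos++)));
    }

    private Token literal(String word, Type type) {
        if (!src.startsWith(word, pos)) {
            throw new IllegalArgumentException("Invalid literal at " + pos);
        }
        pos += word.length();
        return new Token(type, word);
    }

    // Simplification: escape sequences are kept verbatim, not decoded or validated.
    private Token readString() {
        int begin = ++pos; // skip the opening quote
        while (pos < src.length() && src.charAt(pos) != '"') {
            if (src.charAt(pos) == '\\') pos++; // skip the escaped character
            pos++;
        }
        if (pos >= src.length()) throw new IllegalArgumentException("Unterminated string");
        return new Token(Type.STRING, src.substring(begin, pos++)); // pos++ skips the closing quote
    }

    // Simplification: consumes any run of number characters without checking their order.
    private Token readNumber() {
        int begin = pos;
        while (pos < src.length() && "+-.eE0123456789".indexOf(src.charAt(pos)) >= 0) pos++;
        return new Token(Type.NUMBER, src.substring(begin, pos));
    }
}
```

Calling `new MiniTokenizer("{\"id\": 1}").tokenize()` yields BEGIN_OBJECT, STRING, SEP_COLON, NUMBER, END_OBJECT, and a final END_DOCUMENT marker.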

Syntax Analysis

Syntax analysis consumes the token list and builds JsonObject or JsonArray structures according to the grammar:

object = {} | { members }
members = pair | pair , members
pair = string : value
array = [] | [ elements ]
elements = value | value , elements
value = string | number | object | array | true | false | null
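To see how the grammar maps onto code, the sketch below writes one method per rule and only checks validity (it builds no objects). It is an illustration under assumptions: the class `GrammarRecognizer` is hypothetical, tokens are represented as plain strings such as "string" and "number", and any value is accepted at the top level.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical recognizer: one method per grammar rule; tokens are plain strings here.
class GrammarRecognizer {
    private final List<String> tokens;
    private int pos = 0;

    GrammarRecognizer(List<String> tokens) { this.tokens = tokens; }

    /** Returns true if the whole token sequence forms exactly one valid value. */
    static boolean accepts(String... tokens) {
        GrammarRecognizer r = new GrammarRecognizer(Arrays.asList(tokens));
        try {
            r.value();
            return r.pos == r.tokens.size();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<end>";
    }

    private void expect(String kind) {
        if (!peek().equals(kind)) {
            throw new IllegalStateException("Expected " + kind + " but found " + peek());
        }
        pos++;
    }

    // value = string | number | object | array | true | false | null
    private void value() {
        switch (peek()) {
            case "{": object(); break;
            case "[": array(); break;
            case "string": case "number": case "true": case "false": case "null":
                pos++;
                break;
            default:
                throw new IllegalStateException("Unexpected token " + peek());
        }
    }

    // object = {} | { members }
    private void object() {
        expect("{");
        if (!peek().equals("}")) members();
        expect("}");
    }

    // members = pair | pair , members
    private void members() {
        pair();
        if (peek().equals(",")) { pos++; members(); }
    }

    // pair = string : value
    private void pair() {
        expect("string");
        expect(":");
        value();
    }

    // array = [] | [ elements ]
    private void array() {
        expect("[");
        if (!peek().equals("]")) elements();
        expect("]");
    }

    // elements = value | value , elements
    private void elements() {
        value();
        if (peek().equals(",")) { pos++; elements(); }
    }
}
```

For instance, `GrammarRecognizer.accepts("{", "string", ":", "number", "}")` returns true, while a trailing comma as in `accepts("{", "string", ":", "number", ",", "}")` is rejected because `members` requires another pair after the comma.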

JsonObject and JsonArray helper classes:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class JsonObject {
    private Map<String, Object> map = new HashMap<>();
    public void put(String key, Object value) { map.put(key, value); }
    public Object get(String key) { return map.get(key); }
    // other methods omitted
}

public class JsonArray implements Iterable<Object> {
    private List<Object> list = new ArrayList<>();
    public void add(Object obj) { list.add(obj); }
    public Object get(int index) { return list.get(index); }
    // required by Iterable; delegates to the backing list
    @Override
    public Iterator<Object> iterator() { return list.iterator(); }
    // other methods omitted
}

Core parsing method parseJsonObject processes tokens recursively, handling objects, arrays, literals, and enforcing expected token types.

Parsing flow summary:

Read a token and verify it matches the expected type.

If valid, update the expected token set; otherwise, throw an exception.

Repeat until all tokens are consumed or an error occurs.

Example token sequence {, id, :, 1, } demonstrates how the parser transitions between expected token states.
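The power-of-two codes in TokenType make it possible to hold a whole set of expected token types in a single int and to test membership with a bitwise AND. The sketch below traces that idea for flat objects only. It is an assumption-laden illustration, not the article's parseJsonObject: the class `ExpectedTokenDemo`, the helper `t`, and the key-tracking logic are hypothetical, and only string and integer values are handled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo; only the power-of-two codes mirror the article's TokenType values.
class ExpectedTokenDemo {
    static final int BEGIN_OBJECT = 1, END_OBJECT = 2, NUMBER = 32, STRING = 64,
                     SEP_COLON = 256, SEP_COMMA = 512, END_DOCUMENT = 1024;

    static final class Token {
        final int type;
        final String value;
        Token(int type, String value) { this.type = type; this.value = value; }
    }

    static Token t(int type, String value) { return new Token(type, value); }

    /** Parses a flat object whose values are strings or integers. */
    static Map<String, Object> parseFlatObject(List<Token> tokens) {
        Map<String, Object> result = new LinkedHashMap<>();
        int expected = BEGIN_OBJECT; // bit set of token types allowed next
        String key = null;           // pending key, non-null while waiting for its value
        for (Token token : tokens) {
            // Step 1: verify the token against the expected set.
            if ((expected & token.type) == 0) {
                throw new IllegalStateException("Unexpected token: " + token.value);
            }
            // Step 2: consume it and update the expected set.
            switch (token.type) {
                case BEGIN_OBJECT:
                    expected = STRING | END_OBJECT;
                    break;
                case STRING:
                    if (key == null) {           // a key: a colon must follow
                        key = token.value;
                        expected = SEP_COLON;
                    } else {                     // a value: a comma or '}' must follow
                        result.put(key, token.value);
                        key = null;
                        expected = SEP_COMMA | END_OBJECT;
                    }
                    break;
                case SEP_COLON:
                    expected = STRING | NUMBER;
                    break;
                case NUMBER:
                    result.put(key, Long.parseLong(token.value)); // integers only in this sketch
                    key = null;
                    expected = SEP_COMMA | END_OBJECT;
                    break;
                case SEP_COMMA:
                    expected = STRING;           // another key must follow
                    break;
                case END_OBJECT:
                    expected = END_DOCUMENT;
                    break;
                case END_DOCUMENT:
                    return result;
                default:
                    throw new IllegalStateException("Unsupported token: " + token.value);
            }
        }
        throw new IllegalStateException("Missing END_DOCUMENT token");
    }
}
```

For the sequence {, id, :, 1, } followed by END_DOCUMENT, the expected set moves through BEGIN_OBJECT, then STRING | END_OBJECT, then SEP_COLON, then STRING | NUMBER, then SEP_COMMA | END_OBJECT, and finally END_DOCUMENT. Dropping the colon causes the check in step 1 to fail at the token 1.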

Testing and Demonstration

Tests use a sample JSON file (e.g., music.json) to verify correctness. The article also demonstrates JSON beautification (pretty-printing), illustrated with an image of sample hero data.

The beautification code is provided as a supplemental feature.
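The beautification code itself is not reproduced here. As a rough idea of what pretty-printing involves, the sketch below re-indents JSON text character by character. It is a hypothetical stand-in (the class `JsonBeautifier` is not from the article, which may instead format the parsed JsonObject), and it assumes the input is already valid JSON.

```java
// Hypothetical pretty-printer that reformats JSON text directly; assumes valid input.
class JsonBeautifier {
    private static final String INDENT = "    ";

    static String beautify(String json) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        boolean inString = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                out.append(c);
                if (c == '\\') {
                    out.append(json.charAt(++i)); // copy the escaped character verbatim
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            switch (c) {
                case '"':
                    inString = true;
                    out.append(c);
                    break;
                case '{': case '[': {
                    int j = skipSpace(json, i + 1);
                    if (j < json.length() && (json.charAt(j) == '}' || json.charAt(j) == ']')) {
                        out.append(c).append(json.charAt(j)); // keep empty containers on one line
                        i = j;
                    } else {
                        out.append(c);
                        newline(out, ++depth);
                    }
                    break;
                }
                case '}': case ']':
                    newline(out, --depth);
                    out.append(c);
                    break;
                case ',':
                    out.append(c);
                    newline(out, depth);
                    break;
                case ':':
                    out.append(": ");
                    break;
                default:
                    if (!Character.isWhitespace(c)) out.append(c); // drop existing layout
            }
        }
        return out.toString();
    }

    private static int skipSpace(String s, int from) {
        while (from < s.length() && Character.isWhitespace(s.charAt(from))) from++;
        return from;
    }

    private static void newline(StringBuilder out, int depth) {
        out.append('\n');
        for (int k = 0; k < depth; k++) out.append(INDENT);
    }
}
```

Tracking whether the cursor is inside a string is the important design point: structural characters such as `{` or `,` that appear inside string values must be copied through untouched.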

Conclusion

The article presents a simple JSON parser implementation for educational purposes, acknowledges its limitations, and invites readers to contribute improvements. Source code is available on GitHub.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"
