Implementing a JSON Parser in Java: Structures, Tokenization, and Parsing
This article explains the fundamentals of JSON and its object and array structures, maps JSON types to their Java equivalents, and walks through a complete Java implementation of a JSON parser: token definitions, lexical analysis, and object/array construction, with detailed code examples.
JSON (JavaScript Object Notation) is a lightweight, language‑independent data‑exchange format that is easy for both humans and machines to read and write. It consists of two main structures: objects (unordered name/value pairs) and arrays (ordered lists of values).
The article first shows simple JSON examples for an object and an array, then presents a table that maps JSON types to their Java counterparts (e.g., string → String, number → Long/Double, true/false → Boolean, null → null, [array] → List/Object[], {"key":"value"} → Map<String,Object>).
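To make that mapping concrete, here is a minimal sketch (the class and method names are illustrative, not part of the parser) of the Java structure such a parser would produce for a small JSON object:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class MappingDemo {
    // Builds the Java structure that the table above implies for the JSON
    // document {"name":"tom","scores":[90.5,80],"vip":true,"remark":null}.
    static Map<String, Object> sample() {
        Map<String, Object> obj = new HashMap<>();   // {...}      -> Map<String,Object>
        obj.put("name", "tom");                      // string     -> String
        obj.put("scores", Arrays.asList(90.5, 80L)); // numbers    -> Double / Long
        obj.put("vip", Boolean.TRUE);                // true/false -> Boolean
        obj.put("remark", null);                     // null       -> null
        return obj;
    }

    public static void main(String[] args) {
        System.out.println(sample().get("name")); // tom
    }
}
```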
To parse JSON, the process is divided into two steps: tokenization and syntactic analysis. The tokenizer converts the raw character stream into a sequence of tokens, each represented by a Token object that stores a TokenType and the literal value.
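As an illustration of the first step, the token stream for the input {"name":"tom"} would conceptually look like the following (a hand-written sketch, not output from the tokenizer; TokenStreamDemo is not part of the parser's source):

```java
import java.util.Arrays;
import java.util.List;

public class TokenStreamDemo {
    // Hand-written illustration of the tokens the tokenizer described above
    // would emit for the input {"name":"tom"}.
    static List<String> tokensFor() {
        return Arrays.asList(
            "BEGIN_OBJECT", "STRING(name)", "SEP_COLON",
            "STRING(tom)", "END_OBJECT", "END_DOCUMENT");
    }

    public static void main(String[] args) {
        tokensFor().forEach(System.out::println);
    }
}
```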
The TokenType enum is defined as follows:
package com.json.demo.tokenizer;
/**
* BEGIN_OBJECT({)
* END_OBJECT(})
* BEGIN_ARRAY([)
* END_ARRAY(])
* NULL(null)
* NUMBER(number)
* STRING(string)
* BOOLEAN(true/false)
* SEP_COLON(:)
* SEP_COMMA(,)
* END_DOCUMENT(marks the end of the JSON document)
*/
public enum TokenType {
    BEGIN_OBJECT(1),
    END_OBJECT(2),
    BEGIN_ARRAY(4),
    END_ARRAY(8),
    NULL(16),
    NUMBER(32),
    STRING(64),
    BOOLEAN(128),
    SEP_COLON(256),
    SEP_COMMA(512),
    END_DOCUMENT(1024);

    private int code; // each type's numeric code

    TokenType(int code) { this.code = code; }

    public int getTokenCode() { return code; }
}

The Token class stores the token type and its literal value:
package com.json.demo.tokenizer;
public class Token {
    private TokenType tokenType;
    private String value;

    public Token(TokenType tokenType, String value) { this.tokenType = tokenType; this.value = value; }

    public TokenType getTokenType() { return tokenType; }
    public void setTokenType(TokenType tokenType) { this.tokenType = tokenType; }
    public String getValue() { return value; }
    public void setValue(String value) { this.value = value; }

    @Override
    public String toString() {
        return "Token{" + "tokenType=" + tokenType + ", value='" + value + '\'' + '}';
    }
}

Reading characters efficiently is handled by ReaderChar, which buffers input and provides peek(), next(), back(), and hasMore() methods:
package com.json.demo.tokenizer;
import java.io.IOException;
import java.io.Reader;
public class ReaderChar {
    private static final int BUFFER_SIZE = 1024;
    private Reader reader;
    private char[] buffer;
    private int index;
    private int size;

    public ReaderChar(Reader reader) { this.reader = reader; buffer = new char[BUFFER_SIZE]; }

    // Returns the character most recently returned by next(), without advancing.
    public char peek() { if (index - 1 >= size) return (char) -1; return buffer[Math.max(0, index - 1)]; }

    // Returns the next character and advances; (char) -1 signals end of input.
    public char next() throws IOException { if (!hasMore()) return (char) -1; return buffer[index++]; }

    // Steps back one position so the character is returned again by next().
    public void back() { index = Math.max(0, --index); }

    public boolean hasMore() throws IOException { if (index < size) return true; fillBuffer(); return index < size; }

    // Refills the buffer from the underlying Reader; a no-op at end of stream.
    void fillBuffer() throws IOException { int n = reader.read(buffer); if (n == -1) return; index = 0; size = n; }
}

A TokenList stores the token stream and provides navigation methods:
package com.json.demo.tokenizer;
import java.util.ArrayList;
import java.util.List;
public class TokenList {
    private List<Token> tokens = new ArrayList<>();
    private int index = 0;

    public void add(Token token) { tokens.add(token); }

    // The token at the current position, without consuming it.
    public Token peek() { return index < tokens.size() ? tokens.get(index) : null; }

    // The token before the one most recently consumed by next().
    public Token peekPrevious() { return index - 2 < 0 ? null : tokens.get(index - 2); }

    public Token next() { return tokens.get(index++); }

    public boolean hasMore() { return index < tokens.size(); }

    @Override
    public String toString() { return "TokenList{" + "tokens=" + tokens + '}'; }
}

The core tokenization logic resides in the start() method, which reads characters, skips whitespace, and returns the appropriate token based on the current character (e.g., '{' → BEGIN_OBJECT, '"' → STRING, 'n' → NULL, etc.). It also delegates to helper methods such as readString(), readNumber(), readBoolean(), and readNull(). The readString() method handles escape sequences and Unicode literals (\uXXXX).
private Token readString() throws IOException {
    StringBuilder sb = new StringBuilder();
    while (true) {
        char ch = readerChar.next();
        if (ch == '\\') { // escape sequence
            if (!isEscape()) throw new JsonParseException("Invalid escape character");
            sb.append('\\');
            ch = readerChar.peek(); // the escape character just validated by isEscape()
            sb.append(ch);
            if (ch == 'u') { // \uXXXX: expect exactly four hex digits
                for (int i = 0; i < 4; i++) {
                    ch = readerChar.next();
                    if (isHex(ch)) sb.append(ch); else throw new JsonParseException("Invalid character");
                }
            }
        } else if (ch == '"') { // closing quote: the string token is complete
            return new Token(TokenType.STRING, sb.toString());
        } else if (ch == '\r' || ch == '\n') { // raw line breaks are not allowed inside a JSON string
            throw new JsonParseException("Invalid character");
        } else {
            sb.append(ch);
        }
    }
}

After tokenization, the parser builds concrete JSON structures. JsonObject wraps a Map<String,Object> and provides put and get methods, while JsonArray wraps a List with add, get, and size methods.
public class JsonObject {
    private Map<String, Object> map = new HashMap<>();

    public void put(String key, Object value) { map.put(key, value); }
    public Object get(String key) { return map.get(key); }
    // ...
}

public class JsonArray {
    private List<Object> list = new ArrayList<>();

    public void add(Object obj) { list.add(obj); }
    public Object get(int index) { return list.get(index); }
    public int size() { return list.size(); }
    // ...
}

The main parsing routine examines the first token to decide whether to construct a JsonObject or a JsonArray, then recursively processes nested structures using the token stream. Bitwise checks on TokenType codes improve performance. Finally, a test class can feed custom JSON strings or fetch JSON over HTTP, invoke the parser, and format the output with utility classes such as FormatUtil. The complete source code is available on GitHub at https://github.com/gyl-coder/JSON-Parser.git.
Java Captain
Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.