
Implementing a JSON Parser in Java: Structures, Tokenization, and Parsing

This article explains the fundamentals of JSON, its object and array structures, maps JSON types to Java equivalents, and provides a complete Java implementation of a JSON parser including token definitions, lexical analysis, and object/array construction with detailed code examples.

Java Captain

JSON (JavaScript Object Notation) is a lightweight, language‑independent data‑exchange format that is easy for both humans and machines to read and write. It consists of two main structures: objects (unordered name/value pairs) and arrays (ordered lists of values).
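For instance, a small JSON object whose "tags" member is an array shows both structures at once:

```json
{
  "name": "json",
  "version": 1,
  "stable": true,
  "tags": ["data", "exchange"]
}
```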

The article first shows simple JSON examples for an object and an array, then presents a table that maps JSON types to their Java counterparts:

JSON type → Java type
string → String
number → Long / Double
true/false → Boolean
null → null
[array] → List / Object[]
{"key":"value"} → Map<String,Object>

To parse JSON, the process is divided into two steps: tokenization and syntactic analysis. The tokenizer converts the raw character stream into a sequence of tokens, each represented by a Token object that stores a TokenType and the literal value.

The TokenType enum is defined as follows:

package com.json.demo.tokenizer;
/**
 * BEGIN_OBJECT({)
 * END_OBJECT(})
 * BEGIN_ARRAY([)
 * END_ARRAY(])
 * NULL(null)
 * NUMBER(number)
 * STRING(string)
 * BOOLEAN(true/false)
 * SEP_COLON(:)
 * SEP_COMMA(,)
 * END_DOCUMENT(marks the end of the JSON document)
 */
public enum TokenType {
    BEGIN_OBJECT(1),
    END_OBJECT(2),
    BEGIN_ARRAY(4),
    END_ARRAY(8),
    NULL(16),
    NUMBER(32),
    STRING(64),
    BOOLEAN(128),
    SEP_COLON(256),
    SEP_COMMA(512),
    END_DOCUMENT(1024);
    private int code; // each type's numeric code
    TokenType(int code) { this.code = code; }
    public int getTokenCode() { return code; }
}
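The power-of-two codes are not arbitrary: they let the parser test whether a token belongs to a set of expected types with a single bitwise AND. A minimal sketch of the idea (the class and the isExpected helper are illustrative names, not from the article's source):

```java
public class TokenCodeDemo {
    // Codes mirror the TokenType enum above: each type occupies its own bit.
    static final int BEGIN_OBJECT = 1, BEGIN_ARRAY = 4, NULL = 16,
            NUMBER = 32, STRING = 64, BOOLEAN = 128, SEP_COMMA = 512;

    // A token is acceptable if its bit is set in the expected-types mask.
    static boolean isExpected(int expectedMask, int tokenCode) {
        return (expectedMask & tokenCode) != 0;
    }

    public static void main(String[] args) {
        // After a colon, any value token may follow, but a comma may not:
        int afterColon = NULL | NUMBER | STRING | BOOLEAN | BEGIN_OBJECT | BEGIN_ARRAY;
        System.out.println(isExpected(afterColon, STRING));    // true
        System.out.println(isExpected(afterColon, SEP_COMMA)); // false
    }
}
```

One OR-combined mask thus replaces a chain of equality comparisons against each acceptable token type.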

The Token class stores the token type and its literal value:

package com.json.demo.tokenizer;
public class Token {
    private TokenType tokenType;
    private String value;
    public Token(TokenType tokenType, String value) { this.tokenType = tokenType; this.value = value; }
    public TokenType getTokenType() { return tokenType; }
    public void setTokenType(TokenType tokenType) { this.tokenType = tokenType; }
    public String getValue() { return value; }
    public void setValue(String value) { this.value = value; }
    @Override
    public String toString() {
        return "Token{" + "tokenType=" + tokenType + ", value='" + value + '\'' + '}';
    }
}

Reading characters efficiently is handled by ReaderChar, which buffers input and provides peek(), next(), back(), and hasMore() methods:

package com.json.demo.tokenizer;
import java.io.IOException;
import java.io.Reader;
public class ReaderChar {
    private static final int BUFFER_SIZE = 1024;
    private Reader reader;
    private char[] buffer;
    private int index; // position of the next character to hand out
    private int size;  // number of valid characters in the buffer
    public ReaderChar(Reader reader) { this.reader = reader; buffer = new char[BUFFER_SIZE]; }
    // Returns the most recently consumed character without advancing.
    public char peek() { if (index - 1 >= size) return (char) -1; return buffer[Math.max(0, index - 1)]; }
    // Returns the next character and advances, or (char) -1 at end of input.
    public char next() throws IOException { if (!hasMore()) return (char) -1; return buffer[index++]; }
    // Steps back one position so the last character can be re-read.
    public void back() { index = Math.max(0, index - 1); }
    public boolean hasMore() throws IOException { if (index < size) return true; fillBuffer(); return index < size; }
    // On end of stream, index stays equal to size, so hasMore() keeps returning false.
    void fillBuffer() throws IOException { int n = reader.read(buffer); if (n == -1) return; index = 0; size = n; }
}

A TokenList stores the token stream and provides navigation methods:

package com.json.demo.tokenizer;
import java.util.ArrayList;
import java.util.List;
public class TokenList {
    private List<Token> tokens = new ArrayList<>();
    private int index = 0;
    public void add(Token token) { tokens.add(token); }
    public Token peek() { return index < tokens.size() ? tokens.get(index) : null; }
    // Returns the token before the most recently consumed one, or null if there is none.
    public Token peekPrevious() { return index < 2 ? null : tokens.get(index - 2); }
    public Token next() { return tokens.get(index++); }
    public boolean hasMore() { return index < tokens.size(); }
    @Override
    public String toString() { return "TokenList{" + "tokens=" + tokens + '}'; }
}

The core tokenization logic resides in the start() method, which reads characters, skips whitespace, and returns the appropriate token based on the current character (e.g., '{' → BEGIN_OBJECT, '"' → STRING, 'n' → NULL, etc.). It also delegates to helper methods such as readString(), readNumber(), readBoolean(), and readNull(). The readString() method handles escape sequences and Unicode literals (\uXXXX).
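The article does not reproduce start() in full. The following self-contained sketch (the MiniTokenizer name is mine, and it walks a plain String rather than going through ReaderChar) illustrates the character-based dispatch it describes, with string and number handling reduced to the minimum:

```java
import java.util.ArrayList;
import java.util.List;

public class MiniTokenizer {
    public static List<String> tokenize(String json) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < json.length()) {
            char ch = json.charAt(i);
            switch (ch) {
                case '{': tokens.add("BEGIN_OBJECT"); i++; break;
                case '}': tokens.add("END_OBJECT"); i++; break;
                case '[': tokens.add("BEGIN_ARRAY"); i++; break;
                case ']': tokens.add("END_ARRAY"); i++; break;
                case ':': tokens.add("SEP_COLON"); i++; break;
                case ',': tokens.add("SEP_COMMA"); i++; break;
                case '"': {                      // STRING: read to the closing quote (no escapes here)
                    int end = json.indexOf('"', i + 1);
                    tokens.add("STRING(" + json.substring(i + 1, end) + ")");
                    i = end + 1;
                    break;
                }
                case 'n': tokens.add("NULL"); i += 4; break;           // null
                case 't': tokens.add("BOOLEAN(true)"); i += 4; break;  // true
                case 'f': tokens.add("BOOLEAN(false)"); i += 5; break; // false
                default:
                    if (Character.isDigit(ch) || ch == '-') {          // NUMBER
                        int start = i;
                        while (i < json.length() && (Character.isDigit(json.charAt(i))
                                || "+-.eE".indexOf(json.charAt(i)) >= 0)) i++;
                        tokens.add("NUMBER(" + json.substring(start, i) + ")");
                    } else {
                        i++; // skip whitespace
                    }
            }
        }
        tokens.add("END_DOCUMENT");
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("{\"age\": 18}"));
    }
}
```

The real tokenizer differs mainly in that it pulls characters through ReaderChar and validates keywords and escapes instead of blindly skipping ahead.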

private Token readString() throws IOException {
    StringBuilder sb = new StringBuilder();
    while (true) {
        char ch = readerChar.next();
        if (ch == '\\') {                 // escape sequence
            if (!isEscape()) throw new JsonParseException("Invalid escape character");
            sb.append('\\');
            ch = readerChar.peek();       // the character just validated by isEscape()
            sb.append(ch);
            if (ch == 'u') {              // Unicode literal: exactly four hex digits
                for (int i = 0; i < 4; i++) {
                    ch = readerChar.next();
                    if (isHex(ch)) sb.append(ch); else throw new JsonParseException("Invalid character");
                }
            }
        } else if (ch == '"') {           // closing quote: the string is complete
            return new Token(TokenType.STRING, sb.toString());
        } else if (ch == '\r' || ch == '\n') { // raw line breaks are not allowed inside strings
            throw new JsonParseException("Invalid character");
        } else {
            sb.append(ch);
        }
    }
}
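The readNull() and readBoolean() helpers mentioned above are not reproduced in the article; the pattern they follow is to consume a fixed keyword character by character and fail fast on any mismatch. A hypothetical, self-contained sketch (class and method names are mine, and it reads from a String cursor for brevity):

```java
public class KeywordReader {
    private final String input;
    private int pos;

    public KeywordReader(String input) { this.input = input; }

    // Consume the expected keyword ("null", "true", or "false") or throw.
    public String readKeyword(String expected) {
        for (int i = 0; i < expected.length(); i++) {
            if (pos >= input.length() || input.charAt(pos++) != expected.charAt(i)) {
                throw new RuntimeException("Invalid json string");
            }
        }
        return expected;
    }

    public static void main(String[] args) {
        System.out.println(new KeywordReader("null,").readKeyword("null")); // prints "null"
    }
}
```

In the article's tokenizer the same check is driven by the first character seen in start(): 'n' selects "null", 't' selects "true", and 'f' selects "false".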

After tokenization, the parser builds concrete JSON structures. JsonObject wraps a Map<String,Object> and provides put and get methods, while JsonArray wraps a List with add, get, and size methods.

public class JsonObject {
    private Map<String, Object> map = new HashMap<>();
    public void put(String key, Object value) { map.put(key, value); }
    public Object get(String key) { return map.get(key); }
    // ...
}

public class JsonArray {
    private List<Object> list = new ArrayList<>();
    public void add(Object obj) { list.add(obj); }
    public Object get(int index) { return list.get(index); }
    public int size() { return list.size(); }
    // ...
}

The main parsing routine examines the first token to decide whether to construct a JsonObject or a JsonArray, then recursively processes nested structures using the token stream. Because each TokenType carries a distinct power-of-two code, the parser can verify a token against a whole set of expected types with a single bitwise AND rather than a chain of comparisons. Finally, a test class can feed custom JSON strings or fetch JSON over HTTP, invoke the parser, and format the output with utility classes such as FormatUtil. The complete source code is available on GitHub at https://github.com/gyl-coder/JSON-Parser.git.
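The shape of the object-building loop can be sketched as follows. This hypothetical, stripped-down version (MiniObjectParser is my name, and it consumes pre-tokenized [type, value] pairs rather than the article's TokenList) builds the Map the article's JsonObject wraps; nested values and error handling are omitted for brevity:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MiniObjectParser {
    public static Map<String, Object> parseObject(Iterator<String[]> tokens) {
        Map<String, Object> map = new LinkedHashMap<>();
        String key = null;                        // key currently awaiting its value
        while (tokens.hasNext()) {
            String[] t = tokens.next();           // t[0] = token type, t[1] = literal value
            switch (t[0]) {
                case "STRING":
                    if (key == null) key = t[1];  // before the colon: this string is a key
                    else { map.put(key, t[1]); key = null; }
                    break;
                case "NUMBER":  map.put(key, Long.valueOf(t[1]));    key = null; break;
                case "BOOLEAN": map.put(key, Boolean.valueOf(t[1])); key = null; break;
                case "NULL":    map.put(key, null);                  key = null; break;
                case "END_OBJECT": return map;    // object complete
                default: break;                   // BEGIN_OBJECT, SEP_COLON, SEP_COMMA
            }
        }
        throw new IllegalStateException("Unterminated object");
    }

    public static void main(String[] args) {
        List<String[]> tokens = List.of(
            new String[]{"BEGIN_OBJECT", "{"},
            new String[]{"STRING", "name"}, new String[]{"SEP_COLON", ":"},
            new String[]{"STRING", "json"}, new String[]{"SEP_COMMA", ","},
            new String[]{"STRING", "version"}, new String[]{"SEP_COLON", ":"},
            new String[]{"NUMBER", "1"},
            new String[]{"END_OBJECT", "}"});
        System.out.println(parseObject(tokens.iterator()));
    }
}
```

In the full parser, encountering BEGIN_OBJECT or BEGIN_ARRAY in value position triggers a recursive call, which is how arbitrarily nested documents are handled.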

Tags: JSON, Data Structures, Tokenization, Parser
Written by

Java Captain

Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.
