Implementing a JSON Parser in Java: Structures, Tokenization, and Parsing
This article explains the fundamentals of JSON and its object and array structures, maps JSON types to their Java equivalents, and walks through a complete Java implementation of a JSON parser: token definitions, lexical analysis, and object/array construction, with detailed code examples.
JSON (JavaScript Object Notation) is a lightweight, language‑independent data‑exchange format that is easy for both humans and machines to read and write. It consists of two main structures: objects (unordered name/value pairs) and arrays (ordered lists of values).
The article first shows simple JSON examples for an object and an array, then presents a table that maps JSON types to their Java counterparts (e.g., string → String, number → Long/Double, true/false → Boolean, null → null, [array] → List/Object[], {"key":"value"} → Map<String,Object>).
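To make that mapping concrete, here is a minimal sketch (the class and method names are illustrative, not part of the parser) of the Java structure such a parser would produce for a small JSON object:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class MappingDemo {
    // Builds the Java structure that the table above implies for the JSON
    // document {"name":"tom","scores":[90.5,80],"vip":true,"remark":null}.
    static Map<String, Object> sample() {
        Map<String, Object> obj = new HashMap<>();   // {...}      -> Map<String,Object>
        obj.put("name", "tom");                      // string     -> String
        obj.put("scores", Arrays.asList(90.5, 80L)); // numbers    -> Double / Long
        obj.put("vip", Boolean.TRUE);                // true/false -> Boolean
        obj.put("remark", null);                     // null       -> null
        return obj;
    }

    public static void main(String[] args) {
        System.out.println(sample().get("name")); // tom
    }
}
```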
To parse JSON, the process is divided into two steps: tokenization and syntactic analysis. The tokenizer converts the raw character stream into a sequence of tokens, each represented by a Token object that stores a TokenType and the literal value.
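As an illustration of the first step, the token stream for the input {"name":"tom"} would conceptually look like the following (a hand-written sketch, not output from the tokenizer; TokenStreamDemo is not part of the parser's source):

```java
import java.util.Arrays;
import java.util.List;

public class TokenStreamDemo {
    // Hand-written illustration of the tokens the tokenizer described above
    // would emit for the input {"name":"tom"}.
    static List<String> tokensFor() {
        return Arrays.asList(
            "BEGIN_OBJECT", "STRING(name)", "SEP_COLON",
            "STRING(tom)", "END_OBJECT", "END_DOCUMENT");
    }

    public static void main(String[] args) {
        tokensFor().forEach(System.out::println);
    }
}
```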
The TokenType enum is defined as follows:
package com.json.demo.tokenizer;
/**
* BEGIN_OBJECT({)
* END_OBJECT(})
* BEGIN_ARRAY([)
* END_ARRAY(])
* NULL(null)
* NUMBER(number)
* STRING(string)
* BOOLEAN(true/false)
* SEP_COLON(:)
* SEP_COMMA(,)
* END_DOCUMENT(marks the end of the JSON document)
*/
public enum TokenType {
    BEGIN_OBJECT(1),
    END_OBJECT(2),
    BEGIN_ARRAY(4),
    END_ARRAY(8),
    NULL(16),
    NUMBER(32),
    STRING(64),
    BOOLEAN(128),
    SEP_COLON(256),
    SEP_COMMA(512),
    END_DOCUMENT(1024);

    private int code; // each type's numeric code

    TokenType(int code) { this.code = code; }

    public int getTokenCode() { return code; }
}

The Token class stores the token type and its literal value:
package com.json.demo.tokenizer;
public class Token {
    private TokenType tokenType;
    private String value;

    public Token(TokenType tokenType, String value) { this.tokenType = tokenType; this.value = value; }

    public TokenType getTokenType() { return tokenType; }
    public void setTokenType(TokenType tokenType) { this.tokenType = tokenType; }
    public String getValue() { return value; }
    public void setValue(String value) { this.value = value; }

    @Override
    public String toString() {
        return "Token{" + "tokenType=" + tokenType + ", value='" + value + '\'' + '}';
    }
}

Reading characters efficiently is handled by ReaderChar, which buffers input and provides peek(), next(), back(), and hasMore() methods:
package com.json.demo.tokenizer;
import java.io.IOException;
import java.io.Reader;
public class ReaderChar {
    private static final int BUFFER_SIZE = 1024;
    private Reader reader;
    private char[] buffer;
    private int index;
    private int size;

    public ReaderChar(Reader reader) { this.reader = reader; buffer = new char[BUFFER_SIZE]; }

    // Returns the character most recently returned by next(), without advancing.
    public char peek() { if (index - 1 >= size) return (char) -1; return buffer[Math.max(0, index - 1)]; }

    // Returns the next character and advances; (char) -1 signals end of input.
    public char next() throws IOException { if (!hasMore()) return (char) -1; return buffer[index++]; }

    // Steps back one position so the character is returned again by next().
    public void back() { index = Math.max(0, --index); }

    public boolean hasMore() throws IOException { if (index < size) return true; fillBuffer(); return index < size; }

    // Refills the buffer from the underlying Reader; a no-op at end of stream.
    void fillBuffer() throws IOException { int n = reader.read(buffer); if (n == -1) return; index = 0; size = n; }
}

A TokenList stores the token stream and provides navigation methods:
package com.json.demo.tokenizer;
import java.util.ArrayList;
import java.util.List;
public class TokenList {
    private List<Token> tokens = new ArrayList<>();
    private int index = 0;

    public void add(Token token) { tokens.add(token); }

    // The token at the current position, without consuming it.
    public Token peek() { return index < tokens.size() ? tokens.get(index) : null; }

    // The token before the one most recently consumed by next().
    public Token peekPrevious() { return index - 2 < 0 ? null : tokens.get(index - 2); }

    public Token next() { return tokens.get(index++); }

    public boolean hasMore() { return index < tokens.size(); }

    @Override
    public String toString() { return "TokenList{" + "tokens=" + tokens + '}'; }
}

The core tokenization logic resides in the start() method, which reads characters, skips whitespace, and returns the appropriate token based on the current character (e.g., '{' → BEGIN_OBJECT, '"' → STRING, 'n' → NULL, etc.). It also delegates to helper methods such as readString(), readNumber(), readBoolean(), and readNull(). The readString() method handles escape sequences and Unicode literals (\uXXXX).
private Token readString() throws IOException {
    StringBuilder sb = new StringBuilder();
    while (true) {
        char ch = readerChar.next();
        if (ch == '\\') { // escape sequence
            if (!isEscape()) throw new JsonParseException("Invalid escape character");
            sb.append('\\');
            ch = readerChar.peek(); // the escape character just validated by isEscape()
            sb.append(ch);
            if (ch == 'u') { // \uXXXX: expect exactly four hex digits
                for (int i = 0; i < 4; i++) {
                    ch = readerChar.next();
                    if (isHex(ch)) sb.append(ch); else throw new JsonParseException("Invalid character");
                }
            }
        } else if (ch == '"') { // closing quote: the string token is complete
            return new Token(TokenType.STRING, sb.toString());
        } else if (ch == '\r' || ch == '\n') { // raw line breaks are not allowed inside a JSON string
            throw new JsonParseException("Invalid character");
        } else {
            sb.append(ch);
        }
    }
}

After tokenization, the parser builds concrete JSON structures. JsonObject wraps a Map<String,Object> and provides put and get methods, while JsonArray wraps a List with add, get, and size methods.
public class JsonObject {
    private Map<String, Object> map = new HashMap<>();

    public void put(String key, Object value) { map.put(key, value); }
    public Object get(String key) { return map.get(key); }
    // ...
}

public class JsonArray {
    private List<Object> list = new ArrayList<>();

    public void add(Object obj) { list.add(obj); }
    public Object get(int index) { return list.get(index); }
    public int size() { return list.size(); }
    // ...
}

The main parsing routine examines the first token to decide whether to construct a JsonObject or a JsonArray, then recursively processes nested structures using the token stream. Bitwise checks on TokenType codes improve performance. Finally, a test class can feed custom JSON strings or fetch JSON over HTTP, invoke the parser, and format the output with utility classes such as FormatUtil. The complete source code is available on GitHub at https://github.com/gyl-coder/JSON-Parser.git.
Java Captain
Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.