Fundamentals 9 min read

Building an IntelliJ Plugin to Convert MySQL DDL into Python ORM Models Using Lexical Analysis

This article explains how to create a JetBrains IntelliJ plugin that parses MySQL DDL statements, performs lexical analysis to extract table and column information, and automatically generates corresponding Python ORM model classes, illustrating the underlying state‑machine lexer, recursive parsing logic, and sample code.

Sohu Tech Products
Sohu Tech Products
Sohu Tech Products
Building an IntelliJ Plugin to Convert MySQL DDL into Python ORM Models Using Lexical Analysis

Most of my recent development time has been spent writing Python scripts, and I often need to convert database table definitions into Python ORM Model classes. Since no ready‑made tool existed, I decided to build a small IntelliJ plugin that can read DDL statements and generate the required Python code automatically.

The plugin targets PyCharm (a JetBrains IDE) and leverages the JetBrains SDK, so the UI layer does not need extra work. The workflow is simple: import a DDL script, and the plugin produces the Python Model definitions needed for the ORM.

Example DDL input:

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `userName` varchar(20) DEFAULT NULL COMMENT '用户名',
  `password` varchar(100) DEFAULT NULL COMMENT '密码',
  `roleId` int(11) DEFAULT NULL COMMENT '角色ID',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8;

The plugin generates the following Python code:

class User(db.Model):
    __tablename__ = 'user'
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    userName = db.Column(db.String)  # 用户名
    password = db.Column(db.String)  # 密码
    roleId = db.Column(db.Integer)   # 角色ID

To achieve this, the core problem is lexical analysis: recognizing the table name, each column’s name, type, length, and comment, and identifying the primary key. This is essentially the same task performed by a MySQL server.

Before parsing full DDL, I demonstrate a simple script parsing example:

x = 20

The lexer splits the input into three token types: VAR (variable name), GE (assignment operator), and VAL (value). The state machine starts in an initial state, switches to VAR when encountering letters, to GE when encountering '=', and returns to the initial state otherwise.

Core Java lexer code (simplified):

public class Result {
    public TokenType tokenType;
    public StringBuilder text = new StringBuilder();
}

Token types are defined as an enum:

public enum TokenType {
    INIT,
    VAR,
    GE,
    VAL,
    FI,
    P_K,
    P_K_V
}

When parsing DDL, the lexer first extracts the table name after the CREATE TABLE keyword, then processes each field definition delimited by backticks and commas. Primary keys are detected by the PRIMARY KEY clause.

Recursive parsing is used for field definitions: when a backtick '`' is encountered, the state changes to FI (FieldInfo); upon reaching a comma, the current field string is recursively parsed to extract name, type, length, and comment. A pid field is added to the result to link child fields to their parent during recursion.

Parsing primary key follows a similar pattern, recognizing the P token and the surrounding parentheses.

After the lexer and recursive parser are wrapped into a service, the plugin can expose a simple API that returns the parsed table structure, ready for code generation.

In summary, the article walks through building a small but functional lexical analyzer and recursive parser for MySQL DDL, integrating it into an IntelliJ plugin, and generating Python ORM models automatically. While the feature is modest, it touches on fundamental compiler concepts such as tokenization, state transitions, and recursive descent parsing.

code generationPythonSQLORMIntelliJ PluginDDLlexical analysis
Sohu Tech Products
Written by

Sohu Tech Products

A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.