From Demo to Production: Building a Secure, Scalable Text‑to‑SQL Service with Spring AI Alibaba
This article explains how to turn a simple Text‑to‑SQL demo into a production‑grade service by covering the underlying principles, layered architecture, risk‑control mechanisms, multi‑tenant security, high‑concurrency strategies, caching, observability, and deployment practices using Spring AI Alibaba.
Why a Text‑to‑SQL demo works but production fails
In a demo you can simply embed the schema in the prompt, let a large language model generate a SQL statement, and execute it directly. In production you must add stability, security, auditability, scalability, and high‑concurrency handling.
Typical business scenarios and goals
Analysts, operators, customer‑service agents and product managers ask natural‑language questions such as:
"Show me order count and payment amount for the last 7 days"
"Top 10 products with highest refund rate in East China this month"
"Compare conversion rate of new users between last week and this week"
"List complaints of a specific customer in the past six months"
Business goals: lower the SQL barrier, shorten data‑access cycles and enable self‑service analytics. Technical goals: multi‑tenant support, high throughput, strict data‑access controls.
Core architecture layers
API Gateway – authentication, rate‑limiting, tracing.
Query Orchestrator – loads tenant & user context, builds prompts, calls the LLM, validates & rewrites SQL, evaluates risk, routes to execution or audit.
Schema Service – provides physical metadata and semantic business metadata (aliases, metric definitions, sensitivity tags).
LLM Gateway – encapsulates Spring AI Alibaba model calls with retries, circuit‑breakers and timeout handling.
SQL Validator – parses generated SQL, enforces SELECT‑only, whitelist tables/columns, adds LIMIT, removes unsafe ORDER BY, rewrites for safety.
Risk Engine – scores queries on semantic risk, data‑security risk, execution cost and compliance, decides whether to execute directly or send to audit.
Audit Workflow – asynchronous task queue for high‑risk queries, stores original NL, generated SQL, rewritten SQL, risk reasons and execution plan.
Execution Engine – runs sanitized SQL against read‑only replicas, enforces timeout and row limits, applies result‑level masking for sensitive fields.
Security and risk controls
Only SELECT statements are allowed.
Table and column whitelist based on tenant permissions.
Field‑level sensitivity tags; sensitive fields must be masked.
Mandatory LIMIT 200 and query timeout.
Automatic time‑range requirement for large tables.
Risk levels (LOW, MEDIUM, HIGH, CRITICAL) trigger manual review for HIGH/CRITICAL.
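The masking rule above can be sketched in plain Java. This is a simplified, hypothetical stand‑in for the QueryResultMasker used by the execution engine; the "keep first and last character" policy is illustrative, not prescribed by the article:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hedged sketch of field-level masking; a simplified stand-in for the
// QueryResultMasker referenced later in the execution engine.
public class ResultMasker {

    private final Set<String> sensitiveColumns;

    public ResultMasker(Set<String> sensitiveColumns) {
        this.sensitiveColumns = sensitiveColumns;
    }

    // Keep the first and last character, hide everything in between.
    static String maskValue(String value) {
        if (value == null || value.length() <= 2) {
            return "***";
        }
        return value.charAt(0) + "***" + value.charAt(value.length() - 1);
    }

    public List<Map<String, Object>> mask(List<Map<String, Object>> rows) {
        return rows.stream().map(row -> {
            Map<String, Object> masked = new LinkedHashMap<>(row);
            for (String column : sensitiveColumns) {
                if (masked.containsKey(column)) {
                    masked.put(column, maskValue(String.valueOf(masked.get(column))));
                }
            }
            return masked;
        }).toList();
    }
}
```

A real implementation would look the sensitivity tags up from the Schema Service's ColumnMeta rather than a hard-coded set.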
High‑concurrency design
Three‑tier caching:
Semantic cache – NL → structured query plan.
SQL cache – plan → safe SQL.
Result cache – SQL + parameters → query result.
Rate‑limiters at the gateway, model‑call and database‑execution layers protect against overload. Bulkheads and circuit‑breakers isolate slow or failing components.
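The three‑tier lookup order can be sketched with in‑memory maps. Names and shapes here are illustrative; a real deployment would back each tier with Redis and a TTL, as the QueryCacheService later in the article does for the result tier:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hedged sketch of the three-tier lookup order; class and method names
// are illustrative, not part of Spring AI Alibaba.
public class TieredQueryCache {

    private final Map<String, String> semanticCache = new HashMap<>(); // NL question -> query plan
    private final Map<String, String> sqlCache = new HashMap<>();      // query plan -> safe SQL
    private final Map<String, List<Map<String, Object>>> resultCache = new HashMap<>(); // SQL -> rows

    // Each tier short-circuits the more expensive step below it:
    // a semantic-cache hit skips the LLM call entirely.
    public List<Map<String, Object>> query(String question,
                                           Supplier<String> llmPlanner,
                                           Function<String, String> sqlCompiler,
                                           Function<String, List<Map<String, Object>>> dbExecutor) {
        String plan = semanticCache.computeIfAbsent(question, q -> llmPlanner.get());
        String sql = sqlCache.computeIfAbsent(plan, sqlCompiler);
        return resultCache.computeIfAbsent(sql, dbExecutor);
    }
}
```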
Observability and monitoring
Metrics collected via Micrometer and exported to Prometheus:
Model calls: count, latency, error rate, token cost.
Query success/failure, risk trigger count.
Execution time, slow‑query count, audit statistics.
Logs shipped to Loki/ELK with TraceId for end‑to‑end tracing.
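A registry‑agnostic sketch of recording the model‑call metrics listed above; the metric names are illustrative, and in production these would be Micrometer counters and timers scraped by Prometheus:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hedged sketch: accumulates the model-call metrics named above.
// Metric names are illustrative; a production service would register
// equivalent Micrometer counters/timers instead of raw LongAdders.
public class LlmMetrics {

    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    public void recordModelCall(boolean success, long latencyMillis, long tokens) {
        increment("llm.calls.total", 1);
        if (!success) {
            increment("llm.calls.errors", 1);
        }
        increment("llm.tokens.total", tokens);
        increment("llm.latency.millis.total", latencyMillis);
    }

    private void increment(String name, long amount) {
        counters.computeIfAbsent(name, k -> new LongAdder()).add(amount);
    }

    public long count(String name) {
        LongAdder adder = counters.get(name);
        return adder == null ? 0L : adder.sum();
    }
}
```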
Production request flow
User submits natural‑language query.
API Gateway performs authentication, authorization and rate‑limiting.
Query Orchestrator loads tenant, user and session context.
Schema Service returns tables, columns, metrics and aliases visible to the user.
Prompt Builder creates a controlled prompt.
LLM Gateway calls Spring AI Alibaba to generate a structured query plan (JSON).
SQL Validator parses the plan, checks syntax, enforces whitelist and rewrites (adds LIMIT, removes unsafe ORDER BY).
Risk Engine evaluates the sanitized SQL and produces a RiskDecision.
If risk is HIGH/CRITICAL, the request is sent to the asynchronous Audit Workflow and a task ID is returned.
Otherwise the Execution Engine runs the SQL on a read‑only replica.
Result Processor masks sensitive fields, trims rows and formats the response.
Response is returned to the user and audit logs & metrics are recorded.
Semantic understanding layer
The model first outputs a structured intent instead of raw SQL. Example JSON:
{
  "intent": "METRIC_QUERY",
  "subject": "customer",
  "metrics": ["repurchase_rate"],
  "dimensions": ["customer_level"],
  "filters": [{"field": "order_status", "operator": "=", "value": "PAID"}],
  "timeRange": {"type": "LAST_N_DAYS", "value": 30}
}
This makes the downstream validation and risk checks deterministic.
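On the service side, this JSON can be bound to plain Java records before validation. A sketch with illustrative names; the field names match the JSON keys so a mapper such as Jackson (which the LLM Gateway uses to bind its SqlPlan) can bind them directly:

```java
import java.util.List;

// Hedged sketch: records mirroring the intent JSON above.
// Names are illustrative, not a type from the article's codebase.
public record QueryIntent(
        String intent,
        String subject,
        List<String> metrics,
        List<String> dimensions,
        List<Filter> filters,
        TimeRange timeRange) {

    public record Filter(String field, String operator, Object value) {}

    public record TimeRange(String type, int value) {}
}
```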
Schema grounding
Real database metadata is injected into the prompt. Recommended Java record definitions for metadata:
public record TableMeta(
        String tableName,
        String tableComment,
        List<ColumnMeta> columns,
        List<String> primaryKeys,
        List<ForeignKeyMeta> foreignKeys,
        String domain,
        boolean queryable) {}

public record ColumnMeta(
        String columnName,
        String columnType,
        String comment,
        boolean sensitive,
        boolean filterable,
        boolean sortable,
        List<String> aliases) {}

public record MetricMeta(
        String metricCode,
        String metricName,
        String definition,
        String sqlExpression,
        String domain) {}
Maintain both physical metadata (tables, columns, PK/FK) and semantic metadata (business aliases, metric definitions, sensitivity tags).
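As an illustration of how the Schema Service might render this metadata into the prompt's schema section, here is a deliberately simplified sketch that works on plain names and comments rather than the full records:

```java
import java.util.Map;

// Hedged sketch: renders one table's metadata as markdown for the prompt.
// A production renderer would walk the TableMeta/ColumnMeta records and
// skip columns the current user is not allowed to see.
public class SchemaMarkdownRenderer {

    public static String render(String tableName, String tableComment,
                                Map<String, String> columnComments) {
        StringBuilder markdown = new StringBuilder();
        markdown.append("### ").append(tableName)
                .append(" (").append(tableComment).append(")\n");
        columnComments.forEach((column, comment) ->
                markdown.append("- ").append(column).append(": ").append(comment).append('\n'));
        return markdown.toString();
    }
}
```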
Controlled generation (Prompt design)
A prompt template that forces the model to output JSON and restricts the allowed SQL constructs:
@Component
public class PromptBuilder {

    public String build(SqlGenerationCommand command, SchemaContext schemaContext) {
        return """
                You are the SQL generation assistant in an enterprise data analytics system.
                Task requirements:
                1. Generate queries based only on the given schema
                2. Output SELECT statements only
                3. Never use tables or columns that were not provided
                4. When the question is ambiguous, use the most conservative interpretation
                5. Return JSON by default, with the fields summary, tables, columns, filters, sql
                6. If the question falls outside the available schema, return an error field
                Safety rules:
                - UPDATE, DELETE, DROP, ALTER and TRUNCATE are forbidden
                - Add LIMIT 200 by default
                - Sensitive fields must be marked sensitive=true
                Metric definitions: %s
                Available schema: %s
                Examples: %s
                User question: %s
                """.formatted(
                schemaContext.metricDefinitionsMarkdown(),
                schemaContext.schemaMarkdown(),
                schemaContext.fewShotExamplesMarkdown(),
                command.query());
    }
}
Spring AI Alibaba integration
Dependency snippet (Maven):
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-validation</artifactId>
    </dependency>
    <dependency>
        <groupId>com.github.jsqlparser</groupId>
        <artifactId>jsqlparser</artifactId>
        <version>4.9</version>
    </dependency>
    <dependency>
        <groupId>io.github.resilience4j</groupId>
        <artifactId>resilience4j-spring-boot3</artifactId>
    </dependency>
    <dependency>
        <groupId>com.alibaba.cloud.ai</groupId>
        <artifactId>spring-ai-alibaba-starter</artifactId>
    </dependency>
</dependencies>
Model bean configuration (Qwen example):
@Configuration
public class AiModelConfiguration {

    @Bean
    public QwenChatModel qwenChatModel(QwenApi qwenApi) {
        QwenChatOptions options = QwenChatOptions.builder()
                .model("qwen-max")
                .temperature(0.1)
                .topP(0.8)
                .maxTokens(1500)
                .build();
        return new QwenChatModel(qwenApi, options);
    }
}
LLM Gateway
Encapsulates model calls with Resilience4j annotations:
@Service
@Slf4j
public class LlmGateway {

    private final QwenChatModel chatModel;
    private final ObjectMapper objectMapper;

    public LlmGateway(QwenChatModel chatModel, ObjectMapper objectMapper) {
        this.chatModel = chatModel;
        this.objectMapper = objectMapper;
    }

    // Note: Resilience4j's @TimeLimiter only applies to async return types
    // (CompletionStage/CompletableFuture), so timeouts for this synchronous
    // call must be enforced at the HTTP-client level instead.
    @Retry(name = "llmGateway")
    @CircuitBreaker(name = "llmGateway", fallbackMethod = "fallback")
    public SqlPlan generatePlan(String prompt) {
        Prompt llmPrompt = new Prompt(prompt);
        ChatResponse response = chatModel.call(llmPrompt);
        String content = response.getResult().getOutput().getContent();
        try {
            return objectMapper.readValue(content, SqlPlan.class);
        } catch (Exception ex) {
            throw new IllegalStateException("Model output could not be parsed as SqlPlan: " + content, ex);
        }
    }

    public SqlPlan fallback(String prompt, Throwable throwable) {
        log.warn("LLM gateway fallback triggered, cause={}", throwable.getMessage());
        throw new IllegalStateException("The model service is temporarily unavailable, please try again later");
    }
}
Query Orchestrator Service
@Service
@Slf4j
public class QueryOrchestratorService {

    private final SchemaService schemaService;
    private final PromptBuilder promptBuilder;
    private final LlmGateway llmGateway;
    private final SqlValidator sqlValidator;
    private final SqlRiskEngine sqlRiskEngine;
    private final AuditWorkflowService auditWorkflowService;
    private final QueryExecutionService queryExecutionService;
    private final QueryCacheService queryCacheService;

    public QueryOrchestratorService(SchemaService schemaService,
                                    PromptBuilder promptBuilder,
                                    LlmGateway llmGateway,
                                    SqlValidator sqlValidator,
                                    SqlRiskEngine sqlRiskEngine,
                                    AuditWorkflowService auditWorkflowService,
                                    QueryExecutionService queryExecutionService,
                                    QueryCacheService queryCacheService) {
        this.schemaService = schemaService;
        this.promptBuilder = promptBuilder;
        this.llmGateway = llmGateway;
        this.sqlValidator = sqlValidator;
        this.sqlRiskEngine = sqlRiskEngine;
        this.auditWorkflowService = auditWorkflowService;
        this.queryExecutionService = queryExecutionService;
        this.queryCacheService = queryCacheService;
    }

    public QueryResponse handle(SqlGenerationCommand command) {
        long start = System.currentTimeMillis();
        QueryResponse cached = queryCacheService.get(command);
        if (cached != null) {
            return cached.withSource("CACHE");
        }
        SchemaContext schemaContext = schemaService.loadContext(
                command.tenantId(), command.userId(), command.roles(), command.query());
        String prompt = promptBuilder.build(command, schemaContext);
        SqlPlan plan = llmGateway.generatePlan(prompt);
        SanitizedSql sanitizedSql = sqlValidator.validateAndRewrite(plan.sql(), schemaContext);
        RiskDecision riskDecision = sqlRiskEngine.evaluate(sanitizedSql, schemaContext, command);
        if (riskDecision.needAudit()) {
            String taskId = auditWorkflowService.submit(command, sanitizedSql, riskDecision);
            return QueryResponse.pending(taskId, riskDecision.level(), System.currentTimeMillis() - start);
        }
        QueryResult result = queryExecutionService.execute(
                command.tenantId(), command.userId(), sanitizedSql.sql(), sanitizedSql.maxRows());
        QueryResponse response = QueryResponse.success(
                sanitizedSql.sql(), riskDecision.level(), result, System.currentTimeMillis() - start);
        queryCacheService.put(command, response);
        return response;
    }
}
SQL validation and rewrite
@Service
public class SqlValidator {

    public SanitizedSql validateAndRewrite(String originalSql, SchemaContext schemaContext) {
        Statement statement;
        try {
            statement = CCJSqlParserUtil.parse(originalSql);
        } catch (JSQLParserException ex) {
            throw new IllegalArgumentException("SQL syntax error: " + originalSql, ex);
        }
        if (!(statement instanceof Select select)) {
            throw new IllegalArgumentException("Only SELECT queries are allowed");
        }
        Set<String> tables = SqlAstUtils.extractTables(select);
        if (!schemaContext.allowedTables().containsAll(tables)) {
            throw new IllegalArgumentException("SQL references unauthorized tables: " + tables);
        }
        Set<String> columns = SqlAstUtils.extractColumns(select);
        schemaContext.ensureColumnsAllowed(columns);
        Select rewritten = SqlRewriteUtils.ensureLimit(select, 200);
        rewritten = SqlRewriteUtils.ensureOrderBySafe(rewritten);
        return new SanitizedSql(rewritten.toString(), 200);
    }
}
Risk engine
@Service
public class SqlRiskEngine {

    public RiskDecision evaluate(SanitizedSql sql, SchemaContext schemaContext, SqlGenerationCommand command) {
        int score = 0;
        List<String> reasons = new ArrayList<>();
        if (SqlRiskRules.containsSensitiveColumns(sql.sql(), schemaContext)) {
            score += 40;
            reasons.add("contains sensitive columns");
        }
        if (SqlRiskRules.missingTimeRangeOnLargeTable(sql.sql(), schemaContext)) {
            score += 30;
            reasons.add("large table queried without a time range");
        }
        if (SqlRiskRules.hasTooManyJoins(sql.sql(), 3)) {
            score += 20;
            reasons.add("too many joins");
        }
        if (SqlRiskRules.mayReturnTooManyRows(sql.sql(), sql.maxRows())) {
            score += 10;
            reasons.add("result set may be too large");
        }
        RiskLevel level = mapScore(score);
        boolean needAudit = level == RiskLevel.HIGH || level == RiskLevel.CRITICAL;
        return new RiskDecision(level, needAudit, reasons);
    }

    private RiskLevel mapScore(int score) {
        if (score >= 70) return RiskLevel.CRITICAL;
        if (score >= 40) return RiskLevel.HIGH;
        if (score >= 20) return RiskLevel.MEDIUM;
        return RiskLevel.LOW;
    }
}
Audit workflow
Audit task entity:
public class AuditTask {

    private String taskId;
    private String tenantId;
    private String userId;
    private String originalQuery;
    private String generatedSql;
    private RiskLevel riskLevel;
    private List<String> reasons;
    private String status;
    private LocalDateTime createdAt;
    private LocalDateTime auditedAt;
    private String auditorId;
    private String comment;
}
Service that creates a task and pushes it to Kafka:
@Service
public class AuditWorkflowService {

    private final AuditTaskRepository auditTaskRepository;
    private final KafkaTemplate<String, AuditEvent> kafkaTemplate;

    public AuditWorkflowService(AuditTaskRepository auditTaskRepository,
                                KafkaTemplate<String, AuditEvent> kafkaTemplate) {
        this.auditTaskRepository = auditTaskRepository;
        this.kafkaTemplate = kafkaTemplate;
    }

    public String submit(SqlGenerationCommand command, SanitizedSql sql, RiskDecision decision) {
        AuditTask task = new AuditTask();
        task.setTaskId(UUID.randomUUID().toString());
        task.setTenantId(command.tenantId());
        task.setUserId(command.userId());
        task.setOriginalQuery(command.query());
        task.setGeneratedSql(sql.sql());
        task.setRiskLevel(decision.level());
        task.setReasons(decision.reasons());
        task.setStatus("PENDING");
        task.setCreatedAt(LocalDateTime.now());
        auditTaskRepository.save(task);
        kafkaTemplate.send("sql-audit-task", task.getTaskId(), new AuditEvent(task.getTaskId()));
        return task.getTaskId();
    }
}
Execution engine
@Service
@Slf4j
public class QueryExecutionService {

    private final DynamicDataSourceRouter dataSourceRouter;
    private final QueryResultMasker queryResultMasker;

    public QueryExecutionService(DynamicDataSourceRouter dataSourceRouter,
                                 QueryResultMasker queryResultMasker) {
        this.dataSourceRouter = dataSourceRouter;
        this.queryResultMasker = queryResultMasker;
    }

    @Bulkhead(name = "queryExecution")
    @CircuitBreaker(name = "queryExecution", fallbackMethod = "fallback")
    public QueryResult execute(String tenantId, String userId, String sql, int maxRows) {
        JdbcTemplate jdbcTemplate = dataSourceRouter.readOnlyJdbcTemplate(tenantId);
        jdbcTemplate.setQueryTimeout(15);
        jdbcTemplate.setMaxRows(maxRows);
        long start = System.currentTimeMillis();
        List<Map<String, Object>> rows = jdbcTemplate.query(sql, new ColumnMapRowMapper());
        long cost = System.currentTimeMillis() - start;
        List<Map<String, Object>> maskedRows = queryResultMasker.mask(tenantId, userId, rows);
        return new QueryResult(maskedRows, maskedRows.size(), cost, "SUCCESS");
    }

    public QueryResult fallback(String tenantId, String userId, String sql, int maxRows, Throwable throwable) {
        log.warn("query execution degraded, sql={}, cause={}", sql, throwable.getMessage());
        return new QueryResult(List.of(), 0, 0L, "DEGRADED");
    }
}
Query cache service (three‑level cache)
@Service
public class QueryCacheService {

    private final RedisTemplate<String, Object> redisTemplate;

    public QueryCacheService(RedisTemplate<String, Object> redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    public QueryResponse get(SqlGenerationCommand command) {
        return (QueryResponse) redisTemplate.opsForValue().get(key(command));
    }

    public void put(SqlGenerationCommand command, QueryResponse response) {
        redisTemplate.opsForValue().set(key(command), response, Duration.ofMinutes(10));
    }

    private String key(SqlGenerationCommand command) {
        String raw = command.tenantId() + "|" + command.userId() + "|" + normalize(command.query());
        return "sql:query:" + DigestUtils.sha256Hex(raw);
    }

    private String normalize(String query) {
        return query.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }
}
Resilience4j configuration (rate limiter, bulkhead, circuit breaker)
resilience4j:
  ratelimiter:
    instances:
      llmGateway:
        limitForPeriod: 50
        limitRefreshPeriod: 1s
        timeoutDuration: 0
  bulkhead:
    instances:
      queryExecution:
        maxConcurrentCalls: 100
        maxWaitDuration: 10ms
  circuitbreaker:
    instances:
      llmGateway:
        slidingWindowType: COUNT_BASED
        slidingWindowSize: 50
        failureRateThreshold: 50
        waitDurationInOpenState: 30s
Dockerfile for production image
FROM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /workspace
COPY pom.xml .
RUN mvn -q -DskipTests dependency:go-offline
COPY src ./src
RUN mvn -q -DskipTests clean package
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
RUN addgroup -S spring && adduser -S spring -G spring
COPY --from=build /workspace/target/sql-ai-app.jar app.jar
USER spring:spring
EXPOSE 8080
ENTRYPOINT ["java","-XX:+UseG1GC","-XX:MaxRAMPercentage=75","-jar","/app/app.jar"]
Best‑practice checklist
Separate orchestration layer from controllers.
Strict schema grounding and semantic layer.
SQL AST validation with automatic safe rewrites.
Field‑level access control and result masking.
Manual audit workflow for high‑risk queries.
Multi‑level caching and rate‑limiting.
Comprehensive observability (metrics, logs, traces).
Cloud‑native deployment with health checks and autoscaling.
Ray's Galactic Tech
Practice together, never alone. We cover programming languages, development tools, learning methods, and pitfall notes. We simplify complex topics, guiding you from beginner to advanced. Weekly practical content—let's grow together!