How to Build a High‑Concurrency Story Creation Platform with AgentScope Java
This article presents a step‑by‑step engineering guide for constructing a production‑grade, high‑throughput story generation platform using AgentScope Java, Spring Boot, Kafka, Redis, PostgreSQL, and Kubernetes, covering architecture, task modeling, DAG orchestration, code organization, scalability, observability, and deployment best practices.
1. Introduction
Story creation in production requires multiple inter‑related steps (genre clarification, world‑building, plot planning, chapter generation, style unification, safety review, user‑preference memory). A single LLM call cannot satisfy quality, latency, or cost requirements, so the workload is decomposed into independent agents coordinated by a scheduler.
2. Business Goals & Technical Constraints
2.1 Goals
C‑end users submit creation requests via Web/App.
Support multi‑dimensional inputs (genre, style, audience, length, character settings).
Return a task identifier within milliseconds; progress is queried asynchronously.
Allow chapter‑level generation, continuation, rewriting, and local polishing.
Handle hundreds to thousands of concurrent requests during peak periods.
2.2 Constraints
LLM APIs are unstable (timeouts, rate limits, occasional failures).
Generation quality is non‑deterministic.
Each request spans many agents and intermediate states (long‑chain orchestration).
Prompt length, model choice, and retry strategy directly affect cost.
High observability is required: latency, failure points, token consumption per step.
3. Why AgentScope Java
AgentScope Java provides a type‑safe, testable runtime for multi‑agent collaboration. It integrates naturally with Spring Boot, Kafka, Redis, databases, and Kubernetes, offering thread‑pool, connection‑pool, tracing, and resource isolation management.
4. Production‑Grade Architecture Overview
4.1 Logical Layers
Client Web/App
API Gateway (auth, rate‑limit, tracing)
Story Command Service (receives requests)
Story Orchestrator (DAG scheduling, concurrency, timeout, compensation)
Agents: Intent, Plot, Character, Chapter, Style, Review
LLM Gateway (model routing, circuit‑break, auditing)
Redis, PostgreSQL, Object Storage (state, cache)
Metrics / Logs / Traces (observability)
4.2 Layered Design
Access Layer : API Gateway, authentication, rate limiting, gray release, trace injection.
Application Layer : Exposes story creation, continuation, query, and result APIs.
Orchestration Layer : Story Orchestrator performs DAG orchestration, concurrent scheduling, timeout handling, and compensation.
Agent Execution Layer : Individual agents implement specific capabilities (plot, character, chapter, style, review).
Infrastructure Layer : Redis, Kafka, PostgreSQL, object storage, monitoring, Kubernetes.
4.3 Synchronous vs Asynchronous Calls
A naïve synchronous chain (Intent → Plot → Character → Chapter → Style → Review) blocks the web thread, makes the overall latency equal to the longest step, and complicates failure recovery. The recommended approach is:
API layer returns 202 Accepted with a taskId immediately.
Orchestrator drives a state machine asynchronously.
Independent agents run in parallel whenever possible.
Intermediate results are persisted for checkpoint‑and‑resume.
Event‑driven scaling keeps the system elastic.
5. Multi‑Agent Collaboration Principles
5.1 DAG Orchestration
Intent Parsing must run first. Plot Planning and Character Definition can run in parallel after intent. Chapter Generation depends on both plot and character results. Style Unification and Quality Review are performed in the final stage.
This design reduces critical‑path latency, makes dependencies explicit for retries, enables node‑level scaling, and improves observability.
5.2 Context Management Rules
Structured transmission : agents exchange JSON objects, not free‑form text.
Transmit only necessary context : e.g., chapter generation receives character cards, world view, and outline snippets instead of the full log.
Persist intermediate results : snapshots are stored in Redis or the database and loaded on demand.
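The slicing rule can be sketched in plain Java. The FullContext and ChapterContext records and the sliceForChapter helper below are illustrative names, not AgentScope Java types:

```java
// Sketch of rule 2: the chapter stage receives a trimmed, structured view of the
// full creation context rather than the whole conversation log.
import java.util.List;

class ContextSlicing {
    // Full context accumulated across stages (abbreviated).
    record FullContext(String genre, String theme, String audience,
                       List<String> characterCards, String worldView,
                       List<String> outline, List<String> rawAgentLog) {}

    // Only what chapter generation actually needs.
    record ChapterContext(List<String> characterCards, String worldView,
                          List<String> outlineSnippet) {}

    static ChapterContext sliceForChapter(FullContext full, int chapterIndex) {
        // Pass a single outline entry, not the entire outline or the raw log.
        return new ChapterContext(
                full.characterCards(),
                full.worldView(),
                List.of(full.outline().get(chapterIndex)));
    }
}
```

The same pattern applies to every stage boundary: each agent declares a narrow input record, and the orchestrator projects the full context onto it.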
6. Core Domain Model Design
6.1 StoryTask Entity
public class StoryTask {
private Long id;
private String taskId;
private String userId;
private String genre;
private String theme;
private String audience;
private Integer targetWords;
private String status; // PENDING, RUNNING, PARTIAL_SUCCESS, SUCCESS, FAILED
private String currentStage;
private Integer progress;
private String resultVersion;
private Instant createdAt;
private Instant updatedAt;
}
Key fields:
status: lifecycle state.
currentStage: which agent is executing.
progress: for front‑end polling or push notifications.
resultVersion: supports rewrite, continuation, and multi‑version editing.
6.2 StoryContextSnapshot
public class StoryContextSnapshot {
private String taskId;
private String stage;
private String payloadJson;
private String promptTemplateVersion;
private String modelName;
private Integer promptTokens;
private Integer completionTokens;
private Long latencyMs;
private Boolean success;
private String errorCode;
private String errorMessage;
private Instant createdAt;
}
Snapshots enable quality back‑tracing, prompt optimization, cost analysis, and failure compensation.
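One way to guarantee that every stage, successful or failed, leaves a snapshot behind is to wrap stage execution in a small recorder. The SnapshotStore interface and recorded helper below are illustrative assumptions, not AgentScope Java APIs:

```java
// Wraps any stage call so a StoryContextSnapshot-like record is captured
// on both the success and the failure path.
import java.time.Instant;
import java.util.function.Supplier;

class SnapshotRecorder {
    record Snapshot(String taskId, String stage, boolean success,
                    String errorMessage, long latencyMs, Instant createdAt) {}

    interface SnapshotStore { void save(Snapshot snapshot); }

    static <T> T recorded(String taskId, String stage,
                          SnapshotStore store, Supplier<T> stageCall) {
        long start = System.nanoTime();
        try {
            T result = stageCall.get();
            store.save(new Snapshot(taskId, stage, true, null,
                    (System.nanoTime() - start) / 1_000_000, Instant.now()));
            return result;
        } catch (RuntimeException ex) {
            // Failed calls are snapshotted too, so compensation can replay the stage.
            store.save(new Snapshot(taskId, stage, false, ex.getMessage(),
                    (System.nanoTime() - start) / 1_000_000, Instant.now()));
            throw ex;
        }
    }
}
```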
7. Production‑Grade API & Service Layer
7.1 Request / Response DTOs
public record CreateStoryCommand(
@NotBlank String userId,
@NotBlank String genre,
@NotBlank String theme,
@NotBlank String audience,
@Min(500) @Max(20000) Integer targetWords,
List<String> keywords,
String preferredStyle,
Boolean allowStreaming) {}
public record CreateStoryResponse(String taskId, String status, String message) {}
7.2 Controller – Fast Acceptance
@RestController
@RequestMapping("/api/v1/stories")
@RequiredArgsConstructor
public class StoryController {
private final StoryApplicationService storyApplicationService;
@PostMapping
public ResponseEntity<CreateStoryResponse> createStory(@Valid @RequestBody CreateStoryCommand command) {
CreateStoryResponse response = storyApplicationService.createStory(command);
return ResponseEntity.accepted().body(response);
}
@GetMapping("/{taskId}")
public ResponseEntity<StoryTaskDetailVO> getTaskDetail(@PathVariable String taskId) {
return ResponseEntity.ok(storyApplicationService.getTaskDetail(taskId));
}
}
Key points:
Return 202 Accepted with a taskId.
Do not block the HTTP thread with long‑running LLM calls.
7.3 Application Service – Store‑then‑Publish
@Service
@RequiredArgsConstructor
public class StoryApplicationService {
private final StoryTaskRepository storyTaskRepository;
private final StoryEventPublisher storyEventPublisher;
private final IdGenerator idGenerator;
@Transactional
public CreateStoryResponse createStory(CreateStoryCommand command) {
String taskId = idGenerator.nextTaskId();
StoryTaskEntity entity = StoryTaskEntity.builder()
.taskId(taskId)
.userId(command.userId())
.genre(command.genre())
.theme(command.theme())
.audience(command.audience())
.targetWords(command.targetWords())
.status(TaskStatus.PENDING.name())
.currentStage("INIT")
.progress(0)
.build();
storyTaskRepository.save(entity);
storyEventPublisher.publishTaskCreated(new StoryTaskCreatedEvent(
taskId, command.userId(), command.genre(), command.theme(),
command.audience(), command.targetWords(), command.keywords(), command.preferredStyle()));
return new CreateStoryResponse(taskId, "PENDING", "Story creation task accepted");
}
}
This pattern guarantees that a persisted record exists before any event is emitted, preventing data loss in case of broker failure (an Outbox pattern can be added for stronger guarantees).
8. Orchestrator Design
8.1 Responsibilities
Maintain the task state machine.
Select the next agent based on dependencies.
Handle timeouts, retries, downgrade, and compensation.
Aggregate stage results into the final story.
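The task state machine can be made explicit with a transition table. The guard below mirrors the statuses listed on StoryTask; the exact set of allowed transitions is an assumption for illustration:

```java
// Explicit state machine for story tasks: illegal transitions fail fast
// instead of silently corrupting task state.
import java.util.Map;
import java.util.Set;

enum TaskStatus {
    PENDING, RUNNING, PARTIAL_SUCCESS, SUCCESS, FAILED;

    private static final Map<TaskStatus, Set<TaskStatus>> ALLOWED = Map.of(
            PENDING, Set.of(RUNNING, FAILED),
            RUNNING, Set.of(PARTIAL_SUCCESS, SUCCESS, FAILED),
            PARTIAL_SUCCESS, Set.of(RUNNING, SUCCESS, FAILED), // resume or finish
            SUCCESS, Set.of(),                                 // terminal
            FAILED, Set.of(RUNNING));                          // compensation restart

    TaskStatus transitionTo(TaskStatus next) {
        if (!ALLOWED.get(this).contains(next)) {
            throw new IllegalStateException(this + " -> " + next + " is not allowed");
        }
        return next;
    }
}
```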
8.2 Parallel Execution with CompletableFuture
@Service
@RequiredArgsConstructor
public class StoryOrchestrator {
private final IntentAgent intentAgent;
private final PlotAgent plotAgent;
private final CharacterAgent characterAgent;
private final ChapterAgent chapterAgent;
private final StyleAgent styleAgent;
private final ReviewAgent reviewAgent;
private final Executor storyExecutor;
public StoryResult orchestrate(StoryCreationContext context) {
IntentResult intentResult = intentAgent.execute(context);
StoryCreationContext enriched = context.withIntentResult(intentResult);
CompletableFuture<PlotResult> plotFuture = CompletableFuture.supplyAsync(() -> plotAgent.execute(enriched), storyExecutor);
CompletableFuture<CharacterResult> characterFuture = CompletableFuture.supplyAsync(() -> characterAgent.execute(enriched), storyExecutor);
PlotResult plotResult = plotFuture.join();
CharacterResult characterResult = characterFuture.join();
StoryCreationContext chapterContext = enriched
.withPlotResult(plotResult)
.withCharacterResult(characterResult);
ChapterDraft draft = chapterAgent.execute(chapterContext);
StyledStory styledStory = styleAgent.execute(chapterContext.withDraft(draft));
ReviewResult reviewResult = reviewAgent.execute(chapterContext.withStyledStory(styledStory));
return StoryResult.builder()
.plot(plotResult.plot())
.characters(characterResult.characters())
.story(reviewResult.finalStory())
.reviewTags(reviewResult.tags())
.build();
}
}
Parallelism is applied to agents without data dependencies (Plot & Character). A unified StoryCreationContext carries only the required JSON payloads.
9. Agent Design Patterns
9.1 Base Agent Class
public abstract class BaseStoryAgent<I, O> {
protected final LlmGatewayClient llmGatewayClient;
protected final PromptTemplateEngine promptTemplateEngine;
protected BaseStoryAgent(LlmGatewayClient llmGatewayClient, PromptTemplateEngine promptTemplateEngine) {
this.llmGatewayClient = llmGatewayClient;
this.promptTemplateEngine = promptTemplateEngine;
}
public O execute(StoryCreationContext context) {
I request = buildRequest(context);
validateRequest(request);
String prompt = promptTemplateEngine.render(templateName(), request);
LlmResponse response = llmGatewayClient.generate(buildLlmRequest(prompt));
O result = parseResponse(response);
validateResult(result);
return result;
}
protected abstract String templateName();
protected abstract I buildRequest(StoryCreationContext context);
protected abstract O parseResponse(LlmResponse response);
protected abstract void validateRequest(I request);
protected abstract void validateResult(O result);
protected LlmRequest buildLlmRequest(String prompt) {
return LlmRequest.builder()
.model("gpt-4.1")
.temperature(0.7)
.maxTokens(2000)
.prompt(prompt)
.build();
}
}
This class enforces uniform prompt rendering, model invocation, and result validation across all agents.
9.2 Example: PlotAgent
@Component
public class PlotAgent extends BaseStoryAgent<PlotRequest, PlotResult> {
public PlotAgent(LlmGatewayClient llmGatewayClient, PromptTemplateEngine promptTemplateEngine) {
super(llmGatewayClient, promptTemplateEngine);
}
@Override
protected String templateName() { return "plot-agent-v3"; }
@Override
protected PlotRequest buildRequest(StoryCreationContext ctx) {
return new PlotRequest(
ctx.genre(), ctx.theme(), ctx.audience(),
ctx.intentResult().coreExpectation(), ctx.keywords());
}
@Override
protected PlotResult parseResponse(LlmResponse resp) {
return JsonUtils.fromJson(resp.content(), PlotResult.class);
}
@Override
protected void validateRequest(PlotRequest req) {
Assert.hasText(req.genre(), "genre must not be empty");
Assert.hasText(req.theme(), "theme must not be empty");
}
@Override
protected void validateResult(PlotResult res) {
if (res == null || res.outline() == null || res.outline().isEmpty()) {
throw new AgentExecutionException("PLOT_EMPTY", "plot result is empty");
}
}
}
The agent demonstrates versioned prompt templates, strict input validation, and explicit error codes.
10. LLM Gateway – Centralized Model Access
10.1 Interface
public interface LlmGatewayClient {
LlmResponse generate(LlmRequest request);
default LlmResponse generateWithFallback(LlmRequest primary, LlmRequest fallback) {
try { return generate(primary); }
catch (Exception ex) { return generate(fallback); }
}
}
10.2 Resilient Implementation (Resilience4j)
@Service
@RequiredArgsConstructor
public class ResilientLlmGatewayClient implements LlmGatewayClient {
private final ExternalLlmClient externalLlmClient;
private final CircuitBreaker circuitBreaker;
private final Retry retry;
private final TimeLimiter timeLimiter;
private final ScheduledExecutorService scheduler; // required by the async TimeLimiter/Retry decorators
@Override
public LlmResponse generate(LlmRequest request) {
Supplier<CompletionStage<LlmResponse>> supplier = () ->
CompletableFuture.supplyAsync(() -> externalLlmClient.invoke(request));
try {
return Decorators.ofCompletionStage(supplier)
.withTimeLimiter(timeLimiter, scheduler)
.withCircuitBreaker(circuitBreaker)
.withRetry(retry, scheduler)
.decorate()
.get()
.toCompletableFuture()
.join();
} catch (Exception e) {
throw new LlmGatewayException("LLM_GATEWAY_ERROR", e.getMessage(), e);
}
}
}
Resilience4j provides circuit breaking, retry, and timeout handling for unstable external LLM services. Note that the asynchronous TimeLimiter and Retry decorators each require a ScheduledExecutorService, which is why the client is built around a CompletionStage rather than a plain Supplier.
11. High‑Concurrency Engineering
11.1 Thread‑Pool Isolation
@Configuration
public class ExecutorConfig {
@Bean("orchestratorExecutor")
public ThreadPoolTaskExecutor orchestratorExecutor() {
ThreadPoolTaskExecutor exec = new ThreadPoolTaskExecutor();
exec.setCorePoolSize(16);
exec.setMaxPoolSize(64);
exec.setQueueCapacity(1000);
exec.setThreadNamePrefix("story-orchestrator-");
exec.initialize();
return exec;
}
@Bean("agentWorkerExecutor")
public ThreadPoolTaskExecutor agentWorkerExecutor() {
ThreadPoolTaskExecutor exec = new ThreadPoolTaskExecutor();
exec.setCorePoolSize(32);
exec.setMaxPoolSize(128);
exec.setQueueCapacity(5000);
exec.setThreadNamePrefix("story-agent-worker-");
exec.initialize();
return exec;
}
}
Separate pools prevent web threads from being blocked by long‑running LLM calls.
11.2 Back‑Pressure & Rate Limiting
API Gateway rate limiting.
Kafka consumer throttling.
Worker concurrency limits.
Queue‑length alerts that return “system busy, please retry later” when thresholds are exceeded.
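The last rule can be sketched with a simple admission gate: when all permits are taken, the caller immediately answers "system busy, please retry later" instead of queueing unboundedly. The AdmissionGate name is illustrative:

```java
// Bounded admission control: at most maxInFlight tasks run concurrently,
// and overflow is rejected immediately rather than enqueued.
import java.util.concurrent.Semaphore;

class AdmissionGate {
    private final Semaphore permits;

    AdmissionGate(int maxInFlight) { this.permits = new Semaphore(maxInFlight); }

    /** Returns false without blocking when the system is full. */
    boolean tryAdmit() { return permits.tryAcquire(); }

    /** Must be called when the admitted task finishes (success or failure). */
    void release() { permits.release(); }
}
```

In practice the gate would sit in the command service, in front of the Kafka publish, with the rejection translated into an HTTP 429 response.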
11.3 Tiered Degradation
Feature priority levels (P0‑P3) ensure that core task creation and progress queries stay available during extreme load, while optional polishing or multi‑version comparison can be disabled.
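A minimal sketch of such a policy, assuming load levels 0 (normal) through 3 (extreme) and the P0-P3 priorities above; the class name and thresholds are illustrative:

```java
// Tiered degradation: as the load level rises, lower-priority features
// are switched off while P0 (task creation, progress query) stays on.
class DegradationPolicy {
    enum Priority { P0, P1, P2, P3 }

    // Current load level: 0 = normal .. 3 = extreme.
    private volatile int loadLevel = 0;

    void setLoadLevel(int level) { this.loadLevel = level; }

    /** At load level n, only priorities P0..P(3-n) remain enabled. */
    boolean isEnabled(Priority p) {
        return p.ordinal() <= Priority.P3.ordinal() - loadLevel;
    }
}
```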
12. Extensibility
12.1 Plugin‑Style Agent Registry
public interface StoryAgent {
String agentName();
String stageName();
AgentExecutionResult execute(AgentExecutionContext ctx);
}
@Component
public class StoryAgentRegistry {
private final Map<String, StoryAgent> agentMap;
public StoryAgentRegistry(List<StoryAgent> agents) {
this.agentMap = agents.stream()
.collect(Collectors.toUnmodifiableMap(StoryAgent::stageName, Function.identity()));
}
public StoryAgent getByStage(String stage) {
StoryAgent agent = agentMap.get(stage);
if (agent == null) {
throw new IllegalArgumentException("No agent registered for stage: " + stage);
}
return agent;
}
}
Adding a new agent only requires implementing the interface and registering a prompt template.
12.2 Configurable DAG (YAML)
story:
  workflow:
    stages:
      - name: intent
        dependsOn: []
      - name: plot
        dependsOn: [intent]
      - name: character
        dependsOn: [intent]
      - name: chapter
        dependsOn: [plot, character]
      - name: style
        dependsOn: [chapter]
      - name: review
        dependsOn: [style]
This makes per‑tenant or per‑product workflow variations possible without code changes.
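A configurable workflow like this can be driven by a generic executor that starts each stage as soon as all of its dependencies have completed, preserving the plot/character parallelism. The sketch below uses plain functions as stage bodies; in the real system they would dispatch through the agent registry, and all names are illustrative:

```java
// Dependency-driven stage scheduling with CompletableFuture: each stage's
// future is chained behind allOf(its dependencies' futures).
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

class DagExecutor {
    record Stage(String name, List<String> dependsOn, Supplier<String> body) {}

    /** Stages must be listed with dependencies before dependents (as in the YAML). */
    static Map<String, String> run(List<Stage> stages, Executor executor) {
        Map<String, CompletableFuture<String>> futures = new HashMap<>();
        for (Stage stage : stages) {
            CompletableFuture<?>[] deps = stage.dependsOn().stream()
                    .map(futures::get)
                    .toArray(CompletableFuture[]::new);
            // allOf of zero deps is already complete, so root stages start at once.
            futures.put(stage.name(), CompletableFuture.allOf(deps)
                    .thenApplyAsync(ignored -> stage.body().get(), executor));
        }
        Map<String, String> results = new HashMap<>();
        futures.forEach((name, future) -> results.put(name, future.join()));
        return results;
    }
}
```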
13. Data Persistence Design
13.1 Recommended Tables
story_task: main task record.
story_stage_execution: per‑stage execution details (agent, attempt, input/output summary, latency, token usage, error code, node).
story_result: final story and versioned content.
story_prompt_snapshot: prompt and model parameters for cost analysis.
story_cost_record: token consumption and monetary cost.
13.2 Importance of Stage Execution Details
Recording agent name, retry count, input/output snippets, latency, token usage, and error codes enables root‑cause analysis for quality regressions, cost spikes, and tenant‑specific failures.
14. Redis Usage
Task progress cache for fast front‑end queries.
Hot prompt‑template cache.
User short‑term preference memory.
Idempotent keys and distributed locks for exactly‑once processing.
Key examples:
story:task:progress:{taskId}
story:task:result:{taskId}
story:user:preference:{userId}
story:stage:idempotent:{taskId}:{stage}
Large text results are eventually persisted to the database or object storage; Redis holds only transient data.
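Centralizing these key formats in one helper avoids drift between the producers and consumers of the cache. The TTL values below are illustrative assumptions:

```java
// Single source of truth for Redis key formats and TTLs.
import java.time.Duration;

final class StoryKeys {
    static final Duration PROGRESS_TTL = Duration.ofHours(6);     // transient only
    static final Duration IDEMPOTENT_TTL = Duration.ofMinutes(30);

    static String progress(String taskId)   { return "story:task:progress:" + taskId; }
    static String result(String taskId)     { return "story:task:result:" + taskId; }
    static String preference(String userId) { return "story:user:preference:" + userId; }
    static String idempotent(String taskId, String stage) {
        return "story:stage:idempotent:" + taskId + ":" + stage;
    }

    private StoryKeys() {} // static utility, no instances
}
```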
15. Kafka Design
15.1 Topic Segmentation
story-task-created
story-stage-intent.completed
story-stage-plot.completed
story-stage-character.completed
story-stage-chapter.completed
story-stage-review.completed
story-task-failed
15.2 Idempotent Consumption
public boolean tryExecuteStage(String taskId, String stage) {
String key = "story:stage:idempotent:" + taskId + ":" + stage;
Boolean success = redisTemplate.opsForValue()
.setIfAbsent(key, "1", Duration.ofMinutes(30));
return Boolean.TRUE.equals(success);
}
15.3 Dead‑Letter Queue Strategy
Transient errors: exponential back‑off with 2‑3 retries.
Irrecoverable business errors: abort immediately.
Repeated failures: route to dead‑letter topic for manual investigation.
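The three rules can be collapsed into one decision function plus a back-off schedule. The retry limit and back-off base below are assumptions for illustration:

```java
// Failure-handling policy for Kafka consumers: abort on business errors,
// retry transient errors with exponential back-off, then dead-letter.
import java.time.Duration;

class FailurePolicy {
    enum Action { RETRY, ABORT, DEAD_LETTER }

    static final int MAX_RETRIES = 3;

    /** Irrecoverable business errors abort; transient errors retry, then dead-letter. */
    static Action decide(boolean businessError, int attempt) {
        if (businessError) return Action.ABORT;
        return attempt < MAX_RETRIES ? Action.RETRY : Action.DEAD_LETTER;
    }

    /** Exponential back-off: 1s, 2s, 4s, ... for attempts 0, 1, 2, ... */
    static Duration backoff(int attempt) {
        return Duration.ofSeconds(1L << attempt);
    }
}
```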
16. Observability
16.1 Metric Set (Prometheus)
story_task_create_qps
story_task_success_rate
story_stage_latency_ms
story_stage_retry_count
llm_call_latency_ms
llm_call_error_rate
llm_tokens_prompt_total
llm_tokens_completion_total
story_cost_total
16.2 Structured Logging
Every log entry must contain traceId, taskId, userId, stage, agentName, model, latencyMs, retryCount, and errorCode.
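One way to make the required fields impossible to omit is to funnel them through a single structure with one formatting point. In a Spring Boot service these fields would typically be pushed into SLF4J's MDC instead; the standalone record below is an illustrative sketch:

```java
// Structured log entry carrying every mandatory field; format() is the only
// place a log line is assembled, so no field can be silently dropped.
record LogEntry(String traceId, String taskId, String userId, String stage,
                String agentName, String model, long latencyMs,
                int retryCount, String errorCode) {

    String format(String message) {
        return String.format(
                "traceId=%s taskId=%s userId=%s stage=%s agent=%s model=%s "
                + "latencyMs=%d retryCount=%d errorCode=%s msg=\"%s\"",
                traceId, taskId, userId, stage, agentName, model,
                latencyMs, retryCount, errorCode, message);
    }
}
```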
16.3 Distributed Tracing
Use OpenTelemetry + Micrometer + Prometheus + Grafana to answer questions such as “Why did a story take 30 s?”, “Which agent is the bottleneck?”, and “Was the delay caused by the model, the queue, or the database?”.
17. Deployment & Containerization
17.1 Dockerfile
FROM eclipse-temurin:21-jre
WORKDIR /app
COPY target/story-platform.jar app.jar
ENV JAVA_OPTS="-XX:+UseContainerSupport -XX:MaxRAMPercentage=75"
EXPOSE 8080
ENTRYPOINT ["sh", "-c", "java $JAVA_OPTS -jar app.jar"]
17.2 Kubernetes Deployment (excerpt)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: story-orchestrator
  namespace: story-platform
spec:
  replicas: 3
  selector:
    matchLabels:
      app: story-orchestrator
  template:
    metadata:
      labels:
        app: story-orchestrator
    spec:
      containers:
        - name: story-orchestrator
          image: registry.example.com/story-orchestrator:1.0.0
          ports:
            - containerPort: 8080
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: prod
            - name: JAVA_OPTS
              value: "-Xms512m -Xmx2048m"
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2"
              memory: "3Gi"
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 20
17.3 Horizontal Pod Autoscaler (CPU‑based, extendable to Kafka‑lag)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: story-agent-worker-hpa
  namespace: story-platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: story-agent-worker
  minReplicas: 4
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
18. Fault Handling Strategies
18.1 Model Call Timeout
Set per‑stage timeout limits.
Enable limited retries.
Fallback to a lighter model via the LLM gateway.
Persist intermediate snapshots for manual or offline compensation.
18.2 Agent Output Format Errors
All agents validate JSON against a schema.
Automatic repair attempts before marking failure.
If repair fails, route to the failure branch without contaminating downstream agents.
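The validate-and-repair flow can be sketched as a bounded loop over a parser and a repairer (the repairer might be a cheap "fix this JSON" model call). Both are plain functions here, and all names are assumptions:

```java
// Bounded parse-repair loop: try to parse, hand failures to a repairer a
// limited number of times, then signal the failure branch via empty Optional.
import java.util.Optional;
import java.util.function.Function;
import java.util.function.UnaryOperator;

class OutputRepair {
    static <T> Optional<T> parseWithRepair(String raw,
                                           Function<String, Optional<T>> parser,
                                           UnaryOperator<String> repairer,
                                           int maxRepairs) {
        String candidate = raw;
        for (int attempt = 0; attempt <= maxRepairs; attempt++) {
            Optional<T> parsed = parser.apply(candidate);
            if (parsed.isPresent()) return parsed;
            candidate = repairer.apply(candidate); // one repair attempt per loop
        }
        return Optional.empty(); // caller routes to the failure branch
    }
}
```

Returning an empty Optional rather than throwing keeps the downstream agents untouched: the orchestrator decides whether to retry the whole stage or mark the task PARTIAL_SUCCESS.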
18.3 Kafka Consumer Lag
Scale consumer instances.
Reduce batch size.
Increase partition count.
Investigate whether slow model calls are causing “pseudo‑lag”.
18.4 Partial Success
Use PARTIAL_SUCCESS status to keep already‑generated stages.
Expose to the user as “base draft ready, polishing pending”.
19. Evolution Roadmap
PoC Phase : single JVM, synchronous LLM calls, simple persistence.
Business‑Launch Phase : async task handling, Redis/Kafka integration, basic observability.
Scale‑Out Phase : separate Agent Workers, centralized LLM gateway, full monitoring, cost analysis.
Platform Phase : DAG configuration, plugin agents, prompt‑management UI, multi‑tenant quota system.
20. Best‑Practice Checklist
Treat story creation as an asynchronous task, not a synchronous API.
Prefer DAG orchestration over linear chaining.
Agents must have single responsibility, structured I/O, and unit tests.
All model calls go through a unified LLM gateway.
Isolate thread pools, apply back‑pressure, and use tiered degradation.
Persist snapshots for every stage to enable debugging and cost tracking.
Implement idempotency, retries, timeouts, circuit breaking, and dead‑letter handling.
Collect metrics at task, stage, model, and cost levels.
Design extension points for new agents and workflow variations.
Continuously balance quality, stability, and cost; never optimise only one dimension.
Ray's Galactic Tech