How to Build a High‑Concurrency Story Creation Platform with AgentScope Java
This article presents a step‑by‑step engineering guide for constructing a production‑grade, high‑throughput story generation platform using AgentScope Java, Spring Boot, Kafka, Redis, PostgreSQL, and Kubernetes, covering architecture, task modeling, DAG orchestration, code organization, scalability, observability, and deployment best practices.
1. Introduction
Story creation in production requires multiple inter‑related steps (genre clarification, world‑building, plot planning, chapter generation, style unification, safety review, user‑preference memory). A single LLM call cannot satisfy quality, latency, or cost requirements, so the workload is decomposed into independent agents coordinated by a scheduler.
2. Business Goals & Technical Constraints
2.1 Goals
C‑end users submit creation requests via Web/App.
Support multi‑dimensional inputs (genre, style, audience, length, character settings).
Return a task identifier within milliseconds; progress is queried asynchronously.
Allow chapter‑level generation, continuation, rewriting, and local polishing.
Handle hundreds to thousands of concurrent requests during peak periods.
2.2 Constraints
LLM APIs are unstable (timeouts, rate limits, occasional failures).
Generation quality is non‑deterministic.
Each request spans many agents and intermediate states (long‑chain orchestration).
Prompt length, model choice, and retry strategy directly affect cost.
High observability is required: latency, failure points, token consumption per step.
3. Why AgentScope Java
AgentScope Java provides a type‑safe, testable runtime for multi‑agent collaboration. It integrates naturally with Spring Boot, Kafka, Redis, databases, and Kubernetes, offering thread‑pool, connection‑pool, tracing, and resource isolation management.
4. Production‑Grade Architecture Overview
4.1 Logical Layers
Client Web/App
API Gateway (auth, rate‑limit, tracing)
Story Command Service (receives requests)
Story Orchestrator (DAG scheduling, concurrency, timeout, compensation)
Agents: Intent, Plot, Character, Chapter, Style, Review
LLM Gateway (model routing, circuit‑break, auditing)
Redis, PostgreSQL, Object Storage (state, cache)
Metrics / Logs / Traces (observability)
4.2 Layered Design
Access Layer : API Gateway, authentication, rate limiting, gray release, trace injection.
Application Layer : Exposes story creation, continuation, query, and result APIs.
Orchestration Layer : Story Orchestrator performs DAG orchestration, concurrent scheduling, timeout handling, and compensation.
Agent Execution Layer : Individual agents implement specific capabilities (plot, character, chapter, style, review).
Infrastructure Layer : Redis, Kafka, PostgreSQL, object storage, monitoring, Kubernetes.
4.3 Synchronous vs Asynchronous Calls
A naïve synchronous chain (Intent → Plot → Character → Chapter → Style → Review) blocks the web thread, makes the overall latency equal to the longest step, and complicates failure recovery. The recommended approach is:
API layer returns 202 Accepted with a taskId immediately.
Orchestrator drives a state machine asynchronously.
Independent agents run in parallel whenever possible.
Intermediate results are persisted for checkpoint‑and‑resume.
Event‑driven scaling keeps the system elastic.
5. Multi‑Agent Collaboration Principles
5.1 DAG Orchestration
Intent Parsing must run first. Plot Planning and Character Definition can run in parallel after intent. Chapter Generation depends on both plot and character results. Style Unification and Quality Review are performed in the final stage.
This design reduces critical‑path latency, makes dependencies explicit for retries, enables node‑level scaling, and improves observability.
5.2 Context Management Rules
Structured transmission : agents exchange JSON objects, not free‑form text.
Transmit only necessary context : e.g., chapter generation receives character cards, world view, and outline snippets instead of the full log.
Persist intermediate results : snapshots are stored in Redis or the database and loaded on demand.
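The slicing rule can be sketched in plain Java. The FullContext and ChapterContext records and the sliceForChapter helper below are illustrative names, not AgentScope Java types:

```java
// Sketch of rule 2: the chapter stage receives a trimmed, structured view of the
// full creation context rather than the whole conversation log.
import java.util.List;

class ContextSlicing {
    // Full context accumulated across stages (abbreviated).
    record FullContext(String genre, String theme, String audience,
                       List<String> characterCards, String worldView,
                       List<String> outline, List<String> rawAgentLog) {}

    // Only what chapter generation actually needs.
    record ChapterContext(List<String> characterCards, String worldView,
                          List<String> outlineSnippet) {}

    static ChapterContext sliceForChapter(FullContext full, int chapterIndex) {
        // Pass a single outline entry, not the entire outline or the raw log.
        return new ChapterContext(
                full.characterCards(),
                full.worldView(),
                List.of(full.outline().get(chapterIndex)));
    }
}
```

The same pattern applies to every stage boundary: each agent declares a narrow input record, and the orchestrator projects the full context onto it.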
6. Core Domain Model Design
6.1 StoryTask Entity
public class StoryTask {
private Long id;
private String taskId;
private String userId;
private String genre;
private String theme;
private String audience;
private Integer targetWords;
private String status; // PENDING, RUNNING, PARTIAL_SUCCESS, SUCCESS, FAILED
private String currentStage;
private Integer progress;
private String resultVersion;
private Instant createdAt;
private Instant updatedAt;
}
Key fields:
status: lifecycle state.
currentStage: which agent is executing.
progress: for front‑end polling or push notifications.
resultVersion: supports rewrite, continuation, and multi‑version editing.
6.2 StoryContextSnapshot
public class StoryContextSnapshot {
private String taskId;
private String stage;
private String payloadJson;
private String promptTemplateVersion;
private String modelName;
private Integer promptTokens;
private Integer completionTokens;
private Long latencyMs;
private Boolean success;
private String errorCode;
private String errorMessage;
private Instant createdAt;
}
Snapshots enable quality back‑tracing, prompt optimization, cost analysis, and failure compensation.
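One way to guarantee that every stage, successful or failed, leaves a snapshot behind is to wrap stage execution in a small recorder. The SnapshotStore interface and recorded helper below are illustrative assumptions, not AgentScope Java APIs:

```java
// Wraps any stage call so a StoryContextSnapshot-like record is captured
// on both the success and the failure path.
import java.time.Instant;
import java.util.function.Supplier;

class SnapshotRecorder {
    record Snapshot(String taskId, String stage, boolean success,
                    String errorMessage, long latencyMs, Instant createdAt) {}

    interface SnapshotStore { void save(Snapshot snapshot); }

    static <T> T recorded(String taskId, String stage,
                          SnapshotStore store, Supplier<T> stageCall) {
        long start = System.nanoTime();
        try {
            T result = stageCall.get();
            store.save(new Snapshot(taskId, stage, true, null,
                    (System.nanoTime() - start) / 1_000_000, Instant.now()));
            return result;
        } catch (RuntimeException ex) {
            // Failed calls are snapshotted too, so compensation can replay the stage.
            store.save(new Snapshot(taskId, stage, false, ex.getMessage(),
                    (System.nanoTime() - start) / 1_000_000, Instant.now()));
            throw ex;
        }
    }
}
```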
7. Production‑Grade API & Service Layer
7.1 Request / Response DTOs
public record CreateStoryCommand(
@NotBlank String userId,
@NotBlank String genre,
@NotBlank String theme,
@NotBlank String audience,
@Min(500) @Max(20000) Integer targetWords,
List<String> keywords,
String preferredStyle,
Boolean allowStreaming) {}
public record CreateStoryResponse(String taskId, String status, String message) {}
7.2 Controller – Fast Acceptance
@RestController
@RequestMapping("/api/v1/stories")
@RequiredArgsConstructor
public class StoryController {
private final StoryApplicationService storyApplicationService;
@PostMapping
public ResponseEntity<CreateStoryResponse> createStory(@Valid @RequestBody CreateStoryCommand command) {
CreateStoryResponse response = storyApplicationService.createStory(command);
return ResponseEntity.accepted().body(response);
}
@GetMapping("/{taskId}")
public ResponseEntity<StoryTaskDetailVO> getTaskDetail(@PathVariable String taskId) {
return ResponseEntity.ok(storyApplicationService.getTaskDetail(taskId));
}
}
Key points:
Return 202 Accepted with a taskId.
Do not block the HTTP thread with long‑running LLM calls.
7.3 Application Service – Store‑then‑Publish
@Service
@RequiredArgsConstructor
public class StoryApplicationService {
private final StoryTaskRepository storyTaskRepository;
private final StoryEventPublisher storyEventPublisher;
private final IdGenerator idGenerator;
@Transactional
public CreateStoryResponse createStory(CreateStoryCommand command) {
String taskId = idGenerator.nextTaskId();
StoryTaskEntity entity = StoryTaskEntity.builder()
.taskId(taskId)
.userId(command.userId())
.genre(command.genre())
.theme(command.theme())
.audience(command.audience())
.targetWords(command.targetWords())
.status(TaskStatus.PENDING.name())
.currentStage("INIT")
.progress(0)
.build();
storyTaskRepository.save(entity);
storyEventPublisher.publishTaskCreated(new StoryTaskCreatedEvent(
taskId, command.userId(), command.genre(), command.theme(),
command.audience(), command.targetWords(), command.keywords(), command.preferredStyle()));
return new CreateStoryResponse(taskId, "PENDING", "Story creation task accepted");
}
}
This pattern guarantees that a persisted record exists before any event is emitted, preventing data loss in case of broker failure (an Outbox pattern can be added for stronger guarantees).
8. Orchestrator Design
8.1 Responsibilities
Maintain the task state machine.
Select the next agent based on dependencies.
Handle timeouts, retries, downgrade, and compensation.
Aggregate stage results into the final story.
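The task state machine can be made explicit with a transition table. The guard below mirrors the statuses listed on StoryTask; the exact set of allowed transitions is an assumption for illustration:

```java
// Explicit state machine for story tasks: illegal transitions fail fast
// instead of silently corrupting task state.
import java.util.Map;
import java.util.Set;

enum TaskStatus {
    PENDING, RUNNING, PARTIAL_SUCCESS, SUCCESS, FAILED;

    private static final Map<TaskStatus, Set<TaskStatus>> ALLOWED = Map.of(
            PENDING, Set.of(RUNNING, FAILED),
            RUNNING, Set.of(PARTIAL_SUCCESS, SUCCESS, FAILED),
            PARTIAL_SUCCESS, Set.of(RUNNING, SUCCESS, FAILED), // resume or finish
            SUCCESS, Set.of(),                                 // terminal
            FAILED, Set.of(RUNNING));                          // compensation restart

    TaskStatus transitionTo(TaskStatus next) {
        if (!ALLOWED.get(this).contains(next)) {
            throw new IllegalStateException(this + " -> " + next + " is not allowed");
        }
        return next;
    }
}
```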
8.2 Parallel Execution with CompletableFuture
@Service
@RequiredArgsConstructor
public class StoryOrchestrator {
private final IntentAgent intentAgent;
private final PlotAgent plotAgent;
private final CharacterAgent characterAgent;
private final ChapterAgent chapterAgent;
private final StyleAgent styleAgent;
private final ReviewAgent reviewAgent;
private final Executor storyExecutor;
public StoryResult orchestrate(StoryCreationContext context) {
IntentResult intentResult = intentAgent.execute(context);
StoryCreationContext enriched = context.withIntentResult(intentResult);
CompletableFuture<PlotResult> plotFuture = CompletableFuture.supplyAsync(() -> plotAgent.execute(enriched), storyExecutor);
CompletableFuture<CharacterResult> characterFuture = CompletableFuture.supplyAsync(() -> characterAgent.execute(enriched), storyExecutor);
PlotResult plotResult = plotFuture.join();
CharacterResult characterResult = characterFuture.join();
StoryCreationContext chapterContext = enriched
.withPlotResult(plotResult)
.withCharacterResult(characterResult);
ChapterDraft draft = chapterAgent.execute(chapterContext);
StyledStory styledStory = styleAgent.execute(chapterContext.withDraft(draft));
ReviewResult reviewResult = reviewAgent.execute(chapterContext.withStyledStory(styledStory));
return StoryResult.builder()
.plot(plotResult.plot())
.characters(characterResult.characters())
.story(reviewResult.finalStory())
.reviewTags(reviewResult.tags())
.build();
}
}
Parallelism is applied to agents without data dependencies (Plot & Character). A unified StoryCreationContext carries only the required JSON payloads.
9. Agent Design Patterns
9.1 Base Agent Class
public abstract class BaseStoryAgent<I, O> {
protected final LlmGatewayClient llmGatewayClient;
protected final PromptTemplateEngine promptTemplateEngine;
protected BaseStoryAgent(LlmGatewayClient llmGatewayClient, PromptTemplateEngine promptTemplateEngine) {
this.llmGatewayClient = llmGatewayClient;
this.promptTemplateEngine = promptTemplateEngine;
}
public O execute(StoryCreationContext context) {
I request = buildRequest(context);
validateRequest(request);
String prompt = promptTemplateEngine.render(templateName(), request);
LlmResponse response = llmGatewayClient.generate(buildLlmRequest(prompt));
O result = parseResponse(response);
validateResult(result);
return result;
}
protected abstract String templateName();
protected abstract I buildRequest(StoryCreationContext context);
protected abstract O parseResponse(LlmResponse response);
protected abstract void validateRequest(I request);
protected abstract void validateResult(O result);
protected LlmRequest buildLlmRequest(String prompt) {
return LlmRequest.builder()
.model("gpt-4.1")
.temperature(0.7)
.maxTokens(2000)
.prompt(prompt)
.build();
}
}
This class enforces uniform prompt rendering, model invocation, and result validation across all agents.
9.2 Example: PlotAgent
@Component
public class PlotAgent extends BaseStoryAgent<PlotRequest, PlotResult> {
public PlotAgent(LlmGatewayClient llmGatewayClient, PromptTemplateEngine promptTemplateEngine) {
super(llmGatewayClient, promptTemplateEngine);
}
@Override
protected String templateName() { return "plot-agent-v3"; }
@Override
protected PlotRequest buildRequest(StoryCreationContext ctx) {
return new PlotRequest(
ctx.genre(), ctx.theme(), ctx.audience(),
ctx.intentResult().coreExpectation(), ctx.keywords());
}
@Override
protected PlotResult parseResponse(LlmResponse resp) {
return JsonUtils.fromJson(resp.content(), PlotResult.class);
}
@Override
protected void validateRequest(PlotRequest req) {
Assert.hasText(req.genre(), "genre must not be empty");
Assert.hasText(req.theme(), "theme must not be empty");
}
@Override
protected void validateResult(PlotResult res) {
if (res == null || res.outline() == null || res.outline().isEmpty()) {
throw new AgentExecutionException("PLOT_EMPTY", "plot result is empty");
}
}
}
The agent demonstrates versioned prompt templates, strict input validation, and explicit error codes.
10. LLM Gateway – Centralized Model Access
10.1 Interface
public interface LlmGatewayClient {
LlmResponse generate(LlmRequest request);
default LlmResponse generateWithFallback(LlmRequest primary, LlmRequest fallback) {
try { return generate(primary); }
catch (Exception ex) { return generate(fallback); }
}
}
10.2 Resilient Implementation (Resilience4j)
@Service
@RequiredArgsConstructor
public class ResilientLlmGatewayClient implements LlmGatewayClient {
private final ExternalLlmClient externalLlmClient;
private final CircuitBreaker circuitBreaker;
private final Retry retry;
private final TimeLimiter timeLimiter;
private final ScheduledExecutorService scheduler; // required by the async TimeLimiter/Retry decorators
@Override
public LlmResponse generate(LlmRequest request) {
Supplier<CompletionStage<LlmResponse>> supplier = () ->
CompletableFuture.supplyAsync(() -> externalLlmClient.invoke(request));
try {
return Decorators.ofCompletionStage(supplier)
.withTimeLimiter(timeLimiter, scheduler)
.withCircuitBreaker(circuitBreaker)
.withRetry(retry, scheduler)
.decorate()
.get()
.toCompletableFuture()
.join();
} catch (Exception e) {
throw new LlmGatewayException("LLM_GATEWAY_ERROR", e.getMessage(), e);
}
}
}
Resilience4j provides circuit breaking, retry, and timeout handling for unstable external LLM services. Note that the asynchronous TimeLimiter and Retry decorators each require a ScheduledExecutorService, which is why the client is built around a CompletionStage rather than a plain Supplier.
11. High‑Concurrency Engineering
11.1 Thread‑Pool Isolation
@Configuration
public class ExecutorConfig {
@Bean("orchestratorExecutor")
public ThreadPoolTaskExecutor orchestratorExecutor() {
ThreadPoolTaskExecutor exec = new ThreadPoolTaskExecutor();
exec.setCorePoolSize(16);
exec.setMaxPoolSize(64);
exec.setQueueCapacity(1000);
exec.setThreadNamePrefix("story-orchestrator-");
exec.initialize();
return exec;
}
@Bean("agentWorkerExecutor")
public ThreadPoolTaskExecutor agentWorkerExecutor() {
ThreadPoolTaskExecutor exec = new ThreadPoolTaskExecutor();
exec.setCorePoolSize(32);
exec.setMaxPoolSize(128);
exec.setQueueCapacity(5000);
exec.setThreadNamePrefix("story-agent-worker-");
exec.initialize();
return exec;
}
}
Separate pools prevent web threads from being blocked by long‑running LLM calls.
11.2 Back‑Pressure & Rate Limiting
API Gateway rate limiting.
Kafka consumer throttling.
Worker concurrency limits.
Queue‑length alerts that return “system busy, please retry later” when thresholds are exceeded.
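The last rule can be sketched with a simple admission gate: when all permits are taken, the caller immediately answers "system busy, please retry later" instead of queueing unboundedly. The AdmissionGate name is illustrative:

```java
// Bounded admission control: at most maxInFlight tasks run concurrently,
// and overflow is rejected immediately rather than enqueued.
import java.util.concurrent.Semaphore;

class AdmissionGate {
    private final Semaphore permits;

    AdmissionGate(int maxInFlight) { this.permits = new Semaphore(maxInFlight); }

    /** Returns false without blocking when the system is full. */
    boolean tryAdmit() { return permits.tryAcquire(); }

    /** Must be called when the admitted task finishes (success or failure). */
    void release() { permits.release(); }
}
```

In practice the gate would sit in the command service, in front of the Kafka publish, with the rejection translated into an HTTP 429 response.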
11.3 Tiered Degradation
Feature priority levels (P0‑P3) ensure that core task creation and progress queries stay available during extreme load, while optional polishing or multi‑version comparison can be disabled.
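A minimal sketch of such a policy, assuming load levels 0 (normal) through 3 (extreme) and the P0-P3 priorities above; the class name and thresholds are illustrative:

```java
// Tiered degradation: as the load level rises, lower-priority features
// are switched off while P0 (task creation, progress query) stays on.
class DegradationPolicy {
    enum Priority { P0, P1, P2, P3 }

    // Current load level: 0 = normal .. 3 = extreme.
    private volatile int loadLevel = 0;

    void setLoadLevel(int level) { this.loadLevel = level; }

    /** At load level n, only priorities P0..P(3-n) remain enabled. */
    boolean isEnabled(Priority p) {
        return p.ordinal() <= Priority.P3.ordinal() - loadLevel;
    }
}
```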
12. Extensibility
12.1 Plugin‑Style Agent Registry
public interface StoryAgent {
String agentName();
String stageName();
AgentExecutionResult execute(AgentExecutionContext ctx);
}
@Component
public class StoryAgentRegistry {
private final Map<String, StoryAgent> agentMap;
public StoryAgentRegistry(List<StoryAgent> agents) {
this.agentMap = agents.stream()
.collect(Collectors.toUnmodifiableMap(StoryAgent::stageName, Function.identity()));
}
public StoryAgent getByStage(String stage) {
StoryAgent agent = agentMap.get(stage);
if (agent == null) {
throw new IllegalArgumentException("No agent registered for stage: " + stage);
}
return agent;
}
}
Adding a new agent only requires implementing the interface and registering a prompt template.
12.2 Configurable DAG (YAML)
story:
  workflow:
    stages:
      - name: intent
        dependsOn: []
      - name: plot
        dependsOn: [intent]
      - name: character
        dependsOn: [intent]
      - name: chapter
        dependsOn: [plot, character]
      - name: style
        dependsOn: [chapter]
      - name: review
        dependsOn: [style]
This makes per‑tenant or per‑product workflow variations possible without code changes.
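A configurable workflow like this can be driven by a generic executor that starts each stage as soon as all of its dependencies have completed, preserving the plot/character parallelism. The sketch below uses plain functions as stage bodies; in the real system they would dispatch through the agent registry, and all names are illustrative:

```java
// Dependency-driven stage scheduling with CompletableFuture: each stage's
// future is chained behind allOf(its dependencies' futures).
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

class DagExecutor {
    record Stage(String name, List<String> dependsOn, Supplier<String> body) {}

    /** Stages must be listed with dependencies before dependents (as in the YAML). */
    static Map<String, String> run(List<Stage> stages, Executor executor) {
        Map<String, CompletableFuture<String>> futures = new HashMap<>();
        for (Stage stage : stages) {
            CompletableFuture<?>[] deps = stage.dependsOn().stream()
                    .map(futures::get)
                    .toArray(CompletableFuture[]::new);
            // allOf of zero deps is already complete, so root stages start at once.
            futures.put(stage.name(), CompletableFuture.allOf(deps)
                    .thenApplyAsync(ignored -> stage.body().get(), executor));
        }
        Map<String, String> results = new HashMap<>();
        futures.forEach((name, future) -> results.put(name, future.join()));
        return results;
    }
}
```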
13. Data Persistence Design
13.1 Recommended Tables
story_task: main task record.
story_stage_execution: per‑stage execution details (agent, attempt, input/output summary, latency, token usage, error code, node).
story_result: final story and versioned content.
story_prompt_snapshot: prompt and model parameters for cost analysis.
story_cost_record: token consumption and monetary cost.
13.2 Importance of Stage Execution Details
Recording agent name, retry count, input/output snippets, latency, token usage, and error codes enables root‑cause analysis for quality regressions, cost spikes, and tenant‑specific failures.
14. Redis Usage
Task progress cache for fast front‑end queries.
Hot prompt‑template cache.
User short‑term preference memory.
Idempotent keys and distributed locks for exactly‑once processing.
Key examples:
story:task:progress:{taskId}
story:task:result:{taskId}
story:user:preference:{userId}
story:stage:idempotent:{taskId}:{stage}
Large text results are eventually persisted to the database or object storage; Redis holds only transient data.
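Centralizing these key formats in one helper avoids drift between the producers and consumers of the cache. The TTL values below are illustrative assumptions:

```java
// Single source of truth for Redis key formats and TTLs.
import java.time.Duration;

final class StoryKeys {
    static final Duration PROGRESS_TTL = Duration.ofHours(6);     // transient only
    static final Duration IDEMPOTENT_TTL = Duration.ofMinutes(30);

    static String progress(String taskId)   { return "story:task:progress:" + taskId; }
    static String result(String taskId)     { return "story:task:result:" + taskId; }
    static String preference(String userId) { return "story:user:preference:" + userId; }
    static String idempotent(String taskId, String stage) {
        return "story:stage:idempotent:" + taskId + ":" + stage;
    }

    private StoryKeys() {} // static utility, no instances
}
```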
15. Kafka Design
15.1 Topic Segmentation
story-task-created
story-stage-intent.completed
story-stage-plot.completed
story-stage-character.completed
story-stage-chapter.completed
story-stage-review.completed
story-task-failed
15.2 Idempotent Consumption
public boolean tryExecuteStage(String taskId, String stage) {
String key = "story:stage:idempotent:" + taskId + ":" + stage;
Boolean success = redisTemplate.opsForValue()
.setIfAbsent(key, "1", Duration.ofMinutes(30));
return Boolean.TRUE.equals(success);
}
15.3 Dead‑Letter Queue Strategy
Transient errors: exponential back‑off with 2‑3 retries.
Irrecoverable business errors: abort immediately.
Repeated failures: route to dead‑letter topic for manual investigation.
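The three rules can be collapsed into one decision function plus a back-off schedule. The retry limit and back-off base below are assumptions for illustration:

```java
// Failure-handling policy for Kafka consumers: abort on business errors,
// retry transient errors with exponential back-off, then dead-letter.
import java.time.Duration;

class FailurePolicy {
    enum Action { RETRY, ABORT, DEAD_LETTER }

    static final int MAX_RETRIES = 3;

    /** Irrecoverable business errors abort; transient errors retry, then dead-letter. */
    static Action decide(boolean businessError, int attempt) {
        if (businessError) return Action.ABORT;
        return attempt < MAX_RETRIES ? Action.RETRY : Action.DEAD_LETTER;
    }

    /** Exponential back-off: 1s, 2s, 4s, ... for attempts 0, 1, 2, ... */
    static Duration backoff(int attempt) {
        return Duration.ofSeconds(1L << attempt);
    }
}
```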
16. Observability
16.1 Metric Set (Prometheus)
story_task_create_qps
story_task_success_rate
story_stage_latency_ms
story_stage_retry_count
llm_call_latency_ms
llm_call_error_rate
llm_tokens_prompt_total
llm_tokens_completion_total
story_cost_total
16.2 Structured Logging
Every log entry must contain traceId, taskId, userId, stage, agentName, model, latencyMs, retryCount, and errorCode.
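One way to make the required fields impossible to omit is to funnel them through a single structure with one formatting point. In a Spring Boot service these fields would typically be pushed into SLF4J's MDC instead; the standalone record below is an illustrative sketch:

```java
// Structured log entry carrying every mandatory field; format() is the only
// place a log line is assembled, so no field can be silently dropped.
record LogEntry(String traceId, String taskId, String userId, String stage,
                String agentName, String model, long latencyMs,
                int retryCount, String errorCode) {

    String format(String message) {
        return String.format(
                "traceId=%s taskId=%s userId=%s stage=%s agent=%s model=%s "
                + "latencyMs=%d retryCount=%d errorCode=%s msg=\"%s\"",
                traceId, taskId, userId, stage, agentName, model,
                latencyMs, retryCount, errorCode, message);
    }
}
```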
16.3 Distributed Tracing
Use OpenTelemetry + Micrometer + Prometheus + Grafana to answer questions such as “Why did a story take 30 s?”, “Which agent is the bottleneck?”, and “Was the delay caused by the model, the queue, or the database?”.
17. Deployment & Containerization
17.1 Dockerfile
FROM eclipse-temurin:21-jre
WORKDIR /app
COPY target/story-platform.jar app.jar
ENV JAVA_OPTS="-XX:+UseContainerSupport -XX:MaxRAMPercentage=75"
EXPOSE 8080
ENTRYPOINT ["sh", "-c", "java $JAVA_OPTS -jar app.jar"]
17.2 Kubernetes Deployment (excerpt)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: story-orchestrator
  namespace: story-platform
spec:
  replicas: 3
  selector:
    matchLabels:
      app: story-orchestrator
  template:
    metadata:
      labels:
        app: story-orchestrator
    spec:
      containers:
        - name: story-orchestrator
          image: registry.example.com/story-orchestrator:1.0.0
          ports:
            - containerPort: 8080
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: prod
            - name: JAVA_OPTS
              value: "-Xms512m -Xmx2048m"
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2"
              memory: "3Gi"
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 20
17.3 Horizontal Pod Autoscaler (CPU‑based, extendable to Kafka‑lag)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: story-agent-worker-hpa
  namespace: story-platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: story-agent-worker
  minReplicas: 4
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
18. Fault Handling Strategies
18.1 Model Call Timeout
Set per‑stage timeout limits.
Enable limited retries.
Fallback to a lighter model via the LLM gateway.
Persist intermediate snapshots for manual or offline compensation.
18.2 Agent Output Format Errors
All agents validate JSON against a schema.
Automatic repair attempts before marking failure.
If repair fails, route to the failure branch without contaminating downstream agents.
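The validate-and-repair flow can be sketched as a bounded loop over a parser and a repairer (the repairer might be a cheap "fix this JSON" model call). Both are plain functions here, and all names are assumptions:

```java
// Bounded parse-repair loop: try to parse, hand failures to a repairer a
// limited number of times, then signal the failure branch via empty Optional.
import java.util.Optional;
import java.util.function.Function;
import java.util.function.UnaryOperator;

class OutputRepair {
    static <T> Optional<T> parseWithRepair(String raw,
                                           Function<String, Optional<T>> parser,
                                           UnaryOperator<String> repairer,
                                           int maxRepairs) {
        String candidate = raw;
        for (int attempt = 0; attempt <= maxRepairs; attempt++) {
            Optional<T> parsed = parser.apply(candidate);
            if (parsed.isPresent()) return parsed;
            candidate = repairer.apply(candidate); // one repair attempt per loop
        }
        return Optional.empty(); // caller routes to the failure branch
    }
}
```

Returning an empty Optional rather than throwing keeps the downstream agents untouched: the orchestrator decides whether to retry the whole stage or mark the task PARTIAL_SUCCESS.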
18.3 Kafka Consumer Lag
Scale consumer instances.
Reduce batch size.
Increase partition count.
Investigate whether slow model calls are causing “pseudo‑lag”.
18.4 Partial Success
Use PARTIAL_SUCCESS status to keep already‑generated stages.
Expose to the user as “base draft ready, polishing pending”.
19. Evolution Roadmap
PoC Phase : single JVM, synchronous LLM calls, simple persistence.
Business‑Launch Phase : async task handling, Redis/Kafka integration, basic observability.
Scale‑Out Phase : separate Agent Workers, centralized LLM gateway, full monitoring, cost analysis.
Platform Phase : DAG configuration, plugin agents, prompt‑management UI, multi‑tenant quota system.
20. Best‑Practice Checklist
Treat story creation as an asynchronous task, not a synchronous API.
Prefer DAG orchestration over linear chaining.
Agents must have single responsibility, structured I/O, and unit tests.
All model calls go through a unified LLM gateway.
Isolate thread pools, apply back‑pressure, and use tiered degradation.
Persist snapshots for every stage to enable debugging and cost tracking.
Implement idempotency, retries, timeouts, circuit breaking, and dead‑letter handling.
Collect metrics at task, stage, model, and cost levels.
Design extension points for new agents and workflow variations.
Continuously balance quality, stability, and cost; never optimise only one dimension.
Ray's Galactic Tech