Evolving RagFlow Text Upload: From Placeholder Files to Direct Temp‑File Upload

This article compares two Java-based ways of sending plain-text documents to RagFlow. The first uploads a placeholder file and then appends the real text as chunks; the second writes the text to a temporary file and uploads it directly. It covers the implementation of each, the pitfalls encountered, and why the second approach is preferred.

Tech Musings

Background

We need to sync plain-text content from our knowledge base to the RagFlow engine for Retrieval-Augmented Generation (RAG). RagFlow's document upload API only accepts multipart file uploads, not raw strings, while our system stores the cleaned text in database fields with no physical files behind them.

Problem Analysis

Our document table has a content column that holds the plain text. RagFlow requires a file form field, so the text must be turned into a file before it can be uploaded.

Solution 1: Placeholder File + Manual Chunk Append

Idea

Upload a placeholder file (containing a single space) to obtain a document_id.

Use the RagFlow Chunks API to append the real text to that document.

        Step 1                       Step 2
┌──────────────────────┐     ┌──────────────────────┐
│ Upload placeholder   │     │ Append chunk         │
│ (content: one space) │ ──► │ (content: real text) │
└──────────┬───────────┘     └──────────┬───────────┘
           │                            │
           ▼                            ▼
    POST /documents               POST /chunks

Implementation

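// Imports these snippets rely on. apiKey, TIMEOUT_MS, MAPPER (presumably a Jackson
// ObjectMapper), log and buildUrl(...) are members of the enclosing class, not shown here.
import cn.hutool.http.HttpRequest;
import cn.hutool.http.HttpResponse;
import com.fasterxml.jackson.databind.JsonNode;
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Map;

public String uploadDocument(String datasetId, String title, String content) {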
    // Step 1: create document with placeholder file
    String documentId = createDocumentWithFile(datasetId, title + ".txt");
    if (documentId == null) return null;
    // Step 2: add real content via chunks API
    boolean added = addChunk(datasetId, documentId, content);
    if (!added) {
        log.error("ragFlow addChunk failed, datasetId={}, documentId={}", datasetId, documentId);
        return null;
    }
    return documentId;
}

// Note: the filename parameter is not used below; RagFlow sees the generated temp-file name.
private String createDocumentWithFile(String datasetId, String filename) {
    String url = buildUrl("/api/v1/datasets/" + datasetId + "/documents");
    File tempFile = null;
    try {
        tempFile = File.createTempFile("ragflow_upload_", ".txt");
        // Placeholder content: a single space.
        Files.writeString(tempFile.toPath(), " ");
        try (HttpResponse resp = HttpRequest.post(url)
                .header("Authorization", "Bearer " + apiKey)
                .form("file", tempFile)
                .timeout(TIMEOUT_MS)
                .execute()) {
            String body = resp.body();
            JsonNode root = MAPPER.readTree(body);
            if (root.path("code").asInt(-1) != 0) {
                log.error("ragFlow createDocumentWithFile failed: {}", body);
                return null;
            }
            JsonNode data = root.path("data");
            if (data.isArray() && !data.isEmpty()) {
                return data.get(0).path("id").asText(null);
            }
            return null;
        }
    } catch (Exception e) {
        log.error("ragFlow createDocumentWithFile error", e);
        return null;
    } finally {
        if (tempFile != null && tempFile.exists()) {
            tempFile.delete();
        }
    }
}

private boolean addChunk(String datasetId, String documentId, String content) {
    String url = buildUrl("/api/v1/datasets/" + datasetId + "/documents/" + documentId + "/chunks");
    try {
        // Serialize inside the try block: writeValueAsString throws a checked JsonProcessingException.
        String jsonBody = MAPPER.writeValueAsString(Map.of("content", content));
        try (HttpResponse resp = HttpRequest.post(url)
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .body(jsonBody)
                .timeout(TIMEOUT_MS)
                .execute()) {
            String body = resp.body();
            JsonNode root = MAPPER.readTree(body);
            // RagFlow signals success with code == 0 in the response body.
            if (root.path("code").asInt(-1) != 0) {
                log.error("ragFlow addChunk failed: {}", body);
                return false;
            }
            return true;
        }
    } catch (Exception e) {
        log.error("ragFlow addChunk error", e);
        return false;
    }
}

Problems Encountered

Invalid chunk: The placeholder space becomes a meaningless chunk.

Two network calls: Separate document creation and chunk addition increase latency.

Rollback difficulty: If chunk addition fails, the empty document remains as orphan data.

Large text instability: The Chunks API limits the size of a single write, requiring batch appends for long texts.
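The batch appends mentioned in the last point could look roughly like the sketch below. It only covers the splitting step: each returned piece would then go through addChunk in order. TextBatcher and the maxChars parameter are hypothetical names, and the actual size limit of the Chunks API is not stated here, so the value has to come from your own testing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits long text into pieces of at most maxChars characters. */
final class TextBatcher {

    private TextBatcher() {}

    static List<String> split(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> batches = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + maxChars, text.length());
            if (end < text.length()) {
                // Prefer to cut right after the last newline or space inside the window,
                // so words and paragraphs are not split in the middle.
                int cut = Math.max(text.lastIndexOf('\n', end - 1), text.lastIndexOf(' ', end - 1));
                if (cut > start) {
                    end = cut + 1; // the separator stays at the end of the current piece
                } else if (end - start > 1 && Character.isHighSurrogate(text.charAt(end - 1))) {
                    end--; // hard cut: do not separate a surrogate pair
                }
            }
            batches.add(text.substring(start, end));
            start = end;
        }
        return batches;
    }
}
```

Concatenating the pieces gives back the original text, so nothing is lost by batching. Note that every extra batch is another network call that can fail halfway, which makes the rollback problem above worse.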

Solution 2: Direct Temporary File Upload (Current Approach)

Idea

Instead of creating a placeholder, write the actual cleaned text into a temporary .txt file and upload it in a single request. RagFlow’s built‑in text parser automatically splits the file into chunks.

┌────────────────────────────┐        ┌──────────────────────┐
│ 1. Write text to temp file │        │ 2. Multipart upload  │
│    content → .txt          │ ─────► │    file → RagFlow    │
└────────────────────────────┘        └──────────┬───────────┘
                                                 │
                                                 ▼
                                      RagFlow parses & splits
                                                 │
                                                 ▼
                                       Delete temporary file

Feasibility

RagFlow accepts .txt files and uses its native text parser.

Temporary files are deleted in a finally block; deletion failures are logged.

Concurrent uploads are safe because File.createTempFile() inserts a random number between the prefix and the suffix (e.g., ragflow_upload_1234567890.txt), so no two calls produce the same file.

The document shows up in RagFlow under that generated file name. Our system maps its own IDs to RagFlow document IDs, so the name does not affect retrieval.
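The temp-file points above can be checked in isolation with a small sketch like the following. It uses the NIO variant Files.createTempFile instead of File.createTempFile (a swap made here for brevity, not what the article's code does), and TempTextFile is a hypothetical helper name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper illustrating the temp-file lifecycle used for uploads. */
final class TempTextFile {

    private TempTextFile() {}

    /** Writes content to a fresh temp file and returns its path; the caller must delete it. */
    static Path write(String content) throws IOException {
        // The JDK puts a random component between prefix and suffix,
        // so concurrent callers do not receive the same path.
        Path file = Files.createTempFile("ragflow_upload_", ".txt");
        Files.writeString(file, content, StandardCharsets.UTF_8);
        return file;
    }

    /** Deletes the file if present; returns false only when deletion failed. */
    static boolean delete(Path file) {
        try {
            Files.deleteIfExists(file);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Two calls to TempTextFile.write(...) return different paths even with identical content, which is the property the concurrency argument relies on.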

Full Implementation

public String uploadDocument(String datasetId, String title, String content) {
    String url = buildUrl("/api/v1/datasets/" + datasetId + "/documents");
    File tempFile = null;
    try {
        tempFile = File.createTempFile("ragflow_upload_", ".txt");
        Files.writeString(tempFile.toPath(), content, StandardCharsets.UTF_8);
        try (HttpResponse resp = HttpRequest.post(url)
                .header("Authorization", "Bearer " + apiKey)
                .form("file", tempFile)
                .timeout(TIMEOUT_MS)
                .execute()) {
            String body = resp.body();
            JsonNode root = MAPPER.readTree(body);
            if (root.path("code").asInt(-1) != 0) {
                log.error("ragFlow uploadDocument failed: {}", body);
                return null;
            }
            JsonNode data = root.path("data");
            if (data.isArray() && !data.isEmpty()) {
                return data.get(0).path("id").asText(null);
            }
            return null;
        }
    } catch (Exception e) {
        log.error("ragFlow uploadDocument error, datasetId={}, title={}", datasetId, title, e);
        return null;
    } finally {
        if (tempFile != null && !tempFile.delete()) {
            log.warn("ragFlow temporary file deletion failed: {}", tempFile.getAbsolutePath());
        }
    }
}
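Whether chunking starts on its own after the upload may depend on the RagFlow version and dataset settings. If uploaded documents stay unparsed, a follow-up request that asks RagFlow to parse them may be needed. The sketch below only builds such a request, without sending it. The endpoint path and the document_ids body field are assumptions based on a reading of RagFlow's HTTP API and should be verified against the deployed version; ParseRequestSketch is a hypothetical name, and HttpRequest here is the JDK's java.net.http.HttpRequest, not the Hutool class used above.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: builds (but does not send) a "parse these documents" request. */
final class ParseRequestSketch {

    private ParseRequestSketch() {}

    /** Builds the JSON body by hand; IDs are assumed to need no JSON escaping. */
    static String body(List<String> documentIds) {
        return documentIds.stream()
                .map(id -> "\"" + id + "\"")
                .collect(Collectors.joining(",", "{\"document_ids\":[", "]}"));
    }

    static HttpRequest build(String baseUrl, String apiKey, String datasetId, List<String> documentIds) {
        // Assumed endpoint: POST /api/v1/datasets/{dataset_id}/chunks
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/v1/datasets/" + datasetId + "/chunks"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(documentIds)))
                .build();
    }
}
```

If this step turns out to be necessary, it adds a second call to the upload flow, but unlike the chunk append it carries no document content and can be retried safely.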

Key Points

File.createTempFile(...) creates a uniquely named file in the system temp directory, so it is safe for concurrent use.

Files.writeString(..., StandardCharsets.UTF_8) makes the UTF‑8 encoding explicit. Files.writeString already defaults to UTF‑8 on every platform, so the argument mainly documents the intent.

.form("file", tempFile) lets Hutool build a multipart/form-data request automatically.

The finally block attempts cleanup on every code path and logs a warning when delete() fails.
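To make the multipart point concrete, the sketch below assembles by hand roughly what a single-file multipart/form-data body looks like. It is an illustration of the wire format only: the exact part headers Hutool emits may differ, and MultipartSketch and its methods are hypothetical names.

```java
/** Hypothetical illustration of a single-file multipart/form-data body. */
final class MultipartSketch {

    private static final String CRLF = "\r\n";

    private MultipartSketch() {}

    /** Value for the request's Content-Type header. */
    static String contentTypeHeader(String boundary) {
        return "multipart/form-data; boundary=" + boundary;
    }

    /** Returns the body as a String for readability; a real request sends the bytes. */
    static String textFilePart(String boundary, String fieldName, String filename, String content) {
        return "--" + boundary + CRLF
                + "Content-Disposition: form-data; name=\"" + fieldName + "\"; filename=\"" + filename + "\"" + CRLF
                + "Content-Type: text/plain; charset=UTF-8" + CRLF
                + CRLF                               // blank line separates part headers from content
                + content + CRLF
                + "--" + boundary + "--" + CRLF;     // closing delimiter ends the body
    }
}
```

The field name passed as fieldName corresponds to the "file" form field RagFlow expects, which is why the upload code calls .form("file", tempFile).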

Comparison of the Two Approaches

API calls: Solution 1 = 2 calls, Solution 2 = 1 call.

Network overhead: Solution 1 requires at least 2 × RTT, Solution 2 requires 1 × RTT.

Invalid chunks: Solution 1 creates an extra chunk that holds only the placeholder space; Solution 2 has none.

Code size: Solution 1 ≈ 100 lines, Solution 2 ≈ 40 lines.

Method count: Solution 1 uses 3 methods, Solution 2 uses a single method.

RagFlow API used: Solution 1 calls documents and chunks; Solution 2 calls only documents.

Large‑text handling: Solution 1 needs manual chunk batching; Solution 2 lets RagFlow handle it automatically.

Rollback: Solution 1 needs explicit cleanup of orphan documents; Solution 2 has no rollback requirement.

While Solution 1 allows precise control over each chunk, RagFlow's automatic chunking is sufficient in most scenarios, and Solution 2 offers a cleaner, more efficient workflow with lower maintenance cost.
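If Solution 1 is kept for its chunk-level control, the orphan-document problem can be contained with a compensating delete when the chunk step fails. The sketch below shows only the control flow, with the three RagFlow calls passed in as functions. CompensatingUpload is a hypothetical name, and the deleteDocument callback stands in for whatever delete call your RagFlow version offers, which is not shown in this article.

```java
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical sketch: create a document, add content, and undo the create on failure. */
final class CompensatingUpload {

    private CompensatingUpload() {}

    /** Returns the document ID on success, or null if any step failed. */
    static String upload(Supplier<String> createDocument,
                         Predicate<String> addChunk,
                         Consumer<String> deleteDocument) {
        String documentId = createDocument.get();
        if (documentId == null) {
            return null; // nothing was created, so nothing to clean up
        }
        boolean added;
        try {
            added = addChunk.test(documentId);
        } catch (RuntimeException e) {
            added = false; // treat an exception like a failed append
        }
        if (!added) {
            try {
                deleteDocument.accept(documentId); // compensate: remove the orphan placeholder
            } catch (RuntimeException e) {
                // Best effort only; a real implementation would log this and retry later.
            }
            return null;
        }
        return documentId;
    }
}
```

With the methods from Solution 1, a call could look like upload(() -> createDocumentWithFile(datasetId, name), id -> addChunk(datasetId, id, content), id -> deleteDocument(datasetId, id)), where deleteDocument is a method you would still have to write.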
