Evolving RagFlow Text Upload: From Placeholder Files to Direct Temp‑File Upload
This article compares two Java-based ways of sending plain-text documents to RagFlow. The first uploads a placeholder file and then appends the real text as chunks; the second writes the text to a temporary file and uploads it directly. It covers the implementation of each, the pitfalls encountered, and why the second is preferred.
Background
We need to sync pure‑text content from our knowledge base to the RagFlow engine for Retrieval‑Augmented Generation (RAG), but RagFlow’s document upload API only accepts multipart files, not raw strings. Our system stores cleaned text fields without physical files.
Problem Analysis
The document table shows a content column that holds the pure text. RagFlow requires a file form field, so the text must be turned into a file before it can be uploaded.
Solution 1: Placeholder File + Manual Chunk Append
Idea
Upload a placeholder file (containing a single space) to obtain a document_id.
Use the RagFlow Chunks API to append the real text to that document.
         Step 1                        Step 2
┌──────────────────────┐      ┌──────────────────────┐
│  Upload placeholder  │      │     Append chunk     │
│  (content: space)    │      │ (content: real text) │
└──────────┬───────────┘      └──────────┬───────────┘
           │                             │
           ▼                             ▼
    POST /documents                 POST /chunks
Implementation
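// Imports used by the snippets in this article (Hutool HTTP client, Jackson, JDK)
import cn.hutool.http.HttpRequest;
import cn.hutool.http.HttpResponse;
import com.fasterxml.jackson.databind.JsonNode;
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Map;

public String uploadDocument(String datasetId, String title, String content) {
    // Step 1: create document with placeholder file
    String documentId = createDocumentWithFile(datasetId, title + ".txt");
    if (documentId == null) return null;
    // Step 2: add real content via chunks API
    boolean added = addChunk(datasetId, documentId, content);
    if (!added) {
        log.error("ragFlow addChunk failed, datasetId={}, documentId={}", datasetId, documentId);
        return null;
    }
    return documentId;
}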
private String createDocumentWithFile(String datasetId, String filename) {
    // Note: filename is not used below; the upload carries the temp file's generated name.
    String url = buildUrl("/api/v1/datasets/" + datasetId + "/documents");
    File tempFile = null;
    try {
        tempFile = File.createTempFile("ragflow_upload_", ".txt");
        // The placeholder content is a single space
        Files.writeString(tempFile.toPath(), " ");
        try (HttpResponse resp = HttpRequest.post(url)
                .header("Authorization", "Bearer " + apiKey)
                .form("file", tempFile)
                .timeout(TIMEOUT_MS)
                .execute()) {
            String body = resp.body();
            JsonNode root = MAPPER.readTree(body);
            // A non-zero "code" in the response body signals failure
            if (root.path("code").asInt(-1) != 0) {
                log.error("ragFlow createDocumentWithFile failed: {}", body);
                return null;
            }
            // "data" is an array of created documents; take the first ID
            JsonNode data = root.path("data");
            if (data.isArray() && !data.isEmpty()) {
                return data.get(0).path("id").asText(null);
            }
            return null;
        }
    } catch (Exception e) {
        log.error("ragFlow createDocumentWithFile error", e);
        return null;
    } finally {
        if (tempFile != null && tempFile.exists()) {
            tempFile.delete();
        }
    }
}
private boolean addChunk(String datasetId, String documentId, String content) {
    String url = buildUrl("/api/v1/datasets/" + datasetId + "/documents/" + documentId + "/chunks");
    try {
        // Serialize inside the try: writeValueAsString throws a checked exception
        String jsonBody = MAPPER.writeValueAsString(Map.of("content", content));
        try (HttpResponse resp = HttpRequest.post(url)
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .body(jsonBody)
                .timeout(TIMEOUT_MS)
                .execute()) {
            String body = resp.body();
            JsonNode root = MAPPER.readTree(body);
            if (root.path("code").asInt(-1) != 0) {
                log.error("ragFlow addChunk failed: {}", body);
                return false;
            }
            return true;
        }
    } catch (Exception e) {
        log.error("ragFlow addChunk error", e);
        return false;
    }
}
Problems Encountered
Invalid chunk: The placeholder space becomes a meaningless chunk.
Two network calls: Separate document creation and chunk addition increase latency.
Rollback difficulty: If chunk addition fails, the empty document remains as orphan data.
Large text instability: The Chunks API limits the size of a single write, requiring batch appends for long texts.
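The batching mentioned in the last item can be sketched as follows. This is a minimal illustration rather than the original implementation: ChunkBatcher and maxChars are hypothetical names, and the actual size limit of the Chunks API is not stated here, so the value has to come from your own deployment.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits long text into pieces for repeated chunk appends. */
final class ChunkBatcher {

    private ChunkBatcher() {}

    /**
     * Splits text into consecutive pieces of at most maxChars UTF-16 code units.
     * A piece never ends between the two halves of a surrogate pair.
     */
    static List<String> split(String text, int maxChars) {
        if (maxChars < 2) {
            throw new IllegalArgumentException("maxChars must be at least 2");
        }
        List<String> pieces = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + maxChars, text.length());
            // Step back one unit if the cut would separate a surrogate pair.
            if (end < text.length() && Character.isHighSurrogate(text.charAt(end - 1))) {
                end--;
            }
            pieces.add(text.substring(start, end));
            start = end;
        }
        return pieces;
    }
}
```

Each piece would then be passed to addChunk in order. Note that this cuts purely by length and ignores sentence or paragraph boundaries, which is one more reason manual chunking is harder to get right than it first looks.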
Solution 2: Direct Temporary File Upload (Current Approach)
Idea
Instead of creating a placeholder, write the actual cleaned text into a temporary .txt file and upload it in a single request. RagFlow’s built‑in text parser automatically splits the file into chunks.
┌────────────────────────────┐        ┌────────────────────────────┐
│ 1. Write text to temp file │        │ 2. Multipart upload        │
│    content → .txt          │ ─────► │    file → RagFlow          │
└────────────────────────────┘        └─────────────┬──────────────┘
                                                    │
                                                    ▼
                                         RagFlow parses & splits
                                                    │
                                                    ▼
                                          Delete temporary file
Feasibility
RagFlow accepts .txt files and uses its native text parser.
Temporary files are deleted in a finally block; deletion failures are logged.
Concurrent uploads are safe because File.createTempFile() adds a random suffix (e.g., ragflow_upload_1234567890.txt).
Documents appear in RagFlow under the generated temp-file name rather than the title. Our system maps its own IDs to RagFlow document IDs, so the name does not affect retrieval.
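The concurrency point above can be checked locally. The following is a small sketch using only the JDK; the class name TempFileNameCheck and the file and thread counts are illustrative choices, not part of the original integration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical check: creates temp files from several threads and counts distinct paths. */
final class TempFileNameCheck {

    private TempFileNameCheck() {}

    static int distinctNames(int count, int threads) {
        Set<String> paths = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                futures.add(pool.submit(() -> {
                    File f = File.createTempFile("ragflow_upload_", ".txt");
                    paths.add(f.getAbsolutePath());
                    return null;
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // surfaces any IOException thrown in a worker
            }
            return paths.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
            // Files are removed only after counting, so no name can be reused mid-run.
            for (String path : paths) {
                new File(path).delete();
            }
        }
    }
}
```

Calling TempFileNameCheck.distinctNames(200, 8) should return the same number of distinct paths as files requested, since createTempFile creates each file atomically under a name that does not yet exist.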
Full Implementation
public String uploadDocument(String datasetId, String title, String content) {
    String url = buildUrl("/api/v1/datasets/" + datasetId + "/documents");
    File tempFile = null;
    try {
        // Write the real text to a uniquely named temp file, explicitly as UTF-8
        tempFile = File.createTempFile("ragflow_upload_", ".txt");
        Files.writeString(tempFile.toPath(), content, StandardCharsets.UTF_8);
        try (HttpResponse resp = HttpRequest.post(url)
                .header("Authorization", "Bearer " + apiKey)
                .form("file", tempFile)
                .timeout(TIMEOUT_MS)
                .execute()) {
            String body = resp.body();
            JsonNode root = MAPPER.readTree(body);
            if (root.path("code").asInt(-1) != 0) {
                log.error("ragFlow uploadDocument failed: {}", body);
                return null;
            }
            JsonNode data = root.path("data");
            if (data.isArray() && !data.isEmpty()) {
                return data.get(0).path("id").asText(null);
            }
            return null;
        }
    } catch (Exception e) {
        log.error("ragFlow uploadDocument error, datasetId={}, title={}", datasetId, title, e);
        return null;
    } finally {
        // Always attempt cleanup; log if the file could not be removed
        if (tempFile != null && !tempFile.delete()) {
            log.warn("ragFlow temporary file deletion failed: {}", tempFile.getAbsolutePath());
        }
    }
}
Key Points
File.createTempFile(...) creates a randomly named file in the system temp directory, safe for concurrent use.
Files.writeString(..., StandardCharsets.UTF_8) ensures UTF-8 encoding, avoiding platform differences.
.form("file", tempFile) lets Hutool build a multipart/form-data request automatically.
The finally { … delete() } block guarantees cleanup; failures are logged.
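The write, use, and delete pattern from these key points can also be factored into a reusable helper. The sketch below is one possible shape under stated assumptions: TempTextFiles and withTempTextFile are hypothetical names, it uses java.nio.file.Files.createTempFile rather than File.createTempFile, and the upload itself is represented by whatever function the caller passes in.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

/** Hypothetical helper that scopes a UTF-8 temp file to a single action. */
final class TempTextFiles {

    private TempTextFiles() {}

    /**
     * Writes content to a fresh UTF-8 temp file, passes the path to action,
     * and deletes the file afterwards whether or not action throws.
     */
    static <T> T withTempTextFile(String content, Function<Path, T> action) {
        Path tempFile = null;
        try {
            tempFile = Files.createTempFile("ragflow_upload_", ".txt");
            Files.writeString(tempFile, content, StandardCharsets.UTF_8);
            return action.apply(tempFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tempFile != null) {
                try {
                    Files.deleteIfExists(tempFile);
                } catch (IOException e) {
                    // A cleanup failure should not mask the action's result; report it and move on.
                    System.err.println("temporary file deletion failed: " + tempFile);
                }
            }
        }
    }
}
```

With a helper like this, the upload method would shrink to the HTTP call and response parsing, for example withTempTextFile(content, path -> doUpload(url, path.toFile())), where doUpload is a placeholder for the request code shown above.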
Comparison of the Two Approaches
API calls: Solution 1 = 2 calls, Solution 2 = 1 call.
Network overhead: Solution 1 requires 2 × RTT, Solution 2 requires 1 × RTT.
Invalid chunks: Solution 1 creates an extra empty chunk; Solution 2 has none.
Code size: Solution 1 ≈ 100 lines, Solution 2 ≈ 40 lines.
Method count: Solution 1 uses 3 methods, Solution 2 uses a single method.
RagFlow API used: Solution 1 calls documents and chunks; Solution 2 calls only documents.
Large-text handling: Solution 1 needs manual chunk batching; Solution 2 lets RagFlow handle it automatically.
Rollback: Solution 1 needs explicit cleanup of orphan documents; Solution 2 has no rollback requirement.
Solution 1 allows precise control over each chunk, but in most scenarios RagFlow's automatic chunking is sufficient, and Solution 2 offers a cleaner, more efficient workflow with lower maintenance cost.
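If the two-step approach is kept for cases that need per-chunk control, the orphan-document problem can be contained with a compensating delete. The sketch below is a hedged illustration of that pattern only: TwoStepUpload and its three callbacks are hypothetical, and the actual RagFlow delete request is left to the deleteDocument callback because it is not shown in this article.

```java
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical wrapper that undoes step 1 when step 2 of the two-step upload fails. */
final class TwoStepUpload {

    private TwoStepUpload() {}

    /**
     * createDocument returns the new document ID, or null on failure.
     * appendChunk returns true on success.
     * deleteDocument is the compensating action for an orphaned document.
     */
    static String uploadWithRollback(Supplier<String> createDocument,
                                     Predicate<String> appendChunk,
                                     Consumer<String> deleteDocument) {
        String documentId = createDocument.get();
        if (documentId == null) {
            return null; // nothing was created, so there is nothing to undo
        }
        boolean added;
        try {
            added = appendChunk.test(documentId);
        } catch (RuntimeException e) {
            added = false;
        }
        if (!added) {
            try {
                deleteDocument.accept(documentId);
            } catch (RuntimeException e) {
                // The orphan still exists; a real system would record it for a later sweep.
                System.err.println("rollback failed for document " + documentId);
            }
            return null;
        }
        return documentId;
    }
}
```

Passing the steps in as callbacks keeps the pattern testable without a RagFlow instance. It also makes the extra moving parts visible, which is part of why the single-request upload is easier to maintain.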
