Can AI Automate the Entire Research Cycle? From Paper Reading to Code Reproduction
The author builds an AI‑driven end‑to‑end assistant that transforms a research paper into a structured reading note, generates reproducible code, runs experiments, summarizes results, and creates a report, demonstrating how large language models like Kimi K2 can streamline the entire paper‑to‑implementation workflow.
Pipeline Overview
End‑to‑end assistant that automates the workflow from academic paper to reproducible code, experiment execution, result summarization, and final report using large language models (Kimi K2).
Step 1 – Paper Analysis
Upload a PDF to the Moonshot API, retrieve the extracted text, and send a structured prompt to the model. The model returns a reading note with sections: research problem, method, dataset, hyper‑parameters, and conclusions.
npm install -g @anthropic-ai/claude-code
# Set environment variables for the Kimi K2 model
export ANTHROPIC_BASE_URL=https://api.moonshot.cn/anthropic
export ANTHROPIC_AUTH_TOKEN=apikey
export ANTHROPIC_MODEL=kimi-k2
export ANTHROPIC_SMALL_FAST_MODEL=kimi-k2-0905-preview

# Upload file to Moonshot API
from pathlib import Path
from openai import OpenAI  # Moonshot's API is OpenAI-compatible

client = OpenAI(api_key="YOUR_MOONSHOT_API_KEY", base_url="https://api.moonshot.cn/v1")
file_object = client.files.create(
    file=Path(tmp_file_path),
    purpose="file-extract"
)

# Retrieve file content
file_content = client.files.content(file_id=file_object.id).text

# Build structured prompt and request the reading note
messages = [
    {"role": "system", "content": "You are a Kimi assistant for paper analysis."},
    {"role": "system", "content": file_content},
    {"role": "user", "content": "Please analyze the paper and output a reading note with sections: problem, method, dataset, hyper-parameters, conclusions."},
]
completion = client.chat.completions.create(model="kimi-k2-0905-preview", messages=messages)
answer = completion.choices[0].message.content

The response is a concise, machine-readable note containing the required fields.
Step 2 – Code Generation
Based on the extracted method and dataset schema, a second prompt asks the model to generate a complete reproducibility package (data‑cleaning script, training script, etc.). Code blocks are extracted with a regular expression, written to files, and compressed into a ZIP archive for download.
# Prompt for code generation
classification_prompt = """Please analyze the code‑related parts of the paper and generate a complete reproducibility package, including data‑cleaning scripts and a training script. Return each file as a fenced code block."""
# Extract code blocks
import re

code_pattern = r'```(?:python|py|txt|markdown|md)?\n(.*?)\n```'
code_blocks = re.findall(code_pattern, answer, re.DOTALL)
# Create ZIP archive
import io
import zipfile

zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, 'w', zipfile.ZIP_DEFLATED) as zip_file:
    for file_name, content in generated_files.items():
        zip_file.writestr(file_name, content)

The generated package can be run directly, though the first version may contain bugs that require a few correction cycles.
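To make the extraction-and-packaging step concrete, here is a self-contained sketch. The `answer` string and the naive `file_{i}.py` naming scheme are illustrative assumptions; the real pipeline fills `answer` from the model response and could instead ask the model to name each file.

```python
import io
import re
import zipfile

# Hypothetical model answer containing two fenced code blocks
answer = (
    "Here is the package:\n"
    "```python\nprint('clean data')\n```\n"
    "and the training script:\n"
    "```python\nprint('train model')\n```\n"
)

# Optional language tag after the opening fence, lazy capture of the body
code_pattern = r'```(?:python|py|txt|markdown|md)?\n(.*?)\n```'
code_blocks = re.findall(code_pattern, answer, re.DOTALL)

# Naive naming scheme (a real run would want meaningful file names)
generated_files = {f"file_{i}.py": block for i, block in enumerate(code_blocks)}

# Zip everything in memory, ready to stream as a download
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED) as zip_file:
    for file_name, content in generated_files.items():
        zip_file.writestr(file_name, content)
```

Building the archive in a `BytesIO` buffer avoids touching disk, which suits a web app that serves the ZIP as a download response.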
Step 3 – Document Comparison
Both the original paper and the generated report are uploaded, processed in parallel, and a comparison prompt produces a concise Markdown summary.
# Upload two documents
file_object_a = client.files.create(file=Path(tmp_file_path_a), purpose="file-extract")
file_object_b = client.files.create(file=Path(tmp_file_path_b), purpose="file-extract")
# Comparison prompt
comparison_prompt = f"""Please compare the two documents for {comparison_type} analysis and highlight differences in methodology, results, and conclusions."""
# Generate summary
summary_prompt = """Based on the comparison, generate a brief summary in Markdown format."""

Step 4 – Result Summarization
Experiment logs are parsed to automatically generate tables and visual curves (e.g., loss vs. epoch). The figures are saved as image files and referenced in the Markdown report.
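A minimal sketch of the log-parsing step, using only the standard library. The log format and the `epoch N loss X` line shape are assumptions for illustration, not the article's actual format; the resulting Markdown table goes straight into the report, and the loss-vs-epoch curve would be plotted (e.g. with matplotlib) and referenced as an image the same way.

```python
import re

# Hypothetical training log: one "epoch N loss X" line per epoch
log_text = """\
epoch 1 loss 0.92
epoch 2 loss 0.55
epoch 3 loss 0.31
"""

# Parse (epoch, loss) pairs from the log
records = [
    (int(e), float(l))
    for e, l in re.findall(r"epoch (\d+) loss ([\d.]+)", log_text)
]

# Render a Markdown table for the report
lines = ["| epoch | loss |", "| --- | --- |"]
lines += [f"| {e} | {l} |" for e, l in records]
markdown_table = "\n".join(lines)
print(markdown_table)
```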
Step 5 – Report Generation
A final prompt assembles the reading note, code package description, experiment summary, and comparison into a Markdown (or PPT) outline ready for presentation.
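The assembly step itself is plain string work once each stage has produced its text. A minimal sketch, assuming the earlier stages hand over plain strings (all variable names and contents here are illustrative):

```python
# Illustrative outputs from the earlier pipeline stages
reading_note = "Problem, method, dataset, hyper-parameters, conclusions."
code_description = "Reproducibility package: data-cleaning and training scripts."
experiment_summary = "| epoch | loss |\n| --- | --- |\n| 3 | 0.31 |"
comparison = "Generated results match the paper's reported numbers closely."

# Assemble a Markdown report; a PPT outline would map sections to slide titles
sections = {
    "Reading Note": reading_note,
    "Code Package": code_description,
    "Experiment Summary": experiment_summary,
    "Comparison with the Paper": comparison,
}
report = "\n\n".join(f"## {title}\n\n{body}" for title, body in sections.items())
print(report)
```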
Technical Observations
Kimi K2 supports a context window of up to 256K tokens, enabling a full 80-page PDF to be processed in a single request.
Compared with Claude Code, Kimi K2 produces more stable code and handles long‑document reasoning better, though generated conclusions can be generic and occasional rendering errors in tables/figures require manual adjustment.
The workflow dramatically reduces manual effort for paper reading, code scaffolding, and result reporting, but iterative debugging of the generated code remains necessary.
Wu Shixiong's Large Model Academy
We continuously share large‑model know‑how, helping you master core skills—LLM, RAG, fine‑tuning, deployment—from zero to job offer, tailored for career‑switchers, autumn recruiters, and those seeking stable large‑model positions.