Accelerate Research 10× with Academic-Search: Open‑Source AI Literature Retrieval
Academic‑Search is an open‑source, AI‑powered literature‑retrieval skill that unifies multi‑platform search, deduplication, citation tracking, BibTeX export, PDF download, and code‑link completion. By consolidating these steps, it can accelerate research workflows by up to ten times while integrating smoothly with agent frameworks such as AutoGPT and LangChain.
Academic-Search is an open‑source skill released by the State Key Lab of Cognitive Intelligence (USTC) that automates frequent academic tasks such as literature search, deduplication, citation tracking, BibTeX export, PDF download, and code‑link completion. It is implemented in Python and is intended to be called from large‑language‑model (LLM) agents.
Re‑engineering the literature‑search workflow
Typical keyword queries on a single platform return thousands of results, making it difficult to identify state‑of‑the‑art papers, track new conference releases, and keep results organized across sources such as arXiv, Semantic Scholar, Google Scholar, PubMed, Papers with Code, ACM DL, and IEEE Xplore. Academic‑Search consolidates these steps into a single assistant that performs joint retrieval, result deduplication, time‑priority ranking, and provides ready‑to‑use metadata.
Two‑pass retrieval strategy
The tool first returns a lightweight summary table (title, authors, year, citation count). After the user confirms which entries are of interest, Academic‑Search fetches the full metadata (abstract, venue, PDF link, code repository). If the user explicitly requests a fixed number of top results (e.g., “top 10”), the second pass is skipped and the full records are returned directly.
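The two‑pass flow described above can be sketched as follows. This is a minimal illustration, not the actual Academic‑Search implementation: the function names, the `Summary` record, and the callback shapes are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Summary:
    # Lightweight first-pass record: just enough to decide interest
    paper_id: str
    title: str
    year: int
    citations: int

def two_pass_search(query, fetch_summaries, fetch_details, confirm=None, top_k=None):
    """First pass returns summaries; second pass fetches full metadata.

    If top_k is given (e.g. the user asked for "top 10"), the confirmation
    step is skipped and full records for the top_k hits are returned directly.
    """
    summaries = fetch_summaries(query)
    if top_k is not None:
        chosen = summaries[:top_k]                    # skip confirmation
    else:
        chosen = [s for s in summaries if confirm(s)]  # keep confirmed entries
    return [fetch_details(s.paper_id) for s in chosen]
```

The point of the split is cost: the cheap summary pass lets the user (or an agent) prune before the expensive metadata pass runs.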
Automatic query expansion
For each user query the system automatically generates 2–3 complementary terms—such as synonyms, sub‑concepts, or abbreviations—to improve recall. This reduces missed papers caused by overly narrow keyword choices.
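A toy sketch of this kind of expansion is shown below, with a hard‑coded synonym table standing in for the system's model‑driven term generator; the table contents and the `expand_query` name are illustrative assumptions.

```python
# Toy synonym table standing in for the LLM-driven term generator
SYNONYMS = {
    "llm": ["large language model", "foundation model"],
    "forecasting": ["prediction", "temporal extrapolation"],
}

def expand_query(query: str, max_terms: int = 3) -> list[str]:
    """Return up to max_terms complementary search terms for a query."""
    q = query.lower()
    expansions = []
    for keyword, alternatives in SYNONYMS.items():
        if keyword in q:
            expansions.extend(alternatives)
    return expansions[:max_terms]
```

Each expanded term is then searched alongside the original query, so a narrow phrasing like "LLM" still surfaces papers that only say "large language model".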
Example usage
Command (Python):

```python
from academic_search import search_papers

results = search_papers("Time Series Forecasting with LLM", top_k=10)
print(results.to_json())
```

Observed behavior:
Fast response: the top‑10 highly‑cited papers are returned within seconds.
Effective filtering: non‑academic pages (blogs, news) are removed automatically.
Structured output: results are provided as JSON or Markdown, ready for note‑taking or downstream LLM summarisation.
Core capabilities
Multi‑platform joint retrieval (Semantic Scholar, arXiv, etc.) and unified result aggregation.
Automatic deduplication and frontier‑first ranking based on citation count and recency.
BibTeX export, direct PDF download, and code‑link completion for papers with associated repositories.
Robust error handling for sites with rate limits or authentication requirements.
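A deduplication‑and‑ranking pass like the one listed above could look roughly like this; the title‑normalization key and the scoring formula are assumptions for illustration, not the tool's actual logic.

```python
import math
import re

def normalize_title(title: str) -> str:
    # Collapse case, punctuation, and whitespace so near-identical
    # records from different platforms hash to the same key
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def dedup_and_rank(papers: list[dict], current_year: int = 2025) -> list[dict]:
    seen, unique = set(), []
    for p in papers:
        key = normalize_title(p["title"])
        if key not in seen:
            seen.add(key)
            unique.append(p)
    # Frontier-first: reward recency, with log-scaled citations as a tiebreaker
    def score(p):
        return math.log1p(p["citations"]) - (current_year - p["year"])
    return sorted(unique, key=score, reverse=True)
```

Normalizing titles before hashing matters because the same paper often appears with different capitalization or punctuation on arXiv, Semantic Scholar, and publisher sites.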
Agent‑oriented design
Academic‑Search is packaged as a “Skill” that can be imported into AutoGPT, LangChain, or custom agentic frameworks. The exposed functions include:
```python
search_papers(query: str, top_k: int = 10) -> ResultSet
get_paper_details(paper_id: str) -> PaperMetadata
format_for_llm(paper: PaperMetadata) -> str
```

These functions return structured data that LLMs can consume without additional parsing.
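To illustrate the last of these, a hypothetical re‑implementation of the formatting step might flatten a metadata record into a prompt‑friendly block; the field names below are assumptions, not the real `PaperMetadata` schema.

```python
def format_for_llm(paper: dict) -> str:
    # Flatten structured metadata into a plain-text block an LLM can
    # quote or summarise without any JSON parsing
    lines = [
        f"Title: {paper['title']}",
        f"Authors: {', '.join(paper['authors'])}",
        f"Year: {paper['year']} | Venue: {paper['venue']}",
        f"Abstract: {paper['abstract']}",
    ]
    return "\n".join(lines)
```

Emitting labeled plain text rather than raw JSON keeps the agent's prompt compact and avoids a parsing step inside the model's context.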
Simple deployment
Installation steps:
Clone the repository:

```shell
git clone https://github.com/ustc-ai4science/academic-search.git
```

Install dependencies:

```shell
pip install -r requirements.txt
```

Configure an API key for the underlying search services (e.g., a Semantic Scholar API key) in config.yaml.
Run the example script or import the library in your own agent code.
Getting started
The source code and documentation are hosted at https://github.com/ustc-ai4science/academic-search. The README provides detailed usage examples and explains how to extend the skill for additional databases.
Conclusion
Academic‑Search demonstrates a practical approach to modularising high‑frequency research tasks. By exposing a clean API and supporting multi‑source retrieval, it enables LLM‑driven agents to perform literature scouting, citation tracking, and resource collection as part of an end‑to‑end research workflow.