Accelerate Research 10× with Academic-Search: Open‑Source AI Literature Retrieval
Academic‑Search is an open‑source, AI‑powered literature‑retrieval skill that unifies multi‑platform search, deduplication, citation tracking, BibTeX export, PDF download, and code‑link completion. By consolidating these steps, it can accelerate research workflows by up to ten times while integrating smoothly with agent frameworks such as AutoGPT and LangChain.
Academic-Search is an open‑source skill released by the State Key Lab of Cognitive Intelligence (USTC) that automates frequent academic tasks such as literature search, deduplication, citation tracking, BibTeX export, PDF download, and code‑link completion. It is implemented in Python and is intended to be called from large‑language‑model (LLM) agents.
Re‑engineering the literature‑search workflow
Typical keyword queries on a single platform return thousands of results, making it difficult to identify state‑of‑the‑art papers, track new conference releases, and keep results organized across sources such as arXiv, Semantic Scholar, Google Scholar, PubMed, Papers with Code, ACM DL, and IEEE Xplore. Academic‑Search consolidates these steps into a single assistant that performs joint retrieval, result deduplication, time‑priority ranking, and provides ready‑to‑use metadata.
Two‑pass retrieval strategy
The tool first returns a lightweight summary table (title, authors, year, citation count). After the user confirms which entries are of interest, Academic‑Search fetches the full metadata (abstract, venue, PDF link, code repository). If the user explicitly requests a fixed number of top results (e.g., “top 10”), the second pass is skipped and the full records are returned directly.
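The two‑pass flow described above can be sketched as follows. This is a minimal illustration, not the actual Academic‑Search implementation: the function names, the `Summary` record, and the callback shapes are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Summary:
    # Lightweight first-pass record: just enough to decide interest
    paper_id: str
    title: str
    year: int
    citations: int

def two_pass_search(query, fetch_summaries, fetch_details, confirm=None, top_k=None):
    """First pass returns summaries; second pass fetches full metadata.

    If top_k is given (e.g. the user asked for "top 10"), the confirmation
    step is skipped and full records for the top_k hits are returned directly.
    """
    summaries = fetch_summaries(query)
    if top_k is not None:
        chosen = summaries[:top_k]                    # skip confirmation
    else:
        chosen = [s for s in summaries if confirm(s)]  # keep confirmed entries
    return [fetch_details(s.paper_id) for s in chosen]
```

The point of the split is cost: the cheap summary pass lets the user (or an agent) prune before the expensive metadata pass runs.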
Automatic query expansion
For each user query the system automatically generates 2–3 complementary terms—such as synonyms, sub‑concepts, or abbreviations—to improve recall. This reduces missed papers caused by overly narrow keyword choices.
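A toy sketch of this kind of expansion is shown below, with a hard‑coded synonym table standing in for the system's model‑driven term generator; the table contents and the `expand_query` name are illustrative assumptions.

```python
# Toy synonym table standing in for the LLM-driven term generator
SYNONYMS = {
    "llm": ["large language model", "foundation model"],
    "forecasting": ["prediction", "temporal extrapolation"],
}

def expand_query(query: str, max_terms: int = 3) -> list[str]:
    """Return up to max_terms complementary search terms for a query."""
    q = query.lower()
    expansions = []
    for keyword, alternatives in SYNONYMS.items():
        if keyword in q:
            expansions.extend(alternatives)
    return expansions[:max_terms]
```

Each expanded term is then searched alongside the original query, so a narrow phrasing like "LLM" still surfaces papers that only say "large language model".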
Example usage
Command (Python):

```python
from academic_search import search_papers

results = search_papers("Time Series Forecasting with LLM", top_k=10)
print(results.to_json())
```

Observed behavior:
Fast response: the top‑10 highly‑cited papers are returned within seconds.
Effective filtering: non‑academic pages (blogs, news) are removed automatically.
Structured output: results are provided as JSON or Markdown, ready for note‑taking or downstream LLM summarisation.
Core capabilities
Multi‑platform joint retrieval (Semantic Scholar, arXiv, etc.) and unified result aggregation.
Automatic deduplication and frontier‑first ranking based on citation count and recency.
BibTeX export, direct PDF download, and code‑link completion for papers with associated repositories.
Robust error handling for sites with rate limits or authentication requirements.
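A deduplication‑and‑ranking pass like the one listed above could look roughly like this; the title‑normalization key and the scoring formula are assumptions for illustration, not the tool's actual logic.

```python
import math
import re

def normalize_title(title: str) -> str:
    # Collapse case, punctuation, and whitespace so near-identical
    # records from different platforms hash to the same key
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def dedup_and_rank(papers: list[dict], current_year: int = 2025) -> list[dict]:
    seen, unique = set(), []
    for p in papers:
        key = normalize_title(p["title"])
        if key not in seen:
            seen.add(key)
            unique.append(p)
    # Frontier-first: reward recency, with log-scaled citations as a tiebreaker
    def score(p):
        return math.log1p(p["citations"]) - (current_year - p["year"])
    return sorted(unique, key=score, reverse=True)
```

Normalizing titles before hashing matters because the same paper often appears with different capitalization or punctuation on arXiv, Semantic Scholar, and publisher sites.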
Agent‑oriented design
Academic‑Search is packaged as a “Skill” that can be imported into AutoGPT, LangChain, or custom agentic frameworks. The exposed functions include:
```python
search_papers(query: str, top_k: int = 10) -> ResultSet
get_paper_details(paper_id: str) -> PaperMetadata
format_for_llm(paper: PaperMetadata) -> str
```

These functions return structured data that LLMs can consume without additional parsing.
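To illustrate the last of these, a hypothetical re‑implementation of the formatting step might flatten a metadata record into a prompt‑friendly block; the field names below are assumptions, not the real `PaperMetadata` schema.

```python
def format_for_llm(paper: dict) -> str:
    # Flatten structured metadata into a plain-text block an LLM can
    # quote or summarise without any JSON parsing
    lines = [
        f"Title: {paper['title']}",
        f"Authors: {', '.join(paper['authors'])}",
        f"Year: {paper['year']} | Venue: {paper['venue']}",
        f"Abstract: {paper['abstract']}",
    ]
    return "\n".join(lines)
```

Emitting labeled plain text rather than raw JSON keeps the agent's prompt compact and avoids a parsing step inside the model's context.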
Simple deployment
Installation steps:
Clone the repository:

```shell
git clone https://github.com/ustc-ai4science/academic-search.git
```

Install dependencies:

```shell
pip install -r requirements.txt
```

Configure an API key for the underlying search services (e.g., a Semantic Scholar API key) in config.yaml.
Run the example script or import the library in your own agent code.
Getting started
The source code and documentation are hosted at https://github.com/ustc-ai4science/academic-search. The README provides detailed usage examples and explains how to extend the skill for additional databases.
Conclusion
Academic‑Search demonstrates a practical approach to modularising high‑frequency research tasks. By exposing a clean API and supporting multi‑source retrieval, it enables LLM‑driven agents to perform literature scouting, citation tracking, and resource collection as part of an end‑to‑end research workflow.