How to Make Large Language Models Understand Third‑Party Java Packages: From Failure to Success
This article explains why AI coding assistants such as Cursor and Claude fail to read external Java libraries, reviews naive "feed-the-code" tricks, evaluates built-in IDE tools, and ultimately presents a robust solution: a local decompilation pipeline exposed to the model via MCP, which lets LLMs query class definitions on demand and generate correct backend code.
Problem
When developers ask AI assistants (e.g., Cursor, Claude) to generate code that calls methods from third-party Java packages, the models often hallucinate or produce compilation errors because they cannot read the compiled .class files inside dependency JARs that are not open in the current project.
Failed Attempts
Directly feeding source snippets to the model works only for tiny examples and quickly exceeds token limits. IDE-integrated tools such as AoneCopilot's Add File To Chat or Claude's read_file can read a JAR's contents, but they still require manual file selection and only work when source code is available.
Solutions
1. Manual "feed‑the‑code"
Copy‑paste the needed classes into the chat. Simple but labor‑intensive and impractical for large libraries.
2. Built‑in tool reading
Both AoneCopilot and Claude provide a read_file-like tool that can fetch a class from a JAR, but the agent must be told explicitly which class to read, and the approach still depends on source files being present.
3. Local Decompilation (MCP) – the preferred method
By building a local index of all dependencies (via mvn dependency:tree plus jar tf) and linking each fully-qualified class name to its JAR, a custom Java Class Analyzer tool can decompile any class on demand using cfr (or another decompiler). The LLM calls this tool whenever it needs to understand a class and receives the real source code instantly.
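The index-building step described above can be sketched in plain JDK code. This is a minimal illustration, not the author's actual implementation: the class name JarIndexer is hypothetical, and it assumes the caller has already collected the JAR paths (e.g., from the mvn dependency:tree output).

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.jar.JarFile;

// Hypothetical sketch: map every fully-qualified class name found in the
// dependency JARs to the JAR file that contains it.
public class JarIndexer {
    public static Map<String, String> buildIndex(List<String> jarPaths) throws IOException {
        Map<String, String> classToJar = new HashMap<>();
        for (String jarPath : jarPaths) {
            try (JarFile jar = new JarFile(jarPath)) {
                jar.stream()
                   .filter(e -> e.getName().endsWith(".class"))
                   .forEach(e -> {
                       // Entry "com/foo/Bar.class" becomes FQCN "com.foo.Bar".
                       String fqcn = e.getName()
                                      .replace('/', '.')
                                      .replaceAll("\\.class$", "");
                       // First JAR on the path wins, mirroring classpath order.
                       classToJar.putIfAbsent(fqcn, jarPath);
                   });
            }
        }
        return classToJar;
    }
}
```

With this map in hand, resolving a class the LLM asks about becomes a single lookup instead of a search across every JAR.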
Implementation Details
1. mvn dependency:tree -DoutputType=text -o generates the list of third-party JARs.
2. For each JAR, jar tf "${jarPath}" | grep '\.class$' extracts the class entries and builds a map className → jarPath.
3. The Java Class Analyzer receives a fully-qualified class name, looks up its JAR, runs cfr on the .class file, and returns the decompiled source inside a <code> block.
4. The LLM can now resolve method signatures such as com.tb.xxx.yyy.api.xxxService.queryMainAndDetail or com.tb.ccc.ddd.service.xxxBaseService.queryMainAndDetail, generate correct imports, and produce working code.
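Step 3 can be sketched as follows, under stated assumptions: the ClassAnalyzer name is hypothetical, and the approach shown (extract the single .class entry to a temp file, then hand it to cfr as a plain class-file argument) is one reasonable way to drive CFR, not necessarily the author's exact setup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical sketch of the analyzer's decompile step.
public class ClassAnalyzer {
    /** Extracts e.g. com.foo.Bar from jarPath into a temporary .class file. */
    public static Path extractClass(String jarPath, String fqcn) throws IOException {
        String entryName = fqcn.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath);
             InputStream in = jar.getInputStream(jar.getJarEntry(entryName))) {
            Path tmp = Files.createTempFile("analyzer-", ".class");
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        }
    }

    /** Assembles the CFR invocation; the caller runs it (e.g. via
     *  ProcessBuilder) and captures stdout as the decompiled source. */
    public static List<String> cfrCommand(String cfrJar, Path classFile) {
        return List.of("java", "-jar", cfrJar, classFile.toString());
    }
}
```

The tool then wraps the captured stdout in a <code> block before returning it to the model.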
Using this pipeline, the author successfully implemented a batch order-query method
List<BizOrderDO> queryBizOrderList(long userId, List<Long> subOrderIdList) throws TCException
that first queries the online database, falls back to the historical database for missing IDs, merges the results, and returns them in the original order.
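The shape of that generated method can be reconstructed roughly as below. This is a hedged sketch: BizOrderDO and the two DAO interfaces are hypothetical stand-ins for the real order model and data-access layer, and TCException is simplified to a bare checked exception.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins; the real types live in the third-party packages.
class TCException extends Exception {}

class BizOrderDO {
    final long subOrderId;
    BizOrderDO(long subOrderId) { this.subOrderId = subOrderId; }
}

interface OrderDao {
    List<BizOrderDO> query(long userId, List<Long> subOrderIds) throws TCException;
}

class BizOrderService {
    private final OrderDao onlineDao;
    private final OrderDao historyDao;

    BizOrderService(OrderDao onlineDao, OrderDao historyDao) {
        this.onlineDao = onlineDao;
        this.historyDao = historyDao;
    }

    List<BizOrderDO> queryBizOrderList(long userId, List<Long> subOrderIdList) throws TCException {
        Map<Long, BizOrderDO> found = new HashMap<>();
        // 1. Query the online database first.
        for (BizOrderDO o : onlineDao.query(userId, subOrderIdList)) {
            found.put(o.subOrderId, o);
        }
        // 2. Fall back to the historical database for IDs still missing.
        List<Long> missing = new ArrayList<>();
        for (Long id : subOrderIdList) {
            if (!found.containsKey(id)) missing.add(id);
        }
        if (!missing.isEmpty()) {
            for (BizOrderDO o : historyDao.query(userId, missing)) {
                found.put(o.subOrderId, o);
            }
        }
        // 3. Return the merged results in the original request order.
        List<BizOrderDO> result = new ArrayList<>();
        for (Long id : subOrderIdList) {
            BizOrderDO o = found.get(id);
            if (o != null) result.add(o);
        }
        return result;
    }
}
```

Keyed merging plus a final pass over the request list is what preserves the caller's original ordering regardless of which database each order came from.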
Evaluation
The manual feeding approach is quick for one‑off tasks but does not scale. Tool‑assisted reading reduces manual work but still requires explicit file selection. The MCP‑based solution provides a "search‑engine‑like" experience for the LLM, works with any JAR (source or binary), and yields accurate, import‑ready code with minimal hallucination.
Conclusion
Treat LLMs as powerful but bounded assistants. By supplying them with a reliable code‑lookup service (MCP + Java Class Analyzer), developers can bridge the gap between AI reasoning and real‑world codebases, turning AI from a "magician" into a dependable "tool‑person".