Unlocking LLM Secrets: From Prompt Basics to RAG and Tool Integration

This article introduces the fundamental paradigms of large language models, explaining how simple prompts, messages, and tools like RAG and ReAct enable powerful applications, while providing practical code examples, translation strategies, and insights on prompt engineering, tool usage, and model fine‑tuning.

Alibaba Cloud Developer
Alibaba Cloud Developer
Alibaba Cloud Developer
Unlocking LLM Secrets: From Prompt Basics to RAG and Tool Integration

Introduction

The author shares a beginner‑level guide on large language models (LLMs) for a peer team, covering basic LLM patterns and hoping to help more people participate in LLM application development.

Model Basics

Although many LLMs exist on the market, they share a common API style similar to OpenAI. From the interface perspective, the two core parameters are messages and tools. All LLM applications ultimately rely on these two parameters.

Messages – How LLMs Remember Context

The messages field is an array of dialogue turns, typically containing the following roles:

system : system instruction or prompt.

user : user request.

assistant : LLM response.

...: vendor‑specific extensions.

Conversation memory is achieved by passing the entire messages array to the model each time; the model itself is stateless.

Tools – Can LLMs Execute Any Tool?

LLMs cannot directly run external programs, but they can decide which tool to invoke and pass the required parameters. A tool definition includes a description and its parameters. Example: a weather‑query tool.

Using a tool typically involves three steps:

Provide the user query together with the list of tools; the LLM decides which tool to call and with what arguments.

The backend executes the selected tool and returns the result.

A second LLM call receives the tool result in the context and generates the final answer.

Retrieval‑Augmented Generation (RAG)

RAG combines external knowledge retrieval with LLM generation. The workflow is:

Knowledge retrieval : query a knowledge base to obtain the most relevant passages.

LLM answering : feed the retrieved passages to the LLM so it can produce a confident, accurate response.

RAG is useful for building Q&A bots that answer from a curated knowledge base.

ReAct (Reason + Act)

ReAct models human problem‑solving behavior by alternating reasoning steps and tool actions. It is especially useful when the task requires multiple sub‑steps or external tool calls.

Example scenario: a boss asks for a report on external LLM frameworks. The workflow uses three tools – internet search, PPT generation, and mind‑map creation – and follows a Reason‑Act‑Observe loop.

思考 (Reasoning): Need to research mainstream LLM frameworks, collect their features and use cases.
行动 (Acting): Search the web for "mainstream LLM frameworks 2024/2025" and "open source LLM frameworks".
观察 (Observation): Results show PyTorch, TensorFlow, JAX, MindSpore, Paddle, etc.
思考 (Reasoning): Build a mind‑map for the report structure (framework name, language, features, pros, cons, scenarios).
行动 (Acting): Use a mind‑map tool to create the outline.
观察 (Observation): Outline includes name, language, dynamic/static graph, distributed training support, etc.
思考 (Reasoning): Assemble the information into a PPT.
行动 (Acting): Use a PPT tool to generate slides from the mind‑map.
观察 (Observation): Draft PPT created, needs polishing.

Prompt Engineering for Translation

The article demonstrates three translation strategies using LLMs:

Solution 1 – Direct Prompt : Simple role‑based prompt that asks the model to translate English movie lines into Chinese.

Solution 2 – Chain‑of‑Thought (CoT) : Adds a reasoning step where the model first translates, then evaluates and refines the translation.

Solution 3 – Few‑Shot : Provides a few example translations in the prompt, allowing the model to infer the desired style.

Example source sentences and model outputs (using Gemini‑2.0 Flash) are shown in a table.

Source

Gemini‑2.0 Flash

The prejudice in people's hearts is like a mountain. No matter how hard you try, you can't move it.

人心中的成见就像一座大山,任你怎么努力也无法搬动。

Looking back on it, three years isn't that long.

如今想来,三年光阴,也不算长。

Be quick to obey my command

还不快快听我号令!

I'm the captain of my destiny, not heaven.

我命由我定,不由天!

If you ask me whether people can change their own destiny, I don't know. But defying fate is Nezha's destiny.

要问我人能否改变自己的命运,我不知道。但是,逆天而行,就是哪吒的命。

Few‑shot prompting is highlighted as a lightweight, high‑impact optimization that can satisfy the majority of scenarios with a small set of examples.

LLM + Agent as a Calculator (Increasing Call Count)

Because LLMs are weak at arithmetic, the article shows how to wrap simple arithmetic functions as tools and let a ReAct‑based agent orchestrate the calculation step‑by‑step.

import os
from dotenv import load_dotenv
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.azure_openai import AzureOpenAI

def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result"""
    return a * b

def add(a: int, b: int) -> int:
    """Add two integers and return the result"""
    return a + b

def subtract(a: int, b: int) -> int:
    """Subtract two integers and return the result"""
    return a - b

load_dotenv()
multiply_tool = FunctionTool.from_defaults(fn=multiply)
add_tool = FunctionTool.from_defaults(fn=add)
subtract_tool = FunctionTool.from_defaults(fn=subtract)

llm = AzureOpenAI(model="gpt-4o", engine='gpt-4o', deployment_name="gpt-4o", api_key=os.getenv('AZURE_KEY'), azure_endpoint="https://ilm-dev.openai.azure.com", api_version="2023-07-01-preview")

agent = ReActAgent.from_tools([multiply_tool, add_tool, subtract_tool], llm=llm, verbose=True)
response = agent.chat("What is 60-(20+(2*4))? Calculate step by step ")

The agent’s reasoning trace is displayed, showing the three tool calls (multiply → 8, add → 28, subtract → 32) and the final answer.

> Running step ...
Thought: The current language of the user is English. I need to use a tool to help me answer the question.
Action: multiply
Action Input: {'a': 2, 'b': 4}
Observation: 8
...
Answer: The result of the expression 60-(20+(2*4)) is 32.

The same approach works with natural‑language placeholders for tools (e.g., "张三代表减法"), demonstrating LLMs' tolerance for varied phrasing.

Model Fine‑Tuning

Fine‑tuning can embed domain‑specific knowledge (e.g., distinguishing bank statements from transaction logs) directly into the model, but it is time‑consuming and often less effective than sophisticated prompt engineering and dynamic fallback mechanisms.

Some Reflections

LLMs can dramatically change problem‑solving approaches, but their impact on business outcomes is not absolute. Over‑reliance on LLMs without considering core service factors—such as operational capability and human‑like empathy in customer service—leads to sub‑optimal results. Successful applications start by identifying the real business need, then evaluate where LLMs add genuine value.

In summary, the article covers the essential LLM paradigms (messages, tools, RAG, ReAct), practical prompt‑engineering techniques (direct prompting, CoT, few‑shot), code examples for tool integration, and strategic advice on when and how to employ LLMs effectively.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

AIReactlarge language modelsRAGLLM applications
Alibaba Cloud Developer
Written by

Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.