Understanding LangChain Callback Mechanism, Custom Async Handlers, and Token Cost Management in Python
This article introduces LangChain's callback mechanism, demonstrates how to implement custom synchronous and asynchronous callbacks in Python, compares them with JavaScript async patterns, and shows how to monitor token usage and control costs using OpenAI callbacks.
In the preface, the author notes that 2024 is shaping up to be a fruitful year for artificial intelligence and decides to revisit LangChain, a popular framework for building LLM applications, following earlier posts in the series:
AI practice: RAG for self‑service model rooms – introduces document loading and embedding.
LangChain practice: Text‑to‑SQL – showcases a new paradigm for large‑model databases.
LangChain practice: SequentialChain – explains LangChain's chain‑based workflow.
The article then focuses on learning the callback mechanism, referencing a previous discussion of AutoGen callbacks and preparing a comparison.
Callbacks and Asynchrony
For developers familiar with JavaScript, callbacks and async programming are common in event listeners, Ajax requests, and timers. The same concepts can be demonstrated in Python using asyncio:
# Python asyncio module for asynchronous programming
import asyncio

def print_result(area):
    print(f"The rectangle's area is {area}")

async def rectangleArea(w, h, callback):
    print("Starting rectangle area calculation...")
    await asyncio.sleep(0.5)
    area = w * h
    callback(area)
    print("Calculation finished")

async def circleArea():
    print("Starting circle calculation")
    await asyncio.sleep(1)
    print("Circle calculation finished")

async def main():
    print("Main thread starts...")
    task1 = asyncio.create_task(rectangleArea(3, 4, print_result))
    task2 = asyncio.create_task(circleArea())
    await task1
    await task2
    print("Main thread ends...")

asyncio.run(main())

When the code reaches await asyncio.sleep(), the current task pauses and the event loop runs other tasks, illustrating asynchronous behavior similar to JavaScript's async/await.
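The pause-and-resume behavior is easy to verify without any LLM involved. The short sketch below (with hypothetical helper names, not from the original article) records the order in which two tasks start and finish; the task with the shorter sleep finishes first even though it was scheduled second:

```python
import asyncio

async def worker(name: str, delay: float, log: list) -> None:
    # Record when this task starts, yield control at the sleep,
    # and record when the event loop resumes it
    log.append(f"{name} start")
    await asyncio.sleep(delay)
    log.append(f"{name} end")

async def main() -> list:
    log = []
    t1 = asyncio.create_task(worker("rect", 0.02, log))
    t2 = asyncio.create_task(worker("circle", 0.01, log))
    await t1
    await t2
    return log

log = asyncio.run(main())
print(log)  # ['rect start', 'circle start', 'circle end', 'rect end']
```

Both tasks start before either finishes, which is exactly the interleaving a JavaScript developer would expect from `Promise`-based code.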
LangChain Callback Mechanism
LangChain relies heavily on CallbackHandler objects for logging, monitoring, and data flow control. A simple example writes the LLM output to an output.log file:
from loguru import logger
from langchain.callbacks import FileCallbackHandler
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

logfile = "output.log"
logger.add(logfile, colorize=True, enqueue=True)
handler = FileCallbackHandler(logfile)

llm = OpenAI()
prompt = PromptTemplate.from_template("1 + {number} = ")
chain = LLMChain(llm=llm, prompt=prompt, callbacks=[handler], verbose=True)
answer = chain.run(number=2)
logger.info(answer)

During LLM execution, the provided FileCallbackHandler captures the chain's output and writes it to the log file.
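Under the hood, LangChain's callback manager simply fans each lifecycle event out to every registered handler. The plain-Python sketch below mimics that dispatch pattern; the handler method names mirror LangChain's, but `LoggingHandler` and `FakeLLM` are illustrative stand-ins, not LangChain classes:

```python
class LoggingHandler:
    """Collects lifecycle events, loosely mimicking a LangChain handler."""
    def __init__(self):
        self.events = []
    def on_llm_start(self, prompt):
        self.events.append(("start", prompt))
    def on_llm_new_token(self, token):
        self.events.append(("token", token))
    def on_llm_end(self, output):
        self.events.append(("end", output))

class FakeLLM:
    """Stand-in 'LLM' that streams a canned answer through its callbacks."""
    def __init__(self, callbacks):
        self.callbacks = callbacks
    def run(self, prompt, answer):
        for cb in self.callbacks:
            cb.on_llm_start(prompt)
        for token in answer.split():
            for cb in self.callbacks:
                cb.on_llm_new_token(token)
        for cb in self.callbacks:
            cb.on_llm_end(answer)

handler = LoggingHandler()
FakeLLM(callbacks=[handler]).run("1 + 2 = ", "1 + 2 = 3")
print(handler.events[0])  # ('start', '1 + 2 = ')
```

This is why a single handler passed via `callbacks=[...]` can observe logging, streaming tokens, and final results without the chain code changing at all.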
Custom Callback Functions
The following code shows how to create custom synchronous and asynchronous callback handlers for a hypothetical "dry‑food shop" chatbot.
# Imports
import asyncio
from typing import Any, Dict, List
from langchain.chat_models import ChatOpenAI
from langchain.schema import LLMResult, HumanMessage
from langchain.callbacks.base import AsyncCallbackHandler, BaseCallbackHandler

# Synchronous handler based on BaseCallbackHandler
class MyDryFoodShopSyncHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"Dry-goods data: token: {token}")

# Asynchronous handler based on AsyncCallbackHandler
class MyDryFoodAsyncHandler(AsyncCallbackHandler):
    async def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any) -> None:
        print("Fetching dry-goods data...")
        await asyncio.sleep(0.5)
        print("Dry-goods data fetched. Offering suggestions...")

    async def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        print("Compiling dry-goods suggestions...")
        await asyncio.sleep(0.5)
        print("Happy shopping!")

# Main async function invoking the chatbot with both handlers
async def main():
    dryfood_shop_chat = ChatOpenAI(
        max_tokens=100,
        streaming=True,
        callbacks=[MyDryFoodShopSyncHandler(), MyDryFoodAsyncHandler()],
    )
    await dryfood_shop_chat.agenerate(
        [[HumanMessage(content="Which dried goods are best for stewing chicken? Briefly name just 3, in under 60 words.")]]
    )

asyncio.run(main())

When a user asks which dried goods suit a chicken stew, the chatbot prints a message for each new token, logs messages before and after the OpenAI call, and finally wishes the user a pleasant purchase.
Calculating Token Usage and Cost Control
from langchain import OpenAI
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferMemory

llm = OpenAI(temperature=0.5, model_name="gpt-3.5-turbo-instruct")
conversation = ConversationChain(llm=llm, memory=ConversationBufferMemory())

# Sample dialogue
conversation("I'm hosting a party at home tomorrow and need some dried seafood.")
print("Memory after the first turn:", conversation.memory.buffer)
conversation("Grandpa likes dried shrimp, about one per 50 grams.")
print("Memory after the second turn:", conversation.memory.buffer)
conversation("I'm back. Do you remember why I wanted dried seafood yesterday?")
print("Prompt after the third turn:", conversation.prompt.template)
print("Memory after the third turn:", conversation.memory.buffer)

To obtain precise token counts, the get_openai_callback context manager can be used:
from langchain import OpenAI
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferMemory
from langchain.callbacks import get_openai_callback

llm = OpenAI(temperature=0.5, model_name="gpt-3.5-turbo-instruct")
conversation = ConversationChain(llm=llm, memory=ConversationBufferMemory())

with get_openai_callback() as cb:
    conversation("I'm hosting a party at home tomorrow and need some dried seafood.")
    conversation("Grandpa likes dried shrimp, about one per 50 grams.")
    conversation("I'm back. Do you remember why I wanted dried seafood yesterday?")
    print("\nTotal tokens used:", cb.total_tokens)

The callback reports the total number of tokens used (e.g., 1023), allowing developers to monitor and manage LLM costs effectively.
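Token counts translate directly into dollars. As a back-of-the-envelope sketch, the function below converts token counts to a cost estimate; the per-1,000-token rates are illustrative assumptions, not current OpenAI pricing:

```python
# Illustrative, assumed rates in USD per 1,000 tokens; check the
# provider's current price sheet before relying on these numbers.
PROMPT_RATE = 0.0015
COMPLETION_RATE = 0.002

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the dollar cost of one call from its token counts."""
    return (prompt_tokens / 1000) * PROMPT_RATE + \
           (completion_tokens / 1000) * COMPLETION_RATE

# e.g. the 1023 tokens above, assuming an 800 prompt / 223 completion split
cost = estimate_cost(800, 223)
print(f"${cost:.6f}")  # $0.001646
```

Note that the callback object from get_openai_callback also exposes prompt_tokens, completion_tokens, and total_cost attributes, so for OpenAI models the cost can be read directly rather than computed by hand.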
Conclusion
By leveraging LangChain's callback system, developers can handle token accounting, logging, and custom workflow steps, making it easier to control costs and gain insight into LLM interactions.