
Implementing Plugin Functionality for a Large Language Model Chatbot Using Function Calling and Asynchronous Execution

This article explains how Ctrip's security R&D team built the version-2.0 features of its web-based LLM chatbot, including plugin support, Function Calling, synchronous and asynchronous execution, and WebSocket/Socket.IO communication, and it provides full Python code examples for defining and invoking plugins.

Ctrip Technology

Author: Cheng Xue, a senior security R&D engineer at Ctrip who focuses on Python/Golang backend development and large language models.

Background: In early 2023, large language models (LLMs) became a hot topic. Ctrip launched an internal LLM-powered chatbot (Web 1.0) and, after six months of user feedback, released Web 2.0 with a simplified UI, conversation history, custom settings, plugin support, and AI drawing capabilities.

Requirement Research – Function Calling: Modern LLMs provide a Function Calling capability that lets developers describe their functions (name, description, parameters) and send these definitions together with the user query. The model selects the most relevant function and returns its name and arguments; the developer then executes the actual business logic. For example, a weather-query function can be invoked to answer "What's the weather in Shanghai today?". A well-crafted prompt can mimic this behavior, but Function Calling standardizes the contract between the model and the code.
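The flow can be sketched in a few lines of Python. Everything here is illustrative: `get_weather` and the hard-coded `model_response` are hypothetical stand-ins for real business logic and a real model reply, chosen only to show the shape of the exchange.

```python
def get_weather(city: str) -> str:
    """Stand-in for real business logic (e.g. calling a weather API)."""
    return f"Sunny, 25°C in {city}"

# Tool definitions sent to the model alongside the user query.
tools = [{
    "name": "get_weather",
    "description": "Query current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}]

# In production the model produces this selection; here it is hard-coded
# to show the shape of a Function Calling response.
model_response = {"name": "get_weather", "parameters": {"city": "Shanghai"}}

# The developer, not the model, executes the selected function.
registry = {"get_weather": get_weather}
result = registry[model_response["name"]](**model_response["parameters"])
print(result)  # Sunny, 25°C in Shanghai
```

Note that the model only ever returns a name and arguments; dispatching to the registry and running the function stays entirely on the developer's side.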

Requirement Research – Asynchronous Execution: Some plugins (e.g., ping, IP scan) are time-consuming and must run asynchronously, with results pushed to the front end via WebSocket. WebSocket provides full-duplex communication, low overhead, binary support, and extensibility. Socket.IO is a higher-level library built on top of WebSocket that adds fallback transports, richer APIs, and automatic reconnection; note, however, that a Socket.IO client is not directly compatible with a pure WebSocket server.
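The push-on-completion pattern this implies can be sketched with only the standard library: a slow plugin runs in a worker thread, and a completion callback delivers the result. Here a plain list stands in for the WebSocket channel, and `slow_ping` is a hypothetical stand-in for a real ping task.

```python
from concurrent.futures import ThreadPoolExecutor
import time

pushed = []  # stand-in for the WebSocket channel to the front end

def slow_ping(addr: str) -> str:
    """Simulated long-running plugin (a real ping would take seconds)."""
    time.sleep(0.1)
    return f"{addr} is reachable"

def push_to_user(future):
    # In the real system this would be a WebSocket/Socket.IO emit to the user.
    pushed.append(f"Task result: {future.result()}")

with ThreadPoolExecutor(max_workers=3) as pool:
    future = pool.submit(slow_ping, "8.8.8.8")
    future.add_done_callback(push_to_user)  # fires when the task finishes

print(pushed[0])  # Task result: 8.8.8.8 is reachable
```

The request handler returns immediately after submitting the task; only the callback, not the caller, touches the result, which is exactly the role the WebSocket push plays in the article's design.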

Basic Implementation – Model: The demo uses the open-source Chinese LLM ChatGLM3 (released by Zhipu AI and Tsinghua KEG).

3.1 Defining Plugins

Plugins are declared in a Python dictionary, keyed by plugin name. Each entry carries a display name, a sync flag, a message template, and the Function Calling definition under info:

all_plugins = {
    "google": {
        "name_cn": "谷歌搜索",
        "sync": True,
        "message": "{result}",
        "info": {
            "name": "google",
            "description": "When a question requires real‑time search, use Google search",
            "parameters": {
                "type": "object",
                "properties": {
                    "keyword": {"type": "string", "description": "Search keyword"}
                },
                "required": ["keyword"]
            }
        }
    },
    "ping": {
        "name_cn": "ping",
        "sync": False,
        "message": "Ping task is long; result will be sent later.",
        "info": {
            "name": "ping",
            "description": "Ping an IP address or domain",
            "parameters": {
                "type": "object",
                "properties": {"addr": {"type": "string", "description": "Target IP or domain"}},
                "required": ["addr"]
            }
        }
    }
}

The corresponding Python functions are implemented in a Functions class:

class Functions:
    @classmethod
    def ping(cls, **kwargs):
        """Ping implementation"""
        # omitted ping code
        pass

    @classmethod
    def google(cls, **kwargs):
        """Google search implementation.

        `server` (holding the initialized Google Custom Search client under
        'service' and the engine ID under 'cx') and `reply_text` (which sends
        a prompt to the LLM and returns its reply) are defined elsewhere in
        the project.
        """
        keyword = kwargs['keyword']
        search_context = []
        res = server['service'].cse().list(q=keyword, cx=server['cx']).execute()
        for row in res.get('items', []):
            search_context.append(row['snippet'])
        prompt = [{"role": "user", "content": f"Please answer the question using the following content: {keyword}\n" + "\n".join(search_context)}]
        return reply_text(prompt)

3.2 Using Function Calling

import torch
from transformers import AutoTokenizer, AutoModel

def main(query):
    """Run the chatbot with plugins"""
    DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
    tokenizer = AutoTokenizer.from_pretrained('/home/chatglm3-6b', trust_remote_code=True)
    model = AutoModel.from_pretrained('/home/chatglm3-6b', trust_remote_code=True).to(DEVICE).eval()
    tools = [plugin['info'] for plugin in all_plugins.values()]
    history = [{"role": "system", "content": "Answer the following questions as best as you can. You have access to the following tools:", "tools": tools}]
    response, _ = model.chat(tokenizer, query, history=history)
    # ChatGLM3 returns a dict when it selects a tool, otherwise a plain string reply
    if not isinstance(response, dict):
        return response
    plugin_name = response.get("name", "")
    arguments = response.get("parameters", {})
    if not plugin_name:
        return None
    plugin = all_plugins[plugin_name]
    func = getattr(Functions, plugin_name)
    res = func(**arguments)
    return res
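The dispatch step at the end of main() can be exercised in isolation: look up the plugin method on the class by name with getattr, then expand the model-supplied arguments as keyword arguments. EchoFunctions below is a stub standing in for the article's Functions class.

```python
class EchoFunctions:
    """Stub plugin class mirroring the Functions class's @classmethod shape."""
    @classmethod
    def google(cls, **kwargs):
        return f"searched: {kwargs['keyword']}"

# Shape of a ChatGLM3 tool-selection response (hard-coded for illustration).
model_response = {"name": "google", "parameters": {"keyword": "Shanghai weather"}}

func = getattr(EchoFunctions, model_response["name"])
print(func(**model_response["parameters"]))  # searched: Shanghai weather
```

Because the plugin name doubles as both the dictionary key in all_plugins and the method name on the class, no extra routing table is needed.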

3.3 Asynchronous Plugin Execution

The backend uses Flask with Flask-SocketIO. After the Flask app is wrapped with SocketIO, the server can emit messages to a specific user when an asynchronous task finishes.

from flask import Flask
from flask_socketio import SocketIO

web_app = Flask(__name__, static_folder=Config.STATIC_PATH)
socketio = SocketIO(web_app, cors_allowed_origins="*", logger=True)

@socketio.on('connect')
def handle_connect():
    print("connect")

@socketio.on('disconnect')
def handle_disconnect():
    print("disconnect")

if __name__ == '__main__':
    socketio.run(web_app, host=address, port=port, allow_unsafe_werkzeug=True)

When a plugin is asynchronous, the task is submitted to a thread pool and the result is pushed back via socketio.emit(user, f"Task result: {result}"), where the event name is the user's identifier so the front end can subscribe to it:

from multiprocessing.pool import ThreadPool

def main(user, question):
    # ... same logic as above
    func = getattr(Functions, plugin_name)
    if plugin['sync']:
        res = func(**arguments)
    else:
        thread_pool = ThreadPool(3)
        def callback(result):
            socketio.emit(user, f"Task result: {result}")
        res = thread_pool.apply_async(func, kwds=arguments, callback=callback)
    return res

Future Plans

Expand the plugin ecosystem beyond the current dozen examples.

Investigate ways to let end users author their own plugins without requiring code changes in the project.

Recruitment : Ctrip Information Security team is hiring senior security engineers, senior infrastructure security experts, compliance auditors, and SDL security specialists (links omitted for brevity).

Recommended Reading : Links to additional Ctrip technical articles on time‑series forecasting, train‑ticket anomaly detection, SMS recall optimization, and CNN‑based new‑word discovery.

Tags: backend, AI, LLM, asynchronous, WebSocket, function calling, plugins