Using ChatGPT, WhisperX, and LangChain for Smart Meetings & Knowledge Bases

Ziroom demonstrates how large language models like ChatGPT, combined with WhisperX transcription and LangChain-powered pipelines, can automate meeting minutes, generate summaries, and build intelligent knowledge bases, outlining implementation steps, code snippets, and practical challenges for internal AI adoption.

Ziru Technology

Background

Recently AI has become a hot topic, with tools such as Midjourney, Stable Diffusion, and ChatGPT gaining widespread attention. Many developers are shifting from Web3 to AI, and large models are beginning to be applied in internal scenarios. This article shares concrete product implementations, challenges, and insights gained from deploying large models within the company.

ChatGPT in Ziroom's Direction

After ChatGPT’s emergence, ordinary users can create their own apps by writing prompts, even without development experience. Various tools like Notion, ChatMind, and Raycast have seen rapid AI-driven growth. Ziroom identifies two main internal application directions: (1) automating frequent meeting recordings using Feishu’s voice‑to‑text and summarization features, turning meetings into digital assets; (2) enhancing knowledge bases so that retrieved documents can be intelligently refined and answered by LLMs.

Implementation Ideas

3.1 Smart Meeting Platform

The first version supports offline audio‑to‑text conversion and long‑text summarization. Users can upload audio files or record via a web interface. The speech model uses WhisperX, which provides speaker diarization and word‑level timestamps.

import whisperx
import gc 

device = "cuda"
audio_file = "audio.mp3"
batch_size = 16 # reduce if low on GPU mem
compute_type = "float16" # change to "int8" if low on GPU mem (may reduce accuracy)

# 1. Transcribe with original whisper (batched)
model = whisperx.load_model("large-v2", device, compute_type=compute_type)

audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=batch_size)
print(result["segments"]) # before alignment

# delete model if low on GPU resources
# import gc; gc.collect(); torch.cuda.empty_cache(); del model

# 2. Align whisper output
model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device, return_char_alignments=False)
print(result["segments"]) # after alignment

# delete model if low on GPU resources
# import gc; gc.collect(); torch.cuda.empty_cache(); del model_a

# 3. Assign speaker labels
diarize_model = whisperx.DiarizationPipeline(use_auth_token=YOUR_HF_TOKEN, device=device)
diarize_segments = diarize_model(audio_file)
result = whisperx.assign_word_speakers(diarize_segments, result)
print(diarize_segments)
print(result["segments"]) # segments are now assigned speaker IDs

For long‑text summarization, the model's token limit requires splitting the transcript. Two methods are used: (1) split the transcript into fixed‑size token chunks using tiktoken and summarize each chunk independently; (2) summarize recursively, feeding each partial summary into the prompt for the next chunk.

def _generate_summary(self, prompt):
    final_prompt = f"[{prompt}]"
    return self._call(final_prompt, stop=['\n'])

def summary_text(self, transcript):
    # Short transcripts fit in one call
    if len(transcript) < 1300:
        return self._generate_summary(transcript)
    prompt = "请总结以下文"  # "Please summarize the following text"
    text = prompt + transcript
    tokens = token_encoder.encode(text)
    chunks = []
    # Split the token stream into 1500-token chunks and decode each back to text
    while tokens:
        chunk_tokens = tokens[:1500]
        chunk_text = token_encoder.decode(chunk_tokens)
        chunks.append(chunk_text)
        tokens = tokens[1500:]
    # Summarize each chunk independently and join the partial summaries
    summary = "\n".join([self._generate_summary(chunk) for chunk in chunks])
    return summary
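The second method can be sketched as a rolling summary, where each chunk is summarized together with the running summary of everything before it. In this sketch `call_llm` is a stand‑in for the real model call (stubbed so the control flow runs without an API key), and chunking is by characters rather than tokens for simplicity:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for the real LLM call (ChatGPT, ChatGLM, etc.).
    Stubbed here: echoes a truncated prompt instead of summarizing."""
    return prompt[:200]

def rolling_summary(transcript: str, chunk_size: int = 1500) -> str:
    """Summarize a long transcript chunk by chunk, carrying the summary forward."""
    summary = ""
    for start in range(0, len(transcript), chunk_size):
        chunk = transcript[start:start + chunk_size]
        prompt = (
            "Summary so far:\n" + summary +
            "\n\nContinue the summary with this new text:\n" + chunk
        )
        # Each call sees the previous summary plus one new chunk,
        # so the final result reflects the whole transcript
        summary = call_llm(prompt)
    return summary
```

Compared with independent per‑chunk summaries, this keeps cross‑chunk context at the cost of one sequential LLM call per chunk.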

3.2 Knowledge Base Construction

The knowledge base pipeline works as follows: load files, read and split the text, embed the chunks into vectors, embed the user's query, retrieve the top‑k most similar document vectors, and feed the retrieved texts together with the question into the LLM prompt. LangChain is used extensively for this workflow.
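The retrieval step above reduces to nearest‑neighbor search over embedding vectors. A minimal, dependency‑free sketch of top‑k retrieval by cosine similarity (the 3‑dimensional vectors here are illustrative; in practice the embeddings come from a model such as OpenAI's):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, doc_vecs, k=3):
    """Return indices of the k document vectors most similar to the query."""
    scores = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in scores[:k]]

# Illustrative "embeddings": the first two point in nearly the same direction
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
query = [1.0, 0.05, 0.0]
print(top_k(query, docs, k=2))  # the two documents closest in direction to the query
```

Vector stores such as Chroma do the same thing with approximate indexes so the search stays fast at scale.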

LangChain supports several key areas:

LLM and prompts management

Chains for sequencing calls

Data‑augmented generation

Agents for decision‑making actions

Memory for stateful interactions

Evaluation utilities
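To make the first area concrete: a LangChain PromptTemplate is essentially a parameterized string filled in at call time. A plain‑Python equivalent (the template wording here is illustrative, not from the original system):

```python
# A parameterized prompt with named slots, filled at call time --
# the same idea LangChain's PromptTemplate wraps in a class.
MEETING_SUMMARY_TEMPLATE = (
    "You are a meeting assistant.\n"
    "Summarize the following transcript in {language}, "
    "listing decisions and action items:\n\n{transcript}"
)

def format_prompt(transcript: str, language: str = "Chinese") -> str:
    """Fill the template's slots to produce the final prompt string."""
    return MEETING_SUMMARY_TEMPLATE.format(transcript=transcript, language=language)
```

Keeping prompts in templates like this lets them be versioned and reused across chains instead of being scattered through the code.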

A simple QA bot built with LangChain and OpenAI embeddings is shown below.

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.llms import OpenAI
from langchain.document_loaders import DirectoryLoader
from langchain.chains import RetrievalQA

# Load every .txt file under the data directory
loader = DirectoryLoader('/content/sample_data/data/', glob='**/*.txt')
documents = loader.load()

# Split documents into small chunks for embedding
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)
split_docs = text_splitter.split_documents(documents)

# Embed the chunks and index them in a Chroma vector store
embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(split_docs, embeddings)

# Retrieve the most similar chunks and "stuff" them into the LLM prompt
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff",
                                 retriever=docsearch.as_retriever(),
                                 return_source_documents=True)
result = qa({"query": "科大讯飞今年第一季度收入是多少?"})  # "What was iFLYTEK's revenue in Q1 this year?"
print(result)

The internal knowledge base uses ChatGLM as a cost‑effective LLM alternative to ChatGPT. Although performance differs, the overall workflow remains the same.
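Because the pipeline only touches the model through a single completion call, swapping ChatGPT for ChatGLM mostly means changing that call. A sketch of a thin client for a locally hosted ChatGLM server, assuming the request/response shape of ChatGLM‑6B's reference `api.py` (the endpoint URL is an assumption for illustration):

```python
import json
import urllib.request

def build_payload(prompt: str, history=None) -> bytes:
    """Build the JSON body: a prompt plus the multi-turn history list."""
    return json.dumps({"prompt": prompt, "history": history or []}).encode("utf-8")

def chatglm_complete(prompt: str, endpoint: str = "http://localhost:8000") -> str:
    """POST the prompt to a locally hosted ChatGLM server and return its reply."""
    req = urllib.request.Request(
        endpoint,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Wrapping this function in a custom LangChain LLM class then lets the rest of the QA chain run unchanged against the local model.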

Overall, large models are gradually being adopted internally to solve practical problems, though challenges such as hallucinations, performance constraints, and content moderation remain. Continued experimentation will improve efficiency and reduce manual effort.

Tags: AI, LangChain, ChatGPT, Meeting Automation, WhisperX
Written by Ziru Technology (Ziru Official Tech Account)