Using ChatGPT, WhisperX, and LangChain for Smart Meetings & Knowledge Bases
Ziroom demonstrates how large language models like ChatGPT, combined with WhisperX transcription and LangChain-powered pipelines, can automate meeting minutes, generate summaries, and build intelligent knowledge bases. The article outlines implementation steps, code snippets, and practical challenges from internal AI adoption.
Background
AI has recently become a hot topic, with tools such as Midjourney, Stable Diffusion, and ChatGPT gaining widespread attention. Many developers are shifting from Web3 to AI, and large models are beginning to be applied in internal scenarios. This article shares concrete product implementations, challenges, and insights gained from deploying large models within the company.
Ziroom's Application Directions for ChatGPT
After ChatGPT’s emergence, ordinary users can create their own apps by writing prompts, even without development experience. Various tools like Notion, ChatMind, and Raycast have seen rapid AI-driven growth. Ziroom identifies two main internal application directions: (1) automating frequent meeting recordings using Feishu’s voice‑to‑text and summarization features, turning meetings into digital assets; (2) enhancing knowledge bases so that retrieved documents can be intelligently refined and answered by LLMs.
Implementation Ideas
3.1 Smart Meeting Platform
The first version supports offline audio‑to‑text conversion and long‑text summarization. Users can upload audio files or record via a web interface. The speech model uses WhisperX, which provides speaker diarization and word‑level timestamps.
import whisperx
import gc
device = "cuda"
audio_file = "audio.mp3"
batch_size = 16 # reduce if low on GPU mem
compute_type = "float16" # change to "int8" if low on GPU mem (may reduce accuracy)
# 1. Transcribe with original whisper (batched)
model = whisperx.load_model("large-v2", device, compute_type=compute_type)
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=batch_size)
print(result["segments"]) # before alignment
# delete model if low on GPU resources
# import gc; gc.collect(); torch.cuda.empty_cache(); del model
# 2. Align whisper output
model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device, return_char_alignments=False)
print(result["segments"]) # after alignment
# delete model if low on GPU resources
# import gc; gc.collect(); torch.cuda.empty_cache(); del model_a
# 3. Assign speaker labels
diarize_model = whisperx.DiarizationPipeline(use_auth_token=YOUR_HF_TOKEN, device=device)
diarize_segments = diarize_model(audio_file)
result = whisperx.assign_word_speakers(diarize_segments, result)
print(diarize_segments)
print(result["segments"]) # segments are now assigned speaker IDs
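With speakers assigned, the diarized segments can be flattened into a speaker-labeled transcript for the summarization step. A minimal sketch, assuming the result dict produced above (format_transcript is an illustrative helper, not part of WhisperX):

def format_transcript(segments):
    # Flatten diarized segments into "SPEAKER: text" lines.
    lines = []
    for seg in segments:
        speaker = seg.get("speaker", "UNKNOWN")  # segments with no diarization match lack the key
        lines.append(f'{speaker}: {seg["text"].strip()}')
    return "\n".join(lines)

transcript = format_transcript(result["segments"])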
For long-text summarization, the large model's token limit requires splitting the transcript. Two methods are used: (1) split the text by token count using tiktoken and summarize each chunk; (2) recursively summarize chunks, feeding each summary back into the prompt for the next chunk. The first method is implemented here; the second is sketched after the code.
def _generate_summary(self, prompt):
    final_prompt = f"[{prompt}]"
    return self._call(final_prompt, stop=["\n"])

def summary_text(self, transcript):
    # Short transcripts fit into a single call.
    if len(transcript) < 1300:
        return self._generate_summary(transcript)
    prompt = "Please summarize the following text: "
    text = prompt + transcript
    # token_encoder is a tiktoken encoding, e.g. tiktoken.get_encoding("cl100k_base")
    tokens = token_encoder.encode(text)
    chunks = []
    while tokens:
        chunk_tokens = tokens[:1500]
        chunk_text = token_encoder.decode(chunk_tokens)
        chunks.append(chunk_text)
        tokens = tokens[1500:]
    summary = "\n".join([self._generate_summary(chunk) for chunk in chunks])
    return summary
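The second method, recursive summarization, can be sketched as follows; it reuses _generate_summary and token_encoder from above, and the prompt wording is illustrative:

def summary_text_recursive(self, transcript, chunk_size=1500):
    # Fold the transcript left to right: each chunk is summarized
    # together with the running summary of everything before it.
    tokens = token_encoder.encode(transcript)
    summary = ""
    while tokens:
        chunk_text = token_encoder.decode(tokens[:chunk_size])
        tokens = tokens[chunk_size:]
        # Feed the previous summary back into the next prompt so context carries over.
        prompt = f"Summary so far: {summary}\nContinue the summary with this new text: {chunk_text}"
        summary = self._generate_summary(prompt)
    return summary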
3.2 Knowledge Base Construction
The knowledge base pipeline consists of loading files, reading the text, splitting it into chunks, vectorizing the chunks, vectorizing the query, retrieving the top-k most similar document vectors, and feeding the retrieved texts together with the question into the LLM prompt. LangChain is used extensively for this workflow.
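Stripped of the framework, the final step is just stuffing the retrieved passages and the question into one prompt. A minimal sketch, assuming LangChain Document objects (the helper and template wording are illustrative):

def build_prompt(question, retrieved_docs):
    # Concatenate the top-k retrieved chunks into a context block.
    context = "\n\n".join(doc.page_content for doc in retrieved_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )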
LangChain supports several key areas:
LLM and prompts management
Chains for sequencing calls
Data‑augmented generation
Agents for decision‑making actions
Memory for stateful interactions
Evaluation utilities
A simple QA bot built with LangChain and OpenAI embeddings is shown below.
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain import OpenAI, VectorDBQA
from langchain.document_loaders import DirectoryLoader
loader = DirectoryLoader('/content/sample_data/data/', glob='**/*.txt')
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)
split_docs = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(split_docs, embeddings)
qa = VectorDBQA.from_chain_type(llm=OpenAI(), chain_type="stuff", vectorstore=docsearch, return_source_documents=True)
result = qa({"query": "What was iFLYTEK's revenue in the first quarter of this year?"})
print(result)
The internal knowledge base uses ChatGLM as a cost-effective LLM alternative to ChatGPT. Although the two models' performance differs, the overall workflow remains the same.
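Swapping ChatGLM in for OpenAI only requires wrapping it as a custom LangChain LLM. A minimal sketch, assuming the open-source THUDM/chatglm-6b checkpoint and the same-era LangChain LLM base class (the internal deployment may differ):

from typing import List, Optional
from langchain.llms.base import LLM
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda().eval()

class ChatGLM(LLM):
    @property
    def _llm_type(self) -> str:
        return "chatglm"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # ChatGLM's chat() returns (response, updated_history); single-turn use here.
        response, _ = model.chat(tokenizer, prompt, history=[])
        return response

# Drop-in replacement for OpenAI() in the QA chain above:
# qa = VectorDBQA.from_chain_type(llm=ChatGLM(), chain_type="stuff", vectorstore=docsearch)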
Overall, large models are gradually being adopted internally to solve practical problems, though challenges such as hallucinations, performance constraints, and content moderation remain. Continued experimentation will improve efficiency and reduce manual effort.