How to Master Few-Shot Prompting with LangChain’s Example Selectors
This article explains why few-shot prompting benefits from dynamically selecting a small set of relevant examples, introduces LangChain's ExampleSelector component, and compares three selector strategies: LengthBased, SemanticSimilarity, and MaxMarginalRelevance. For each strategy it details the algorithm, advantages, and drawbacks, and provides a step-by-step Python code demonstration.
Problem with Large Example Libraries
Hard‑coding dozens or hundreds of input‑output pairs in a prompt quickly exceeds the model's context window and raises token costs, so a method is needed to pick a small, relevant subset of examples based on the current user input.
Solution: ExampleSelector
LangChain provides the ExampleSelector component, which selects a subset of candidates according to a chosen strategy and can be combined with FewShotPromptTemplate or FewShotChatMessagePromptTemplate to build dynamic few‑shot prompts.
Common Selector Types
LengthBasedExampleSelector : selects examples sequentially until the total token length approaches a configurable max_length. Pros : simple and guarantees the prompt stays within length limits. Cons : relevance to the current input is ignored.
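The greedy, length-capped selection can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, not LangChain's actual implementation; the word-count length function mirrors the library's default behavior, and `format_example` is a hypothetical callback standing in for the example_prompt:

```python
def select_by_length(examples, format_example, max_length=25):
    """Greedily keep examples in order until the length budget is spent.

    Simplified sketch of the length-based strategy: format_example renders
    one example to text, and length is measured in whitespace-split words.
    """
    def length(text):
        return len(text.split())

    selected, used = [], 0
    for ex in examples:
        cost = length(format_example(ex))
        if used + cost > max_length:
            break  # the next example would overflow the budget, so stop here
        selected.append(ex)
        used += cost
    return selected

pairs = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "a very long input string " * 4, "output": "a long output"},
]
fmt = lambda ex: f"Input: {ex['input']}\nOutput: {ex['output']}"
print(select_by_length(pairs, fmt, max_length=12))  # keeps only the two short pairs
```

Because selection is purely positional and budget-driven, a highly relevant example late in the list can be skipped in favor of earlier, less useful ones.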
SemanticSimilarityExampleSelector : embeds every example with an embedding model, stores the vectors in a vector store, then computes similarity between the new input and all examples and returns the top k most similar. Pros : chooses the most relevant examples, greatly improving few‑shot effectiveness. Cons : requires an embedding model and a vector database.
MaxMarginalRelevanceExampleSelector : variant of the semantic selector that also enforces diversity among the chosen examples, preventing them from being too homogeneous. Pros : balances relevance with diverse perspectives. Cons : shares the same setup complexity as the semantic selector.
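The relevance-versus-diversity trade-off behind MMR can be shown with a toy scorer over pre-computed vectors. This is a didactic sketch of the scoring rule only, not LangChain's implementation, and the vectors below are made up for illustration:

```python
def mmr_select(query_vec, example_vecs, k=2, lam=0.5):
    """Toy maximal-marginal-relevance selection over pre-computed vectors.

    Each step picks the candidate maximizing
        lam * sim(query, cand) - (1 - lam) * max_j sim(cand, chosen_j),
    so a candidate similar to an already-chosen example is penalized.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cos(a, b):
        na, nb = dot(a, a) ** 0.5, dot(b, b) ** 0.5
        return dot(a, b) / (na * nb) if na and nb else 0.0

    chosen = []
    candidates = list(range(len(example_vecs)))
    while candidates and len(chosen) < k:
        def score(i):
            relevance = cos(query_vec, example_vecs[i])
            redundancy = max((cos(example_vecs[i], example_vecs[j]) for j in chosen),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        chosen.append(best)
        candidates.remove(best)
    return chosen

# Indices 0 and 1 are near-duplicates; index 2 points elsewhere.
vecs = [[1.0, 0.0], [0.98, 0.2], [0.0, 1.0]]
print(mmr_select([0.8, 0.6], vecs, k=2))  # → [1, 2]
```

Plain top-k similarity would return the two near-duplicates (indices 1 and 0); MMR keeps the most relevant one and swaps the redundant twin for the diverse vector.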
Embedding Models
Any model listed at https://docs.cherry-ai.com/knowledge-base/emb-models-info can be used.
Code Walkthrough
Example 1: LengthBased Selector (example_1_length_based_selector.py)
Steps:
1. Prepare five {"input": ..., "output": ...} pairs and define an example_prompt with PromptTemplate to format each pair.
2. Instantiate LengthBasedExampleSelector with max_length=25; the selector stops adding examples once the next one would exceed the limit.
3. Create a FewShotPromptTemplate that receives the selector; calling format() automatically injects the chosen examples.
from langchain_core.example_selectors import LengthBasedExampleSelector
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

example_prompt = PromptTemplate.from_template("Input: {input}\nOutput: {output}")

examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "fast", "output": "slow"},
    {"input": "a very, very long input string that takes up a great deal of space",
     "output": "an equally very, very long output string"},
    {"input": "day", "output": "night"},
]

example_selector = LengthBasedExampleSelector(
    examples=examples,
    example_prompt=example_prompt,
    max_length=25,
)

few_shot_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of the input word, following the examples below.",
    suffix="Input: {user_input}\nOutput:",
    input_variables=["user_input"],
)

print(few_shot_prompt.format(user_input="strong"))

Running the script prints a prompt that contains only the first three short examples, confirming that the length limit works as intended.
Example 2: Semantic Similarity Selector (example_2_semantic_similarity_selector.py)
Steps:
1. Read OPENAI_API_KEY from a .env file and create a HuggingFaceEmbeddings instance (e.g., model_name="mixedbread-ai/mxbai-embed-large-v1").
2. Use resolve_faiss() to import the FAISS vector store; if the dependency is missing, the script prints an installation hint.
3. Build a SemanticSimilarityExampleSelector via from_examples, passing the examples, the embeddings, the FAISS class, and k=2 to retrieve the two most similar examples.
4. Pass this selector to FewShotPromptTemplate; when format() is called, the nearest examples are automatically injected into the prompt.
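The resolve_faiss() helper itself is not shown in the excerpt. A minimal sketch of what such a guard might look like, assuming (as the steps describe) that it wraps langchain_community.vectorstores.FAISS and prints an install hint on failure:

```python
def resolve_faiss():
    """Try to import the FAISS vector store class; print a hint if missing.

    Hypothetical reconstruction of the script's helper: returns the FAISS
    class on success, or None after printing an installation hint.
    """
    try:
        from langchain_community.vectorstores import FAISS
        return FAISS
    except ImportError:
        print("FAISS is unavailable. Install it with: "
              "pip install faiss-cpu langchain-community")
        return None
```

The returned class (bound to faiss_cls in the script) is what gets passed to SemanticSimilarityExampleSelector.from_examples below.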
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

example_prompt = PromptTemplate.from_template("Input: {input}\nOutput: {output}")
# examples: the same list of {"input": ..., "output": ...} pairs as before

embeddings = HuggingFaceEmbeddings(model_name="mixedbread-ai/mxbai-embed-large-v1")

faiss_cls = resolve_faiss()  # langchain_community.vectorstores.FAISS, if available

example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    embeddings,
    faiss_cls,
    k=2,
)

few_shot_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Following the examples below, write an imaginative scene description for the input.",
    suffix="Input: {user_input}\nOutput:",
    input_variables=["user_input"],
)

final_prompt = few_shot_prompt.format(user_input="a lonely wolf")
print(final_prompt)

If faiss-cpu, tiktoken, or the API key is missing, the script prints a clear message indicating which dependency must be installed or configured.
Dependencies
faiss-cpu: Facebook AI's high-performance similarity-search library, used via langchain_community.vectorstores.FAISS; runs on CPU, no GPU required.
tiktoken: OpenAI's fast tokenizer; LangChain relies on it for accurate token-length calculations when building or trimming prompts.
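To see why an accurate tokenizer matters for length budgeting, here is a small counting helper. It falls back to a word count when tiktoken is unavailable; the encoding name cl100k_base is simply a common OpenAI choice, and such a callable could also be supplied as LengthBasedExampleSelector's get_text_length function:

```python
def token_length(text: str) -> int:
    """Count tokens with tiktoken when available, else approximate by words."""
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")  # a common OpenAI encoding
        return len(enc.encode(text))
    except Exception:  # tiktoken missing, or encoding files unavailable
        return len(text.split())

print(token_length("Input: happy\nOutput: sad"))
```

Word counts and token counts diverge most on long or non-English strings, which is exactly where an over-estimate can silently drop examples from the prompt.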
References
How to: use example selectors – https://python.langchain.com/docs/how_to/example_selectors
How to: select examples by length – https://python.langchain.com/docs/how_to/length_based_example_selector
How to: select examples by semantic similarity – https://python.langchain.com/docs/how_to/semantic_similarity_example_selector
How to: select examples by semantic n‑gram overlap – https://python.langchain.com/docs/how_to/semantic_ngram_overlap_example_selector
How to: select examples by maximal marginal relevance – https://python.langchain.com/docs/how_to/maximal_marginal_relevance_example_selector
How to: select examples from LangSmith few‑shot datasets – https://python.langchain.com/docs/how_to/langsmith_few_shot_datasets
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
BirdNest Tech Talk
Author of the rpcx microservice framework, original book author, and chair of Baidu's Go CMC committee.
