How Amazon Nova’s Multimodal Embedding Model Handles All Modalities in One Go

Amazon Nova's multimodal embedding model, now available on Amazon Bedrock, maps text, documents, images, video, and audio into a single semantic space. It supports up to 8000 tokens of text context and multiple output dimensions. This article walks through Python examples for embedding generation, storage, and cross-modal search.

Amazon Cloud Developers

Amazon Nova is a multimodal foundation model launched on Amazon Bedrock that provides a single embedding service for text, documents, images, video, and audio. By mapping all modalities into a unified semantic space, it enables cross‑modal retrieval, semantic search, and Retrieval‑Augmented Generation (RAG) scenarios.
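
As a rough illustration of why a shared semantic space enables cross-modal retrieval, the sketch below ranks items of different modalities against one query using cosine similarity. The three-dimensional vectors and the keys are made-up stand-ins for real embeddings, not model output, and cosine similarity is assumed here as the comparison metric.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors standing in for embeddings of different modalities.
catalog = {
    "text:product description": [0.9, 0.1, 0.0],
    "image:product photo": [0.8, 0.2, 0.1],
    "audio:unrelated podcast": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

# Rank every stored item against the query, regardless of modality.
ranked = sorted(
    catalog,
    key=lambda key: cosine_similarity(query, catalog[key]),
    reverse=True,
)
for key in ranked:
    print(f"{key}: {cosine_similarity(query, catalog[key]):.3f}")
```

Because every item lives in the same vector space, the ranking step does not need to know which modality each vector came from.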

The model supports up to 8000 tokens of text context and can process 200 languages. It offers four output dimensions (3072, 1024, 384, 256) via Matryoshka Representation Learning, allowing users to balance representation detail against storage and compute costs.
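
The idea behind Matryoshka-style embeddings can be sketched as follows: the leading components of a vector are trained to remain useful on their own, so a shorter embedding is a prefix of the longer one, rescaled to unit length. This is a generic illustration of that property, not a description of how the service computes its outputs; in practice you would simply request the size you want through the `embeddingDimension` parameter. The helper name `truncate_and_normalize` is hypothetical.

```python
import math

def truncate_and_normalize(embedding, target_dim):
    """Keep the first target_dim components and rescale to unit length."""
    if target_dim > len(embedding):
        raise ValueError("target_dim exceeds the embedding length")
    prefix = embedding[:target_dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0:
        return prefix  # an all-zero prefix cannot be normalized
    return [x / norm for x in prefix]

# Toy 4-dimensional vector standing in for a full-size embedding.
full = [0.5, 0.5, 0.5, 0.5]
short = truncate_and_normalize(full, 2)
print(len(short), short)
```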

According to the benchmark table in the original article (not reproduced here), Amazon Nova's out-of-the-box accuracy leads comparable models. The model also provides chunking capabilities that split long texts, videos, or audio into manageable segments for embedding.

Basic text embedding example (Python, Boto3):

import json
import boto3
MODEL_ID = "amazon.nova-2-multimodal-embeddings-v1:0"
EMBEDDING_DIMENSION = 3072
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
text = "Amazon Nova is a multimodal foundation model"
request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "text": {"truncationMode": "END", "value": text}
    }
}
response = bedrock_runtime.invoke_model(
    body=json.dumps(request_body),
    modelId=MODEL_ID,
    contentType="application/json"
)
embedding = json.loads(response["body"].read())["embeddings"][0]["embedding"]
print(f"Generated embedding with {len(embedding)} dimensions")
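
Since the same request shape recurs in the storage and search steps, it can help to wrap it in a small function. The sketch below assumes only the fields shown in the example above; the helper name `build_text_request` and the dimension check are illustrative additions, not part of any SDK.

```python
import json

# Output sizes listed in the article.
SUPPORTED_DIMENSIONS = (3072, 1024, 384, 256)

def build_text_request(text, dimension=3072, purpose="GENERIC_INDEX"):
    """Build the JSON-serializable request body for a single text embedding."""
    if dimension not in SUPPORTED_DIMENSIONS:
        raise ValueError(f"dimension must be one of {SUPPORTED_DIMENSIONS}")
    return {
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": purpose,
            "embeddingDimension": dimension,
            "text": {"truncationMode": "END", "value": text},
        },
    }

# The serialized body is what would be passed to invoke_model(body=...).
body = json.dumps(build_text_request("Amazon Nova is a multimodal foundation model"))
print(body)
```

Indexing calls would keep the default purpose, while query-time calls would pass `purpose="GENERIC_RETRIEVAL"`.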

Image embedding example (the image is read, base64-encoded, and sent to the same endpoint with an image payload):

import base64

# Read the image and base64-encode it for the JSON payload
with open("photo.jpg", "rb") as f:
    image_bytes = base64.b64encode(f.read()).decode("utf-8")
request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "image": {"format": "jpeg", "source": {"bytes": image_bytes}}
    }
}
response = bedrock_runtime.invoke_model(
    body=json.dumps(request_body),
    modelId=MODEL_ID,
    contentType="application/json"
)
# The response has the same shape as in the text example
embedding = json.loads(response["body"].read())["embeddings"][0]["embedding"]

Video and audio embedding typically uses the asynchronous API, because files larger than 25 MB must be processed asynchronously. The workflow uploads the video to an S3 bucket, starts an async job with the task type SEGMENTED_EMBEDDING, and polls for completion.

S3_VIDEO_URI = "s3://my-video-bucket/videos/presentation.mp4"
S3_EMBEDDING_DESTINATION_URI = "s3://my-video-bucket/embeddings-output/"
model_input = {
    "taskType": "SEGMENTED_EMBEDDING",
    "segmentedEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "video": {
            "format": "mp4",
            "embeddingMode": "AUDIO_VIDEO_COMBINED",
            "source": {"s3Location": {"uri": S3_VIDEO_URI}},
            "segmentationConfig": {"durationSeconds": 15}
        }
    }
}
response = bedrock_runtime.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": S3_EMBEDDING_DESTINATION_URI}}
)
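import time

invocation_arn = response["invocationArn"]
# Poll until the job leaves the "InProgress" state (it ends as Completed or Failed)
while True:
    job = bedrock_runtime.get_async_invoke(invocationArn=invocation_arn)
    if job["status"] != "InProgress":
        break
    time.sleep(15)
print(f"Async job finished with status: {job['status']}")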
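
To estimate how many embeddings a segmented job will produce, the segmentation can be sketched as plain arithmetic. This assumes contiguous fixed-length segments with a shorter final segment; the boundaries the service actually reports may differ, and the helper name `segment_ranges` is hypothetical.

```python
import math

def segment_ranges(total_seconds, segment_seconds=15):
    """Return (start, end) pairs covering a clip in fixed-length segments."""
    if total_seconds <= 0 or segment_seconds <= 0:
        raise ValueError("durations must be positive")
    count = math.ceil(total_seconds / segment_seconds)
    return [
        (i * segment_seconds, min((i + 1) * segment_seconds, total_seconds))
        for i in range(count)
    ]

# A 40-second clip with 15-second segments yields three ranges.
for start, end in segment_ranges(40, 15):
    print(f"{start:>3}s to {end:>3}s")
```

Under these assumptions, each range corresponds to one stored vector, so the segment length directly affects index size and retrieval granularity.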

After embeddings are generated, they can be stored in a vector database. The article demonstrates using Amazon S3 Vectors to create a vector bucket and index, then bulk‑load embeddings for three sample texts.

VECTOR_BUCKET = "my-vector-store"
INDEX_NAME = "embeddings"
# create bucket and index if needed
s3vectors = boto3.client("s3vectors", region_name="us-east-1")
# ... (bucket/index creation omitted for brevity)
texts = ["Machine learning on AWS", "Amazon Bedrock provides foundation models", "S3 Vectors enables semantic search"]
vectors = []
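for text in texts:
    # Same request as the text example above
    request_body = {
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": "GENERIC_INDEX",
            "embeddingDimension": EMBEDDING_DIMENSION,
            "text": {"truncationMode": "END", "value": text}
        }
    }
    response = bedrock_runtime.invoke_model(body=json.dumps(request_body), modelId=MODEL_ID, contentType="application/json")
    embedding = json.loads(response["body"].read())["embeddings"][0]["embedding"]
    vectors.append({"key": f"text:{text[:50]}", "data": {"float32": embedding}, "metadata": {"type": "text", "content": text}})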
s3vectors.put_vectors(vectorBucketName=VECTOR_BUCKET, indexName=INDEX_NAME, vectors=vectors)
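
Three vectors fit comfortably in one call, but larger loads generally need to be split into batches. The sketch below shows one way to do that. The batch size of 500 is an assumption to check against the current S3 Vectors quotas, and `batched` and `put_vectors_in_batches` are hypothetical helper names; the `client` argument is expected to be the `s3vectors` client from the example above.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def put_vectors_in_batches(client, bucket, index, vectors, batch_size=500):
    """Write vectors in several put_vectors calls; returns the number of calls."""
    calls = 0
    for batch in batched(vectors, batch_size):
        client.put_vectors(vectorBucketName=bucket, indexName=index, vectors=batch)
        calls += 1
    return calls

# Slicing behaviour on a small list
print(list(batched([1, 2, 3, 4, 5], 2)))
```

A call such as `put_vectors_in_batches(s3vectors, VECTOR_BUCKET, INDEX_NAME, vectors)` would then replace the single `put_vectors` call.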

Cross‑modal search is illustrated by generating an embedding for a query string, then using query_vectors to retrieve the top‑5 most similar vectors across all stored modalities, with distance scores and optional metadata displayed.

query_text = "foundation models"
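# Generate the query embedding: same request as the text example, but with purpose GENERIC_RETRIEVAL
query_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_RETRIEVAL",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "text": {"truncationMode": "END", "value": query_text}
    }
}
query_response = bedrock_runtime.invoke_model(body=json.dumps(query_body), modelId=MODEL_ID, contentType="application/json")
query_embedding = json.loads(query_response["body"].read())["embeddings"][0]["embedding"]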
response = s3vectors.query_vectors(
    vectorBucketName=VECTOR_BUCKET,
    indexName=INDEX_NAME,
    queryVector={"float32": query_embedding},
    topK=5,
    returnDistance=True,
    returnMetadata=True
)
for i, result in enumerate(response["vectors"], 1):
    print(f"{i}. {result['key']} - Distance: {result['distance']:.4f}")
    if result.get("metadata"):
        print(f"   Metadata: {result['metadata']}")
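
When an index mixes modalities, it can be useful to see which kinds of content a query matched. The sketch below groups results by the `type` metadata field, assuming each vector was stored with such a field as in the storage example. The sample results are made up for illustration, and `group_by_type` is a hypothetical helper.

```python
from collections import defaultdict

def group_by_type(results):
    """Group query results by their 'type' metadata, keeping key and distance."""
    grouped = defaultdict(list)
    for result in results:
        modality = (result.get("metadata") or {}).get("type", "unknown")
        grouped[modality].append((result["key"], result.get("distance")))
    return dict(grouped)

# Made-up results in the same shape as response["vectors"]
sample_results = [
    {"key": "text:Amazon Bedrock provides", "distance": 0.12, "metadata": {"type": "text"}},
    {"key": "image:diagram.png", "distance": 0.27, "metadata": {"type": "image"}},
    {"key": "text:Machine learning on AWS", "distance": 0.31, "metadata": {"type": "text"}},
    {"key": "video:clip-3", "distance": 0.40},
]
for modality, hits in group_by_type(sample_results).items():
    print(modality, hits)
```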

Practical considerations include choosing the output dimension (larger dimensions capture richer semantics but increase storage and compute costs), handling long inputs (up to 8K tokens of text context, and segments of up to 30 seconds for video and audio), and relying on the responsible AI features built into Bedrock, such as content safety filtering and fairness mitigations. The model is accessible through both synchronous and asynchronous APIs, which makes it suitable for real-time search interfaces as well as batch processing of large media files.

Amazon Nova is currently available in the US East (N. Virginia) region on Amazon Bedrock. Pricing details are on the Bedrock pricing page, and further documentation is provided in the Amazon Nova user guide and the GitHub sample repository.
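
The storage side of the dimension trade-off can be estimated with simple arithmetic. The sketch below assumes vectors are stored as raw float32 values (4 bytes per component) and ignores index overhead, keys, and metadata, so real footprints will be larger. The corpus size of one million vectors is an arbitrary example.

```python
BYTES_PER_FLOAT32 = 4

def raw_vector_bytes(num_vectors, dimension):
    """Raw float32 payload size, excluding index structures and metadata."""
    return num_vectors * dimension * BYTES_PER_FLOAT32

# Compare the four output dimensions for a hypothetical corpus of one million vectors.
for dimension in (3072, 1024, 384, 256):
    mib = raw_vector_bytes(1_000_000, dimension) / (1024 ** 2)
    print(f"{dimension:>4} dims: {mib:,.0f} MiB")
```

Under these assumptions, moving from 3072 to 256 dimensions shrinks the raw payload by a factor of 12, which is the kind of saving to weigh against any loss in retrieval quality.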
