# Bedrock Embedding

## Supported Embedding Models
| Provider | LiteLLM Route | AWS Documentation | 
|---|---|---|
| Amazon Titan | `bedrock/amazon.*` | Amazon Titan Embeddings |
| Cohere | `bedrock/cohere.*` | Cohere Embeddings |
| TwelveLabs | `bedrock/us.twelvelabs.*` | TwelveLabs |
## Async Invoke Support
LiteLLM supports AWS Bedrock's async-invoke feature for embedding models that require asynchronous processing. This is particularly useful for large media files (video, audio) or when you need to process embeddings in the background.
### Supported Models
| Provider | Async Invoke Route | Use Case | 
|---|---|---|
| TwelveLabs Marengo | `bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0` | Video, audio, image, and text embeddings |
### Required Parameters
When using async-invoke, you must provide:
| Parameter | Description | Required |
|---|---|---|
| `output_s3_uri` | S3 URI where the embedding results will be stored | ✅ Yes |
| `input_type` | Type of input: `"text"`, `"image"`, `"video"`, or `"audio"` | ✅ Yes |
| `aws_region_name` | AWS region for the request | ✅ Yes |
### Usage

#### Basic Async Invoke

```python
from litellm import embedding
# Text embedding with async-invoke
response = embedding(
    model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=["Hello world from LiteLLM async invoke!"],
    aws_region_name="us-east-1",
    input_type="text",
    output_s3_uri="s3://your-bucket/async-invoke-output/"
)
print(f"Job submitted! Invocation ARN: {response._hidden_params._invocation_arn}")
Video/Audio Embeddingโ
# Video embedding (requires async-invoke)
response = embedding(
    model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=["s3://your-bucket/video.mp4"],  # S3 URL for video
    aws_region_name="us-east-1",
    input_type="video",
    output_s3_uri="s3://your-bucket/async-invoke-output/"
)
print(f"Video embedding job submitted! ARN: {response._hidden_params._invocation_arn}")
Image Embedding with Base64โ
import base64
# Load and encode image
with open("image.jpg", "rb") as img_file:
    img_data = base64.b64encode(img_file.read()).decode('utf-8')
    img_base64 = f"data:image/jpeg;base64,{img_data}"
response = embedding(
    model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=[img_base64],
    aws_region_name="us-east-1",
    input_type="image",
    output_s3_uri="s3://your-bucket/async-invoke-output/"
)
```

### Retrieving Job Information

#### Getting Job ID and Invocation ARN

The async-invoke response includes the invocation ARN in the hidden parameters:

```python
from litellm import embedding

response = embedding(
    model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=["Hello world"],
    aws_region_name="us-east-1",
    input_type="text",
    output_s3_uri="s3://your-bucket/async-invoke-output/"
)
# Access the invocation ARN from the hidden params
invocation_arn = response._hidden_params["_invocation_arn"]
print(f"Invocation ARN: {invocation_arn}")

# Extract the job ID from the ARN (the segment after the last slash)
job_id = invocation_arn.split("/")[-1]
print(f"Job ID: {job_id}")
```
#### Checking Job Status

Use LiteLLM's `retrieve_batch` function to check whether your job is still processing:

```python
from litellm import retrieve_batch

def check_async_job_status(invocation_arn, aws_region_name="us-east-1"):
    """Check the status of an async invoke job using LiteLLM batch API"""
    try:
        response = retrieve_batch(
            batch_id=invocation_arn,
            custom_llm_provider="bedrock",
            aws_region_name=aws_region_name
        )
        return response
    except Exception as e:
        print(f"Error checking job status: {e}")
        return None
# Check status
status = check_async_job_status(invocation_arn, "us-east-1")
if status:
    print(f"Job Status: {status.status}")
    print(f"Output Location: {status.output_file_id}")
```

Note: The actual embedding results are stored in S3. The `output_file_id` from the batch status can be used to locate the results file in your S3 bucket.
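As a rough sketch of that retrieval step, the snippet below downloads and parses a results file with `boto3`. The bucket/key parsing and the example key layout in the docstring are assumptions about where your job writes its output, not guaranteed behavior, so adjust them to what actually lands in your bucket.

```python
import json

import boto3  # assumes AWS credentials are already configured in your environment

def fetch_async_results(output_file_id: str) -> dict:
    """Download and parse an async-invoke results file from S3.

    `output_file_id` is expected to be an S3 URI such as
    s3://your-bucket/async-invoke-output/<job-id>/output.json
    (hypothetical layout -- check your bucket for the real key).
    """
    bucket, _, key = output_file_id.removeprefix("s3://").partition("/")
    obj = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    return json.loads(obj["Body"].read())

# Example: results = fetch_async_results(status.output_file_id)
```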
### Error Handling

#### Common Errors
| Error | Cause | Solution | 
|---|---|---|
| `ValueError: output_s3_uri cannot be empty` | Missing S3 output URI | Provide a valid S3 URI |
| `ValueError: Input type 'video' requires async_invoke route` | Using video/audio without async-invoke | Use the `bedrock/async_invoke/` model prefix |
| `ValueError: input_type is required` | Missing input type parameter | Specify the `input_type` parameter |
#### Example Error Handling

```python
from litellm import embedding

try:
    response = embedding(
        model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
        input=["Hello world"],
        aws_region_name="us-east-1",
        input_type="text",
        output_s3_uri="s3://your-bucket/output/"  # Required for async-invoke
    )
    print("Job submitted successfully!")
    
except ValueError as e:
    if "output_s3_uri cannot be empty" in str(e):
        print("Error: Please provide a valid S3 output URI")
    elif "requires async_invoke route" in str(e):
        print("Error: Use async_invoke model for video/audio inputs")
    else:
        print(f"Error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```

### Best Practices
- **Use async-invoke for large files**: Video and audio files are better processed asynchronously.
- **Use the LiteLLM batch API**: Call `retrieve_batch()` instead of the Bedrock API directly for status checking.
- **Monitor job status**: Check job status periodically via the batch API so you know when results are ready.
- **Handle errors gracefully**: Implement proper error handling for network issues and job failures.
- **Set appropriate timeouts**: Account for the processing time of large files.
- **Use S3 for large inputs**: For video/audio, pass S3 URLs instead of base64-encoded data.
### Limitations

- Async-invoke is currently only supported for TwelveLabs Marengo models
- Results are stored in S3 and must be retrieved separately using the output file ID
- Checking job status requires LiteLLM's `retrieve_batch()` function
- LiteLLM has no built-in polling mechanism, so you must implement your own status-checking loop (see the sketch below)
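A minimal polling loop might look like the following. The `poll_interval` and `timeout` values are arbitrary, and the `"completed"`/`"failed"` status strings are assumptions based on OpenAI-style batch statuses, so verify them against what `retrieve_batch` actually reports for your jobs.

```python
import time

from litellm import retrieve_batch

def wait_for_async_job(invocation_arn: str, aws_region_name: str = "us-east-1",
                       poll_interval: float = 30.0, timeout: float = 1800.0):
    """Poll retrieve_batch until the async-invoke job finishes or we time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = retrieve_batch(
            batch_id=invocation_arn,
            custom_llm_provider="bedrock",
            aws_region_name=aws_region_name,
        )
        # "completed"/"failed" are assumed terminal statuses -- confirm
        # against the values your jobs actually report.
        if status.status in ("completed", "failed"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"Job {invocation_arn} did not finish within {timeout}s")

# Example: status = wait_for_async_job(invocation_arn)
```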
## API keys

These can be set as environment variables or passed as params to `litellm.embedding()` (see the example after this block):

```python
import os
os.environ["AWS_ACCESS_KEY_ID"] = ""        # Access key
os.environ["AWS_SECRET_ACCESS_KEY"] = ""    # Secret access key
os.environ["AWS_REGION_NAME"] = ""           # us-east-1, us-east-2, us-west-1, us-west-2
## Usage

### LiteLLM Python SDK

```python
from litellm import embedding

response = embedding(
    model="bedrock/amazon.titan-embed-text-v1",
    input=["good morning from litellm"],
)
print(response)
```

### LiteLLM Proxy Server

#### 1. Setup config.yaml

```yaml
model_list:
  - model_name: titan-embed-v1
    litellm_params:
      model: bedrock/amazon.titan-embed-text-v1
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1
  - model_name: titan-embed-v2
    litellm_params:
      model: bedrock/amazon.titan-embed-text-v2:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1
```

#### 2. Start Proxy

```shell
litellm --config /path/to/config.yaml
```
#### 3. Use with OpenAI Python SDK

```python
import openai
client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)
response = client.embeddings.create(
    input=["good morning from litellm"],
    model="titan-embed-v1"
)
print(response)
```

#### 4. Use with LiteLLM Python SDK

```python
import litellm
response = litellm.embedding(
    model="titan-embed-v1", # model alias from config.yaml
    input=["good morning from litellm"],
    api_base="http://0.0.0.0:4000",
    api_key="anything"
)
print(response)
```

## Supported AWS Bedrock Embedding Models
| Model Name | Usage | Supported Additional OpenAI params |
|---|---|---|
| Titan Embeddings V2 | `embedding(model="bedrock/amazon.titan-embed-text-v2:0", input=input)` | here |
| Titan Embeddings - V1 | `embedding(model="bedrock/amazon.titan-embed-text-v1", input=input)` | here |
| Titan Multimodal Embeddings | `embedding(model="bedrock/amazon.titan-embed-image-v1", input=input)` | here |
| TwelveLabs Marengo Embed 2.7 | `embedding(model="bedrock/us.twelvelabs.marengo-embed-2-7-v1:0", input=input)` | Supports multimodal input (text, video, audio, image) |
| Cohere Embeddings - English | `embedding(model="bedrock/cohere.embed-english-v3", input=input)` | here |
| Cohere Embeddings - Multilingual | `embedding(model="bedrock/cohere.embed-multilingual-v3", input=input)` | here |
| Cohere Embed v4 | `embedding(model="bedrock/cohere.embed-v4:0", input=input)` | Supports text and image input, configurable dimensions (256, 512, 1024, 1536), 128k context length |
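Where a model supports extra OpenAI params, they can be passed straight through `embedding()`. A minimal sketch with Titan V2 and the OpenAI-style `dimensions` parameter (assuming 512 is accepted; Titan V2 documents 256, 512, and 1024 output sizes):

```python
from litellm import embedding

# Request a smaller embedding size via the OpenAI-style `dimensions` param
response = embedding(
    model="bedrock/amazon.titan-embed-text-v2:0",
    input=["good morning from litellm"],
    dimensions=512,
)
print(len(response.data[0]["embedding"]))  # expected: 512
```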