
LanceDB will automatically vectorize your data at both ingestion and query time. All you need to do is specify which model to use. Popular embedding models and providers such as OpenAI, Hugging Face, Sentence Transformers, and CLIP are supported.

Step 1: Import Required Libraries

First, import the necessary LanceDB components:
import lancedb
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry
  • lancedb: The main module for database connections and operations
  • LanceModel: Pydantic base model for defining table schemas
  • Vector: Field type for storing vector embeddings
  • get_registry(): Access to the embedding function registry, which holds all supported embedding functions as well as any custom functions registered by the user
  • TypeScript uses lancedb.embedding.getRegistry() and lancedb.embedding.LanceSchema() for the same registry/schema workflow

Step 2: Connect to LanceDB

Establish a connection to your LanceDB OSS directory or Enterprise cluster:
# Enter your LanceDB connection URI for OSS or Enterprise here
db = lancedb.connect(...)
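As a minimal sketch of what that connection call can look like (the local path, remote URI, API key, and region below are placeholders for illustration, not values from this guide):

import lancedb

# OSS: connect to (or create) a local database directory -- path is illustrative
db = lancedb.connect("./my-lancedb")

# Cloud/Enterprise: connect to a remote database -- URI, key, and region are placeholders
db = lancedb.connect(
    "db://my-database",
    api_key="sk-...",
    region="us-east-1",
)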

Step 3: Initialize the Embedding Function

Choose and configure your embedding model:
model = get_registry().get("sentence-transformers").create(name="BAAI/bge-small-en-v1.5")
This creates an embedding function from the local embedding registry. The Python snippet uses the sentence-transformers provider with the BGE model; the TypeScript snippet uses the Transformers-backed huggingface provider. You can:
  • Change "sentence-transformers" to other providers like "openai", "cohere", etc. (see the sketch after this list)
  • Modify the model name for different embedding models
  • Set device="cuda" for GPU acceleration if available
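A hedged illustration of those options follows; the OpenAI model name and the device argument are assumptions for this sketch, so check them against your installed providers:

# Alternative provider: OpenAI embeddings (requires OPENAI_API_KEY in the environment)
openai_model = get_registry().get("openai").create(name="text-embedding-3-small")

# Same sentence-transformers provider as above, asked to run on a GPU
gpu_model = (
    get_registry()
    .get("sentence-transformers")
    .create(name="BAAI/bge-small-en-v1.5", device="cuda")
)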

Step 4: Define Your Schema

Create a Pydantic model that defines your table structure:
class Words(LanceModel):
    text: str = model.SourceField()  
    vector: Vector(model.ndims()) = model.VectorField()
  • SourceField(): Marks the field whose contents will be embedded
  • VectorField(): Marks the field that stores the generated embeddings
  • model.ndims(): Sets the vector dimensionality to match the chosen model
  • In TypeScript, use model.sourceField(...) and model.vectorField() inside LanceSchema(...)
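The schema is not limited to these two fields. As a sketch (the extra source column is purely illustrative), ordinary metadata columns can sit alongside the embedded ones:

class Documents(LanceModel):
    text: str = model.SourceField()                       # embedded automatically
    vector: Vector(model.ndims()) = model.VectorField()   # filled with the embeddings
    source: str                                           # plain metadata column, stored as-is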

Step 5: Create Table and Ingest Data

Create a table with your schema and add data:
table = db.create_table("words", schema=Words)
table.add([
    {"text": "hello world"},
    {"text": "goodbye world"}
])
The table.add() call automatically:
  • Takes the text from each document
  • Generates embeddings using your chosen model
  • Stores both the original text and the vector embeddings
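To confirm that embeddings were generated, you can read the table back. A minimal sketch using the standard table accessors (the printed values assume the two rows added above):

print(table.count_rows())         # 2
df = table.to_pandas()
print(df.columns.tolist())        # ['text', 'vector']
print(len(df.loc[0, "vector"]))   # matches model.ndims()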

Step 6: Query with Automatic Embedding

Note: On LanceDB Enterprise, the server does not generate embeddings from query text. In the Python remote client, table.search("greetings") can still work when the table schema includes embedding metadata, because the client computes the query embedding before sending the vector search. If the table has no embedding metadata, search("greetings") in auto mode is treated as a full-text search (FTS) instead.

Search your data using natural language queries:
query = "greetings"
result = table.search(query).limit(1).to_pydantic(Words)[0]
print(result.text)
The search process:
  1. Automatically converts your query text to embeddings
  2. Finds the most similar vectors in your table
  3. Returns the matching documents
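If you want to compute the query vector on the client explicitly (for example, to make the Enterprise behavior described in the note above unambiguous), a hedged sketch using the embedding function directly; compute_query_embeddings is part of the embedding-function interface in recent LanceDB releases, so verify it against your installed version:

# Embed the query text on the client, then run a plain vector search
query_vector = model.compute_query_embeddings("greetings")[0]
hit = table.search(query_vector).limit(1).to_pydantic(Words)[0]
print(hit.text)  # expected: "hello world"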