Building Production-Grade Vector Search

Vector stores play a pivotal role in the evolution of machine learning, serving as repositories for numerical encodings of data. Vectors are mathematical entities that represent data points in a multi-dimensional space. In the context of machine learning, vector stores provide a means to store, retrieve, filter, and manipulate these vectors.

Because vector models excel at finding similarities between data points, they are indispensable for large language models and form the foundation for many prediction, similarity search, and context retrieval use cases.

By enabling quick and structured access to data representations, vector stores let machine learning models process and learn from complex data efficiently, which supports the development of advanced AI applications.

Qwak Vector Store

Qwak is excited to announce the release of our managed vector store service. The Qwak Vector Store provides a scalable solution for the transformation and ingestion of vector data, as well as a low-latency query engine that can provide advanced filtering on metadata and properties.

The Qwak Vector Store fits seamlessly within the Qwak platform, allowing you to easily connect models to the vector store for training and prediction and manage all of your machine learning infrastructure in one place.

Building production-grade vector search

In this tutorial, we will walk through an implementation of the Qwak Vector Store: building a Qwak model that transforms Wikipedia article texts into embeddings, storing those embeddings in the Qwak Vector Store with relevant article metadata, and finally querying the vector store to retrieve objects similar to our search criteria.

Figure: Building production-grade vector search flow

Create a Model that transforms words into vectors

First, we’ll need to create a model that transforms and embeds our input text so that it can be persisted in the Qwak Vector Store. Luckily, we can do this all on the Qwak platform.

For this model, we’ll be using the sentence transformer all-MiniLM-L12-v2 model from Hugging Face. We first import the QwakModel class from the qwak-sdk and define our base model.

We’ll need to define two functions, the build() function and the predict() function. Because the model is pre-trained, build() has nothing to train and only logs a placeholder metric; we set self.model to an instance of SentenceTransformer in initialize_model().

In the predict function, we’ll define how our model will handle input data.

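import qwak
from pandas import DataFrame
from qwak.model.base import QwakModel
from qwak.model.schema import ModelSchema, ExplicitFeature
from sentence_transformers import SentenceTransformer

from helpers import get_device


class SentenceEmbeddingsModel(QwakModel):
    def __init__(self):
        self.model_id = "sentence-transformers/all-MiniLM-L12-v2"
        self.model = None
        self.device = None

    def build(self):
        # The model is pre-trained, so there is nothing to train here
        qwak.log_metric({"val_accuracy": 1})

    def schema(self):
        # A single free-text input column named "input"
        return ModelSchema(
            inputs=[
                ExplicitFeature(name="input", type=str),
            ]
        )

    def initialize_model(self):
        # Load the pre-trained model on the available device
        self.device = get_device()
        print(f"Inference using device: {self.device}")
        self.model = SentenceTransformer(
            model_name_or_path=self.model_id,
            device=self.device,
        )

    @qwak.api()
    def predict(self, df):
        # Convert the input column to a list for the encoder
        text = df["input"].to_list()
        # batch_size and normalize_embeddings are assumed settings; tune as needed
        text_embeds = self.model.encode(
            text,
            batch_size=128,
            normalize_embeddings=True,
            device=self.device,
        ).tolist()

        return DataFrame({"embeddings": text_embeds})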

We transform our input text into a list, so that it can be handled by the SentenceTransformer encoding logic, define the batch size, and add a few configuration settings.

We return a DataFrame with a single embeddings column, where each row holds the vector for the corresponding input text.

We’ll need to build the model and deploy it as a real-time endpoint so we can call our embedding function when querying the vector store.

You can find an example of this model at our Qwak examples repository.

  1. Clone the repository locally
  2. Make sure you have the Qwak CLI installed and configured.
  3. Go to the sentence_transformers_poetry directory.
  4. Run make build to kick off the training job for this model. You can navigate to the Models -> Builds tab in the Qwak UI and monitor the progress of the build.
  5. Now that the model has been successfully trained and stored in the Qwak Model Repository, you can run make deploy to take this build version and deploy it as a real-time endpoint. You can also monitor the Deployment steps by going to the Models -> Deployments tab in the Qwak UI.
  6. When the Deployment completes, click on the Test Model tab in the upper right-hand corner of the platform, and Qwak will generate example inference calls that you can use to call your real-time endpoint and test your predictions live!

Read more about the Qwak Build and Deploy process in our documentation.

Create Collection

Collections are a Qwak organizational feature that allows you to structure and manage your various vector groupings across your vector store. Collections let you specify the distance metric (cosine, L2) as well as the number of vector dimensions, providing fine-grained control over the grouping and indexing of your data.
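To make the two metric options concrete, here is a minimal pure-Python sketch of cosine distance and L2 (Euclidean) distance. The function names and sample vectors are illustrative and not part of the qwak-sdk. Treating cosine distance as 1 minus cosine similarity is a common convention that we assume here; the exact formula Qwak uses may differ.

```python
import math


def cosine_distance(a, b):
    """Cosine distance, taken here as 1 - cosine similarity (an assumed convention)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean (L2) distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Illustrative 3-dimensional vectors; real embeddings have many more dimensions.
query = [1.0, 0.0, 0.0]
candidates = {
    "same_direction_but_longer": [10.0, 0.0, 0.0],
    "nearby_but_rotated": [1.0, 1.0, 0.0],
}

# Cosine ignores vector length, so the longer vector pointing the same way wins.
closest_by_cosine = min(candidates, key=lambda k: cosine_distance(query, candidates[k]))
# L2 is sensitive to length, so the nearby but rotated vector wins instead.
closest_by_l2 = min(candidates, key=lambda k: l2_distance(query, candidates[k]))

print(closest_by_cosine, closest_by_l2)
```

The two metrics disagree on this toy data because the vectors are not normalized. When all vectors have unit length, cosine and L2 produce the same ranking, so the choice matters most for embeddings that are stored without normalization.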

You can create collections in the UI, or define the collections as code using the qwak-sdk. For this tutorial, you can see an example collection we’ve created below. We select the cosine metric and 384 dimensions for the vector space. We also need to select a vectorizer.

A vectorizer is a deployed Qwak model that accepts data as an input and returns a vector in its predict function – just like the model we created in the previous step!

Here we select the sentence-transformer model that is running in the Qwak platform, and the Qwak Vector Store will automatically use this model’s embedding function when preparing input data for insertion or searching the vector store, allowing us to send free text to the Qwak collections API.
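The role of the vectorizer can be illustrated with a toy in-memory collection. This is only a sketch of the idea, not the Qwak implementation: `ToyCollection`, `toy_vectorizer`, and the fixed vocabulary are hypothetical stand-ins, and a bag-of-words count replaces the deployed sentence-transformer model. The method and argument names loosely follow the ones used later in this tutorial.

```python
import math

# Hypothetical fixed vocabulary standing in for a real embedding model.
VOCAB = ["ducks", "pond", "swim", "rockets", "moon", "fly"]


def toy_vectorizer(text):
    """Count vocabulary words in the text; a stand-in for a deployed embedding model."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


class ToyCollection:
    """In-memory illustration of a collection with an attached vectorizer."""

    def __init__(self, vectorizer):
        self.vectorizer = vectorizer
        self.items = {}  # id -> (vector, properties)

    def upsert(self, ids, natural_inputs, properties):
        # Free text is embedded by the vectorizer before it is stored.
        for item_id, text, props in zip(ids, natural_inputs, properties):
            self.items[item_id] = (self.vectorizer(text), props)

    def search(self, natural_input, top_results=3):
        # The query text goes through the same vectorizer as the stored data.
        query_vector = self.vectorizer(natural_input)
        scored = [
            (cosine_distance(query_vector, vector), props)
            for vector, props in self.items.values()
        ]
        scored.sort(key=lambda pair: pair[0])
        return scored[:top_results]


collection = ToyCollection(toy_vectorizer)
collection.upsert(
    ids=[1, 2],
    natural_inputs=["ducks swim in the pond", "rockets fly to the moon"],
    properties=[{"title": "Duck"}, {"title": "Rocket"}],
)
results = collection.search("ducks in the pond", top_results=1)
print(results)
```

The point of the design is that callers only ever send text: the same embedding function is applied at insert time and at query time, so stored vectors and query vectors always live in the same space.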

Now that we have our model deployed and our collection in place, we are ready to start using our vector store.

Insert data into vector store

We’ll first need to prepare and format our data so that it can be properly inserted into the vector store.

Let’s install our project dependencies first.

pip install pandas pyarrow numpy qwak-sdk

We have a source parquet file that contains a series of Wikipedia articles and their contents. We also have fields for url, article title, and text length that we will use as properties for our vectors. We first read the parquet file into a DataFrame using pandas, filter out articles that do not contain text, and take a random 25% sample of the remaining rows.

import pandas as pd
import numpy as np
import os

df = pd.read_parquet("short_articles.parquet")
df = df[df["text"].str.len() > 0].sample(frac=0.25)
df = df.reset_index()

Next we’ll need a unique identifier for each vector that we store, so we’ll select the article id field. We’ll also select the properties that we want to include with our vectors. Fortunately, these fields are fairly straightforward so we won’t have to do much transformation.

## collect our ids for each article
ids = df["article_id"].tolist()
## collect the properties that we will attach to each vector
properties = df.apply(
   lambda r: {
       "url": r.url,
       "title": r.title,
       "title_len": r.title_len,
       "text": r.text,
       "text_len": r.text_len},
   axis=1
).tolist()

We’ll need to retrieve the collection that we created in the UI. With the Qwak VectorStoreClient, we find the collection that we created in the previous step, or we can create a new collection using the create_collection() method.

from qwak.exceptions import QwakException
from qwak.vector_store import VectorStoreClient

## Create vector client and fetch collection
client = VectorStoreClient()

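# Retrieve a collection or create a new one
collection_name = "wikipedia-vectorizer-demo"
try:
   collection = client.get_collection_by_name(collection_name)
except QwakException:
   # name, dimension, and metric follow the collection settings described above;
   # check the qwak-sdk docs for the exact argument names
   collection = client.create_collection(
       name=collection_name,
       description="Indexing Wikipedia articles ",
       dimension=384,
       metric="cosine",
       vectorizer="sentence_transformer"  # The name of a deployed realtime model on Qwak
   )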

With our ids and metadata properly formatted, we’ll select the article text field from the DataFrame, as this is the column we want to embed in our vector store. We use the upsert() method of the collection object that we fetched in the previous step. When we pass our article text through the “natural_inputs” parameter, the upsert command calls the real-time model, uses its predict function to turn the article text into vectors, and persists the vectors in the database.

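## ids and properties argument names are assumed; check the qwak-sdk docs
collection.upsert(
   ids=ids,  ## List of the article ids
   natural_inputs=df["text"].tolist(),  # Natural inputs
   properties=properties,  ## List of dict of the article properties
)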
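Before sending a large batch, it can help to confirm that the three lists line up. The helper below is a hypothetical pre-flight check, not part of the qwak-sdk. It assumes that ids must be unique and that each id has exactly one non-empty input text and one properties dict.

```python
def validate_upsert_batch(ids, natural_inputs, properties):
    """Raise ValueError if the lists intended for an upsert call are inconsistent."""
    if not (len(ids) == len(natural_inputs) == len(properties)):
        raise ValueError(
            f"Length mismatch: {len(ids)} ids, {len(natural_inputs)} inputs, "
            f"{len(properties)} properties"
        )
    if len(set(ids)) != len(ids):
        raise ValueError("Duplicate ids found; each vector needs a unique id")
    empty = [i for i, text in zip(ids, natural_inputs) if not text or not text.strip()]
    if empty:
        raise ValueError(f"Empty input text for ids: {empty}")


# Illustrative data in the same shape as the ids, texts, and properties above.
sample_ids = [101, 102]
sample_texts = ["Ducks are birds.", "The Anaheim Ducks are a hockey team."]
sample_properties = [{"title": "Duck"}, {"title": "Anaheim Ducks"}]

# Passes silently when the batch is consistent.
validate_upsert_batch(sample_ids, sample_texts, sample_properties)

# A duplicated id is reported before anything is sent to the vector store.
try:
    validate_upsert_batch([101, 101], sample_texts, sample_properties)
except ValueError as error:
    print(f"Rejected batch: {error}")
```

Catching these problems locally is cheaper than discovering them after the vectorizer has already embedded thousands of articles.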

Query vector store and see our results

Now that our vector store has been populated, let’s use it!

Let’s query our vector store to see if it has any content related to ducks. We can use the same VectorStoreClient that we created in the upsert step.

We take our query, “ducks”, and pass it into the collection’s search() method. On the backend, the vector client will call our sentence-transformer model to vectorize our search so it can be properly used by the vector store. We can specify the number of results and the vector properties that we want returned. We can also return the distance between our input and the returned results, if we want to gauge the query performance or quality of our vector indexing.

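from qwak.vector_store import VectorStoreClient

## Search vector store using vector provided by model
## (argument names other than output_properties are assumed; check the qwak-sdk docs)
search_results = collection.search(
   natural_input="ducks",
   top_results=3,
   output_properties=["title", "title_len", "url"],
   include_distance=True,
)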

We ask the client to return the title, title_len, and url of the three closest articles, and here are the results!

for x in search_results:
    print(x.properties, x.distance)

# Search result objects
{'title': 'Anaheim Ducks', 'title_len': 13.0, 'url': 'https://simple.wikipedia.org/wiki/Anaheim%20Ducks'} 0.38589573
{'title': 'Duck', 'title_len': 4.0, 'url': 'https://simple.wikipedia.org/wiki/Duck'} 0.4547128

Our query returned articles related to the Anaheim Ducks, a hockey team in the NHL, and ‘Quack’, the noise that ducks make. Not the best search result, but for only a few thousand article vectors, we’ll take it! You can use the metric setting of collections to experiment with different distance calculations and measure how they affect result quality.
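One way to measure how a metric choice affects result quality is to label a few queries with the articles you would expect back and compute precision at k. The sketch below is generic Python that does not call the Qwak API. All titles, labels, and result lists in it are invented for illustration and are not measurements from this tutorial.

```python
def precision_at_k(returned_ids, relevant_ids, k):
    """Fraction of the top-k returned ids that appear in the labeled relevant set."""
    top_k = returned_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item_id in top_k if item_id in relevant_ids)
    return hits / len(top_k)


# Hypothetical labels: articles we would accept as good answers for the query "ducks".
relevant = {"Duck", "Quack", "Mallard"}

# Hypothetical top-3 results from two collections configured with different metrics.
results_collection_a = ["Anaheim Ducks", "Duck", "Quack"]
results_collection_b = ["Anaheim Ducks", "Hockey", "Duck"]

score_a = precision_at_k(results_collection_a, relevant, k=3)
score_b = precision_at_k(results_collection_b, relevant, k=3)
print(f"collection A: {score_a:.2f}, collection B: {score_b:.2f}")
```

Averaging this score over a set of labeled queries gives a single number per collection configuration, which makes it easier to compare metrics than eyeballing individual result lists.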

Supporting pre-filtering vector queries

Qwak offers vector pre-filtering to optimize search queries, resulting in faster and lighter queries.

Pre-filtering is an approach in which eligible candidates are identified before the vector search starts. The vector search then considers only the candidates on the “allow” list.

The two primary benefits of pre-filtered queries are:

  1. Predictable Result Count: Because the filter runs first, the vector search operates on an already reduced list of candidates, which makes it simple to estimate the number of elements in the search results.
  2. Immediate Match Detection: If the filter is highly restrictive, meaning it matches only a small percentage of data points relative to the dataset’s size, you’ll know right away when there are no matches, without running the vector search.

from qwak.vector_store import VectorStoreClient
from qwak.vector_store.filters import Or, GreaterThan, Equal

## Search vector store using vector provided by model
## (natural_input, top_results, and filter argument names are assumed)
search_results = collection.search(
   natural_input="ducks",
   top_results=3,
   output_properties=["title", "title_len", "url"],
   filter=Or(
       GreaterThan(property="text_len", value=110.0),
       Equal(property="title_len", value=16.0)
   )
)

Learn more in our docs about pre-filtering vector queries.
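To make the allow-list idea concrete, here is a minimal brute-force sketch of pre-filtering in plain Python. It is not how Qwak implements filtering internally. The `prefilter_search` function, the sample items, and the predicate are hypothetical; the predicate only mirrors the shape of an Or(GreaterThan, Equal) filter on text_len and title_len.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def prefilter_search(items, query_vector, predicate, top_results=3):
    """Build the allow list from metadata first, then rank only those candidates."""
    allowed = [item for item in items if predicate(item["properties"])]
    if not allowed:
        # Restrictive filter with no matches: no distance is ever computed.
        return []
    ranked = sorted(allowed, key=lambda item: cosine_distance(query_vector, item["vector"]))
    return ranked[:top_results]


# Invented items: tiny 2-dimensional vectors with the tutorial's property names.
items = [
    {"id": 1, "vector": [1.0, 0.0], "properties": {"text_len": 50.0, "title_len": 4.0}},
    {"id": 2, "vector": [0.9, 0.1], "properties": {"text_len": 200.0, "title_len": 13.0}},
    {"id": 3, "vector": [0.0, 1.0], "properties": {"text_len": 80.0, "title_len": 16.0}},
]


def long_text_or_title_len_16(props):
    # Same shape as an Or(GreaterThan(text_len, 110), Equal(title_len, 16)) filter.
    return props["text_len"] > 110.0 or props["title_len"] == 16.0


matches = prefilter_search(items, [1.0, 0.0], long_text_or_title_len_16)
matched_ids = [item["id"] for item in matches]
no_matches = prefilter_search(items, [1.0, 0.0], lambda props: props["text_len"] > 1000.0)
print(matched_ids, no_matches)
```

Item 1 is the closest vector to the query but never reaches the distance calculation because it fails the filter, and the second call returns an empty list without ranking anything. Both behaviors correspond to the two benefits listed above.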


The Qwak Vector Store is available now and you can get started today by going to the Collections tab in the Qwak UI and creating a new collection. To learn more about all the Qwak Vector Store functionality, you can visit our documentation or reach out to a member of the Qwak team.

Article originally posted here. Reposted with permission.