Using Vector DB Retrieval
Vector database retrieval allows you to fetch relevant data from an external vector database based on similarity searches. By configuring a retrieval column, you can dynamically query stored vectors and integrate contextually relevant information into your dataset.
Follow these steps to configure retrieval from a vector database and create a dynamic column.
1. Select a Dataset
Before configuring retrieval, ensure you have selected a dataset. If no dataset is available, follow the steps to Add Dataset on the Future AGI platform.
2. Access the Retrieval Interface
- Navigate to your dataset under Build.
- Click on the Add Columns button (+) in the top-right menu.
- In the column type list, under Dynamic Columns, select Retrieval.
3. Configure Retrieval Settings
The Retrieval panel appears. Assign a name to the column, then configure the settings described below.
Choose a Vector Database
- Select a vector database from the available options:
  - Pinecone
  - Qdrant
  - Weaviate
Choose the Column
- Select the column in your dataset that will be used as the query reference.
- This column will contain the data points that are used to fetch similar items from the vector database.
Database Authentication
- Provide an API key to authenticate with the vector database.
- If you are setting this up for the first time, click “Create Secret”. A pop-up window appears where you save the API key used to authenticate with the vector database.
Database Configuration
To establish a connection between your dataset and the vector database, you must configure additional settings:
- Index Name: This is the name of the index in the vector database where your embeddings are stored. The Index Name helps the system locate and retrieve relevant vectors. Ensure that the name entered matches the index that contains your stored embeddings.
- Namespace: The Namespace is used for organising data within the vector database. If you are managing multiple groups of vectors within the same index, specifying a Namespace allows for structured retrieval and prevents overlapping searches across different datasets.
- Number of Chunks to Fetch: This determines how many top-matching vectors are retrieved for each query. A lower number returns only the closest matches, while a higher number increases recall but may include less relevant results. Choose a value that balances retrieval efficiency and accuracy.
- Query Key: The Query Key is a critical field that specifies which dataset attribute will be used to query the vector database. This key must be carefully chosen to ensure meaningful similarity searches. If the wrong key is selected, retrieval results may be inconsistent or irrelevant.
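To make the roles of Namespace and Number of Chunks to Fetch concrete, here is a minimal sketch of a top-k similarity search. It does not call Pinecone, Qdrant, or Weaviate: `toy_index` is a hypothetical in-memory stand-in for a single index, and the function names, record layout, and use of cosine similarity are assumptions made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def fetch_top_chunks(index, namespace, query_vector, num_chunks):
    """Return the num_chunks best-matching records from one namespace."""
    # Only records in the requested namespace are searched.
    records = index.get(namespace, [])
    scored = [(cosine_similarity(query_vector, r["vector"]), r) for r in records]
    # Highest similarity first; sort on the score only.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:num_chunks]]


# Hypothetical stand-in for one index that holds two namespaces.
toy_index = {
    "support-docs": [
        {"id": "a", "vector": [1.0, 0.0], "metadata": {"text": "Reset your password"}},
        {"id": "b", "vector": [0.8, 0.6], "metadata": {"text": "Change your email"}},
        {"id": "c", "vector": [0.0, 1.0], "metadata": {"text": "Billing overview"}},
    ],
    "marketing": [
        {"id": "d", "vector": [1.0, 0.0], "metadata": {"text": "Spring campaign"}},
    ],
}

top = fetch_top_chunks(toy_index, "support-docs", [1.0, 0.1], num_chunks=2)
print([record["id"] for record in top])  # prints ['a', 'b']
```

Record "d" matches the query direction closely but is never returned, because it lives in a different namespace. Raising `num_chunks` to 3 would also return "c", the weakest match, which is the recall versus specificity trade-off described above.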
Embedding Configuration
- Select an embedding type from the available options, then enter the name of the model to use:
  - OpenAI
  - Hugging Face
  - Sentence Transformer
- Define the Key to Extract, which determines the specific field from which relevant data will be retrieved.
- Vector Length: Sets the number of dimensions in the vector representation.
- Concurrency: Defines the number of rows to process in parallel.
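The sketch below illustrates how Vector Length and Key to Extract might behave. It is not the platform's implementation: it assumes that Key to Extract names a metadata field on each matched record, and `check_vector_length`, `extract_field`, and `sample_matches` are hypothetical names introduced for this example.

```python
def check_vector_length(vector, expected_length):
    """Raise if an embedding does not have the configured number of dimensions."""
    if len(vector) != expected_length:
        raise ValueError(
            f"expected {expected_length} dimensions, got {len(vector)}"
        )
    return vector


def extract_field(matches, key_to_extract):
    """Collect one metadata field from each match, skipping matches that lack it."""
    return [
        match["metadata"][key_to_extract]
        for match in matches
        if key_to_extract in match.get("metadata", {})
    ]


# Hypothetical matches, shaped like records a similarity search might return.
sample_matches = [
    {"id": "a", "metadata": {"text": "Reset your password", "source": "faq"}},
    {"id": "b", "metadata": {"source": "faq"}},  # no "text" field
    {"id": "c", "metadata": {"text": "Billing overview"}},
]

query_vector = check_vector_length([0.12, -0.03, 0.88], expected_length=3)
extracted = extract_field(sample_matches, key_to_extract="text")
print(extracted)  # prints ['Reset your password', 'Billing overview']
```

A mismatch between the configured Vector Length and the dimension of the embeddings the model produces is a common cause of failed queries, so checking it before searching gives a clearer error than a rejected request.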
Once all parameters are set, click Test to preview the retrieved results. If the retrieval output looks accurate, click Create New Column to finalise the setup. The new retrieval column is then populated dynamically with the most relevant data fetched from the vector database.
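As a rough mental model of how a retrieval column gets filled and what the Concurrency setting controls, the following sketch processes rows in parallel with a thread pool. Everything here is an assumption for illustration: `retrieve_for_row` is a hypothetical placeholder for the real embed-and-search step, and the column names are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def retrieve_for_row(query_text):
    """Hypothetical placeholder for embedding the text and querying the database."""
    return f"retrieved for: {query_text}"


def populate_retrieval_column(rows, query_column, new_column, concurrency):
    """Return copies of rows with new_column filled from each row's query_column."""
    queries = [row[query_column] for row in rows]
    # Up to `concurrency` rows are processed at once; map() preserves row order.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(retrieve_for_row, queries))
    return [{**row, new_column: result} for row, result in zip(rows, results)]


dataset = [
    {"question": "How do I reset my password?"},
    {"question": "Where can I see my invoices?"},
    {"question": "How do I change my email?"},
]

updated = populate_retrieval_column(
    dataset, query_column="question", new_column="context", concurrency=2
)
for row in updated:
    print(row["context"])
```

A higher concurrency value shortens the time to fill the column for large datasets, at the cost of sending more simultaneous requests to the vector database and the embedding provider.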