Vector database retrieval allows you to fetch relevant data from an external vector database based on similarity searches.
By configuring a retrieval column, you can dynamically query stored vectors and integrate contextually relevant information into your dataset. The following steps configure retrieval from a vector database and create a dynamic column.
Before configuring retrieval, ensure you have selected a dataset. If no dataset is available, follow the steps to Add Dataset on the Future AGI platform.
To establish a connection between your dataset and the vector database, you must configure additional settings:
Index Name: The name of the index in the vector database where your embeddings are stored. The system uses it to locate and retrieve relevant vectors, so the name you enter must match the index that contains your stored embeddings.
Namespace: The Namespace organises data within the vector database. If you manage multiple groups of vectors within the same index, specifying a Namespace restricts each search to one group, so results from different datasets do not mix.
Number of Chunks to Fetch: The number of top-matching vectors retrieved for each query. A lower number returns only the closest matches, while a higher number increases recall but may include less relevant chunks. Choose a value that balances retrieval efficiency and accuracy.
Query Key: The Query Key specifies which dataset attribute is used to query the vector database. Choose it carefully, because the similarity search is only as meaningful as the text it is run on. If the wrong key is selected, retrieval results may be inconsistent or irrelevant.
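To make the interaction of these four settings concrete, here is a minimal sketch in plain Python. It is not the platform's implementation: the in-memory INDEXES dictionary, the retrieve function, the toy_embed function, and all names and values are hypothetical stand-ins for a real vector database and embedding model.

```python
import math

# Hypothetical in-memory stand-in for a vector database:
# index name -> namespace -> list of (vector, chunk text).
INDEXES = {
    "docs-index": {
        "faq": [
            ([1.0, 0.0], "How to reset a password"),
            ([0.0, 1.0], "How to export a report"),
            ([0.8, 0.6], "How to change an email address"),
        ],
        "legal": [
            ([1.0, 0.0], "Terms of service"),
        ],
    }
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_embed(text):
    """Toy embedding (assumption): a real setup would call an embedding model."""
    return [1.0, 0.0] if "password" in text else [0.0, 1.0]


def retrieve(row, index_name, namespace, top_k, query_key):
    """Return the top_k chunk texts most similar to row[query_key]."""
    query_vector = toy_embed(row[query_key])       # Query Key picks the attribute
    candidates = INDEXES[index_name][namespace]    # Index Name and Namespace scope the search
    ranked = sorted(candidates,
                    key=lambda item: cosine(query_vector, item[0]),
                    reverse=True)
    return [text for _, text in ranked[:top_k]]    # Number of Chunks to Fetch


row = {"id": 7, "question": "I forgot my password"}
results = retrieve(row, index_name="docs-index", namespace="faq",
                   top_k=2, query_key="question")
print(results)
```

Because the search is scoped to the "faq" namespace, the "legal" entry is never considered even though its vector matches the query exactly. Raising top_k to 3 would also return the unrelated "export a report" chunk, which is the recall versus specificity trade-off described above.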
Select an embedding type from the available options and enter the corresponding model name:
OpenAI
Hugging Face
Sentence Transformer
Define the Key to Extract, which determines the specific field from which the relevant data is retrieved.
Vector Length: Sets the number of dimensions of the vector representation.
Concurrency: Defines the number of rows to process in parallel.
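The sketch below shows one way to think about Vector Length and Concurrency together. It assumes a hypothetical toy_embed function in place of a real OpenAI, Hugging Face, or Sentence Transformer model, and it uses a thread pool only as an illustration; how the platform actually schedules rows is not documented here.

```python
from concurrent.futures import ThreadPoolExecutor

VECTOR_LENGTH = 3  # hypothetical value; a real model defines its own dimension
CONCURRENCY = 2    # number of rows processed in parallel


def toy_embed(text):
    """Toy embedding (assumption): three simple character statistics."""
    return [float(len(text)), float(text.count("a")), float(text.count(" "))]


def embed_row(row):
    """Embed one row and check the result against the configured Vector Length."""
    vector = toy_embed(row["question"])
    if len(vector) != VECTOR_LENGTH:
        raise ValueError(
            f"expected {VECTOR_LENGTH} dimensions, got {len(vector)}"
        )
    return vector


rows = [{"question": "a b"}, {"question": "banana"}, {"question": ""}]

# At most CONCURRENCY rows are embedded at the same time;
# pool.map returns results in the same order as the input rows.
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    vectors = list(pool.map(embed_row, rows))

print(vectors)
```

A higher concurrency value finishes large datasets sooner but sends more simultaneous requests to the embedding provider and the vector database, so rate limits are worth keeping in mind when choosing it.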
Once all parameters are set, click Test to preview the retrieved results. If the retrieval output looks accurate, click Create New Column to finalise the setup. The new retrieval column then populates dynamically with the most relevant data fetched from the vector database.
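Conceptually, Test and Create New Column correspond to running the same retrieval on one row first and then on every row. The following sketch illustrates that flow under stated assumptions: fetch_matches is a hypothetical stand-in for the vector database call, and the column name retrieved_context and the record fields are invented for the example.

```python
KEY_TO_EXTRACT = "text"  # field of each matched record to keep


def fetch_matches(query):
    """Hypothetical stand-in for the vector database call."""
    return [{"text": f"chunk about {query}", "source": "kb"}]


def retrieval_value(row, query_key="question"):
    """Query with row[query_key] and keep only the Key to Extract from each match."""
    matches = fetch_matches(row[query_key])
    return [match[KEY_TO_EXTRACT] for match in matches]


dataset = [{"question": "billing"}, {"question": "login"}]

# "Test": preview the output for a single row before committing.
preview = retrieval_value(dataset[0])
print(preview)

# "Create New Column": apply the same retrieval to every row.
for row in dataset:
    row["retrieved_context"] = retrieval_value(row)

print(dataset)
```

Previewing on one row first is inexpensive and catches a wrong Query Key or Key to Extract before the whole dataset is processed.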