Metadata Agent

Ray

Mingtian

Evan

Introduction

The performance of Retrieval Augmented Generation (RAG) systems hinges on the efficiency and accuracy of retrieval techniques. Most current RAG applications employ a pre-trained embedding model to convert data and queries into an embedding space, followed by a K-nearest neighbors search. This approach, commonly known as semantic search, is an approximate search algorithm that operates within the semantic space.

Traditional semantic search can lead to irrelevant or redundant retrieval results, especially when a query imposes specific conditions that must be met. Consider the following example query, which seeks time-sensitive information about climate change policies in different countries:

Query Example: What are the latest climate change policies in European countries after 2020?

In this case, documents discussing climate change policies before 2020 might be deemed highly relevant by semantic search methods. However, they fail to satisfy the time-specific condition of the query, thus compromising the retrieval process's accuracy.
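To make this failure mode concrete, here is a minimal sketch of plain semantic search over toy data. The three-dimensional vectors and document names are invented for illustration; a real system would obtain embeddings from a pre-trained model. Because the ranking depends only on vector similarity, the 2019 document comes first even though the query asks for policies after 2020.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn_search(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical toy "embeddings"; similarity knows nothing about publication dates.
doc_vecs = {
    "eu_policy_2019": [0.9, 0.1, 0.0],
    "eu_policy_2022": [0.8, 0.2, 0.1],
    "cooking_blog": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.1, 0.0]  # stands in for the embedded example query

top2 = knn_search(query_vec, doc_vecs, k=2)
print(top2)  # ['eu_policy_2019', 'eu_policy_2022']
```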

The Vectify AI Solution: Metadata Agent

To bridge this gap, Vectify AI has developed the Metadata Agent for hybrid semantic and metadata retrieval. Our Metadata Agent utilizes a sophisticated blend of semantic understanding and exact metadata filtering. Users can add document-level metadata when uploading their documents, for example:

{
    "author": "John Doe",
    "publication_date": "2021-04-10",
    "region": "Europe"
}

When a user submits a query, the Metadata Agent retrieves results in two steps:

  1. Extract exact filtering conditions from the user query.
  2. Conduct a hybrid semantic and metadata search to retrieve results that are both semantically relevant and satisfy the exact filtering conditions.
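As a rough illustration of step 1, the sketch below pulls filter conditions out of a query with a few regular expressions. This is a toy, rule-based stand-in and not the Metadata Agent's actual extraction method, which this post does not describe. The function name `extract_filters`, the `gt`/`lt`/`eq` operator names, and the reading of "after 2020" as "later than 2020-12-31" are all assumptions made for the example.

```python
import re

def extract_filters(query):
    """Toy extractor: map a few fixed phrases to metadata filter conditions."""
    filters = {}
    after = re.search(r"\bafter (\d{4})\b", query, flags=re.IGNORECASE)
    if after:
        # Assumption: "after 2020" means published later than 2020-12-31.
        filters.setdefault("publication_date", {})["gt"] = f"{after.group(1)}-12-31"
    before = re.search(r"\bbefore (\d{4})\b", query, flags=re.IGNORECASE)
    if before:
        # Assumption: "before 2020" means published earlier than 2020-01-01.
        filters.setdefault("publication_date", {})["lt"] = f"{before.group(1)}-01-01"
    if re.search(r"\bEuropean?\b", query, flags=re.IGNORECASE):
        filters["region"] = {"eq": "Europe"}
    return filters

query = "What are the latest climate change policies in European countries after 2020?"
filters = extract_filters(query)
print(filters)
# {'publication_date': {'gt': '2020-12-31'}, 'region': {'eq': 'Europe'}}
```

The remaining words of the query still go to the semantic side of the search; only the conditions that can be checked exactly become filters.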

In the climate change policy example above, the Metadata Agent interprets the question from both semantic and metadata perspectives: semantically, it seeks the latest European climate change policies; from a metadata standpoint, it looks for documents published after 2020, ensuring the retrieved policies are current. This dual perspective allows the Metadata Agent to retrieve results that are both relevant and precise.
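One way to picture this dual perspective is a metadata filter followed by a semantic ranking of whatever survives the filter. The sketch below assumes toy vectors, invented document ids, and a hypothetical filter format with `eq`/`gt`/`lt` operators; it is not the Metadata Agent's implementation. Dates are compared as strings, which works because ISO 8601 `YYYY-MM-DD` dates sort chronologically.

```python
import math
import operator

OPS = {"eq": operator.eq, "gt": operator.gt, "lt": operator.lt}

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def matches(metadata, filters):
    """True if the metadata satisfies every condition in filters."""
    for field, conditions in filters.items():
        if field not in metadata:
            return False
        for op_name, expected in conditions.items():
            if not OPS[op_name](metadata[field], expected):
                return False
    return True

def hybrid_search(query_vec, filters, docs, k):
    """Drop documents that fail the exact filters, then rank the rest by similarity."""
    candidates = [d for d in docs if matches(d["metadata"], filters)]
    candidates.sort(key=lambda d: cosine_similarity(query_vec, d["vector"]), reverse=True)
    return [d["id"] for d in candidates[:k]]

# Hypothetical corpus: toy vectors plus the document-level metadata fields from this post.
docs = [
    {"id": "eu_policy_2019", "vector": [0.9, 0.1, 0.0],
     "metadata": {"publication_date": "2019-06-01", "region": "Europe"}},
    {"id": "eu_policy_2021", "vector": [0.8, 0.2, 0.1],
     "metadata": {"publication_date": "2021-04-10", "region": "Europe"}},
    {"id": "asia_policy_2022", "vector": [0.85, 0.15, 0.0],
     "metadata": {"publication_date": "2022-01-15", "region": "Asia"}},
    {"id": "eu_policy_2023", "vector": [0.7, 0.3, 0.2],
     "metadata": {"publication_date": "2023-03-02", "region": "Europe"}},
]
filters = {"publication_date": {"gt": "2020-12-31"}, "region": {"eq": "Europe"}}

results = hybrid_search([1.0, 0.1, 0.0], filters, docs, k=5)
print(results)  # ['eu_policy_2021', 'eu_policy_2023']
```

Filtering before ranking means a document that is semantically close but violates a condition, such as the 2019 policy here, can never appear in the results.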

Therefore, by integrating detailed document-level metadata, the Metadata Agent processes natural language queries with a more nuanced understanding, ensuring results align precisely with specified metadata-level criteria.

Evaluation

We evaluated the performance of our Metadata Agent on datasets with various scenarios. We compared the results of our Metadata Agent retrieval with traditional semantic search, using the same queries. The results are shown below:

[Figure: summary of evaluation results]

The Metadata Agent gave a large performance boost over traditional semantic search on the tested queries. More details of the evaluation are available in our GitHub repo.
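This post does not say which metric the evaluation uses. As one common way to compare two retrievers on the same query, the sketch below computes precision at k, the fraction of the top k results that are relevant. The document ids and relevance labels are invented for illustration and are not taken from the evaluation.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / k

# Hypothetical relevance labels for one query with a date condition.
relevant = {"eu_policy_2021", "eu_policy_2023"}
semantic_only = ["eu_policy_2019", "eu_policy_2021", "asia_policy_2022"]
with_metadata = ["eu_policy_2021", "eu_policy_2023"]

print(precision_at_k(semantic_only, relevant, k=2))  # 0.5
print(precision_at_k(with_metadata, relevant, k=2))  # 1.0
```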

Using the Metadata Agent

The Metadata Agent has been integrated into our RAG platform. Our retrieval service processes natural language queries with a blend of semantic understanding and exact metadata filtering. This refines the search results and ensures they match your specified criteria, improving both the effectiveness and the efficiency of retrieval.

Users can upload documents with JSON metadata and use the Metadata Agent for retrieval through our SDK. For example, after initializing the Vectify client, users can add metadata for a file as follows:

# pip install vectifyai
import vectifyai

# Initialize the client with your API key.
client = vectifyai.Client(api_key='YOUR_API_KEY')

# Add or update document-level metadata for a file.
client.upsert_file_metadata(
    source_name='docs',
    file_name='climate.pdf',
    metadata={
        'author': 'John Doe',
        'publication_date': '2021-04-10',
        'region': 'Europe',
    },
)

File metadata is also visible in our dashboard:

[Figure: file metadata shown in the dashboard]

Users can then run retrieval with the Metadata Agent through our SDK:

results = client.retrieve(
    query='What are the latest climate change policies in European countries after 2020?',
    top_k=5,
    sources=['docs'],
    metadata='on',
)

Or via our dashboard:

[Figure: retrieval in the dashboard]

More details are available in our documentation.

Try it out

We invite you to try out our Metadata Agent. Start today for free and unlock the potential of RAG with hassle-free retrieval and intelligent agents from Vectify AI. Contact us to integrate our Metadata Agent service into your API, database, or other types of data sources, and discover how our service can enhance your generative AI applications.