Querying
Exograph's support for embeddings extends queries to include vector-based filtering and ordering, so once you have familiarized yourself with Exograph's querying capabilities, you can start using embeddings easily. It also provides a way to retrieve the distance of a document from a target vector and aggregate information about the documents.
As we've seen in the mutations section, in the GraphQL API, the Vector
type surfaces as a float array ([Float!]
).
Filtering and ordering
Exograph extends its querying capabilities to support filtering and ordering based on vector embeddings. Both capabilities allow specifying a target vector and filtering or ordering based on the distance from the target vector.
Retrieving closest documents
A common query with embedding is to retrieve the top matching documents. Exograph supports this with the orderBy
clause (along with the existing support for limit
and offset
). For example, to retrieve the top three documents similar to a search vector, you can use the following query:
query topThreeSimilar($searchVector: [Float!]!) {
documents(
orderBy: { contentVector: { distanceTo: $searchVector, order: ASC } }
limit: 3
) {
id
title
content
}
}
Here, the orderBy
clause for the vector field accepts a distanceTo
operator to specify the target vector and order
to specify the sorting order. Exograph will automatically use the distance function specified for the field using the @distanceFunction
annotation (see Customizing Embeddings).
Filtering based on distance
Limiting the number of documents is often sufficient for a typical search or RAG application. However, sometimes, you want to ensure you retrieve only documents within a certain distance. For this, you can also use the similar
operator to filter documents based on the distance from the search vector:
query similar($searchVector: [Float!]!) {
documents(
where: {
contentVector: {
similar: { distanceTo: $searchVector, distance: { lt: 0.5 } }
}
}
) {
id
title
content
}
}
The similar
operator accepts a distanceTo
operator to specify the target vector and a distance
operator to specify the distance condition. The distance
operator allows you to specify the comparison operator (lt
, lte
, gt
, gte
, eq
) and the distance value. Like the orderBy
clause, Exograph will automatically use the distance function specified for the field.
Combining filters and ordering
You can combine the orderBy
and where
clauses to return the closest documents within a certain distance. For example, you can retrieve the top three similar documents only if they are within a certain distance:
query topThreeSimilarDocumentsWithThreshold(
$searchVector: [Float!]!
$threshold: Float!
) {
documents(
where: {
contentVector: {
similar: { distanceTo: $searchVector, distance: { lt: $threshold } }
}
}
orderBy: { contentVector: { distanceTo: $searchVector, order: ASC } }
) {
id
title
content
}
}
Combining other fields with vector-based queries
You can combine vector-based queries with other fields to filter and order based on other structured fields. This is often useful to narrow the search space based on structured data.
For example, you can filter based on the document's title along with a similarity filter and order based on the distance from the search vector:
query topThreeSimilarDocumentsWithTitle(
$searchVector: [Float!]!
$title: String!
$threshold: Float!
) {
documents(
where: {
title: { eq: $title }
contentVector: {
similar: { distanceTo: $searchVector, distance: { lt: $threshold } }
}
}
orderBy: { contentVector: { distanceTo: $searchVector, order: ASC } }
limit: 3
) {
id
title
content
}
}
Here, we filter documents based on title equality and similarity to the search vector and order them based on the distance from the search vector.
Finding the distance
When querying for similar documents, you may want to know the distance of each document from the search vector. Exograph supports this by returning the distance with the document through a special <field-name>Distance
name field. This field accepts one argument to
of the vector type to specify the target vector and returns a float value representing the distance.
{
documents {
id
title
content
contentVectorDistance(to: $searchVector)
}
}
Here, the contentVectorDistance
field returns the distance of each document's contentVector
field from the search vector. You can use this field to post-process or to display an indication of relevance to the user.
Finding aggregate information
In addition to retrieving individual documents, you may want to find aggregate information about the documents. For example, you may want to compute the average vector. This is useful for classification problems where you want to find a representative vector for a set of documents. Then, when a new vector comes into the system, you can compare it with the average vector to classify it.
Exograph supports this through the aggregation API. Currently, it supports only the avg
aggregation function for vector fields (besides count
, which all fields support).
{
documents(where: ...) {
contentVectorAgg {
avg
}
}
}
Here, the contentVectorAgg
field returns the average vector of all documents that match the where
condition.