Create Database Functions
Duration: 0:10:00
At this stage, you should have your Stripe data sync’ed into the public schema running in Supabase. You will have three tables corresponding to the streams you set up in Airbyte. You will also notice that, thanks to the PGVector connector an embedding
column has automatically been created and populated for you. This abtracts the need to create embeddings on our source data and instead, we can focus on creating the context for our LLM to use.
In order to utilize those embeddings however, we need to create a few Postgres functions in Supabase.
These functions will allow us to:
- Query the tables directly from our Python script
- Pass the embeddings of a user’s query as an argument
- Perform a similarity search on the source data
- Return data with the closest matches by distance in vectors
Let’s go ahead and create each function. From within Supabase, you can head into the SQL Editor tab and copy/paste the SQL below:
[!NOTE] If, when running the queries below, you recieve an error that public.vector cannot be found, first make sure than the PGVector extension is enabled. If it is, replace
public.vector
withvector
. This will resolve any discrepencies with how your schema is set up.
find_related_customer
CREATE OR REPLACE FUNCTION find_related_customer(question_vector public.vector)
RETURNS TABLE (
document_id text,
chunk_id text,
metadata json,
document_content text,
embedding public.vector
)
LANGUAGE sql AS $$
SELECT *
FROM customers
ORDER BY embedding <=> question_vector
LIMIT 5;
$$;
find_related_invoices
CREATE OR REPLACE FUNCTION find_related_invoices(question_vector public.vector)
RETURNS TABLE (
document_id text,
chunk_id text,
metadata json,
document_content text,
embedding public.vector
)
LANGUAGE sql AS $$
SELECT *
FROM invoices
ORDER BY embedding <=> question_vector
LIMIT 5;
$$;
find_related_products
CREATE OR REPLACE FUNCTION find_related_products(question_vector public.vector)
RETURNS TABLE (
document_id text,
chunk_id text,
metadata json,
document_content text,
embedding public.vector
)
LANGUAGE sql AS $$
SELECT *
FROM products
ORDER BY embedding <=> question_vector
LIMIT 5;
$$;
At the top of the function, you’ll see that an argument called question_vector
is passed into the function as type vector
. This is then used in the query to run a similarity search against the embedding column identified by the <=>
operator.
PGVector supports 3 different kinds of operators for computing distance between embeddings:
<->
Euclidean distance<#>
Negative inner product<=>
Cosine distance
In this tutorial, we’ll be using cosine distance since it’s typically a safe option with most embedding models. You can read more on this in Supabase’s docs.
With the functions created, we can move on to creating our AI Chatbot!