This tutorial will guide you through creating a retrieval-augmented generation (RAG) agent using Python. We’ll use:

  • Tune Studio with the Llama 3 model.
  • OpenAI for generating embeddings.
  • Supabase for data storage.
  • FastAPI for serving the application.

By the end of the tutorial, you’ll be able to add documents to the database by URL and query them.

Tune RAG App

Setting up

First, set up a Python virtual environment.

In a terminal in your working directory, run the following commands to set up a virtual environment with Python 3.11 and activate it:

python3.11 -m venv venv
source venv/bin/activate

Installing dependencies

Create a requirements.txt file in your project folder and paste the following configuration into it:

requirements.txt
fastapi==0.110.2
uvicorn==0.29.0
beautifulsoup4==4.12.3
requests==2.31.0
python-dotenv==1.0.1
supabase==2.4.3
openai==1.23.3
gpt3-tokenizer==0.1.4

Install the dependencies by running the following command:

pip install -r requirements.txt

Setting up environment variables

Create a .env file in the project root directory and add your Tune API key, Supabase URL, Supabase key, and OpenAI API key:

.env
TUNE_API_KEY=tune-api-key  # https://studio.tune.app/profile
SUPABASE_URL=supabase-url
SUPABASE_KEY=supabase-key
OPENAI_API_KEY=openai-key-for-embeddings

You can find your Tune API key under Access Keys in your Tune Studio profile.
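
If you’d like to confirm that the variables load correctly before going further, a quick sanity check like the one below will do. The script name check_env.py is just a suggestion and isn’t part of the app.

check_env.py
# Optional sanity check: confirm the environment variables load from .env
import os

from dotenv import load_dotenv

load_dotenv()

for name in ("TUNE_API_KEY", "SUPABASE_URL", "SUPABASE_KEY", "OPENAI_API_KEY"):
    print(name, "is set" if os.environ.get(name) else "is MISSING")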

Setting up a vector database

Create a new table and a function for similarity search in Supabase by running the following SQL commands in the Supabase SQL Editor:

create extension vector;
create table documents (
  id bigserial primary key,
  content text,
  embedding vector(1536),
  url text,
  title text
);

create or replace function match_documents (
  query_embedding vector(1536),
  match_threshold float,
  match_count int
)
returns table (
  id bigint,
  content text,
  url text,
  title text,
  similarity float
)
language sql stable
as $$
  select
    documents.id,
    documents.content,
    documents.url,
    documents.title,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where documents.embedding <=> query_embedding < 1 - match_threshold
  order by documents.embedding <=> query_embedding
  limit match_count;
$$;
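
Before moving on, you can optionally verify the table and the match_documents function from Python. The sketch below (a throwaway script, not part of the app) inserts a placeholder row with a dummy 1536-dimensional vector, calls match_documents with that same vector, and then deletes the row; it assumes the .env file from the previous step is populated.

verify_db.py
# Optional: verify the documents table and the match_documents function
import os

from dotenv import load_dotenv
from supabase import create_client

load_dotenv()
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

dummy = [0.01] * 1536  # stand-in for a real embedding

# Insert a placeholder row
supabase.table("documents").insert({
    "content": "test row",
    "embedding": dummy,
    "url": "http://example.com",
    "title": "test"
}).execute()

# Query with the same vector; expect one match with similarity close to 1.0
result = supabase.rpc("match_documents", {
    "query_embedding": dummy,
    "match_threshold": 0.5,
    "match_count": 1
}).execute()
print(result.data)

# Clean up the placeholder row
supabase.table("documents").delete().eq("content", "test row").execute()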

Setting up a FastAPI server

We’ll create utility functions and set up server routes for the FastAPI server.

Creating utility functions

We’ll create utility functions to extract data from websites, interact with Tune Studio, and search documents in Supabase.

Create a new file named utils.py and add the following imports to it:

utils.py
import json
import os
import re
import gpt3_tokenizer
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse, urljoin
import time
from openai import OpenAI
from supabase import create_client, Client
from typing import List

Add the following functions below the imports in the utils.py file.

  1. The get_embedding function calls the OpenAI API to generate an embedding for the provided text.
utils.py
def get_embedding(query, model="text-embedding-ada-002"):
    client = OpenAI()
    query = query.replace("\n", " ")
    embedding = client.embeddings.create(input = [query], model=model).data[0].embedding
    return embedding
  2. The generate_embedding function generates an embedding for the given text using the OpenAI API and stores the result in the Supabase database.
utils.py
def generate_embedding(text, target_url, title):
    url: str = os.environ.get("SUPABASE_URL")
    key: str = os.environ.get("SUPABASE_KEY")
    supabase: Client = create_client(url, key)
    text = text.replace("\n", " ")
    embedding = get_embedding(text)
    supabase.table('documents').insert({
        "content": text,
        "embedding": embedding,
        "url": target_url,
        "title": title
    }).execute()
  3. The search_documents function searches the Supabase database for documents with embeddings similar to a given query.
utils.py
def search_documents(query, model="text-embedding-ada-002"):
    url: str = os.environ.get("SUPABASE_URL")
    key: str = os.environ.get("SUPABASE_KEY")
    supabase: Client = create_client(url, key)
    embedding = get_embedding(query, model)
    matches = supabase.rpc('match_documents',{
        "query_embedding" : embedding,
        "match_threshold" : 0.7,
        "match_count" : 6
    }).execute()
    return matches
  4. The clean_text function cleans raw text by removing unnecessary characters and spaces.
utils.py
def clean_text(text):
    # Remove extra newlines and spaces
    cleaned_text = re.sub(r'\n+', '\n', text)
    cleaned_text = re.sub(r'\s+', ' ', cleaned_text)
    # Remove wiki-specific text such as headings, links, categories, and special characters
    cleaned_text = re.sub(r'\[.*?\]', ' ', cleaned_text)  # Remove text within square brackets
    cleaned_text = re.sub(r'\{.*?\}', ' ', cleaned_text)  # Remove text within curly braces
    cleaned_text = re.sub(r'\(.*?\)', ' ', cleaned_text)  # Remove text within parentheses
    cleaned_text = re.sub(r'==.*?==', ' ', cleaned_text)  # Remove text within double equals
    # Remove special characters
    cleaned_text = re.sub(r'[\|•\t]', ' ', cleaned_text)
    return cleaned_text.strip()
  5. The extract_website_data function scrapes web content, cleans the text, and follows internal links to a specified depth.

Here’s how the extract_website_data function works:

  • Checks whether the maximum recursion level or timeout has been reached.
  • Uses requests to fetch the webpage and BeautifulSoup to parse the HTML content.
  • Cleans the extracted text using clean_text.
  • Generates embeddings for the cleaned text using generate_embedding.
  • Recursively follows all internal links found on the page, up to the specified depth.
utils.py
def extract_website_data(url, start_time=0, level=0, max_level=3, visited_urls=None, host=None):
    if visited_urls is None:
        visited_urls = set()
    if time.time() - start_time > 90:
        return []

    if host is None:
        host = urlparse(url).netloc

    if level > max_level or urlparse(url).netloc != host:
        return []

    if url in visited_urls:
        return []
    else:
        visited_urls.add(url)

    try:
        response = requests.get(url, timeout=20, headers={"User-Agent": "Mozilla/5.0"})
        data = []
        if response.status_code == 200:
            soup = BeautifulSoup(response.content, "html.parser")
            base_url = urlparse(url)._replace(path='', query='', fragment='').geturl()
            cleaned_text_data = clean_text(soup.get_text().strip())
            current_data = {"url": url, "text": cleaned_text_data}
            page_title = soup.title.string
            generate_embedding(cleaned_text_data, url, page_title)
            data.append(current_data)

            all_links = soup.find_all("a", href=True)
            for link in all_links:
                href = link.get("href")
                if href:
                    full_url = urljoin(base_url, href)
                    cleaned_url = urlparse(full_url)._replace(fragment='').geturl()
                    if cleaned_url not in visited_urls:
                        data.extend(extract_website_data(cleaned_url, start_time, level + 1, max_level, visited_urls, host))
            return data
        else:
            return []
    except requests.RequestException as e:
        print("Request to", url, "failed:", str(e))
        return []
    except Exception as e:
        print("An error occurred while processing", url, ":", str(e))
        return []
  6. The get_response_tunestudio function streams a response from the Tune Studio API using the Llama 3 model. A quick way to test these utilities locally is sketched after this list.
utils.py
async def get_response_tunestudio(prompt: str, matches: List[dict]):
    max_context_tokens = 1600
    context = ""
    for match in matches:
        if gpt3_tokenizer.count_tokens(match['content'] + context) < max_context_tokens:
            context = context + match['url'] + ":\n" +  match['content'] + "\n"

    system = "You are a very enthusiastic TuneAi representative, your goal is to assist people effectively! Using the provided sections from the documentation, craft your answers in markdown format. If the documentation doesn't clearly state the answer, or you are uncertain, please respond with \"Apologies, but I'm unable to provide assistance with that.\", do not mention documentation keywords in the response.\n\n"

    url = "https://proxy.tune.app/chat/completions"
    headers = {
        "Authorization": os.environ.get("TUNE_API_KEY"),
        "Content-Type": "application/json",
    }
    data = {
        "temperature": 0.2,
        "messages":  [{
            "role": "system",
            "content": system + context
        },{
            "role": "user",
            "content": prompt
        }],
        "model": "rohan/Meta-Llama-3-8B-Instruct",
        "stream": True,
        "max_tokens": 300
    }

    with requests.post(url, headers=headers, json=data, stream=True) as response:
        for line in response.iter_lines():
            decoded_chunk = line.decode().replace("data: ","")
            if decoded_chunk and decoded_chunk != "[DONE]":
                json_chunk = json.loads(decoded_chunk)
                yield json_chunk["choices"][0]["delta"].get("content","")
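
If you want to try these utilities before wiring up the server, a short script along the lines of the sketch below will run a similarity search and print the streamed answer. The file name test_utils.py and the example question are placeholders; the sketch assumes your .env is populated and that at least one document has already been embedded (for example, by calling extract_website_data directly).

test_utils.py
# Optional: exercise search_documents and get_response_tunestudio locally
import asyncio

from dotenv import load_dotenv

from utils import get_response_tunestudio, search_documents

load_dotenv()

async def main():
    question = "What is Tune Studio?"  # example query
    matches = search_documents(question)
    # Stream the generated answer chunk by chunk
    async for chunk in get_response_tunestudio(question, matches.data):
        print(chunk, end="", flush=True)
    print()

asyncio.run(main())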

Adding server routes

Now let’s set up the FastAPI server routes.

We’ll use python-dotenv to load the environment variables and import the utility functions we created. Create a main.py file in the root directory of your project and add the following code to it:

main.py
import time
from fastapi import FastAPI, Request
from dotenv import load_dotenv
from pydantic import BaseModel, Field
from fastapi.responses import StreamingResponse

# file we wrote
from utils import extract_website_data, get_response_tunestudio, search_documents

load_dotenv()  # Loads environment variables from .env file
app = FastAPI()

Next, define the Pydantic models for the request bodies:

main.py
class DocumentBody(BaseModel):
    url: str = Field(..., title="URL of the website to extract data from")

class SearchBody(BaseModel):
    query: str

Now you can add the API endpoints defined below to the main.py file.

  1. The POST /add_documents endpoint accepts a URL, processes the website to extract text, generates embeddings, and stores them in Supabase.
main.py
@app.post("/add_documents")
def add_documents(document: DocumentBody):
    url = document.url
    print("URL:", url)
    start_time = time.time()
    urls = extract_website_data(url, start_time)
    return {"urls_processed": len(urls), "time_taken": time.time() - start_time}
  2. The POST /prompt endpoint accepts a search query, retrieves relevant documents by embedding similarity, and streams a response generated by the Llama 3 model.
main.py
@app.post("/prompt")
def resolve_prompt(prompt: SearchBody):
    prompt = prompt.query
    search = search_documents(prompt)
    return StreamingResponse(get_response_tunestudio(prompt, search.data), media_type="text/event-stream")

Running the server

Add the following code to the bottom of the main.py file to use uvicorn to launch the server:

main.py
if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)

Start the server with the following command:

python main.py

Adding documents

To add document data, execute the following command:

curl -X POST http://localhost:8000/add_documents \
     -H "Content-Type: application/json" \
     -d '{"url":"http://tunehq.ai"}'

Let’s add some more data (if you see warnings about exceeding the maximum context length, you can safely ignore them):

for link in "https://news.ycombinator.com/" "https://example.com/"; do
    curl -X POST http://localhost:8000/add_documents \
         -H "Content-Type: application/json" \
         -d "{\"url\":\"$link\"}"
done
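
If you prefer Python to curl, the same requests can be sent with the requests library. The sketch below assumes the server is running locally on port 8000; the URL list is just an example.

add_docs.py
# Optional: a Python alternative to the curl commands above
import requests

for link in ["http://tunehq.ai", "https://news.ycombinator.com/", "https://example.com/"]:
    response = requests.post("http://localhost:8000/add_documents", json={"url": link})
    print(link, response.json())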

Querying the documents

You can query the database using a command like the following:

curl -X POST http://localhost:8000/prompt \
     -H "Content-Type: application/json" \
     -d '{"query":"Pricing of Tune Studio"}'

Find the complete code for this tutorial on the supabase-rag GitHub repository.