Vision Completion

Some models on Tune Studio support vision completions: you pass an image in the prompt, either as a URL or as base64-encoded data, and receive a completion based on the image. The table below lists some of these models, the tasks each one supports, and its Model ID where one is available.

Some of the Supported Models

| Model Name    | Tasks                                      | Model ID                 |
|---------------|--------------------------------------------|--------------------------|
| Pixtral 12B   | Image-Instruction Tasks                    | mistral/pixtral-12B-2409 |
| GPT-4o        | Image Generation, Image-Instruction Tasks  | openai/gpt-4o            |
| GPT-4o Mini   | Image Generation, Image-Instruction Tasks  | openai/gpt-4o-mini       |
| Claude 3.5    | Image Generation, Image-Instruction Tasks  | N/A                      |
| Claude 3      | Image Generation, Image-Instruction Tasks  | N/A                      |
| PaliGemma     | Image Generation, Image-Instruction Tasks  | N/A                      |
| Phi 3.5       | Image Generation, Image-Instruction Tasks  | N/A                      |
| Google Gemini | Image Generation, Image-Instruction Tasks  | N/A                      |
| Qwen-VL       | Image-Instruction Tasks                    | N/A                      |
| Groq LLaVA    | Image-Instruction Tasks                    | N/A                      |
| BakLLaVA      | Image-Instruction Tasks                    | N/A                      |

Bringing your Own VLM to Studio for Inference

We have a variety of self-hosted, free-to-use vision language models (VLMs) that can be called using their Model IDs. If you wish to bring any other model over to Tune Studio for inference, the example below shows how.

We will be bringing llava-v1.5-7b-4096-preview over from Groq:

  1. In Tune Studio, select New Model from Models, then select Groq Integration.
  2. Add the Model ID of LLaVA v1.5 from Groq Cloud.
  3. Use the newly hosted LLaVA v1.5 on Tune Studio through its Studio Model ID.

Interacting with Images

You can interact with vision-enabled models through the API or the Tune Studio Playground. The examples below show the API requests for each way of passing an image.

Image in URL

To pass an image as a URL, set the type of the content item to image_url and add the link to the image.

curl -X POST "https://proxy.tune.app/chat/completions" \
  -H "Authorization: <access key>" \
  -H "Content-Type: application/json" \
  -H "X-Org-Id: <organization id>" \
  -d '{
        "temperature": 0.9,
        "messages": [
          {
            "role": "user",
            "content": [
              {
                "type": "text",
                "text": "What do you see?"
              },
              {
                "type": "image_url",
                "image_url": {
                  "url": "https://chat.tune.app/icon-512.png"
                }
              }
            ]
          }
        ],
        "model": "MODEL_ID",
        "stream": false,
        "frequency_penalty": 0.2,
        "max_tokens": 200
    }'

Image in Base64 Format

To pass an image in base64 format, set the url field of the image_url item to a data URL containing the encoded image. In the request below, {base64_image} is a placeholder for the base64 data:

curl -X POST "https://proxy.tune.app/chat/completions" \
-H "Authorization: <access key>" \
-H "Content-Type: application/json" \
-H "X-Org-Id: <organization id>" \
-d '{
      "temperature": 0.9,
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "text": "What do you see?"
            },
            {
              "type": "image_url",
              "image_url": {
                "url": "data:image/jpeg;base64,{base64_image}"
              }
            }
          ]
        }
      ],
      "model": "MODEL_ID",
      "stream": false,
      "frequency_penalty": 0.2,
      "max_tokens": 200
  }'
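
The {base64_image} placeholder has to be filled with the image's base64 encoding on a single line. Below is a minimal shell sketch of one way to do that, assuming a base64 utility that reads from standard input (both the GNU coreutils and macOS versions do). The helper names make_data_url and send_image are hypothetical, and TUNE_API_KEY and TUNE_ORG_ID are assumed environment variables holding your access key and organization ID; none of these come from Tune Studio itself.

```shell
#!/bin/sh

# make_data_url FILE [MIME_TYPE]
# Prints a data URL for FILE. MIME_TYPE defaults to image/jpeg.
# (Hypothetical helper, not part of any Tune Studio tooling.)
make_data_url() {
  file=$1
  mime=${2:-image/jpeg}
  # base64 may wrap its output across lines; tr strips the newlines
  # so the encoded image is one continuous string.
  encoded=$(base64 < "$file" | tr -d '\n')
  printf 'data:%s;base64,%s' "$mime" "$encoded"
}

# send_image FILE MODEL_ID
# Sends FILE to the chat completions endpoint shown above.
# Assumes TUNE_API_KEY and TUNE_ORG_ID are set in the environment.
send_image() {
  data_url=$(make_data_url "$1")
  # The request body is read from stdin (-d @-) so that a large
  # base64 string does not have to fit on the command line.
  curl -X POST "https://proxy.tune.app/chat/completions" \
    -H "Authorization: $TUNE_API_KEY" \
    -H "Content-Type: application/json" \
    -H "X-Org-Id: $TUNE_ORG_ID" \
    -d @- <<EOF
{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What do you see?"},
        {"type": "image_url", "image_url": {"url": "$data_url"}}
      ]
    }
  ],
  "model": "$2",
  "stream": false,
  "max_tokens": 200
}
EOF
}
```

With these functions loaded into your shell, a call such as `send_image photo.jpg MODEL_ID` would send a local JPEG. For other formats, pass the matching MIME type to make_data_url, for example image/png.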

Generating Images on Tune Studio

Tune Studio doesn’t support image generation through the API. However, you can generate images directly in Tune Chat and Studio by deploying an image generation model as a Tune Assistant.

Here is how you can achieve that:

  1. Deploy an image generation model as an Assistant on Tune Studio.
  2. Turn on Image Generation for the Assistant and save the updates.
  3. Use the Playground or Tune Chat to generate images.