Vision Completion

Some models on Tune Studio support vision completions: you pass an image in the prompt, either as a URL or as base64 data, and receive a completion based on that image. The table below lists some of these models, the tasks each one supports (image generation and image-instruction tasks), and the Model ID where one is available.

Some of the Supported Models

Model Name	Tasks	Model ID
Pixtral 12B	Image-Instruction Tasks	mistral/pixtral-12B-2409
GPT-4o	Image Generation, Image-Instruction Tasks	openai/gpt-4o
GPT-4o Mini	Image Generation, Image-Instruction Tasks	openai/gpt-4o-mini
Claude 3.5	Image Generation, Image-Instruction Tasks	N/A
Claude 3	Image Generation, Image-Instruction Tasks	N/A
PaliGemma	Image Generation, Image-Instruction Tasks	N/A
Phi 3.5	Image Generation, Image-Instruction Tasks	N/A
Google Gemini	Image Generation, Image-Instruction Tasks	N/A
Qwen-VL	Image-Instruction Tasks	N/A
Groq LLaVA	Image-Instruction Tasks	N/A
BakLLaVA	Image-Instruction Tasks	N/A

Bringing your Own VLM to Studio for Inference

Tune Studio offers a variety of self-hosted, free-to-use Vision Language Models (VLMs) that can be called with their corresponding Model ID. If you want to bring another model to Tune Studio for inference, the following example shows how.

We will be bringing llava-v1.5-7b-4096-preview over from Groq:

Select New Model from Models on Tune Studio, then select Groq Integration.
Add the Model ID of LLaVA v1.5 from Groq Cloud.
Use the newly hosted LLaVA v1.5 on Tune Studio through its Studio Model ID.

Interacting with Images

You can interact with vision-enabled models through the API or the Tune Studio Playground. The following sections show the API request for each way of passing an image:

Image in URL

To pass an image as a URL, set the type of the content item to image_url and add the link to the image.

curl -X POST "https://proxy.tune.app/chat/completions" \
  -H "Authorization: <access key>" \
  -H "Content-Type: application/json" \
  -H "X-Org-Id: <organization id>" \
  -d '{
        "temperature": 0.9,
        "messages": [
          {
            "role": "user",
            "content": [
              {
                "type": "text",
                "text": "What do you see?"
              },
              {
                "type": "image_url",
                "image_url": {
                  "url": "https://chat.tune.app/icon-512.png"
                }
              }
            ]
          }
        ],
        "model": "MODEL_ID",
        "stream": false,
        "frequency_penalty": 0.2,
        "max_tokens": 200
    }'
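The same request body can also be built programmatically. The following is a minimal Python sketch using only the standard library, assuming the endpoint, headers, and parameters shown in the curl example above. The helper names build_vision_payload and post_chat_completion are hypothetical and introduced only for illustration, and the response is assumed to be a JSON body.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://proxy.tune.app/chat/completions"


def build_vision_payload(model_id, prompt, image_url, max_tokens=200):
    """Build the request body: one user message with a text part and an image part."""
    return {
        "temperature": 0.9,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "model": model_id,
        "stream": False,
        "frequency_penalty": 0.2,
        "max_tokens": max_tokens,
    }


def post_chat_completion(payload, access_key, org_id):
    """Hypothetical helper: send the payload with the same headers as the curl example."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": access_key,
            "Content-Type": "application/json",
            "X-Org-Id": org_id,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Assumption: the API answers with a JSON document.
        return json.loads(response.read().decode("utf-8"))
```

To use it, build a payload with your Model ID, a prompt, and the image link, then pass it to post_chat_completion together with your access key and organization ID.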

Image in Base64 Format

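To pass an image in base64 format, set the url field of the image_url item to a data URL that contains the base64-encoded image. In the following request, {base64_image} stands for the encoded image data: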

curl -X POST "https://proxy.tune.app/chat/completions" \
  -H "Authorization: <access key>" \
  -H "Content-Type: application/json" \
  -H "X-Org-Id: <organization id>" \
  -d '{
      "temperature": 0.9,
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "text": "What do you see?"
            },
            {
              "type": "image_url",
              "image_url": {
                "url": "data:image/jpeg;base64,{base64_image}"
              }
            }
          ]
        }
      ],
      "model": "MODEL_ID",
      "stream": false,
      "frequency_penalty": 0.2,
      "max_tokens": 200
  }'
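The {base64_image} placeholder in the request above stands for the base64-encoded bytes of the image file. A minimal Python sketch of producing such a data URL is shown below. The helper names image_bytes_to_data_url and image_file_to_data_url are hypothetical, and guessing the MIME type from the file extension with a fallback to image/jpeg is an assumption rather than documented Tune Studio behavior.

```python
import base64
import mimetypes


def image_bytes_to_data_url(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a data URL of the form data:<mime>;base64,<data>."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_file_to_data_url(path):
    """Read an image file and return it as a data URL."""
    # Guess the MIME type from the file extension (assumption: the extension is accurate).
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        mime_type = "image/jpeg"  # fallback matching the curl example
    with open(path, "rb") as image_file:
        return image_bytes_to_data_url(image_file.read(), mime_type)
```

The returned string goes in the url field of the image_url item, in place of the data:image/jpeg;base64,{base64_image} value.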

Generating Images on Tune Studio

Tune Studio doesn’t support image generation through the API. However, you can generate images directly in Tune Chat and Studio by deploying an image generation model as a Tune Assistant.

Here is how you can achieve that:

Deploy an image generation model as an Assistant on Tune Studio.
Turn on Image Generation for the Assistant and save the updates.
Use the Playground or Tune Chat to generate images.
