Tune Studio supports deploying a wide range of multimodal LLMs, including text and vision models.

Deploying

We have a large list of popular base models pre-configured to run with a single click: just provide a name for the model and it's deployed. If you want to deploy a custom model that is not in the library of base models, go to the Deploy section inside the Models tab and click More, which shows bucket (S3, GCS) and Hugging Face as options for deploying the model.

Deploying via Hugging Face

Note: If you’re deploying a private Hugging Face model, ensure that your Hugging Face key is integrated into the platform.

Below are the steps for deploying a text or vision model.

  1. Click on the Deploy Model button.
  2. In the “Get Started” section, select “Hugging Face” or choose from our list of supported models. You can deploy vision models such as “liuhaotian/llava-v1.6-vicuna-7b”.
  3. If you select Hugging Face, enter the Hugging Face repository URL. (This option appears only if you’ve chosen Hugging Face in the previous step.)
  4. Enter a name for the model, which will be used for the completion API.
  5. Select the desired GPU resources and count.
  6. Click on “Advanced” to configure optional settings like input_token_limit, auto_shutdown, and prompt_format.
  7. Finally, click “Deploy” to initiate the deployment process.

Deploying a custom vision model is only supported through the API, but some vision models included in the base model library can be deployed directly through the Studio UI (see the sketch below).
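
As a rough illustration, an API-based deployment might look like the following Python sketch. The endpoint URL, payload fields, and key shown here are hypothetical placeholders rather than the documented Tune Studio API; consult the API reference for the actual request schema.

```python
import requests

TUNE_API_KEY = "sk-tune-..."  # placeholder; use your Studio API key

# Hypothetical payload, for illustration only; the real field names
# may differ, so check the Tune Studio API reference.
payload = {
    "name": "my-llava-vision-model",             # name used by the completion API
    "source": "huggingface",                     # or "s3" / "gcs" for bucket deploys
    "repo": "liuhaotian/llava-v1.6-vicuna-7b",   # Hugging Face repository
    "gpu": "a100",                               # desired GPU type
    "gpu_count": 1,                              # desired GPU count
}

resp = requests.post(
    "https://studio.tune.app/v1/models/deploy",  # hypothetical endpoint URL
    headers={"Authorization": f"Bearer {TUNE_API_KEY}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```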

Model Deployment Process:

  1. After initiating the deployment, a model card will be created in the “Model” tab.
  2. It may take some time for the model to be fully deployed and ready for use.
  3. Once the model is ready, you can use it in the Tune.ai Playground or through the Chat Completion API (see the sketch below).
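
As a quick usage example, a chat completion request might look like the sketch below. It assumes an OpenAI-style chat completions endpoint; the base URL and auth header are placeholders, so copy the exact values for your deployment from Tune Studio.

```python
import requests

TUNE_API_KEY = "sk-tune-..."           # placeholder; use your Studio API key
MODEL_NAME = "my-llava-vision-model"   # the name you gave the model at deploy time

resp = requests.post(
    "https://proxy.tune.app/chat/completions",  # placeholder; copy the real URL from Studio
    headers={"Authorization": f"Bearer {TUNE_API_KEY}"},
    json={
        "model": MODEL_NAME,
        "messages": [
            {"role": "user", "content": "Describe what you can do."},
        ],
        "max_tokens": 256,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```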

Deploying via S3 or GCS

The full path to the model is required, e.g. gs://your-bucket/path/to/model for GCS and s3://your-bucket/path/to/model for S3.

Deployment States

When a model is deployed on Tune Studio, it goes through a series of states that indicate its deployment progress.

| State | Description |
| --- | --- |
| PROVISIONING | The model is being created on k8s. |
| ENV_SETUP | Running the init container. |
| AWAITING_STARTUP | Starting the model container. |
| LIVENESS_CHECK | Checking the liveness of the model. |
| READY | The model is ready for use. |
| FAILED | The model is in an error state. |
| TERMINATED | The model is terminated and can be restarted. |
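
If you script deployments, you will typically want to wait for READY before sending traffic. The sketch below polls the model state using a hypothetical status endpoint; the URL and the "state" response field are assumptions, while the state names come from the table above.

```python
import time

import requests

TUNE_API_KEY = "sk-tune-..."          # placeholder; use your Studio API key
MODEL_ID = "my-llava-vision-model"

def wait_until_terminal(timeout_s: int = 1800, poll_s: int = 15) -> str:
    """Poll the model state until it reaches READY, FAILED, or TERMINATED."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        resp = requests.get(
            f"https://studio.tune.app/v1/models/{MODEL_ID}",  # hypothetical URL
            headers={"Authorization": f"Bearer {TUNE_API_KEY}"},
            timeout=30,
        )
        resp.raise_for_status()
        state = resp.json()["state"]  # hypothetical response field
        if state in ("READY", "FAILED", "TERMINATED"):
            return state
        # Still PROVISIONING / ENV_SETUP / AWAITING_STARTUP / LIVENESS_CHECK.
        time.sleep(poll_s)
    raise TimeoutError("model did not reach a terminal state in time")

print(wait_until_terminal())
```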

Troubleshooting

Any model you deploy on Tune Studio outputs logs when starting up. You can view them by clicking the model card and going to Server Logs, which helps with debugging if a model is not starting up. An assortment of log levels and filter types lets you narrow down the output, and a search bar lets you search the logs by text.

The server logs section also shows a log of API calls and any other metric that you might see on an LLM server.

Metrics

Studio stores a host of server metrics that show how the deployed model has been used over time, with graphs and useful insights in the Server Metrics tab inside the model card.

The metrics include CPU usage, memory usage, file descriptors, garbage collection statistics, and other system-level information.

API Logs

Logging is turned off by default, so you need to go to Settings inside the model card and enable Log API Requests. Once that is done, your logs will start showing up in the API Logs tab, where you can use a variety of filters to search and sort the calls according to your needs.

You can download training data directly from the API Logs using the Download button.