Tune Chat is our web UI that lets you chat with different LLMs, including popular open source LLMs and your own custom LLMs.

You can use Tune Chat and Tune Studio separately or together. Models you add in Tune Studio also appear in Tune Chat.

Adding a custom model to Tune Chat

To add a custom model to Tune Chat, add it in Tune Studio. It then appears in the list of models you can select in Tune Chat.

Configuring models in Tune Chat

In Tune Chat you can configure the models you chat with directly from the UI. In the model list, choose the settings icon next to the model that you want to configure.

You’ll see options to configure the system prompt, temperature, frequency penalty, max tokens, and stop tokens.
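These settings correspond to common LLM sampling parameters. If you also call models programmatically, they map onto the parameters of a typical OpenAI-compatible chat completions request. The sketch below is illustrative only: the endpoint URL, API key, and model name are placeholders, not Tune-specific values.

```python
import requests

# Placeholder endpoint, key, and model name: substitute your own provider's values.
API_URL = "https://example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "my-custom-model",
    "messages": [
        # The system prompt guides every response in the conversation.
        {"role": "system", "content": "You are a concise assistant. Answer in plain English."},
        {"role": "user", "content": "Summarise the benefits of unit testing."},
    ],
    "temperature": 0.7,        # higher = more varied output
    "frequency_penalty": 0.5,  # penalise tokens that have already appeared
    "max_tokens": 200,         # upper bound on output length, in tokens
    "stop": ["\n\n"],          # stop generating when this sequence appears
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```

The settings described below behave the same way whether you set them in the Tune Chat UI or pass them as request parameters.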

System prompt

The system prompt guides the LLM’s output throughout the chat: it applies to every message in the conversation and takes precedence over instructions given in individual chat messages. You can use it to instruct the model to follow a specific style, avoid certain topics, or otherwise control the output.

Temperature

Temperature controls the model’s output distribution. Higher temperatures flatten the distribution towards uniform, so the model is more willing to pick less likely tokens and produces more varied, diverse generations.

If you want more creativity (for example, you are generating poetry), you might get better results with a higher temperature. If you want more consistency (for example, you are generating structured JSON), you might get better results with a lower temperature, but the output might be less interesting and more repetitive.
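To make “a more uniform distribution” concrete, here is a small, self-contained sketch of temperature applied to a softmax over candidate-token scores. The scores are made up for illustration; real models work over much larger vocabularies, but the effect is the same.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw token scores into probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    exps = [math.exp(x) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]

print(softmax_with_temperature(logits, 0.5))  # ~[0.86, 0.12, 0.02] -- sharply peaked
print(softmax_with_temperature(logits, 1.0))  # ~[0.66, 0.24, 0.10]
print(softmax_with_temperature(logits, 2.0))  # ~[0.50, 0.30, 0.19] -- closer to uniform
```

At low temperatures the most likely token dominates, so outputs are consistent; at high temperatures the probabilities even out, so sampling picks unusual tokens more often.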

Frequency penalty

Frequency penalty is sometimes confused with temperature, as both settings affect how much the model repeats itself. Frequency penalty specifically penalises tokens that have already appeared, discouraging the model from repeating words and phrases it has already generated.
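One common formulation (similar to the one OpenAI documents for its API) subtracts the penalty from a token’s score once for each time that token has already appeared in the output. The sketch below is a simplified illustration of that idea, not the exact implementation used by any particular model.

```python
from collections import Counter

def apply_frequency_penalty(logits, generated_tokens, penalty):
    """Lower each token's score in proportion to how often it has already
    appeared in the generated output (a common, simplified formulation)."""
    counts = Counter(generated_tokens)
    return {token: score - counts[token] * penalty for token, score in logits.items()}

# Made-up scores for a few candidate next tokens.
logits = {"the": 3.1, "cat": 2.4, "sat": 1.9}
generated = ["the", "cat", "sat", "on", "the"]  # "the" has appeared twice

print(apply_frequency_penalty(logits, generated, penalty=0.5))
# {'the': 2.1, 'cat': 1.9, 'sat': 1.4} -- repeated tokens become less likely
```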

Max tokens

Max tokens sets the maximum length of the model’s output. A token is similar to a word, but an output generally contains slightly fewer words than tokens: 100 tokens is roughly 75 words. If you set max tokens to 10, you’ll get very short outputs of about 7 or 8 words each, and responses may be cut off mid-sentence. A higher limit lets the model produce longer outputs, but they take longer to generate.
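Exact counts depend on the model’s tokenizer, but you can get a feel for the tokens-to-words ratio with a tokenizer library such as tiktoken. The encoding below (cl100k_base) is an assumption for illustration; your model may use a different tokenizer and give different counts.

```python
import tiktoken  # pip install tiktoken

encoding = tiktoken.get_encoding("cl100k_base")  # example encoding; varies by model

text = "Max tokens limits how long the model's reply can be."
tokens = encoding.encode(text)

print(len(text.split()), "words")  # 10 words
print(len(tokens), "tokens")       # typically a few more tokens than words
```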

Stop tokens

If you want the model to stop generating when it outputs a specific word or token, you can enter it here.
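For example, you might use a stop token to keep only the first part of a structured reply. The snippet below mimics the effect client-side; in practice the model stops generating as soon as it produces the stop sequence, and the sequence itself is typically not included in the returned text.

```python
def truncate_at_stop(text, stop_sequences):
    """Mimic a stop sequence by cutting the text at the first occurrence
    of any stop sequence (the sequence itself is dropped)."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]

reply = "Answer: 42\nExplanation: a long explanation follows..."
print(truncate_at_stop(reply, ["\nExplanation:"]))  # -> "Answer: 42"
```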