AI Tools
Ollama + Open WebUI
Inspired to set up this stack by the Reddit thread “The year is 2024, self hosted LLM is insane”.
Start the Ollama server via the Docker CLI: docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
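If the host has an NVIDIA GPU and the NVIDIA Container Toolkit is installed, the same container can be started with GPU access (optional variant; the CPU-only command above works without it):
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama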
Download models from the Ollama model library: https://ollama.com/library
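For example, to pull a model into the running container without starting an interactive chat (mistral:7b-instruct is just an example tag):
docker exec -it ollama ollama pull mistral:7b-instruct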
Multiple interface options
- Interact from CLI:
docker exec -it ollama ollama run mistral:7b-instruct
- ChatGPT-like interface: Open WebUI. Configuration:
- The first user to log in becomes the admin; this user configures signup for the rest of the users
- Use the cog icon to pull a model
- Modelfiles (“personalities”): https://openwebui.com/modelfiles
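To start Open WebUI itself with Docker against the local Ollama instance, a minimal sketch (the port mapping, volume name, and OLLAMA_BASE_URL value here are assumptions; check the Open WebUI docs for the current invocation):
docker run -d -p 3000:8080 -e OLLAMA_BASE_URL=http://host.docker.internal:11434 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main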
Access Ollama running on a firewalled machine
Special use case: Ollama runs on a machine that does not accept incoming connections, but that machine can connect to the host where the application that wants to use Ollama is running.
- Configure the SSH GatewayPorts option on the API consumer machine.
- Start a reverse SSH tunnel from Ollama’s host:
ssh -N -o 'ExitOnForwardFailure yes' -R *:11444:localhost:11434 <user>@<consumer-host>
- Use the endpoint localhost:11444 on the API consumer machine.
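To verify the tunnel from the API consumer machine, list the local models through the forwarded port (assumes the tunnel above is running):
curl http://localhost:11444/api/tags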
From man sshd_config, GatewayPorts:
Specifies whether remote hosts are allowed to connect to ports forwarded for the client. By default, sshd(8) binds remote port forwardings to the loopback address. This prevents other remote hosts from connecting to forwarded ports. GatewayPorts can be used to specify that sshd should allow remote port forwardings to bind to non-loopback addresses, thus allowing other hosts to connect. The argument may be no to force remote port forwardings to be available to the local host only, yes to force remote port forwardings to bind to the wildcard address, or clientspecified to allow the client to select the address to which the forwarding is bound. The default is no.
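A sketch of enabling this on the API consumer machine (assumes a systemd host; config path and service name can differ per distro, e.g. the unit is called ssh on Debian/Ubuntu). clientspecified lets the -R *:... form used above pick the bind address:
echo 'GatewayPorts clientspecified' | sudo tee -a /etc/ssh/sshd_config
sudo systemctl reload sshd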
Task to start port forwarding (execute on the machine hosting Ollama):
task app:ollama-port-forward
Access Ollama with an OpenAI-compatible API
Ollama’s OpenAI-Compatible API: https://www.reddit.com/r/LocalLLaMA/comments/1apvtwo/ollamas_openaicompatible_api_and_using_it_with/
As of v0.1.24, Ollama’s API endpoint is compatible with OpenAI’s API, i.e. any code that worked with the OpenAI API chat/completions will now work with your locally running ollama LLM by simply setting the api_base to http://localhost:11434/v1
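For example, the OpenAI-style chat completions route can be exercised directly with curl (assumes mistral:7b-instruct is already pulled):
curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "mistral:7b-instruct",
  "messages": [{"role": "user", "content": "Why is the sky blue?"}]
}'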
Alternative solution: LiteLLM proxy
Useful for providing a single endpoint to multiple services.
Example:
litellm --model mistral:7b-instruct --api_base https://ollama.example.com --temperature 0.6 --max_tokens 2048
Example - with Docker:
docker run -it --rm --name litellm-proxy -p 8888:8000 ghcr.io/berriai/litellm:main-latest --model mistral:7b-instruct --api_base https://ollama.example.com --temperature 0.6 --max_tokens 2048
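A quick check against the proxy from the Docker example above (port 8888 comes from that example's -p mapping; the exact route can vary between LiteLLM versions):
curl http://localhost:8888/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "mistral:7b-instruct",
  "messages": [{"role": "user", "content": "Hello"}]
}'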
Ollama API
https://github.com/ollama/ollama/blob/main/docs/api.md
Examples:
curl http://localhost:11434/api/generate -d '{
"model": "mistral:7b-instruct",
"prompt": "Why is the sky blue?"
}'
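The chat endpoint works the same way; with "stream": false the reply arrives as a single JSON object instead of a token stream:
curl http://localhost:11434/api/chat -d '{
  "model": "mistral:7b-instruct",
  "messages": [{"role": "user", "content": "Why is the sky blue?"}],
  "stream": false
}'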