How to Run LLMs on Ollama Using a GPU
Using NVIDIA GPUs as an example, this guide walks through the steps to run large language models on Ollama in GPU mode.
1 Install NVIDIA Container Toolkit
The following uses Ubuntu 22.04 as an example (for other distributions, refer to the official NVIDIA documentation).
- Configure the apt source
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
- Update the package index
sudo apt-get update
- Install the toolkit
sudo apt-get install -y nvidia-container-toolkit
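On most Docker installations, one more step is needed before the --gpus flag works: registering the NVIDIA runtime with Docker. The commands below follow the NVIDIA Container Toolkit documentation.
- Configure Docker to use the NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
- Restart the Docker daemon to pick up the new runtime
sudo systemctl restart docker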
2 Run Ollama Using the GPU
# Run the Ollama container in the background, giving it access to all available NVIDIA GPUs on the host;
# model data is persisted to /opt/ai/ollama on the host, and the API is exposed on port 11434
docker run --gpus all -d -v /opt/ai/ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
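Before downloading models, it is worth confirming that the container actually sees the GPU and that the service is up. A quick check (nvidia-smi is made available inside the container by the NVIDIA runtime, and Ollama answers a plain HTTP GET on its port):
# Verify that the GPU is visible inside the container
docker exec -it ollama nvidia-smi
# Verify that the Ollama service is responding (should print "Ollama is running")
curl http://localhost:11434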
3 Download Models Using Ollama
# Download and run a model interactively (qwen:7b is used as an example)
docker exec -it ollama ollama run qwen:7b
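If you only want to fetch a model without opening an interactive session, Ollama also provides pull and list subcommands (any model name from the Ollama library can be substituted for qwen:7b):
# Download a model without starting an interactive session
docker exec -it ollama ollama pull qwen:7b
# List the models available locally
docker exec -it ollama ollama list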
4 Add Ollama Models in MaxKB
Once the models have been downloaded and the model service is running, you can add the corresponding models in MaxKB and start using them.
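When registering a model in MaxKB, the Ollama API base URL is the host address plus the port mapped above, e.g. http://<host-ip>:11434, where <host-ip> is a placeholder for your server's address. You can confirm the endpoint is reachable from the MaxKB host and see which models it serves with Ollama's tags API:
# List the models the Ollama API exposes (run from the MaxKB host)
curl http://<host-ip>:11434/api/tags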