How to Run LLMs on Ollama Using a GPU

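Using an NVIDIA GPU as an example, this guide explains the steps to run large models on Ollama with GPU acceleration. Ollama runs in a Docker container here, so the host needs a working NVIDIA driver as well as the NVIDIA Container Toolkit, which lets containers access the GPU.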

1 Install the NVIDIA Container Toolkit

The commands below use Ubuntu 22.04 as an example. For other distributions, refer to the official NVIDIA Container Toolkit installation guide.

  • Configure the apt repository
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
    && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
        sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
        sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    
  • Update the package index
    sudo apt-get update
    
  • Install the toolkit
    sudo apt-get install -y nvidia-container-toolkit
    
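Before moving on, it can help to confirm that the host-side pieces are in place. The following is a minimal sketch of a shell helper (the name `check_gpu_prereqs` is hypothetical, not part of any NVIDIA tool) that reports whether `nvidia-smi` (installed with the NVIDIA driver), `nvidia-ctk` (installed along with the NVIDIA Container Toolkit) and `docker` are on the `PATH`. It only checks that the commands exist; it does not prove that GPU passthrough works.

```shell
#!/bin/sh
# check_gpu_prereqs (hypothetical helper): report which host-side tools are on PATH.
# Returns the number of missing commands, so 0 means everything was found.
check_gpu_prereqs() {
  missing=0
  for cmd in nvidia-smi nvidia-ctk docker; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found: $cmd"
    else
      echo "missing: $cmd"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Example: check_gpu_prereqs || echo "install the missing pieces first"
```

NVIDIA's installation guide also registers the toolkit with Docker by running `sudo nvidia-ctk runtime configure --runtime=docker` followed by `sudo systemctl restart docker`, and suggests `sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi` as a sample workload to confirm that containers can see the GPU.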

2 Run Ollama with GPU Access

# Run the Ollama container in the background (-d) with access to all NVIDIA GPUs on the host (--gpus all),
# keep downloaded models in /opt/ai/ollama on the host, and publish the Ollama API on port 11434
docker run --gpus all -d -v /opt/ai/ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
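The server inside the container may need a moment before it accepts requests. The root endpoint of the Ollama API replies with the text "Ollama is running" once the server is up, so a short polling loop is one way to wait for it. This is a minimal sketch: the helper name `wait_for_ollama` and its retry defaults are hypothetical, and it assumes `curl` is installed on the host.

```shell
#!/bin/sh
# wait_for_ollama (hypothetical helper): poll the Ollama API until it answers.
# $1: base URL (default: the port published by the docker run command above)
# $2: number of attempts, one second apart (default 30)
wait_for_ollama() {
  url="${1:-http://localhost:11434}"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # The root endpoint returns "Ollama is running" once the server is ready
    if curl -fsS "$url" 2>/dev/null | grep -q "Ollama is running"; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not reachable: $url" >&2
  return 1
}
```

For example, `wait_for_ollama http://localhost:11434 30` returns 0 as soon as the server responds and 1 if it is still unreachable after 30 attempts.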

3 Download Models Using Ollama

# Download qwen:7b from the Ollama library if it is not already present, then start an interactive session with it
docker exec -it ollama ollama run qwen:7b
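To confirm that the model is actually running on the GPU rather than falling back to the CPU, recent Ollama releases provide `ollama ps`, whose PROCESSOR column shows where each loaded model sits (for example `100% GPU`, or a CPU/GPU percentage pair when the model is split). The sketch below is a hypothetical helper, `model_placement`, that filters that output. It assumes the column layout printed by current releases and may need adjusting for other versions.

```shell
#!/bin/sh
# model_placement (hypothetical helper): read `ollama ps` output on stdin and print
# "<model> <share> <processor>" for each loaded model, e.g. "qwen:7b 100% GPU".
model_placement() {
  # NR > 1 skips the header row; then find the field that names the processor
  # (GPU, CPU, or CPU/GPU for a split load) and print it with the percentage before it.
  awk 'NR > 1 && NF > 0 {
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^(CPU|GPU|CPU\/GPU)$/) { print $1, $(i - 1), $i; break }
    }
  }'
}
```

While the model is loaded (Ollama keeps it in memory for a few minutes after the last request by default), run `docker exec ollama ollama ps | model_placement`. To download a model without opening an interactive session, `docker exec ollama ollama pull qwen:7b` can be used instead of `ollama run`.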

4 Add Ollama Models in MaxKB

Once the models are downloaded and the Ollama service is running, you can add the corresponding models in MaxKB as Ollama models and start using them.

Add Model
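MaxKB reaches Ollama over its HTTP API, so the API must be reachable from wherever MaxKB runs. If MaxKB itself runs in a Docker container, `localhost` inside that container refers to the MaxKB container, so use the Docker host's IP address instead. The sketch below (the helper name `list_ollama_models` is hypothetical) extracts model names from the response of Ollama's `GET /api/tags` endpoint, which lists the locally available models; these are the names Ollama expects when a model is requested. It uses `grep` and `sed` rather than a JSON parser, so it assumes Ollama's compact JSON output and is not a general-purpose parser.

```shell
#!/bin/sh
# list_ollama_models (hypothetical helper): print the model names found in a
# GET /api/tags response read from stdin, one per line.
list_ollama_models() {
  # Crude extraction without jq: keep every "name":"..." pair, then strip the key and quotes
  grep -o '"name":"[^"]*"' | sed 's/^"name":"//; s/"$//'
}
```

For example, `curl -s http://<host-ip>:11434/api/tags | list_ollama_models`, where `<host-ip>` is a placeholder for the Docker host's address, should print `qwen:7b` after step 3.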