
How to deploy Ollama on the QNAP NAS using Container Station


Last modified date: 2026-05-08

Applicable Products

QTS, QuTS hero

Container Station


Scenario

You want to run large language models (LLMs) locally on your QNAP NAS for private AI chat, code assistance, or document analysis without sending data to the cloud. Ollama is a popular, beginner-friendly inference engine for this purpose. This tutorial explains how to deploy Ollama on a QNAP NAS using Container Station.


System Requirements

Requirement | Detail
QNAP App | Container Station 3.x or later
NAS Architecture | x86_64 (Intel or AMD CPU). Only a few ARM-based models are supported.
Memory | At least 8 GB (for 3B models); at least 16 GB (for 7B models)
Storage Space | At least 20 GB of free storage space in addition to the model file size. Use an SSD volume if possible.
GPU (Optional) | Compatible NVIDIA GPUs (see the compatibility list). The GPU must be set to Container Station Mode in the Control Panel.

Warning
OOM (Out of Memory) Risk
Ollama will attempt to load the entire model into memory by default. If your NAS has only 8–16 GB of RAM, loading a 14B or larger model may exhaust system memory, causing NAS services to become unresponsive or the system to restart.

Data Loss Risk
If you do not mount a persistent volume for /root/.ollama, all downloaded models and configuration will be lost when the container is removed or recreated. Always follow the volume mounting instructions in this tutorial.

Best Practice
  • Check the model size against your available RAM in advance.
  • Set memory limits to cap container memory usage.
  • Start with small models (1B or 3B) and assess system stability before attempting larger models.
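
The first check above can be approximated with simple arithmetic. The following is a minimal sketch, not an official sizing formula: the 4.5 bits per weight (a rough figure for 4-bit quantization), the 1.5 GB allowance for context and runtime overhead, and the 4 GB reserved for NAS services are all illustrative assumptions, and the function names are hypothetical.

```python
def estimate_model_ram_gb(params_billion, bits_per_weight=4.5, overhead_gb=1.5):
    """Rough RAM needed to load a quantized model, in GB (10^9 bytes).

    bits_per_weight and overhead_gb are assumptions for illustration;
    real usage depends on the quantization format and context length.
    """
    # 1e9 parameters * (bits / 8) bytes per parameter = that many GB
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits_in_ram(params_billion, nas_ram_gb, reserve_gb=4.0, **estimate_kwargs):
    """True if the estimate leaves reserve_gb free for other NAS services."""
    needed = estimate_model_ram_gb(params_billion, **estimate_kwargs)
    return needed <= nas_ram_gb - reserve_gb
```

For example, fits_in_ram(3, 8) evaluates a 3B model against an 8 GB NAS. Treat the result as a first filter only, and still watch actual memory usage after loading the model.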

Procedure

Method 1: CPU-Only Deployment (No GPU Required)

  1. Create storage folders.

    Open File Station and create the following folder to store Ollama model data:
    /share/Container/ollama

    Screenshot: File Station — creating the Ollama folder under /share/Container/

    Best Practice
    If your NAS has an NVMe SSD cache or SSD volume, create this folder on the SSD. Models can load up to 10 times faster from an SSD than from HDDs.

  2. Create a Docker Compose file.

    In Container Station, go to Applications > Create. Name the application ollama and paste the following YAML:

    version: "3.8"
    
    services:
      ollama:
        image: ollama/ollama:latest
        container_name: ollama
        restart: unless-stopped
        ports:
          - "11434:11434"
        volumes:
          - /share/Container/ollama:/root/.ollama
        environment:
          - OLLAMA_HOST=0.0.0.0
          - OLLAMA_KEEP_ALIVE=10m
        networks:
          - ai-network
    
    networks:
      ai-network:
        name: ai-network
        driver: bridge

    Screenshot: Container Station — Application creation screen with YAML editor
    Screenshot: Container Station — Set the memory limit

  3. Deploy the container.

    Click Create. Container Station will pull the Ollama image and start the container. Wait until the status shows Running.

    Screenshot: Container Station — ollama container showing "Running" status

  4. Pull your first model.

    Open the container's Terminal (or SSH into your NAS and use docker exec to enter the container) and run:

    # For a lightweight 3B model (recommended for first test):
    ollama pull llama3.2:3b
    
    # For a standard 9B model (requires 16 GB+ RAM):
    ollama pull qwen3.5:9b

    The download may take several minutes depending on your internet speed. A 9B Q4_K_M model is approximately 4–7 GB.

    Note
    • Verify you have sufficient disk space before pulling. Use ollama list to check existing models and their sizes.
    • For ARM-based NAS models, we recommend starting with a model smaller than 1B parameters and monitoring memory usage.

  5. Test the model.

    In the container terminal, run:

    ollama run qwen3.5:9b

    Type a prompt and confirm that you receive a response. Type /bye to exit.
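
The disk space check recommended in step 4 can also be scripted. The sketch below assumes Python 3 is available on a system that can see the model folder (it is not part of the Ollama image), and it reuses the 20 GB reserve from the System Requirements as its default. The function names are hypothetical.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def enough_space(free_bytes, model_gb, reserve_gb=20):
    """True if free space covers the model plus the extra reserve."""
    return free_bytes >= (model_gb + reserve_gb) * GIB


def check_volume(path, model_gb, reserve_gb=20):
    """Read free space on the volume holding `path` and apply the check."""
    free_bytes = shutil.disk_usage(path).free
    return enough_space(free_bytes, model_gb, reserve_gb)
```

For example, check_volume("/share/Container/ollama", 7) checks whether a model of about 7 GB fits while leaving the recommended reserve.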


Method 2: NVIDIA GPU-Accelerated Deployment

Note

Additional prerequisites for GPU Mode:

  • NVIDIA GPU installed and detected by QTS/QuTS hero
  • GPU set to Container Station Mode in the Control Panel
  • NVIDIA GPU Driver and NvKernelDriver installed from App Center

  1. Use the GPU-enabled Docker Compose configuration.

    Replace the YAML from Method 1 with the following:

    version: "3.8"
    
    services:
      ollama:
        image: ollama/ollama:latest
        container_name: ollama
        restart: unless-stopped
        ports:
          - "11434:11434"
        volumes:
          - /share/Container/ollama:/root/.ollama
        environment:
          - OLLAMA_HOST=0.0.0.0
          - OLLAMA_KEEP_ALIVE=10m
          - NVIDIA_VISIBLE_DEVICES=all
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all
                  capabilities: [gpu]
        networks:
          - ai-network
    
    networks:
      ai-network:
        name: ai-network
        driver: bridge

    Screenshot: Container Station — GPU-enabled Docker Compose YAML

    Note
    QNAP's bundled NVIDIA drivers may be older than the latest release. If the container fails to start with GPU enabled, check the driver version with nvidia-smi on the host and ensure it is compatible with the Ollama image version.

  2. Deploy and verify GPU access.

    After the container starts, open its terminal and run:

    nvidia-smi

    You should see your GPU model, driver version, and memory information.

  3. Pull a model and confirm GPU acceleration.

    ollama pull qwen3.5:9b
    ollama run qwen3.5:9b

    While the model is generating a response, open another terminal and run nvidia-smi. You should observe GPU memory usage and GPU utilization increasing. 
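
To compare GPU usage before and during generation without reading the full nvidia-smi table, the query mode of nvidia-smi can be parsed by a script. This is a minimal sketch that assumes Python 3 is available wherever nvidia-smi runs, which the stock Ollama image does not guarantee. The function names are hypothetical.

```python
import subprocess

# nvidia-smi query mode: one CSV row per GPU, no header, no unit suffixes
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.used,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Parse rows of 'name, memory.used (MiB), utilization.gpu (%)'."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split from the right so a comma in the GPU name cannot break parsing
        name, mem_used, util = [f.strip() for f in line.rsplit(",", 2)]
        # int() raises ValueError if the driver reports a field as unavailable
        rows.append({
            "name": name,
            "memory_used_mib": int(mem_used),
            "utilization_pct": int(util),
        })
    return rows


def read_gpus():
    """Run nvidia-smi and return one dict per detected GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

Calling read_gpus() once while idle and once during generation should show memory_used_mib and utilization_pct rising if the model is running on the GPU.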

Result

After completing this tutorial, you will have:

  • Ollama running on your QNAP NAS at http://<NAS-IP>:11434
  • Model data persisted in /share/Container/ollama (survives container rebuilds)
  • A working LLM accessible via the Ollama API

You can test the API from any device on your local network:

curl http://<NAS-IP>:11434/api/generate -d '{
  "model": "qwen3.5:9b",
  "prompt": "Hello, how are you?",
  "stream": false
}'
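
The same request can be sent from a script. The sketch below uses only the Python standard library and mirrors the curl command above. The function names and the 300-second timeout are illustrative choices, and the base URL is a placeholder you must replace with your NAS address.

```python
import json
import urllib.request


def build_generate_request(base_url, model, prompt):
    """Build a non-streaming POST request for the /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, model, prompt, timeout=300):
    """Send the prompt and return the generated text."""
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.loads(resp.read())
    # With "stream": false, the answer arrives in the "response" field
    return body["response"]
```

For example, generate("http://<NAS-IP>:11434", "qwen3.5:9b", "Hello, how are you?") returns the model's reply as a string. The long timeout allows for slow CPU-only inference.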

Important
The OLLAMA_HOST=0.0.0.0 setting exposes the Ollama API on all network interfaces. Do not expose port 11434 to the internet. Use firewall rules or QNAP's network settings to restrict access to your local network only.

Troubleshooting

Container exits immediately after starting

This may be caused by insufficient RAM or a GPU driver mismatch. Check the container logs in Container Station, then loosen the container's memory limit or disable GPU mode to isolate the cause.

Model pull fails midway 

This may result from insufficient disk space or a network timeout. Free up storage space if needed, then re-run ollama pull; the download resumes from where it stopped.

Response speed is very slow (1–3 tokens per second)

The model may be running on CPU instead of GPU, or the model is too large for your RAM. Verify your GPU access with nvidia-smi inside the container. Try to use a smaller model.
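
To measure generation speed rather than estimate it, you can use the timing fields that a non-streaming /api/generate response includes: eval_count (tokens generated) and eval_duration (in nanoseconds). The sketch below assumes you have already parsed the JSON response into a dictionary. The function name is hypothetical.

```python
def tokens_per_second(response):
    """Generation speed from a parsed non-streaming /api/generate response."""
    eval_count = response["eval_count"]            # tokens generated
    eval_duration_ns = response["eval_duration"]   # nanoseconds spent generating
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)
```

A result in the 1–3 range matches the symptom described here. Comparing the figure before and after a change (smaller model, GPU enabled) shows whether the change helped.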

The NAS becomes unresponsive during inference

This is typically an out-of-memory issue: the model is consuming all the system memory. Restart the NAS, then set a memory limit for the application or switch to a smaller model.

"Cannot connect to Ollama" message from Open WebUI

This may be caused by a wrong API URL or Docker network isolation. If Open WebUI runs on the same Docker network as Ollama (ai-network in this tutorial), set the API URL to http://ollama:11434. Otherwise, use http://<NAS-IP>:11434.
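
To find out which URL actually reaches Ollama, you can probe the /api/tags endpoint, which lists the locally available models. The sketch below uses only the Python standard library. The function names and the 5-second timeout are illustrative assumptions, and it must run from the same place as the client that fails to connect for the result to be meaningful.

```python
import json
import urllib.error
import urllib.request


def model_names(tags_json):
    """Extract model names from a parsed /api/tags response."""
    return [model["name"] for model in tags_json.get("models", [])]


def probe(base_url, timeout=5):
    """Return the model list if Ollama answers at base_url, else None."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return model_names(json.loads(resp.read()))
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: not reachable here
        return None
```

For example, try probe("http://ollama:11434") and probe("http://<NAS-IP>:11434"). The one that returns a list instead of None is the URL to configure.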

