How to deploy Ollama on the QNAP NAS using Container Station

Last modified date: 2026-05-08

Applicable Products

QTS, QuTS hero

Container Station

Scenario

You want to run large language models (LLMs) locally on your QNAP NAS for private AI chat, code assistance, or document analysis without sending data to the cloud. Ollama is a popular, beginner-friendly inference engine for this purpose. This tutorial explains how to deploy Ollama on the QNAP NAS using Container Station.

System Requirements

Requirement	Detail
QNAP App	Container Station 3.x or later
NAS Architecture	x86_64 (Intel or AMD CPU). Only a few ARM-based NAS models are supported.
Memory	At least 8 GB (for 3B models); at least 16 GB (for 7B models)
Storage Space	At least 20 GB of extra free storage space in addition to the model file size. Use an SSD volume if possible.
GPU (Optional)	Compatible NVIDIA GPU (see the compatibility list). The GPU must be set to Container Station Mode in the Control Panel.

Warning

OOM (Out of Memory) Risk
Ollama will attempt to load the entire model into memory by default. If your NAS has only 8–16 GB of RAM, loading a 14B or larger model may exhaust system memory, causing NAS services to become unresponsive or the system to restart.

Data Loss Risk
If you do not mount a persistent volume for /root/.ollama, all downloaded models and configuration will be lost when the container is removed or recreated. Always follow the volume mounting instructions in this tutorial.

Best Practice

Check the model size against your available RAM in advance.
Set memory limits to cap container memory usage.
Start with small models (1B or 3B) and assess system stability before attempting larger models.
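
The first practice can be approximated with a quick check before pulling a model. The sketch below is a rough heuristic, not an Ollama feature: the function names, the 1.2 overhead factor, and the 2 GiB reserve for NAS services are assumptions chosen for illustration. It reads MemAvailable from /proc/meminfo, which should be present on Linux-based systems such as QTS.

```python
def parse_meminfo_gib(text, key="MemAvailable"):
    """Return the value of `key` from /proc/meminfo-style text, in GiB."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == key:
            kib = int(rest.split()[0])  # /proc/meminfo reports values in kB (KiB)
            return kib / (1024 * 1024)
    raise KeyError(f"{key} not found")


def model_fits(model_file_gib, available_gib, overhead=1.2, reserve_gib=2.0):
    """Heuristic: model size times an overhead factor (context cache,
    runtime buffers) must fit in available RAM minus a reserve for NAS
    services. Both defaults are illustrative assumptions."""
    needed = model_file_gib * overhead
    return needed <= available_gib - reserve_gib


def check_host(model_file_gib, meminfo_path="/proc/meminfo"):
    """Run the heuristic against the live host (Linux only)."""
    with open(meminfo_path) as f:
        available = parse_meminfo_gib(f.read())
    return model_fits(model_file_gib, available)
```

For example, check_host(2.0) estimates whether a model file of roughly 2 GiB fits on the current host. Treat a True result as a hint only and still watch system stability after loading the model.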

Procedure

Method 1: CPU-Only Deployment (No GPU Required)

Create storage folders.
Open File Station and create the following folder to store Ollama model data:
/share/Container/ollama
Screenshot: File Station — creating the Ollama folder under /share/Container/
Best Practice
If your NAS has an NVMe SSD cache or SSD volume, create this folder on the SSD. Loading models from an SSD can be up to 10 times faster than loading from HDDs.

Create a Docker Compose file.

In Container Station, go to Applications > Create. Name the application ollama and paste the following YAML:

version: "3.8"

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - /share/Container/ollama:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_KEEP_ALIVE=10m
    networks:
      - ai-network

networks:
  ai-network:
    name: ai-network
    driver: bridge

Screenshot: Container Station — Application creation screen with YAML editor
Screenshot: Container Station — Set the memory limit

Deploy the container.
Click Create. Container Station will pull the Ollama image and start the container. Wait until the status shows Running.
Screenshot: Container Station — ollama container showing "Running" status
Pull your first model.
Open the container's Terminal (or SSH into your NAS and use docker exec to open a shell in the container) and run:
```
# For a lightweight 3B model (recommended for first test):
ollama pull llama3.2:3b

# For a standard 9B model (requires 16 GB+ RAM):
ollama pull qwen3.5:9b
```
The download may take several minutes depending on your internet speed. A 9B Q4_K_M model is approximately 4–7 GB.
Note
- Verify you have sufficient disk space before pulling. Use ollama list to check existing models and their sizes.
- For ARM-based NAS models, we recommend starting with a model smaller than 1B parameters and monitoring memory usage.
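
The disk-space check from the note above can also be scripted. This is a minimal sketch, assuming Python 3 is available wherever you run it. The function names are hypothetical, the 20 GB margin comes from the System Requirements table, and the 4.8 bits-per-weight figure for Q4_K_M is an approximation rather than an exact value.

```python
import shutil

GB = 1000 ** 3


def estimate_model_gb(params_billion, bits_per_weight=4.8):
    """Rough download size: parameters x bits per weight / 8.
    4.8 bits is an assumed average for Q4_K_M quantization."""
    return params_billion * 1e9 * bits_per_weight / 8 / GB


def has_room_for(path, model_gb, margin_gb=20.0):
    """True if the volume holding `path` has the model size plus margin free."""
    free_gb = shutil.disk_usage(path).free / GB
    return free_gb >= model_gb + margin_gb
```

With these assumptions, a 9B model comes out at about 5.4 GB, inside the 4–7 GB range quoted above. A call such as has_room_for("/share/Container/ollama", estimate_model_gb(9)) then checks the volume used in this tutorial.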
Test the model.
In the container terminal, run:
```
ollama run qwen3.5:9b
```
Type a prompt and confirm that you receive a response. Type /bye to exit.
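
To confirm from another machine that the server is reachable, you can poll the API port instead of watching Container Station. The sketch below assumes the Ollama server answers a plain GET on its root URL with HTTP 200 when it is running. The function names and retry defaults are hypothetical.

```python
import time
import urllib.error
import urllib.request


def is_ollama_up(base_url, timeout=3.0):
    """Return True if the server answers HTTP 200 at its root URL."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_up(base_url, attempts=10, delay=2.0, probe=is_ollama_up):
    """Poll until the server responds or the attempts run out."""
    for _ in range(attempts):
        if probe(base_url):
            return True
        time.sleep(delay)
    return False
```

Call wait_until_up with http://<NAS-IP>:11434, replacing the placeholder with your NAS address. The probe parameter exists only so the polling logic can be exercised without a live server.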

Method 2: NVIDIA GPU-Accelerated Deployment

Note

Additional prerequisites for GPU Mode:

NVIDIA GPU installed and detected by QTS/QuTS hero
GPU set to Container Station Mode in the Control Panel
NVIDIA GPU Driver and NvKernelDriver installed from App Center

Use the GPU-enabled Docker Compose configuration.

Replace the YAML from Method 1 with the following:

version: "3.8"

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - /share/Container/ollama:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_KEEP_ALIVE=10m
      - NVIDIA_VISIBLE_DEVICES=all
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    networks:
      - ai-network

networks:
  ai-network:
    name: ai-network
    driver: bridge

Screenshot: Container Station — GPU-enabled Docker Compose YAML

Note

QNAP's bundled NVIDIA drivers may be older than the latest release. If the container fails to start with GPU enabled, check the driver version with nvidia-smi on the host and ensure it is compatible with the Ollama image version.

Deploy and verify GPU access.
After the container starts, open its terminal and run:
```
nvidia-smi
```
You should see your GPU model, driver version, and memory information.
Pull a model and confirm GPU acceleration.
```
ollama pull qwen3.5:9b
ollama run qwen3.5:9b
```
While the model is generating a response, open another terminal and run nvidia-smi. You should observe GPU memory usage and GPU utilization increasing.
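
Besides watching nvidia-smi, you can ask Ollama how much of a loaded model resides in GPU memory. The sketch below assumes the /api/ps endpoint returns a models list whose entries carry size and size_vram fields in bytes, as described in the Ollama API documentation. The helper names are hypothetical.

```python
import json
import urllib.request


def gpu_fraction(model_entry):
    """Share of a loaded model held in VRAM (0.0 = CPU only, 1.0 = fully on GPU)."""
    size = model_entry.get("size", 0)
    if size <= 0:
        return 0.0
    return model_entry.get("size_vram", 0) / size


def loaded_models(base_url, timeout=5.0):
    """Fetch the currently loaded models from <base_url>/api/ps."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=timeout) as resp:
        return json.load(resp).get("models", [])


def report(models):
    """Map each model name to its GPU share, rounded to two decimals."""
    return {m.get("name", "?"): round(gpu_fraction(m), 2) for m in models}
```

Running report(loaded_models("http://<NAS-IP>:11434")) while a model is loaded should show a value near 1.0 when the model fits entirely in GPU memory. A value of 0.0 suggests it is running on the CPU.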

Result

After completing this tutorial, you will have:

Ollama running on your QNAP NAS at http://<NAS-IP>:11434
Model data persisted in /share/Container/ollama (survives container rebuilds)
A working LLM accessible via the Ollama API

You can test the API from any device on your local network:

curl http://<NAS-IP>:11434/api/generate -d '{
  "model": "qwen3.5:9b",
  "prompt": "Hello, how are you?",
  "stream": false
}'
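
The same request can be sent from Python using only the standard library. This is a minimal sketch mirroring the curl call above. The function names are hypothetical, and it assumes the non-streaming reply is a JSON object whose response field holds the generated text.

```python
import json
import urllib.request


def build_generate_request(base_url, model, prompt):
    """Build a non-streaming POST request for <base_url>/api/generate."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, model, prompt, timeout=300.0):
    """Send the request and return the generated text."""
    req = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

For example, generate("http://<NAS-IP>:11434", "qwen3.5:9b", "Hello, how are you?") returns the reply as a string. The long default timeout allows for the model being loaded on first use.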

Important

The OLLAMA_HOST=0.0.0.0 setting exposes the Ollama API on all network interfaces. Do not expose port 11434 to the internet. Use firewall rules or QNAP's network settings to restrict access to your local network only.

Troubleshooting

Container exits immediately after starting

This may be caused by insufficient RAM or a GPU driver mismatch. Check the container logs in Container Station, then relax the container's memory limit or disable GPU mode.

Model pull fails midway

This may result from insufficient disk space or a network timeout. Free up storage space if needed, then re-run ollama pull; the download resumes from where it stopped.

Response speed is very slow (1–3 tokens per second)

The model may be running on CPU instead of GPU, or the model is too large for your RAM. Verify your GPU access with nvidia-smi inside the container. Try to use a smaller model.
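
To measure generation speed rather than estimate it by eye, you can compute it from the statistics in an /api/generate reply. The sketch below assumes the reply contains eval_count (generated tokens) and eval_duration (nanoseconds), as described in the Ollama API documentation. The function names and the 3 tokens-per-second threshold are illustrative.

```python
def tokens_per_second(response):
    """Generation speed from an /api/generate response dict.
    eval_duration is reported in nanoseconds."""
    count = response.get("eval_count", 0)
    duration_ns = response.get("eval_duration", 0)
    if duration_ns <= 0:
        return 0.0
    return count / (duration_ns / 1e9)


def is_slow(response, threshold=3.0):
    """Flag speeds at or below the assumed threshold (tokens per second)."""
    return tokens_per_second(response) <= threshold
```

A reply with 50 generated tokens over 25 seconds works out to 2 tokens per second, which falls in the slow range described above.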

The NAS becomes unresponsive during inference

This is typically an out-of-memory condition: the model is consuming all available system memory. Restart the NAS, then set a memory limit for the application or switch to a smaller model.

"Cannot connect to Ollama" message from Open WebUI

This may be caused by an incorrect API URL or Docker network isolation. If Open WebUI runs on the same Docker network as Ollama (ai-network in this tutorial), use http://ollama:11434 as the API URL. Otherwise, use http://<NAS-IP>:11434.
