ollama LXC

Overview

Property    Value
Hostname    ollama
IP Address  192.168.0.231
VMID        108
OS          Ubuntu 24.04.3 LTS (Noble Numbat)
Kernel      6.17.4-1-pve
CPU         4 cores
RAM         8 GB
Swap        512 MB
Disk        35 GB (local-lvm, 77% used)
Purpose     Local LLM inference server

Running Services

Service          Description
ollama.service   Ollama LLM server
ssh.service      OpenSSH server
rsyslog.service  System logging

Open Ports

Port   Protocol  Service
22     TCP       SSH
11434  TCP       Ollama API (OLLAMA_HOST=0.0.0.0)

Ollama

Version: 0.15.5
Binary: /usr/local/bin/ollama

Installed Models

Model        Size
llama3.1:8b  4.9 GB

Service Configuration

Setting                Value    Description
OLLAMA_HOST            0.0.0.0  Listen on all interfaces (network-accessible)
OLLAMA_INTEL_GPU       true     Enable Intel GPU acceleration
OLLAMA_NUM_GPU         999      Use all available GPU layers
OLLAMA_ORIGINS         *        Allow requests from any origin (CORS)
SYCL_CACHE_PERSISTENT  1        Persistent SYCL kernel cache for Intel GPU
ZES_ENABLE_SYSMAN      1        Enable Intel oneAPI system management
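
The settings above could be applied as a systemd drop-in for ollama.service. The following is a minimal sketch under that assumption: this page does not say where the variables are actually set, and the function name write_ollama_dropin and the override.conf filename are illustrative. The target directory is a parameter so the function can be tried against a scratch directory first.

```shell
#!/bin/sh
# Write a systemd drop-in holding the six settings from the table above.
# Assumption: the variables are managed through a drop-in file; the
# function name and file name are hypothetical, not taken from this host.
write_ollama_dropin() {
  dropin_dir="$1"
  mkdir -p "$dropin_dir" || return 1
  cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_INTEL_GPU=true"
Environment="OLLAMA_NUM_GPU=999"
Environment="OLLAMA_ORIGINS=*"
Environment="SYCL_CACHE_PERSISTENT=1"
Environment="ZES_ENABLE_SYSMAN=1"
EOF
}

# Example (as root, inside the container):
#   write_ollama_dropin /etc/systemd/system/ollama.service.d
#   systemctl daemon-reload && systemctl restart ollama.service
```

After a restart, `systemctl show ollama.service --property=Environment` prints the environment systemd passes to the service, which is a quick way to confirm the values took effect.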

Intel GPU Passthrough

The LXC has /dev/dri/card0 and /dev/dri/renderD128 passed through from the Proxmox host, enabling Intel GPU acceleration for inference. Ollama uses the Intel oneAPI SYCL backend.

/dev/dri/card0       - Intel graphics card
/dev/dri/renderD128  - Intel render device (used for GPU compute)
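
A quick check from inside the container can confirm that both device nodes are visible. This is a small sketch rather than part of the documented setup; the function name check_dri_nodes is illustrative, and the directory argument exists only so the check can be exercised against a scratch directory.

```shell
#!/bin/sh
# Report whether the two passed-through DRI nodes exist.
# Returns 0 when both are present and 1 when either is missing.
# check_dri_nodes is a hypothetical helper name.
check_dri_nodes() {
  dri_dir="${1:-/dev/dri}"
  missing=0
  for node in card0 renderD128; do
    if [ -e "$dri_dir/$node" ]; then
      echo "ok: $dri_dir/$node"
    else
      echo "missing: $dri_dir/$node"
      missing=1
    fi
  done
  return "$missing"
}

# Example (inside the container):
#   check_dri_nodes || echo "GPU passthrough looks incomplete"
```

The check only tests that the nodes exist. It does not verify that the ollama service user has permission to open them.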

API Usage

The Ollama API is available at http://192.168.0.231:11434 from the local network.

Example:

curl http://192.168.0.231:11434/api/generate \
  -d '{"model": "llama3.1:8b", "prompt": "Hello"}'
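
By default /api/generate streams its answer as a series of JSON objects, one per line. For scripting, a single JSON object is often easier to handle, which the API returns when the request sets "stream": false. The sketch below wraps that in small helpers. The function names and the OLLAMA_URL variable are illustrative, and the payload builder does not escape quotes or backslashes, so it only suits simple prompts.

```shell
#!/bin/sh
# Base URL of this LXC's Ollama API; override by exporting OLLAMA_URL.
OLLAMA_URL="${OLLAMA_URL:-http://192.168.0.231:11434}"

# Build a non-streaming request body for /api/generate.
# Note: no JSON escaping is done, so avoid quotes in the prompt.
build_generate_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

# Send one prompt and print the raw JSON reply (the generated text is in
# its "response" field).
ollama_generate() {
  curl -s "$OLLAMA_URL/api/generate" -d "$(build_generate_payload "$1" "$2")"
}

# List the models installed on the server.
ollama_list_models() {
  curl -s "$OLLAMA_URL/api/tags"
}

# Example:
#   ollama_generate llama3.1:8b "Hello"
```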

Two Ollama Instances

This homelab runs two separate Ollama endpoints:

Instance            Address              GPU                      Use case
Nobara workstation  192.168.0.100:11434  RTX 2060 Super           Heavy/fast inference (Karakeep AI tagging)
This LXC            192.168.0.231:11434  Intel integrated (SYCL)  Lightweight / secondary endpoint
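
With two endpoints, a client script can prefer the workstation and fall back to this LXC when the workstation is unreachable. The sketch below is one possible approach, not something the homelab is documented to run. The function names and the OLLAMA_PROBE override are illustrative; the probe uses GET /api/tags, which answers on any running Ollama server.

```shell
#!/bin/sh
# Probe one endpoint; succeeds if the Ollama API answers within 2 seconds.
probe_ollama() {
  curl -fsS --max-time 2 -o /dev/null "http://$1/api/tags"
}

# Print the first endpoint (in argument order) whose probe succeeds.
# OLLAMA_PROBE may name another probe function, e.g. for a dry run.
# Both names are hypothetical helpers.
pick_endpoint() {
  probe="${OLLAMA_PROBE:-probe_ollama}"
  for ep in "$@"; do
    if "$probe" "$ep"; then
      echo "$ep"
      return 0
    fi
  done
  return 1
}

# Example: prefer the workstation, fall back to this LXC.
#   ep="$(pick_endpoint 192.168.0.100:11434 192.168.0.231:11434)" \
#     && curl "http://$ep/api/generate" -d '{"model": "llama3.1:8b", "prompt": "Hello"}'
```

The example assumes the requested model is installed on whichever endpoint is chosen. This page only lists the models installed on this LXC.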

Lessons Learned

  • Ubuntu instead of Debian: This LXC runs Ubuntu 24.04 rather than Debian 12. Ubuntu's broader package support for recent hardware made it easier to set up the Intel GPU drivers and oneAPI toolkits.
  • Intel GPU acceleration in an LXC: Requires passing through /dev/dri/card0 and /dev/dri/renderD128 in the Proxmox config, plus the SYCL_CACHE_PERSISTENT and ZES_ENABLE_SYSMAN environment variables for the Intel SYCL backend. Without these, Ollama falls back to CPU-only inference.
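  • Disk usage at 77%: The one installed model takes 4.9 GB of the 35 GB disk, and at 77% used roughly 8 GB remains free. Each additional 7B model requires ~4-5 GB, so there is room for one more medium-sized model, or two only if both sit near 4 GB and the disk is allowed to fill almost completely.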
  • OLLAMA_ORIGINS=* is permissive: Allowing all CORS origins is convenient for local development but means any page loaded in a browser on the LAN can make requests to the Ollama API. Acceptable for a homelab but worth noting.
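
The disk-usage estimate in the list above can be checked with a little integer arithmetic. The sketch below works in tenths of a GB to stay within shell integer math. The helper names are illustrative, and the inputs are the 35 GB disk size and 77% usage figure from the Overview.

```shell
#!/bin/sh
# Free space in tenths of a GB, from disk size (GB) and percent used.
free_tenths_gb() {
  echo $(( $1 * 10 * (100 - $2) / 100 ))
}

# How many models of a given size (tenths of a GB) fit in that space.
models_that_fit() {
  echo $(( $1 / $2 ))
}

free="$(free_tenths_gb 35 77)"
echo "free: $((free / 10)).$((free % 10)) GB"                 # free: 8.0 GB
echo "models at 4.0 GB: $(models_that_fit "$free" 40)"        # 2
echo "models at 5.0 GB: $(models_that_fit "$free" 50)"        # 1
```

Two models fit only at the low end of the 4-5 GB range, and doing so would leave the disk essentially full.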