ollama LXC

Overview

Property	Value
Hostname	ollama
IP Address	192.168.0.231 (static since 2026-07-06, was DHCP - see proxmox doc 25)
VMID	108
OS	Ubuntu 24.04.3 LTS (Noble Numbat)
Kernel	6.17.4-1-pve
CPU	4 cores
RAM	8 GB
Swap	512 MB
Disk	35 GB (local-lvm, 77% used)
Purpose	Local LLM inference server

Running Services

Service	Description
`ollama.service`	Ollama LLM server
`ssh.service`	OpenSSH server
`rsyslog.service`	System logging

Open Ports

Port	Protocol	Service
22	TCP	SSH
11434	TCP	Ollama API (`OLLAMA_HOST=0.0.0.0`)
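Because a reboot once reverted the binding to localhost-only, it can help to confirm what port 11434 is actually bound to. The following is a minimal sketch, not an established part of this setup: `classify_binding` is a hypothetical helper, and it assumes the local address is the fourth column of `ss -Htln` output.

```shell
# Classify a listening port as loopback-only or network-reachable.
# Reads `ss -Htln` output on stdin; the port number is the first argument.
# Assumption: the local "address:port" is the 4th whitespace-separated column.
classify_binding() {
    awk -v port="$1" '
        {
            addr = $4
            n = split(addr, parts, ":")
            if (parts[n] != port) next          # some other port, skip it
            found = 1
            # Anything other than 127.x.x.x or [::1] is reachable from the LAN.
            if (addr !~ /^127\./ && addr !~ /^\[::1\]/) network = 1
        }
        END {
            if (!found)       print "not-listening"
            else if (network) print "network"
            else              print "loopback-only"
        }'
}

# Usage inside the container:
#   ss -Htln | classify_binding 11434
```

A result of `loopback-only` would suggest that the `OLLAMA_HOST` setting was not applied.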

Ollama

Version: 0.15.5
Binary: /usr/local/bin/ollama

Installed Models

Model	Size
`nomic-embed-text:latest`	274 MB

Service Configuration

Setting	Value	Description
`OLLAMA_HOST`	`0.0.0.0`	Listen on all interfaces (network-accessible). Since 2026-07-06 set via systemd drop-in `/etc/systemd/system/ollama.service.d/network.conf` - a reboot had reverted the binding to localhost-only
`OLLAMA_INTEL_GPU`	`true`	Enable Intel GPU acceleration
`OLLAMA_NUM_GPU`	`999`	Use all available GPU layers
`OLLAMA_ORIGINS`	`*`	Allow requests from any origin (CORS)
`SYCL_CACHE_PERSISTENT`	`1`	Persistent SYCL kernel cache for Intel GPU
`ZES_ENABLE_SYSMAN`	`1`	Enable Intel oneAPI system management
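For reference, a drop-in that pins the bind address could look like the output of the sketch below. The exact contents of `network.conf` on this LXC are not recorded here, so the single `Environment=` line is an assumption, and `render_dropin` is a hypothetical helper name.

```shell
# Print a systemd drop-in that sets the Ollama bind address.
# Assumption: the drop-in carries only OLLAMA_HOST; the other variables
# in the table above may be set elsewhere in the unit.
render_dropin() {
    printf '%s\n' '[Service]' 'Environment="OLLAMA_HOST=0.0.0.0"'
}

# Usage (as root inside the container):
#   mkdir -p /etc/systemd/system/ollama.service.d
#   render_dropin > /etc/systemd/system/ollama.service.d/network.conf
#   systemctl daemon-reload && systemctl restart ollama
#   systemctl show ollama -p Environment   # confirm the value was applied
```

Keeping the setting in a drop-in rather than in the main unit file means it should not be lost if the unit file itself is rewritten.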

Intel GPU Passthrough

The LXC has /dev/dri/card0 and /dev/dri/renderD128 passed through from the Proxmox host, enabling Intel GPU acceleration for inference. Ollama uses the Intel oneAPI SYCL backend.

/dev/dri/card0       - Intel graphics card
/dev/dri/renderD128  - Intel render device (used for GPU compute)
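A quick way to confirm the passthrough from inside the container is to check that both nodes exist as character devices. This is a small sketch; `check_devices` is a hypothetical helper, and it only verifies that the nodes are present, not that Ollama can actually use them.

```shell
# Report whether each given path is a character device node.
# Returns non-zero if any path is missing.
check_devices() {
    missing=0
    for dev in "$@"; do
        if [ -c "$dev" ]; then
            echo "ok: $dev"
        else
            echo "missing: $dev"
            missing=1
        fi
    done
    return "$missing"
}

# Usage inside the container:
#   check_devices /dev/dri/card0 /dev/dri/renderD128
```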

API Usage

The Ollama API is available at http://192.168.0.231:11434 from the local network.

Example (an embedding request, since `nomic-embed-text` is the only model installed on this LXC):

curl http://192.168.0.231:11434/api/embed \
  -d '{"model": "nomic-embed-text", "input": "Hello"}'
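Before sending requests, it can be useful to check which models the endpoint actually has. The sketch below wraps the `/api/tags` and `/api/version` endpoints; the helper names and the `OLLAMA_URL` variable are illustrative assumptions, not part of the existing setup.

```shell
# Base URL of the Ollama endpoint; override via the environment if needed.
OLLAMA_URL="${OLLAMA_URL:-http://192.168.0.231:11434}"

# Join the base URL and an API path, tolerating stray slashes on either side.
api_url() {
    printf '%s/%s\n' "${OLLAMA_URL%/}" "${1#/}"
}

# List installed models (JSON) and report the server version (JSON).
list_models()    { curl -fsS --max-time 5 "$(api_url /api/tags)"; }
server_version() { curl -fsS --max-time 5 "$(api_url /api/version)"; }

# Usage:
#   list_models
#   server_version
```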

Two Ollama Instances

This homelab runs two separate Ollama endpoints:

Instance	Address	GPU	Use case
Nobara workstation	`192.168.0.100:11434`	RTX 2060 Super	Primary AI inference - Karakeep tagging, Suggestarr LLM (qwen3:8b + nomic-embed-text)
This LXC	`192.168.0.231:11434`	Intel integrated (SYCL)	Backup / secondary endpoint (nomic-embed-text only)
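A client that wants to prefer the workstation and fall back to this LXC could probe each endpoint in order. This is only a sketch of one possible approach: `probe` and `pick_endpoint` are hypothetical helpers, and the 2-second timeout is an arbitrary choice.

```shell
PRIMARY="http://192.168.0.100:11434"   # Nobara workstation
BACKUP="http://192.168.0.231:11434"    # this LXC

# Succeeds if the endpoint answers /api/version within 2 seconds.
probe() {
    curl -fsS --max-time 2 "$1/api/version" >/dev/null 2>&1
}

# Print the first endpoint that responds; fail if none do.
pick_endpoint() {
    for ep in "$@"; do
        if probe "$ep"; then
            echo "$ep"
            return 0
        fi
    done
    return 1
}

# Usage:
#   OLLAMA_URL="$(pick_endpoint "$PRIMARY" "$BACKUP")" || echo "no endpoint up"
```

Since this LXC only carries nomic-embed-text, such a fallback would cover embedding requests but not qwen3:8b generation.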

Lessons Learned

Ubuntu instead of Debian: This LXC runs Ubuntu 24.04 rather than Debian 12. Ubuntu's broader hardware-support package ecosystem made it easier to set up the Intel GPU drivers and oneAPI toolkits.
Intel GPU acceleration in an LXC: Requires passing through /dev/dri/card0 and /dev/dri/renderD128 in the Proxmox config, plus the SYCL_CACHE_PERSISTENT and ZES_ENABLE_SYSMAN environment variables for the Intel SYCL backend. Without these, Ollama falls back to CPU-only inference.
Disk usage: Usage reached 77% while llama3.1:8b (4.9 GB) was installed. Removing it and keeping only nomic-embed-text (274 MB) freed about 4.9 GB, roughly 14% of the 35 GB disk.
OLLAMA_ORIGINS=* is permissive: Allowing all CORS origins is convenient for local development, but it means any web page opened in a browser on a LAN machine can send requests to the Ollama API. Acceptable for a homelab but worth noting.
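To keep an eye on the disk-usage point above, a small check like the following can be run before pulling a new model. It is a sketch only: `disk_pct` is a hypothetical helper, and the 75% threshold in the usage line is an arbitrary example.

```shell
# Print the used-space percentage (integer, no % sign) for a mount point.
# Uses POSIX `df -P` so the capacity value is always in the 5th column.
disk_pct() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage inside the container:
#   [ "$(disk_pct /)" -lt 75 ] || echo "disk is getting full"
#   ollama list               # show installed models and their sizes
#   ollama rm llama3.1:8b     # remove a model that is no longer needed
```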