ollama LXC¶

Overview¶

Property	Value
Hostname	ollama
IP Address	192.168.0.231
VMID	108
OS	Ubuntu 24.04.3 LTS (Noble Numbat)
Kernel	6.17.4-1-pve
CPU	4 cores
RAM	8 GB
Swap	512 MB
Disk	35 GB (local-lvm, 77% used)
Purpose	Local LLM inference server

Running Services¶

Service	Description
`ollama.service`	Ollama LLM server
`ssh.service`	OpenSSH server
`rsyslog.service`	System logging

Open Ports¶

Port	Protocol	Service
22	TCP	SSH
11434	TCP	Ollama API (`OLLAMA_HOST=0.0.0.0`)

Ollama¶

Version: 0.15.5 Binary: /usr/local/bin/ollama

Installed Models¶

Model	Size
`llama3.1:8b`	4.9 GB

Service Configuration¶

Setting	Value	Description
`OLLAMA_HOST`	`0.0.0.0`	Listen on all interfaces (network-accessible)
`OLLAMA_INTEL_GPU`	`true`	Enable Intel GPU acceleration
`OLLAMA_NUM_GPU`	`999`	Use all available GPU layers
`OLLAMA_ORIGINS`	`*`	Allow requests from any origin (CORS)
`SYCL_CACHE_PERSISTENT`	`1`	Persistent SYCL kernel cache for Intel GPU
`ZES_ENABLE_SYSMAN`	`1`	Enable Intel oneAPI system management

Intel GPU Passthrough¶

The LXC has /dev/dri/card0 and /dev/dri/renderD128 passed through from the Proxmox host, enabling Intel GPU acceleration for inference. Ollama uses the Intel oneAPI SYCL backend.

/dev/dri/card0       - Intel graphics card
/dev/dri/renderD128  - Intel render device (used for GPU compute)

API Usage¶

The Ollama API is available at http://192.168.0.231:11434 from the local network.

Example:

curl http://192.168.0.231:11434/api/generate \
  -d '{"model": "llama3.1:8b", "prompt": "Hello"}'

Two Ollama Instances¶

This homelab runs two separate Ollama endpoints:

Instance	Address	GPU	Use case
Nobara workstation	`192.168.0.100:11434`	RTX 2060 Super	Heavy/fast inference (Karakeep AI tagging)
This LXC	`192.168.0.231:11434`	Intel integrated (SYCL)	Lightweight / secondary endpoint

Lessons Learned¶

Ubuntu instead of Debian: This LXC runs Ubuntu 24.04 rather than Debian 12. Ubuntu's wider hardware support package ecosystem made it easier to set up Intel GPU drivers and oneAPI toolkits.
Intel GPU acceleration in an LXC: Requires passing through /dev/dri/card0 and /dev/dri/renderD128 in the Proxmox config, plus the SYCL_CACHE_PERSISTENT and ZES_ENABLE_SYSMAN environment variables for the Intel SYCL backend. Without these, Ollama falls back to CPU-only inference.
Disk usage at 77%: With a 4.9 GB model and 35 GB disk, there is room for 1-2 more medium-sized models before the disk fills up. Each additional 7B model requires ~4-5 GB.
OLLAMA_ORIGINS=* is permissive: Allowing all CORS origins is convenient for local development but means any page loaded in a browser on the LAN can make requests to the Ollama API. Acceptable for a homelab but worth noting.