
Migrating my homelab from Swarmpit to uncloud

Just before Christmas, I spent an hour or so setting up uncloud on my homelab, and I am stunned at how easy it was to get working.

The motivation: I've known for a long time that Swarmpit is basically abandoned. Disappointing, but true. The latest release of DietPi, my preferred distro for my Raspberry Pi and RockChip SBCs, included an update to docker and docker-compose that completely broke Swarmpit. Cue a panicked hunt for alternatives, and a fortuitous discovery of uncloud.

Here's what I've done. First, an SSH config entry for each node (they're all on my tailnet):

Host node1
  Hostname node1.tailxxxxx.ts.net
  User dietpi
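
Forming the cluster itself was, from memory, a couple of commands along these lines (treat this as a sketch and check the uncloud docs for the current syntax):

uc machine init dietpi@node1.tailxxxxx.ts.net
uc machine add dietpi@node2.tailxxxxx.ts.net

After that, each service is just a compose file. Plex, for example: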
services:
  plex:
    image: linuxserver/plex:arm64v8-latest
    # ...
    x-machines:
      - node2
    x-ports:
      - 32400:32400@host
      - plex.suranyami.com:32400/https
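
Deploying it is, if memory serves, a single command run from the directory holding the compose file (I might be misremembering the exact invocation, so check uc --help):

uc deploy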

And that's about it. No reverse-proxy configuration, no manual entry of IP addresses; everything automatically gets a Let's Encrypt SSL certificate and is load-balanced to wherever the services are running.

This is honestly the easiest way to self-host anything I've found.

It's been 2 weeks or so now, and having got the knack of the x-ports port-mapping syntax, I've managed to get all my other services running everywhere too.

Notable edge cases were:

Minecraft

x-ports:
  - 25565:25565@host

Plex

x-ports:
  - 32400:32400@host
  - plex.suranyami.com:32400/https

Plex needed 2 mappings: one on the internal subnet for the AppleTV, because of some idiosyncrasy in the way the native Plex app works behind the NAT versus over t'interwebz.

Jellyfin

x-ports:
  - 1900:1900@host    # DLNA/SSDP discovery
  - 7359:7359@host    # Jellyfin client auto-discovery
  - jellyfin.suranyami.com:8096/https

The only outages I've had so far were purely hardware-related: the robo-vacuum somehow knocked out a power cord that was already loose… derp. That won't happen again. And the fan software wasn't installed on my RockPi 4 NAS box, so it overheated and shut down. Fixed that this morning.

Global deployment

I'm currently using Netdata to monitor my nodes. It's WAY overkill for what I'm running, but hey, whatever. For this, Netdata needs a global deployment (one instance on every node):

services:
  netdata:
    image: netdata/netdata:latest
    hostname: "{{.Node.Hostname}}"
    # ...
    volumes:
      # ...
      - /etc/hostname:/host/etc/hostname:ro
    deploy:
      mode: global

This is essentially the same as a normal docker-swarm compose file, but because uncloud isn't actually docker-swarm, the - /etc/hostname:/host/etc/hostname:ro volume mount is a hack to get the host's hostname into the container.

There's also a quirk that will hopefully be fixed in a future version of uncloud: volumes don't get created automatically on each machine. I had to create them by hand with a bunch of uc volume create commands like this:

uc volume create netdataconfig -m node2
uc volume create netdataconfig -m node3
uc volume create netdataconfig -m node4
uc volume create netdatalib -m node2
uc volume create netdatalib -m node3
uc volume create netdatalib -m node4
uc volume create netdatacache -m node2
uc volume create netdatacache -m node3
uc volume create netdatacache -m node4
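
Nothing clever about it, just repetitive. With more nodes or volumes, a small shell loop does the same thing (node and volume names as above):

for node in node2 node3 node4; do
  for vol in netdataconfig netdatalib netdatacache; do
    uc volume create "$vol" -m "$node"
  done
done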

Replicated deployment

One very nice feature is replicated deployment with automatic load balancing. There's not a lot of documentation about how it works at the moment, so I'm a bit suss on it, but essentially it looks like this in the compose file:

    deploy:
      mode: replicated
      replicas: 4

This will cause uncloud to pick a set of machines (seemingly at random), deploy a container on each, and load-balance incoming requests across them.
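
To make that concrete, here's a hypothetical compose file for a stateless service (the service name, image and domain are made up; the deploy and x-ports keys are the ones I've actually used):

services:
  whoami:
    image: traefik/whoami:latest
    x-ports:
      - whoami.suranyami.com:80/https
    deploy:
      mode: replicated
      replicas: 4

Each replica lands on a different machine, and requests to the https endpoint get spread across whichever machines ended up with one.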

There are caveats to this, of course. The service's configuration needs to be on a shared volume, for instance, and some services do NOT behave well in this situation. Plex is the worst example of this… if you store its configuration, caches and DB on a shared volume, you are gonna have a very bad time indeed because of race conditions, non-atomic writes, file corruption, etc.

Which is a shame, because Plex is the service I'd most like to have replicated. I dunno what the solution is. Using something other than Plex seems like the most obvious answer, but as far as I know the alternatives have the same issue.
