Migrating my homelab from Swarmpit to uncloud
Just before Christmas, I spent an hour or so setting up uncloud on my homelab, and I am stunned at how easy it was to get working.
The motivation for doing this is that I've known for a long time that Swarmpit is basically abandoned. Disappointing, but true. The latest release of DietPi, my preferred distro for my Raspberry Pi and RockChip SBCs, included an update to docker and docker-compose that completely broke compatibility with Swarmpit. Cue panicked hunting for alternatives, and a fortuitous discovery of uncloud.
Here's what I've done:
- Added a wildcard DNS record pointing *.suranyami.com to my dynamic DNS address: suranyami.duckdns.org.
- Installed tailscale on each of the machines (Installation Instructions) and connected them to my free tailnet (the free tier allows up to 100 nodes). This gives me a stable URL for each individual machine that I can SSH into without needing to do NAT redirection on the router. For instance, my machine called node1 is available to me (and only me) at ssh dietpi@node1.tailxxxxx.ts.net.
- Updated my ~/.ssh/config with entries for all the machines that look like this:

Host node1
  Hostname node1.tailxxxxx.ts.net
  User dietpi

- Installed uncloud on my laptop: curl -fsS https://get.uncloud.run/install.sh | sh
- Initialized the cluster by picking one of the above machines as the first server: uc machine init dietpi@node1.tailxxxxx.ts.net --name node1
- Added the other machines using uc machine add dietpi@node2.tailxxxxx.ts.net --name node2
- Deployed services using uc deploy -f plex.yml, where plex.yml is a subset of a docker-compose file, but with minor changes. For instance, to deploy to a specific machine (which I have to do because I need to redirect port 32400 from the router to a specific machine, because Plex is annoying like that), I do this:
services:
  plex:
    image: linuxserver/plex:arm64v8-latest
    # ...
    x-machines:
      - node2
    x-ports:
      - 32400:32400@host
      - plex.suranyami.com:32400/https
And that's about it. No reverse-proxy configuration, no manual entry of IP addresses, everything is just automatically given a Let's Encrypt SSL certificate and load-balanced to wherever the servers are running.
This is honestly the easiest way to self-host anything I've found.
It's been two weeks or so now, and having got the knack of the x-ports port-mapping syntax, I've managed to get all my other services running everywhere.
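A quick summary of that syntax, as far as I've been able to work out from trial and error (treat this as a best guess rather than gospel):

x-ports:
  # PUBLISHED:CONTAINER@host publishes the container port directly on the
  # machine the container lands on, like a plain docker ports: mapping
  - 32400:32400@host
  # DOMAIN:CONTAINER/https routes the domain through uncloud's built-in
  # reverse proxy, which takes care of the Let's Encrypt certificate
  - plex.suranyami.com:32400/https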
Notable edge cases were:
Minecraft
x-ports:
- 25565:25565@host
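For context, a complete Minecraft file in this style would look roughly like the sketch below; the itzg/minecraft-server image, the EULA setting, the volume name and the node pin are illustrative placeholders rather than my exact config:

services:
  minecraft:
    image: itzg/minecraft-server:latest
    environment:
      # the itzg image won't start until the Minecraft EULA is accepted
      EULA: "TRUE"
    volumes:
      # world data and server properties live here
      - minecraft_data:/data
    # pin to one machine so the router's port-forward has a fixed target
    x-machines:
      - node2
    x-ports:
      # plain TCP, so it's published straight on the host; no HTTPS proxying
      - 25565:25565@host

volumes:
  minecraft_data: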
Plex
x-ports:
- 32400:32400@host
- plex.suranyami.com:32400/https
Plex needed two mappings, one on the internal subnet for use by the Apple TV, because of some idiosyncrasy in the way the native Plex app behaves behind NAT versus over t'interwebz.
Jellyfin
x-ports:
- 1900:1900@host
- 7359:7359@host
- jellyfin.suranyami.com:8096/https
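Likewise, a fuller Jellyfin file might look something like this; the image tag, media paths and config volume are placeholders, and my understanding is that 1900 and 7359 are Jellyfin's DLNA and client auto-discovery ports, which is why they're published directly on the host:

services:
  jellyfin:
    image: jellyfin/jellyfin:latest
    volumes:
      # placeholders: a config volume plus a read-only media bind mount
      - jellyfin_config:/config
      - /mnt/media:/media:ro
    x-ports:
      # discovery ports published straight on the host for the local network
      - 1900:1900@host
      - 7359:7359@host
      # the web UI goes through uncloud's reverse proxy with HTTPS
      - jellyfin.suranyami.com:8096/https

volumes:
  jellyfin_config: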
The only outages I've had so far were purely hardware-related: the robo-vacuum somehow knocked out a power cord that was already loose… derp. That won't happen again. And the fan software wasn't installed on my RockPi 4 NAS box, so it overheated and shut down. Fixed that this morning.
Global deployment
I'm currently using Netdata to monitor my nodes. It's WAY overkill for what I'm running, but hey, whatever. For this we need to do a global deployment:
services:
  netdata:
    image: netdata/netdata:latest
    hostname: "{{.Node.Hostname}}"
    # ...
    volumes:
      # ...
      - /etc/hostname:/host/etc/hostname:ro
    deploy:
      mode: global
This is essentially the same as a normal docker-swarm compose file, but because it's not actually docker-swarm, the - /etc/hostname:/host/etc/hostname:ro volume mount is a hack to get the hostname.
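For reference, the stock netdata compose example mounts a pile of host paths read-only so the agent can see host metrics, roughly like this (from memory of the netdata docs, so double-check the official example rather than trusting me):

volumes:
  # named volumes for netdata's own state (created per machine below)
  - netdataconfig:/etc/netdata
  - netdatalib:/var/lib/netdata
  - netdatacache:/var/cache/netdata
  # read-only views of the host so the agent can report host metrics
  - /proc:/host/proc:ro
  - /sys:/host/sys:ro
  - /etc/os-release:/host/etc/os-release:ro
  - /var/run/docker.sock:/var/run/docker.sock:ro
  # the hostname hack from above
  - /etc/hostname:/host/etc/hostname:ro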
There is also a quirk that (hopefully) might be fixed in future versions of uncloud: the volumes don't get created automatically on each machine. For that I had to execute a bunch of uc volume create commands like this:
uc volume create netdataconfig -m node2
uc volume create netdataconfig -m node3
uc volume create netdataconfig -m node4
uc volume create netdatalib -m node2
uc volume create netdatalib -m node3
uc volume create netdatalib -m node4
uc volume create netdatacache -m node2
uc volume create netdatacache -m node3
uc volume create netdatacache -m node4
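That gets tedious quickly, so a short shell loop over the node and volume names above does the same thing:

for node in node2 node3 node4; do
  for vol in netdataconfig netdatalib netdatacache; do
    uc volume create "$vol" -m "$node"
  done
done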
Replicated deployment
One very nice feature is replicated deployment with automatic load balancing. There's not a lot of documentation about how it works at the moment, so I'm a bit suss on it, but essentially it looks like this in the compose file:
deploy:
  mode: replicated
  replicas: 4
This will cause it to pick a random set of machines and deploy a container on each, and load-balance incoming requests.
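Put together, a replicated stateless web service ends up looking something like this; the traefik/whoami test image and the subdomain are made up for illustration:

services:
  whoami:
    # any stateless HTTP container will do; traefik/whoami is a handy test image
    image: traefik/whoami:latest
    x-ports:
      # uncloud's proxy terminates HTTPS and spreads requests across the replicas
      - whoami.suranyami.com:80/https
    deploy:
      mode: replicated
      replicas: 4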
There are caveats to this, of course. The service configuration will need to be on a shared volume, for instance, and some services do NOT behave well in this situation. Plex is the worst example of this… if you store its configuration, caches and DB on a shared volume, you are gonna have a very bad time indeed because of race conditions, non-atomicity, file corruption etc.
Which is a shame, because Plex is the service I'd most like to be replicated. I dunno what the solution is. Using something other than Plex seems like the most obvious answer, but as far as I know the alternatives have the same issue.