K8s storage options
Option 0: Longhorn
Currently I have a 4-node Proxmox cluster that makes up my homelab. On that Proxmox cluster I have a k3s cluster. The k3s cluster is using Longhorn for storage, and for a few months that was going well. Recently, however, Longhorn volumes have started to be mounted as read-only. To fix a volume I've had to simply scale the deployment/sts down to 0 and then back up to 1.
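As a rough sketch, the workaround looks something like this (the Deployment name and namespace here are placeholders, not my actual workloads):

```shell
# Scale the workload down so the Longhorn volume detaches...
kubectl scale deployment/my-app --replicas=0 -n default

# ...wait for the pod to terminate, then scale back up so the volume
# is re-attached and (hopefully) remounted read-write.
kubectl scale deployment/my-app --replicas=1 -n default
```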
After doing some research I found this page. When I checked the longhorn-manager pod logs I didn't find the exact error that was mentioned, but something similar: Longhorn was complaining that it couldn't remount a volume as read-write.
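If you want to poke at the same logs, something like this should work, assuming Longhorn is installed in the usual `longhorn-system` namespace and the manager pods carry the standard `app=longhorn-manager` label:

```shell
# Tail the longhorn-manager pods and look for remount / read-only complaints.
kubectl logs -n longhorn-system -l app=longhorn-manager --tail=500 | grep -i "read-only"
```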
That page mentions that if your network is 1Gbps, Longhorn can sustain about 3 volumes doing high IO. Currently I have 15 volumes, but they aren't high IO. I'm not sure if that's the issue, but I've seen a lot of people mention that Longhorn wants 10Gbps networking. This issue led me to look at alternatives to Longhorn.
I did want to mention that I'm using Longhorn's v1 engine. There is info in the Longhorn docs about the v2 engine, which is supposed to be better, but I haven't tried it and right now it's not feature-complete.
System requirements.
More info in the docs.
- 4 vCPUs per node.
- 4 GiB of RAM per node.
Here is an example of Longhorn running on a 3-node cluster over 10 hours:
CPU usage: around 0.2 cores.
Memory usage: around 2.8 GiB.
Pros:
- Backed by a big name (SUSE).
- Docs are good.
- Popular.
- Nice GUI.
Cons:
- High system requirements.
- Seems to really require 10Gbps networking, or you will run into read-only (RO) issues.
Option 1: OpenEBS with Mayastor
OpenEBS is a CNCF project that now has the Mayastor storage engine, which is supposed to be faster than Longhorn. After looking into the project more, I noticed one big issue that might cause me problems in the future.
The docs mention that Mayastor requires a disk to be used 100% by the Disk Pool. Right now my k3s nodes are VMs, which means I can easily create a virtual disk and use that for the Disk Pool. However, I had thought that in the future I might install k3s on bare metal. If I did that I would have to move away from RAID1 and use one disk for the OS and the second disk for the Disk Pool.
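For reference, a Disk Pool is declared per node and points at the whole device. A minimal sketch of what I'd expect to apply (the node name, device path, and namespace are placeholders from my setup, and the apiVersion can differ between OpenEBS releases, so check the docs for your version):

```shell
# Create a Mayastor DiskPool that consumes an entire virtual disk on one node.
kubectl apply -f - <<EOF
apiVersion: openebs.io/v1beta2
kind: DiskPool
metadata:
  name: pool-k3s-node-1
  namespace: openebs           # namespace depends on how OpenEBS was installed
spec:
  node: k3s-node-1             # Kubernetes node that owns the disk
  disks: ["/dev/disk/by-id/virtio-pool-disk-1"]   # whole disk, not a partition
EOF
```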
A second issue I noticed was that once a Disk Pool is created it can't be expanded. That would be a problem if I used smaller virtual disks that I wanted to expand later. It might mean I would just have to add new disks and new Disk Pools, then move the data to the new pools, but I couldn't say for sure.
Another interesting thing I found is that an OpenEBS PV can be formatted with btrfs. I'm not sure what the benefits are, but it's interesting to see a copy-on-write (COW) filesystem being used for a PV.
OpenEBS with Mayastor seems to boast better performance than Longhorn, and the system requirements in the docs are lower. Currently Longhorn is the workload that uses the most resources in my small k3s cluster.
I found this blog post that compares Longhorn and Mayastor (and others), but it was written early in Mayastor's development.
System requirements.
More info in the docs.
- 2 CPU cores per node.
- 1 GiB of RAM per node.
- A minimum of 2 GiB of 2 MiB-sized HugePages (see the sketch below).
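HugePages are something I haven't had to configure on my nodes before. As a rough sketch, reserving 1024 × 2 MiB pages (= 2 GiB) on each k3s node would look something like this; the `openebs.io/engine=mayastor` label is what the OpenEBS docs use to schedule the data plane, but double-check against your version:

```shell
# Reserve 1024 x 2 MiB HugePages (2 GiB) right now...
echo 1024 | sudo tee /proc/sys/vm/nr_hugepages

# ...and persist the setting across reboots.
echo "vm.nr_hugepages = 1024" | sudo tee /etc/sysctl.d/90-mayastor-hugepages.conf

# Restart k3s (k3s-agent on agent nodes) so the kubelet picks up the new
# hugepages capacity.
sudo systemctl restart k3s

# Label the node so Mayastor's data plane gets scheduled onto it.
kubectl label node k3s-node-1 openebs.io/engine=mayastor
```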
Pros:
- One report shows that Mayastor is faster than Longhorn and Piraeus/LINSTOR.
- Ability to use local storage for things like redis-operator, cloudnative-pg, etc.
- Fewer resources required than Longhorn.
Cons:
- Mayastor requires a disk to be used 100% by the Disk Pool.
Option 2: Piraeus with LINSTOR
Piraeus is another CNCF project; it uses LINSTOR as the backend, and LINSTOR in turn uses DRBD.
Piraeus seems to work by using LVM or ZFS on the host system to store the data. More info in the docs. By the way, one thing I like about OpenEBS over Piraeus is the documentation: OpenEBS has a lot of it, and it's easy to find what you're looking for.
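From what I can tell from the v2 operator docs, storage pools are declared through a LinstorSatelliteConfiguration, and the operator can prepare an LVM (thin) pool from a host device for you. A rough sketch with made-up names (the pool name and device path are mine, not anything the docs prescribe):

```shell
# Define an LVM thin pool backed by a host device on the satellite nodes.
kubectl apply -f - <<EOF
apiVersion: piraeus.io/v1
kind: LinstorSatelliteConfiguration
metadata:
  name: storage-pool
spec:
  storagePools:
    - name: pool1
      lvmThinPool: {}          # let the operator create the VG/thin pool
      source:
        hostDevices:
          - /dev/vdb           # plain device name, as the Piraeus docs recommend
EOF
```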
One thing I found in the Piraeus docs that I did not like is that when you specify the devices to use for the storage pools, it [specifically mentions](https://piraeus.io/site/docs/storage/#preparing-physical-devices) that you should use the device name, not any symlinks like /dev/disk/by-id/.... Funny enough, the OpenEBS docs mention you should use the device ID instead of the device name. Using the ID makes more sense to me, since the name can change. In fact, I have already had this problem with one of my k3s VMs: Longhorn was failing because the wrong virtual drive was mounted to /var/lib/longhorn.
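For what it's worth, finding the stable IDs and seeing which real device each one currently points at is just:

```shell
# List stable device symlinks and the device node each one resolves to.
ls -l /dev/disk/by-id/
```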
Another benefit of LINSTOR is that you can use it with Proxmox too. Here is a doc on how you would use an existing LINSTOR cluster: https://github.com/piraeusdatastore/piraeus-operator/blob/v2/docs/how-to/external-controller.md. So you could run LINSTOR on your Proxmox hosts and use it for both Proxmox and k8s.
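Based on that how-to, pointing the operator at an external controller looks roughly like this (the controller URL is a placeholder for wherever LINSTOR would live on the Proxmox side):

```shell
# Tell the Piraeus operator to use an existing LINSTOR controller
# instead of deploying one inside the Kubernetes cluster.
kubectl apply -f - <<EOF
apiVersion: piraeus.io/v1
kind: LinstorCluster
metadata:
  name: linstorcluster
spec:
  externalController:
    url: http://linstor-controller.proxmox.lan:3370   # placeholder URL
EOF
```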
System requirements.
I couldn't find any system requirements for Piraeus, and not really for LINSTOR either, other than here. That page says 32 MiB of RAM per 1 TiB of storage. Currently I have 260 GiB of storage, but I plan to add more. If I go with half a TiB, that would mean I only need 16 MiB of RAM. That seems very low; it would be interesting to see what it uses in practice.
Pros:
- LINSTOR can be used with Proxmox (and other tools).
- Can use replicated storage with LVM/ZFS on top of a drive, instead of requiring a dedicated drive.
- System requirements seem very low.
Cons:
- Documentation is not as good as OpenEBS.
- Not as well-known.
Next Steps
I think the next steps are for me to try out OpenEBS and Piraeus/LINSTOR. This really mirrors what often happens in my day job: multiple technical solutions exist and promise a lot, but once you dig into the docs and get real-world experience, you find the trade-offs.
Seeing as my k3s nodes are VMs, I think I will try OpenEBS first. In the future, if I decide to move to bare metal, I'll give Piraeus/LINSTOR a try.