How do you handle storage and load balancers when running k3s yourself on cloud VMs? Do you install the cloud provider's controllers? If yes, how is this better than the EKS model?
Do you have any automation for Day 2 operations, such as certificate rotation for the control plane and kubelets, etc.?
How does this approach solve for the mysterious connection disruptions with RDS? Both EKS managed nodes and EC2s share the same networking model, right?
Not meaning to be cynical but would love to understand how you handle these with k3s.
Great questions. These are exactly the operational details that matter. On storage and load balancers, yes, we use the cloud provider’s CSI drivers and CCM. That does create a dependency similar to EKS, but the tradeoff is flexibility and cost control. We decide when to upgrade, avoid the per-cluster EKS fee, and aren’t tied into AWS-specific patterns. We’ve even mixed in MetalLB for internal services and local-path provisioning where it made sense. I’ll probably do a deep dive post on this because we’ve picked up some interesting patterns along the way.
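To make that concrete, here's a rough sketch of how the pieces can be wired together on AWS. This is illustrative only (chart names and flags are from the upstream k3s, EBS CSI, and MetalLB docs; verify against your versions), not our exact setup:

```shell
# Start k3s with its built-in cloud controller and service LB disabled,
# so the cloud provider's CCM and CSI driver can own those roles instead.
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server \
  --disable-cloud-controller \
  --disable servicelb" sh -

# EBS CSI driver for PersistentVolumes (upstream Helm chart).
helm repo add aws-ebs-csi-driver https://kubernetes-sigs.github.io/aws-ebs-csi-driver
helm install aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system

# MetalLB for internal services that shouldn't go through a cloud LB.
helm repo add metallb https://metallb.github.io/metallb
helm install metallb metallb/metallb \
  --namespace metallb-system --create-namespace
```

k3s ships its own local-path provisioner by default, so for workloads that don't need EBS durability you get node-local storage with no extra install.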
For Day 2 operations, K3s handles a good chunk of cert rotation itself. Beyond that, we rely on Ansible/Terraform to make sure replacements and checks are automated. Our playbooks cover control plane certs, node cert validation, and etcd backups. The main difference from EKS is we own the whole process instead of leaving it to AWS’s black box, and that comes with both responsibility and freedom.
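The playbooks mostly wrap the built-in k3s commands. A sketch of the underlying steps (check these against your k3s version's docs before automating; snapshot names here are just examples):

```shell
# Rotate control-plane certificates; k3s requires the service to be
# stopped while rotating, then regenerates certs on restart.
systemctl stop k3s
k3s certificate rotate
systemctl start k3s

# On-demand etcd snapshot, e.g. before an upgrade. k3s can also take
# these on a schedule via the --etcd-snapshot-schedule-cron server flag.
k3s etcd-snapshot save --name pre-upgrade

# List existing snapshots to confirm the backup landed.
k3s etcd-snapshot list
```

The Ansible layer's job is mostly ordering (one control-plane node at a time) and verification afterwards, not the rotation itself.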
On RDS, you’re right. K3s doesn’t magically fix AWS networking quirks. We’ve had to use pooling with proper keepalives, RDS Proxy, and some tuning at the CNI level to smooth things out. The value for us is that we can actually trace through the full stack when things go wrong, instead of waiting on a support ticket.
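For the keepalive side, the host-level knobs are standard Linux sysctls. A minimal sketch, with illustrative values rather than recommendations (tune for your own idle-timeout behavior):

```shell
# Probe idle TCP connections after 60s instead of the 7200s default,
# so dead NAT/conntrack entries are detected before the app's own
# query timeout fires. Applies to all long-lived DB connections.
cat <<'EOF' | sudo tee /etc/sysctl.d/90-db-keepalive.conf
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 6
EOF
sudo sysctl --system
```

Most database clients also expose per-connection keepalive settings (libpq's `keepalives_idle`, for example), which is often less invasive than changing host-wide defaults.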
The honest take is that running K3s on VMs doesn’t remove these issues. It just shifts where you handle them. For us, that tradeoff is worth it because we’ve built the muscle to manage it. For other teams, EKS’s abstraction might still be the better call.
On a curious note, are you running into specific pain points on EKS that made you ask?
I prefer a managed solution since I don't have to worry about ensuring it is running all the time. However, during failures, as you mentioned, it becomes hard to troubleshoot since a lot is behind the curtains. Even then, if I'm paying AWS enough, I have a neck to choke :)
My primary concern with EKS is the cost. If I want a small dev cluster, I have to run it across at least 2 AZs, and the inter-AZ data transfer is where most of the money goes.
Thanks for providing a detailed answer.
Learnt a lot! Because of you I became a big fan of k3s. Also, what do you think about k0s?