Kubernetes operators have been used more commonly throughout the Kubernetes
community to automate workflows, manage applications and infrastructure.
These usecases have produced complex operators which need to be tested properly
to ensure that they are working as intended.
Testing operators in practice has been a difficult task - unit tests can only
cover a limited scope of logic (even with mocks like
e2e tests (while offering a wider scope) are plagued by a common issue:
To understand why e2e tests for kubernetes operators are prone to flapping, we first need to understand what a kubernetes operator exactly is and which components it communicates with. The operator pattern shows us two reasons for the increased flappiness:
- Hard dependencies on external components
- Eventual consistency
Kubernetes offers some concepts to improve the stability of our operators. Finalizers to prevent most issues with missed deletion events and deletion inconsistencies. ResourceVersion as a way of preventing most race conditions between operators and finally the CR status as an effective means to failing fast during tests. Working on our own operators at giantswarm we additionally implemented some best practices to make testing more reliable:
- Always collect all possible logs during tests
- Do not wait more than a few seconds during reconciliation
- Do not get complacent about flapping, issues are often actual issues in your implementation.
Using these techniques and best practices, we were able to improve the reliability of our e2e tests massively. The goal of an easily and reliably testable kubernetes operator is unfortunately not quite reached yet.
Marcel is a Platform Engineer at Giant Swarm where he works on Kubernetes Operators and focuses on all things Release Engineering. He is passionate about infrastructure automation in distributed systems and building fault-tolerant software. Marcel is excited to share the learnings he has made along the way. During the last two years he has been closely following Kubernetes upstream developments such as Cluster-API and KIND. read more