Kubernetes FinOps: A Platform Engineer’s Perspective

Image by John Cobb on Unsplash

Kubernetes has delivered on its promise of allowing organizations to build, deploy and manage applications at scale with speed and efficiency. However, managing the costs of Kubernetes has become a challenge. According to the CNCF FinOps Report, most organizations are seeing their Kubernetes costs increase by 20% or more. Enterprises have struggled to balance rapid adoption with budget and security considerations. Cultural changes, new methodologies and evolving organizational structures add further complexity around ownership and execution responsibilities.

FinOps

The FinOps Foundation’s survey shows that around 62% of respondents are just starting to work on cost monitoring for their cloud environments and are still far from implementing cost management controls. In extreme cases, this has not just stalled adoption but sent organizations back to square one.

This is where FinOps comes in. FinOps, short for Financial Operations, is a set of practices that helps organizations manage and optimize their cloud costs. It involves collaboration between different teams within an organization, such as DevOps, finance, and business operations, to align the financial and technical aspects of cloud infrastructure.

When enterprises think about going cloud-native, FinOps typically comes as an afterthought, only after they have burnt their fingers a few times over. Part of the reason is that, traditionally, the responsibility for driving operational and cost efficiencies lies with the operations teams, which come into the picture much later. The realization often arrives only with the sticker shock of a growing monthly cloud bill that sends everyone scrambling. The solution is to include FinOps and governance best practices right from the planning and design phase.

Applying FinOps to Kubernetes

Scaling Kubernetes environments efficiently requires that the necessary constructs for financial and security governance are included from Day Zero. This is not possible without a focused effort, and it is the key reason behind the rise of platform engineering teams within enterprises. These teams are now taking responsibility for delivering secure, developer-ready Kubernetes environments while driving cost efficiencies. Gartner’s 2023 trends report calls out platform engineering as one of the most significant trends in scaling the adoption of cloud-native architecture. Here are some key considerations, based on lessons learnt while helping platform engineering teams build and scale their Kubernetes environments:

1. Establish a cost allocation strategy – Organizations need a clear understanding of how to allocate the costs of their Kubernetes environments. This includes defining how different clusters and applications are used, and deciding on the appropriate billing and chargeback mechanisms. A resource labeling strategy that aligns with financial standards ensures that resource costs are allocated to the appropriate groups and that the data can be used by both finance and engineering. This is what lets organizations track and allocate costs effectively. It also requires central governance to ensure the right labels are used across resources. This responsibility is increasingly being taken on by platform engineering teams as they become the bridge between the finance, security and development teams.

2. Track usage at the cloud, cluster and application level – This can be achieved with tools such as Kubernetes resource metrics and cost reporting tools like the CNCF’s OpenCost. By tracking usage, organizations can identify where resources are underutilized or overutilized and make adjustments to optimize usage. These metrics can also drive automated actions through cost policies, such as triggers for scaling resources up or down, or an action when a cost threshold is reached. The platform engineering team plays the central role in coordinating and driving the tooling for this.

3. Implement proactive resource optimization – Organizations need to understand user requirements and application architecture, and tailor resource usage to those requirements. For example, while autoscaling (node autoscaling, horizontal pod autoscaling and vertical pod autoscaling) is a must, not every application can take advantage of it. The same is true for spot instances. Namespace-as-a-Service is great for on-demand developer environments but may not be the right solution for production environments. That said, many optimization strategies can be adopted upfront to ensure optimal resource usage. Platform engineering teams are using policy-as-code, custom scripting and webhooks to implement these strategies.

4. Use Kubernetes constructs to optimize cloud resources – Kubernetes provides many capabilities that should be enabled by default to ensure optimal resource utilization. For example, ensure namespaces are created with CPU/memory quotas by default, and ensure every pod has requests and limits by default. Requests reserve the resources an application needs on a node, while limits cap what it can consume, so one application cannot cause the “noisy neighbor” problem and degrade others. This requires applications to be sized upfront. It also allows guardrails to be implemented so that an application stays compliant with the resource requirements set for it as it moves from one environment to the next. The platform engineering team needs guardrails in place to ensure that these configurations are validated, or added, as early in the development cycle as possible.

5. Implement Kubernetes multi-tenancy for optimal resource utilization – Kubernetes namespaces can isolate not only applications but, combined with resource quotas, also their consumption of resources like CPU, memory and storage. Leverage multi-tenancy best practices to isolate resources, networking and access for different applications. Consider Namespace-as-a-Service for delivering Kubernetes-ready environments while optimizing cloud costs. Policy-as-code can be leveraged to implement multi-tenancy in the clusters.

6. Implement resource management hygiene – Are you still running pods and applications that are no longer being used? Did a developer spin up an environment and forget to shut it down? These issues contribute significantly to cloud costs and require automated approaches to manage costs in real time. These approaches have to be implemented both at the Kubernetes level and at the cloud/infrastructure level. While platform engineering teams have used IaC tools to manage resource hygiene, policy-as-code is increasingly becoming the go-to option for these use cases.
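The labeling strategy in item 1 can be made more concrete with a minimal Python sketch. It rolls per-pod costs up by a cost-allocation label and puts anything unlabeled into its own bucket, so labeling gaps show up in the report rather than disappearing. The `cost-center` label key, the `PodCost` record and the figures are hypothetical. In a real setup, the per-pod costs would come from a cost tool such as OpenCost.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical label key agreed with finance; substitute your own standard.
ALLOCATION_LABEL = "cost-center"
UNALLOCATED = "__unallocated__"


@dataclass
class PodCost:
    """Hypothetical per-pod cost record for one billing period."""
    name: str
    labels: dict = field(default_factory=dict)
    cost: float = 0.0


def allocate_costs(pods, label_key=ALLOCATION_LABEL):
    """Sum pod costs per label value; pods without the label go to UNALLOCATED."""
    totals = defaultdict(float)
    for pod in pods:
        owner = pod.labels.get(label_key, UNALLOCATED)
        totals[owner] += pod.cost
    return dict(totals)


def unlabeled(pods, label_key=ALLOCATION_LABEL):
    """Names of pods that would break chargeback because they lack the label."""
    return [pod.name for pod in pods if label_key not in pod.labels]


if __name__ == "__main__":
    pods = [
        PodCost("checkout-1", {"cost-center": "payments"}, 12.5),
        PodCost("checkout-2", {"cost-center": "payments"}, 7.5),
        PodCost("search-1", {"cost-center": "search"}, 4.0),
        PodCost("debug-shell", {}, 1.0),
    ]
    print(allocate_costs(pods))
    print(unlabeled(pods))
```

Keeping the unallocated bucket visible, rather than spreading it across teams, gives the governance process a concrete list of resources to fix.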
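The cost policies mentioned in item 2 boil down to comparing spend against a budget and mapping the result to an action. The sketch below assumes a hypothetical per-namespace `CostPolicy` with two thresholds: an alert at 80% of budget and a restrict action at 100%. It also includes a naive linear forecast of month-end spend. The thresholds, action names and forecasting method are illustrative choices, not features of any particular tool.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    ALERT = "alert"        # notify the owning team
    RESTRICT = "restrict"  # e.g. block new deployments or scale down non-critical workloads


@dataclass(frozen=True)
class CostPolicy:
    """Hypothetical monthly budget policy for one namespace."""
    monthly_budget: float
    alert_ratio: float = 0.8     # warn at 80% of budget (illustrative)
    restrict_ratio: float = 1.0  # act once the budget is used up (illustrative)


def projected_month_spend(spend_to_date, day_of_month, days_in_month):
    """Naive linear forecast of month-end spend from the run rate so far."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month must be within the month")
    return spend_to_date / day_of_month * days_in_month


def evaluate(policy, spend):
    """Map a spend figure (actual or projected) to the action the policy calls for."""
    if policy.monthly_budget <= 0:
        raise ValueError("monthly_budget must be positive")
    ratio = spend / policy.monthly_budget
    if ratio >= policy.restrict_ratio:
        return Action.RESTRICT
    if ratio >= policy.alert_ratio:
        return Action.ALERT
    return Action.NONE


if __name__ == "__main__":
    policy = CostPolicy(monthly_budget=1000.0)
    spent = 300.0  # spend after 10 days of a 30-day month
    forecast = projected_month_spend(spent, 10, 30)
    print(evaluate(policy, spent), evaluate(policy, forecast))
```

You can evaluate the policy against the forecast instead of spend to date. That way the alert fires while there is still time to act, before the budget is actually exhausted.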
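Item 3 notes that not every application can take advantage of autoscaling. Whether an application benefits depends largely on whether its scaling metric falls roughly in proportion as replicas are added. The horizontal pod autoscaler's documented scaling rule assumes exactly that: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The Python sketch below implements that rule with the default 0.1 tolerance band and min/max clamping. It is a simplification: the real controller also handles unready pods, missing metrics and scale-down stabilization. The default bounds in the function signature are arbitrary.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA rule: ceil(current * current_metric / target_metric),
    skipped when the ratio is within the tolerance band, then clamped."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(1.0 - ratio) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    # Respect the min/max bounds configured on the autoscaler.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target
    print(desired_replicas(4, 90, 60))  # prints 6
```

Suppose an application's per-pod metric does not drop when load is spread over more pods, for example because it is bottlenecked on a shared database. Then the ratio stays high and this rule keeps adding replicas, and therefore cost, without relieving the pressure.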
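The defaults in item 4 are usually enforced with an admission webhook or a policy engine such as Kyverno. Purely to illustrate the logic, the Python sketch below works on a pod manifest in its dict form. `missing_resources` reports containers without CPU/memory requests and limits (the "validate" case). `with_defaults` fills in the gaps without overwriting existing values (the "mutate" case), similar in spirit to LimitRange defaults. The function names and default values are hypothetical. The sketch also does not check that requests stay at or below limits.

```python
import copy

REQUIRED_RESOURCES = ("cpu", "memory")


def missing_resources(pod):
    """Report gaps like 'app: limits.memory' in a pod manifest (dict form)."""
    problems = []
    for container in pod.get("spec", {}).get("containers", []):
        resources = container.get("resources", {})
        for kind in ("requests", "limits"):
            for name in REQUIRED_RESOURCES:
                if name not in resources.get(kind, {}):
                    problems.append(f"{container['name']}: {kind}.{name}")
    return problems


def with_defaults(pod, default_requests, default_limits):
    """Return a copy of the pod with missing requests/limits filled in.

    Values the developer already set are kept; only gaps are filled."""
    patched = copy.deepcopy(pod)  # never mutate the caller's manifest
    for container in patched.get("spec", {}).get("containers", []):
        resources = container.setdefault("resources", {})
        for kind, defaults in (("requests", default_requests),
                               ("limits", default_limits)):
            section = resources.setdefault(kind, {})
            for name, value in defaults.items():
                section.setdefault(name, value)
    return patched


if __name__ == "__main__":
    pod = {"spec": {"containers": [
        {"name": "app", "resources": {"requests": {"cpu": "100m"}}},
    ]}}
    print(missing_resources(pod))
    fixed = with_defaults(pod, {"cpu": "250m", "memory": "128Mi"},
                          {"cpu": "500m", "memory": "256Mi"})
    print(missing_resources(fixed))
```

Running the validate check in CI as well as at admission catches unsized workloads earlier in the development cycle, which is the point item 4 makes.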
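For item 5, a Namespace-as-a-Service offering typically provisions each tenant namespace with its guardrails already attached. The sketch below generates one plausible bundle as Kubernetes manifests in dict form: a labeled Namespace, a ResourceQuota and a default-deny ingress NetworkPolicy. These are wrapped in a `v1` `List` that `kubectl apply -f -` can read as JSON. The function name, label key, object names and quota values are hypothetical. A real platform would likely add RBAC bindings and a LimitRange as well.

```python
import json


def tenant_manifests(namespace, cost_center, hard_quota):
    """Manifests for one tenant namespace: Namespace, ResourceQuota and a
    NetworkPolicy that denies all ingress traffic by default."""
    ns = {
        "apiVersion": "v1",
        "kind": "Namespace",
        # Hypothetical label key for cost allocation.
        "metadata": {"name": namespace, "labels": {"cost-center": cost_center}},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "tenant-quota", "namespace": namespace},
        "spec": {"hard": dict(hard_quota)},
    }
    deny_ingress = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        # An empty podSelector matches every pod in the namespace; listing
        # Ingress with no ingress rules blocks all inbound traffic.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }
    return [ns, quota, deny_ingress]


if __name__ == "__main__":
    items = tenant_manifests("team-a-dev", "team-a", {
        "requests.cpu": "4", "requests.memory": "8Gi",
        "limits.cpu": "8", "limits.memory": "16Gi",
    })
    # Pipe the output to `kubectl apply -f -` to create the bundle.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Once a ResourceQuota constrains CPU and memory, Kubernetes may reject pods in that namespace that do not declare the corresponding requests or limits. This is another reason to pair quotas with default requests and limits. The NetworkPolicy only takes effect if the cluster's network plugin enforces network policies.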
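The cleanup described in item 6 usually starts with a convention for how long an environment is allowed to live. The sketch below assumes a hypothetical `ttl-hours` label on each environment and reports the ones older than their TTL. Environments without the label are left alone unless a default TTL is supplied, so long-lived namespaces are not flagged by accident. The `Environment` record and the label name are illustrative, not part of any Kubernetes API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

TTL_LABEL = "ttl-hours"  # hypothetical labeling convention


@dataclass
class Environment:
    """Hypothetical record for a namespace or preview environment."""
    name: str
    created: datetime
    labels: dict = field(default_factory=dict)


def expired(envs, now, default_ttl_hours=None):
    """Names of environments older than their TTL.

    Environments without a TTL label are skipped unless default_ttl_hours
    is given, so long-lived namespaces are not flagged by accident."""
    stale = []
    for env in envs:
        raw = env.labels.get(TTL_LABEL)
        if raw is None:
            if default_ttl_hours is None:
                continue
            ttl = timedelta(hours=default_ttl_hours)
        else:
            ttl = timedelta(hours=float(raw))
        if now - env.created > ttl:
            stale.append(env.name)
    return stale


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    envs = [
        Environment("pr-101", now - timedelta(hours=30), {"ttl-hours": "24"}),
        Environment("pr-102", now - timedelta(hours=2), {"ttl-hours": "24"}),
        Environment("staging", now - timedelta(days=90)),
    ]
    print(expired(envs, now))
```

In practice, the returned names would feed a report or a cleanup job. That job should ideally run as a dry run first, before anything is actually deleted.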

Conclusion

Implementing FinOps for Kubernetes environments requires careful planning of how resources are made available, cost monitoring, and proactive cost optimization that leverages Kubernetes-native capabilities for platform engineering needs.

While tools like Cloud Custodian can automate resource and cost management at the cloud level, Kubernetes policy management with Kyverno is a powerful way to proactively manage cloud costs. Nirmata’s Policy Management solution provides FinOps capabilities to monitor and proactively optimize Kubernetes costs.

Policy Management with Nirmata

Nirmata offers Kubernetes governance with Policy Management as the key pillar. Our cloud-native policy management solution, powered by Kyverno, facilitates the autonomy, agility, and alignment necessary for DevSecOps teams by automating the creation, deployment, and lifecycle management of policy-based intelligent guardrails.

Nirmata delivers policy insights, reports, tamper-detection, alerts, and collaboration by integrating with external tools, processes, and workflows. Nirmata offers an Enterprise distribution of Kyverno and the SaaS-based Nirmata Policy Manager. Free trials are available for both products. Let us know what you think about Nirmata’s products.

If you are interested in scaling and automating Kubernetes using policies, check out our new eBook: The Ultimate Guide to Policy-based Governance, Security & Compliance for Kubernetes.
