Optimizing Kubernetes Security with Kyverno: A Deep Dive into Robinhood’s Implementation

Enforcing security and compliance at scale is a critical concern for any organization running Kubernetes. It is especially pressing for a company like Robinhood, which operates under strict security and compliance requirements that demand strong policy enforcement. In a recent KubeCon presentation, Robinhood’s engineering team described how they integrated Kyverno into their Kubernetes environment: why they chose it, how they migrated away from their legacy systems, and how they handled the operational challenges along the way.

The Challenges Before Kyverno

Before adopting Kyverno, Robinhood relied on Kubernetes’ native Pod Security Policies (PSPs) and RBAC (Role-Based Access Control) to manage security and enforce policies across their clusters. While these tools provided a solid foundation, the setup had two significant limitations:

  1. Deprecation of Pod Security Policies: Kubernetes deprecated PSPs in v1.21 and removed them entirely in v1.25, so Robinhood had to move to a new solution.
  2. In-house Admission Server: Robinhood had developed a custom admission server to enforce more granular security policies, but this solution was difficult for other teams to adopt and maintain. Many infrastructure teams found it hard to contribute due to the complexity of the system, especially when it came to understanding how informers and other components worked.

Why Kyverno?

The Robinhood team evaluated several options for replacing their custom admission server and Pod Security Policies. The two leading contenders were OPA (Open Policy Agent) and Kyverno.

  • OPA: While OPA met all of Robinhood’s security requirements, its policies are written in Rego, a language that was unfamiliar to Robinhood’s engineers. The team felt that learning Rego would be a steep climb compared to working with their existing Go-based code.
  • Kyverno: Kyverno stood out for its flexibility, maturity, and ease of use. Kyverno policies are written in YAML, which was already familiar to Robinhood’s engineers, making adoption much easier. Kyverno also provided a robust audit mode for checking existing resources against policies before enforcing them.
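To make the YAML point concrete, the sketch below builds a Kyverno `ClusterPolicy` that disallows privileged containers, the kind of rule a PSP’s `privileged: false` setting used to express. It is written in Python only so it can be checked programmatically. The printed JSON is also valid YAML, so it can be applied with `kubectl apply -f`. The policy name, rule name, and message are illustrative assumptions, not Robinhood’s actual configuration. The field layout follows the `kyverno.io/v1` schema used in Kyverno’s public policy library. Recent Kyverno releases move the spec-level `validationFailureAction` to a per-rule setting, so check the schema for the version you run.

```python
import json


def disallow_privileged_policy(action: str = "Audit") -> dict:
    """Build a ClusterPolicy that reports (Audit) or blocks (Enforce)
    Pods whose containers request privileged mode.

    The policy/rule names and message are hypothetical examples.
    """
    if action not in ("Audit", "Enforce"):
        raise ValueError("action must be 'Audit' or 'Enforce'")
    # "=(field)" is Kyverno's equality anchor: the check applies only when
    # the field is present, so containers without a securityContext pass.
    no_privileged = [{"=(securityContext)": {"=(privileged)": "false"}}]
    return {
        "apiVersion": "kyverno.io/v1",
        "kind": "ClusterPolicy",
        "metadata": {"name": "disallow-privileged-containers"},
        "spec": {
            # Audit records violations in policy reports; Enforce rejects them.
            "validationFailureAction": action,
            # Also evaluate resources that already exist in the cluster.
            "background": True,
            "rules": [{
                "name": "privileged-containers",
                "match": {"any": [{"resources": {"kinds": ["Pod"]}}]},
                "validate": {
                    "message": "Privileged mode is not allowed.",
                    "pattern": {"spec": {
                        "=(initContainers)": no_privileged,
                        "containers": no_privileged,
                    }},
                },
            }],
        },
    }


if __name__ == "__main__":
    # JSON is a subset of YAML, so this output can be passed to kubectl.
    print(json.dumps(disallow_privileged_policy(), indent=2))
```

Starting in `Audit` and flipping the single `validationFailureAction` field to `Enforce` later is what makes the audit-first rollout cheap: the rule itself does not change.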

The Migration to Kyverno

The migration process was methodical and phased to ensure that policies were correctly enforced without disrupting operations. The team followed these key steps:

  1. Evaluation & Testing: Robinhood’s engineering team began by evaluating Kyverno and running it in audit mode. In audit mode, policy violations are reported rather than blocked, so the team could observe how policies behaved without risking breaking existing workloads.
  2. Policy Migration: Once the team was confident that Kyverno could meet their needs, they translated their existing PSPs into equivalent Kyverno policies and continuously validated them with integration tests.
  3. Decommissioning Legacy Systems: After successfully migrating to Kyverno, the team began to phase out their in-house admission server and decommission Pod Security Policies, making Kyverno the primary policy management tool for their clusters.
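The audit-first rollout can be modeled in a few lines. This is a toy sketch in plain Python, not Kyverno code. `Pod`, `admit`, and `ready_to_enforce` are hypothetical names, and a single "no privileged containers" check stands in for one migrated PSP rule. In Audit mode a violating Pod is admitted but the violation is recorded. In Enforce mode the Pod is rejected. The phase gate recommends switching to Enforce only once existing workloads produce no findings. In a real cluster, those findings would come from Kyverno’s policy reports.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical, minimal view of a Pod for this sketch."""
    name: str
    namespace: str
    privileged: bool = False


@dataclass
class AdmissionResult:
    allowed: bool
    violations: list[str] = field(default_factory=list)


def admit(pod: Pod, mode: str) -> AdmissionResult:
    """Toy admission decision for one rule: no privileged containers.

    Audit admits the Pod but records the violation; Enforce rejects it.
    """
    if mode not in ("Audit", "Enforce"):
        raise ValueError(f"unknown mode: {mode}")
    if not pod.privileged:
        return AdmissionResult(allowed=True)
    violation = f"{pod.namespace}/{pod.name}: privileged container"
    return AdmissionResult(allowed=(mode == "Audit"), violations=[violation])


def ready_to_enforce(existing: list[Pod]) -> tuple[bool, list[str]]:
    """Phase gate: evaluate existing workloads in Audit mode and report
    whether switching to Enforce would reject anything."""
    findings: list[str] = []
    for pod in existing:
        findings.extend(admit(pod, "Audit").violations)
    return not findings, findings


if __name__ == "__main__":
    pods = [Pod("api", "payments"), Pod("node-agent", "monitoring", privileged=True)]
    ok, findings = ready_to_enforce(pods)
    print(ok, findings)
    # prints: False ['monitoring/node-agent: privileged container']
```

Each finding is either fixed in the workload or handled with an explicit exception before the switch, which is what keeps the move to Enforce from breaking anything.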

Key Benefits of Kyverno

Kyverno offered several advantages over Robinhood’s previous approach:

  • Granular Policy Enforcement: Pod Security Admission, the built-in successor to PSPs, can only apply three predefined profiles (Privileged, Baseline, and Restricted) at the namespace level. Kyverno enabled fine-grained control over Kubernetes resources that these profiles cannot express.
  • Easier Policy Writing: Kyverno policies are written in YAML, a format familiar to most Kubernetes engineers, which makes them easier to write and manage.
  • Comprehensive Testing Framework: Kyverno includes a robust testing framework, with a CLI for running policy unit tests against sample resources and support for integration testing to confirm that policies are enforced as intended.
  • Active Maintenance: Kyverno is actively maintained, with responsive support from its community, ensuring that Robinhood could rely on a constantly improving tool.
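As an example of a rule that Pod Security Admission cannot express, consider requiring that images come only from an approved registry, with an exemption for a system namespace. The Python sketch below mirrors the wildcard matching that such a Kyverno rule performs. In Kyverno itself, the rule is written declaratively as a `validate` pattern on each container’s `image` plus an `exclude` block. The registry pattern and exempt namespace are hypothetical values, not taken from Robinhood’s setup.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# Hypothetical values for illustration only.
APPROVED_IMAGES = ["registry.example.com/*"]
EXEMPT_NAMESPACES = {"kube-system"}


def image_violations(namespace: str, images: list[str]) -> list[str]:
    """Return the images that match no approved pattern.

    Pod Security Admission has no equivalent check: its profiles cover
    pod security settings such as privilege, host access, and capabilities,
    not where images come from.
    """
    if namespace in EXEMPT_NAMESPACES:
        return []
    # fnmatchcase gives case-sensitive, platform-independent wildcard matching.
    return [img for img in images
            if not any(fnmatchcase(img, pat) for pat in APPROVED_IMAGES)]


if __name__ == "__main__":
    print(image_violations(
        "payments",
        ["registry.example.com/api:1.2", "docker.io/library/nginx:latest"],
    ))
    # prints ['docker.io/library/nginx:latest']
```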

Break Glass Scenarios and Testing

Any admission controller can block legitimate changes if a policy misbehaves, so Robinhood needed a way to recover quickly. To address this, they defined break-glass procedures: steps for quickly disabling policies or reverting configurations when something goes wrong, so that operations can continue even during an emergency.
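Robinhood’s exact break-glass steps are not described here, so the following is only one possible approach, sketched under assumptions. The approach switches every policy to Audit so it stops blocking requests but keeps reporting, and it records each policy’s previous action so the change can be reverted after the incident. The functions `break_glass` and `restore` are hypothetical and operate on policy manifests as Python dicts. Applying the patched manifests, with `kubectl` or a GitOps pipeline, is left out.

```python
from __future__ import annotations

import copy


def break_glass(policies: list[dict]) -> tuple[list[dict], dict[str, str]]:
    """Switch every policy to Audit without mutating the input.

    Returns the patched copies plus each policy's previous action, keyed by
    policy name, so the change can be reverted after the incident.
    """
    patched: list[dict] = []
    saved: dict[str, str] = {}
    for policy in policies:
        p = copy.deepcopy(policy)
        spec = p.setdefault("spec", {})
        # An unset validationFailureAction defaults to Audit in Kyverno.
        saved[p["metadata"]["name"]] = spec.get("validationFailureAction", "Audit")
        spec["validationFailureAction"] = "Audit"
        patched.append(p)
    return patched, saved


def restore(policies: list[dict], saved: dict[str, str]) -> list[dict]:
    """Reapply the actions recorded by break_glass."""
    restored: list[dict] = []
    for policy in policies:
        p = copy.deepcopy(policy)
        name = p["metadata"]["name"]
        if name in saved:
            p.setdefault("spec", {})["validationFailureAction"] = saved[name]
        restored.append(p)
    return restored


if __name__ == "__main__":
    policies = [{"metadata": {"name": "disallow-privileged-containers"},
                 "spec": {"validationFailureAction": "Enforce"}}]
    patched, saved = break_glass(policies)
    print(patched[0]["spec"]["validationFailureAction"], saved)
    # prints: Audit {'disallow-privileged-containers': 'Enforce'}
```

Downgrading to Audit, rather than deleting policies, keeps violations visible in reports during the incident, and the saved actions make the rollback a mechanical step.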

They also used a combination of unit tests and end-to-end tests to validate their policies before and after deployment, ensuring that new or changed policies would not inadvertently block legitimate workloads.
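Those unit tests rest on a simple idea: run each policy against known-good and known-bad resources before it reaches a cluster. Kyverno’s CLI does this declaratively, reading test manifests that list policies, resources, and expected results. The sketch below shows the same table-driven pattern in plain Python `unittest`. `run_as_non_root` is a hypothetical check written for illustration, not a Kyverno API. It treats a container-level `runAsNonRoot` as overriding the pod-level value, which matches Kubernetes’ precedence rules.

```python
import unittest


def run_as_non_root(pod_spec: dict) -> bool:
    """Hypothetical check: every container must run as non-root.

    The setting may come from the pod-level securityContext or from each
    container's own securityContext; the container value takes precedence.
    """
    pod_default = pod_spec.get("securityContext", {}).get("runAsNonRoot")
    for container in pod_spec.get("containers", []):
        value = container.get("securityContext", {}).get("runAsNonRoot", pod_default)
        if value is not True:
            return False
    return True


class RunAsNonRootTest(unittest.TestCase):
    # Each case: (description, pod spec, expected result).
    CASES = [
        ("pod-level setting",
         {"securityContext": {"runAsNonRoot": True}, "containers": [{"name": "app"}]},
         True),
        ("container-level setting",
         {"containers": [{"name": "app", "securityContext": {"runAsNonRoot": True}}]},
         True),
        ("container overrides pod",
         {"securityContext": {"runAsNonRoot": True},
          "containers": [{"name": "app", "securityContext": {"runAsNonRoot": False}}]},
         False),
        ("not set anywhere", {"containers": [{"name": "app"}]}, False),
    ]

    def test_cases(self):
        for desc, spec, expected in self.CASES:
            with self.subTest(desc):
                self.assertEqual(run_as_non_root(spec), expected)


if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RunAsNonRootTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```

Adding a regression case is then a one-line change to `CASES`, which keeps the test table readable as the policy set grows.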

A Look Ahead: Kyverno’s Reporting System

One of the most promising new developments in Kyverno is the Kyverno Report Server, which addresses the cost of managing reports at scale. By default, Kyverno stores its reports as Kubernetes resources in etcd, alongside every other object in the cluster. In environments with a large number of resources, generating and storing these reports can quickly overwhelm etcd and degrade cluster performance.

The report server moves report storage out of etcd, the default Kubernetes storage backend, and into an external database such as PostgreSQL. This relieves etcd of a load that could otherwise degrade cluster performance. It also lets users query reports with SQL, which offers more flexibility and ease of use.
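To illustrate the kind of query that SQL storage enables, the sketch below counts failing results per policy. It uses Python’s built-in `sqlite3` as a stand-in for PostgreSQL. The `policy_results` table is a hypothetical, simplified schema: the Kyverno Report Server defines its own tables, so a real query would need to target those. The result values mirror those used in Kubernetes policy reports (`pass`, `fail`, `warn`, `error`, `skip`), and the sample rows are made up.

```python
from __future__ import annotations

import sqlite3

# Hypothetical, simplified schema for illustration only; sqlite3 stands in
# for PostgreSQL, and the real Report Server uses its own tables.
SCHEMA = """
CREATE TABLE policy_results (
    namespace TEXT NOT NULL,
    resource  TEXT NOT NULL,
    policy    TEXT NOT NULL,
    result    TEXT NOT NULL  -- pass, fail, warn, error, or skip
)
"""


def failures_by_policy(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Count failing results per policy, most-violated policy first."""
    return conn.execute(
        """
        SELECT policy, COUNT(*) AS failures
        FROM policy_results
        WHERE result = 'fail'
        GROUP BY policy
        ORDER BY failures DESC, policy
        """
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO policy_results VALUES (?, ?, ?, ?)",
        [
            ("payments", "pod/api", "disallow-privileged-containers", "pass"),
            ("monitoring", "pod/node-agent", "disallow-privileged-containers", "fail"),
            ("monitoring", "pod/node-agent", "restrict-image-registries", "fail"),
            ("batch", "pod/etl", "restrict-image-registries", "fail"),
        ],
    )
    print(failures_by_policy(conn))
    # prints [('restrict-image-registries', 2), ('disallow-privileged-containers', 1)]
```

The same aggregation over report resources stored in etcd would require listing every report through the Kubernetes API and counting on the client side.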

Conclusion

Kyverno has proven to be a valuable tool for Robinhood, enabling better policy enforcement, simplifying policy writing, and improving scalability. The introduction of the report server further enhances its capabilities by addressing the challenges of managing large-scale clusters. With its active maintenance, ease of use, and robust features, Kyverno is an excellent choice for teams looking to improve their Kubernetes security posture while minimizing the operational burden.

Robinhood’s experience demonstrates the importance of choosing the right tools to scale security practices effectively, and Kyverno has served as a reliable foundation for securing their Kubernetes clusters.

To learn how to automate security and operations, check out Nirmata Control Hub today. 
