While we often talk about the design, development, and operations phases of an application’s lifecycle as if they were clearly defined, discrete events, that isn’t really true. Organizations that succeed with Day 2 operations don’t wait until the application has been deployed to start thinking about observability, upgrades, and governance policies.
In fact, companies that don’t consider what an application running on Kubernetes needs on Day 2 often find that the Kubernetes initiative doesn’t get past Day 1, or at least never expands organization-wide. Enterprises can’t afford to have applications in production only to find that upgrades cause downtime or that there’s not enough monitoring and logging to recover from an error.
The best way to ensure Day 2 operations are as smooth as possible is to consider them at the very beginning of any project. Here’s what to keep in mind to make sure Day 2 problems don’t threaten to derail the Kubernetes initiative.
Don’t underestimate Day 2
The most common mistake with Kubernetes Day 2 operations is dramatically underestimating how complex the Kubernetes environment is and what it will take to manage that complexity.
Organizations usually start out with a very small Kubernetes pilot project, perhaps with a single team and a single cluster. At this stage, a robust Day 2 management solution might not be necessary, and no one has likely considered what it will take to manage critical processes in production. Even when the pilot reaches its Day 2 phase, it might be small enough, or far enough from mission-critical, that it can be managed manually or with general-purpose tools.
As organizations expand their Kubernetes pilots, the stakes rise sharply. Complexity grows quickly as individual applications scale, as more applications and clusters are added, and as organizations adopt hybrid cloud and multi-cloud approaches. Organizations that base their Day 2 expectations on the needs of small, limited-scope initial applications risk seeing the entire Kubernetes project scrapped.
But what exactly does it mean to think about Day 2 at the design phase? Here are some ways that organizations can work Day 2 planning into the design.
Tooling and processes
Anticipating an application’s operational needs as early as possible is the best way to bring in the right tools and design procedures that ensure those needs will be met.
The application design process includes choosing the right tools, creating workflows, and establishing organizational guardrails around best practices. At the design phase, here are some of the questions you should be asking yourself about how you will handle operational tasks like monitoring, troubleshooting, remediation, and upgrades:
- Does your cloud provider have built-in capabilities to support this task?
- Do your Day 2 management tools need to be cloud- and infrastructure-agnostic?
- Will your VM-based tools be able to handle cloud-native workloads?
- Do you need an additional tool, either because your current solutions do not support Kubernetes workloads or because you need functionality that goes beyond what they provide?
- If your Kubernetes cluster is on-prem, does your current tooling provide sufficient support for operating both the cluster and the workloads running on it?
- What level of automation do you need?
- How will you manage complexity and skill gaps?
Answering these questions should point you to the tools and processes you need to have in place to make sure the Day 2 operations are successful.
Operationalizing Kubernetes becomes dramatically more challenging when every team has unlimited choice over configurations and tools. Yet in an enterprise with multiple teams spread across locations and time zones, this is almost certain to happen unless a central team proactively develops and enforces guardrails around configurations and tool choice.
Consistency is key to smooth operations at scale, but it requires organizations to think about tools and governance policies before the application is developed.
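To make “guardrails around configurations” more concrete, here is a minimal sketch of the kind of rule set a central team might publish, written as a plain Python function over a Pod spec represented as a dictionary. The function name `check_pod_spec`, the approved registry prefix, and the specific rules (approved registry, pinned image tags, CPU and memory limits) are hypothetical examples, not a standard. In a real cluster, rules like these are usually enforced at admission time by a policy engine rather than by ad hoc scripts.

```python
# Hypothetical guardrails a central platform team might publish.
# The registry prefix and the rules below are illustrative assumptions.
APPROVED_REGISTRY = "registry.example.com/"


def check_pod_spec(pod_spec: dict) -> list[str]:
    """Return guardrail violations for a Pod spec given as a plain dict."""
    violations = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        image = container.get("image", "")

        # Rule 1: images must come from the approved registry.
        if not image.startswith(APPROVED_REGISTRY):
            violations.append(f"{name}: image {image!r} is not from the approved registry")

        # Rule 2: images must be pinned. An image with no tag resolves to
        # ':latest', so a missing tag and an explicit ':latest' both fail.
        last_segment = image.rsplit("/", 1)[-1]
        if ":" not in last_segment or last_segment.endswith(":latest"):
            violations.append(f"{name}: image tag is missing or 'latest'")

        # Rule 3: every container must declare CPU and memory limits.
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                violations.append(f"{name}: no {resource} limit set")
    return violations


pod = {
    "containers": [
        {"name": "web", "image": "registry.example.com/web:1.4.2",
         "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}},
        {"name": "sidecar", "image": "docker.io/library/busybox"},
    ]
}

for violation in check_pod_spec(pod):
    print(violation)
```

Here the `web` container passes every rule, while the `sidecar` container fails all four checks. Publishing rules as code like this lets every team run the same checks locally and in CI, which is one way to keep configurations consistent across teams.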
Ensure a tight feedback loop
Especially for companies at the beginning of the Kubernetes journey, the biggest challenge in designing for Day 2 is simply that they don’t know what they don’t know and lack the experience to anticipate Day 2 needs. No matter how experienced an organization is with Kubernetes, there will always be something unexpected. The best way to counter this is to ensure a tight feedback loop as well as robust testing before deployment. This includes:
- Testing the upgrade process pre-deployment
- Testing the monitoring tools to determine if you have enough visibility
- Testing troubleshooting and automated recovery from typical failures
- Testing automated cluster and workload scaling
- Testing your alerting and notification systems
- Ensuring that the cluster can support the number of containers you anticipate running in production
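The last check in the list is partly arithmetic. Kubernetes caps pods, not containers, per node (a pod may hold several containers), and the kubelet’s default `maxPods` is 110, although many managed offerings configure a different per-node limit. The sketch below estimates a minimum node count assuming a fixed per-node pod cap and one spare node of headroom so the workload still fits after a node failure. The function names and the headroom rule are illustrative assumptions; a real capacity plan would also account for CPU and memory requests and for system pods such as DaemonSets.

```python
import math

# Assumption: 110 is the kubelet's default maxPods; check the value
# actually configured on your nodes, since providers often change it.
DEFAULT_MAX_PODS_PER_NODE = 110


def nodes_needed(expected_pods: int,
                 max_pods_per_node: int = DEFAULT_MAX_PODS_PER_NODE,
                 spare_nodes: int = 1) -> int:
    """Minimum node count so expected_pods still fit after losing spare_nodes nodes.

    Only the per-node pod cap is considered; resource requests and
    system pods are ignored in this sketch.
    """
    return math.ceil(expected_pods / max_pods_per_node) + spare_nodes


def cluster_fits(node_count: int, expected_pods: int,
                 max_pods_per_node: int = DEFAULT_MAX_PODS_PER_NODE,
                 spare_nodes: int = 1) -> bool:
    """True if node_count meets the estimate from nodes_needed."""
    return node_count >= nodes_needed(expected_pods, max_pods_per_node, spare_nodes)


print(nodes_needed(1000))      # ceil(1000 / 110) = 10, plus 1 spare -> 11
print(cluster_fits(10, 1000))  # False: 10 nodes leave no headroom for a node failure
```

Running a check like this against the pod counts you expect in production, before deployment, turns “can the cluster support it?” from a guess into a number you can test.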
It’s also important to test dynamically, including having application developers use Kubernetes and give feedback on how your setup works. The application might turn out to require a different type of storage or different network rules than originally designed. As long as the feedback loop is kept short, it’s easy to incorporate those requirements into the guardrails the central team provides.
Keeping the feedback loop tight requires consistency, attention to process, and automation. An organization that handles too many tasks manually will always have trouble keeping the feedback loop short, because the manual effort will always add a time lag.
A centralized platform like Nirmata can help organizations tame Kubernetes’ complexity while also simplifying Day 2 tasks.