DevOps has been a movement across the tech world and has started playing a crucial role in business success and sustainable growth. At ShieldSquare, we evangelize the DevOps philosophy because we unanimously believe that DevOps is one of the strongest pillars of our product. This post is a glimpse of a small success story: how we strategized to tackle the burgeoning data growth in our organization.
As our business grows manifold, the data we handle has been growing as well, raising a scalability challenge. Our power-packed DevOps team geared up to explore various options to meet it. From a technical perspective, our major pain point was scaling different versions of modules on demand and managing blue-green deployments. In our traditional model of scaling, we would upgrade infrastructure capacity based on the binaries we wanted to deploy on servers (VMs), which put us in a difficult situation when handling traffic spikes, since we had to prepare servers to run different modules. On the other hand, one advantage is that we run purely on decoupled services: each application service transforms data, and the transformed data is consumed by the next service layer. We wanted to leverage this advantage and were looking for a technology that fit our requirements. In other words, we were in pursuit of moving our infrastructure management into auto-pilot mode.
Eventually, after exploring all the possible options, we shortlisted Kubernetes, the popular orchestration project that originated at Google, to address our growing challenge. With excitement running high, the team started a POC on Kubes, and in just two weeks the staging environment was ready to showcase the power of this orchestration framework. The following are the major Kubernetes concepts that convinced us during the POC that this would be the best option for our environment migration.
Microservices aim to tackle the problems of managing modern applications by decoupling software solutions into smaller functional services that are expected to fail. This enables quick recovery from failures in smaller functional units and also makes your release cycle faster. More on Microservices here.
Containers gave us a perfect alternative to our virtual-machine-based system and drove software packaging in a more developer-friendly direction. A Docker container is considerably smaller than a virtual machine (VM). Docker containers can run on a developer's laptop, on physical or virtual machines in a data center, on cloud providers, or in a mixture of environments. More on Containers here.
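To make the packaging idea concrete, here is a minimal, purely illustrative Dockerfile for a small JVM-based service (the image name, jar path, and port are hypothetical, not our actual setup):

```dockerfile
# Illustrative only: package a pre-built jar into a small container image
FROM openjdk:8-jre-alpine
# Copy the application binary built outside the image
COPY target/service.jar /app/service.jar
# Port the service listens on (hypothetical)
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "/app/service.jar"]
```

The same image runs unchanged on a laptop, a VM, or a cloud node, which is the portability advantage described above.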
Kubernetes is an open-source platform for automating deployment, scaling, and operations of application containers across clusters of hosts, providing container-centric infrastructure. Defining correct resources in Kubernetes is crucial. More on Kubernetes here.
The Game is ON
A successful pre-production/staging setup fueled our confidence to move ahead with migrating our current environment toward Kubes. We started with Google Container Engine (GKE) to get things working quickly: a cluster with X nodes, each node with a specific configuration, in the default pool to run stateless components. Each container has its own resource requirements (CPU, RAM, disk, network, etc.), and this is where requests and limits come in. They help keep nodes healthy. Many times, due to bad limits, or no limits at all, pods can go crazy on utilization and eat any available resources, leading to node starvation; the node then goes into a [Not Ready/Unresponsive] state due to resource exhaustion. We hit this multiple times in the early stages, and we have since fine-tuned each pod's resources based on its observed appetite.
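As a sketch of what requests and limits look like in practice (all names and values here are hypothetical, not our production figures), a container spec might declare:

```yaml
# Illustrative Deployment fragment. Requests are what the scheduler
# reserves for the pod; limits are a hard cap on actual usage.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: transform-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: transform-service
  template:
    metadata:
      labels:
        app: transform-service
    spec:
      containers:
      - name: transform-service
        image: gcr.io/example/transform-service:1.0
        resources:
          requests:          # used for scheduling decisions
            cpu: "250m"
            memory: "256Mi"
          limits:            # exceeding the memory limit gets the pod OOM-killed
            cpu: "500m"
            memory: "512Mi"
```

With limits in place, a misbehaving pod is throttled or evicted instead of starving the whole node.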
How do we monitor Kubes?
We have a custom monitoring setup to keep an eye on the nodes. We run Heapster, which is responsible for compute resource usage analysis and monitoring of container clusters, hooked to InfluxDB, which consumes the metrics Heapster reports, and we visualize the graphs in Grafana.
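The wiring between the pieces is mostly a matter of pointing Heapster at the cluster API and at an InfluxDB sink. A minimal sketch of the Heapster container arguments (service names, namespace, and image tag are assumptions, not our exact deployment):

```yaml
# Illustrative Heapster container spec fragment: read metrics from the
# Kubernetes API, write them to an InfluxDB service for Grafana to query.
containers:
- name: heapster
  image: k8s.gcr.io/heapster-amd64:v1.4.2
  command:
  - /heapster
  - --source=kubernetes:https://kubernetes.default
  - --sink=influxdb:http://monitoring-influxdb.kube-system.svc:8086
```

Grafana then uses the InfluxDB service as its data source for the cluster dashboards.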
Um! What about logs?
Right, logs are undoubtedly very important. We have ELK (Elasticsearch, Logstash, Kibana) to consume the logs reported by containers running in the cluster. The containers run with log4j configured to push logs through Logstash to Elasticsearch.
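A minimal Logstash pipeline for this pattern might look like the following sketch (the port, index name, and Elasticsearch host are hypothetical; the exact input depends on which log4j appender is used):

```conf
# Illustrative Logstash pipeline: accept JSON log events from the
# application's log4j appender and index them into Elasticsearch.
input {
  tcp {
    port  => 4560          # hypothetical port the appender ships to
    codec => json_lines
  }
}
output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}
```

Kibana then reads from the daily indices for search and dashboards.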
Data stores and Kubernetes
Kubernetes is mostly about running stateless containers. At ShieldSquare, we mainly use Kubernetes to run components that don't need persistence, i.e., caching and temporary data. It's not ideal for stateful components like MySQL, MongoDB, or any other engine that works extensively with data at rest, so we are not running our production data stores inside Kubernetes. Kubernetes does have support for managing stateful containers too, but a lot of work is still being pushed by kubernauts to make it more stable. The important lesson is that you don't have to run everything in Kubernetes: orchestrate only the stateless applications to keep things fail-safe.
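For readers curious what the stateful support looks like, the building block is the StatefulSet, which gives pods stable network identities and per-pod persistent volumes. A hedged sketch (all names and sizes are hypothetical; as noted above, we do not run this in production):

```yaml
# Illustrative StatefulSet: each replica gets a stable name (redis-0,
# redis-1, ...) and its own persistent volume claim.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:3.2
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```

The stable identity and storage are exactly what Deployments lack, which is why data stores don't fit the stateless model described above.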
Note: some configuration in GKE should be handled with care, such as automatic Kubernetes version upgrades. If you are running RabbitMQ, Redis, or any other message queue as a service that needs uptime, you are better off turning auto-upgrade off: a new Kubernetes version release will schedule all your nodes for maintenance, and although it rolls out the updates one node at a time, it can still affect your production system. If you are fully stateless, you can keep the default or skip this warning.
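Turning auto-upgrade off is a one-line change per node pool. A sketch with hypothetical cluster, pool, and zone names (run against your own project; this does not execute outside a GCP environment):

```shell
# Illustrative: disable automatic node upgrades on a GKE node pool
# so the control plane does not drain nodes on new Kubernetes releases.
gcloud container node-pools update default-pool \
  --cluster my-cluster \
  --zone us-central1-a \
  --no-enable-autoupgrade
```

You can then schedule upgrades manually during a maintenance window of your choosing.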
Pretty much all of the above understanding is based on what we learned over the last six months of running Kubernetes in production. Looking at our deployments today, Kubernetes is absolutely fantastic on auto-pilot, doing the self-healing work for us. We are running more than a thousand pods in the cluster, currently processing multiple billions of API calls per month, and we are pushing it to handle more. After this exercise, the stability of our complete production environment has improved significantly, with the least possible manual intervention.
Kubernetes lifted a lot of server management off our shoulders and helped with faster deployments and system scalability. Adaptation is much quicker, and most security-related concerns are managed by GCP. Kubernetes aims to offer a better orchestration management system on top of clustered infrastructure. Development on Kubernetes has been happening at a rapid pace, and its community keeps growing. Why don't you join the momentum?
At ShieldSquare, we always try to stay on top of the latest tech trends to quench our DevOps thirst. The accolades from peers and management have driven us to start sharing our learnings with the global DevOps ecosystem. This post is just a start; many more of our learnings and successful experiments will be written up in this thread.