Time goes so fast as an intern at Nirmata! This summer, we had the opportunity to work with two excellent interns from Berkeley: Aastha Upadhyay and Jacob Yim. Before they left, They wrote a blog post describing the work they achieved here at Nirmata. We have broken it into a blog series, highlighting all their amazing work!
Astha and Jacob, thank you so much for your contribution to Nirmata. Good luck at Berkeley!
Hello readers! In this post, we’ll share our journey as two computer science students at UC Berkeley who had the opportunity to intern at Nirmata this summer.
Over the past three months, we learned about distributed systems and microservice-style architecture under the guidance of our wonderful mentors. The goal of our internship was to work on replacing Zookeeper and Curator for the Nirmata platform. Zookeeper and Curator are used in three key areas by Nirmata: distributed locks, leader election, and distributed workflows. This article will detail our process for first understanding each of these topics through preparing demos, how we went about designing a solution, the pain points we addressed, testing our applications, and the next steps. We will also share our personal experiences while working at a startup for the first time, an experience that we recommend to all aspiring software engineers!
Although we had conceptual knowledge of distributed locks and leader election, the best way to fully grasp these topics and how it is used in the Nirmata platform was to approach the code in a hands-on manner by preparing a demo for each topic. Java is a comfortable programming language for both of us, which was a great starting point for this exercise; however, the real learning started with familiarizing ourselves with Kubernetes, Docker, and how distributed systems work under the hood. This was a truly fascinating process, as we could clearly understand how vital distributed systems are to a lot of different services we use every day!
We wrote two Java applications, modeled after existing code in the Nirmata Bitbucket, and Dockerized their contents in order to deploy the apps in a sample cluster. The first application repeatedly acquired and released Curator-based distributed locks on multiple pods and verified that each pod was acquiring locks at a similar rate. The second was a leader election app that ensured that only one instance at a time is successfully marked as a leader to execute a given task.
At last, we were ready to move on to the next task of replacing the existing Zookeeper and Curator implementations of leader election and distributed locks. I (Jacob) would work on investigating distributed locks using MongoDB and Aastha would look into using Fabric8 for leader election—both with the intention of replacing the underlying Zookeeper/Curator implementations.
Here is a summary of my (Jacob) project, enjoy!
Distributed Locks Using MongoDB
When planning for the replacement of our current Curator-based distributed locks, we decided to run with MongoDB as our backing database. MongoDB, which is capable of handling requests at sub-second frequencies, meets our performance requirements, with the added benefit of already being used by several Nirmata services. A MongoDB-based distributed lock is acquired by creating a document within a database for locks. Other processes attempting to acquire a lock on the same resource will refer to the same document, which indicates that the resource is already locked.
I began by researching existing open-source Java implementations of MongoDB-based distributed locks. I managed to find several, and assessed them for ease of use and performance, running tests to measure the speed of locking and unlocking operations. I also looked for features like reentrancy, document cleanup, and blocking behavior. Ultimately, I found Coditory’s Sherlock distributed lock library to be the easiest to use and among the fastest, acquiring and releasing locks in milliseconds (about three times faster than with Curator). The Sherlock library also included necessary reentrant locks and document cleanup. However, these distributed locks lacked the same blocking behavior as the Curator locks: when two processes try to acquire the same lock, one process should wait until the other process releases the lock. Instead, Sherlock fails to acquire unavailable locks without waiting. To add this blocking behavior, I wrote a distributed lock implementation, MongoLock, using Sherlock. MongoLocks are created using a MongoLockService. When a MongoLock is acquired, it attempts to acquire a Sherlock on a set retry interval, with a specified timeout duration. This effectively gives MongoLock the same behavior as a Curator lock.
With MongoLock completed, I was ready to begin testing. I started off by writing unit tests at the most basic lock level, then at the transaction level, where transactions are used to perform changes across MongoDB documents. I tested that transactions could occur concurrently, where performing two transactions on the same object would cause one transaction to wait until the other completes. Finally, I tested replacing the Curator locks with my MongoLock in the Nirmata catalog service. This involved initializing a MongoClient and injecting my MongoLockService in place of CuratorLockService, as well as updating the MongoDB API to be compatible with the newer version used in Sherlock. After making these changes, I could verify that the catalog service was running, locks were being acquired and released, and MongoDB lock documents were created and deleted, demonstrating that this MongoDB implementation works as a replacement for Curator.