Block 5181126 and the Rainbow Road
How a team of pseudonymous operators stifled chaos on the Secret Network.
Operating a decentralized network carries an inherent risk of change-management failures. No infrastructure project has perfected its software delivery lifecycle, which can lead to unexpected breaking changes, coordination issues, or conflicts that simply slip through the cracks, any of which can cause a serious network outage. Outages of this sort should create chaos, because no centralized authority bears responsibility for deciding what needs to happen and who should do it. But amazingly, chaos often doesn’t ensue.
Not when the operator community springs into action.
Blockchain operators represent diverse private interests, but their commitment to the public network is undeniable. On a decentralized network, it takes many deeply committed, independent actors to resolve critical issues when they occur. A recent incident on Secret Network, a blockchain in the Cosmos ecosystem, illustrates how powerful that commitment can be. I want to briefly describe the chain halt that occurred on Secret Network on September 12th and share the idea it crystallized for me: the strength of the community, and the direct influence its members have on the network’s ability to recover.
So, what happened? On May 25th, version 1.3.1 of the Secret Network node software was released. The upgrade was flagged as recommended, but not mandatory, for validators to remain online and functional: “Note: Upgrading is recommended to all node types (especially validators) due to the new mempool features.”
The release did, however, contain a breaking change that, once triggered, split consensus. Because the upgrade was recommended rather than required, a portion of the validators in the active set didn’t upgrade. Operators face different pressures and pay different levels of attention to non-mandatory upgrades, so this is to be expected to some degree. At block height 5181126, the validator set diverged on the correct next block hash: nodes on v1.3.0 computed one result, and nodes on v1.3.1 computed another. Because neither group held more than two-thirds (roughly 67%) of the network’s voting power, no new block could be committed, and the chain halted.
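The halt comes down to a simple threshold rule. In Tendermint-style BFT consensus, which Cosmos chains use, a block is committed only when validators holding more than two-thirds of total voting power precommit to it. The sketch below is a simplified model, not Secret Network code: it assumes every validator on the same software version computes the same next block hash, and the function names and stake figures are hypothetical, not data from the actual incident.

```python
from typing import Dict, Optional


def has_supermajority(power: int, total_power: int) -> bool:
    """True if `power` is strictly more than two-thirds of `total_power`."""
    # Compare 3 * power > 2 * total_power in integers to avoid float rounding.
    return 3 * power > 2 * total_power


def can_commit(power_by_version: Dict[str, int], offline_power: int = 0) -> Optional[str]:
    """Return the software version whose validators can commit a block, or None.

    Simplified model (an assumption, not Secret Network code): every validator
    on the same version computes the same next block hash, and offline
    validators still count toward total voting power but cast no vote.
    """
    total_power = sum(power_by_version.values()) + offline_power
    for version, power in power_by_version.items():
        if has_supermajority(power, total_power):
            return version
    # No group holds more than 2/3 of voting power: the chain halts.
    return None


if __name__ == "__main__":
    # Hypothetical stake split, not figures from the incident.
    print(can_commit({"v1.3.0": 40, "v1.3.1": 60}))  # None: chain halted
    print(can_commit({"v1.3.0": 30, "v1.3.1": 70}))  # v1.3.1: block committed
```

With a hypothetical 60/40 split, neither version clears the bar and `can_commit` returns `None`, which models the halt; moving ten more units of stake to v1.3.1 (70/30) lets that group commit. Note that exactly two-thirds is not enough, and offline validators make the bar harder to reach because their power still counts toward the total.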
This is when the magic happened. Springing into action over a dedicated Telegram channel, active community members aligned on a strategy to get enough validators upgraded to v1.3.1 that nodes running the latest version held more than the critical two-thirds of voting power. And in this Telegram channel, everyone is anonymous. No one knows the identities of their fellow operators.
There’s the fella from Saturn, the one with the Sauce, the Father of Nodes, the Orange psychedelic guy, the Ghost of block halts past, the Brit with the gradient red logo, the purple bee… the list goes on! Within minutes of a chain halt, the usual crowd is on Telegram helping to mitigate the sinking feeling we have when five high alerts simultaneously pop up on our phones, reminding us that our infrastructure is flat-lining. The mission is unspoken but universally understood. Operators know we are in it together.
By the time an additional 5-10 validators come online, Cashmaney, the SLabs team, and Jacob have working theories and patches to test. Energy and momentum build as more and more operators offer resources for testing. Finally, the recovery is visibly moving. Our group chat is a frenzy of activity, with jokes plopped in between nuggets of technical gold. Two or three hours after the halt, a sequence of recovery steps is pinned to the channel for stragglers. Operators begin to run the upgrade, nearly pushing the network over the consensus threshold. Then, at last…
HERE COMES LUIGIIIII—with the final upgrade.
And just like that, the network is moving again, and red fades to orange, then to green. The dashboards smoothly say:
“William—relax.”
For many of us in infrastructure, this scene is all too familiar. I’ve been through crazy disaster scenarios, from foreign exchange outages to big-bang data center migrations, and beyond. The one thing that gets you through those times is the team of people you work with to solve problems. Amid the silence and uncertainty of systems that are offline or broken, there’s joking and camaraderie. And on decentralized networks, our comrades are Telegram characters: legendary developers whom we have never even met. The network only recovers when we agree to own it together.
So, what’s the point of all this? The point is the people: the human beings working passionately to make a network work. That’s the thing of beauty, something companies spend decades trying to build, yet in crypto it forms organically without any company at all. Perhaps this is an easily overlooked emergent outcome of decentralization: good, smart people who otherwise never would band together to solve problems.
They are the online validator community, the foundation on which this new paradigm is built. Without them, none of it is possible.
This should not be construed as investment advice. Read our full disclosures here.