the blog of napsy

A Story About a 24/7 Backend Platform

When I started my career at Visionect back in 2008, I was 20, living in a (relatively) big city for the first time and in my first year at the computer science faculty. My first assignment was to create a rendering backend that generated image snapshots of web pages. There were two major decisions to make: which web engine to use and which programming language to write it in.

I looked into the Mozilla Gecko engine and I didn’t like what I saw: the engine was entangled with the UI and there was no clear way to integrate it into a separate app. The second option was WebKit. It was only two years past its initial release, but it looked promising: the source tree was very clearly organized and using the library was extremely simple.

The choice of programming language was almost implicit: since I work in the Linux domain, C seemed the best fit. There was a slight pull towards C++, but I never regretted the final decision.

First Problems

After the initial prototype was done, I began to implement the whole infrastructure, which meant multiple concurrent services talking to each other. I’m sure all C programmers would agree this is a tedious thing to implement correctly and with minimal bugs. After the initial releases, stability problems showed up. The services would randomly crash, take up all system memory or just stop working and hang.

Tools To The Rescue

I can’t count how many times certain tools spared me hours of desperate debugging, trying to find weird synchronization bugs and memory leaks that only appear in certain corner cases. Most of the time, when problems showed up, I would fire up a gdb debugging session, analyze the code, set breakpoints and follow the program execution. But this is C land with all its glorious memory management issues, dangling pointers and thread synchronization mechanisms. I would like to take this opportunity to praise valgrind for all the time it saved me and to say how much I appreciate the existence of this piece of software.

Tools are great, but if development is done wrong, no software in the world will save you from your ultimate fate: tightly coupled code, a logical organization that is all wrong, missing abstractions, no unit tests. In short, the project becomes unmaintainable and reaches a point where half of your work is fixing old bugs and refactoring code.

This is exactly what happened to me and I blame two things: my lack of experience in software development (remember, I was in my early 20s) and the fact that the programming language was never really designed for writing highly multi-threaded and distributed software.

Maintaining Broken Software

I was in a hell that I had created for myself: the software was in production and I alone had to maintain the existing code and implement new features, spending half the time removing bugs and tracking down weird corner cases where things would just lock up. Implementing new features was especially hard since I had to refactor many pieces of code to fit in new ideas.

After many bug-fix releases, the whole code was really stable, relatively bug-free and ran 24 hours a day without any outstanding memory leaks. But I couldn’t get rid of the feeling that the code foundations were just a brick away from falling apart.

A New Hope

In late 2009, Google announced a new open-source programming language named Go, designed by Robert Griesemer, Rob Pike and Ken Thompson. For those who don’t know these names, I suggest googling them: they are “rock stars” of programming with a deep understanding of software engineering, and one can learn a lot from these folks.

The language quickly came to my attention, as its main features were garbage-collected memory management, native support for concurrent programming and a rich standard library. After reading the language spec, it seemed the language could be learned within a week. After playing with Go in my spare time, I decided to try rewriting one of the platform services from C to Go.

Rewriting The Software And The Brain

One of the services implements a networking protocol that communicates with hardware over TCP/IP. After two weeks working on the code, the prototype just seemed to work and I never even thought about using a debugger at all. Two things struck me at that moment:

Getting used to test-driven development, as the language advocates, took some time since it is also a matter of discipline, but it really helped me recognise patterns in the code that pointed to what the general design of the architecture should look like.

Design Decisions

In later years, idioms for how a programmer should write code in Go started to appear on the internets and interfaces really started to get the attention they deserve. They are simple, implicit and, in contrast to other languages that support interfaces, really small when defined.

The best example of good interface design is the “io” package in the standard library. It has a small set of interface definitions that clearly define behaviour, for example ‘Reader’ and ‘WriteCloser’.

Both interfaces together define only three methods: Read() in ‘Reader’ and Write() and Close() in ‘WriteCloser’.

This is an important observation that taught me to separate object behaviour into as many interfaces as possible. Later, when writing unit tests, object mocking becomes much simpler: a method that reads from an IO source can accept an arbitrary object that only has to implement the Reader interface.

I took this knowledge back to the platform, where there’s a service that receives responses from several possible sources. Knowing that all responses must go to a target, the receiver can be declared as:

func ResponseReceiver(r Response) { ...

The ‘Response’ interface is really small:

type Response interface {
    Target() string
    Payload() []byte
}

The response receiver now doesn’t care what the source of the response was, as long as it can get the binary payload of the response and knows whom to send it to.
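
A rough sketch of how two unrelated sources could satisfy the same small interface is shown here. The types deviceResponse and cachedResponse and the helper describe are invented for this example; the real sources and the body of the receiver are not shown in this post, so treat this as an illustration only.

```go
package main

import "fmt"

// Response mirrors the small interface from the text.
type Response interface {
	Target() string
	Payload() []byte
}

// deviceResponse is a hypothetical response coming from hardware.
type deviceResponse struct {
	deviceID string
	frame    []byte
}

func (d deviceResponse) Target() string  { return d.deviceID }
func (d deviceResponse) Payload() []byte { return d.frame }

// cachedResponse is a hypothetical response served from a cache.
type cachedResponse struct {
	target string
	text   string
}

func (c cachedResponse) Target() string  { return c.target }
func (c cachedResponse) Payload() []byte { return []byte(c.text) }

// describe stands in for a receiver: it only uses the two interface
// methods and never inspects the concrete type behind them.
func describe(r Response) string {
	return fmt.Sprintf("%d bytes for %s", len(r.Payload()), r.Target())
}

func main() {
	responses := []Response{
		deviceResponse{deviceID: "display-1", frame: []byte{0x01, 0x02, 0x03}},
		cachedResponse{target: "display-2", text: "hello"},
	}
	for _, r := range responses {
		fmt.Println(describe(r))
	}
}
```

Adding a third source later means writing two short methods on a new type; the receiving side stays untouched.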

Is There A Happy End?

In the end, a decision was made to rewrite every component of the platform from C to Go and I don’t regret that step for a second.

Time spent debugging the software decreased significantly: I would say I spend about 20% to 30% of my time debugging and bug-fixing and the rest implementing new features. Memory leaks are still possible but they appear rarely, and deadlocks or race conditions can be avoided using built-in mechanisms in the language (but not always). And if they do appear, the language tooling is well prepared to debug such software.
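
As an illustration of what those built-in mechanisms can look like, here is a small sketch that uses goroutines and a channel so that only one goroutine ever touches the shared total. The function sumSquares is made up for this example and has nothing to do with the platform code.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares (hypothetical) squares each value in its own goroutine.
// Workers never share memory: they send results over a channel and
// only the calling goroutine updates total, so no mutex is needed.
func sumSquares(values []int) int {
	results := make(chan int)
	var wg sync.WaitGroup

	for _, v := range values {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			results <- v * v
		}(v)
	}

	// Close the channel once every worker is done, so the range
	// loop below terminates instead of blocking forever.
	go func() {
		wg.Wait()
		close(results)
	}()

	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4})) // 30
}
```

Forgetting the close would leave the range loop blocked forever, which is the "but not always" part. On the tooling side, running a program or its tests with the -race flag (for example go run -race or go test -race) enables the race detector that ships with the Go toolchain.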

There are also many other perks that come with switching to Go, such as automatic documentation generation, profiling tools, etc.

It’s been 3 years since the platform moved entirely to Go and it runs happily on cloud infrastructure around the clock.