Sometimes I think our industry is pretty unique in having a constant inflow of people reinventing square wheels. Or, to be a bit more specific: some new fad appears, and newcomers flock in to learn just that fad, without bothering to learn the basics or fundamentals, or even to figure out when and where the fad is applicable. The most recent and egregious example is, of course, the vibe coding subreddit, which has a ton of people who claim not to know anything about programming but to be very proficient in vibe coding! In most cases it goes as well as you’d expect, with the vibes of a pickup-focused FIDO conference from the 1990s, in which men who don’t know how to not scare a woman away teach other men how to not scare a woman away.
That was clickbait. Now that I have your attention, let’s do some boring stuff. Specifically, let’s try to figure out how to build reliable distributed systems, using zero AI! You might have noticed that it’s now the norm in our world that systems just go bad and you need to reboot or reset them, and nobody knows how to fix stuff? Right.
Before we start, I highly suggest reading just one book: A Philosophy of Software Design by John Ousterhout.
The need for modularization
As it turns out, there’s only one way to write software that you can understand and modify: make it modular. It’s true because Ousterhout says so! Why are you not believing me now???
Consider this simple mathematical expression:
x/2
This is what you’d write in most modern programming languages, and then it’s the job of the language to figure out the correct set of instructions for the given type of x, on the given CPU hardware, etc. This is the form of modularization that is simplest to express (though not simplest internally), and it is also the simplest example showing that modularization is the only way to make programs maintainable by humans. If we didn’t have this, I’d need to write all these different kinds of division by hand, and then how am I supposed to change it, or even understand that I’m dividing? I’d have to write a comment somewhere saying what this mess of assembly is doing.
The same goes for higher forms of modularization as well, which is everything from functions upwards:
|
|
In this example we let CopyFile figure out how to handle different URLs, and it’ll probably do that by subclassing a URL handler, using interfaces, etc.
So it is obvious that we need modularization to be able to write things with adequate velocity. We also need correct modularization, which hides the unnecessary stuff and exposes the required stuff, and you can read all about it in the book referenced above. But that is just for the interface between human and machine.
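Here is a rough sketch, in Python, of what that hiding could look like. Only the CopyFile name and the idea of a URL handler come from the text above; the handler classes, the HANDLERS registry and the mem:// scheme are all made up for illustration, and the file:// handling assumes POSIX-style paths.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class URLHandler(ABC):
    """Interface every scheme-specific handler implements (hypothetical)."""

    @abstractmethod
    def read(self, url):
        ...

    @abstractmethod
    def write(self, url, data):
        ...


class FileHandler(URLHandler):
    """Handles file:// URLs by going to the local filesystem."""

    def read(self, url):
        with open(urlparse(url).path, "rb") as f:
            return f.read()

    def write(self, url, data):
        with open(urlparse(url).path, "wb") as f:
            f.write(data)


class MemoryHandler(URLHandler):
    """Handles the made-up mem:// scheme; a dict stands in for a remote store."""

    store = {}  # shared by all instances: url -> bytes

    def read(self, url):
        return self.store[url]

    def write(self, url, data):
        self.store[url] = data


# Registry mapping URL scheme to the handler that knows how to deal with it.
HANDLERS = {"file": FileHandler(), "mem": MemoryHandler()}


def CopyFile(src, dst):
    """The caller sees one function; the per-scheme details stay inside."""
    data = HANDLERS[urlparse(src).scheme].read(src)
    HANDLERS[urlparse(dst).scheme].write(dst, data)
```

The caller writes CopyFile("file:///tmp/a.txt", "mem://backup/a.txt") and never learns which handler did the work. Adding a new scheme means adding one class and one registry entry, with no changes at any call site.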
What if I told you we need modularization for machines as well?
Before we get to that, let’s do a refresher on how modularization basically works (on modern hardware, with compiled languages, most of the time).
Local function calls
Modern CPUs, when you look at them from the “consumer” point of view, eat instructions from memory and execute them. An instruction can be primitive (data copying, math), after which the execution pointer (the instruction pointer or program counter, usually a CPU register) moves on to the next instruction in line. Or it can be a control flow instruction, which modifies the execution pointer (JMP on x86, B on ARM); these can be combined with conditionals (for example, JNZ: jump if not zero). And, because the only way to do modules is to do function calls, all modern CPUs have an equivalent of a CALL instruction, which saves the address of the (normally) next instruction somewhere and then jumps to the destination, so that the destination can at some point execute RET and return control. Now, if you don’t know what a stack buffer overrun is yet, go read about it, but please come back.
So with all that in mind: in the example with x/2, when the CPU doesn’t support floating-point operations, for example, the compiler could have a function fp_div somewhere in the standard library and call it. Or, if x is of a custom type and we’re using C++, the compiler could call our operator/.
But we’re not done with this control flow yet. What is happening here is a blocking call: the execution of a linear, imperative program is blocked until control is returned. In the C++ division example, our execution core is a CPU core, and the body of the function is executed on that same core. But it doesn’t have to be: another function could launch calculations on other CPU cores, or sleep for 1 second. From the point of view of the caller, we are blocked until the function returns.
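To make the CALL/RET mechanics concrete, here is a toy interpreter in Python. It is a sketch and not any real instruction set: JMP, JNZ, CALL and RET behave as described above, while LOAD, ADD, HALT, the single accumulator and the Python list used as a return stack are simplifications invented for the example.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs and return the accumulator."""
    ip = 0              # execution pointer: index of the next instruction
    acc = 0             # a single register to do math in
    return_stack = []   # where CALL saves return addresses

    while True:
        op, arg = program[ip]
        if op == "LOAD":        # primitive: put a constant into acc
            acc = arg
            ip += 1
        elif op == "ADD":       # primitive: add a constant to acc
            acc += arg
            ip += 1
        elif op == "JMP":       # control flow: unconditional jump
            ip = arg
        elif op == "JNZ":       # control flow: jump if acc is not zero
            ip = arg if acc != 0 else ip + 1
        elif op == "CALL":      # save the address of the next instruction, then jump
            return_stack.append(ip + 1)
            ip = arg
        elif op == "RET":       # go back to whatever CALL saved
            ip = return_stack.pop()
        elif op == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode {op!r}")


PROGRAM = [
    ("LOAD", 5),      # 0: acc = 5
    ("CALL", 4),      # 1: call the function at address 4
    ("ADD", 1),       # 2: execution resumes here after RET
    ("HALT", None),   # 3
    ("ADD", 10),      # 4: the "function": add ten...
    ("RET", None),    # 5: ...and return to the caller
]
```

run(PROGRAM) returns 16: the caller loads 5, the function adds 10, and the instruction after the CALL adds 1. Note that the function at address 4 has no idea who called it. It just trusts whatever return address it finds on the stack, which is exactly why overwriting that address is such a popular attack.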
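The same dispatch can be sketched in Python, where the method `__truediv__` plays roughly the role that operator/ plays in C++. Python is used here only for brevity; the Vec2 type and the halve function are made up for the example.

```python
class Vec2:
    """A custom type; the name and fields are invented for this sketch."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __truediv__(self, divisor):
        # Python calls this method when it sees `vec / something`,
        # much like a C++ compiler would call a user-defined operator/.
        return Vec2(self.x / divisor, self.y / divisor)

    def __eq__(self, other):
        return isinstance(other, Vec2) and (self.x, self.y) == (other.x, other.y)


def halve(x):
    # The caller writes the same expression for every type of x;
    # which code actually runs is decided by the language.
    return x / 2
```

halve(3), halve(4.0) and halve(Vec2(2, 6)) all go through the same x / 2, but the last one ends up as an ordinary function call into our own code.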
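A small Python sketch of that point: three functions that do their work in three different places, all of which look identical to the caller. The function names are made up, and a worker thread stands in for “another CPU core” (Python threads will not give you real parallel computation, but the blocking behaviour seen by the caller is the same).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def square_here(n):
    # Runs on the caller's own thread.
    return n * n


def square_elsewhere(n):
    # Hands the work to a worker thread, then waits for it to finish.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(square_here, n).result()


def square_after_nap(n, seconds=0.05):
    # Does nothing useful for a while, then answers.
    time.sleep(seconds)
    return n * n


def timed(fn, *args):
    """Return (result, seconds the caller spent blocked inside fn)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

All three return 49 for an input of 7. The only thing the caller can observe is how long it was stuck, for example via timed(square_after_nap, 7).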
Why does it have to be blocking? Actually it doesn’t, we just defined it poorly. Imagine this code fragment:
|
|
In this example, we don’t need value1 until we start computing value3. We could write a framework or a language that automatically handles such cases and spawns the calculations in parallel in the background. But our primitive imperative languages can’t do it by themselves, and even if we have such a framework, we need to know what we’re doing.
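Doing this by hand could look roughly like the Python sketch below, which uses the standard concurrent.futures module. The names value1 and value3 follow the text; value2, the two slow functions and their 0.1 second sleeps are assumptions made up for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_a():
    time.sleep(0.1)   # pretend this is a slow computation or a remote call
    return 1


def slow_b():
    time.sleep(0.1)
    return 2


def sequential():
    value1 = slow_a()          # blocked here...
    value2 = slow_b()          # ...and again here
    value3 = value1 + value2   # first place value1 is actually needed
    return value3


def overlapped():
    with ThreadPoolExecutor(max_workers=2) as pool:
        future1 = pool.submit(slow_a)   # starts in the background, returns at once
        future2 = pool.submit(slow_b)
        # Block only at the point where the values are really needed.
        value3 = future1.result() + future2.result()
    return value3
```

Both functions return 3, but sequential() waits for the two sleeps one after another while overlapped() waits for them at the same time. The price is that we now have to know which calls are safe to run concurrently, which is the “we need to know what we’re doing” part.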
So, to summarize:
- The only thing we need for making software is the ability to define functions and call them.
- The most primitive way to do that is implemented in hardware on most modern CPUs.
- Function calls in an imperative program block the execution of that program until they return with a result.
WTF is an API
As we learned, it’s only with modular systems and abstractions we can build something great, because we’ll be building it standing on the shoulders of
Strictly speaking, even a bare CPU with no operating system or library support has an API. It consists of the CPU instruction set, which you can use to write more complex functions, which in turn will be used to write even more complex functions. The SoCs we get today usually come with an SDK that includes at least a compiler and a library for controlling the various things embedded in the chip without resorting to assembly (functions like “go to deep sleep” or “initialize Wi-Fi” on the ESP32). On top of that, the operating system offers you APIs to read and write files, do things with the network, draw things on the screen, etc. And your language has a standard library which usually plugs into these OS APIs. It’s APIs all the way down, and it’s pretty neat, until it doesn’t work. When it doesn’t, we have an error somewhere that we need to handle. And if the error happens during a long-running operation, we need to make sure our long-running operations are designed and implemented correctly. That’s what we are going to cover next: errors and long-running operations.
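As a small taste of how an error travels up through those layers, here is a Python sketch. The read_text helper and its return shape are made up for the example; open, OSError and the errno module are standard Python.

```python
import errno


def read_text(path):
    """Return (text, None) on success or (None, error_name) on failure."""
    try:
        # Standard library call; underneath it asks the OS to open and read the file.
        with open(path, encoding="utf-8") as f:
            return f.read(), None
    except OSError as err:
        # The OS reported an error code; the standard library wrapped it in an
        # exception; here we translate it into a symbolic name for our own caller.
        return None, errno.errorcode.get(err.errno, "UNKNOWN")
```

Calling read_text on a path that does not exist gives (None, "ENOENT"): the operating system’s error code, passed through the standard library, reshaped once more by our own function. Every layer has to decide what to do with the failure of the layer below.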