The second hardest bug I have ever encountered, an essay on C: The Portable Assembly

C is a special language. I’ve always understood that, but I used to think it was because of C’s history. Reading tutorials as an intrepid learner, I kept coming across references to C as the syntactic parent of language X. What I did not know is that C is fundamentally different from all other modern languages.

C is a portable assembly, to borrow a common description. I’ve heard people refute the description, citing C’s expressive power and structured coding. I am forced to agree with their argument but not their conclusion. C does indeed look nothing like assembly, yet that difference is only on the surface, and the surface is easy to scratch through. C abstracts away the triviality of assembly but leaves the programmer exposed to its intricacies.

For generic programming, C++ has templates and Java has generics, yet C only has void pointers. Void pointers are a curtain beyond which the compiler cannot perform any type checking. For libjtapi I needed a generic dictionary data structure, which meant I needed void pointers. Thus over the weekend I found myself programming in C but debugging as if I were in assembly.
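To show what "a curtain beyond which the compiler cannot perform any type checking" looks like in practice, here is a minimal sketch of a void pointer dictionary. It is hypothetical and not libjtapi's code: `struct dict`, `dict_put`, `dict_get` and the fixed capacity `DICT_CAP` are all illustrative names chosen for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity dictionary; not libjtapi's actual API. */
#define DICT_CAP 16

struct entry {
    const char *key;
    void *value;   /* the value's real type is forgotten at this point */
};

struct dict {
    struct entry entries[DICT_CAP];
    size_t len;
};

/* Store a pointer under key, replacing any existing entry.
   Returns 0 on success, -1 if the dictionary is full. */
static int dict_put(struct dict *d, const char *key, void *value)
{
    for (size_t i = 0; i < d->len; i++) {
        if (strcmp(d->entries[i].key, key) == 0) {
            d->entries[i].value = value;
            return 0;
        }
    }
    if (d->len == DICT_CAP)
        return -1;
    d->entries[d->len].key = key;
    d->entries[d->len].value = value;
    d->len++;
    return 0;
}

/* Return the stored pointer, or NULL if key is absent. */
static void *dict_get(const struct dict *d, const char *key)
{
    for (size_t i = 0; i < d->len; i++)
        if (strcmp(d->entries[i].key, key) == 0)
            return d->entries[i].value;
    return NULL;
}
```

After `int n = 42; dict_put(&d, "n", &n);`, both `int *ip = dict_get(&d, "n");` and `double *dp = dict_get(&d, "n");` compile without a diagnostic, because `void *` converts implicitly to any object pointer type. Only the first is correct; dereferencing `dp` is undefined behavior, and nothing in the language will point that out.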

The bug itself is less significant than what it took to debug. It is a story which might illustrate why C is like no other modern language.

Of course the bug was my fault. There is a saying that poor programmers blame their tools, and I have seen it play out before. Despite knowing this, I was almost ready to test my code with a different compiler, convinced that GCC was at fault.

If I ignored the bug, it disappeared. That is not how bugs are supposed to work. It was a Heisenbug.

The bug was sneaky: it hid itself. If I commented out the error case that checked for it, then of course libjtapi would never complain; that much is obvious. The logical result of doing so should have been unreported data loss, because that is how bugs work. Yet with the check commented out, everything worked as intended and no data was lost. Line for line, the output matched the input, just as it should.

The next logical step was adding tracing print statements to show what was occurring. In modern languages you can dump complete data structures; in C you have to settle for printing primitives like integers. Yet even a single print statement would throw the code into a state of data loss. Data that was supposed to be in a dictionary was being reported as non-existent between two layers of abstraction.

I needed to find the bug’s on/off switch. Knowing what triggered the bug would tell me the bug’s type, and that would tell me where to look.

My next step was to add intentional time delays. A sleep(1), which pauses for one second, in the right spot would cause breakage in a different dictionary: progress. A sleep(0.001) call would bring us back to the land of functionality: more progress. I should make a side note here: this was not the next logical step of debugging. Libjtapi is not multi-threaded and memory writes are synchronous, so timing could not possibly cause the bug. Yet my ever more tired brain was not willing to accept this.
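A related detail about those delays: POSIX declares `sleep()` as `unsigned int sleep(unsigned int seconds)`, so `sleep(0.001)` converts its argument to `0` and does not pause for a millisecond at all. A sub-second pause needs `nanosleep()`. The sketch below shows both; `sleep_for_seconds` and `sleep_argument` are hypothetical helper names, not part of any library.

```c
#define _POSIX_C_SOURCE 200809L  /* expose nanosleep() under strict -std modes */
#include <errno.h>
#include <time.h>

/* Pause for a fractional number of seconds using nanosleep().
   Returns 0 on success, -1 on a negative argument or a non-EINTR error. */
static int sleep_for_seconds(double seconds)
{
    if (seconds < 0)
        return -1;
    struct timespec req;
    req.tv_sec = (time_t)seconds;
    req.tv_nsec = (long)((seconds - (double)req.tv_sec) * 1e9);
    if (req.tv_nsec > 999999999L)  /* guard against rounding up to 1e9 */
        req.tv_nsec = 999999999L;
    /* If a signal interrupts the pause, nanosleep() writes the remaining
       time back into req, so retrying sleeps only for what is left. */
    while (nanosleep(&req, &req) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}

/* The value sleep() actually receives when passed a double:
   the conversion to unsigned int truncates toward zero.
   Only meaningful for non-negative inputs within unsigned int range. */
static unsigned int sleep_argument(double seconds)
{
    return (unsigned int)seconds;
}
```

With these helpers, `sleep_argument(0.001)` is `0`, which is exactly what `sleep(0.001)` passes to `sleep()`, while `sleep_for_seconds(0.001)` really waits about one millisecond.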

In Calgary darkness is warded off by city lights

Programming at night is often a bad idea, doubly so for debugging. By now it was getting late and this bug was becoming mythical. The simple act of checking made it leap into existence, and voodoo acts of timing would ward it off. A sense of dread took form. I knew this type of bug from assembly: my bug was somehow related to the intricacies of hardware. No amount of Stack Overflow is going to fix a fundamental misunderstanding of how my hardware works.

In the morning I took a different approach: maybe some variables were not getting initialized. Initializing one variable appeared to yield progress; everything worked. By now I was getting wise to the bug’s methods. I proceeded to add explicit initialization to other variables. The code responded by switching between breaking in two places and not breaking anywhere.

This was real progress. My changes amounted to consuming room in lower memory, expanding my binary, which pushed later memory addresses, and thus pointers, up in memory. So the bug was somehow related to memory alignment. Memory alignment is a hardware restriction: on many architectures a datatype can only be read from an address divisible by N, where N is the type’s alignment requirement (for primitive types, usually its size). Some void pointer was pointing to an address illegitimate for its datatype.
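The alignment rule can be made concrete. A minimal sketch, assuming C11 for `alignof`/`alignas` from `<stdalign.h>`: `is_aligned` is a hypothetical helper that tests an address against a type’s alignment requirement, and `read_int_at` shows the portable way to read an `int` from an arbitrary byte offset, copying the bytes with `memcpy` instead of casting the pointer. In C, converting a pointer to a type whose alignment it does not satisfy is undefined behavior, even on hardware such as x86 that tolerates most unaligned loads.

```c
#include <stdalign.h>  /* alignof(T) gives T's alignment requirement */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if address p is a multiple of align (align must be a power of two,
   which every value returned by alignof is). */
static bool is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Read an int stored at any byte offset in buf. memcpy has no alignment
   requirement, so this stays well defined when buf + offset is misaligned;
   writing *(int *)(buf + offset) instead would not be. */
static int read_int_at(const unsigned char *buf, size_t offset)
{
    int value;
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```

For a buffer declared `alignas(int) unsigned char buf[2 * sizeof(int)]`, `is_aligned(buf, alignof(int))` holds, while on any platform where `alignof(int)` is greater than 1, `buf + 1` fails the check. `read_int_at(buf, 1)` remains safe either way.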

Libjtapi keeps metadata in arrays prior to dynamic caching

The question thus became: which pointer? Before I could answer that, I needed to identify the structure the pointer pointed into.

To narrow down the candidates, I added global char variables to pad the binary. This let me cycle through the bug’s three states. The idea was to find a number that would identify the structure’s size, and it worked: I got 3. But before I could put that to use, I got the best hint possible: libjtapi started to segfault. The illegitimate pointer was getting nudged just outside the boundary of libjtapi’s mapped memory. With this giant arrow pointing at it, I squashed the bug with ease.

You can see the current fixed code at the top of this post. C is not at fault for the bug; I am. Yet it is not to C’s credit that I lost a day of productivity. Sure, an experienced C programmer could have avoided the bug. They would have noticed the invalid void pointer cast that caused it. Yet so would an experienced assembly programmer.
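One defensive pattern against invalid void pointer casts, sketched here as a hypothetical illustration (this is not libjtapi’s fix): store each value’s size next to its pointer and check it on the way out, so a mismatched retrieval returns NULL instead of silently reinterpreting memory. `struct tagged`, `tag_value`, `tagged_get` and the `TAG`/`TAGGED_GET` macros are all names invented for this example. A size tag cannot catch every wrong cast (an `int` read as a `float` of the same size slips through), but it does catch reading a larger type out of a smaller slot.

```c
#include <stddef.h>

/* Hypothetical tagged slot: a pointer plus the size of the object it points to. */
struct tagged {
    void *ptr;
    size_t size;
};

static struct tagged tag_value(void *ptr, size_t size)
{
    struct tagged t = { ptr, size };
    return t;
}

/* Return the pointer only if the caller asks for the size that was stored;
   otherwise return NULL rather than hand back a mistyped pointer. */
static void *tagged_get(struct tagged t, size_t expected_size)
{
    return t.size == expected_size ? t.ptr : NULL;
}

/* Name the type once at each call site so the size cannot drift from it. */
#define TAG(pvar) tag_value((pvar), sizeof *(pvar))
#define TAGGED_GET(t, type) ((type *)tagged_get((t), sizeof(type)))
```

With `short s = 7; struct tagged t = TAG(&s);`, the call `TAGGED_GET(t, short)` returns `&s`, while `TAGGED_GET(t, long long)` returns NULL instead of a pointer that would read past the end of `s`.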

5 thoughts on “The second hardest bug I have ever encountered, an essay on C: The Portable Assembly”

    1. danieru Post author

      I should also mention that map is a global array in the data segment (not heap or stack).

  1. anonymous

    Kid, you’ve a lot to learn still. I wandered here from a comment of yours in HN. This entire thing was jibberish.

    PS: symbols beginning with underscores are reserved in C. The compiler won’t complain but the language spec says:

    [From C99 7.1.3]
    All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use.
    All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces.

    1. danieru Post author

      Do you have any idea what caused the bug? My best guess is still an alignment issue, since I was doing odd things with casting.

