After I updated this crate to use the latest Rust allocator API, some tests started to hang. Digging in with gdb revealed they were deadlocked, trying to lock a mutex they already held. Although I was 99% sure this was a bug in my code, jemalloc has had its fair share of deadlock bugs in the past, so I held a glimmer of hope that I would find something really interesting.

The tests didn't hang with the system allocator, and valgrind saw nothing wrong. So I dug into the code and learned a lot about jemalloc's internals. In particular, there is a lot of tricky locking in the thread cache code, which surprised me, since the point of the thread cache is to reduce synchronization overhead; I guess I was looking at all the slow paths. I observed that modifying lg_tcache_max (the largest size class kept in the thread cache) changed the point at which it deadlocked, but not in a predictable way: when I tabulated each value against how far the tests got, the general trend was that the larger the classes allowed, the more progress it made, but not consistently.

One of the interesting things was how predictable the failure was. Thinking I might be doing something to corrupt jemalloc's structures badly enough to corrupt the lock, I scripted gdb to print the state of the lock at the points where it was locked and unlocked in the function where it deadlocked. This revealed that the locks looked fine in that function. Unfortunately, because everything was optimized out, it was hard to inspect the structures I thought might have been getting corrupted. I started building Rust with a debug, assertion-laden jemalloc, but I knew it would take all night on my laptop.

Before I called it a night, I scripted gdb to print all the calls to mallocx, rallocx, and sdallocx, along with their return values, and started putting together a sed script to transform this log into a C program that made the same sequence of allocations and deallocations.
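The record-and-replay idea can be sketched in Rust as well. This is not the sed script or the generated C program, only a minimal illustration of replaying a logged sequence against whatever allocator is installed. The `Op` trace format, the `replay` function, and the fixed `ALIGN` are all hypothetical, and it uses today's `std::alloc` functions, which may differ from the allocator API the crate used at the time.

```rust
use std::alloc::{alloc, dealloc, realloc, Layout};
use std::collections::HashMap;

/// One logged allocator call. Hypothetical trace format: each block is
/// identified by an arbitrary id instead of its raw address.
pub enum Op {
    Alloc { id: u32, size: usize },
    Realloc { id: u32, new_size: usize },
    Dealloc { id: u32 },
}

/// Assumed alignment for every block in the trace.
const ALIGN: usize = 8;

/// Replays the trace against the global allocator and returns the number
/// of bytes that were still live when the trace ended.
pub fn replay(trace: &[Op]) -> usize {
    // Remember the exact layout of each live block, so every deallocation
    // passes the true size.
    let mut live: HashMap<u32, (*mut u8, Layout)> = HashMap::new();

    for op in trace {
        match *op {
            Op::Alloc { id, size } => {
                assert!(size > 0, "zero-sized allocations are not replayed");
                let layout = Layout::from_size_align(size, ALIGN).expect("bad layout");
                // SAFETY: the layout has a non-zero size.
                let ptr = unsafe { alloc(layout) };
                assert!(!ptr.is_null(), "allocation failed");
                live.insert(id, (ptr, layout));
            }
            Op::Realloc { id, new_size } => {
                assert!(new_size > 0, "zero-sized reallocations are not replayed");
                let (ptr, old) = live.remove(&id).expect("realloc of unknown id");
                let new_layout =
                    Layout::from_size_align(new_size, ALIGN).expect("bad layout");
                // SAFETY: `ptr` was allocated with exactly `old`.
                let new_ptr = unsafe { realloc(ptr, old, new_size) };
                assert!(!new_ptr.is_null(), "reallocation failed");
                live.insert(id, (new_ptr, new_layout));
            }
            Op::Dealloc { id } => {
                let (ptr, layout) = live.remove(&id).expect("dealloc of unknown id");
                // SAFETY: `ptr` is live and `layout` is the one it was allocated with.
                unsafe { dealloc(ptr, layout) };
            }
        }
    }

    let remaining: usize = live.values().map(|(_, layout)| layout.size()).sum();
    // Free whatever the trace left behind so the replay itself does not leak.
    for (_, (ptr, layout)) in live.drain() {
        // SAFETY: same invariant as in the `Dealloc` arm.
        unsafe { dealloc(ptr, layout) };
    }
    remaining
}
```

Note that a harness like this always frees with the layout it recorded, so it would not reproduce a bug caused by passing the wrong size; to do that, the trace would also have to carry the size each deallocation claimed.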

When I woke up the next morning, I realized there was something suspicious about the deallocations logged. I tested with the system allocator again, but this time with various allocators LD_PRELOAD'd, including the same version of jemalloc that Rust was using; none of these hung. So I asked myself, what's the difference?

Of course, Rust is using the sized deallocation API (sdallocx), whereas the system allocator goes through malloc and free and doesn't pass any sizes. Looking again at how my library was calling dealloc, I spotted the bug instantly: I was claiming that some of the things I was freeing were smaller than they actually were. Changing this fixed the bug. I took a look through the paths that sdallocx would take if given a too-small size, and it looks like my build of Rust with jemalloc assertions enabled would have detected my mistake, had it finished compiling. Without assertions, supplying the wrong size causes havoc in the thread cache later.
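One way to make this class of mistake harder is to store the layout next to the pointer, so the size passed at deallocation cannot drift from the size that was requested. The following is a minimal sketch of that idea, not the fix applied in this crate. `RawBuf` is a hypothetical type, and the code uses today's `std::alloc` functions, whose documented contract requires `dealloc` to receive the same layout that was used to allocate the block.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Hypothetical owner of a raw allocation that remembers the exact layout
/// it was allocated with.
pub struct RawBuf {
    ptr: *mut u8,
    layout: Layout,
}

impl RawBuf {
    /// Allocates `size` bytes with the given alignment. Returns `None` for
    /// a zero size, an invalid alignment, or an allocation failure.
    pub fn new(size: usize, align: usize) -> Option<RawBuf> {
        if size == 0 {
            return None;
        }
        let layout = Layout::from_size_align(size, align).ok()?;
        // SAFETY: the layout has a non-zero size.
        let ptr = unsafe { alloc(layout) };
        if ptr.is_null() {
            None
        } else {
            Some(RawBuf { ptr, layout })
        }
    }

    /// The size that will be reported to the allocator on drop.
    pub fn size(&self) -> usize {
        self.layout.size()
    }

    /// Raw pointer to the start of the buffer.
    pub fn as_mut_ptr(&self) -> *mut u8 {
        self.ptr
    }
}

impl Drop for RawBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `alloc` with exactly `self.layout`, so the
        // size passed here always matches the size that was allocated.
        unsafe { dealloc(self.ptr, self.layout) }
    }
}
```

Because the layout is never recomputed at the free site, there is no place for a second, smaller size to be introduced.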

You might say that I should have known to look at these things first, but such is the nature of bugs.