Deadlocks in Practice: Don’t Hold Locks While Notifying

posted by Craig Gidney on December 22, 2013

In this post: a subtle way that reactive/event-based code can deadlock, how it affects ReactiveCocoa (obj-c) and Reactive Extensions (.Net), and ways to work around the problem.

Realization

Today I noticed that ReactiveCocoa, the Objective-C port of .Net’s Rx (Reactive Extensions), implements different semantics for its BehaviorSubject.

This would be moderately interesting, except the change in semantics introduces a bug. Even more interestingly, there’s a different bug in the original Rx implementation. Then, to top it off, the bugs might actually be intentional design decisions!

But before I can talk about all that I need to explain what the heck a “BehaviorSubject” is.

Behavior Subject

A BehaviorSubject is an observable value. I have no idea why you would ever call an observable value a behavior subject (it’s from the observer pattern), but we’re stuck with the name now.

Behavior subjects must support four operations: updating a value, sealing the value as ‘complete’, sealing the value with a failure, and subscribing to be told about those things. Note that the operations must be thread-safe. In Rx the operations are called OnNext, OnCompleted, OnError, and Subscribe. In ReactiveCocoa they’re more appropriately called sendNext, sendCompleted, sendError, and subscribe.

When you subscribe an IObserver/RACSubscriber to a behavior subject, the observer/subscriber’s OnNext method is immediately invoked with the current value stored in the behavior subject. Additionally, whenever the value changes in the future, the OnNext method will be called again with the new value. Completion and failure are also forwarded appropriately. This allows subscribers to track the latest value in a behavior subject and react to it.

Clear? Alright, let’s talk about syncing values.

Syncing and Deadlock

Suppose you’re trying to keep two values synchronized. For example, maybe you want to display a up-to-date value while also allowing the user to edit it (so you’re syncing a displayed value and a back-end value).

BehaviorSubject makes this almost easy, but there are subtleties to it. You might start with something like this:

var observableValue1 = new BehaviorSubject<int>(value: 0);
var observableValue2 = new BehaviorSubject<int>(value: 0);

// keep values synced (when one is changed, update the other)
// HORRIBLY WRONG
observableValue1.Subscribe(observableValue2.OnNext);
observableValue2.Subscribe(observableValue1.OnNext);

A reasonable first attempt, but as soon as you run the program you’ll find that any update triggers another update and that update triggers another and so on forever in a loop until the stack overflows and the program crashes. You need to prevent the updates from cycling back somehow.

In a real distributed system you’d solve this with timestamps or vector clocks or whatever, but we’re going to hack around the problem by only sending updates when the value changes. There’s already a method that does exactly that: DistinctUntilChanged (I’d have called it WhenDifferent, but I digress). Let’s throw that in there:

var observableValue1 = new BehaviorSubject<int>(value: 0);
var observableValue2 = new BehaviorSubject<int>(value: 0);

// keep values synced (when one is changed, update the other)
// ~~assumes single thread~~
observableValue1.DistinctUntilChanged().Subscribe(observableValue2.OnNext);
observableValue2.DistinctUntilChanged().Subscribe(observableValue1.OnNext);

The above will work, but only for a single thread. Once multiple threads enter the picture all bets are off.

First, there’s the issue we’re not worried about. If the stars align and two threads’ updates bounce back and forth just right they can keep undoing the other thread’s work. The result is a live-lock, at least until the stack(s) explode(s).

Second, and more importantly, DistinctUntilChanged isn’t guaranteed to be thread safe. I used to think it was, but the Rx source code doesn’t lie (I’m not sure whether or not the ReactiveCocoa implementation is thread safe). If we want our data syncing code to not do weird things when multiple threads are involved, we need to synchronize the notifications being given to DistinctUntlChanged.

Naturally there’s already a method to to synchronize notifications so they don’t overlap: Synchronize. It just grabs a lock before sending any notifications and releases it afterwards. Let’s use it:

var observableValue1 = new BehaviorSubject<int>(value: 0);
var observableValue2 = new BehaviorSubject<int>(value: 0);

// keep values synced (when one is changed, update the other)
// SUBTLY WRONG
observableValue1.Synchronize().DistinctUntilChanged().Subscribe(observableValue2.OnNext);
observableValue2.Synchronize().DistinctUntilChanged().Subscribe(observableValue1.OnNext);

Just to be sure that’s alright, let’s test it out. We’ll need a way to spam out values:

public static IObservable<int> SpamCountForever() {
    return Observable.Create<int>(observer => {
        NewThreadScheduler.Default.Schedule(() => {
            var counter = 0;
            while (true) {
                observer.OnNext(counter++);
            }
        });

        return Disposable.Create(() => {
            // Huh?
            // You want it to stop?
            // What about 'forever' did you not GET?
        });
    });
}

and then we spam into both observable values:

SpamCountForever().Subscribe(observableValue1.OnNext);
SpamCountForever().Subscribe(observableValue2.OnNext);

I’m not sure if this even counts as syncing data anymore, since the two threads spamming values are basically fighting over what the value should be. Regardless, if you try to run this code you’ll find it has a serious bug: it deadlocks.

The two observables are being driven by separate threads. Thread 1 is pushing new values into observableValue1, which triggers a notification that gets forwarded through Synchronize and DistinctUntilChanged to update observableValue2. Simultaneously, Thread 2 is pushing values into observableValue2, which triggers notifications that get forwarded through a different invocation of Synchronize and DistinctUntilChanged to update observableValue1. Now, due to passing through Synchronize, Thread 1 is currently holding a lock. The same is true of Thread 2, with a separate lock. And they’re about to update each others’ values and trigger notifications that cause them to try to grab each others’ locks…

Ugh. They’re acquiring locks in different orders, and the prize for that is doing nothing ever again forever until time stops. At which point you’re still doing nothing, but in a philosophically different sense at least.

The root problem here is that the Synchronize method holds a lock while sending out a notification, but notifications can do anything and that includes causing another thread to wait for the lock being held.

Caution

Realizing that raising events / invoking callbacks / sending notifications while holding a lock can create deadlocks is kind of depressing. Holding a lock while sending off a notification is really convenient. It solves so many ordering and overlap issues.

But every time you hold a lock while sending out a notification, you are introducing an invisible constraint on your entire program. If the graph of who-notifies-who-while-holding-a-lock contains a cycle, that’s a deadlock. Maybe that constraint won’t ever get violated, but if someone ever does hook the blue box up to the red box (and why wouldn’t they? What indication would they have that it’s a bad idea?) then woe is you because deadlock bugs are not fun to track down.

So this is the part of the post where I warn you: watch out for this problem. If you ever see an event being raised, a callback being run, or any sort of notification-that-can-trigger-arbitrary-behavior happening inside a lock then PAY ATTENTION because you’re staring at half of a deadlock bug.

lock (syncRoot) {
    ...
    invokeSuscribers(); // todo: HUGE RED BLINKING WARNING LIGHTS
    ...
}

Is there a way to get the notification outside of the lock? Do it. (See also: re-entrancy)

Note that properly extracting a notification from within a lock can be quite difficult. Presumably there’s a reason the notification was placed inside the lock in the first place. For example, in the case of BehaviorSubject, doing nothing besides moving the notification outside of the lock will break eventual consistency. Subscribers assume the latest value is the last value they received, and we want any subscribers that currently think an old value is the latest value to eventually find out about the newer value. But if we release the lock before sending the “here’s the latest value” notification, then the evil imp that arbitrarily delays threads can ruin our day by re-ordering “the latest value is ‘intermediate’” to come after “the latest value is ‘done’” and so subscribers end up discarding the latest value for a stale one.

There are techniques that avoid deadlocks while maintaining eventual consistency, but first let’s look at how this issue shows up in Rx and ReactiveCocoa.

Rx vs ReactiveCocoa

The distinction between Rx and ReactiveCocoa’s respective implementations of BehaviorSubject is exemplified by how they handle updating a subject’s current value.

Here’s the implementation of ReactiveCocoa’s BehaviorSubject’s sendNext message:

- (void)sendNext:(id)value {
    @synchronized (self) {
        self.currentValue = value;
        [super sendNext:value];
    }
}

Notice the notification being sent inside a lock? The lock is there to ensure the last value given to subscribers is in fact the latest value (i.e. eventual consistency), but it introduces a potential deadlock. If you wire BehaviorSubjects together in cyclic ways, and there’s multiple threads, your program is liable to just stop making progress.

Here’s the implementation of Rx’s BehaviorSubject’s OnNext method:

public void OnNext(T value)
{
    var os = default(IObserver<T>[]);
    lock (_gate)
    {
        CheckDisposed();

        if (!_isStopped)
        {
            _value = value;
            os = _observers.Data;
        }
    }

    if (os != null)
    {
        foreach (var o in os)
            o.OnNext(value);
    }
}

See how Rx is taking a snapshot of the value (and the observers, because they’re in a persistent collection), then releasing the lock, then forwarding notifications only after exiting the lock? The authors probably did that because they’re aware of the dangers of notifying while holding a lock. The problem here is that they’ve sacrificed eventual consistency. The notifications may be re-ordered with respect to the values being stored, so subscribers may end up with the wrong “latest” value. Instead of “the latest value is X” messages, this code gives us less useful “at one point the value was X” messages.

So each version has its own distinct bug-or-maybe-design-decision. It’s possible to fix both.

Solutions

I know two deadlock-avoidance techniques that will work in the case of BehaviorSubject, while maintaining eventual consistency. They are: 1) queue/drain, and 2) check-after-notify.

The “queue/drain” approach involves enqueueing notifications-to-send while holding the lock, then trying to acquire responsibility for draining the queue while not holding the lock. The synchronization around who’s responsible for draining the queue prevents notifications from being re-ordered, ensuring subscribers see the latest value last.

I’ve used this technique in a past post. The ActorSynchronizationContext described in Emulating Actors in C# with async/await is based on queue/drain. Adapting the code slightly makes it usable for our current case:

private readonly ConcurrentQueue<Action> _pending = new ConcurrentQueue<Action>();
private int _pendingCount;

// call this while holding the lock
public void Enqueue(Action thingToDoInOrderButOutsideLock) {
    _pending.Enqueue(thingToDoInOrderButOutsideLock);
}

// and this after releasing the lock
public void TryDrainQueue() {
    // try to acquire responsibility for draining the queue
    if (Interlocked.Increment(ref _pendingCount) > 1) return;
    
    // keep draining the queue until you release responsibility for it
    do {
        // perform a queued action
        Action a;
        _pending.TryDequeue(out a); // always succeeds, due to usage of _pendingCount
        a.Invoke();
        
        // try to release responsibility for draining the queue
    } while (Interlocked.Decrement(ref _pendingCount) > 0);
}

The downside of queue/drain-ing is that notifications can migrate across threads. This means you can’t rely on subscribers having been notified by the time the OnNext/sendNext call has returned (they’ll eventually be notified). Both Rx and ReactiveCocoa currently guarantee that your notification has been received by the time OnNext/sendNext returns (assuming no schedulers along the way), and violating that property could conceivably break existing code.

The other approach, “check-after-notify”, changes what a notification from a behavior subject tells subscribers. Instead of telling them the latest value, notifications would instead instruct them to re-check the latest value (subscribers would either know about the subject, or be given functions that go and get the current value).

The downside of check-after-notify is that single values may be observed repeatedly, and intermediate values may be missed entirely. That means check-after-notify only works when you truly only care about the latest value, as opposed to the sequence of values leading up to it. Another downside is that behavior subjects would no longer work like other observables.

My personal preference is queue/drain. I think its downsides are preferable when compared to introducing subtle deadlocks or losing eventual consistency. I particularly like how queue/drain resembles the actor model. That being said, there may be issues that I haven’t thought of. My impression of the authors of Rx is that they’re great at what they do, so I wouldn’t be surprised if they’ve already thought this through more extensively and have a good reason for their choice.

Summary

Holding a lock while sending notifications is half of a deadlock bug. It’s also really convenient.

Not holding a lock while sending notifications can sacrifice various consistency guarantees.

ReactiveCocoa’s observable value (BehaviorSubject) holds a lock while sending notifications. Rx’s does not, but sacrifices eventual consistency. Both outcomes are questionable, given other available possibilities.

Beyond BehaviorSubject, I’m pretty sure holding locks while notifying is the root cause of both this ReactiveCocoa deadlock bug and this Rx deadlock bug.

Discuss on Reddit

My Twitter: @CraigGidney

  • Romoku

    This looks like a good case for software transactional memory. The root of the problem is two threads trying to acquire the other threads lock. In SQL server this case results in a deadlock victim. For STM the first set of operations would be applied then the next set of operations would be applied.

    For this case:

    observableValue1 changes and begins a transaction which sends out notifications.
    Once the transaction is complete then observableValue2 can starts a transaction to send out notifications.

    This avoids the deadlock while keeping the state consistent between both objects. At least that is my understanding on STM.


Twisted Oak Studios offers consulting and development on high-tech interactive projects. Check out our portfolio, or Give us a shout if you have anything you think some really rad engineers should help you with.

Archive

More interesting posts (1 of 26 articles)

Or check out our Portfolio.