Perishable Collections: The Benefits of Removal-by-Lifetime

posted by Craig Gidney on January 8, 2013

Last week I talked about using lenses to decouple shared control. However, the example I gave was so simple that a “collection” of lenses could be represented with a single integer. This was on purpose, because having such a collection hints at this week’s content. So, this week, I’ll be starting with a collection-of-lenses example in order to lead into what I call a perishable collection, that automatically removes its items based on lifetime tokens, and discuss how such a collection affords transformed views of itself.

Add/Remove

Suppose we want to create a controller, to be used by multiple unknown components, for a music player’s volume. We decide to do this with lenses that apply maximum volumes (as opposed to scaling), effectively exposing the ability to invoke “That’s too loud! Don’t play it louder than X!”. Assuming volume goes from 0 to 100, here is such a controller:

class VolumeController {
    private readonly List<double> _maxVolumes = new List<double> {
        100 // default volume
    };
    public double Volume { get { return _maxVolumes.Min(); } }
    public void AddMax(double max) {
        if (max < 0 || max > 100) throw new ArgumentOutOfRangeException("max");
        _maxVolumes.Add(max);
    }
    public void RemoveMax(double max) {
        if (!_maxVolumes.Remove(max))
            throw new InvalidOperationException("No such max was added");
    }
}

There are three problems with this controller: usability, efficiency, and thread safety.

Usability. Every AddMax call must be followed by an eventual RemoveMax, every RemoveMax must be preceded by an AddMax, and the same maximum must be given to both. Failing to meet these constraints creates a bug. This is not a mountain of responsibility, but it’s easy to get wrong. Making it as easy as possible to get things right will decrease the chances of a bug permanently clamping the volume to 0.

Efficiency. The collection of lenses is a list. Removing a lens requires scanning through the list, which takes linear time. We can do (much) better than this.

Thread safety. When you intend to be exposed to multiple unknown components, it’s reasonable to expect some of them to be calling you from other threads. It’s not necessary to provide thread safety, but it’s a good idea in this sort of situation. At the moment, one thread removing while another thread adds can put the list into a permanently inconsistent state.

All three of these problems are solvable.

Implicit Removal

Here’s a standard trick for simplifying add+remove: get rid of the remove method, and have the add method return an IDisposable or take a CancellationToken. Disposing the result or cancelling the given token corresponds to asking for what was added to be removed. This inherently solves two aspects of the usability problems: it’s no longer possible to remove before adding, and it’s no longer necessary to worry about giving the same value to both calls.

A CancellationToken is a thing that can become cancelled, and allows you to register actions that run when it occurs. You call “token.Register(action)”, and the token will run your action when it is cancelled. If the token is already cancelled then the action runs right away. If the token is not yet cancelled then the action will be run as soon as it is. If the token is never cancelled, then the action will never be run. Sortof like a task, but without return values or exceptions.

In general, I prefer asking for a cancellation token, as opposed to returning an IDisposable. A method taking a token can support being stopped before it completes, can return a result, and can pass along the token to methods that it happens to call (without additional wiring code). There’s also the nice detail that CancellationToken is a struct, so you don’t need to deal with null (the default token is immortal instead of unusable). IDisposable certainly has its own advantages, but I can’t think of any compelling ones.

Here’s the result of modifying the volume controller to ask for a cancellation token:

class VolumeController {
    private readonly List<double> _maxVolumes = new List {100};
    public double Volume { get { return _maxVolumes.Min(); } }
    public void AddMax(double max, CancellationToken lifetimeOfLimit) {
        if (max < 0 || max > 100) throw new ArgumentOutOfException("max");
        _maxVolumes.Add(max);
        lifetimeOfLimit.Register(() => _maxVolumes.Remove(max));
    }
}

This fixes the usability problem (while reducing the amount of code!). Callers no longer have to track the maximum they added in order to later remove it. It’s now impossible to remove before adding. Callers can manage their own tokens, with a CancellationTokenSource, or use tokens provided by something else. The same token can be used multiple times. Basically, the ‘pairing-remove-with-add’ work has been reduced and abstracted into a form that allows re-use, sharing, and delegation.

However, I must point out that making this change increases the severity of the thread safety issue. Cancellation tokens could come from anywhere, and may be cancelled asynchronously and/or concurrently. But fixing the efficiency problems is going to change important details, so we’ll get to thread safety as part of that.

Using a Doubly Linked List

The second problem I mentioned was efficiency: removing items is taking linear time. You might be expecting the removal time to improve to logarithmic (with a search tree) or expected sorta-constant time (with a hash table), but we can actually get guaranteed constant time removal by using the much simpler doubly linked list. We don’t need to search for what to remove, because we can just remember nodes as we add them.

Here’s an implementation that uses a cyclic doubly linked list:

class VolumeController {
    private class Node {
        public double Value;
        public Node Prev;
        public Node Next;
    }
    private readonly Node _head;
    public VolumeController() {
        _head = new Node();
        _head.Next = _head.Prev = _head;
    }
    public double Volume {
        get {
            var result = 100.0; // default volume
            for (var n = _head.Next; n != _head; n = n.Next)
                result = Math.Min(result, n.Value);
            return result;
        }
    }
    public void AddMax(double max, CancellationToken lifetimeOfLimit) {
        if (max < 0 || max > 100) throw new ArgumentOutOfRangeException("max");
        var n = new Node {
            Value = max,
            Next = _head
        };
        lock (_head) {
            // link new maximum into list (before _head), stashing the node
            n.Prev = _head.Prev;
            n.Prev.Next = n;
            _head.Prev = n;
        }
        lifetimeOfLimit.Register(() => {
            lock (_head) {
                // unlink node from list
                n.Next.Prev = n.Prev;
                n.Prev.Next = n.Next;
            }
        });
    }
}

The amount of code has expanded a lot (I’ll deal with that in a minute), but we now have guaranteed constant time addition and guaranteed constant time removal. When an item is added, its node is stored in the closure of the lambda registered to the cancellation token. So, when the token is cancelled, there’s no need to search for the node to remove.

This code is also thread safe (erm, I think) (Update: not thread safe. See My Bug, My Bad #4), due to the lock blocks around the insertion and removal of nodes. Interestingly, it’s not strictly necessary to synchronize access to the list when computing the maximum volume. The ‘Next’ pointer is guaranteed to be valid and never cleared, so all paths lead to the head node. There are potential data races but, on the scope of an individual race, grabbing a lock just moves the race around. Consider what can happen during an iteration as the collection is modified:

  1. The iteration is on a node c that is being removed: The removal’s completion has no effect on the iteration. c.Next is not modified by the removal, so the iteration ‘escapes’ the same way whether or not the removal completes first.
  2. The iteration is on a node c whose next node is being removed: Whether or not the next node will be included in the iteration is determined by a read/write race on c.Next. Grabbing the lock beforehand would just change the race to a write/write race on the lock, not remove the race. (Also, pointer assignment is guaranteed to be atomic in .Net so the iterator won’t read a invalid half-changed pointer.)
  3. The iteration is on a node c whose next node is head and a node is being inserted (before head): Whether or not the next node will be included is determined by a read/write race on c.Next. (Note that the node being inserted has valid Value and Next fields before the lock is acquired to add it, ensuring things can’t re-ordered so that it is in the chain before those details are valid.) Holding the lock would just change the race to a write/write race on the lock, not remove the race.

(Side note: actually I would appreciate input on this. This sort of thread safety is depressingly hard to get right. Have I actually prevented the iterator from seeing invalid ‘Next’ values by initializing them before acquiring the lock? Is there some possible re-ordering or optimization that makes everything fall apart?)

So, the iteration should always advance and eventually terminate and each race individually can only be moved around by a lock. However, different races might be moved differently. We lose linearizability. The collection seen by the iteration may contain anachronisms (to borrow a term), where you see a later change but not an earlier one. For example, consider this scenario:

             >---> time >--->
Thread1:        add:1              add:2             remove:1   add:3
Thread2:    iterate:start    iterate:1     iterate:2                   iterate:3    iterate:done

The iterator saw the sequence [1,2,3] even though the value 1 was removed before the value 3 was added. The inclusion of 1 at the same time as 3 is an anachronism. If the iterator respected linearizability, then the result would have been [], [1], [1,2], [2] or [2,3].

Note that anachronisms are not necessarily a problem. Usually you can ignore them, because you won’t miss any items that stay put through the iteration. Also, since items are added at the end of the list, the anachronisms you can see are more limited (Exercise: Can the computed volume spike upward at random? Find a situation where an anachronistic minimum volume is 100, without including a 100 in the collection, or prove that it’s impossible.). It’s not a big deal if the volume is limited to 20 for a moment as the collection transitions from [20, 10, 30] to [10, 30] to [30]. The benefit you get, not having to lock the collection while all of its contents are enumerated, is very compelling.

In any case, now that we know to use a doubly linked list with automatic removal, we can abstract it into its own type.

Using a Perishable Collection

Perishable collections implement the idea of “added until a lifetime expires”. Instead of having a remove method, items are removed automatically when the lifetime they are paired with ends (i.e. when they perish). I stumbled into this idea, and implemented it, as part of the Element project. My implementation is available as source code from github and as a NuGet package.

The main benefit of a perishable collection is its affinity for transformed views, but lets start by rewriting our starting example to use one:

class VolumeController {
    private readonly PerishableCollection<double> _maxVolumes = new PerishableCollection<double>();
    public VolumeController() {
        _maxVolumes.Add(100, Lifetime.Immortal); // default volume
    }
    public double Volume { get { return _maxVolumes.Min(); } }
    public void AddMax(double max, Lifetime lifetime) {
        if (max < 0 || max > 100) throw new ArgumentOutOfRangeException("max");
        _maxVolumes.Add(max, lifetime);
    }
}

This implementation improves on all three problems I mentioned earlier (usability, efficiency, thread-safety). There’s no need to pair removes with adds anymore (the work has been abstracted into the lifetime). Addition and (implicit) removal are now both constant time. It’s still possible to reduce the cost of computing the current volume (currently linear), by caching and updating, but that’s outside of the scope of what I want to address in this post. The class is even thread-safe, assuming intermediate anachronistic volumes are allowed.

Lifetime Library Diagram Side note: you may have noticed that the code is now using a Lifetime instead of a CancellationToken. Lifetime, another open source library with a NuGet package, fixes a small problem: when a CancellationTokenSource is garbage collected, holding onto its (now immortal) token prevents the garbage collection of its registered callbacks (which will never be run) and what they reference. Lifetime also supports temporary registrations (so they don’t accumulate over time in some corner cases) and the library includes some extra utility classes and methods. I intended to make a whole post about the library, and maybe I will, but for now all of its functionality is summarized on a small thrown-together diagram (to the right).

So an observable collection fits our example use case nicely. But the items in a perishable collections can also be observed over time, by using the CurrentAndFutureItems method. This returns an IObservable<Perishable<T>>, which an observer can be subscribed to in order to be given item/lifetime pairs in the collection. When an item is added, observers are given the item and its lifetime. When an item perishes, observers know about it because they have the lifetime.

Notice that, to determine which item was removed, an observer does not have to really do anything. The lifetime was already paired with the item. Contrast this with an observable collection that has ItemAdded/ItemRemoved events. When an observer receives an ItemRemoved event, they have to match it against a previous ItemAdded event (in some way). For example, if you want to show a UI control for each item in a collection, then you must maintain a dictionary matching each item to a list of controls (an item may appear multiple times). With lifetimes you just make the collection of controls perishable, propagate the lifetimes along, and never even think about dealing with dictionaries of lists (ugh..).

Without the need to match later removals, we can do something interesting: impure transformations. For example, augmenting items to include a unique id:

// create a new perishable collection
var collection = new PerishableCollection<string>();

// add some items to the collection
var mortalLife = new LifetimeSource();
collection.Add("mortal", mortalLife.Lifetime); //"mortal" will be removed when mortalLife is ended
collection.Add("forever", Lifetime.Immortal); //"forever" will never be removed

// get an observable that can be used to track the collection's items
IObservable<Perishable<string>> itemsObservable = collection.CurrentAndFutureItems();

// pair observed perishable items to a unique id
var id = 0;
IObservable<Perishable<Tuple<string, int>>> itemsWithIds = itemsObservable.LiftSelect(e => Tuple.Create(e, Interlocked.Increment(ref id)));

// observe id'd items back into a separate new perishable collection
PerishableCollection<Tuple<string, int>> collectionWithIds = itemsWithIds.ToPerishableCollection();
var peek1 = collectionWithIds.CurrentItems().ToArray(); // {("mortal", 1), ("forever", 2)}

// modify the original collection, checking that changes propagate to the transformed collection
collection.Add("late", Lifetime.Immortal); //
mortalLife.EndLifetime();
var peek2 = collectionWithIds.CurrentItems().ToArray(); //{("forever", 2), ("late", 3)}

In the above example, each item from a perishable collection is transformed into a new form, which has a unique id, and then fed into a new separate perishable collection. It uses the ‘LiftSelect’ method, which transforms the values inside perishables instead of the perishables themselves, which is defined in the perishable collections library. The library also defines ‘ListWhere’, to filter using the value inside perishables, and ‘ObserveNonPerishedCount’, to track the number of items in a perishable collection.

Note that the resulting transformed collection, from the example, has no trouble staying in sync. Even though it would have no way of matching a hypothetical RemovedItem event to an item it had seen before (because the projection never gives the same result twice). It’s impossible to do this sort of thing with ItemAdded/ItemRemoved events, without additional brittle wiring code to cache or undo the projection. This is what I mean when I say perishable collections “afford transformation”. They sidestep issues that shouldn’t be problems in the first place, so that you can just do what you want.

The example project for perishable collections takes this sort of concept and runs with it. The architecture is a bit over-the-top silly, actually, and drawing is done in a dumb naive way. But the game is kinda fun, or at least relaxing, and it has some interesting geometry code I’ll discuss at some point.

Snip Snap Screen Shot

Summary

Perishable collections have constant-time addition and automatic constant-time removal. They can be iterated and observed concurrently (but be aware of potential anachronisms). Transformed views of a perishable collection can naturally match removals to additions, without dealing with item equality.

My implementation of a perishable collection is available as source code and as a NuGet package. I would appreciate any comments about the thread-safety of the code. Lifetimes (improved cancellation tokens) are also available as source code and as a NuGet package.

(I hope anachronism catches on as a technical term.)

Discuss on Reddit, Hacker News


Twisted Oak Studios offers consulting and development on high-tech interactive projects. Check out our portfolio, or Give us a shout if you have anything you think some really rad engineers should help you with.

Archive

More interesting posts (6 of 8 articles)

Or check out our Portfolio.