Counting Iterators Lazily

posted by Craig Gidney on June 25, 2013

Suppose you want to determine if a sequence contains three or more items.

An obvious solution comes to mind: count the items, and see if the result is at least three. Unfortunately, things aren’t so simple. The obvious solution is inefficient, and can even create an infinite loop.

bool HasThreeOrMore<T>(IEnumerable<T> sequence) {
    return sequence.Count() >= 3; //<-- potential hang
}

In this post I'll explain the problem, the typical workarounds, and a custom workaround based on a lazy counter.

Going Too Far

When you ask for a Count of how many items are in a sequence, the program is going to determine how many items are in that sequence. There's no optimization (in C#, now or in the likely future) that "looks ahead" and sees that, because the count will only be used in a comparison against three, counting can stop after three or four items. As Eric Lippert puts it: computers are dumb. You asked for the exact number of items in the sequence, and if every item in that sequence has to be enumerated in order to give you that value then so be it.

Normally computers being dumb isn't too much of a big deal. You pay abstraction overhead all the time because of it. However, this overhead is different because it increases with the size of the sequence. The cost of counting all the items in a sequence is linear but the cost of counting up to three should be constant.

Given a large enough sequence, counting to three via Count will take seconds instead of nanoseconds. Even worse, the call to Count may never complete because some sequences are unbounded:

public static IEnumerable<T> RepeatForever<T>(this T value) {
    while (true) yield return value;
}

...

return "Hey! Listen!".RepeatForever().HasThreeOrMore(); // <--- hangs

The above program will hang as it desperately tries to count to infinity. Fortunately, because the arithmetic inside Count is done in a checked context, it won't actually hang forever. You'll get an arithmetic overflow exception when the internal count (eventually) exceeds Int32.MaxValue.

Standard Solutions

There's various solutions on stackoverflow for the 'counting too far' problem.

One method is to manually manage the enumerator:

bool HasAtLeast_A<T>(IEnumerable<T> sequence, int n) {
    using (var e = items.GetEnumerator())
        for (var i = 0; i < n; i++)
            if (!e.MoveNext())
                return false;
    return true;
}

That's pretty verbose, though. The Take method allows you to do the same thing without an explicit loop, by truncating the sequence being counted once you've reached your goal:

bool HasAtLeast_B<T>(IEnumerable<T> sequence, int n) {
    return sequence.Take(n).Count() >= n;
}

These solutions work, but they're not ideal. For one thing, we're not using the standard comparison operators anymore. I would prefer to use <, as opposed to a method with "IsLessThan"/"HasAtLeast" in the name. I want to be able to write this:

bool HasAtLeast_C<T>(IEnumerable<T> sequence, int n) {
    return sequence.LazyCount() >= n; // (requires LazyCount to be implemented)
}

Is there some way to do that?

Lazy Counting

The fundamental reason that Count does more work than necessary is its signature: only takes a sequence, returns an integer. Count can't predict what its result is going to be compared against, because that information isn't passed in. Count also can't have its result 'update itself' when the comparisons are performed, because integers don't support that sort of thing. Ultimately, Count's signature forces it to compute the exact number of items and return that (or be incorrect).

To overcome the limitation on Count, we're going to change the signature. Instead of taking extra details about what we will compare against, we'll return a type that will use those details when the comparisons actually happen. I call that type LazyCounter.

The actual mechanics of LazyCounter are straightforward. It has to internally track an enumerator to advance, and the count so far. It needs comparison operators that advance the internal count, but don't advance it more than necessary (e.g. X <= Y requires you to try to advance X past Y before you know the result, but X < Y only requires you to advance X to match Y). LazyCounter should have implicit conversions from int/long to LazyCounter, so that int-vs-Lazy and Lazy-vs-int comparisons are the same as Lazy-vs-Lazy comparisons. Equality (Equals/GetHashCode) based on the final count, as opposed to the current state, is also a good idea.

All in all, LazyCounter is 150 lines of boring code.

With the LazyCounter type in hand, we can write LazyCount method(s) in a straightforward way:

public static LazyCounter LazyCount<T>(this IEnumerable<T> sequence) {
    if (sequence == null) throw new ArgumentNullException("sequence");

    var asCollection = sequence as ICollection;
    if (asCollection != null) return asCollection.Count;

    var asReadOnlyCollection = sequence as IReadOnlyCollection<T>;
    if (asReadOnlyCollection != null) return asReadOnlyCollection.Count;

    return new LazyCounter(countAdvancer: sequence.GetEnumerator());
}

Of course, in the spirit of minimizing how much iteration is done, the code checks for common cases where a sequence has a conveniently precomputed count.

With LazyCount available, you can perform comparisons on the number of items in a sequence in a natural way without worrying about counting too far. LazyCount is a bit slower than Count, overall, but it won't count to infinity when you only need to count to three.

Summary

Comparisons involving a sequence's Count are inefficient (and incorrect, if the sequence is not guaranteed to be finite). This problem can be avoided by truncating the sequence or by using a lazy count.

Lazy counts only advance when asked for the results of comparisons, and only as much as necessary.

(To be honest, although I find lazy counting interesting, I typically just truncate sequences with Take when I want a limited count. It's simpler.)

---

Discuss on Reddit

---


Twisted Oak Studios offers consulting and development on high-tech interactive projects. Check out our portfolio, or Give us a shout if you have anything you think some really rad engineers should help you with.

Archive

More interesting posts (11 of 23 articles)

Or check out our Portfolio.