SLaks.Blog

Making the world a better place, one line of code at a time

Nested Iterators, part 2

Posted on Thursday, December 16, 2010, at 9:50:00 PM UTC

In part 1, we discussed the simple approach to making a nested iterator.  However, we fell short of a completely lazy nested iterator.

In simple cases, we can make an separate iterator method for the subsequence:

IEnumerable<IEnumerable<int>> FullyLazy() {
    for(int i = 0; i < 10; i++) 
        yield return Inner(i);
}
IEnumerable<int> Inner(int i) {
    for(int j = 0; j < 10; j++)
        yield return i * 10 + j;
}
Note that this is actually smaller than the single-method implementation!

This seems to work very well; the inner iterator code for a particular subsequence will not execute at all unless that subsequence is actually enumerated.

However, this approach falls short in practice.  To see why, consider a real-world example.
Here is a fully lazy implementation of a Partition method, which converts a sequence into a 2D “jagged IEnumerable” where each subsequence (partition) contains n elements.

class Ref<T> { 
    public Ref(T value) { Value = value; } 
    public T Value { get; set; } 
}

public static IEnumerable<IEnumerable<T>> Partition<T>
              (this IEnumerable<T> sequence, int size) {
    using(var enumerator = sequence.GetEnumerator()) {
        var isFinished = new Ref<bool>(false);
        while(!isFinished.Value) {
            yield return PartitionInner(
                 enumerator, size, isFinished
            );
        }
    }
}
static IEnumerable<T> PartitionInner<T>
      (IEnumerator<T> enumerator, int size, Ref<bool> isFinished) {
    while(size --> 0) {
          isFinished.Value = !enumerator.MoveNext();
        if (isFinished.Value) yield break;
        yield return enumerator.Current;
    }
}

This method is complicated by the need to communicate back from the inner iterator to the outer one.  Since iterators cannot have ref parameters, I need to make a “box” class that holds a reference to an int.  This allows the outer iterator to find out when the sequence finishes.

In addition, this implementation has a subtle bug: If the last partition is full, it will return an extra, empty, partition after it.
Fixing this issue would require that the outer method call MoveNext() before each call to the inner method.  This makes the code even more complicated; I won’t list it here (unless people really want me to)

This design has a more fundamental problem: The behavior of the outer method is determined by the inner ones.  It will only work if each inner IEnumerable<T> is enumerated exactly once, before the next MoveNext() call to the outer iterator.  If, for example, you iterate each inner iterator twice, it will behave unexpectedly:

Enumerable.Range(1, 8)
          .Partition(3)
          .Select(p => String.Join(", ", p.Concat(p)));
This code is intended to create strings that contain each partition twice.  It is supposed to return
1, 2, 3, 1, 2, 3
4, 5, 6, 4, 5, 6
7, 8, 7, 8

However, it actually creates the strings

1, 2, 3, 4, 5, 6
7, 8

Enumerating the inner iterator a second time will end up consuming extra items from the original sequence.

Thus, in order to produce enumerables that behave correctly, the inner enumerator needs to cache its items; it is impossible to make a fully lazy Partition method.

It gets worse.  In order to allow callers to write thingy.Partition(5).Skip(2), (which should skip the first two partitions) the outer enumerator needs to cache all items, because it cannot assume that the inner iterators will be called at all. 

Thus, the laziest possible Partition method must use the original approach:

public static IEnumerable<IEnumerable<T>> PartitionCorrect<T>
              (this IEnumerable<T> sequence, int size) {
    List<T> partition = new List<T>();
    foreach(var item in sequence) {
        partition.Add(item);
        if (partition.Count == size) {
            yield return partition;
            partition = new List<T>();
        }
    }
    if (partition.Count > 0)
        yield return partition;
}

It would be possible to write a slightly lazier version that combines these two approaches and passes a List<T> to the inner iterator, which would return whatever is in the list, then enumerate if necessary to fill the list. However, I’m too lazy to do it.  (unless people really want me to)

Categories: laziness, iterators, C#, .Net Tweet this post

comments powered by Disqus