Some alternative Zip extension methods


I ran into a small problem today: I had ‘n’ arrays of correlated data, and I wanted to effectively “transpose” the rows into columns, so that I would end up with one array per index position, each containing the items at that index from all of the input arrays. To simplify that terrible explanation, here is a visual:

start : [a0, a1, a2], [b0, b1, b2], [c0, c1, c2] –> transpose –> [a0, b0, c0], [a1, b1, c1], [a2, b2, c2]

Now the Zip function in .NET 4 can do this very easily, but only for TWO enumerables, not any number. Its basic function is to walk two enumerables in step and project each pair of same-index items into a result. I wanted to do something similar but without the silly two-enumerable limitation. It’s very easy to extend Zip to operate on more than two enumerables statically (you just add more and more parameters to the extension method), but I wanted true dynamic functionality, where the number of enumerables is only known at run time. So I wrote two extension methods (really just one, plus a simplification for the standard projection) that do this. One consequence to note is that this will likely be slower than the built-in Zip: partly because we’re dealing with more enumerables, and partly because every step goes through LINQ calls over an array of enumerators and builds a new collection for the projection, rather than working with two enumerators directly. So here is some code to look at:
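For comparison, here is a small sketch of the two approaches mentioned above: the built-in two-sequence Zip, and the "static" way of going past two by adding parameters. Zip3 is a hypothetical name of my own, not part of the framework, and it is built here simply by nesting two calls to the built-in Zip.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipExamples
{
    // Hypothetical three-sequence overload (not a framework method),
    // built by nesting the built-in two-sequence Zip.
    public static IEnumerable<TResult> Zip3<T1, T2, T3, TResult>(
        this IEnumerable<T1> first,
        IEnumerable<T2> second,
        IEnumerable<T3> third,
        Func<T1, T2, T3, TResult> selector)
    {
        return first
            .Zip(second, (a, b) => new { a, b })
            .Zip(third, (ab, c) => selector(ab.a, ab.b, c));
    }

    public static void Main()
    {
        var letters = new[] { "a", "b", "c" };
        var numbers = new[] { 1, 2, 3 };
        var flags = new[] { true, false, true };

        // Built-in Zip: exactly two sequences.
        foreach (var s in letters.Zip(numbers, (l, n) => l + n))
            Console.WriteLine(s); // a1, b2, c3

        // Static extension: one more parameter per extra sequence.
        foreach (var s in letters.Zip3(numbers, flags, (l, n, f) => l + n + f))
            Console.WriteLine(s); // a1True, b2False, c3True
    }
}
```

Every additional sequence needs another overload like this one, which is why it does not help when the number of sequences is only known at run time.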

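public static IEnumerable<TResult> ZipMany<TItem, TResult>(
    this IEnumerable<IEnumerable<TItem>> source,
    Func<IEnumerable<TItem>, TResult> func)
{
    // ToArray creates the enumerators once; a lazy Select would recreate them on every pass.
    var iters = source.Select(x => x.GetEnumerator()).ToArray();
    // All() is true for an empty array, so check Length to avoid looping forever on no inputs.
    while (iters.Length > 0 && iters.All(x => x.MoveNext()))
        // Snapshot the current items so a lazy func cannot read them after the enumerators move on.
        yield return func(iters.Select(x => x.Current).ToArray());
}

public static IEnumerable<List<TItem>> ZipMany<TItem>(
    this IEnumerable<IEnumerable<TItem>> source)
{
    // Default projection: each group of same-index items becomes a List.
    return ZipMany(source, x => x.ToList());
}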

So the idea is that we build an array of enumerators, one for each of the input sequences. We have to call ToArray to force the Select to execute once, up front. Without it, the query is lazy and would be re-run every time iters is enumerated, so each pass would create a fresh set of enumerators: MoveNext would advance one set while Current was read from another set that had never been advanced. Then we just advance every enumerator at the same pace and project the index-aligned items at each step, stopping as soon as any input runs out. It’s actually pretty simple code for what it is really doing, but it works quite well and I thought I would share it. The second method is just a helper that simplifies the first by removing the need to supply a projection. The first method lets you project the results into any custom type you want; the helper defaults to projecting them into a list (which is how I can imagine myself and most others wanting the results returned).
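To make that concrete, here is a small usage sketch that runs the transpose example from the top of the post through both overloads. ZipMany is repeated inside the block only so that it compiles on its own; the class name ZipManyDemo and the sample data are just for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipManyDemo
{
    // ZipMany as described in this post, repeated so the example is self-contained.
    public static IEnumerable<TResult> ZipMany<TItem, TResult>(
        this IEnumerable<IEnumerable<TItem>> source,
        Func<IEnumerable<TItem>, TResult> func)
    {
        var iters = source.Select(x => x.GetEnumerator()).ToArray();
        // The Length check keeps an empty outer sequence from looping forever
        // (All is true on an empty array); the inner ToArray snapshots the items.
        while (iters.Length > 0 && iters.All(x => x.MoveNext()))
            yield return func(iters.Select(x => x.Current).ToArray());
    }

    public static IEnumerable<List<TItem>> ZipMany<TItem>(
        this IEnumerable<IEnumerable<TItem>> source)
    {
        return ZipMany(source, x => x.ToList());
    }

    public static void Main()
    {
        var rows = new[]
        {
            new[] { "a0", "a1", "a2" },
            new[] { "b0", "b1", "b2" },
            new[] { "c0", "c1", "c2" },
        };

        // Default projection: each column comes back as a List<string>.
        foreach (var column in rows.ZipMany())
            Console.WriteLine(string.Join(", ", column));
        // a0, b0, c0
        // a1, b1, c1
        // a2, b2, c2

        // Custom projection: collapse each column into a single string.
        foreach (var joined in rows.ZipMany(items => string.Join("-", items)))
            Console.WriteLine(joined);
        // a0-b0-c0
        // a1-b1-c1
        // a2-b2-c2
    }
}
```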

I am curious if anyone can point out any improvements or optimizations on this code, mostly so I can learn from them myself. Other than that, I am quite happy with how this turned out, and it works wonderfully with the application that I wanted to use it with.
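One refinement that might be worth considering, offered as a sketch rather than a definitive version: IEnumerator<T> is IDisposable, so the enumerators could be disposed when iteration ends, and the arguments could be checked before the iterator starts. The names ZipManyDisposing and Iterate are my own, and I have not measured whether this is any faster or slower.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipManyRefined
{
    // Hypothetical variant: validates arguments eagerly, then defers to the iterator.
    public static IEnumerable<TResult> ZipManyDisposing<TItem, TResult>(
        this IEnumerable<IEnumerable<TItem>> source,
        Func<IEnumerable<TItem>, TResult> func)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (func == null) throw new ArgumentNullException("func");
        return Iterate(source, func);
    }

    private static IEnumerable<TResult> Iterate<TItem, TResult>(
        IEnumerable<IEnumerable<TItem>> source,
        Func<IEnumerable<TItem>, TResult> func)
    {
        var iters = new List<IEnumerator<TItem>>();
        try
        {
            // Collect enumerators one at a time so that any already created
            // are still disposed if a later GetEnumerator call throws.
            foreach (var seq in source)
                iters.Add(seq.GetEnumerator());

            // Count > 0 guards against an empty outer sequence,
            // for which All would otherwise be true forever.
            while (iters.Count > 0 && iters.All(x => x.MoveNext()))
            {
                // Copy the current items into a fresh array for each step.
                var row = new TItem[iters.Count];
                for (int i = 0; i < row.Length; i++)
                    row[i] = iters[i].Current;
                yield return func(row);
            }
        }
        finally
        {
            // Runs when iteration completes or the caller stops early.
            foreach (var iter in iters)
                iter.Dispose();
        }
    }

    public static void Main()
    {
        // Inputs of unequal length: iteration stops with the shortest one.
        var rows = new[] { new[] { 1, 2, 3 }, new[] { 10, 20 } };
        foreach (var sum in rows.ZipManyDisposing(items => items.Sum()))
            Console.WriteLine(sum); // 11, then 22
    }
}
```

Splitting the public method from the private iterator means the null checks run at call time instead of on the first MoveNext, which tends to make mistakes easier to track down.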

This same post is also available on the MSDN forums if you want to follow the comments there as well.
