Wednesday, December 10, 2008

Tuples...

Tuples (which I have always pronounced 'tuh-pul' and others pronounce 'too-pull') are interesting data structures in that they combine two or more pieces of otherwise unrelated data, each of any type, into a single value. They are most common (and best supported) in functional languages, where they are first-class language constructs. For example, to create a tuple in F#, you simply surround the elements you want to tuplize with a set of parentheses: [ let x = (1, "one") ] - x in this case will have the type [ int * string ].

You may not realize it, but something tuple-like has been around in .NET since 2.0 - the generic KeyValuePair struct is, essentially, an immutable tuple of any two values. Using it as such, though, is unwieldy in my opinion - its purpose was to facilitate enumerating through dictionaries, and while it works fine in that context, KeyValuePair is ill-suited to more general purposes.
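To make the "unwieldy" point concrete, here is a minimal sketch of KeyValuePair standing in for a pair of values. The class and method names (KeyValuePairAsTuple, Origin, Describe) are made up for illustration; only KeyValuePair itself and its Key and Value properties come from the framework.

```csharp
using System.Collections.Generic;

public static class KeyValuePairAsTuple {
    // Returns an (x, y) point as a pair. The caller has to remember that
    // Key means x and Value means y; the names say nothing about either.
    public static KeyValuePair<int, int> Origin() {
        return new KeyValuePair<int, int>(0, 0);
    }

    // The type arguments must be spelled out at every construction site,
    // and Key/Value expose only getters, so the pair cannot be changed.
    public static string Describe(KeyValuePair<int, string> pair) {
        return pair.Key + " -> " + pair.Value;
    }
}
```

It works, but every use reads like a dictionary entry, and there is nothing equivalent for three or more values.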

During my experimentation with F# and other functional languages, I've come to see the Tuple type as a very valuable data structure that I miss very much when working on "real-world" code (real-world in this context meaning what I do for a living). .NET has no Tuple type, and while I'm aware that .NET 4 will have this type, that doesn't help me now. I could use the F# tuple types in my C# code, yes - but I'd have to distribute the F# binaries with my software, and I don't want to do that if I'm not using F#. My only other options are to use someone else's library, or roll my own. I've opted for the latter.

I've implemented four immutable tuple struct types, which hold between two and five values. Additionally, I implemented several helper methods and extension methods that make working with the Tuples a little easier in C#. Here's the code for the two-value Tuple struct:

    using System.Collections.Generic;
    using System.Linq;

    public struct Tuple<T1, T2> {
        private readonly T1 _value1; public T1 Value1 { get { return _value1; } }
        private readonly T2 _value2; public T2 Value2 { get { return _value2; } }
        public Tuple(T1 value1, T2 value2) { _value1 = value1; _value2 = value2; }

        public override bool Equals(object obj) {
            if (!(obj is Tuple<T1, T2>)) return false;
            if (obj == null) return false;

            Tuple<T1, T2> t = (Tuple<T1, T2>)obj;
            // EqualityComparer copes with null values, where Value1.Equals(...) would throw.
            return EqualityComparer<T1>.Default.Equals(Value1, t.Value1)
                && EqualityComparer<T2>.Default.Equals(Value2, t.Value2);
        }

        public override int GetHashCode() {
            // Same null-safety concern as Equals: a null value hashes to 0.
            return EqualityComparer<T1>.Default.GetHashCode(Value1)
                 ^ EqualityComparer<T2>.Default.GetHashCode(Value2);
        }

        public KeyValuePair<T1, T2> AsKeyValuePair() {
            return new KeyValuePair<T1, T2>(Value1, Value2);
        }
    }
    ...

    public static class Tuples {
        // Factory method: lets the compiler infer T1 and T2 from the arguments.
        public static Tuple<T1, T2> Tuple<T1, T2>(T1 value1, T2 value2) {
            return new Tuple<T1, T2>(value1, value2);
        }
        ...

        public static Tuple<T1, T2> Default<T1, T2>() {
            return new Tuple<T1, T2>(default(T1), default(T2));
        }
        ...

        public static IEnumerable<Tuple<T1, T2>> Zip<T1, T2>(IEnumerable<T1> first, IEnumerable<T2> second) {
            // Dispose both enumerators once iteration finishes or is abandoned.
            using (var enum1 = first.GetEnumerator())
            using (var enum2 = second.GetEnumerator()) {
                while (enum1.MoveNext() && enum2.MoveNext()) {
                    yield return Tuple(enum1.Current, enum2.Current);
                }
            }
        }
        ...
    }

    public static class TupleExtensions {
        public static Tuple<IEnumerable<T1>, IEnumerable<T2>> Unzip<T1, T2>(this IEnumerable<Tuple<T1, T2>> ienum) {
            var first = new List<T1>();
            var second = new List<T2>();

            foreach (var t in ienum) {
                first.Add(t.Value1);
                second.Add(t.Value2);
            }

            return Tuples.Tuple(first.AsEnumerable(), second.AsEnumerable());
        }
        ...

        public static Tuple<T1, T2> AsTuple<T1, T2>(this KeyValuePair<T1, T2> kvp) {
            return Tuples.Tuple(kvp.Key, kvp.Value);
        }
    }


The '...'s denote where the pattern is extended to cover all the tuple value counts from 2 to 5.

Only a little explanation is really needed here - the Tuple structs are read-only, so once they're initialized their values can't be changed. In my experience this isn't a problem - I've never really *needed* to mutate a Tuple when I could just create a new one. The static 'Tuples' class makes it easier to initialize a Tuple - using it, I can create a Tuple from existing data without having to spell out the type parameters; the compiler infers them from the argument types. Sadly, C# doesn't do this kind of inference for constructors. The 'AsKeyValuePair' and 'AsTuple' methods (which only work with the 2-tuple struct) are pretty self-explanatory.
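Here is a rough sketch of the difference between calling the constructor and calling the factory method. The struct is cut down to the bare minimum and wrapped in a namespace (TupleSketch) so the snippet stands alone; the namespace and the InferenceDemo class with its three methods are made up for illustration and are not part of the uploaded file.

```csharp
namespace TupleSketch {
    // Pared-down stand-in for the two-value struct from the post.
    public struct Tuple<T1, T2> {
        private readonly T1 _value1; public T1 Value1 { get { return _value1; } }
        private readonly T2 _value2; public T2 Value2 { get { return _value2; } }
        public Tuple(T1 value1, T2 value2) { _value1 = value1; _value2 = value2; }
    }

    public static class Tuples {
        public static Tuple<T1, T2> Tuple<T1, T2>(T1 value1, T2 value2) {
            return new Tuple<T1, T2>(value1, value2);
        }
    }

    // Hypothetical demo class.
    public static class InferenceDemo {
        // Constructor call: both type arguments have to be written out.
        public static Tuple<int, string> Explicit() {
            return new Tuple<int, string>(1, "one");
        }

        // Factory call: T1 = int and T2 = string are inferred from the arguments.
        public static Tuple<int, string> Inferred() {
            return Tuples.Tuple(1, "one");
        }

        // "Changing" a value means building a new tuple; the original is untouched.
        public static Tuple<int, string> WithName(Tuple<int, string> t, string name) {
            return Tuples.Tuple(t.Value1, name);
        }
    }
}
```

Explicit() and Inferred() produce the same value; the second just leaves the bookkeeping to the compiler, which matters more as the element types get longer.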

The Zip method takes two or more IEnumerables and 'zips' them together into a single IEnumerable of Tuples. The Unzip method sort of performs the reverse, although since a method can only return one value, I package the unzipped IEnumerables in a single Tuple.
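A small round trip shows how the two fit together. This is only a sketch: it repeats cut-down copies of the two-value types inside a made-up namespace (ZipSketch) so it compiles by itself, and the ZipDemo class and its RoundTrip method are hypothetical. One thing worth noticing about the two-input Zip is that it stops as soon as either input runs out, so the leftover element of the longer sequence is dropped.

```csharp
using System.Collections.Generic;
using System.Linq;

namespace ZipSketch {
    // Pared-down stand-in for the two-value struct from the post.
    public struct Tuple<T1, T2> {
        private readonly T1 _value1; public T1 Value1 { get { return _value1; } }
        private readonly T2 _value2; public T2 Value2 { get { return _value2; } }
        public Tuple(T1 value1, T2 value2) { _value1 = value1; _value2 = value2; }
    }

    public static class Tuples {
        public static Tuple<T1, T2> Tuple<T1, T2>(T1 value1, T2 value2) {
            return new Tuple<T1, T2>(value1, value2);
        }

        public static IEnumerable<Tuple<T1, T2>> Zip<T1, T2>(IEnumerable<T1> first, IEnumerable<T2> second) {
            using (var enum1 = first.GetEnumerator())
            using (var enum2 = second.GetEnumerator()) {
                // Ends when either sequence is exhausted.
                while (enum1.MoveNext() && enum2.MoveNext()) {
                    yield return Tuple(enum1.Current, enum2.Current);
                }
            }
        }
    }

    public static class TupleExtensions {
        public static Tuple<IEnumerable<T1>, IEnumerable<T2>> Unzip<T1, T2>(this IEnumerable<Tuple<T1, T2>> ienum) {
            var first = new List<T1>();
            var second = new List<T2>();
            foreach (var t in ienum) {
                first.Add(t.Value1);
                second.Add(t.Value2);
            }
            return Tuples.Tuple(first.AsEnumerable(), second.AsEnumerable());
        }
    }

    // Hypothetical demo class.
    public static class ZipDemo {
        public static Tuple<IEnumerable<int>, IEnumerable<string>> RoundTrip() {
            var numbers = new[] { 1, 2, 3 };
            var names = new[] { "one", "two" };

            // Two tuples come out: (1, "one") and (2, "two"); the 3 has no partner.
            var zipped = Tuples.Zip(numbers, names).ToList();

            // Value1 holds the numbers 1, 2 and Value2 the names "one", "two".
            return zipped.Unzip();
        }
    }
}
```

Because of the truncation, Unzip only gives back the original sequences when they were the same length to begin with.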

If you'd like to use this in your own projects, I've uploaded my Tuples file to PasteBin - you can get to it from here. No attribution needed - though it would be nice if you'd drop a line here to let me know it's been useful to you. =)

1 comment:

Andreas said...

if (!(obj is Tuple<T1, T2>))
return false;
if (obj == null)
return false;

The second if is redundant since if obj is null, the first if will already bail out.