109: (Default)
[personal profile] 109
In response to Vova Patryshev's article

The question I tried to answer here is "How to write efficient and generic in-memory index". It seems that the concept of Map, coming from functional languages, is unavoidable. Fortunately, in C# we have delegates, which allow us to avoid the hassle with interfaces. For the sake of keeping the dialog going, I used the same data class (hunter, datetime, mammothWeight), invented by Vova.


File index.cs contains an implementation of generic index structure. Please note that sorting issues are not addressed. Obviously, we can sort the index each time we do binary search or we can always maintain it in the sorted state by inserting new value in correct position. The choice is affected by the size if the index and the frequency of inserts-deletes versus the frequency of binary searches. This example does neither, it simply re-sorts the whole collection every time new object is added, which is of course very inefficient, but optimizing sorting strategy is out of scope for now.

It is also should be obvious how to extend this class to change the default sorting with Comparer.Default to any imaginable sort order, which would allow index to act as so-called "functional index". Finally, "Get" method can be overloaded to accept two parameters - Key and delegate Predicate<Key>, which will allow to run queries like "get data where key > something" or even "get data where SomeComplicatedFunction(key) is true".

Please enjoy the code:
using System.Collections.Generic;

delegate Y Map<X, Y>(X x);

class Index<Key, Data> : List<Data>
{
	private Map<Data, Key> getKey;
	private Comparer<Key> keyComparer = Comparer<Key>.Default;

	public Index(Map<Data, Key> getKey)
	{
		this.getKey = getKey;
	}

	private int Compare(int index, Key key)
	{
		return keyComparer.Compare(getKey(this[index]), key);
	}

	private int Compare(Data x, Data y)
	{
		return keyComparer.Compare(getKey(x), getKey(y));
	}

	public new void Sort()
	{
		base.Sort(Compare);
	}

	private int BinarySearch(Key key)
	{
		int first = 0;
		int last = Count - 1;
		while (first <= last)
		{
			int current = first + ((last - first) >> 1);
			int comparison = Compare(current, key);
			if (comparison == 0)
				return current;
			if (comparison < 0)
				first = current + 1;
			else
				last = current - 1;
		}
		return -1;
	}

	public List<Data> Get(Key key)
	{
		int mid = BinarySearch(key);
		if (mid == -1)
			return new List<Data>(); // empty list, arguable

		int first = mid - 1;
		while (first >= 0)
		{
			if (Compare(first, key) != 0)
				break;
			first--;
		}
		first++;

		int last = mid + 1;
		while (last < Count)
		{
			if (Compare(last, key) != 0)
				break;
			last++;
		}
		last--;

		return GetRange(first, last - first + 1);
	}
}


File hunt.cs is the example of the index usage. Please note that there are no constraints for the Hunt class. No interfaces to implement, nothing. The only "adapter" between data class and the index is the implementation of the Map delegate. This code doesn't have to be a part of data class, as well as the code that puts data in the storage or looks it up. Which means that the suggested Index class can be applied to any existing class without the need to change it.

using System;
using System.Collections.Generic;

class Hunt
{
	private string hunterName;
	private DateTime killedDate;
	private double mammothWeight;

	public Hunt(string hunterName, DateTime killedDate, double mammothWeight)
	{
		this.hunterName = hunterName;
		this.killedDate = killedDate;
		this.mammothWeight = mammothWeight;
	}

	public override string ToString()
	{
		return string.Format(
			"{0} scored {1} on {2}",
			hunterName,
			mammothWeight,
			killedDate);
	}

	private static string GetHunterName(Hunt hunt)
	{
		return hunt.hunterName;
	}

	private static Index<string, Hunt> hunterNameIndex = 
		new Index<string, Hunt>(new Map<Hunt, string>(GetHunterName));
	
	public static void Add(Hunt hunt)
	{
		hunterNameIndex.Add(hunt);
		hunterNameIndex.Sort();
	}

	public static List<Hunt> ByHunterName(string hunterName)
	{
		return hunterNameIndex.Get(hunterName);
	}
}

class TestHunt
{
	public static void Main()
	{
		Hunt.Add(new Hunt("Kosta", DateTime.Now, 42));
		Hunt.Add(new Hunt("Vlad", DateTime.Now, 142));
		Hunt.Add(new Hunt("Vlad", DateTime.Now, 143));

		Console.WriteLine("\nKosta scores:");

		List<Hunt> list = Hunt.ByHunterName("Kosta");
		foreach (Hunt hunt in list)
		{
			Console.WriteLine(hunt);
		}

		Console.WriteLine("\nJohn scores:");

		list = Hunt.ByHunterName("John");
		foreach (Hunt hunt in list)
		{
			Console.WriteLine(hunt);
		}

		Console.WriteLine("\nVlad scores:");

		list = Hunt.ByHunterName("Vlad");
		foreach (Hunt hunt in list)
		{
			Console.WriteLine(hunt);
		}
	}
}


In order to test the code in Visual Studio, create empty solution and add Index.cs and Hunt.cs with the content copied from above.

(no subject)

Date: 2007-01-27 08:28 am (UTC)
From: [identity profile] 109.livejournal.com
Btw -- you introduced the collection of hunt facts, not me. I really, really don't understand your comment.

(no subject)

Date: 2007-01-27 10:51 pm (UTC)
From: [identity profile] ivan-gandhi.livejournal.com
Well, I see the problem; and I am not ready to offer a solution. The problem is the perception. I need time to figure out how to deal with it.

(no subject)

Date: 2007-01-27 11:11 pm (UTC)
From: [identity profile] 109.livejournal.com
I can try to help you. First two parts of your article are just a declarations like: there is such things as map and currying and stuff and this is how we express these concepts in java. I don't see any value in it [personally for me], unless I see how to employ these concepts to make something practical. The third part of your article was an attempt of a practical example. The result of the attempt, in my opinion, was rather poor for 3 reasons:

1. the code snippet is incomplete and cannot be even tested "as is",
2. it is functionally limited to key equality only, and
3. it is inefficient in terms of used memory, because it duplicates keys.

So my goal was to give people something that works off the bat (do you think it was a lot of fun writing BinarySearch? boring.), efficient and extensible (allow to plug-in more functionality). Have I succeeded or failed? That's the feedback I would like to get, not "it is writing haskell in assembler". I have extreme difficulties understanding how does it relate to my post and/or what made you say that.

(no subject)

Date: 2007-01-29 11:19 pm (UTC)
From: [identity profile] ivan-gandhi.livejournal.com
:) You know I love theory more than practice. I don't give a shit about "saving memory on keys"; and the fact that my example does not solve a big problem of binary search or something... big deal. It is about the vision. How we view things. And here I see a problem. Conveying the vision.

(no subject)

Date: 2007-01-29 11:30 pm (UTC)
From: [identity profile] 109.livejournal.com
right, but if the vision is not how to use abstract concepts to make something practical, then what is it? you know I have the utmost respect for you, it makes me so sad to be not able to understand your vision...

Profile

109: (Default)
109

March 2019

S M T W T F S
     12
3456789
101112131415 16
17181920212223
24252627282930
31      

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags