109: (Default)
[personal profile] 109
In response to Vova Patryshev's article

The question I tried to answer here is "How to write efficient and generic in-memory index". It seems that the concept of Map, coming from functional languages, is unavoidable. Fortunately, in C# we have delegates, which allow us to avoid the hassle with interfaces. For the sake of keeping the dialog going, I used the same data class (hunter, datetime, mammothWeight), invented by Vova.


File index.cs contains an implementation of generic index structure. Please note that sorting issues are not addressed. Obviously, we can sort the index each time we do binary search or we can always maintain it in the sorted state by inserting new value in correct position. The choice is affected by the size if the index and the frequency of inserts-deletes versus the frequency of binary searches. This example does neither, it simply re-sorts the whole collection every time new object is added, which is of course very inefficient, but optimizing sorting strategy is out of scope for now.

It is also should be obvious how to extend this class to change the default sorting with Comparer.Default to any imaginable sort order, which would allow index to act as so-called "functional index". Finally, "Get" method can be overloaded to accept two parameters - Key and delegate Predicate<Key>, which will allow to run queries like "get data where key > something" or even "get data where SomeComplicatedFunction(key) is true".

Please enjoy the code:
using System.Collections.Generic;

delegate Y Map<X, Y>(X x);

class Index<Key, Data> : List<Data>
{
	private Map<Data, Key> getKey;
	private Comparer<Key> keyComparer = Comparer<Key>.Default;

	public Index(Map<Data, Key> getKey)
	{
		this.getKey = getKey;
	}

	private int Compare(int index, Key key)
	{
		return keyComparer.Compare(getKey(this[index]), key);
	}

	private int Compare(Data x, Data y)
	{
		return keyComparer.Compare(getKey(x), getKey(y));
	}

	public new void Sort()
	{
		base.Sort(Compare);
	}

	private int BinarySearch(Key key)
	{
		int first = 0;
		int last = Count - 1;
		while (first <= last)
		{
			int current = first + ((last - first) >> 1);
			int comparison = Compare(current, key);
			if (comparison == 0)
				return current;
			if (comparison < 0)
				first = current + 1;
			else
				last = current - 1;
		}
		return -1;
	}

	public List<Data> Get(Key key)
	{
		int mid = BinarySearch(key);
		if (mid == -1)
			return new List<Data>(); // empty list, arguable

		int first = mid - 1;
		while (first >= 0)
		{
			if (Compare(first, key) != 0)
				break;
			first--;
		}
		first++;

		int last = mid + 1;
		while (last < Count)
		{
			if (Compare(last, key) != 0)
				break;
			last++;
		}
		last--;

		return GetRange(first, last - first + 1);
	}
}


File hunt.cs is the example of the index usage. Please note that there are no constraints for the Hunt class. No interfaces to implement, nothing. The only "adapter" between data class and the index is the implementation of the Map delegate. This code doesn't have to be a part of data class, as well as the code that puts data in the storage or looks it up. Which means that the suggested Index class can be applied to any existing class without the need to change it.

using System;
using System.Collections.Generic;

class Hunt
{
	private string hunterName;
	private DateTime killedDate;
	private double mammothWeight;

	public Hunt(string hunterName, DateTime killedDate, double mammothWeight)
	{
		this.hunterName = hunterName;
		this.killedDate = killedDate;
		this.mammothWeight = mammothWeight;
	}

	public override string ToString()
	{
		return string.Format(
			"{0} scored {1} on {2}",
			hunterName,
			mammothWeight,
			killedDate);
	}

	private static string GetHunterName(Hunt hunt)
	{
		return hunt.hunterName;
	}

	private static Index<string, Hunt> hunterNameIndex = 
		new Index<string, Hunt>(new Map<Hunt, string>(GetHunterName));
	
	public static void Add(Hunt hunt)
	{
		hunterNameIndex.Add(hunt);
		hunterNameIndex.Sort();
	}

	public static List<Hunt> ByHunterName(string hunterName)
	{
		return hunterNameIndex.Get(hunterName);
	}
}

class TestHunt
{
	public static void Main()
	{
		Hunt.Add(new Hunt("Kosta", DateTime.Now, 42));
		Hunt.Add(new Hunt("Vlad", DateTime.Now, 142));
		Hunt.Add(new Hunt("Vlad", DateTime.Now, 143));

		Console.WriteLine("\nKosta scores:");

		List<Hunt> list = Hunt.ByHunterName("Kosta");
		foreach (Hunt hunt in list)
		{
			Console.WriteLine(hunt);
		}

		Console.WriteLine("\nJohn scores:");

		list = Hunt.ByHunterName("John");
		foreach (Hunt hunt in list)
		{
			Console.WriteLine(hunt);
		}

		Console.WriteLine("\nVlad scores:");

		list = Hunt.ByHunterName("Vlad");
		foreach (Hunt hunt in list)
		{
			Console.WriteLine(hunt);
		}
	}
}


In order to test the code in Visual Studio, create empty solution and add Index.cs and Hunt.cs with the content copied from above.

(no subject)

Date: 2007-01-27 02:05 am (UTC)
From: [identity profile] 109.livejournal.com
The only thing that is bothering me here is the explicit declaration of
private static string GetHunterName(Hunt hunt)
	{
		return hunt.hunterName;
	}
It should be possible to make it an inlined anonymous delegate right where it is used, but the syntax avoided me at the time of writing. The hint will be appreciated.

(no subject)

Date: 2007-01-27 02:34 am (UTC)
From: [identity profile] faceted-jacinth.livejournal.com
Map
[Error: Irreparable invalid markup ('<hunt,>') in entry. Owner must fix manually. Raw contents below.]

Map<Hunt, string>(delegate(Hunt hunt) { return hunt.hunterName; })

btw "Map" aint no good name for a delegate since it has strong connotations with data structure. I'd choose "Mapping" instead.

offtopic: меня что-то Invoke напрягает немножко, если мне нужно вызвать метод с кучей параметров, то приходится заводить в точности такой же делегат, с той же кучей параметров, чтобы потом написать ctrl.Invoke(new SomeMethodDelegate(SomeMethod), param1, param2), потому что у него параметр -- просто Delegate. Это как-то тупо, по-моему.

(no subject)

Date: 2007-01-27 02:50 am (UTC)
From: [identity profile] 109.livejournal.com
well, this was kind of intentional, to keep the same name (Map), which is rahther the name of the concept than anything else.

hopefully in 3.0 they will fix Invoke.

(no subject)

Date: 2007-01-27 04:21 am (UTC)
From: [identity profile] anonym-mouse.livejournal.com
Did you run any tests/benchmarks?
Say, I'm using an index in my application that
(test a) searches for any number of names among 14000
(test b) gets 14,000 from an index of 500,000

In one case I use a built-in indexing facility of MySQL; in another - a file utility that allows me to avoid using a database at all.

What would be characteristic times for your index (and please mention your CPU speed? (we'll suppose we have enough memory for the application and there is no content for disc access during the test)

(no subject)

Date: 2007-01-27 08:35 am (UTC)
From: [identity profile] 109.livejournal.com
No, I didn't -- I wrote the stuff a few hours ago. Please feel free to post the benchmarks. My prediction though will be that in-memory index will be faster as long as the collection fits in memory -- simply because memory access is much faster than disk access.

I am not sure though what do you mean by "file utility". The whole point is to keep data in memory and don't use disk access at all.

(no subject)

Date: 2007-01-27 04:43 am (UTC)
From: [identity profile] ivan-gandhi.livejournal.com
Man, it reminds me someone who said that he can implement all the Java/Ruby/Haskell in plain Assembler.

What I mean is: you map abstract stuff to a pretty concrete data structure; yes, it can be modeled in C#, so? You may be right to introduce a collection of "hunt facts", but it is pretty far from representing a function. Yes, in case we have a pile of facts, this is a solution. Will you actually use this solution? I am not sure.

(no subject)

Date: 2007-01-27 08:23 am (UTC)
From: [identity profile] 109.livejournal.com
I am not sure I understand what you mean. Yes, I mapped abstract stuff into something immediately useful. Yes, it is about a collection of facts. The way to represent a function is F(X) and is out of scope of this excercise.

(no subject)

Date: 2007-01-27 08:28 am (UTC)
From: [identity profile] 109.livejournal.com
Btw -- you introduced the collection of hunt facts, not me. I really, really don't understand your comment.

(no subject)

Date: 2007-01-27 10:51 pm (UTC)
From: [identity profile] ivan-gandhi.livejournal.com
Well, I see the problem; and I am not ready to offer a solution. The problem is the perception. I need time to figure out how to deal with it.

(no subject)

Date: 2007-01-27 11:11 pm (UTC)
From: [identity profile] 109.livejournal.com
I can try to help you. First two parts of your article are just a declarations like: there is such things as map and currying and stuff and this is how we express these concepts in java. I don't see any value in it [personally for me], unless I see how to employ these concepts to make something practical. The third part of your article was an attempt of a practical example. The result of the attempt, in my opinion, was rather poor for 3 reasons:

1. the code snippet is incomplete and cannot be even tested "as is",
2. it is functionally limited to key equality only, and
3. it is inefficient in terms of used memory, because it duplicates keys.

So my goal was to give people something that works off the bat (do you think it was a lot of fun writing BinarySearch? boring.), efficient and extensible (allow to plug-in more functionality). Have I succeeded or failed? That's the feedback I would like to get, not "it is writing haskell in assembler". I have extreme difficulties understanding how does it relate to my post and/or what made you say that.

(no subject)

Date: 2007-01-29 11:19 pm (UTC)
From: [identity profile] ivan-gandhi.livejournal.com
:) You know I love theory more than practice. I don't give a shit about "saving memory on keys"; and the fact that my example does not solve a big problem of binary search or something... big deal. It is about the vision. How we view things. And here I see a problem. Conveying the vision.

(no subject)

Date: 2007-01-29 11:30 pm (UTC)
From: [identity profile] 109.livejournal.com
right, but if the vision is not how to use abstract concepts to make something practical, then what is it? you know I have the utmost respect for you, it makes me so sad to be not able to understand your vision...

(no subject)

Date: 2007-01-29 11:18 pm (UTC)
From: [identity profile] ivan-gandhi.livejournal.com
Most probably, yes.

(no subject)

Date: 2007-01-28 04:49 am (UTC)
From: [identity profile] selfmade.livejournal.com
I didn't get it. What was the purpose to have 2 posts with just a link to this one?

(no subject)

Date: 2007-01-28 04:51 am (UTC)
From: [identity profile] selfmade.livejournal.com
Blya, I'm stupid. :)

(no subject)

Date: 2007-01-29 07:03 pm (UTC)
From: [identity profile] 109.livejournal.com
2? I have posted in 3 communities, you have obviously missed one.

(no subject)

Date: 2007-01-29 07:05 pm (UTC)
From: [identity profile] selfmade.livejournal.com
I definitely did. What was it about?

Profile

109: (Default)
109

March 2019

S M T W T F S
     12
3456789
101112131415 16
17181920212223
24252627282930
31      

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags