Wednesday, January 26, 2011

Tries

I learned a little about tries, or prefix tries, this month. I checked out the Wikipedia article, a Top Coder article, and saw a forum post about Patricia tries. There is a basic C# implementation of a trie at my Github account. In that case, each node has a lookup array for 26 letters that leads to a subtrees or leaf. Adding to the tree involves traversing each node's array for each character in a word until the last letter is reached. A flag in the node marks the end of a word. When a string needs testing as to whether it is in the trie, the test can possibly stop after only traversing a small number of the leading characters of the word. However, keeping 26 references on each node could use up a lot of space, depending on the data. There is a lot more for me to learn about the subject, such as how it relates to edit distance and spell-checking.

However, I'm currently checking out hash tables as an alternative. I also read some people use suffix arrays as another choice.

Thursday, January 20, 2011