Wednesday, February 16, 2011

Programming Contest 43

I submitted some code for a programming contest at a C/C++/C# blog. The first-place entry was more than an order of magnitude faster and had a higher score, but at least I learned something about software development.

The contest involved searching for words in a grid of letters, as in the game of Boggle. Once a word in the given dictionary file was found, information about it needed to be stored in a output file. This information included things like the word's first letter coordinate in the grid, the line number of the word in the file, the score, and the direction to read the letters. The direction was given as a series of numbers starting with 1 as North, 3 as East, 4 as SouthEast, etc. A word would still be valid even if it needed to be read right to left instead of left to right. The dictionary was limited to about 14,084 words that contained between 2 and 5 letters. This letter count limitation simplified things, as the game board itself was 5x5. So words would always line up in a horizontal, vertical, or diagonal manner. No bent lines needed to be figured. Scoring was kept relatively straightforward as a count of the number of letters in the dictionary word.

The letter Q presented a complication. In the Boggle game, the Q die always has a U. So even though a given series of letters might just be QA, In the contest, QUA needed to be logged since it was in the dictionary of valid words, and the U included in the score, for a total of 3. One more case had to be considered involving Q. If a U die followed the Q, then the pair was accepted as QU. In other words, a U was sometimes ignored.

Unfortunately for my entry's ranking, my code didn't report a case where a valid word had a Q that wasn't followed by a U; so it didn't get to tie with the higher ranked entries based on score. During development, I believed this wasn't a valid case, but after seeing the results, I believe it was, although I haven't gone through the output of the other entries to verify this. I posted a comment about this problem, but sometimes it is difficult to get feedback on every software requirement, and one has to make reasonable assumptions.

The input and output involved text files. No graphical programming was needed for this code. The input file consisted of 100 lines of 25 character strings that represented 100 grids of letters.

Optimization was very important because in the event that applications reported the same score for their grids, the faster solution was ranked higher. Multithreading was used by some entries since each of the 100 grids could be scored as a game independently of the other grids. In one case, the OpenMP api was used. The use of this api looked interesting because the syntax needed to parallelize loops was minimal. However, parallel programming can still be prone to bugs, and OpenMP apparently leaves things like race conditions up to the developer to design out. Also, I'm not sure if there is a profiling tool that would work properly when parallelism is used. The only thing I've found on Google is ompP. I haven't heard of it before, but then OpenMP is new to me as well. It is built into Visual C++ 2010 Professional, but isn't in the express edition. It needs compiler support to work. I may need to use gnu c++ to try it out sometime but am reluctant to leave C#.

I placed the code in a Github account for later. Perhaps I will be able to incorporate some of the ideas from the other entries into later revisions and see if the application runs faster.