Saturday, April 13, 2013

Code Jam 2013

Having recently finished my hobby project (more on that later ... probably), I was looking for something else to start working on. Coincidentally, the Code Jam was starting, so my choice was basically made for me. I had a go at the competition back in 2010, mainly to keep my knowledge of Python up to date, but for the next 2 years, it was scheduled in times that meant I couldn't dedicate enough time for it, but I had no such excuses this year!

Given my recent work in TypeScript and Node.js, I figured that would be an ideal choice as a platform, but I left it a bit late and didn't even start on the template code that I could use for all the problems until a few hours after the competition started. In the end, I managed to finish full solutions for 2 of the questions, a partial solution for 1, and an important learning experience for the last question. I'll quickly go through my experiences and some things I learned.

First - the template. In case anyone is unfamiliar with the competition, Code Jam is a set of coding questions that are presented as a set of textual inputs, resulting in a set of definite results. You are given an explanation of the problem, a small sample set of inputs, and the expected outputs for those inputs. After you've confirmed that your solution works for that sample set, you can download a small set of other inputs, and upload your solutions for them, and after that you get download a larger set of inputs and upload your solution for those. The input files are usually just a set of numbers or characters, one set for each test case, and the output format is standard, so it is very helpful to have some "framework" code put together to handle the input and output of the problems, so that you can quickly jump in and write the code specific to the problem without needing to reinvent the wheel every time. My template code is fairly boilerplate, so I'll mainly leave it to the code to tell the story, but I will mention one unexpected benefit of the choice to use Node.js - asynchronicity. When I was using Python, I was running tests one at a time, but using the lovely async package I was able to very easily tell node to run as many tests as possible at the same time. The basic idea of my template is that I read in the input file, convert it to an array of test cases, then instruct async to "map" that array through a solver function to a results array, and output those results when all the processing has been finished. I'm not really even sure if this ended up being necessary, but it still made me fall in love node just a bit more.

The first problem - tic-tac-toe-tomek - is understandably fairly easy to solve. The idea is that we have to have a look at the state of a game similar to tic-tac-toe, and say if someone won, if it was a draw, or if the game was not yet finished. The differences between the game and tic-tac-toe are that it's played on a 4x4 board, before players start placing Xs and Os a single T can be randomly placed on the board, and to win, a row or column or diagonal must contain all Xs and Ts, or all Os and Ts. Given that the board never got larger than 4x4, I chose to do this in a brute force-ish way and just iterate through all the rows, columns, and diagonals for each test case and check for the existence of dots, Xs, and Os (ignoring Ts completely). A dot meant that there was no winner in that row/column/diagonal, otherwise no Xs meant that O won and no Os meant that X won. My solution only took a few seconds to churn through the large dataset, so I was off to a good start.

The second problem - lawnmower - was a bit trickier, I was given a set of heights of square sections of grass, and needed to figure out if those heights could be created by the usage of a lawnmower with an adjustable trim length that must move across or up and down the whole yard. Thinking about this for a little while, and given that the set of heights was constrained to a maximum of 100 different lengths, I figured that this could be efficiently calculated by working through the heights one at a time from shortest to tallest. If we find any rows or columns that are completely made of up height 1, we can mark that row or column as "safe". Moving up the lengths, we can treat a row or column as safe if it either contains that height, or the exceptional row or column is also "safe". Figuring out the data model for this was another interesting challenge, but I think that the code explains that better than words could. I'm not exactly happy with all of the code, but it was getting late and it solved the large dataset in milliseconds so I called it a day.

The third problem - fair-and-square - was one that I didn't get a chance to complete fully. The problem involved finding all numbers between two limits that satisfied some constraints. The number must be a palindrome (same forwards and backwards), and it must be the square of another number that is a palindrome. I decided to take the easy way out for this one and I picked a solution that I knew would work for the first large dataset (where the largest possible limit was 10^14) but would definitely not work for the second large dataset (largest limit of 10^100). The easy way out was to just pre-calculate all possible numbers that satisfied the constraints, then iterate through that list for each test case, considering only the numbers that are between each test cases's limits. Since the number must be a square, we only need to go through each number up to the square root of the largest limit, which in the case of 10^14 is "only" 10^7, so as long as we can check 10 million numbers fairly efficiently, we are good to go. In the end this only took about a second and I'll leave it to the code to explain the details. When I had a look at the set of valid fair-and-square numbers, there was a pretty obvious pattern that I'm sure I could have extrapolated to the larger case if I tried hard enough, but I never quite got around to it.

The fourth and final problem - treasure - taught me a lesson that I had strangely enough been reminded of a few days ago - "your strongest intuitions are the ones that you should question first". The problem was that, given some keys of various types, and a set of treasure chests (along with information about the type of key that is needed to open the chest, and the keys that are inside the chest), we needed to work out if it was possible to open all the chests. I realized that this could be modeled as a graph traversal problem, with the nodes being the set of keys in hand and chests left unopened, and the edges being the act of using a key to open a chest. Since I was having a love affair with node's asynchronicity, my intuition told me that I could us an inefficient solution, as long as I paralellized the investigation of each set of edges. This, of course, turned out to be completely incorrect, but I only realized this after spending way too long trying to get my heard around how to use the async library, which often happens whenever I do anything tricky and try to do it aysnchronously. When I realized that my solution wasn't going to work, and I only had about an hour before the competition was over, I decided to throw in the towel. I still thought it was a pretty fun problem to work on, and I look forward to being able to see the solution and whatever the trick was to minimizing the number of graph traversals that needed to be made. I'm guessing there is some sort of smart look-ahead shortcut that can be used but it's still a bit of a mystery to me.

And that's it - I ended up placing at about 3500 out of 20,000 entrants and qualifying for the next round. Not a great result, but not too bad considering the fact that I went into it "blind", without a framework set up and without knowing too much about the libraries I would be using.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.