Wednesday 25 April 2012

A short solution.

My friend and roommate (and possibly future brother-in-law, heh.), who studies IT, was presented with the following problem/homework:

Given a string and a list of other strings, build a tree of all possible ways of assembling the given string from the strings in the list.

For example, given "tree" and ["tr" "e" "t" "re"], possible branches are ("tr" "e" "e"), ("t" "re" "e"), and so on. Each item from the list can be reused, so "tree" and ["tr" "e"] produces one valid result: ("tr" "e" "e"). A very simple final solution looks like this: (("tr" ("e" ("e")) ("ee"))) (for "tree" and ["tr" "ee" "e"]).

He had to do it in Java. Since I thought the problem was interesting, I decided to implement  a solution in Clojure.

Working in a language that builds trees out of nested lists, my choice of datatype was obvious: Nested lists (you probably didn't see that one coming). The first element of each list is the value of the node, whereas the rest of the elements are its children.

So, I though about what I needed.
First of all, a function that filtered the list of strings (lets call them "chunks" here), leaving only the chunks that are included at the start of the word. For example, given the same parameters as above, the function would return ("tr" "t"). Fairly simple to write:

(defn possible-nexts ;; Yes, slightly awkward name.
  "Given a word and a list of chunks, returns a list containing the chunks that are contained at the beginning of the word."
  [word chunks]
  (filter #(= (take (count %) word) (seq %)) chunks))

It's a fairly simple function, and I believe I don't need to explain how it works (and actually, it's largely irrelevant, as long as it produces the output we need).

Now, since we want to generate a tree, the generating function will obviously be recursive. The general idea for the algorithm is this:

1) Given a word and chunk list, obtain a list of possible nexts.
2) For each possible next chunk, recursively call this with the chunk cut off from the beginning of the word (so with "tree" and "tr" the call would get "ee") and the whole chunks list
3) If there are no possible nexts, terminate.

Simple enough.

(defn generate-tree
  "Generates a tree of all possible ways \"word\" can be assembled from the \"chunks\"."
  [word chunks]
  (when-let [nexts (seq (possible-nexts word chunks))]
    (map #(generate-tree (drop (count %) word) chunks) nexts))))

This was my original version. Again, a very simple function (though I managed to screw up, see below). when-let is fairly simple and does what it says on the tin, whereas seq coerces a collection into a seq, or nil if the collection is empty. This is great for at least two reasons:
1) You get uniform behaviour for all collection types.
2) Free emptiness-check!
Doing emptiness checks via seq is apparently idiomatic in Clojure. I found it rather confusing in the beginning, but now I like it.

Anyways, I have to admit I failed to store the value in each node. But that was quickly rectified with some minor changes...

(defn possible-nexts ;; Yes, slightly awkward name.
  "Given a word and a list of chunks, returns a list containing the chunks that are contained at the beginning of the word."
  [word chunks]
  (filter #(= (take (count %) word) (seq %)) chunks))

(declare generate-node)
(defn generate-tree
  "Generates a tree of all possible ways \"word\" can be assembled from the \"chunks\"."
  [word chunks]
  (when-let [nexts (seq (possible-nexts word chunks))]
    (map #(generate-node (drop (count %) word) chunks %) nexts))))

(defn generate-node
  "Generates a tree node."
  [word chunks chunk]
  (cons chunk (generate-tree word chunks)))

That's pretty much it. Less than ten lines total. Or so I (actually, we) thought. There turned out to be a few actual problems, and one aesthetic one.
For one, I didn't have a root node, per se, just a list of subtrees (one for each possible first chunk). I decided the root node shall hold the original word that is being assembled. This is easily solved; call (generate-node word chunks word) instead of (generate-tree word chunks). In light of the other problems, though, I decided to write one last function:

(defn to-tree
  "Nothing much, yet. But this shall be the entry point for the \"program\" shortly."
  [word chunks]
  (generate-node word chunks word))

The other problems are more serious: With certain combinations of chunks, there would be incomplete branches. For example, given "lol" and ["lo" "l"], one branch would be simply... ("l"). We don't want this garbage.
Other than that, if there were duplicate chunks, parts of the tree would be duplicated as well. So, we might want to filter the chunks list.

But I will solve both of those in the next posts, this is getting a bit long. You might have noticed that the "program" mostly consists of one-liners; my original, slightly less verbose (no function descriptions) was only 9 lines (2 whitespace, 3 function "header" and 4 lines of function body). The next post won't add more than ten lines (not counting function descriptions, again), either.

It also performs rather well, it was clearly faster than my friends Java version (though I suspect an adequately experienced Java programmer may write one faster than my nineliner. As might an adequately experienced Clojure programmer, I guess...).

Functional languages amaze me over and over again with how little I need to do to achieve what I want.

No comments: