Wednesday 13 June 2012

A Simple Solution; Part Two

A Short Solution, Part 2:

In the previous part, I laid out the initial draft of the solution to the problem described there, as well as, in the end, pointed out a few problems with said solution.
In addition, you might have noticed that the definition of generate-node was superfluous; in fact, I factored it out. The (cons chunk...) was added directly inside generate-tree.

Anyways. First we shall deal with dealing with duplicate entries in the chunk list and there not being a value in the root node.

(defn to-tree
  "Generates a solution tree for the given word and chunk list."
  [word chunks]
  (cons word (generate-tree word (keys (group-by identity chunks)))))
 
This function will be the one called in the end - it's not done yet, but both the mentioned problems are solved. The value of the root node is now the word being assembled, and
the chunk list is filtered of duplicate entries (I am not sure whether there is some other, more efficient method, this was what I thought of off the top of my head when writing).

Now, for the harder part. Filtering invalid entries out of the tree. Any branch that ends without an entry is invalid - but guessing which ones are invalid before everything is
generated would be impossible (I think?), or at least extremely hard to implement. So I went for the simpler, more functional approach: Filter out invalid entries after everything
is generated. Since the user will never see the unfiltered step inbetween, we can mark invalid branches in some way instead of writing a function that assembles a branch in order
to check it for validity.

So, first we need to modify our tree generation function:

(defn generate-tree
  [word chunks]
  (if-let [nexts (seq (possible-nexts word chunks))]
    (map #(cons % (generate-tree (drop (count %) word) chunks)) nexts)
    (when (seq word) '((:invalid)))))

(This also contains the correction with generate-node.)

Now, when there are no possible nexts, and there still remains some part of the word to be assembled (note again how (seq ...) is used as emptiness check), we add a node with the
value of :invalid. So now we have all invalid branches marked.

Then, the filtering...
First, we need a validity predicate, let's call it "valid?" (how imaginative).
First, let's consider which branches are technically invalid.
Given:

"valid?" ["va" "li" "id" "l" "d" "d?"]

We have:

va \ - l - id - :invalid
   \ - li \ - d - :invalid
            - d?

From the viewpoint of the "va" node, the "l" branch is invalid, and the "li" branch is valid. That is because all the children of the "l" branch are invalid, but one child of the
"li" branch is valid. Seeing that, we can conclude that nodes are invalid if:

1) It has at least one child and all children are invalid
  or
2) The own value is :invalid

Let's code it:

(defn valid?
  "Returns whether a node in the tree is valid."
  [[value & children]]
  (cond (= value :invalid) false
        (not children)     true
        :else (reduce #(or %1 %2) false (map valid? children))))

Fairly simple, is it not? The check is, obviously, recursive. I suppose that if we could traverse the tree from the lowermost nodes instead from the root, the algorithm would be faster
(we could simply hack off invalid branches without having to check the entire depth if a branch is invalid), but our tree nodes are not aware of their parents, nor do we have a list of
branch tips.

So, given this check, we can now filter.

(defn filter-valid
  "Filters invalid branches from the tree."
  [[value & children]]
  (cons value (map filter-valid (filter valid? children))))

Again pretty much one single line.
Now, add this to to-tree...

(defn to-tree
  "Generates a solution tree for the given word and chunk list."
  [word chunks]
  (filter-valid (cons word (generate-tree word (keys (group-by identity chunks))))))

DONE. Simply call to-tree with the word and the chunks and it spews out the final result. You can run it through clojure.pprint/pprint to make it readable, but that only aesthetics. ;P

Final program:

(defn possible-nexts
  "Given a word and a list of chunks, returns a list containing the chunks that are contained at the beginning of the word."
  [word chunks]
  (filter #(= (take (count %) word) (seq %)) chunks))
(defn generate-tree
  [word chunks]
  (if-let [nexts (seq (possible-nexts word chunks))]
    (map #(cons % (generate-tree (drop (count %) word) chunks)) nexts)
    (when (seq word) '((:invalid)))))

(defn valid?
  "Returns whether a node in the tree is valid."
  [[value & children]]
  (cond (= value :invalid) false
        (not children)     true
        :else (reduce #(or %1 %2) false (map valid? children))))

(defn filter-valid
  "Filters invalid branches from the tree."
  [[value & children]]
  (cons value (map filter-valid (filter valid? children))))

(defn to-tree
  "Generates a solution tree for the given word and chunk list."
  [word chunks]
  (filter-valid (cons word (generate-tree word (keys (group-by identity chunks))))))

So, that's it. FIVE function ranging from one to three lines of executed code each.

3 comments:

MageKing17 said...

Ported to Python3, for kicks: http://pastebin.com/RuwD0Qth

Actually, when I say "ported", I mean, "I wrote a Python script that appears to do the same thing as your Clojure code, but it's hard to be sure since I don't know Clojure," but whatever.

Narvius said...

:D
You made me realize two mistakes I made during the transcription of my code onto the blog, namely:
1) Appending '(:invalid) instead of '((invalid)). As a result, the program tried to destructure the keyword later, which throws an exception.
2) I had the pointless (map filter-valid? (map valid? ...)) instead of the correct (map filter-valid? (filter valid? ...)

MageKing17 said...

Aha! I thought I'd misunderstood what the Clojure code was trying to do; my initial attempt indeed ran into those two problems. :P