Friday 10 April 2015

Num literals in Haskell

This post is a Literate Haskell file. You can save it as a .lhs file and load it into GHCi.

In this post, I will screw around a bit with one particular fact of Haskell's syntax. So, let's first enable a language extension I will need (and explain) later, as well as knock out any relevant imports.

> {-# LANGUAGE ScopedTypeVariables #-}
>
> import Data.Function (on)

Now: when you enter a literal number, what would you expect its type to be?

> literal = 0

It sort of looks like an integer, but it could actually also be a float that coincidentally has nothing past the decimal point. So Haskell does the most reasonable thing possible -- 'literal :: Num a => a'. It doesn't decide what type of number it is yet. This opens up a couple of cool things.

> data Complex a = a :+ a
> type Complex' = Complex Double
>
> instance (Eq a, Floating a) => Num (Complex a) where
>   (r1 :+ i1) + (r2 :+ i2) = (r1 + r2)              :+ (i1 + i2)
>   (r1 :+ i1) * (r2 :+ i2) = (r1 * r2 - i1 * i2)    :+ (r1 * i2 + i1 * r2)
>   abs (r :+ i)            = (sqrt $ r * r + i * i) :+ 0
>   signum (0 :+ 0)         = 0 :+ 0
>   signum c@(r :+ i)       = let (mr :+ _) = abs c in (r / mr) :+ (i / mr)
>   negate (r :+ i)         = negate r :+ negate i
>   fromInteger             = (:+ 0) . fromInteger

This datatype of course corresponds to complex numbers, as defined in mathematics. Now, we can make literals be complex numbers without any further work!

> literal2 = 5
>
> op1 :: Complex'
> op1 = literal2 + (2.3 :+ (-5.8))

'literal2 :: Complex''. That's pretty straightforward. Now let's have a bit of fun with this.

> data Lolteger = Zero | One | Two | Three | Four | Five
>               | Six | Seven | Eight | Nine
>               deriving (Bounded, Enum, Eq, Ord, Read, Show)

We can define a Num instance for this pretty easily. For over/underflow on operations I'll just take the result modulo 10 (so Lolteger forms the ring Z_10, the integers modulo 10). As we'll have a lot of converting to and from Int via Enum, let's define helpers for this.

> apply1 :: (Int -> Int) -> Lolteger -> Lolteger
> apply1 f = toEnum . (`mod` limit) . f . fromEnum
>   where limit = 1 + fromEnum (maxBound :: Lolteger)
>
> apply2 :: (Int -> Int -> Int) -> Lolteger -> Lolteger -> Lolteger
> apply2 f a b = toEnum . (`mod` limit) $ f (fromEnum a) (fromEnum b)
>   where limit = 1 + fromEnum (maxBound :: Lolteger)

> instance Num Lolteger where
>   (+)         = apply2 (+)
>   (*)         = apply2 (*)
>   abs         = apply1 abs
>   signum      = apply1 signum
>   negate      = apply1 negate
>   fromInteger = toEnum . (`mod` limit) . fromInteger
>     where limit = 1 + fromEnum (maxBound :: Lolteger)

This is probably obvious by now, but it amuses me to no end:

> four :: Lolteger
> four = 4

=> Four

> addition = One + Five

=> Six

> youCanEvenMixIt = 3 `times` Two
>   where times = (*)

=> Six

In fact, anything that is Bounded and an Enum can be made to behave like a Num like this. So, let's make use of a common Haskell design pattern: The newtype with an instance.

> newtype NumWrapper a = NW { getNW :: a } deriving (Read, Show)

Now, we define an almost identical Num instance for this (the only difference is that a bit more wrapping and unwrapping is involved for the helpers).

> infixr 8 .:
> (.:) :: (c -> d) -> (a -> b -> c) -> a -> b -> d
> (.:) = (.) . (.)
>
> apply1' :: forall a. (Bounded a, Enum a) =>
>            (Int -> Int) -> NumWrapper a -> NumWrapper a
> apply1' f = NW . toEnum . (`mod` limit) . f . fromEnum . getNW
>   where limit = 1 + fromEnum (maxBound :: a)
>
> apply2' :: forall a. (Bounded a, Enum a) =>
>            (Int -> Int -> Int) -> NumWrapper a -> NumWrapper a -> NumWrapper a
> apply2' f = NW . toEnum . (`mod` limit) .: f `on` fromEnum . getNW
>   where limit = 1 + fromEnum (maxBound :: a)

There are two things here that might be hard to understand: the foralls in the type declarations, and (.:). So let's do them in order.

1) forall. Recall that at the beginning of this post, I enabled ScopedTypeVariables. That is part of a family of extensions that allow you to write foralls, and each of them does something slightly different. STV, which I used here, is the simplest of them. Notice that in the definition of the helper limit, I use a type signature on maxBound. Usually, maxBound :: a would mean that this maxBound can work for any 'a' -- in fact, Haskell implicitly adds foralls to all type signatures by default. What I wanted to achieve, however, is that this 'a' is the same as the one in the type signature of the function -- after all, I'm looking for the maxBound of the Enum we're using! So now only the function is "for all 'a's", and the maxBound is for the specific 'a' we're dealing with. And that, in short, is what ScopedTypeVariables does.
2) (.:). This is actually pretty simple. 'f . g' composes two functions with one argument each. For 'f .: g', 'g' has two arguments, and 'f' one -- that is, '(f .: g) x y = f (g x y)'. How and why that works took me a while to figure out, but it's actually pretty simple (hint: start with 'fmap . fmap', then apply the fact that for functions, 'fmap = (.)').

> instance (Bounded a, Enum a) => Num (NumWrapper a) where
>   (+)         = apply2' (+)
>   (*)         = apply2' (*)
>   abs         = apply1' abs
>   signum      = apply1' signum
>   negate      = apply1' negate
>   fromInteger = NW . toEnum . (`mod` limit) . fromInteger
>     where limit = 1 + fromEnum (maxBound :: a)

Anyways, that's the pretty-much-identical Num instance for NumWrapper. Now any Bounded Enum can be treated as a Num.

> magic :: Bool
> magic = getNW literal

'literal' was defined at the very beginning of the file, as 'literal = 0'. Thanks to this use, 'literal :: NumWrapper Bool'. Then, we remove the NumWrapper, and get False. In fact, amusingly enough, for all that work, all we've really created is 'getNW' -- an alias for 'toEnum' for literals.

This is all pretty trivial, I think, but working through it even once still can be of use, so there.

-- N

Sunday 10 February 2013

How I screwed myself over with side-effects

RETCON TIME. Screw part 2 of the previous article, it wouldn't introduce anything useful - all the insights were presented in part one. And just copying some code and annotating it with obviousness misses the point a bit. And I forgot what that other thing I wanted to write was. After all, two months or so have passed (yay me for my update schedule, by the way).

Aaaaaaaaanyways.

Currently I'm into Common Lisp (you might have noticed by now that I'm kinda sold on that whole "crapton of parentheses" concept), which differs from Clojure in that it is not strictly functional. While true, that's a bit of an understatement - there are other significant differences as well - but the relevant one here is that Common Lisp doesn't have the highly derisive attitude towards side effects (i.e. actually changing something) that Clojure has.

This causes the Common Lisp standard to frequently define two similar functions, where one of them is destructive but more efficient. What that means is best shown by example (nreverse is the destructive variant, reverse the "pure", or "functional", one):

(defvar *list* '(1 2 3 4 5))
(reverse *list*)
=> '(5 4 3 2 1)
*list*
=> '(1 2 3 4 5)
(nreverse *list*)
=> '(5 4 3 2 1)
*list*
=> '(1)

The last result is implementation-dependent. Basically, functions that the standard acknowledges to be destructive are allowed to do anything with their input, including taking it apart with a chainsaw and/or sending it to the moon. The implementation I'm using (Clozure CL 1.8) has a very understandable side effect: Lisp lists are pretty much linked lists, so when reversing in place, the previously first cell '(1) ends up as the last one and points to nothing. Though technically, the result of evaluating *list* after calling (nreverse *list*) might as well be '(its a trap!). Of course I'm exaggerating - that would probably not be allowed, due to unnecessarily creating new cells and/or symbols; at least I think it would be sensible for the standard to forbid it. The obvious reason for this split is efficiency - why create a full copy, when you can do the reversing (or anything else) in place?

(defmacro internal-step~ (sym &rest syms)
  (if (null syms)
    sym
    `(,sym (internal-step~ ,@syms))))

(defmacro ~ (&rest syms)
  `(internal-step~ ,@(nreverse syms)))

(defmacro ? (&rest form)
  (let ((sym (gensym)))
    `(let ((,sym ,form))
       (format t "? :: ~a => ~a~%"
               (mapcar #'(lambda (x) (if (listp x) "..." x)) ',form)
               ,sym)
       ,sym)))


Now, those three macros. The first two form something roughly equivalent to Clojure's -> macro. ? is my favourite debug macro, which outputs the form and its result in some way I want it to, and then returns the result.

As you see, ~ uses nreverse to reverse the input list. In my code, I had this little bit:

(when (< 0 (~ s get-levels length))
  ...)

For some reason it told me that "# is not of expected type REAL." This meant it was using only the value of "s" (which indeed was a CLOS object), discarding the two additional calls. When I manually expanded (~ s get-levels length) in the REPL, however, the correct (length (get-levels s)) was generated.

Did I mention that the effect of destructively operating on literals is not defined by the standard? Good. Because if you've been following along, this is exactly what I did here - I took the data literal '(s get-levels length) and put it through nreverse. Which in my implementation apparently turned the literal into '(s) (due to the aforementioned sensible destructive reversing process). Which obviously evaluates to a CLOS object, not a number.

And all because I was used to nreverse (its destructiveness usually not being a problem) and didn't even consider what damage the side effect could cause. And so I wasted almost half an hour on removing one single letter (changing the call from nreverse to reverse fixed it, of course).

But hey, blog post idea for free!

Tuesday 11 December 2012

Messing with probabilities (part 1: The math)

[Edit: This is no longer "part 1", but the whole article. See "How I screwed myself over with side effects".]

In the mini-project I was thinking of recently, there are a lot of coin flips. They usually occur in groups of one to eight, and I only want to know the total number of heads. Now, I could simply flip the digital coin that many times and be done with it, or I could be a bored college student who actually should be doing other things right now, and think of something more sophisticated.

For something so trivial, there must be a way to reduce the several random flips to ONE random number generation. It's just a matter of dividing the [0, 1]-space into the right intervals, to maintain the same probability for every outcome.

Having cursory knowledge of statistics, my thoughts immediately wandered to something called the binomial distribution. In short, a binomial distribution with parameters n and p - denoted B(n, p) - describes the chance that x out of n trials, each with probability p of succeeding, will succeed. For example, the probability distribution function f(x) (or, in this case, probability mass function, since it's discrete) for B(5, 60%) tells us that if we try something that has a 60% chance to succeed five times, our chance to succeed exactly twice is around 23%.

It's not hard to notice that getting heads on a coin flip is a random trial with a 50% chance to succeed. Thus, B(n, 50%), where n is the number of coins flipped, perfectly describes what I'm trying to achieve!

Calculating the PDF for B(n, p) is a fairly trivial matter - it's just translating the formula f(x) = C(n, x) * p^x * (1 - p)^(n - x), where C(n, x) = n! / ((n - x)! * x!) is the binomial coefficient. Here's the Clojure:

(defn binomial [n k]
  (letfn [(fact [n] (reduce * 1 (range 2 (inc n))))]
    (/ (fact n) (fact (- n k)) (fact k))))

(defn binom-pdf [n p]
  (fn [x]
    (if (<= 0 x n)
      (* (Math/pow (- 1 p) (- n x)) (Math/pow p x) (binomial n x))
      0)))
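
A quick sanity check against the "succeed twice" value from before:

((binom-pdf 5 0.6) 2)
=> 0.2304 (roughly - floating point)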

So, now we have the PDF for B(n, p). There's another, related function, called the cumulative distribution function. It's a function F(x) that describes the chance that the result will be at most x (less than or equal to it). Seems like a weird thing to devote an entire function to, but it's surprisingly useful in practice.

The CDF is simply the sum of all the PDF entries for values less than or equal to x. For example, for B(5, 60%), CDF(3) = PDF(0) + PDF(1) + PDF(2) + PDF(3). This is why the CDF is always rising, and why its final value is exactly equal to 1 (= 100%).

Using the cumulative distribution function, we can now map one single random number to a final result. Just for reference, the values of this CDF are roughly:
CDF(0) = 0.01
CDF(1) = 0.087
CDF(2) = 0.317
CDF(3) = 0.663
CDF(4) = 0.922
CDF(5) = 1
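
These values can be computed straight from binom-pdf - a small sketch (mine, not from the original code):

(defn binom-cdf [n p]
  (let [pdf (binom-pdf n p)]
    (fn [x] (reduce + (map pdf (range (inc x)))))))

((binom-cdf 5 0.6) 2)
=> 0.31744 (roughly)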

Thus, we have a set of intervals, each of which corresponds to a result:

[0, 0.01) => 0
[0.01, 0.087) => 1
[0.087, 0.317) => 2
[0.317, 0.663) => 3
[0.663, 0.922) => 4
[0.922, 1] => 5

Which solves the original problem: We divided [0, 1] into intervals that maintain the same weight for every possible outcome. Now, we can just generate one random number, pick the interval it belongs to, and we have our final result.
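
In code, that final step might look roughly like this - a sketch using the binom-cdf helper from above (the follow-up post that was supposed to show this got retconned away, see above):

(defn sample-binom [n p]
  (let [cdf (binom-cdf n p)
        r   (rand)]
    (first (filter #(< r (cdf %)) (range (inc n))))))

(sample-binom 5 0.5)
=> 2 (for example - it's random, after all)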

And how to more-or-less elegantly do that in code, I'll show in the next post (actually, the post AFTER the next one - there's another thing I want to interject first).

Monday 6 August 2012

Macros I've been using

Here are a few macros I've been using.

The epic trio of debug macros:

1. The regular, simple debug:

(defmacro ?? [& form]
  `(let [result# ~form]
     (println "DEBUG:" '~form "=>" result#)
     result#))

This really is the simplest version there is. If you want to know what a particular expression returns, but for some reason cannot simply extract it, add ?? as the first element of that expression.

(+ 2 3 (* 4 5) (?? - 5 3))
DEBUG: (- 5 3) => 2
=> 27

2. The regular, abridged debug:

(defmacro ? [& form]
  `(let [result# ~form]
     (println "DEBUG:" '~(map #(if (coll? %)
                                 "..."
                                 (str %)) form) "=>" result#)
     result#))

This one will replace sub-forms in the form with "...". This is useful if you don't want your output too cluttered:

(? + 2 3 (* 4 5) (- 5 3))
DEBUG: (+ 2 3 ... ...) => 27
=> 27

3. The deep inspect:

(defmacro ??? [& form]
  `(? ~@(map #(if (coll? %)
                `(??? ~@%)
                %) form)))

This one places an abridged debug "?" in every form in the tree in which it is placed. For example:

(??? + 2 3 (* 4 5) (- 5 3))
  is equivalent to
(? + 2 3 (? * 4 5) (? - 5 3))

Thus giving:

(??? + 2 3 (* 4 5) (- 5 3))
DEBUG: (* 4 5) => 20
DEBUG: (- 5 3) => 2
DEBUG: (+ 2 3 ... ...) => 27
=> 27

So, recapping: add one to three question marks - the more you put, the more info you get.

It would be, of course, trivial to implement ????, which would be equivalent to a tree full of ??'s, but what's the point? Since ??? is intended to be used on trees, not shallow expressions, the verbosity of ?? on every level would be thorough overkill.

Multi-set!:


(defmacro multi-set! [target & pairs]
  (let [x (gensym)]
    `(let [~x ~target]
       ~@(for [[field value] pairs]
           `(set! (. ~x ~field) ~value))
       ~x)))

Due to working with java.awt.GridBagConstraints a lot recently, I got annoyed with how often I have to rewrite the line (set! (. c [some field]) [some value]). So, being lazy as hell, I'd much rather write

(multi-set! [instance-expr]
  ([field] [value])
  ([field] [value])
  ([field] [value])
  ...)
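
Concretely, with GridBagConstraints that looks something like this (the field names are real GridBagConstraints fields; the values are made up):

(def c (java.awt.GridBagConstraints.))

(multi-set! c
  (gridx 0)
  (gridy 1)
  (weightx 1.0)
  (fill java.awt.GridBagConstraints/HORIZONTAL))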

In fact, I wrote a macro called "constraints" for working with GridBagConstraints, but it's a tad too long to share here. Also, it's boring.

This macro actually uses a fun trick, or rather, sidesteps a subtle problem with the auto-gensym feature (inside a syntax quote, symbols with # appended are replaced with a unique name, in order to avoid name clashes).

If you were to write...

(defmacro multi-set-wrong! [target & pairs]
  `(let [x# ~target]
     ~@(for [[field value] pairs]
         `(set! (. x# ~field) ~value))
     x#))

In this case, the auto-gensymmed symbols x# in the let-form and inside the set!-forms are DIFFERENT symbols, because they are within DIFFERENT syntax quotes. So, this version doesn't work as intended. In fact, the generated (set! ...) expressions will throw an exception, because the symbol x# in them cannot be resolved.
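
You can see this directly in the REPL (the exact numbers will differ):

`[x# x#]
=> [x__1__auto__ x__1__auto__]

[`x# `x#]
=> [x__2__auto__ x__3__auto__]

Same syntax quote - same symbol; separate syntax quotes - separate symbols.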

Thus, in the actually working version, we create a symbol by hand, and use that symbol throughout the macro. This is a trick I learned from reading the source of the doto macro (I had a hunch I'd find something useful there, because the macro works similarly to mine - I was right!).

Convenient Map Writing Macros:

More than once, I've written macros that simply implement a more convenient syntax for maps. For example, since I'm learning Japanese right now, I wrote a program to quiz me on Hiragana. For that, I had to store all the corresponding Unicode characters in the program, and map them to their Romaji (Latin) transcriptions.

(defmacro defseries [group & chars]
  `(def ~group
     ~(into [] (map (fn [[name code]]
                      (let [code (+ code 0x3000)]
                         {:name name, :code code}))
                     chars))))

While this wasn't the perfect approach in this case (here, I still had to type char => pairs, just with smaller values and less overhead), it still vastly reduced the typing I had to do. I do things like that quite frequently.

(defseries a ("a" 0x42) ("i" 0x44) ("u" 0x46) ("e" 0x48) ("o" 0x4A))

I typed out roughly 15-16 series like that. So it did save me some typing, with just a tad of thinking.

That's it for now.

N.

Tuesday 17 July 2012

Dynamic Bindings

Time for another post... Nothing fancy though, just something simple I wanted to share. Also, yes. Clojure.

Recently, I've been doing something that involved some statistics (...I've been trying various formulae for calculating combat damage. Eh.), as well as some randomness (dice rolls, I admit it. -.-").

More clearly: The formulae include randomness as part of their calculations. I, when making tables and lists of results, do not want any randomness. Instead of modifying the function definition, or adding some hacky additional parameter, the entire thing can be solved in a rather elegant manner via dynamic bindings.

Dynamic bindings are a mechanism that allows you to... aw, shucks. I can't express it in one sentence, at least not clearly. Consider this example instead:

(def foo 2)

(declare baz) ;; forward declaration, so bar below compiles

(defn bar []
  (let [foo 4]
    (baz)))

(defn baz []
  (println foo))

Rather obviously, the output (of calling (bar)) is 2. After all, the rebinding of "foo" is confined to the lexical context of the "let" within bar. Now consider this:

(def ^:dynamic *foo* 2)

(declare baz)

(defn bar []
  (binding [*foo* 4]
    (baz)))

(defn baz []
  (println *foo*))

This will print 4. At first I thought this kinda goes against the idea of pureness, otherwise so prevalent in Clojure. But as with everything, the only important thing is how you use it. As I've said, there are things that can be solved in a more elegant way thanks to this tool.

Consider something as mundane as I/O redirection. Clojure makes, of course, use of streams, and all I/O operations use, by default, the standard input and output streams. Now, sometimes we want to use different streams... a naive way would then be to simply add optional arguments to the I/O functions and be done with it. Except... it's ugly. Really. Wouldn't it be more awesome if we could just say "Those operations? Do them on this stream instead, kthx."? With dynamic bindings, we can do exactly that. clojure.core defines the dynamically bindable var *out*, the value of which is the currently used output stream. So, we might do this...

(binding [*out* (some-file-or-whatever)]
  (println ":D:D:D"))

In the scope of this binding, all I/O operations that are aware of *out* will act on the new stream instead. Pretty cool, eh?

Anyways, dice, randomness. You might already see what I did there. And either way, I don't think it requires much more explanation...

(def ^:dynamic *dice-mode* :random)

(defn roll-dice [[amount sides modifier]]
  (case *dice-mode*
    :random  (reduce + (or modifier 0) (for [_ (range amount)] (inc (rand-int sides))))
    ;; one roll of 1..sides averages (sides + 1) / 2
    :average (int (Math/round (+ (* amount (/ (inc sides) 2.0)) (or modifier 0))))
    :zero    0))

Yes. In the part where I do the statistics, I can now either take the average of dice rolls, or simply completely omit them, via a simple binding.
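
For example, generating a damage-table entry (made-up numbers - a 2d6+3 roll):

(binding [*dice-mode* :average]
  (roll-dice [2 6 3]))
=> 10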

You might be wondering about why I'm using asterisks. It's not per se required to use them; it's just that you're supposed to name dynamic vars this way. Clojure will give you a stern talking-to if you make a var dynamically bindable and do NOT surround its name with asterisks (or when you use such a name for a regular var). I guess the reason is readability? I'm not sure though. I don't mind either way, so I'm not likely to look it up. You can, though.

N.

Thursday 14 June 2012

VK_VERYLONGKEYNAME (on reflection in Clojure)

So, I'm still writing a game. As it happens with games, I rely on keypresses for the user to tell me what he wants to do. That made me fall back on Java's KeyListener/KeyAdapter facilities.

The basic idea is that I get callbacks with information about the pressed key as argument (WARNING: GROSS OVERSIMPLIFICATION.), among other things the keycode, which is an integer. Now, I'm supposed to dispatch on it using the VK_ constants defined in KeyEvent.

Problem?

It's a lot of typing in Clojure. Especially since each constant is called VK_(something). I had large maps looking like that:

{ KeyEvent/VK_UP [some function]
  KeyEvent/VK_DOWN [some function]
  KeyEvent/VK_RIGHT [some function]
... }

You get the idea. Since I have several keymaps, that gets annoying quickly. So I started to define aliases for some frequently used ones in a separate "keys" namespace. Now I only had to write, for example, "keys/up" instead of "KeyEvent/VK_UP". That's a lot of saved space (and some gained readability).
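
The namespace itself was nothing fancy - a sketch of the by-hand version (the alias names here are my reconstruction):

(ns game.keys
  (:import (java.awt.event KeyEvent)))

(def up    KeyEvent/VK_UP)
(def down  KeyEvent/VK_DOWN)
(def left  KeyEvent/VK_LEFT)
(def right KeyEvent/VK_RIGHT)
;; ...and so on, by hand, for every key needed.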

But that introduced something else that annoyed me: Some of the more rare keys were still written out fully, and it created an aesthetic discrepancy, which upset my minor OCD. Besides, having to redefine tons of variables by hand isn't particularly elegant in itself.

(Disclaimer: I am aware of clojure.contrib/import-static. But I don't rely on that library enough to warrant including it in my project just for this.)

Sure enough, there is a better way.
The first nugget of information I needed was... how to get a list of all fields in a Java class?

Of course, Stack Overflow helped me there. It's

(clojure.reflect/reflect java.awt.event.KeyEvent)

It spews out a LOT of things like this:

{:bases #{java.awt.event.InputEvent},
 :flags #{:public},
 :members
 #{{:name VK_F17,
    :type int,
    :declaring-class java.awt.event.KeyEvent,
    :flags #{:static :public :final}}
   {:name VK_DEAD_DOUBLEACUTE,
    :type int,
    :declaring-class java.awt.event.KeyEvent,
    :flags #{:static :public :final}}
...}

This is the abridged pretty-print output of the stuff. We know all the VK_ fields are keycodes. Everything else basically falls into place by itself. Most of what I need I covered in a previous post already.

We want to:
1. From the list of members, filter out everything that is not a keycode.
2. Define the aliases over it.

Simple enough.

(->> (clojure.reflect/reflect java.awt.event.KeyEvent)
     (:members) (map :name) (filter #(.startsWith (str %) "VK_")))

This yields a list of symbols (which I am grateful for, as it simplifies the next step) naming all the fields that contain keycodes. Now, as in my earlier post, we doseq over the list and define the aliases via eval in the loop. I also wanted the VK_ dropped and the entire name downcased, but that's just my whim.

(doseq [sym (->> (clojure.reflect/reflect java.awt.event.KeyEvent)
                 (:members) (map :name) (filter #(.startsWith (str %) "VK_")))]
  (eval `(def ~(symbol (.toLowerCase (subs (str sym) 3)))
           (. java.awt.event.KeyEvent ~sym))))

That's it. Now, there are vars like "a", "up" and so on that hold the corresponding keycodes. I put the entire thing into a new namespace (game.keys) that I require like that:

(ns game.some-module
  (:require [game.keys :as keys]))

Another nonproblem solved! :D

Wednesday 13 June 2012

A Simple Solution; Part Two


In the previous part, I laid out the initial draft of the solution to the problem described there, as well as, in the end, pointed out a few problems with said solution. In addition, you might have noticed that the definition of generate-node was superfluous; in fact, I factored it out. The (cons chunk ...) was added directly inside generate-tree.

Anyways. First we shall deal with duplicate entries in the chunk list, and with there not being a value in the root node.

(defn to-tree
  "Generates a solution tree for the given word and chunk list."
  [word chunks]
  (cons word (generate-tree word (keys (group-by identity chunks)))))
 
This function will be the one called in the end - it's not done yet, but both the mentioned problems are solved. The value of the root node is now the word being assembled, and the chunk list is filtered of duplicate entries (I am not sure whether there is some other, more efficient method; this was what I thought of off the top of my head when writing).

Now, for the harder part: filtering invalid entries out of the tree. Any branch that ends without an entry is invalid - but guessing which ones are invalid before everything is generated would be impossible (I think?), or at least extremely hard to implement. So I went for the simpler, more functional approach: filter out invalid entries after everything is generated. Since the user will never see the unfiltered step in between, we can mark invalid branches in some way instead of writing a function that assembles a branch in order to check it for validity.

So, first we need to modify our tree generation function:

(defn generate-tree
  [word chunks]
  (if-let [nexts (seq (possible-nexts word chunks))]
    (map #(cons % (generate-tree (drop (count %) word) chunks)) nexts)
    (when (seq word) '((:invalid)))))

(This also contains the correction with generate-node.)

Now, when there are no possible nexts and there still remains some part of the word to be assembled (note again how (seq ...) is used as an emptiness check), we add a node with the value :invalid. So now we have all invalid branches marked.

Then, the filtering... First, we need a validity predicate; let's call it "valid?" (how imaginative). Let's consider which branches are technically invalid. Given:
Given:

"valid?" ["va" "li" "id" "l" "d" "d?"]

We have:

va - l  - id - :invalid
   - li - d  - :invalid
        - d?
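
As actual data, the marked tree for this example (using the generate-tree from above) comes out as:

(cons "valid?" (generate-tree "valid?" '("va" "li" "id" "l" "d" "d?")))
=> ("valid?" ("va" ("li" ("d" (:invalid)) ("d?")) ("l" ("id" (:invalid)))))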

From the viewpoint of the "va" node, the "l" branch is invalid, and the "li" branch is valid. That is because all the children of the "l" branch are invalid, while one child of the "li" branch is valid. Seeing that, we can conclude that a node is invalid if:

1) it has at least one child, and all of its children are invalid,
   or
2) its own value is :invalid

Let's code it:

(defn valid?
  "Returns whether a node in the tree is valid."
  [[value & children]]
  (cond (= value :invalid) false
        (not children)     true
        :else (reduce #(or %1 %2) false (map valid? children))))

Fairly simple, is it not? The check is, obviously, recursive. I suppose that if we could traverse the tree from the lowermost nodes instead of from the root, the algorithm would be faster (we could simply hack off invalid branches without having to check the entire depth when a branch is invalid), but our tree nodes are not aware of their parents, nor do we have a list of branch tips.

So, given this check, we can now filter.

(defn filter-valid
  "Filters invalid branches from the tree."
  [[value & children]]
  (cons value (map filter-valid (filter valid? children))))

Again, pretty much a single line.
Now, add this to to-tree...

(defn to-tree
  "Generates a solution tree for the given word and chunk list."
  [word chunks]
  (filter-valid (cons word (generate-tree word (keys (group-by identity chunks))))))

DONE. Simply call to-tree with the word and the chunks and it spews out the final result. You can run it through clojure.pprint/pprint to make it readable, but that's only aesthetics. ;P
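
For example, with the word and chunk list from the diagram earlier:

(to-tree "valid?" ["va" "li" "id" "l" "d" "d?"])
=> ("valid?" ("va" ("li" ("d?"))))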

Final program:

(defn possible-nexts
  "Given a word and a list of chunks, returns a list containing the chunks that are contained at the beginning of the word."
  [word chunks]
  (filter #(= (take (count %) word) (seq %)) chunks))

(defn generate-tree
  [word chunks]
  (if-let [nexts (seq (possible-nexts word chunks))]
    (map #(cons % (generate-tree (drop (count %) word) chunks)) nexts)
    (when (seq word) '((:invalid)))))

(defn valid?
  "Returns whether a node in the tree is valid."
  [[value & children]]
  (cond (= value :invalid) false
        (not children)     true
        :else (reduce #(or %1 %2) false (map valid? children))))

(defn filter-valid
  "Filters invalid branches from the tree."
  [[value & children]]
  (cons value (map filter-valid (filter valid? children))))

(defn to-tree
  "Generates a solution tree for the given word and chunk list."
  [word chunks]
  (filter-valid (cons word (generate-tree word (keys (group-by identity chunks))))))

So, that's it. FIVE functions, ranging from one to three lines of executed code each.