Wednesday, 25 April 2012

A short solution.

My friend and roommate (and possibly future brother-in-law, heh.), who studies IT, was presented with the following problem/homework:

Given a string and a list of other strings, build a tree of all possible ways of assembling the given string from the strings in the list.

For example, given "tree" and ["tr" "e" "t" "re"], possible branches are ("tr" "e" "e"), ("t" "re" "e"), and so on. Each item from the list can be reused, so "tree" and ["tr" "e"] produces one valid result: ("tr" "e" "e"). A very simple final solution looks like this: (("tr" ("e" ("e")) ("ee"))) (for "tree" and ["tr" "ee" "e"]).

He had to do it in Java. Since I thought the problem was interesting, I decided to implement  a solution in Clojure.

Working in a language that builds trees out of nested lists, my choice of datatype was obvious: Nested lists (you probably didn't see that one coming). The first element of each list is the value of the node, whereas the rest of the elements are its children.

So, I though about what I needed.
First of all, a function that filtered the list of strings (lets call them "chunks" here), leaving only the chunks that are included at the start of the word. For example, given the same parameters as above, the function would return ("tr" "t"). Fairly simple to write:

(defn possible-nexts ;; Yes, slightly awkward name.
  "Given a word and a list of chunks, returns a list containing the chunks that are contained at the beginning of the word."
  [word chunks]
  (filter #(= (take (count %) word) (seq %)) chunks))

It's a fairly simple function, and I believe I don't need to explain how it works (and actually, it's largely irrelevant, as long as it produces the output we need).

Now, since we want to generate a tree, the generating function will obviously be recursive. The general idea for the algorithm is this:

1) Given a word and chunk list, obtain a list of possible nexts.
2) For each possible next chunk, recursively call this with the chunk cut off from the beginning of the word (so with "tree" and "tr" the call would get "ee") and the whole chunks list
3) If there are no possible nexts, terminate.

Simple enough.

(defn generate-tree
  "Generates a tree of all possible ways \"word\" can be assembled from the \"chunks\"."
  [word chunks]
  (when-let [nexts (seq (possible-nexts word chunks))]
    (map #(generate-tree (drop (count %) word) chunks) nexts))))

This was my original version. Again, a very simple function (though I managed to screw up, see below). when-let is fairly simple and does what it says on the tin, whereas seq coerces a collection into a seq, or nil if the collection is empty. This is great for at least two reasons:
1) You get uniform behaviour for all collection types.
2) Free emptiness-check!
Doing emptiness checks via seq is apparently idiomatic in Clojure. I found it rather confusing in the beginning, but now I like it.

Anyways, I have to admit I failed to store the value in each node. But that was quickly rectified with some minor changes...

(defn possible-nexts ;; Yes, slightly awkward name.
  "Given a word and a list of chunks, returns a list containing the chunks that are contained at the beginning of the word."
  [word chunks]
  (filter #(= (take (count %) word) (seq %)) chunks))

(declare generate-node)
(defn generate-tree
  "Generates a tree of all possible ways \"word\" can be assembled from the \"chunks\"."
  [word chunks]
  (when-let [nexts (seq (possible-nexts word chunks))]
    (map #(generate-node (drop (count %) word) chunks %) nexts))))

(defn generate-node
  "Generates a tree node."
  [word chunks chunk]
  (cons chunk (generate-tree word chunks)))

That's pretty much it. Less than ten lines total. Or so I (actually, we) thought. There turned out to be a few actual problems, and one aesthetic one.
For one, I didn't have a root node, per se, just a list of subtrees (one for each possible first chunk). I decided the root node shall hold the original word that is being assembled. This is easily solved; call (generate-node word chunks word) instead of (generate-tree word chunks). In light of the other problems, though, I decided to write one last function:

(defn to-tree
  "Nothing much, yet. But this shall be the entry point for the \"program\" shortly."
  [word chunks]
  (generate-node word chunks word))

The other problems are more serious: With certain combinations of chunks, there would be incomplete branches. For example, given "lol" and ["lo" "l"], one branch would be simply... ("l"). We don't want this garbage.
Other than that, if there were duplicate chunks, parts of the tree would be duplicated as well. So, we might want to filter the chunks list.

But I will solve both of those in the next posts, this is getting a bit long. You might have noticed that the "program" mostly consists of one-liners; my original, slightly less verbose (no function descriptions) was only 9 lines (2 whitespace, 3 function "header" and 4 lines of function body). The next post won't add more than ten lines (not counting function descriptions, again), either.

It also performs rather well, it was clearly faster than my friends Java version (though I suspect an adequately experienced Java programmer may write one faster than my nineliner. As might an adequately experienced Clojure programmer, I guess...).

Functional languages amaze me over and over again with how little I need to do to achieve what I want.

Saturday, 21 April 2012

Installing Clojure: Linux, Emacs, Swank - Step by Step

(This was posted in 2012. Don't count on it being up to date.)

I had to format my computer recently, so now I also have to go through the entire setup process again.

When I did it the first time, it was an epic pain in the arse, but this time I got through with barely a problem, so I decided I might want to share my fairly simple method. Also, because it annoys me when I find handwritten guides and am too agitated to check the post date for outdatedness, I am writing this post on the 21 April 2012. And minor explanations why I do certain things, because that's another thing I don't like about handwritten guides: Magic. ;P


I'm using Emacs + leiningen under Linux Mint to serve all my needs (any linux distro will do, I suppose. I'm not an expert on distros, or Linux in general, quite the opposite). Leiningen is a truly wonderful tool that is surprisingly comfortable to use, provided the command line does not scare you, of course. Anyways, leiningen provides much useful functionality for handling projects and is fairly configurable. It also, and that is the most important thing, I think, it handles dependencies for your projects. Emacs is of course the editor we'll use.

So, without further ado:

1) Get all the things you need (via "sudo apt-get install [name]"). After a fresh install of Mint 12, that was:
  a) emacs23
  b) leiningen
Any other dependencies should be handled by apt-get.

2) Install the swank plugin for leiningen.

> lein plugin install swank-clojure 1.4.2

A very useful thing. Really, very useful. Very short version: Get reasonably feature-loaded REPL within Emacs with minimal hassle. Long version: Slime. Slime is a REPL for lisps (it literally stands for "Superior Lisp Interaction Mode for Emacs"), which consists of two parts, the server and the client. This easily allows Emacs to interact with it; the client is integrated into Emacs, with it not caring where the server really is. lein-swank provides the server part, customised for Clojure.

3) Launch emacs once. I usually use emacs -nw, so it runs in the console, but some people might prefer the more interface-y version. This spawns the .emacs.d/ folder in your home directory.

4) Get clojure-mode (and paredit).

First, create and enter ~/.emacs.d/site-lisp   **
Paredit is a minor mode for Emacs that makes working with LISPs sane, and also infinitely more pleasurable. It has a few quirks you need to get used to, but saying it is "recommended" when working with a LISP (like Clojure) is like saying that breathing is "recommended" when living.
Obtain it via
> wget http://mumble.net/~campbell/emacs/paredit.el

** Small plugins for Emacs, like paredit, I usually put into a common site-lisp directory, whereas larger ones, that span multiple files, go into their own directories. You don't have to do it like that, of course, you''ll just have to adjust the load paths (will come up soon) accordingly. It's not hard.

Now, Clojure Mode. I am sad to say my git-fu failed me, and I downloaded it from github via the ZIP button and unzipped it into ~/.emacs.d/clojure-mode/. Oh well, one day I shall be capable of everything from within the command line!
Anyways, Clojure mode. Major mode for Emacs. It provides syntax highlighting, intendation, and other stuff for Clojure.

5) Configure the crap out of Emacs.

This is pretty much the final step. Go to your home directory, and open the file called .emacs - this file will be executed when Emacs is being started up and can be used to configure it.
We'll be using a few commands in Emacs Lisp, a language integrated into Emacs, here.


  a) I usually disable the menu bars and stuff. If you want, too, put:

(menu-bar-mode nil)

This is of course entirely optional, but those one or two lines of extra space are more valuable to me than those pesky menu bars. :P


  b) Add site-lisp to the load path.

(add-to-list 'load-path "~/.emacs.d/site-lisp")

That way, Emacs knows where to look when you want to load a file. Actually, there are a multitude of directories on the load path already (to check which ones, type "load-path" into the file and press C-x C-e with the curser directly after what you just typed. Just remember to remove that line afterwards :3), and you theoretically could use those, but they are shared, and it's always a good idea to keep personal configurations bound to your account - thus we use the appropriate directories in your home folder.


  c) Load paredit

(require 'paredit)

Fairly self-explanatory - we load the paredit.el file. This does not activate it yet, but it supplies the function paredit-mode, which can be called to enable (and disable) paredit. We will use it in the next steps. Emacs knows where to find paredit.el thanks to our adding the site-lisp directory to the load path, of course.


  d)  Load and configure clojure-mode
(add-to-list 'load-path "~/.emacs.d/clojure-mode")
(require 'clojure-mode)

Two fairly self-explanatory steps, again. We basically do the same as with paredit. Again, this does not yet activate clojure-mode, it merely loads the file.

(add-hook 'clojure-mode-hook
          (lambda ()
            (paredit-mode 1)))

 This is the by far trickiest part, but by no means complicated. add-hook takes two arguments, the hook (I'm not sure what exactly they are, but they are usually [or always?] called [mode-name]-hook), and a function. Then, whenever the mode is enabled, that function is called. Due to this snippet of code, whenever clojure-mode is enabled, paredit goes along with it.

Almost done.

clojure-mode gets enabled automatically when you open a .clj file, though you can manually enable it via M-x clojure-mode any time.

 Once you have clojure-mode loaded, you can use M-x clojure-jack-in to open a slime session/REPL (after typing that, wait for a few seconds - booting it up takes some times).

Actually, you could add clojure-jack-in  to the function that is hooked onto clojure-mode, so that the REPL is automatically launched, though since clojure-mode is started everytime you open a new .clj file, it tries connecting each time you do so (and succeeds, but I'm not sure if there are any side effects - better to just do it manually, or do some slightly more advanced hacking that checks whether there is a clojure repl buffer and doesn't start a new one if there is).

And a final thing about clojure-jack-in: It is intimatly bound with leiningen. It will only work within leiningen projects, which isn't much of limitation actually. Simply learn to use leiningen - it is worth it.

Aaaaand now really a final thing:

When using swank-clojure 1.4.2, I got an error message about Emacs not being able to start a swank server - it STILL WORKS for some reason. If it doesn't, or the message annoys you, use swank-clojure 1.3.3.

I hope I helped.