Haskell I/O in Five Minutes

October 19th, 2012

The way you do I/O in Haskell may be radically different from what newcomers are used to, but in fact it follows a few simple rules. Let me try to explain.

For example, you might expect a line of text to pop up in the terminal when you apply putStrLn to "Hello" somewhere in a program. Actually, nothing happens![1] Instead, the expression evaluates to a value which represents the action of printing “Hello”. Evaluating the expression does not cause the action to be performed.

I/O actions, such as putStrLn "Hello", are just ordinary values. This means that you can put them in variables, store them in lists, pass them to functions and so on. In other words, I/O actions are first class values.

If you don’t perform an action by evaluating it, then how do you go about doing it? The answer is simple: you let it be main:

main :: IO ()
main = putStrLn "Hello"

When you run the program, the Haskell runtime takes the value of main (which must be an I/O action) and performs it. With only this “recipe for I/O” at hand, it seems like we can perform only a single action.

How do you perform two actions, then? It turns out that there is a way to combine two I/O actions into one action. To understand Haskell code I find it very useful to look at the types, so let’s first do that for two useful actions:

putStrLn :: String -> IO ()
getLine  :: IO String

The first line should be read as “putStrLn is a function that takes a string and returns an action that, when performed, produces ()[2]”. The second should be read as “getLine is an action that, when performed, produces a string”.

The function that combines two actions is called “bind” and is written in Haskell as the infix operator “>>=”. It has the following type:

(>>=) :: IO a -> (a -> IO b) -> IO b

Bind takes two arguments (the left and right operands) and returns a compound action. The meaning of this new action is to first perform the action to the left. The left action produces a value of type a. Then the function to the right is applied to this value, which returns a second action. Finally the second action is performed. The value it produces becomes the result of the whole compound action.

With this new ingredient it is possible to define an action that does two things, for example reading and printing:

main :: IO ()
main = getLine >>= (\line -> putStrLn (reverse line))

This example reads a line from the terminal, reverses it, and prints it back. Finally, there is another function which comes in handy when you compose actions. It is called “return” and has the following type[3]:

return :: a -> IO a

That is, it is a function that takes an a and returns an action that, when performed, produces an a. The action does not actually perform any real I/O and the value it yields is simply the one passed as the argument. Actions from return are often used as the last step of an action sequence to combine intermediate results into a bigger one. For example:

getLinePair :: IO (String, String)
getLinePair = getLine >>= (\x ->
                getLine >>= (\y ->
                  return (x, y)))

The meaning of this action is to read a first line from the terminal and then a second one. The result of the action is a pair of the first and the second line.

And that’s it. Possible next steps are to read about the “do notation” or to browse the documentation for the System.IO module. If you enjoyed this tutorial or have any questions, feel free to post a comment below!

– raek

(Thanks to @kajgo and @ricli85 for proofreading!)

  1. This is not true at the top level of the GHCi command-line since it handles I/O actions specially compared to other values.
  2. This is read as “unit” and means “no useful value”.
  3. In reality, the (>>=) and return functions do actually involve a more general type than IO.

Update: Eagle 6.2.0 has been released. The issue remains though and this updated guide still applies.

Minor update: Changed instructions to download libpng source code via FTP instead of Git. This results in one less needed tool and a decreased download size.

Update: This guide has been revised to work on both 32-bit and 64-bit systems. Previously it only worked on 32-bit systems. Please write a comment if you have any questions or find something that does not work.

Update: Eagle 6.1.0 has been released! This guide still applies to both 6.0.0 and 6.1.0, and has been updated with the following change: The string “6.0” has been replaced by “6.1” in the name of the installer and in the installation location. Apart from that everything is the same.

CadSoft recently released version 6 of its PCB layout program Eagle. So, you wanted to use it in Ubuntu? The eagle package in Ubuntu is only version 5.11.0 as of Oneiric, so you downloaded the new version from the official website? You ran the installer and got this error?

error while loading shared libraries: libpng14.so.14: cannot open shared object file: No such file or directory

It turns out that Eagle needs newer versions of some libraries than are available in the Ubuntu repos. This post will show you how to get hold of these specific versions and how to set them up with Eagle.

No superuser access is needed (apart from installing build tools) in the approach I chose. The special versions of the libraries are only used by Eagle and will never cause trouble for any other applications you have installed, because the system directories are never touched.

Overview

My approach was to download the source for all libraries Eagle needs, compile them (as 32-bit, since that’s what Eagle requires), install them in a local directory, and make a small script to launch Eagle. The libraries Eagle needs are:

  • libpng 1.4.x (provides libpng14.so)
  • OpenSSL 1.0.0 (provides libssl.so.1.0.0 and libcrypto.so.1.0.0)
  • libjpeg v8 (provides libjpeg.so.8)

Preparations

In my setup I put all relevant files in the directory /home/raek/.eagle . To follow my steps you need to create your own directory and substitute its path for /home/raek/.eagle in all the following steps.

mkdir /home/raek/.eagle
cd /home/raek/.eagle

I had to install some tools and libraries that were needed in the process. Which ones depended on whether I was on my 32-bit or 64-bit machine. In the 32-bit scenario they were:

sudo apt-get install build-essential perl
sudo apt-get install zlib1g zlib1g-dev

And in the 64-bit they were:

sudo apt-get install build-essential perl gcc-multilib
sudo apt-get install ia32-libs lib32z1 lib32z1-dev

Downloading, building, and installing libraries locally

I installed a typical library like this:

  1. Download the source code and unpack it.
  2. Configure the library with suitable options.
  3. Build the library.
  4. Test the library.
  5. Install the library.
  6. Verify that the expected .so file shows up in
    /home/raek/.eagle/usr/lib and that it is 32-bit.

Of the configure options, the --prefix=/home/raek/.eagle/usr option is important since it tells the build system where to put the files when make install is run. It allowed me to put the result in my own usr directory rather than in the system-wide /usr or /usr/local.

The CFLAGS=-m32 option (in its various forms) is also needed to force the libraries to be built in 32-bit form.

I verified that the resulting .so files were in the right place and 32-bit with the file -L command. If the file was 32-bit the command came back with “ELF 32-bit”, if it was 64-bit it came back with “ELF 64-bit”, and if the file didn’t exist at all it came back with “No such file or directory”.

libpng 1.4.x

I downloaded the libpng14 source code and configured, built, tested, installed and verified it:

wget http://www.sourceforge.net/projects/libpng/files/libpng14/older-releases/1.4.11/libpng-1.4.11.tar.gz
tar zxf libpng-1.4.11.tar.gz
cd libpng-1.4.11
./configure --prefix=/home/raek/.eagle/usr CFLAGS=-m32
make
make check
make install
cd ..
file -L usr/lib/libpng14.so

libssl 1.0.0

The libssl install procedure was very similar, but here the shared option was needed to generate .so files:

wget http://www.openssl.org/source/openssl-1.0.0f.tar.gz
tar zxf openssl-1.0.0f.tar.gz
cd openssl-1.0.0f
./Configure shared --prefix=/home/raek/.eagle/usr linux-generic32 -m32
make
make test
make install
cd ..
file -L usr/lib/libssl.so.1.0.0
file -L usr/lib/libcrypto.so.1.0.0

libjpeg v8

No surprises here:

wget http://www.ijg.org/files/jpegsrc.v8c.tar.gz
tar zxf jpegsrc.v8c.tar.gz
cd jpeg-8c
./configure --prefix=/home/raek/.eagle/usr CFLAGS=-m32
make
make test
make install
cd ..
file -L usr/lib/libjpeg.so.8

Installing Eagle

I now had all the library files I needed in my /home/raek/.eagle/usr/lib directory and proceeded with downloading and installing Eagle itself. I told the shared library loader to always look for libraries in this directory first by setting the LD_LIBRARY_PATH environment variable in my shell session.

I could then run the Eagle installer and chose to install Eagle in /home/raek/.eagle/eagle-6.2.0 . After that I could start eagle by running the binary found in eagle-6.2.0/bin/eagle.

wget ftp://ftp.cadsoft.de/eagle/program/6.2/eagle-lin-6.2.0.run
export LD_LIBRARY_PATH=/home/raek/.eagle/usr/lib
sh eagle-lin-6.2.0.run
/home/raek/.eagle/eagle-6.2.0/bin/eagle

Making a Launch Script

Starting Eagle worked fine, but I didn’t want to have to run the export command in a terminal each time I was going to start Eagle. Therefore I made a small script with the following contents:

#!/bin/sh
export LD_LIBRARY_PATH=/home/raek/.eagle/usr/lib
/home/raek/.eagle/eagle-6.2.0/bin/eagle

After I wrote the script I made it executable and added a symlink to it in my .bin directory, which I have on my PATH.

nano run_eagle.sh
chmod a+x run_eagle.sh
cd /home/raek/.bin
ln -s /home/raek/.eagle/run_eagle.sh /home/raek/.bin/eagle

I can now start Eagle by just running eagle! If I want to uninstall Eagle some time in the future, all I need to do is to delete /home/raek/.eagle and /home/raek/.bin/eagle and both Eagle and the special version libraries will be gone.

And that’s it! Please drop a comment below if this was useful for you (or if something turned out to not work)!

The re- functions

June 17th, 2011

I recently realized that of the core Clojure regex functions (re-pattern, re-matcher, re-matches, re-groups, re-find, re-seq) I was completely unaware of how re-matcher, re-matches and re-groups were supposed to be used. Their names hint that they are useful for something, but I had never needed to use them. To understand them, I first had to dive into the Javadoc a bit, more specifically the java.util.regex package.

There are two basic concepts in the regex package: the Pattern and the Matcher. A Pattern is what a Clojure regex literal produces and (re-pattern pattern-string) can be used to create one from a string.

user=> (def p (re-pattern "abc|def"))
#'user/p

A Matcher is a stateful class used to find one or more matching (sub)sequences of a string. A matcher can be created with (re-matcher pattern string) and is initially in a state where the match region is the whole string.

user=> (def m (re-matcher p "defabc"))
#'user/m

There are two methods for trying to match the region against the pattern: (.find matcher) and (.matches matcher). Here, “matches” should be read as in “it matches” and not “the matches”. The methods alter the state of the Matcher in the following way: if a match is found, true is returned, the matched substring is stored internally, and the next search starts where that match ended. Otherwise false is returned. The two methods differ in that .find scans the rest of the region for the next match, whereas .matches requires the whole region to match.

user=> (.find m)
true

To extract the matched subsequence (or the matched groups) for the most recent match, re-groups is used. If no groups are present in the pattern, the match is returned as a string. If n groups are present in the pattern, a vector of size n+1 is returned, where the first element is the whole match and the rest are the matches of the groups. The state of the Matcher remains unchanged.

user=> (re-groups m)
"def"
user=> (re-groups m)
"def"
user=> (.find m)
true
user=> (re-groups m)
"abc"
user=> (.find m)
false
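For comparison, here is roughly what re-groups does, sketched in plain Java against java.util.regex. This is only an illustration and not Clojure’s actual implementation; the class ReGroupsSketch and the helper firstGroups are names I made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the class and method names are invented for illustration.
class ReGroupsSketch {
    // Reads the most recent match without changing the Matcher's state.
    // No groups: the match as a String. n groups: a list of size n+1.
    static Object reGroups(Matcher m) {
        int n = m.groupCount();
        if (n == 0) {
            return m.group();
        }
        List<String> parts = new ArrayList<>();
        for (int i = 0; i <= n; i++) {
            parts.add(m.group(i)); // group(0) is the whole match
        }
        return parts;
    }

    // Convenience helper: find the first match and return its groups, or null.
    static Object firstGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? reGroups(m) : null;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("abc|def").matcher("defabc");
        m.find();
        System.out.println(reGroups(m)); // def
        System.out.println(reGroups(m)); // def again, the state is unchanged
        m.find();
        System.out.println(reGroups(m)); // abc
        System.out.println(firstGroups("(\\d+)-(\\d+)", "10-20")); // [10-20, 10, 20]
    }
}
```

Only find moves the matcher forward; reading the groups any number of times gives the same answer.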

The re-find function is a wrapper for .find that, in addition to accepting a Matcher as an argument, can also take a pattern and a string and create its own Matcher. A call like (re-find pattern string) is equivalent to (let [m (re-matcher pattern string)] (when (.find m) (re-groups m))).

user=> (re-find #"abc|def" "xdefabcy")
"def"

The re-matches function works just like re-find, except that it uses the .matches method and does not come with a single argument variant (that would take a Matcher).

user=> (re-matches #"abc|def" "def")
"def"
user=> (re-matches #"abc|def" "defabc")
nil
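The difference between the two can also be seen from the Java side. The following is a minimal sketch, assuming patterns without groups (so the result is always a plain string); ReFindSketch, reFind and reMatches are hypothetical names that merely mirror the Clojure functions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; simplified to patterns without capturing groups.
class ReFindSketch {
    // Like (re-find pattern string): the first match anywhere in the string, or null.
    static String reFind(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.find() ? m.group() : null;
    }

    // Like (re-matches pattern string): the match only if it covers the whole string.
    static String reMatches(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.matches() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(reFind("abc|def", "xdefabcy")); // def
        System.out.println(reMatches("abc|def", "def"));   // def
        System.out.println(reMatches("abc|def", "defabc")); // null
    }
}
```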

In addition, there’s the re-seq function that returns a sequence of all the matches re-find would find. It accepts a pattern and a string as its arguments: (re-seq pattern string).

user=> (re-seq #"abc|def" "xdefabcy")
("def" "abc")
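In Java terms, re-seq amounts to calling find in a loop and collecting each match. Here is a rough sketch under two assumptions: the pattern has no groups, and an eager list is good enough (the Clojure function returns a lazy sequence). ReSeqSketch and reSeq are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; eager and limited to patterns without capturing groups.
class ReSeqSketch {
    // Collects every successive match that find() reports, in order.
    static List<String> reSeq(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(reSeq("abc|def", "xdefabcy")); // [def, abc]
    }
}
```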

In the end, four of the functions turn out to be more useful than the others for a Clojure programmer:

  • To create a regex pattern from a string, use re-pattern.
  • To match a string completely against a pattern, use re-matches.
  • To find some part of a string that matches a pattern, use re-find.
  • To find all the parts of a string that match a pattern, use re-seq.


This is the first post in my ongoing attempt to write a series of blog posts about how to get started with development in Clojure. This post will cover a beginner’s first encounter with the Clojure Read Eval Print Loop.

This post assumes a UNIX-like environment. Some details might vary from OS to OS (especially for Windows). In those cases, study the documentation related to your platform for the mentioned applications.

So, you’ve heard some interesting things about this language called Clojure… Good. When you play around with an interactive programming language (which Clojure is a prime example of) you usually do it through a shell of some sort. In Lisp languages, this shell is traditionally called the REPL, which stands for Read Eval Print Loop.

The question I can almost hear you think is: “How do I install Clojure?” The answer to that question, which is somewhat unusual, is: “You don’t.” Clojure’s approach to language and library versions is similar to Virtualenv of Python or RVM of Ruby. To launch Clojure you use what’s usually called a build tool. Since these tools do more than just build tasks, you can think of them as “Clojure environment tools”. Two of the most common ones are Leiningen and Cake. Here, I will only cover Leiningen, but Cake works nearly identically for basic tasks.

Now, what you do install is Leiningen:

  • Download the lein script available from the project page.
  • Put the file in a directory where you keep executables. I keep mine in ~/.bin/
  • Make it executable: chmod a+x lein
  • Ensure that the directory with executables is on the PATH. I do this by having the following in my ~/.profile file:
    PATH=$PATH:/home/raek/.bin
    export PATH

    This makes lein available both in applications started from Bash and in Gnome[1]; just change the /home/raek/ to your own home directory.

  • Run lein once to let it download the files it needs. The files are placed in ~/.lein and ~/.m2
  • Run lein repl to get a Clojure REPL!
  • Optional: Install rlwrap using the package manager of your OS to get a better REPL experience. Otherwise JLine is used which does not support UTF-8.

With a bare REPL in a terminal you can do much, but the lack of editing and saving abilities can make it tiresome in the long run. I therefore recommend using an ordinary text editor to write the code and then sending the code to the REPL. The simplest way to accomplish this is to copy and paste the code, but more sophisticated editors provide more convenient methods (I will shortly show how this can be done in Emacs).

Restricting oneself to only the REPL has some serious limitations and because of this I don’t recommend this approach in general, except for learning (for that it may indeed be very useful) and trivial projects. So far I haven’t mentioned how to divide code into multiple files, how to use third party libraries or how to specify the version of Clojure to use. For that, you need to set up a Leiningen project. This is the topic for the next (upcoming) part of this blog post series.

In Emacs, you can interact with the REPL by following these steps:

  • Install clojure-mode using package.el by following the instructions on the official Getting Started with Emacs wiki page.
  • Execute M-x customize-variable RET inferior-lisp-program and configure it to use lein repl as the program.
  • Open a .clj file, or run M-x clojure-mode in a buffer (e.g. *scratch*).
  • Press C-c C-z to start the Clojure REPL.
  • Use C-x C-e at the end of an expression to evaluate it or C-M-x to evaluate the expression spanning between the outermost parentheses surrounding the point.

This is it for now. Happy hacking!

// raek


  1. You might need to restart the X server for this change to be effective.

Executors in Clojure

January 24th, 2011

Java has a very useful package called java.util.concurrent, which contains classes and interfaces for tasks, task execution tracking, thread-to-thread communication with blocking queues, locks, semaphores, atomic containers and the Executors Framework.  This blog post will walk you through the concepts of the Executors Framework as seen from Clojure.

But first a word on the relationship between Clojure and existing Java frameworks. Clojure has been designed to make Java interop as seamless as possible. Where Java is not broken, Clojure does not in general[1] add a wrapping layer.[2]

The Executors Framework provides abstractions for representing tasks, handles to running tasks and executors as objects. In Java, general tasks (or units of work) are contained in instances of Runnable or Callable. A task is executed by calling its run or call method, respectively. The interfaces are very straightforward[3]:

package java.lang;
public interface Runnable {
    void run();
}
package java.util.concurrent;
public interface Callable<V> {
    V call() throws Exception;
}

The difference between them is that call can return a value, unlike run which is of type void. As you might have realized, this abstraction is a bit similar to function objects: They are both ways of encapsulating pieces of code as objects. Naturally, Clojure functions implement Runnable and Callable by invoking themselves with zero arguments:

user=> (defn demo-task []
         (println "boo!")
         123)
#'user/demo-task
user=> (.run demo-task)
boo!
nil
user=> (.call demo-task)
boo!
123
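Seen from the Java side, the same two kinds of task can be written as lambdas. This is just a small sketch to show the difference in return types; TaskKindsSketch and demoCall are names invented for the example.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch; names are invented for illustration.
class TaskKindsSketch {
    // Builds a Callable, calls it directly and returns its value.
    // call() is declared to throw Exception, so it is wrapped here.
    static int demoCall() {
        Callable<Integer> task = () -> {
            System.out.println("boo!");
            return 123;
        };
        try {
            return task.call();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Runnable task = () -> System.out.println("boo!");
        task.run();                     // prints boo!, there is no return value
        System.out.println(demoCall()); // prints boo! and then 123
    }
}
```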

Now that we have a way of describing tasks, we can explore how we can pass them to something that can execute them. The simplest abstraction for this is the Executor. It has one method called execute that takes a Runnable. When it is invoked, the Executor is expected to execute the task some time in the future.

package java.util.concurrent;
public interface Executor {
    void execute(Runnable command);
}

For fun, we can now implement an Executor that, when passed a task, creates a dedicated thread for it and runs the task in it:

user=> (defn create-thread-executor []
         (reify
           java.util.concurrent.Executor
           (execute [_ task]
             (let [f #(try
                        ;; task is a Runnable; run has no return value
                        (.run ^Runnable task)
                        (catch Throwable e
                          ;; not much we can do here
                          (.printStackTrace e *out*)))]
               (doto (Thread. f)
                 (.start))))))
#'user/create-thread-executor
user=> (alter-var-root #'*out* (constantly *out*)) ; [4]
#<PrintWriter java.io.PrintWriter@16c72cc>
user=> (def exe (create-thread-executor))
#'user/exe
user=> (.execute exe demo-task)
boo!
nil

There are three things that the above code does not address very well: it doesn’t tell you when the task is done, it does not provide a way of getting back any value and it does not provide a way for the calling code to detect a failure in the task. A much richer abstraction is the ExecutorService. It extends the Executor interface and provides methods to get a result back from a task, submit multiple tasks at once and to gracefully shut it down: awaitTermination, invokeAll, invokeAny, isShutdown, isTerminated, shutdown, shutdownNow and submit. Since it allows tasks to communicate a value back, tasks can be of type Callable.
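A minimal Java sketch of that life cycle might look like the following. It assumes a small fixed thread pool from the Executors factory class as the concrete ExecutorService; SubmitSketch and submitAndWait are hypothetical names, and the five second wait is an arbitrary choice.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and the 5 second wait are illustrative choices.
class SubmitSketch {
    // Submits one Callable, waits for its value, then shuts the pool down.
    static int submitAndWait() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Integer> result = pool.submit(() -> 21 * 2);
            return result.get(); // blocks until the task is done
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown(); // accept no new tasks; submitted ones still finish
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(submitAndWait()); // 42
    }
}
```

Without the call to shutdown, the pool’s worker threads would keep the JVM alive after main returns.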

Along with the ExecutorService, another concept is introduced: the Future. An object that implements this interface represents a handle to a task that is queued, cancelled, being executed or scheduled for execution. When you submit a task to an ExecutorService, you get a Future back. You can use it to retrieve its result, query whether it’s done yet, or cancel it, among other things. Its interface is as follows:

package java.util.concurrent;
public interface Future<V> {
    boolean cancel(boolean mayInterruptIfRunning);
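    V get() throws InterruptedException, ExecutionException;
    V get(long timeout, TimeUnit unit)
        throws InterruptedException, ExecutionException, TimeoutException;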
    boolean isCancelled();
    boolean isDone();
}

When invoking the get method, the call will block until the task is done or the call times out (if a timeout was given). Clojure provides wrapper functions (whose names begin with future-), for all of these methods except for get. (You can still access those methods with usual Java interop.) A call to get can exit in five ways:

  • On success, it returns the result of the call method of the task, if it was a Callable, or nil, if it was a Runnable.
  • If the Future has been cancelled, a CancellationException is thrown.
  • If the body of the task has thrown an unhandled exception, an ExecutionException is thrown with that exception as its cause.
  • If the thread that is waiting in get is interrupted, an InterruptedException is thrown.
  • If a timeout was given and that time has passed, a TimeoutException is thrown.
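The outcomes listed above can be provoked from Java with a few contrived tasks. This is a sketch under stated assumptions: GetOutcomesSketch, outcome and demo are made-up names, the pool size of four is chosen so that all tasks run at once, and the timeouts are arbitrary. The interrupted case is handled but not triggered by the demo.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; names, pool size and timeouts are illustrative.
class GetOutcomesSketch {
    // Describes how a timed call to get ended.
    static String outcome(Future<?> f, long timeoutMs) {
        try {
            return "value: " + f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (CancellationException e) {
            return "cancelled";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // the waiting thread was interrupted
            return "interrupted";
        } catch (TimeoutException e) {
            return "timed out";
        }
    }

    static List<String> demo() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // A typed variable avoids ambiguity between submit(Runnable) and submit(Callable).
            Callable<Integer> failing = () -> { throw new IllegalStateException("boom"); };
            Future<Integer> ok = pool.submit(() -> 123);
            Future<Integer> bad = pool.submit(failing);
            Future<?> slow = pool.submit(() -> { Thread.sleep(60_000); return null; });
            Future<?> cancelled = pool.submit(() -> { Thread.sleep(60_000); return null; });
            cancelled.cancel(true);
            return Arrays.asList(
                outcome(ok, 1000),
                outcome(bad, 1000),
                outcome(slow, 50),
                outcome(cancelled, 1000));
        } finally {
            pool.shutdownNow(); // interrupts the task that is still sleeping
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [value: 123, failed: boom, timed out, cancelled]
    }
}
```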

In addition, the Executors Framework provides the Executors class, which is a collection of static factory methods for creating various concrete instances of ExecutorService. Two very useful ones are newFixedThreadPool and newCachedThreadPool. Using thread pools is usually a good idea, since thread creation is an expensive operation.

newFixedThreadPool solves the problem by creating a fixed number of threads, and using them to execute the tasks. The cost for creating new threads only occurs once, but only a fixed number of tasks can be run at the same time. The approach of newCachedThreadPool is to start with no threads and create new ones as it needs them. If a thread is done with its task, it will stay around for sixty seconds. If it does not get a new task in that time, it will be deallocated. Let’s try using the first kind from Clojure:

user=> (import 'java.util.concurrent.Executors)
java.util.concurrent.Executors
user=> (def pool (Executors/newFixedThreadPool 4))
#'user/pool
user=> (defn sleep-print-and-double [x]
         (Thread/sleep 1000)
         (println x "done!")
         (* x 2))
#'user/sleep-print-and-double
user=> (let [tasks (for [i (range 10)]
                     #(sleep-print-and-double i))
             futures (.invokeAll pool tasks)]
         (for [ftr futures]
           (.get ftr)))
;; (1 sec delay)
0 done!
1 done!
3 done!
2 done!
;; (1 sec delay)
4 done!
5 done!
7 done!
6 done!
;; (1 sec delay)
8 done!
9 done!
(0 2 4 6 8 10 12 14 16 18)
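For readers who want to see the same experiment on the Java side, here is a rough equivalent. It is only a sketch: InvokeAllSketch and doubleAll are hypothetical names, and the sleep is shortened to 100 ms so that it finishes quickly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; names and the 100 ms sleep are illustrative.
class InvokeAllSketch {
    // Runs n sleep-then-double tasks on a fixed pool and collects the results in order.
    static List<Integer> doubleAll(int n, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final int x = i;
                tasks.add(() -> {
                    Thread.sleep(100);
                    return x * 2;
                });
            }
            List<Integer> results = new ArrayList<>();
            // invokeAll blocks until every task is done and keeps the input order.
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(doubleAll(10, 4)); // [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
    }
}
```

The tasks may finish in any order within a batch, but the list of futures, and therefore the list of results, follows the order in which the tasks were given.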

Not too complicated, was it? I will finish by describing a macro in Clojure that you might have heard of: future. (The name itself may be a bit unfortunate, since it only tells us that we get a Future object, but not what took care of the task.) future takes some expressions, wraps them up as the body of an anonymous function and passes that function to future-call. In other words, (future (foo) (bar)) is just a more convenient way of writing (future-call (fn [] (foo) (bar))).

future-call, the function that actually does the work, submits the given task to one of Clojure’s internal thread pools[5] and gets a Future back. Another object that implements both IDeref (which allows you to call deref/@ on it) and Future will be the actual return value. Dereferencing it will invoke the get method of the Future from the thread pool and any call to a Future method on it will be delegated to that Future too. All this is perhaps best clarified with the source code itself:

(defn future-call [^Callable f]
  (let [fut (.submit clojure.lang.Agent/soloExecutor f)]
    (reify
     clojure.lang.IDeref
      (deref [_] (.get fut))
     java.util.concurrent.Future
      (get [_] (.get fut))
      (get [_ timeout unit] (.get fut timeout unit))
      (isCancelled [_] (.isCancelled fut))
      (isDone [_] (.isDone fut))
      (cancel [_ interrupt?] (.cancel fut interrupt?)))))

future provides a fairly simple standard solution for starting off some piece of code in another thread and you are likely to come across it. Now you know how it works under the hood.

// raek

  1. An exception is the clojure.string namespace, which got added in Clojure 1.2.
  2. http://clojure-log.n01se.net/date/2010-12-02.html#17:53
  3. When in Clojure, you can think of type parameters as being of type Object.
  4. This makes the current repl the default output stream for new threads.
  5. This happens to be the same one used by the implementation of send-off.