# Counting Words in Clojure

2013-10-26

One of the early Clojure exercises on exercism.io is writing a word count algorithm. The exercise is easy enough in Ruby, but at first glance seems daunting for someone not accustomed to functional programming. In reality, though, with a proper introduction to the concept of reduce in Clojure, the exercise is quite enjoyable and not nearly so difficult.

To start, let's assume we have a string: `one fish two fish red fish blue fish`. How do we write an algorithm to get the following output?

``````{"one" 1, "fish" 4, "two" 1, "red" 1, "blue" 1}
``````

In Ruby, we might split the string out into an array of words and iterate over them while keeping track of their frequency using a hash. Clojure gives us an easy way to do much the same:

``````(def words-to-count "one fish two fish red fish blue fish")

(frequencies (clojure.string/split words-to-count #"\s"))
; {"one" 1, "fish" 4, "two" 1, "red" 1, "blue" 1}
``````

First, we split our string using a regular expression for whitespace (i.e., `#"\s"`), and then pass the resulting vector off to the `frequencies` function which does the rest of the work, returning the result we wanted.

So that wasn't so hard, but what if we didn't have the handy `frequencies` function? Enter `reduce`.

In a Clojure repl, we can type `(doc reduce)` to get some basic documentation on the function. According to the docs, `reduce` takes a function, an initial value, and a collection. For example, if we wanted to sum a a vector of numbers, we could use `reduce` as follows:

``````(reduce + 0 [1 2 3]) ; 6
``````

The plus sign is our function, the zero our initial value, and the vector our collection. Understanding `reduce` in my mind is a fundamental requirement to understanding much of how functional programming works. And there's a great post on reduce which has helped me wrap my mind around the subtle function. The post deserves to be read in its entirety, but I will briefly present the points relevant to counting words here.

The core of `reduce` is in its use of looping and recursion. Let's go one step further with `reduce` before we return to our word count algorithm.

Assume we wanted to sum a vector without using `reduce`. How would that work? We would start by using `loop` and `recur`, Clojure's (and functional programming's) version of iteration.

``````(loop [acc 0 coll [1 2 3]]
(if (empty? coll) acc
(recur (+ acc (first coll)) (rest coll))))
``````

We start by assigning some local values, `acc` for our accumulator which is set to zero, and `coll` for our initial collection. As with all recursive algorithms, we access the base case first, i.e., that our collection is empty. If it is in fact empty, we will return our accumulator. Otherwise, we use `recur` to pass new values for `acc` and `coll`: 1) the sum of our accumulator's value and the first value in the collection as `acc`, and 2) what's left of the collection without the first element, i.e., `(rest coll)`, as `coll`. This recursion will continue until we empty out `coll` and have added all values to the accumulator.

Whereas the loop above will only use addition, we could easily write our own version of reduce to which we could pass the function with which we wanted to reduce our collection:

``````(defn my-reduce [fn init collection]
(loop [acc init coll collection]
(if (empty? coll) acc
(recur (fn acc (first coll)) (rest coll)))))
``````

Although the function above is a naive implementation, it nonetheless demonstrates how `reduce` is built from `loop` and `recur`.

Now that we've covered the basics of `reduce`, let's return to our word count algorithm above. We know now that we can use `reduce` in the absence of `frequencies`. Our first attempt might look something like the following:

``````(def split-words (clojure.string/split words-to-count #"\s"))

(defn count-words [words-to-count]
(reduce _ {} words-to-count))

(count-words split-words)
``````

Once we've split our string into a vector of words, we can easily formulate the skeleton of our call to `reduce`. What function will take the place of the underscore above? Let's write it now.

We'll call the function `record-word-count`. It will take two arguments, a map to keep track of each word and its count, and a word to enter into a new copy of the map.

``````(defn record-word-count [memo word]
(assoc memo word (inc (memo word 0))))
``````

We'll start from the inside and move outwards. First, with `(memo word 0)`, we attempt to lookup the value stored in `memo` for our `word`. If we don't find it, we return 0. This is much like Ruby's `Hash#fetch` method which optionally takes a value for a missing key. Next, whatever value we return (zero or otherwise), we increment its value with `inc`. Finally, we use `assoc` to create a new map which will include our word and its updated value.

Our resulting code is as follows:

``````(def split-words (clojure.string/split words-to-count #"\s"))

(defn record-word-count [memo word]
(assoc memo word (inc (memo word 0))))

(defn count-words [words-to-count]
(reduce record-word-count {} words-to-count))

(count-words split-words)
; {"one" 1, "fish" 4, "two" 1, "red" 1, "blue" 1}
``````