This is a four-part series. You can find the parts here:

In our last episode, we learned how we can re-implement many other Clojure collection processing functions such as mapv and filterv in terms of reduce. Were you able to implement some of the other functions we talked about like keep or group-by?

In this post, we’ll start to look into Clojure transducers. There is a lot of material, so we won’t get through everything today, but we’ll continue next time.

Now, for whatever reason, transducers are to Clojure what monads are to Haskell: they both take a bit of a mind warp to understand and they have both spawned a cottage industry of tutorials across the Internet. Well, never wanting to be left out, this is my transducer tutorial. Unlike some who have compared transducers to burritos, I’m going in a different direction.

All the monad tutorials make it easy for you to understand monads by giving you a succinct description of them. “A monad is just a monoid in the category of endofunctors,” they say. Yes, yes, of course. That much is obvious.

I’ve got my own pithy description of transducers, too: Transducers are just middleware for reducing functions. There. See. That was easy. “Alright,” I hear you saying, “That much is obvious, but could you unpack that a bit?” Of course.

Let’s start by remembering why transducers exist in the first place.

First, Clojure’s original collection processing functions were lazy. Calling map or filter on a collection doesn’t actually do all the processing at the time of the initial call. Rather, these functions return a lazy-seq that delays processing until you actually consume the results. This is fine if you’re always dealing with infinite sequences and wanting to only realize a small set of results, but it’s inefficient if you always want to process the whole collection. In contrast, transducers are greedy and process the whole collection at once.

Now, there are other functions that we saw last time that also are greedy, such as mapv and filterv. If you want performance, they work great. Unfortunately, they have one drawback: when you compose them together, they create multiple intermediate vectors that are immediately thrown away. For instance:

user> (->> (range 10)
           (filterv odd?)
           ;; there's an intermediate sequence here
           (mapv (partial * 5)))
[5 15 25 35 45]

In this case, filterv creates an output vector which is consumed by mapv which then creates another output vector. That intermediate vector is short-lived and just creates more garbage for the runtime GC to deal with.

Transducers allow us to compose filter and map transformations without creating intermediate garbage.

Finally, map and filter always create an output sequence (because it’s a lazy-seq) and mapv and filterv always create output vectors. But what if you want to process your collection and return a set as the output? You could use into, like so.

user> (->> (range 10)
           (filterv odd?)
           ;; there's an intermediate sequence here
           (mapv (partial * 5))
           ;; and another one here
           (into #{}))
#{15 25 35 5 45}

But this creates yet another intermediate sequence (the output from mapv) that is just thrown away, increasing GC pressure even further.

Transducers are independent of the final collection processing. You can use the same transducer stack to output a vector or a set or whatever.

Okay, so what is a transducer? As I said, a transducer is middleware for reducing functions. Let’s talk about middleware for a bit.

If you’re familiar with Clojure’s popular web application library, Ring, then you’re familiar with middleware. If not, the concept is fairly simple.

In Ring, a web request is parsed by the server into a Clojure map that contains information about the request. Each request is ultimately fulfilled by a handler. A handler is just a Clojure function that takes a request map as input and returns another map that describes the response. The web server then uses the response map to format an HTTP response back to the client.

The Ring wiki gives an example of a simple hander:

(defn handler [request]
  {:status 200
   :headers {"Content-Type" "text/html"}
   :body "Hello World"})

This handler just ignores the request and returns a “Hello World” response.

Ring’s concept of middleware allows you to wrap a handler with another function that does pre-processing or post-processing of the request. Again, the Ring wiki provides this simple example:

(defn wrap-content-type [handler content-type]
  (fn [request]
    (let [response (handler request)]
      (assoc-in response [:headers "Content-Type"] content-type))))
(def app
  (wrap-content-type handler "text/html"))

So, you can see that wrap-content-type returns a function closure that wraps the handler argument. This function closure is then assigned to app which is passed to the web server as the handler. The function clojure returned by wrap-content-type first calls the original handler and then modifies the response map on the way out. The middleware can also modify the request map before calling the wrapped handler.

You can apply a series of Ring middleware easily:

(def app
  (-> handler
      (wrap-content-type "text/html")

Here, we’re passing the handler to the wrap-content-type middleware, which returns a closure, which is passed to the wrap-keyword-params middleware, which returns another closure wrapping the first, which is then passed to the wrap-params middleware that finally returns another closure which wraps the others. So, when a request comes in, it’s going to get passed first to wrap-params, then wrap-keyword-params, then by wrap-content-type, and finally by the original handler. Any of those pieces of middleware can update the request map on the way in and the response map on the way out.

Now, let’s take a look at our custom version of into.

user> (defn cc-into [to from]
        (reduce conj to from))

Note that we’re using conj as our reducing function here to build the final output collection.

Now, let’s take a look at our custom version of filterv.

user> (defn cc-filterv [pred coll]
        (reduce (fn [state input]
                  (if (pred input)
                    (conj state input)    ; here's the conj
                []                        ; here's the output collection

Notice how this works. The reducing function in cc-filterv applies the test predicate to the input and if the result is truthy, it calls another reducing function, conj. The conj and the initial empty vector are there to build the output collection.

If you look at this abstractly, we’ve wrapped some middleware, hardcoded to perform filtering, around the final reducing function which builds the output value. The conj is playing the same role as the final handler in the case of Ring. We’re not quite at the stage where this is a transducer, but this is where we’re heading.

Interestingly, cc-filterv only creates vectors as the output collection. What if we wanted to build another type of collection, or even a whole new data structure, after filtering the input collection? Certainly, we could always compose cc-filterv with another function.

For instance, maybe we want to filter map entries.

user> (defn map-value-odd? [[k v]]
        (odd? v))
user> (->> {:a 1 :b 2 :c 3 :d 4}
           (cc-filterv map-value-odd?)
           (cc-into {}))
{:a 1, :c 3}

This works great, but we’re creating an intermediate collection again. Ideally, we could do it in one step. If we modify cc-filterv a bit, we can get the behavior we want. Here’s cc-filter-into. It can filter a collection and put it into another output collection using a supplied reducing function in a single pass over the input collection.

user> (defn cc-filter-into [pred rf init coll]
        (reduce (fn [state input]
                  (if (pred input)
                    (rf state input)
user> (cc-filter-into map-value-odd? conj {} {:a 1 :b 2 :c 3 :d 4})
{:a 1, :c 3}

Bingo! One function that filters an input collection and builds an output value of our choice using a final reducing function that we specify. No intermediate collections. In fact, since we have control of the reducing function and the initial value, the final value doesn’t even have to be a collection (in this sense, cc-filter-into is a poor name as it implies the final result is a collection).

For instance, we could sum up the odd values in a set.

user> (cc-filter-into odd? + 0 #{1 2 3 4 5 6 7 8 9})

This is awesome!

“Hmmm,” I hear you say. “But what if we want to do something other than just filtering? Maybe we want to modify each value, like we would with map. This seems cool, but it’s a bit limiting.”

That’s a great question. Unfortunately, we’re out of words for today, so we’ll hold it until next time.