Experience Report – Introducing an experienced Ruby Developer to Clojure

I recently introduced Steve to Clojure and wanted a good example to show its power. Steve is a lead developer for our client who has worked extensively with Ruby and values its expressiveness and productivity. He also has a working knowledge of Java and is interested in Clojure, but he is not convinced of its utility and is generally wary of Lisps. I wanted to build an example that demonstrates the syntax, the expressiveness and the concurrency features of Clojure. I also wanted to highlight aspects of the language, such as threading macros and function composition, that are either absent from Ruby or hard to express in it.

I have tried introducing Clojure to other people before and favored short examples like the ones on ClojureDocs. For example, I always used (Thread/sleep 1000) to stand in for a long-running job when demonstrating pmap. In fact, the ClojureDocs page on pmap uses a similar example. While this convinces people that such concurrency features are important, they fail to see how they can be useful in a real-world scenario.
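For reference, the toy demo mentioned above can be sketched roughly as follows. This is a minimal illustration, not the ClojureDocs example itself: slow-square and elapsed-ms are hypothetical names, and the 200 ms sleep is an arbitrary stand-in for real work.

```clojure
;; Hypothetical stand-in for a slow job: sleeps, then returns its input squared.
(defn slow-square [n]
  (Thread/sleep 200)
  (* n n))

;; Hypothetical timing helper: runs f and returns the elapsed milliseconds.
(defn elapsed-ms [f]
  (let [start (System/nanoTime)]
    (f)
    (/ (- (System/nanoTime) start) 1e6)))

;; map and pmap are lazy, so doall forces the work to happen inside the timing.
(def serial-ms   (elapsed-ms #(doall (map slow-square (range 8)))))
(def parallel-ms (elapsed-ms #(doall (pmap slow-square (range 8)))))

(println "map:" serial-ms "ms, pmap:" parallel-ms "ms")
```

The serial run pays for every sleep in turn, while pmap overlaps them, which is exactly why the demo works and also why it feels artificial.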

So this time, instead of using several short academic examples, I decided to write a simple program that mimics a real-world scenario that can leverage concurrency. The problem I chose was to make an API call to Reddit to get the top content on the Clojure subreddit, count the articles by domain, and extract the title of each article by scraping its HTML with enlive. This problem is not too complex, yet it is not as trivial as (Thread/sleep 1000).

We started off by getting the Reddit feed. I initially used slurp for its instant gratification value and showed that it was similar to Ruby’s open method in the open-uri library. Reddit rate-limits API requests based on the User-Agent header, so I used [clj-http](https://github.com/dakrone/clj-http) to fetch the JSON with a custom user agent.

(require '[clj-http.client :as client])
(require '[clojure.data.json :as json])

(def ^:const REDDIT-URL "http://reddit.com/r/clojure.json?limit=100")
(def ^:const headers {:headers {"User-Agent" "showoffclojure.core by vagmi"}})

(def parse-json
  (comp #(json/read-str % :key-fn keyword)
        :body
        #(client/get % headers)))

(parse-json REDDIT-URL)

I wanted to demonstrate the functional nature of Clojure. This snippet shows how simple functions can be composed with comp into higher-level functions and how those are invoked. Note that comp applies its arguments from right to left, so the HTTP call runs first and the JSON parsing last. It also shows how keywords behave as functions in Clojure, which is quite prevalent when dealing with maps. I also briefly talked about the IFn interface.
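The keyword-as-function and comp ideas can be tried without any network access. The sketch below uses a hypothetical response map (not real clj-http output) and a hypothetical body-length function purely for illustration.

```clojure
;; Hypothetical response map standing in for what an HTTP client might return.
(def response {:status 200 :body "{\"kind\": \"Listing\"}"})

;; A keyword looks itself up in a map when called as a function.
(def body (:body response))                ; same as (get response :body)
(def fallback (:missing response "n/a"))   ; optional default for absent keys

;; comp applies right to left: :body runs first, then count.
(def body-length (comp count :body))
(println (body-length response))

;; Keywords and maps can both be called because they implement clojure.lang.IFn.
(println (ifn? :body) (ifn? response))
```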

(let [reddit-data (-> (parse-json REDDIT-URL)
                      (:data)
                      (:children))]
  (map :data reddit-data))

To extract the data out of the returned JSON, I used the -> threading macro, which passes each result as the first argument of the next form. Although I mentioned macros, I did not spend too much time on them in the interest of brevity.
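To make the rewrite performed by -> concrete, here is a small sketch on a hypothetical, trimmed-down map shaped like the listing above (the real response carries many more keys).

```clojure
;; Hypothetical data shaped like the Reddit listing discussed above.
(def listing {:data {:children [{:data {:domain "infoq.com"}}
                                {:data {:domain "self.Clojure"}}]}})

;; Threaded form: reads top to bottom.
(def threaded (-> listing (:data) (:children)))

;; Equivalent nested form: reads inside out.
(def nested (:children (:data listing)))

;; -> is a macro, so its expansion can be inspected without evaluating it.
(println (macroexpand-1 '(-> listing (:data) (:children))))

(println (map (comp :domain :data) threaded))
```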

(require '[net.cgrand.enlive-html :as html])

(defn fetch-url [url]
  (-> url
      (client/get headers)
      ;; client/get returns a response map; the HTML is under :body
      :body
      ;; Java interop happens seamlessly
      (java.io.StringReader.)
      (html/html-resource)))

(defn extract-title [url]
  (try
    (first (map html/text
                (-> (fetch-url url)
                    (html/select [:html :head :title]))))
    ;; Exception handling is as easy as this
    (catch Exception e "unknown")
    (catch Error e "unknown")))

We then wrote functions to extract the title of a page using enlive, and I showed how easy Java interop is in Clojure.
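The interop forms involved are small enough to sketch with the JDK alone. The names below (reader, first-char, parsed, remaining) are hypothetical and the HTML string is made up for illustration.

```clojure
;; Constructor call: the trailing dot is shorthand for (new java.io.StringReader ...).
(def reader (java.io.StringReader. "<title>Hello</title>"))

;; Instance method call: the leading dot invokes .read on the object.
(def first-char (char (.read reader)))

;; Static method call uses Class/method, just like Thread/sleep.
(def parsed (Integer/parseInt "42"))

;; Java objects work with Clojure functions: slurp accepts any java.io.Reader
;; and here consumes whatever .read has not already taken.
(def remaining (slurp reader))
```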

;; this will have a structure like
;; {"infoq.com" 3 "self.Clojure" 4}
(defonce domain-summary (ref {}))

;; this will be a vector of maps with the following keys
;; [{:url, :reddit-title, :actual-title}]
(defonce article-details (ref []))

(defn update-summary [entry]
  ;; Fetch the title outside the transaction: it is slow network IO
  (let [title (extract-title (:url entry))]
    (dosync
      ;; Read and increment the count inside the transaction so that
      ;; concurrent callers cannot overwrite each other's updates
      (alter domain-summary update-in [(:domain entry)] (fnil inc 0))
      (alter article-details conj {:url (:url entry)
                                   :reddit-title (:title entry)
                                   :actual-title title}))
    (print ".")
    (flush)))

I then talked a bit about how Clojure handles values vs identities and its opinions on managing state in a program. We decided to have two data structures, domain-summary and article-details. domain-summary is a map with the domain as the key and the count of entries from the API as the value, and article-details is a vector of maps, each holding the url, reddit-title and actual-title of a page. For every entry in the Reddit API list, we have to update both domain-summary and article-details together. This segued nicely into STM and how Clojure supports it through refs. We print a . for each entry to indicate progress, and also to point out that IO operations should stay out of dosync blocks, because a transaction may be retried and would then repeat the IO.
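The coordinated update of two refs can be exercised without any network calls. The sketch below uses hypothetical names (counts, details, record!) and made-up domains; it is only meant to show that both refs stay consistent when many threads commit transactions at once.

```clojure
;; Hypothetical refs mirroring the two structures described above.
(def counts  (ref {}))
(def details (ref []))

(defn record! [domain]
  (dosync
    ;; fnil supplies 0 when the domain has no count yet.
    (alter counts update-in [domain] (fnil inc 0))
    (alter details conj {:domain domain})))

;; 100 concurrent updates from futures; deref blocks until each one finishes.
(let [domains (take 100 (cycle ["infoq.com" "self.Clojure"]))]
  (doseq [f (mapv #(future (record! %)) domains)]
    @f))

;; Each transaction changed both refs or neither, so the totals agree.
(println @counts (count @details))
```

If two transactions touch the same ref at the same time, one of them is retried automatically, which is why the body of dosync should be free of side effects.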

(let [reddit-data (-> (parse-json REDDIT-URL)
                      (:data)
                      (:children))]
  (map (comp update-summary :data) reddit-data))

We then changed the previous map expression to update the summary as well. The part that completely blew his mind was when I changed map to pmap: the page fetches now ran in parallel, and the whole thing finished ~7x faster on my 8-core machine.
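One caveat when reusing this outside the REPL: map and pmap return lazy sequences, and it is the REPL printing the result that forces the work to happen. In a script, wrapping the call in dorun is one way to force it. A minimal sketch, with visit! and visited as hypothetical stand-ins for update-summary and its state:

```clojure
;; Hypothetical side-effecting function standing in for update-summary.
(def visited (atom 0))
(defn visit! [_] (swap! visited inc))

;; map is lazy: defining the sequence runs nothing yet.
(def pending (map visit! (range 10)))
(def before-realizing @visited)

;; dorun walks the sequence for its side effects and discards the results.
(dorun pending)
(def after-map @visited)

;; The same wrapper works for pmap, which runs the calls on a thread pool.
(dorun (pmap visit! (range 10)))
(def after-pmap @visited)

(println before-realizing after-map after-pmap)
```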

By the end of this session, Steve was quite convinced about using Clojure for our production projects, and he picked up enough Clojure over the weekend to port some of our core parts from Ruby to Clojure to evaluate it in a real-world scenario. I also helped set up vim-fireplace on his machine.

I cleaned up the code a bit and checked it into GitHub. So if you would like to show off Clojure to your colleagues or friends, please feel free to use it. If you have any suggestions to make things simpler or to demonstrate other concepts that would help sell Clojure’s value proposition better, please send me a PR. I would like to keep the example simple (~50–60 lines).
