Clojure-based Machine Learning

For many years, I’ve been convinced that programming needs to move forward and abandon the Algol family of languages that, still today, dampens the field, and that the way forward has been signalled for decades by (mostly) functional, possibly dynamic languages with an immersive environment. But it wasn’t until recently that I was finally able to put my money where my mouth has been all these years.

A couple of years ago I finally became a co-founder and started working on a company, BigML, I could call my own (I had been close in the past, but not really there), and was finally in a position to really influence our development decisions at every level. Even at the most sensitive of them all: language choice.

During my previous professional career I had been repeatedly disappointed by the lack of conviction, when not plain mediocrity, of the technical decision makers in what seemed all kinds of companies, no matter how cool, big or small: one would always end up eaten alive by the Java/C++ ogre, with a couple of token scripts (or perhaps some unused Scheme bindings) to pay lip service to the hipsters around.

Deep down, the people in charge either didn’t think languages make any difference or, worse, bought into the silly “availability of programmers” argument. I’m still surprised anyone would believe such a thing. If a candidate came telling me that s/he wanted to program only in, say, Java because that’s what s/he knows best and that s/he doesn’t really feel prepared or interested in learning and using, say, Clojure (or any other language, really), I wouldn’t hire her/him in a million years, no matter what language my project was using, and no matter how many thousands of candidates like this one I had at my disposal.

Give me programmers willing to learn and eager to use Lisp or Haskell (or Erlang, ML, OCaml, Smalltalk, Factor, Scala…) any day, even if we happen to be using C++, for goodness sake! Those are the ones one needs, scanty as they might be.

Given the sad precedents, when I embarked on my new adventure I was careful not to try to impose my heretical ideals on anyone: they had to be accepted on their own grounds, not based on my authority. But I was finally lucky enough to meet a team of people with enough intellectual curiosity and good sense (which is, again, actually a pre-condition to be in a startup). People, let me tell you, with no experience whatsoever in languages outside the mainstream, but people who, nonetheless, were good programmers. And when you give a good programmer a good language, good things happen.

Come to think of it, it’s true that, with good programmers, the language doesn’t matter: they’ll choose the best one, or follow the advice of more experienced colleagues, and quickly take advantage of any extra power the novel environment has to offer.

Our experience so far could hardly be a better counterexample to the Algol naysayers.

Our backend is 99.4% coded in Clojure, and 66% of the team had never programmed seriously in any Lisp, let alone Haskell or Prolog (heck, not even I (the remaining 33%) had actually tried anything non-mainstream for real in a big project!). Maybe some Ruby, and lots and lots of Java and C and C++. But they accepted the challenge after reading around and learning the basics, and 3 months later you couldn’t pry Clojure from their hands. More importantly, they had fun discovering they could also be Dr Jekyll, or that the functional side wasn’t an impractical alley.

Of course, lots of effort is a must, and someone with a bit of experience to guide the newbies is probably necessary. In particular, extensive code reviews were of the essence in our case, and I had never read and criticised this many lines of code in a similar amount of time. But you know what? I prefer “wasting” time in code reviews to spending at least as much writing factories of factories of factories and similar boilerplate (configured, of course, using XML) and chasing bugs in the resulting soup. Not to mention that, again, you need code reviews no matter the language you use: the trick of having a code base so ugly that nobody would review it doesn’t fly.

So yes, it’s more difficult to find hackers, but you need to find them anyway. And yes, it may require more serious software engineering, but, again, you need it anyway. So why shouldn’t we use the best tools at our disposal? I can finally tell you: been there, done that, and it does work.

7 comments

charleslparker says:

June 21, 2013 at 7:09 pm

As part of the aforementioned 66%, I agree that our Clojure journey has been one worth taking.

I particularly like your comment at the end: “It may require more serious software engineering, but, again, you need it anyway.” Nothing could be truer. I don’t have the passion around programming languages that you do, but BigML has made me a huge believer in serious software engineering. I was reminded of this recently when I read a (totally serious) statement on a blog that “This Haskell function is self-documenting, providing you know how to read Haskell.”

Any choice of language is only as important as the decision to use it as well as it can be used. If you’re really committed to using a tool well, then you’ll probably do just that. If you’re not, then the choice of tool isn’t going to save you from writing bad software.

firesofmay says:

June 21, 2013 at 7:34 pm

Very nice article. Though I am not sure why you named it “Clojure-based Machine Learning”. A bit misleading I think.
But I totally agree with you above.
Thanks for sharing.

1. jao says:
  
  June 21, 2013 at 8:19 pm
  
  We were thinking of a series of posts explaining how and why we use Clojure and “Clojure-based Machine Learning” sounded like a good title for the whole series, and then we didn’t find a title good enough for the initial post! It will all make sense… eventually 🙂
  
jt says:

June 23, 2013 at 11:14 am

Interesting article… How are you assessing (e.g. capability, productivity, agility, etc.) the results of your choice? Has your ratio between developing infrastructure versus developing end user features changed when using Clojure as compared to the traditional dev tools?

1. jao says:
  
  June 24, 2013 at 2:23 pm
  
  The biggest gains are probably in the productivity boost that using a higher level of abstraction gives us and the agility derived from using a REPL-enabled language. The purely-functional-by-default nature of Clojure has eliminated many of the nightmares associated with stateful concurrency for us, and the far superior expressivity (as compared to, say, Java) has saved us, in my estimation, at least a factor of 3 in LOC, probably more.
  
  That said, the fact that Clojure builds on top of the JVM and provides very low impedance access to the huge number of Java libraries available out there has certainly allowed us to focus on our specific features. The nice thing is that usually we can do it through Clojure wrappers already out there.
  
  In that sense, we’re getting the best of both worlds.
  
Marc says:

January 27, 2015 at 6:31 pm

Loved the article! Every point of it is so true, so similar to my experience when I joined a startup where we worked in Common Lisp.
Loved the REPL-driven language, now looking for another journey with a functional programming language.
Once you get into functional programming, it spoils you 😀 it’s hard to go back to old C++/Java.