From Big Blue’s Predictive Analytics to Machine Learning with BigML
Within 24 hours of turning in my IBM badge and laptop and signing my exit papers, I found myself on a plane to Buenos Aires, followed by Melbourne and Sydney, for conferences and client meetings representing a company I was not even officially working for yet. I am now one of the most recent additions to the BigML team, and my motivation to give up my IBM career and make a fresh start with a startup did not arise as a sudden urge. Rather, it was the gradual result of observing a sea change in the marketplace.
Having interfaced with many analytics organizations during my tenure at IBM, it is my conviction that we have entered a new era in which the democratization of machine learning allows organizations large and small to add repeatable statistical rigor to all kinds of processes that until now have been predominantly influenced by human bias: candidate identification and the interview process (HR), predicting vacation rental prices, athlete profiling by scouts, sizing and pricing complex services projects, and optimizing crop yields and farming operations. No doubt all of those business profiles, including the person predicting vacation rental prices, will one day utilize machine learning, without having to reinvent themselves as hackers, that is.
Speed, Deployability, and Costs
Like most digital technologies, Machine Learning is in the process of becoming automated and commoditized with BigML, Amazon, Microsoft, and Google leading the charge.
The business drivers? There are many:
- Transforming a set of manual processes into a single fluid one by leveraging easy-to-use services
- Lowering the complexity and cost of building and deploying predictive models
- Increasing business performance by applying machine learning to daily operations, speeding up the time-to-market of ever more data-driven decisions
With tools like BigML, in a fraction of the time it takes to install and configure a statistical software package like R, SAS, or SPSS, you can create an online account, load your source data, train, test, and boom! You have built a predictive model that helps you score all your present and future data. Best of all, the same process template can be applied to any of your functional areas (marketing, sales, risk, compliance, maintenance, etc.), accelerating the scale of data-driven actions across the whole company.
Oh, and did I mention exportable models? They can be shared with anybody in your organization, run remotely on IoT devices or other high-value assets (e.g., cell towers, manufacturing equipment, infrastructure pipes), and support the most popular runtime environments such as Python, C++, Java, Node.js, and more. That’s right: you can create models, then export them to anybody or practically anything, and have them run locally on a single machine or send them to a million machines at no additional cost. Yes, the democratization of advanced analytics is literally here.
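To make the idea of an exportable model concrete, here is a minimal hand-written sketch of what a model exported as local code might look like. This is an illustration only, not actual BigML-generated output; the function name, field names, and leaf values are all invented for the example.

```python
# Hypothetical sketch of a model exported as plain local code.
# This is a hand-written stand-in for generated output, not actual
# BigML code; the fields (nights, bedrooms, season) are invented.

def predict_rental_price(nights, bedrooms, season):
    """Score one vacation-rental listing with a small hard-coded decision tree."""
    if bedrooms > 2:
        if season == "summer":
            return 310.0   # leaf: large home, peak season
        return 220.0       # leaf: large home, off season
    if nights >= 7:
        return 140.0       # leaf: small home, weekly-stay tier
    return 175.0           # leaf: small home, short stay

# Because the model is ordinary local code, it runs anywhere Python runs,
# on a laptop, a server fleet, or an IoT device, with no service calls.
print(predict_rental_price(nights=3, bedrooms=3, season="summer"))  # 310.0
```

The point of the sketch is the deployment property: once exported, scoring is a plain function call with no network dependency, which is what makes running the same model on one machine or a million machines essentially free.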
All this means the traditional enterprise software vendor approach (SAS, SAP, IBM, etc.) of selling large bundles of software, including components and modules that sit unused, is slowly nearing its end of days. Users want innovation, not the re-stitching of 20-plus-year-old platforms buttressed with spend-heavy traditional marketing to keep up with budding tools that have adopted cloud-based, distributed, machine-learning-driven approaches from birth. On top of that, over the last decade companies have become much more comfortable working with innovative startup providers that offer a SaaS model built on IaaS and PaaS substrates and do not shy away from passing the savings from low overhead costs on to their customers. Since the playbook of best practices needed to operate a cloud-born company is common knowledge these days, we will likely witness the product selection bias tipping increasingly away from the incumbents.
Machine Learning made “beautiful”?
When I first heard BigML’s motto, my thought was “Who would associate Machine Learning with beauty?” However, after seeing audience reactions ranging from “Wow!” and “Very cool” to the occasional “This is like IBM Watson and Tableau put together,” I have come to appreciate the effort the BigML team has put into enabling a highly streamlined and understandable machine learning workflow, one capable of demystifying mathematically complex Machine Learning concepts even for the uninitiated. Beautifully simple indeed. It has finally occurred to me that living in an era ushered in by Steve Jobs has spoiled us all with much higher expectations of any and all products we come to touch. Naturally, there is no reason why the same should not apply to Machine Learning software.
While such meaningful progress is being made toward making machine learning more usable and understandable for a broader set of technical and business users, today’s typical practitioners are mainly Developers, Data Scientists, and “Data Wranglers”, the unsung heroes of any analytics project. Which brings us to an inconvenient reality: there are just not enough of these well-balanced teams of practitioners in the market to meet the exploding demand. So what are companies to do at a time when Machine Learning is supposed to take center stage in their irreversible digital evolution?
If you are tasked with building a Data Science team, I recommend starting by assigning tasks to Business Analysts and Data Wranglers. They should clearly prioritize and formulate the business problem, then extract, transform, and prepare the relevant data sources for predictive modeling. Most practitioners agree that up to 80% of the time and effort is spent on those stages of the process. In parallel, you can aim to find a person with “practical” experience in the area of machine learning. WARNING: if during your hiring process you come across a Data Scientist whose career highlight is an exotic algorithm he has been working on for the last 2 years, say THANK YOU and RUN the other way. That is more of a Machine Learning Researcher profile than a practitioner’s. Frankly, many companies don’t need that level of specialization in a field that has been around for over half a century, with many proven techniques and approaches already productized and available as RESTful API endpoints. Just add data.
As far as BigML is concerned, we are Data Scientist agnostic. We recognize that a “practical” and well-aligned Data Scientist can empower a broader team with relevant Machine Learning knowledge, allowing them to explore problems that matter to the business more efficiently. Equally important is a scalable, programmable, and easy-to-use MLaaS (Machine Learning as a Service) tool that lets capable Developers and in-house SMEs build, test, and deploy predictive use cases that learn and get better over time. The results speak for themselves as far as business impact is concerned, regardless of whether one can derive the mathematical formulae that gave rise to the Random Decision Forest algorithm. Bootstrapping is the call of the day, which is only fitting in an assume-nothing, test-everything, let-the-data-be-your-guide Lean Startup world.
Admittedly, some of these steps are easier written here than actually delivered, so, as expected, companies will need to do their homework and carefully identify the differences between MLaaS providers. Well, I have already done my due diligence, and now I am sprinting with BigML.