
Machine Learning for MBAs? Yes, they can!

Two weeks ago, I had the chance to conduct a workshop at the University of California, Berkeley’s Haas School of Business as part of Professor Gregory La Blanc’s Data Science and Strategy class for MBAs and business leaders. This meant showcasing a subset of the comprehensive Machine Learning capabilities of the BigML platform, such as Models (Decision Trees), Logistic Regressions, and Ensembles, while solving example predictive use cases centered on disease diagnostics and credit risk analysis. The best part was that those in the classroom got to replicate those use cases in their own BigML accounts instead of passively observing.

Haas School of Business

According to the syllabus, the objective of the Data Strategy course is to provide an understanding of the role of data and statistical analysis in managerial decision-making with a specific focus on the role of managers as both consumers and producers of information, illustrating how finding and/or developing the right data and applying appropriate statistical methods can help solve problems in business. As such, the main focus areas are developing literacy within the potentially intimidating field of quantitative analytics and the ability to assess existing business models from that analytical prism.

As an MBA who has followed a career trajectory spanning highly data-driven roles such as marketing analytics, software product management, and business intelligence, I have consistently been the beneficiary of following an empirical approach informed by insights based on business data harvested from various systems of record.

Haas School of Business MBAs

After the workshop, I’m very encouraged to have seen the conviction and the resolve from tomorrow’s MBA candidates to own up to the “In God we trust, but all else bring data” mentality. In addition to that broader impression, I’d like to share some findings from an informal survey shared with the attendees.

  • The class had a good mix of those with technical degrees (engineering, math, etc.) and non-technical degrees.
  • Based on survey feedback, more than two-thirds of the class did not have any prior experience with Machine Learning whatsoever. The remaining ones had some limited exposure in the form of self-learning or a related class they took as part of their former technical education. With that said, none had practiced Machine Learning in their prior careers. All in all, they were newbies to Machine Learning.
  • On a very positive note, after the workshop, most respondents thought Machine Learning can be described as a more advanced form of analytics while some opined that it’s also increasingly a must-learn skill set for any white-collar professional. Interestingly, no attendees mentioned that Machine Learning is too complex and confusing or “overhyped” even though those were also offered as attitudinal choices. We’ve been observing this new behavior for multiple years now. Some refer to it as the Citizen Data Scientist movement even though I don’t much fancy that phrase but am fully in support of the core concept it represents.
  • On a very positive note, after the workshop, most respondents thought Machine Learning can be described as a more advanced form of analytics, while some opined that it’s also increasingly a must-learn skill set for any white-collar professional. Interestingly, no attendees said that Machine Learning is too complex and confusing or “overhyped”, even though those were also offered as attitudinal choices. We’ve been observing this new behavior for multiple years now. Some refer to it as the Citizen Data Scientist movement, a phrase I don’t much fancy, though I fully support the core concept it represents.
  • Perhaps the most interesting feedback was related to the main motives for learning Machine Learning. Almost all respondents agreed that they would like to be able to better communicate with Machine Learning specialists or Data Engineers in their future jobs by having a good grasp of the core concepts of Machine Learning (e.g., cut through ‘hype’ or jargon), as well as being self-sufficient when it comes to discovering insights in business data they have direct access to. Following those top two reasons was the perception that Machine Learning has become a skill highly desired by employers, potentially giving them an edge when re-entering the job market. Close behind that third motivation was the fact that some find Machine Learning intellectually stimulating regardless of its implications for their future careers. I suspect those were skewed to the left-brained ones with technical degrees.
  • Last but not least, almost everyone in the classroom thought that they were likely to use BigML especially when they are considering a new predictive use case where they have access to relevant business data.

I predict future business leaders will follow in the footsteps of examples like NDA Lynn such that they won’t be afraid to autonomously initiate and execute their search for new business insights with or without help from scientists and/or researchers in their organizations. We’ll keep tirelessly promoting the promise and potential of Machine Learning and see how far we can take this prediction.

Machine Learning Summer School in The Netherlands: First Edition!

BigML and Nyenrode Business Universiteit are thrilled to announce the first edition of our Machine Learning Summer School in The Netherlands! The four-day event will take place at Nyenrode Business University, in Breukelen, and the program is designed to cater to different professional profiles and their needs:

  • Machine Learning for Executives – July 8 (day 1): A C-level course on Machine Learning, ideal for business leaders and senior executives in all industries. Attendees will be able to understand how Machine Learning can be adopted in any organization, focusing on the strategy to follow as well as the key points that managers should know when making decisions. Additionally, we will see several real-world success stories presented by companies that are currently applying Machine Learning techniques.
  • MAIN CONFERENCE: Introduction to Machine Learning – July 9 and 10 (days 2-3): A two-day crash course designed for business innovators, industry practitioners, as well as students, seeking a quick, practical, and hands-on introduction to Machine Learning to solve real-world problems. The content presented during these two days will serve as a good introduction to the kind of work that students can expect if they enroll in advanced Machine Learning and AI Masters.
  • Working with the Masters – July 11 (day 4): A full day of learning with the Machine Learning masters that helps put theoretical concepts into practice in a hands-on manner. This course is tailored for experienced business analysts, data scientists, and Machine Learning practitioners that wish to work on real-world data and real use cases; a unique opportunity to work with leading Machine Learning experts. Attendees will be able to bring their own data.

Where

Nyenrode Business Universiteit, Straatweg 25, 3621 BG Breukelen, The Netherlands. See map here.

When

4-day event: on July 8-11, 2019 from 8:30 AM to 5:00 PM CEST.

Tickets

Please purchase your ticket(s) here. We recommend that you register soon as space is limited. You can join the complete four-day event for a full experience or just the courses you find most interesting!

Schedule

You can check out the full agenda and other details of the event here.

Networking

Get to know the lecturers and speakers and other attendees during the networking breaks and dinners we offer after the sessions. We expect hundreds of locals as well as Machine Learning practitioners and experts attending from all around the world!

Do not hesitate to contact us at education@bigml.com if you would like to co-organize a Machine Learning School in your city, as we look forward to growing the Machine Learning Schools series!

Machine Learning Internship: Standing on the shoulders of giants

I took this photo at the Valencian Summer School in Machine Learning 2018. That was my second Summer School, but my first one as a BigML intern: my internship had started just a few days earlier. Things have changed a lot since I published this tweet last September, so let me provide some context for it.

Internship tweet

What happened between both Summer Schools? I realized that almost all my viewpoints about Machine Learning were wrong.

I belong to the most adaptive and agile generation ever. People call us Millennials. We were born during the dot-com bubble, and we lived through the dot-com crash. We saw the first iPhone keynote and the transformation from taxis to Ubers and hotels to Airbnbs. We know hype well, and we’re starting to learn how to separate hype from real value. It was in my first Valencian Summer School, and more specifically, during the Enrique Dans talk, when I decided to unlearn everything I had been told previously about Machine Learning.

I forgot about killer robots, machines replacing doctors, or trying to build KITT. Instead, I started to think about finding patterns in data that can help doctors make decisions, reduce energy waste, or help save lives by preventing disasters.

In the same way, I forgot about unaffordable GPUs, countless hours spent programming every single line of every ML algorithm, and the frustration of not being able to find the best hyper-parameters for my model. Instead, I started to focus on the problem, not the tool, and let BigML do the rest for me. After all, why shy away from standing on the shoulders of giants?

And that was my philosophy during this internship. I got certified as a BigML Engineer, worked on multiple real-world use cases and created workflows with WhizzML to perform Feature Selection. And then, I met one of those giants to stand on, Jao, BigML’s CTO. With him, I started working on BigML’s backend, called wintermute.

I discovered the benefits of functional programming with Jao, and he even introduced me to the emacs religion! The experience I gained with WhizzML helped me to move forward and abandon the Algol family of languages. Clojuredocs was my homepage during those days, and it still is.

There is an interesting internal project I’ve been involved in that I would also like to mention. It’s called Neuromancer. With Neuromancer, we can see how well our resources scale, beyond the Big-O notation. It lets us test possible optimizations for all of BigML’s models.

Pablo González

Looking back, the journey has been long, but this is only the beginning. Now, as a full-time employee of BigML, I will keep contributing to our mission of democratizing Machine Learning as it penetrates all corners of our globe. Like a bamboo plant set in stable ground a while back, we see a few new shoots growing each and every day. Soon enough, when the roots are fully established underground, it will grow like crazy, positively impacting Millennial careers for decades to come.

Deep Learning, Part 3: Too Deep or Not Too Deep? That is the Question.

In my previous two posts in this series, I’ve essentially argued both sides of the same issue. In the first, I explained why deep learning is not a panacea, when machine learning systems (now and likely always) will fail, and why deep learning in its current state is not immune to these failures.

Deep Learning

In the second post, I explained why deep learning, from the perspective of machine learning scientists and engineers, is an important advance: Rather than a learning algorithm, deep learning gives us a flexible, extensible framework for specifying machine learning algorithms. Many of the algorithms so far expressed in that framework give orders of magnitude-level improvement on the performance of previous solutions. In addition, it’s a tool that allows us to tackle some problems heretofore unsolvable directly by machine learning methods.

For those of you wanting a clean sound bite about deep learning, I’m afraid you won’t get it from me. The reason I’ve written so much here is that I think the nature of the advance that deep learning has brought to machine learning is complex and defies broad judgments, especially at this fairly early stage in its development. But I think it is worth taking a step back to try to understand which judgments are important and how to make them properly.

Flaky Machines or Lazy People?

This series of posts was motivated in part by my encounters with Gary Marcus’ perspectives on deep learning. At the root of his positions is the notion that deep learning (and here he means “statistical machine learning”) is, in various ways, “not enough”. In his Medium post, it’s “not enough” for general intelligence, and in the Synced interview it’s “not enough” to be “reliable”.

This notion of whether current machine learning systems are “good enough” gets to the heart of the back and forth on deep learning. Marcus cites driverless cars as an example of how AI isn’t mature enough yet to rely on 100%, and that AI needs a “foundational change” to ensure a safe level of reliability. There’s a bit of ambiguity in the interview about what he means by AI, but my own impression is that this is less of a critique of machine learning, and more of a critique of the software around it.

For example, we have vision systems able to track and identify pedestrians on the road. These systems, as Marcus says, are mostly reliable but certainly make occasional mistakes. The job of academic and corporate researchers is to create these systems and make them as error-free as possible, but in the long run, they will always have some degree of unreliability.

Something consumes the predictions of these vision systems and acts accordingly; it is and always will be the job of that thing to avoid treating these predictions as the unvarnished truth. If the predictions were guaranteed to be correct, the consumer’s job would be much easier. As it is, consuming the predictions of a vision system requires some level of cleverness and skepticism. Maybe that cleverness involves awareness of separate sensor systems or other information streams like location and time of day. It might require symbolic approaches of the type Marcus favors. It might require more and very different deep learning, as Yann LeCun suggests. It might require something that’s entirely new.

Designing software that works properly with machine-learned models is hard. You have to do the difficult work of characterizing the model’s weaknesses and engineering around them. But critical readers should reject the notion that machine learning needs to provide extreme reliability on its own in order to be useful in mission critical situations. If a vision system can accurately find and track 95% of pedestrians, and other sensors and logic pick up the remaining 5%, you’ve arrived at “enough” without having a perfect model.

When is “Enough” Enough?

So then the question becomes, “are we there yet?” with current ML systems. That depends, of course, on how good we think we need them to be for the engineers and domain experts to pull their outputs across the finish line. There are a lot of areas in which deep learning puts us within shouting distance, but in general, whether or not we’re there yet depends in turn on what you want the system to do and the quality of your engineers. When thinking about that question, though, it’s important to consider that the finish line might not be exactly where you think it is.

Consider the problem of machine translation. Douglas Hofstadter wrote a great article where he systematically lays bare the flaws in state-of-the-art machine translation systems. He’s right: For linguistic ideas with even a little complexity, they’re not great and are at times totally unusable. But the whole article reminded me of a blog post Hal Daumé III wrote more than 10 years ago, when he and I were both recent Ph.D.s. In it, he wonders how much of human translation is better than computer translation when you really consider everything (street signs, menus, simple interpersonal interactions, and so on). Again, he asked this more than ten years ago.

The point here is that if machine translation for these things is already noticeably better than the second-rate human translations we apply in practice (or was ten years ago), there’s already a sense in which the models we have are very much good enough. How it deals with more complex phrases and ideas is an interesting question, and might yield new research directions, but this is all academic as far as its applicability is concerned. The existing technology, imperfect as it is, has a use and a place in society.

Even less relevant is how “deep” the model’s knowledge is, or how “stupid” it is, or whether the algorithm is “actually learning” (whatever that means). These are all flavors of the “computers don’t really understand what they’re doing” argument that traces its way through Hofstadter, John Searle,  Alan Turing and dozens of other philosophers all the way back to Ada Lovelace. There are loads of counter-arguments (I have even spun out a few of my own versions), but maybe the most compelling reason to ignore these questions is that the answers are often less interesting than the answer to the question, “Can we use it?”

A number of years ago, my wife and I hosted two members of a Belgian boys choir that was on tour. Neither she nor I spoke any French, so we relied on Google Translate to communicate with them. To this day, I remember typing “We made a pie.  Would you like some?” into my phone and watching their faces light up as the translation appeared. Did the computer understand anything about pie, or generosity, or the happiness of children, or how its own flawed translations could help create indelible memories?  Probably not.  But we did!

The Final Exam

Artificial Intelligence

The criticism that machine learning is not enough on its own to produce systems that exhibit reliably intelligent behavior is a broken criticism. Deep learning gets us part of the way towards such systems, perhaps quite a lot of the way, but does anyone think it’s necessary or even advisable to cede the entire behavior of, say, a car to a machine-learned model? Saying no doesn’t mean backing away from a fully-autonomous car; as Marcus himself points out, there are other techniques in AI and software at large that are better suited to certain aspects of these problems. There can be many layers of human-comprehensible logic sitting between deep learning and the gas pedal, and it’s likely the totality of the system, rather than the learned component alone, that will display behavior that we might recognize as intelligent.

Is it a flaw or a problem with deep learning when it can’t solve the aspects of these problems that no one really wants or needs solved?  I don’t think so. Again, paraphrasing Marcus (and myself), machine learning is a tool. If you buy a nail gun and it jams, then yeah, that’s a problem with the nail gun, but if you try to use a nail gun to cut a piece of wood in half, that’s more of a problem with you. Deep learning is a very important step forward in the evolution of the tool (and a large one compared to other recent steps), but that step doesn’t change its fundamental nature. No matter what improvements you make, a nail gun is never going to become a table saw. Certainly, it’s unethical and bad business for tool manufacturers to make inflated claims about their tool’s usefulness, but it’s finally the job of the operator to determine which tool to use and how to use it.

Pundits can argue all day long about how impactful deep learning is and how smart machine learning can possibly be, but none of those arguments will matter in the long run. As I’ve said before, the only real test of the usefulness of machine learning is if domain experts and data engineers can leverage it to create software that has value for other human beings. Therein lies the power, the only real power, of new technology and the only goal that counts.

Deep Learning, Part 2: Depth Charge

In the first in this series of posts, I discussed a bit about why deep learning isn’t fundamentally different from the rest of machine learning, and why that lack of a difference implies many contexts in which deep networks will fail to perform at a human level of cognition, for a variety of reasons.

Deep Learning Depth Charge

Is there no difference, then, between deep learning and the rest of machine learning? Is Stuart Russell right that deep learning is nothing more than machine learning with faster computers, more data, and a few clever tricks, and is Rich Sutton right that such things are the main drivers behind machine learning’s recent advances?

Certainly, there’s a sense in which that’s true, but there’s also an important sense in which it’s not. More specifically, deep learning as it stands now represents an important advance in machine learning for reasons mostly unrelated to the access to increasing amounts of computation and data. In the last post, we covered how understanding deep learning is the same as the rest of machine learning is the key to knowing some of the problems that deep learning does not solve. Here, let’s try to understand how deep learning is different and maybe along the way we’ll find some problems that it solves better than anything else.

How To Create a Machine Learning Algorithm

Before talking about deep learning, it’s important to know how existing machine learning algorithms have been created by the academic community. Oversimplifying dramatically, there’s usually a two-step process:

  1. Come up with some objective function
  2. Find a way to mathematically optimize that function with respect to training data

In machine learning parlance, usually, the objective function is basically any way of measuring the performance of your classifier on some data. So one objective function would be “What percent of the time does my classifier predict the right answer?”. When we optimize that objective, we mean that we learn the classifier’s internal workings so that it performs well on the training data by that measurement. If the measurement were “percent correct” as above, it means that we learn a classifier that gets the right answer on the training data all of the time, or as close to it as possible.

An easy, concrete example is ordinary least squares regression: The algorithm seeks to learn weights to minimize the squared difference between the model’s predictions and the truth in the data. There’s the objective function (the sum of the squared differences) and the method used to optimize it (ordinary least squares).
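
To make the two-step recipe concrete, here is a minimal, hypothetical sketch (plain NumPy, not BigML code) that takes the sum of squared residuals as the objective and optimizes it with gradient descent on made-up data:

import numpy as np

# Made-up toy data: fit the line y = b0 + b1 * x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.1, 4.9, 7.2, 8.8])

b0, b1 = 0.0, 0.0   # the parameters to be learned
lr = 0.01           # learning rate

for _ in range(5000):
    pred = b0 + b1 * x
    # Step 1: the objective is the sum of squared residuals, sum((y - pred)^2)
    # Step 2: optimize it, here with plain gradient descent on b0 and b1
    grad_b0 = -2 * np.sum(y - pred)
    grad_b1 = -2 * np.sum((y - pred) * x)
    b0 -= lr * grad_b0
    b1 -= lr * grad_b1

print(b0, b1)   # converges to the least-squares line, roughly b0 = 1.06, b1 = 1.97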

There are a lot of different variations on this theme. For example, most versions of support vector machines for classification are based on the same basic objective, but there are a lot of algorithms to optimize that objective, from quadratic programming approaches like the simplex and interior point methods to cutting plane algorithms, to sequential minimal optimization. Decision trees admit a lot of different criteria to optimize (information gain, Gini index, etc.), but given a criterion, the method of optimizing it is basically the same.

Importantly, these two steps don’t usually happen in sequence because one depends on the other: When you’re thinking of an objective function, you immediately put aside ones that you know you can’t mathematically optimize, because there’s really no use for such a function that you can’t optimize against the data that you have. It turns out that the majority of objective functions you’d like can’t be efficiently optimized mathematically, and in fact, much of the machine learning academic literature is devoted to finding clever objective functions or new and faster ways of optimizing existing ones.

From Algorithms to Objectives

Now we can understand the first way deep learning is rather different from the usual way of doing machine learning. In deep learning, we typically use only one basic way of optimizing the objective function, and that’s a family of algorithms known collectively as gradient descent. Gradient descent means a bunch of different things, but they all rely on knowing the gradient of the objective function with respect to every parameter in the model. This, of course, means calculus, and calculus is hard.

Deep Learning

In deep networks, this is even harder because of the variety of things like activation functions, topologies, and types of connections. The programmer needs to know the derivative of the objective with respect to every parameter in every possible topology in order to optimize it, and then they also need to know the ins and outs of all of the gradient descent algorithms they want to offer. Engineering-wise, it’s a nightmare!

The saving grace here is that the calculus is very mechanical: Given a network topology in all of its gory detail, the process of calculating the gradients is based on a set of rules and it’s not a difficult calculation, just massive and tedious and prone to small mistakes. So a whole bunch of somebodies finally buckled down, got the collection of rules together and turned that tedious process into computer code. The upshot is that now you can use programming to specify any type of network you want, including some objective, then just pass in the data and all of the calculus is figured out for you.

This is called automatic differentiation, and it’s the main thing that separates deep learning frameworks like Theano, Torch, Keras, and so on from the rest of computation. You just have to be able to specify your network and your objective function in one of these frameworks, and the software “knows” how to optimize it.

Put another way: instead of telling the computer what to do, you’re now just telling it what you want.
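
As a hypothetical illustration, here is roughly what that looks like in PyTorch (a close relative of the Torch framework mentioned above): you only write down the parameters and the objective, and automatic differentiation supplies every gradient:

import torch

# The parameters we want to learn
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

# Toy inputs and targets
x = torch.tensor([0.0, 1.0, 2.0, 3.0])
y = torch.tensor([1.0, 3.0, 5.0, 7.0])

# Specify what we want: a prediction and an objective (squared error)
loss = ((w * x + b - y) ** 2).sum()

# Automatic differentiation: the framework derives the gradients for us
loss.backward()
print(w.grad, b.grad)   # d(loss)/dw and d(loss)/db, no hand calculus required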

More Than What You Need

It’s hard to overstate how much more flexible this is as an engineering approach than what we used to do. Before deep learning, you’d look at the rather small list of problems that machine learning was able to solve and try to cram your specific problem into one of those bins. Is this a classification or regression problem? Use boosting or SVMs. Is it a clustering problem? Use k-means or g-means. Is it metric learning? LMNN or RCA! Collaborative filtering? SVD or PLSI! Label sequence learning? HMMs or CRFs or even M3Ns! Depending on what your inputs and desired outputs look like, you choose an acronym and off you go.

This works well for a lot of things, but for others, there’s just not a great answer. What if your input is a picture and your output is a text description of that picture? Or what if your input is a sequence of words and your output is that sequence in a different language? Before, you’d say things like, “Well, even though it’s sequences, this is kind of a classification problem, so we can preprocess the input with a windowing function, then learn a classifier, then transform the outputs, then learn another classifier on those outputs, then apply some sort of smoothing” and so on. The alternative to this sort of shoehorning the problem into an existing bag was to design a learning algorithm from scratch, but both options come to much the same thing: A big job for someone with loads of domain knowledge and machine learning expertise.

With a deep learning framework, you just say “this is the input, here’s what the output should be, and here’s how to measure the difference between the model’s output and what it should be. Give me a model.” Does this always work? Certainly not; all of those acronyms are not going away anytime soon. But the fact that deep learning can often be made to work for problems of any of the types above with very little in the way of domain knowledge or mathematical insight, that’s a powerful new capability.

Perhaps you can also see how much more extensible this is vs. the previous way of doing things. Any operation that is differentiable can be “thrown onto the pile”, its rules added to the list of rules for differentiation, and then it can become a part of any network structure. One very important operation currently on the pile is convolution, which is the current star of the deep learning show. This isn’t a scientific innovation; that convolution kernels can be learned via gradient descent is almost 30-year-old news, but in the context of an engine that can automatically differentiate the parameters of any network structure, you end up using them in combination with things like residual connections and batch normalization, which pushes their performance to new heights.

Maybe just as important as the flexibility of deep learning frameworks is the fact that the gradient descent typically happens in tiny steps, where a small slice of the training data updates the classifier at each one. This may not seem like a big deal, but it means that you can take advantage of effectively an infinite amount of training data. This allows you to use a simulator to augment your training data and make your classifier more robust, which is a natural fit in areas like computer vision and game playing. Other machine learning algorithms have the same sort of update behavior, but given the complexity of the networks needed to solve some problems and the amount of data needed to properly fit them, data augmentation becomes a game-changing necessity.

You’re probably starting to realize now that “Deep Learning” isn’t really a machine learning algorithm, per se. Closer to the truth is that it’s a language for expressing machine learning algorithms, and that language is still getting more expressive all the time. This is partially what our own Tom Dietterich means when he says that we don’t really know what the limits are for deep learning yet. It’s tough to see where the story ends if only because its authors are still being provided with new words they could say. To say that something will never be expressible in an evolving language such as this one seems premature.

Some Seasoning

Now, even considering the huge caveat that is the first post in this series, I’d like to put forth a couple of additional grains of salt. First, the above narrative gives the impression of complete novelty to the compositional nature of deep learning, but that’s not entirely so. There have been previous attempts in the machine learning literature to create algorithmic frameworks of this sort. One such attempt that comes to mind is Thorsten Joachims’ excellent work developing SVM-struct, which does for support vector machines much of what automatic differentiation does for deep learning. While SVM-struct does allow you to attack a diverse set of problems, it lacks the ability to incorporate new operators that constantly expand the space of possible internal structures for the learned model.

Second, admittedly, I may have oversimplified things a bit. The complexity of creating or selecting a problem-specific machine learning algorithm has not disappeared entirely. It’s just been pushed into the design of the network architecture: Instead of having a few dozen off the shelf algorithms that you must understand, each with five or ten parameters, you’ve now got the entire world of deep learning to explore with each new problem you encounter. For problems that already have well-defined solutions, deep learning’s flexibility can be more of a distraction than an asset.

The Proof Is In The Pudding

All of that said, it would be silly to talk about any of this if it hadn’t led to some absurd gains in performance in marquee problems. I’m old enough to have been a young researcher in 2010, when state-of-the-art results were claimed on CIFAR-10 at just over 20% error. Now, multiple papers claim error rates under 3% and even more claim sub-20% error on ImageNet, which has 100 times as many classes. We see similar sorts of improvement in object detection, speech recognition, and machine translation.

In addition, this formalism allows us to apply machine learning directly to a significantly broader set of possible problems like question answering, or image denoising, or <deep breath> generating HTML from a hand-drawn sketch of a web page.

HTML Deep Learning

Even though I saw that last one for the first time a year ago, it still makes me a bit dizzy to think about it. To someone in the field who spent countless hours learning how to engineer the domain-specific features needed to solve these problems, and countless more finessing classifier outputs into something that actually resembled a solution, seeing these algorithms work, even imperfectly, borders on a religious experience.

Deep learning is revolutionary from a number of angles, but most of those are viewable primarily from the inside of the field rather than the outside. Deep learning as it is now is not an “I’m afraid I can’t do that”-level advance in the field. But for those of us slaving away in the trenches, it can look very special. Even the closing barb of Gary Marcus’ Synced interview has a nice ring to it:

They work most of the time, but you don’t know when they’re gonna do something totally bizarre.

Really? Machine learning? Working most of the time? It’s music to my ears! I think Marcus is talking about driverless cars here, but roughly the same thing could be said of state-of-the-art speech recognition or image classification. The quote is an unintentional compliment to the community; Marcus doesn’t seem to be aware of how recent this level of success is, how difficult it was to get here, and how big of a part deep learning played in the most recent ascent. Why yes, it is working most of the time! Thank you for noticing!

With regard to the second half of that quote, clearly there’s work left to be done, but what does the rest of that work look like and who gets to decide? In the final post in this series, I’ll speculate on what the current arguments about machine learning say about its use and its future. So stay tuned…

Deep Learning, Part 1: Not as Deep as You Think

Gary Marcus has emerged as one of deep learning’s chief skeptics. In a recent interview, and a slightly less recent Medium post, he discusses his feud with deep learning pioneer Yann LeCun and some of his views on how deep learning is overhyped.

I find the whole thing entertaining, but much of the time LeCun and Marcus are talking past each other more than with each other. Marcus seems to me to be either unaware of or ignoring certain truths about machine learning, and LeCun seems to basically agree with Marcus’ ideas in a way that’s unsatisfying for Marcus.

The temptation for me to brush 10 years of dust off of my professor hat is too much to ignore. Outside observers could benefit greatly from some additional context in this discussion and in this series of posts I’ll be happy to provide some. Most important here, in my opinion, is to understand where the variety of perspectives come from, and where deep learning sits relative to the rest of machine learning. Deep learning is both an incremental advance and a revolutionary one. It’s the same old stuff and something entirely new. Which one you see depends on how you choose to look at it.

The Usual Awfulness of Machine Learning

Marcus’ post, The Deepest Problem with Deep Learning, is written partly in response to Yoshua Bengio’s recent-ish interview with Technology Review. In the post, Marcus comes off as a bit surprised that Bengio’s tone about deep learning is circumspect about its long-term prospects, and goes on to reiterate some of his own long-held criticisms of the field.

Most of Marcus’ core arguments about deep learning’s weaknesses are valid and maybe more uncontroversial than he thinks: All of the problems with deep learning that he mentions are commonly encountered by practitioners in the wild. His post doesn’t belabor these arguments. Instead, he spends a good deal of it suggesting that the field is either in denial or deliberately misleading the public about the strengths and weaknesses of deep learning.

Not only is this incorrect, but it also unnecessarily weakens his top line arguments. In short, the problems with deep learning are worse than Marcus’ post suggests, and they are problems that infect all of machine learning. Alas, “confronting” academics with these realities is going to be met with a sigh and a shrug, because we’ve known about and documented all of these things for decades. However, it’s more than possible that, with the increased publicity around machine learning in the last few years, there are people out there who are informed about the field at a high-level while only tangentially aware of its well-known limitations. Let’s review those now.

What Machine Learning Still Can’t Do

By now, examples of CNN-based image recognition being “defeated” by various unusual or manipulated input data should be old news. While the composition of these examples is an interesting curiosity to those in the field, it’s important to understand why they are surprising to almost no one with a background in machine learning.

Consider the following fake but realistic dataset, in which we know the height, weight, and number of pregnancies for eight people, and we want to predict their sex based on those variables:

Height (in.)  Weight (lbs.)  Pregnancies  Sex
72            170            0            M
71            140            0            M
74            250            0            M
76            300            0            M
69            160            0            F
65            140            2            F
60            100            1            F
63            150            0            F

Any reasonable decision tree induction algorithm will find a concise classifier (Height > 70 = Male else Female) that classifies the data perfectly. The model is certainly not perfect, but also not a terrible one by ML standards, considering the amount of data we have. It will almost certainly perform much better than chance at predicting peoples’ sex in the real world. And yet, any adult human will do better with the same input data. The model has an obvious (to us) blind spot: It doesn’t know that people over 5’10” who have been pregnant at least one time are overwhelmingly likely to be female.

This can easily be phrased in a more accusatory way: Even when given training data about men and women and the number of pregnancies each person has had, the model fails to encode any information at all about which sex is more likely to get pregnant!

It sounds pretty damning in those words; the model’s “knowledge” turns out to be incredibly shallow. But this is not a surprise to people in the field. Machine learning algorithms are by design parsimonious, myopic, and at the mercy of the amount and type of training data that you have. More problems are exposed when we allow the case of adversarially selected examples, where you are allowed to present examples constructed or chosen to “fool” the model. I’ll leave it as an exercise for the reader to calculate how well the classifier would do on a dataset of WNBA players and Kentucky Derby jockeys.
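
To make the blind spot concrete, here is a small, hypothetical sketch (scikit-learn rather than BigML) that trains a depth-one decision tree on the toy dataset above and then asks it about a tall person who has been pregnant:

from sklearn.tree import DecisionTreeClassifier

# The toy dataset from above: [height_in, weight_lbs, pregnancies]
X = [[72, 170, 0], [71, 140, 0], [74, 250, 0], [76, 300, 0],
     [69, 160, 0], [65, 140, 2], [60, 100, 1], [63, 150, 0]]
y = ["M", "M", "M", "M", "F", "F", "F", "F"]

# A single split separates the training data perfectly, and it is on height
tree = DecisionTreeClassifier(max_depth=1).fit(X, y)

# A tall person who has been pregnant twice: the model still says "M",
# because nothing in the training data tells it otherwise
print(tree.predict([[71, 165, 2]]))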

Enter Deep Learning, To No Fanfare At All

Deep learning is not different (at least in this way) from the rest of statistical learning: All of the adversarial examples presented in the image recognition literature are more or less the same as the 5’11” person who’s been pregnant; there was nothing like that in the dataset, so there’s no good reason to expect the model would get it right, despite the “obviousness” of the right answer to us. 

There are various machine learning techniques for addressing bits and pieces of this problem, but in general, it’s not something easily solvable within the confines of the algorithm. This isn’t a “flaw” in the algorithm per se; the algorithm is doing what it should with the data that it has. Marcus is right when he says that machine-learned models will fail to generalize to out-of-distribution inputs, but, I mean, come on. That’s the i.i.d. assumption! It’s been printed right there on the tin for decades!  

Use at your own risk

Marcus’ assertion that “In a healthy field, everything would stop when a systematic class of errors that surprising and illuminating was discovered” presupposes that researchers in the field were surprised or problems illuminated by that particular class of errors. I certainly wasn’t, and my intuition is that few in the field would be. On the contrary, if you show me those images without telling me the classifier’s performance, I’m going to say something like “that’s going to be a tough one for it to get right”.

In the back-and-forth on Twitter, Marcus seems stung that the community is “dismissive” of this type of error, and scandalized that the possibility of this type of error isn’t mentioned in the landmark Nature paper on deep learning, and herein, I think, lies the disconnect. For the academic literature, this is too mundane and well-known of a limit to bother stating. Marcus wants a field-wide and very public mea culpa for a precondition of machine learning that was trotted out repeatedly during our classes in grad school. He will probably remain disappointed. Few in the community will see the need to restate that limitation every time there’s a new advance in machine learning; the existence of that limit is a part of the context of every advance, as much as the existence of computers themselves. 

For communications with the public at large outside of the field, though, perhaps Marcus is right that such limits could take center stage a bit more often (as Bengio rightly puts them in his interview). Yes, it’s true! You can almost always find a way to break a machine learning model by fussing with the input data, and it’s often not even very hard! One more time for the people in the back:

People who think deep learning is immune to the usual problems associated with statistical machine learning are wrong, and those problems mean that many machine learning models can be broken by a weak adversary or even subtle, non-adversarial changes in the input data.

This makes machine learning sound pretty crummy, and again elicits quite a bit of hand-wringing from the uninitiated.  There are breathless diatribes about how machine learning systems can be, horror of horrors, fooled into making incorrect predictions!  They’re not wrong; if you’re in a situation where you think such trickery might be afoot, that absolutely has to be dealt with somewhere in your technology stack.  Then again, this is so even if you’re not using machine learning.

Fortunately, there are many, many cases where this sort of brittleness is just not that much of a problem. In speech recognition, for example, there’s no one trying to “fool” the model and languages don’t typically undergo massive distributional changes or have particularly strange and crucial corner cases. Hence, all speech recognition systems use machine learning and the models do well enough to be worth billions of dollars.

Yes, all machine-learned models will fail somehow. But don’t conflate this failure with a lack of usability.

Not Even Close

I won’t go as deeply into Marcus’ other points (such as the limits on the type of reasoning deep learning can do or its ability to understand natural language), but I found it interesting how closely those points coincide with someone else’s arguments about why “strong AI” probably won’t happen soon. That was written before I’d even heard of Gary Marcus, and the relevant section is composed mostly of ideas that I heard many times over the course of my grad school education (which is now far – disturbingly far – in the past). Yes, these points are again valid, but among people in the field, they again have little novelty.

By and large, Marcus is right about the limitations of statistical machine learning, and anyone suggesting that deep learning is spectacularly different on these particular axes is at least a little bit misinformed (okay, maybe it’s a little bit different). For the most part, though, I don’t see experts in the field suggesting this. Certainly not to the pathological levels hinted at by Marcus’ Medium post. I do readily admit the possibility that, amid the glow of high-profile successes and the public spotlight, all of the academic theory and empirical results showing exactly how and when machine learning fails may get lost in the noise, and hopefully I’ve done a little to clarify and contextualize some of those failures.

So is that it, then? Is deep learning really nothing more than another thing drawn from the same bag of (somewhat fragile) tricks as the rest of machine learning?  As I’ve already said, that depends on how you look at it. If you look at it as we have in this post, yes, it’s not so very different. In my next post, however, we’ll take a look from another angle and see if we can spot some differences between deep learning and the rest of machine learning.

Linear Regression: A Technical Overview

BigML has added multiple linear regression to its suite of supervised learning methods. In this sixth and final blog post of our series, we will give a rundown of the technical details for this method.

Model Definition

Given a numeric objective field y, we model its response as a linear combination of our inputs x_1,\cdots,x_n, and an intercept value \beta_0.

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n = \beta_0 + \sum_{i=1}^n \beta_i x_i

Simple Linear Regression

For illustrative purposes, let’s consider the case of a problem with a single input. We can see that the above expression then represents a line with slope \beta_1 and intercept \beta_0.

y = \beta_0 + \beta_1 x

The task now is to find the values of \beta_0, \beta_1 that parameterize a line which is the best fit for our data. In order to do so we must obtain a metric which quantifies how well a given line fits the data.

residuals

Given a candidate line, we can measure the vertical distance between the line and each of our data points. These distances are called residuals. Squaring the residual for each data point and computing the sum, we get our metric.

S = \sum_{i=1}^n (y_i - (\beta_0 + \beta_1 x_i))^2

As one might expect, the sum of squared residuals is minimized when \beta_0, \beta_1 define a line that passes more or less through the middle of the data points.

Multiple Linear Regression

When we deal with multiple input variables, it becomes more convenient to express the problem using vector and matrix notation. For a dataset with n rows and p inputs, define \mathbf{y} as a column vector of length n containing the objective values, \mathbf{X} as an n \times p matrix where each row corresponds to a particular input instance, and \mathbf{\beta} as a column vector of length p containing the values of the regression coefficients. The sum of squared residuals can thus be expressed as:

S = ||\mathbf{y - X\beta}||_2^2

The value of \mathbf{\beta} which minimizes this is given by the closed-form expression:

\mathbf{\beta = (X^T X)^{-1} X^T y}

The matrix inverse is the most computationally intensive portion of solving a linear regression problem. Rather than directly constructing the matrix \mathbf{X} and performing the inverse, BigML’s implementation uses an orthogonal decomposition which can be incrementally updated with observed data. This allows for solving linear regression problems with datasets which are too large to fit into memory.
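
As a minimal sketch of the math above (toy data and plain NumPy, not BigML’s implementation), the closed-form normal equation and an orthogonal-factorization-based solver recover the same coefficients:

import numpy as np

# Hypothetical design matrix with an intercept column of ones and one input
X = np.column_stack([np.ones(5), np.array([0.0, 1.0, 2.0, 3.0, 4.0])])
y = np.array([1.0, 3.1, 4.9, 7.2, 8.8])

# Textbook normal equation: beta = (X^T X)^{-1} X^T y
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Numerically preferable: a least-squares solver built on orthogonal factorizations
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal, beta_lstsq)   # both approximately [1.06, 1.97]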

Predictions

Predicting new data points with a linear regression model is just about as easy as it can get. We simply take the coefficients \beta_0,\ldots,\beta_n from the model and evaluate the regression equation above to obtain a predicted value for y. BigML also returns two metrics that describe the quality of the prediction: the confidence interval and the prediction interval. These are illustrated in the following figure:

intervals

These two intervals carry different meanings. Depending on how the predictions are to be used, one will be more suitable than the other.

The confidence interval is the narrower of the two. It gives the 95% confidence range for the mean response. If you were to sample a large number of points at the same x-coordinate, there is a 95% probability that the mean of their y values will be within this range.

The prediction interval is the wider interval. For a single point at the given x-coordinate, its y value will be within this range with 95% probability.
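
For readers who want to see the distinction numerically, here is a small sketch using the textbook interval formulas for simple linear regression on synthetic data (illustrative only; this is not how BigML computes its intervals):

import numpy as np
from scipy import stats

# Synthetic data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])
n = len(x)

# Least-squares fit of y = b0 + b1 * x
sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()

# Residual standard error (n - 2 degrees of freedom) and the 95% t quantile
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
t = stats.t.ppf(0.975, n - 2)

x0 = 4.5                             # the point where we predict
y0 = b0 + b1 * x0
leverage = 1.0 / n + (x0 - x.mean()) ** 2 / sxx

ci = t * s * np.sqrt(leverage)       # half-width of the confidence interval (mean response)
pi = t * s * np.sqrt(1 + leverage)   # half-width of the prediction interval (single point)

print(f"prediction {y0:.2f}, CI +/-{ci:.2f}, PI +/-{pi:.2f}")   # the PI is always wider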

BigML Field Types and Linear Regression

In the regression equation, all of the input variables x_i are numeric values. Naturally, BigML’s linear regression model also supports categorical, text, and items fields as inputs. If you have seen how our logistic regression models handle these inputs, then this will be mostly familiar, but there are a couple of important differences.

Categorical Fields

Categorical fields are transformed to numeric values via field codings. By default, linear regression uses a dummy coding system. For a categorical field with n class values, there will be n-1 numeric predictor variables. We designate one class value as the reference value (by default the first one in lexicographic order). Each of the predictors corresponds to one of the remaining class values, taking a value of 1 when that value appears and 0 otherwise. For example, consider a categorical field with values “Red”, “Green”, and “Blue”. Since there are 3 class values, dummy coding will produce 2 numeric predictors, x1 and x2. Assuming we set the reference value to “Red”, each class value produces the following predictor values:

Field value  x1  x2
Red          0   0
Green        1   0
Blue         0   1
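
As a quick illustrative sketch (pandas rather than BigML’s internal code), you can reproduce this dummy coding by building the full set of indicator columns and dropping the reference value:

import pandas as pd

colors = pd.Series(["Red", "Green", "Blue", "Green", "Red"])

# One 0/1 indicator per class value, then drop the reference value "Red";
# the two remaining columns play the roles of x1 and x2 above
dummies = pd.get_dummies(colors).drop(columns=["Red"]).astype(int)
print(dummies)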

Other coding systems such as contrast coding are also supported. For more details check out the API documentation.

Text and Items Fields

Text and items fields are treated in the same fashion. There will be one numeric predictor for each term in the tag cloud/items list. The value for each predictor is the number of times that term/item occurs in the input.
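
A rough sketch of that counting, with a made-up tag cloud, purely for illustration:

from collections import Counter

terms = ["machine", "learning", "data"]          # hypothetical tag cloud
text = "machine learning turns data into more data"

counts = Counter(text.split())
predictors = [counts[t] for t in terms]
print(predictors)   # [1, 1, 2] -- one numeric predictor per term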

Missing Values

If an input field contains missing values in the training data, then an additional binary-valued predictor will be created which takes a value of 1 when the field is missing and 0 otherwise. The values for all other predictors pertaining to the field will be 0 when the field is missing. For example, a numeric field with missing values will have two predictors: one for the field itself plus the missing-value predictor. If the input has a missing value for this field, then its two predictors will be (0, 1); in contrast, if the field is not missing but equal to zero, then the predictors will be (0, 0).
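
A tiny sketch of the encoding described above for a single numeric field (a hypothetical helper, not part of the BigML bindings):

def numeric_predictors(value):
    """Return (field predictor, missing indicator) as described above."""
    if value is None:
        return (0, 1)   # field missing: value predictor 0, missing indicator 1
    return (value, 0)   # field present, even if its value is zero

print(numeric_predictors(None))   # (0, 1)
print(numeric_predictors(0))      # (0, 0)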

Wrap Up

That’s pretty much it for the nitty-gritty of multiple linear regression. Being a rather venerable machine learning tool, its internals are relatively straightforward. Nevertheless, you should find that it applies well to many real-world learning problems. Head over to the dashboard and give it a try!

Automating Linear Regressions with WhizzML & Python Bindings


This blog post, the fifth of our series of six posts about Linear regressions, focuses on those users that want to automate their Machine Learning workflows using programming languages. If you follow the BigML blog, you may already be familiar with WhizzML, BigML’s domain-specific language for automating Machine Learning workflows, implementing high-level Machine Learning algorithms, and easily sharing them with others. WhizzML helps developers create Machine Learning workflows and execute them entirely in the cloud. This avoids network problems, memory issues and lack of computing capacity while taking full advantage of WhizzML’s built-in parallelization. If you aren’t familiar with WhizzML yet, we recommend that you read the series of posts we published this summer about how to create WhizzML scripts: Part 1, Part 2 and Part 3 to quickly discover the benefits.

To help automate the manipulation of BigML’s Machine Learning resources, we also maintain a set of bindings which allow users to work in their favorite language (Java, C#, PHP, Swift, and others) with the BigML platform.

Let’s see how to use Linear Regressions through both the popular BigML Python Bindings and WhizzML. Note that the operations described in this post are also available in this list of bindings.

The first step is creating Linear Regressions with the default settings. We start from an existing Dataset to train the model in BigML so our call to the API will need to include the Dataset ID we want to use for training as shown below:

;; Creates a linearregression with default parameters
(define my_linearregression
  (create-linearregression {"dataset" training_dataset}))

The BigML API is mostly asynchronous, meaning that the creation function above returns a response before the Linear Regression creation is completed; the response simply indicates that creation has started and the resource is in progress. This implies that the Linear Regression is not ready for predictions right after the code snippet is executed, so you must wait for its completion before you can start predicting. One way to ensure this is to use the “create-and-wait-linearregression” directive:

;; Creates a linearregression with default settings. Once it's
;; completed the ID is stored in my_linearregression variable
(define my_linearregression
  (create-and-wait-linearregression {"dataset" training_dataset}))

If you prefer to use the Python Bindings, the equivalent code is this:

from bigml.api import BigML
api = BigML()

my_linearregression = \
    api.create_linearregression("dataset/59b0f8c7b95b392f12000000")

Next up, we will configure some properties of a Linear Regression with WhizzML. All the configuration properties can be easily added using property pairs such as <property_name> and <property_value> as in the example below. For instance, when creating an optimized Linear Regression from a dataset, BigML sets the number of model candidates to 128. If you prefer a lower number of steps, you should add the “number_of_model_candidates” property and set it to 10. Additionally, you might want to set the value used by the Linear Regression when numeric fields are missing. For that, you need to set the “default_numeric_value” property to the right value. In the example below, missing numeric values are replaced by the mean.

;; Creates a linearregression with some settings. Once it's
;; completed the ID is stored in my_linearregression variable
(define my_linearregression
  (create-and-wait-linearregression {"dataset" training_dataset
                            "number_of_model_candidates" 10
                            "default_numeric_value" "mean"}))

NOTE: Property names always need to be enclosed in quotes, and the value should be expressed in the appropriate type, a string or a number in the previous example. The equivalent code for the BigML Python Bindings becomes:

from bigml.api import BigML
api = BigML()
args = {"max_iterations": 100000, "default_numeric_value": "mean"}
training_dataset = "dataset/59b0f8c7b95b392f12000000"
my_linearregression = api.create_linearregression(training_dataset, args)

For the complete list of properties that BigML offers, please check the dedicated API documentation.

Once the Linear Regression has been created,  as usual for supervised resources, we can evaluate how good its performance is. Now, we will use a different dataset with non-overlapping data to check the Linear Regression performance.  The “test_dataset” parameter in the code shown below represents the second dataset. Following the motto of “less is more”, the WhizzML code that performs an evaluation has only two mandatory parameters: a Linear Regression to be evaluated and a Dataset to use as test data.

;; Creates an evaluation of a linear regression
(define my_linearregression_ev
 (create-evaluation {"linearregression" my_linearregression "dataset" test_dataset}))

Handy, right? Similarly, using Python bindings, the evaluation is done with the following snippet:

from bigml.api import BigML
api = BigML()
my_linearregression = "linearregression/59b0f8c7b95b392f12000000"
test_dataset = "dataset/59b0f8c7b95b392f12000002"
evaluation = api.create_evaluation(my_linearregression, test_dataset)

Following the steps of a typical workflow, after a good evaluation of your Linear Regression, you can make predictions for new sets of observations. In the following code, we demonstrate the simplest setting, where the prediction is made only for some fields in the dataset.

;; Creates a prediction using a linearregression with specific input data
(define my_prediction
 (create-prediction {"linearregression" my_linearregression
                     "input_data" {"sepal length" 2 "sepal width" 3}}))

The equivalent code for the BigML Python bindings is:

from bigml.api import BigML
api = BigML()
input_data = {"sepal length": 2, "sepal width": 3}
my_linearregression = "linearregression/59b0f8c7b95b392f12000000"
prediction = api.create_prediction(my_linearregression, input_data)

In both cases, WhizzML or Python bindings, in the input data you can use either the field names or the field IDs. In other words, “000002”: 3 or “sepal width”: 3 are equivalent expressions.

As opposed to this prediction, which is calculated and stored in BigML servers, the Python Bindings (and other available bindings) also allow you to instantly create single local predictions on your computer or device. The Linear Regression information will be downloaded to your computer the first time you use it (connectivity is needed only the first time you access the model), and the predictions will be computed locally on your machine, without any incremental costs or latency:

from bigml.linearregression import Linearregression
local_linearregression = Linearregression("linearregression/59b0f8c7b95b392f12000000")
input_data = {"sepal length": 2, "sepal width": 3}
local_linearregression.predict(input_data)
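
If you need many local predictions, you can simply loop over your data with the same local object. Below is a minimal sketch, assuming a hypothetical CSV file named new_flowers.csv whose columns match the model's numeric input fields; everything else mirrors the snippet above.

import csv
from bigml.linearregression import Linearregression

# Download the model once; subsequent predictions are purely local
local_linearregression = Linearregression(
    "linearregression/59b0f8c7b95b392f12000000")

# Hypothetical CSV whose headers match the model's numeric input fields
with open("new_flowers.csv") as handle:
    for row in csv.DictReader(handle):
        input_data = {name: float(value) for name, value in row.items()}
        print(local_linearregression.predict(input_data))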

It is similarly pretty straightforward to create a Batch Prediction in the cloud from an existing Linear Regression, where the dataset named “my_dataset” contains a new set of instances to predict by the model:

;; Creates a batch prediction using a linearregression 'my_linearregression'
;; and the dataset 'my_dataset' as data to predict for
(define my_batchprediction
 (create-batchprediction {"linearregression" my_linearregression
                          "dataset" my_dataset}))

The code in Python Bindings that performs the same task is:

from bigml.api import BigML
api = BigML()
my_linearregression = "linearregression/59d1f57ab95b39750c000000"
my_dataset = "dataset/59b0f8c7b95b392f12000000"
my_batchprediction = api.create_batch_prediction(my_linearregression, my_dataset)

Want to know more about Linear Regressions?

Our next blog post, the last one of this series, will cover how Linear Regressions work behind the scenes, diving into the technical implementation aspects of BigML’s latest resource. If you have any questions or you’d like to learn more about how Linear Regressions work, please visit the dedicated release page. It includes links to this series of six blog posts, in addition to the BigML Dashboard and API documentation.

Machine Learning Boosts Startups and Industry

BigML, the leading Machine Learning platform, and GoHub from Global Omnium join forces with a strategic partnership to boost Machine Learning adoption throughout the startup and industry sectors. This partnership helps the tech and business sectors apply Machine Learning in their companies, provides them with Machine Learning education and helps them remain competitive in the marketplace.

BigML now enjoys a presence at the GoHub offices, the new open innovation hub created by Global Omnium (the leading company in the water sector) and the first startup accelerator specialized in Machine Learning, launched in collaboration with BigML. BigML, headquartered in Corvallis, Oregon, USA, and Valencia, Spain, has offered the leading Machine Learning platform since 2011, helping almost 90,000 analysts, scientists, and developers worldwide implement their own predictive applications with BigML technology.

Plenty of startups and many business sectors already apply Machine Learning techniques to automate processes in HR departments to hire the right employees, predict demand to avoid inventory problems, perform predictive maintenance to avoid production losses, detect fraud in a timely manner, and predict energy savings, among many other real-world Machine Learning applications.

BigML and GoHub will launch a full program of Machine Learning activities and events focused on providing the best quality content for institutions, big corporations, medium and small companies, as well as startups. To achieve this goal, events will be held in Valencia, across Spain, and in several other countries to be announced shortly, explaining the impact that Machine Learning is having, and can have, on businesses in industry, finance, marketing, Human Resources, security, and more, all presented by companies that are already working with this technology. Additionally, there will be Machine Learning workshops to help ease the learning curve for companies that wish to learn and implement Machine Learning.

Moreover, this partnership will encourage the creation of new synergies between products and services from both parties, GoHub and BigML. An example of this is the IoT Industrial platform Nexus Integra, which is already creating its own predictive application with BigML.

Francisco Martin, BigML’s CEO, highlights: “I’m very happy that in Valencia there are companies like Global Omnium that champion and work hard for these disruptive technologies to succeed, allowing local talent to produce high-quality exportable technology that brings wealth to the city. With that, young talent won’t have the urge to emigrate to other countries in order to find a good job (as I did some time ago), which is a very positive development.”

Patricia Pastor, GoHub Director says: “Innovation ecosystems, especially when there is a collaboration between startups and big corporations, are a great engine of growth and competitiveness. These kinds of partnerships are key to accelerating the potential of such ecosystems. Having BigML as a partner will allow the Valencian ecosystem to reach a higher level when it comes to disruptive technologies.”

Programming Linear Regressions

In this fourth post of our series, we want to provide a brief summary of all the necessary steps to create a Linear Regression using the BigML API. As mentioned in our earlier posts, Linear Regression is a supervised learning method to solve regression problems, i.e., the objective field must be numeric.

The API workflow to create a Linear Regression and use it to make predictions is very similar to the one we explained for the Dashboard in our previous post. It’s worth mentioning that any resource created with the API will automatically be created in your Dashboard too so you can take advantage of BigML’s intuitive visualizations at any time.

linear_regression_workflow

In case you have never used the BigML API before, all requests to manage your resources must use HTTPS and be authenticated with your username and API key to verify your identity. Below is a base URL example for managing Linear Regressions.

https://bigml.io/linearregression?username=$BIGML_USERNAME;api_key=$BIGML_API_KEY

You can find your authentication details in your Dashboard account by clicking on the API Key icon in the top menu.


The first step in any BigML workflow using the API is setting up authentication, for example by exporting your credentials as environment variables. Once authentication is successfully set up, you can begin executing the rest of this workflow.

export BIGML_USERNAME=nickwilson
export BIGML_API_KEY=98ftd66e7f089af7201db795f46d8956b714268a
export BIGML_AUTH="username=$BIGML_USERNAME;api_key=$BIGML_API_KEY;"

1. Upload Your Data

You can upload your data in your preferred format from a local file, a remote file (using a URL), or from your cloud repository, e.g., AWS, Azure, etc. This will automatically create a source in your BigML account.

First, open up a terminal with curl or any other command-line tool that implements standard HTTPS methods. In the example below, we are creating a source from a local CSV file containing data about houses listed on Airbnb, with each row representing one house.

curl "https://bigml.io/source?$BIGML_AUTH" -F file=@airbnb.csv

2. Create a Dataset

After the source is created, you need to build a dataset, which serializes your data and transforms it into a suitable input for the Machine Learning algorithm.

curl "https://bigml.io/dataset?$BIGML_AUTH"
       -X POST
       -H 'content-type: application/json'
       -d '{"source":"source/5c7631694e17272d410007aa"}'

Then, split your recently created dataset into two subsets: one for training the model and another for testing it. It is essential to evaluate your model with data that the model hasn’t seen before. You need two separate API calls, each creating a different dataset, as shown in the two examples below (a Python bindings sketch of the same split follows them).

  • To create the training dataset, you need the original dataset ID and the sample_rate (the proportion of instances to include in the sample) as arguments. In the example below, we are including 80% of the instances in our training dataset. We also set a particular seed argument to ensure that the sampling will be deterministic. This guarantees that the instances selected for the training dataset will never be part of the test dataset created with the same sampling holdout.

curl "https://bigml.io/dataset?$BIGML_AUTH"
       -X POST
       -H 'content-type: application/json'
       -d '{"origin_dataset":"dataset/5c762fcd4e17272d4100072d", 
            "sample_rate":0.8, "seed":"myairbnb"}'
  • For the testing dataset, you also need the original dataset ID and the sample_rate, but this time combined with the out_of_bag argument. The out of bag takes the remaining (1 - sample_rate) instances, in this case 1 - 0.8 = 0.2. Using those two arguments along with the same seed used to create the training dataset, we ensure that the training and testing datasets are mutually exclusive.

curl "https://bigml.io/dataset?$BIGML_AUTH"
       -X POST
       -H 'content-type: application/json'
       -d '{"origin_dataset":"dataset/5c762fcd4e17272d4100072d", 
            "sample_rate":0.8, "out_of_bag":true, "seed":"myairbnb"}'

3. Create a Linear Regression

Next, use your training dataset to create a Linear Regression. Remember that the field you want to predict must be numeric. By default, BigML takes the last numeric field in your dataset as the objective field unless you specify otherwise. In the example below, we are creating a Linear Regression and including an argument to indicate the objective field. To specify the objective field, you can use either the field name or the field ID:

curl "https://bigml.io/linearregression?$BIGML_AUTH"
       -X POST
       -H 'content-type: application/json'
       -d '{"dataset":"dataset/68b5627b3c1920186f000325", 
            "objective_field":"price"}'

You can also configure a wide range of the Linear Regression parameters at creation time. Read about all of them in the API documentation.

Linear Regressions can usually only handle numeric fields as inputs, but BigML automatically performs a set of transformations so that it can also support categorical, text, and items input fields. Keep in mind that BigML uses dummy encoding by default, but you can configure other types of transformations using the different encoding options provided.

4. Evaluate the Linear Regression

Evaluating your Linear Regression is key to measuring its predictive performance against unseen data.

You need the linear regression ID and the testing dataset ID as arguments to create an evaluation using the API:

curl "https://bigml.io/evaluation?$BIGML_AUTH"
       -X POST
       -H 'content-type: application/json'
       -d '{"linearregression":"linearregression/5c762c6b4e17272d42000617",
            "dataset":"dataset/5c762f3a4e17272d41000724"}'

5. Make Predictions

Finally, once you are satisfied with your model’s performance, use your Linear Regression to make predictions by feeding it new data. Linear Regression in BigML can gracefully handle missing values for your categorical, text, or items fields.

In BigML you can make predictions for a single instance or multiple instances (in batch). See below an example for each case.

To predict a single new data point, just input the values for the fields used by the Linear Regression to make your prediction. In return, you get a prediction for your objective field along with confidence and probability intervals.

curl "https://bigml.io/prediction?$BIGML_AUTH"
       -X POST
       -H 'content-type: application/json'
       -d '{"linearregression":"linearregression/5c762c6b4e17272d42000617",
            "input_data":{"room":4, "bathroom":2, ...}}'

To make predictions for multiple instances simultaneously, use the Linear Regression ID and the new dataset ID containing the observations you want to predict.

curl "https://bigml.io/batchprediction?$BIGML_AUTH"
       -X POST
       -H 'content-type: application/json'
       -d '{"linearregression":"linearregression/5c762c6b4e17272d42000617",
            "dataset":"dataset/5c128e694e1727920d00000c",
            "output_dataset": true}'

If you want to learn more about Linear Regressions, please visit our release page for documentation on how to use Linear Regressions with the BigML Dashboard and the BigML API. In case you still have questions, be sure to reach out to us at support@bigml.com anytime!

 
