
Introduction to Operating Thresholds

BigML’s upcoming release on Wednesday, January 31, 2018, will be presenting two new features: operating thresholds for classification models and organizations.  In this post, we’ll do a quick introduction to operating thresholds before we move on to the remainder of our series of 6 blog posts (including this one) to give you a detailed perspective of what’s behind the operating thresholds part of the release. We will then conclude the series with a post on how your company can collectively benefit from our new organizations feature.

Understanding operating thresholds


Say you are building a classification model to predict loan risk and decide which loan applications should be approved and which ones should be denied because they represent a higher default risk than your company would be willing to underwrite. As usual, after many iterations and some clever feature engineering, your best model evaluation yields an F1-score of 0.85 (85%). Is this good or bad? Should you present your results to your management without hesitation? Of course, you can keep iterating forever, but this is real life and there are deadlines!

Well, in the case of most financial security portfolios, even a few percentage points of defaults can turn an otherwise profitable outfit into a red ink generator. In this instance, chances are your management will lose some sleep over the use of Machine Learning in deciding the fate of their company. Is it better, then, to scrap the whole idea of being more data-driven with advanced analytical tools and go back to good old rule-based loan approvals?

Not so fast! Luckily, operating thresholds can come to the rescue in a situation like this, letting you fine-tune your classification model so that it reflects your company’s risk appetite and the implied cost structure. In this particular example, assuming the “positive class” is bad loans (Approve? = No), your company has a much greater risk exposure in missing bad loans by incorrectly predicting that they should be approved (i.e., False Negatives) than it has in rejecting what would otherwise be good loans (i.e., False Positives).

In the former case, you could lose hundreds of thousands of dollars (or even millions) with a single bad decision, wiping away perhaps your entire loan portfolio profits. In the latter, you may be leaving some money on the table by turning away a good loan (i.e., opportunity cost), but that will probably be measured in thousands (or tens of thousands) of dollars depending on the loan amount. So there’s an order of magnitude of difference between the two scenarios.

The trick then is to adjust the tradeoff between False Positives and False Negatives, lessening the chances of False Negatives occurring, by adjusting your model’s Operating Threshold values. This technique is especially useful for imbalanced datasets, where one or a few classes are the majority classes. As in our loan risk example, unadjusted classification models tend to predict the majority classes at the expense of the minority (positive) class, which is usually the class of interest. Keep in mind that the usefulness of thresholds is mainly about having false positive and false negative costs that are imbalanced, regardless of the class distribution. It so happens that imbalanced datasets in the real world tend to have asymmetric costs. But even if you have a balanced dataset, chances are still very high that you’ll need some kind of threshold, because the costs are rarely totally symmetric.


BigML lets you adjust thresholds easily with a simple slider in the evaluation view of models, ensembles, deepnets or logistic regressions. The best Operating Threshold value that minimizes your risk and costs can change from one model to another, so it’s best to judge the appropriate values on a case-by-case basis. However, if you have a pretty good idea of the costs involved with each of the classes, the optimal threshold can very well be determined automatically with WhizzML; more on this option will be covered in the remainder of our blog series.
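To make the idea of a cost-driven threshold concrete, here is a minimal sketch in plain Python (not WhizzML, and not BigML’s own implementation). It assumes you already have the predicted probability of the positive class for a set of evaluated instances, plus hypothetical per-error costs; the function names, the data, and the cost figures are all made up for illustration.

```python
def total_cost(probabilities, is_positive, threshold, fn_cost, fp_cost):
    """Sum the cost of every misclassification at a given threshold."""
    cost = 0.0
    for p, actual_positive in zip(probabilities, is_positive):
        predicted_positive = p > threshold
        if actual_positive and not predicted_positive:
            cost += fn_cost  # a bad loan that got approved
        elif predicted_positive and not actual_positive:
            cost += fp_cost  # a good loan that got rejected
    return cost


def best_threshold(probabilities, is_positive, fn_cost, fp_cost, steps=100):
    """Try evenly spaced thresholds in [0, 1] and keep the cheapest one."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda t: total_cost(probabilities, is_positive, t, fn_cost, fp_cost),
    )


# Hypothetical evaluation data: P(bad loan) and whether the loan was actually bad.
probabilities = [0.90, 0.40, 0.35, 0.30, 0.20, 0.10]
is_positive = [True, True, False, True, False, False]

# Hypothetical costs: a missed bad loan is ten times worse than a rejected good one.
FN_COST, FP_COST = 100_000, 10_000

chosen = best_threshold(probabilities, is_positive, FN_COST, FP_COST)
cost_at_default = total_cost(probabilities, is_positive, 0.5, FN_COST, FP_COST)
cost_at_chosen = total_cost(probabilities, is_positive, chosen, FN_COST, FP_COST)
print(chosen, cost_at_default, cost_at_chosen)
```

With these made-up numbers, the cheapest threshold lands well below 50%, because accepting one extra rejected good loan is far cheaper than missing a bad one.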

Pick from three types of thresholds

In BigML, operating thresholds are applicable when evaluating or predicting with your models. Classification models in BigML always return a confidence and/or a probability for each prediction, i.e., a percentage between 0% and 100% that measures the certainty of the prediction. Deepnets also return a probability measure. When you evaluate or predict with your model using a probability or confidence threshold for a selected (positive) class, the model only predicts the positive class if the probability or the confidence is greater than the threshold set; otherwise it predicts the negative class.

Operating Threshold Example

In the simple example above, without setting any threshold, just by looking at the probabilities for each predicted class, Applicants #1 and #3 would be granted a loan. However, if we select “Approve=NO” as the positive class and we decide to set a probability threshold of 30%, the loan applications from Applicants #2 and #3 will both be denied, and only Applicant #1 will be approved.
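The rule can be sketched in a few lines of Python. The probabilities below are hypothetical values chosen only so that the outcome matches the description above; the function name is also made up and is not part of the BigML API. For a two-class problem, predicting without a threshold is treated here as roughly equivalent to a 50% threshold.

```python
def predict_with_threshold(positive_probability, threshold,
                           positive_class="No", negative_class="Yes"):
    """Predict the positive class only if its probability exceeds the threshold."""
    if positive_probability > threshold:
        return positive_class
    return negative_class


# Hypothetical P(Approve = No) for the three applicants.
p_no = {"Applicant #1": 0.10, "Applicant #2": 0.80, "Applicant #3": 0.40}

# No threshold: the most probable of the two classes wins.
without_threshold = {name: predict_with_threshold(p, 0.50) for name, p in p_no.items()}

# Probability threshold of 30% on the positive class "No".
with_threshold = {name: predict_with_threshold(p, 0.30) for name, p in p_no.items()}

print(without_threshold)
print(with_threshold)
```

Under these assumed probabilities, only Applicant #3 changes: a 40% chance of “No” is not enough to deny the loan by default, but it exceeds the 30% threshold.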

In addition to confidence and probability thresholds, for decision forests, you can also set a vote threshold, i.e., a threshold based on the percentage of models in the ensemble voting for the positive class. The type of threshold (confidence, probability or vote threshold) for non-boosted ensembles can be configured before creating your evaluation or when making predictions.
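A vote threshold follows the same pattern, except that the quantity compared against the threshold is the share of models voting for the positive class. The sketch below is illustrative only: the function names and the ten-tree ensemble are assumptions, not BigML internals.

```python
from collections import Counter


def vote_share(tree_predictions, target_class):
    """Fraction of models in the ensemble that predicted target_class."""
    if not tree_predictions:
        raise ValueError("the ensemble produced no predictions")
    return Counter(tree_predictions)[target_class] / len(tree_predictions)


def predict_with_vote_threshold(tree_predictions, threshold,
                                positive_class="No", negative_class="Yes"):
    """Predict the positive class only if enough models voted for it."""
    if vote_share(tree_predictions, positive_class) > threshold:
        return positive_class
    return negative_class


# Hypothetical ensemble of 10 trees: 4 vote "No", 6 vote "Yes".
votes = ["No"] * 4 + ["Yes"] * 6

share_no = vote_share(votes, "No")
majority_like = predict_with_vote_threshold(votes, 0.50)
with_threshold = predict_with_vote_threshold(votes, 0.30)
print(share_no, majority_like, with_threshold)
```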

Want to know more about operating thresholds and organizations?

Please join the FREE live webinar on Wednesday January 31, 2018, at 10:00 AM PT (Portland, Oregon. GMT -08:00) / 07:00 PM CET (Valencia, Spain. GMT +01:00), and stay tuned for the next blog post that will showcase a real example of operating thresholds used to reduce false negatives in telecommunications customer churn.

BigML Release and Webinar: Operating Thresholds and Organizations!

BigML’s first release of the year is here! Join us on Wednesday, January 31, 2018, at 10:00 AM PT (Portland, Oregon. GMT -08:00) / 07:00 PM CET (Valencia, Spain. GMT +01:00) for a FREE live webinar to discover the latest version of the BigML platform. We will be presenting two new features: operating thresholds for classification models to fine tune the performance of your models; and organizations, a convenient collaborative space that breaks down silos and makes it easy and efficient for any company to adopt Machine Learning across their entire corporate structure.

During fall 2017, the BigML Team implemented operating thresholds, now available from the BigML Dashboard, the API, and WhizzML for automation. The ability to set an operating threshold is a key feature that allows you to tell BigML to be more or less aggressive when predicting a class for an instance. Setting the right threshold for the positive class when evaluating your classification models can be especially useful when your dataset has one class that is particularly rare. Because it makes up a small portion of the overall data, models will rarely predict this class by default. If this “minority class” is the class of interest (as is the case in fraud detection), setting an appropriate operating threshold can ensure the model predicts the minority class with reasonable frequency. Applying operating thresholds is especially useful in domains like fraud detection or medical diagnosis, where some misclassifications carry prohibitive costs.

The ultimate goal of any BigML resource is making predictions. Now, BigML provides three types of thresholds, one for each of the certainty measures that BigML offers for your classification model predictions: probabilities, confidences, and votes. All classification models (Decision Trees, Ensembles, Deepnets, and Logistic Regressions) return a per-class probability along with the prediction. Thus you can apply a probability threshold for the positive class for any model. Decision tree and non-boosted ensemble predictions also come with a per-class confidence, a pessimistic measure of the model certainty, so you can apply a confidence threshold for these models. Finally, only for non-boosted ensembles, BigML offers another metric called votes, which is based on the percentage of trees in the ensemble that predict a given class. As an alternative to the probability or the confidence, you can also apply a vote threshold for these models. These three metrics will be explained in detail during our webinar.
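As a quick summary of which threshold applies to which model type, the paragraph above can be written down as a small lookup table. The keys and the helper function are illustrative labels, not BigML API identifiers, and the entry for boosted ensembles is inferred from the statement that confidences and votes are available only for non-boosted ones.

```python
# Threshold kinds per model type, as described in the text above.
SUPPORTED_THRESHOLDS = {
    "decision tree": {"probability", "confidence"},
    "non-boosted ensemble": {"probability", "confidence", "votes"},
    "boosted ensemble": {"probability"},  # inferred: no confidence or votes
    "deepnet": {"probability"},
    "logistic regression": {"probability"},
}


def supports(model_type, threshold_kind):
    """Return True if the given model type offers the given threshold kind."""
    return threshold_kind in SUPPORTED_THRESHOLDS.get(model_type, set())


print(supports("deepnet", "probability"))
print(supports("decision tree", "votes"))
```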

Finally, BigML is bringing organizations to the BigML Dashboard, a space where several customers can work on the same projects, using the same Dashboard, but with different levels of privileges. Organizations are ideal for teams that want to work collaboratively and more efficiently to get the most out of their Machine Learning models. This feature is available for customers that purchase any of the BigML subscription plans or Private Deployments.

Want to know more about operating thresholds and organizations?

Please join the FREE live webinar on Wednesday January 31, 2018, at 10:00 AM PT (Portland, Oregon. GMT -08:00) / 07:00 PM CET (Valencia, Spain. GMT +01:00), and stay tuned for the upcoming series of blog posts about operating thresholds and organizations!

10 Enterprise Machine Learning Predictions for 2018

With our 2018 Machine Learning predictions, we’re taking another shot at Machine Learning clairvoyance with some brand new calls while also upping the ante to serious “double dog dare you” territory by reiterating some of our previous calls.

We’d like to stress that the predictions made here are shared through a lens of “Machine Learning in the enterprise”. As such, we’re less concerned with predicting the twists and turns in the heady world of Machine Learning research and more concerned with the experience of the typical enterprise when looking to leverage the technology to reach its quarterly, annual or longer-term strategic business goals.

Remediating Enterprise ML False Starts

With that out of the way, let’s start by setting the tone based on some recent market research findings on the state of the industry as it stands now. It’s a well-known fact that the tech giants have put their dollars where their mouths are when it comes to acquiring Machine Learning/AI talent. In fact, McKinsey Global Institute (MGI) estimates up to $27 billion of the $39 billion poured into the category in 2016 came in the form of R&D and M&A investments by Top 35 high tech and advanced manufacturing companies — dwarfing investments from VCs or Private Equity firms. And their collective impact on ML research is undeniable. Browsing the accepted NIPS 2017 conference papers by author organization affiliation shows the likes of Google and Microsoft placed among leading universities: Google/DeepMind/Brain (210), Carnegie Mellon (108), MIT (93), Stanford (81), Berkeley (81), and Microsoft (70).

MGI’s report also reveals that the lion’s share of VC, Private Equity, and M&A activity has gone towards core Machine Learning technologies ($7b) with computer vision ($3.5b) a distant second and other niche AI areas like natural language ($0.9b), autonomous vehicles ($0.5b), smart robotics ($0.5b), and virtual agents ($0.2b) taking in much more modest sums.

With that level of investment, it’s fair to say the hopes are high for the future integration of Machine Learning into our economy. But here’s the stinker: the adoption rates are still far below potential. MGI reports “AI adoption outside of the tech sector is at an early, often experimental stage. Few firms have deployed it at scale. In our survey of 3,000 AI-aware C-level executives, across 10 countries and 14 sectors, only 20% said they currently use any AI-related technology at scale or in a core part of their businesses. Many firms say they are uncertain of the business case or return on investment. A review of more than 160 use cases shows that AI was deployed commercially in only 12% of cases.”

AI Adoption

To make matters more complicated, the adoption rates between industries are found to be very unbalanced, skewed toward large enterprises in software/internet, telecom, and fintech sectors. The study also finds significant profit margin discrepancies between proactive AI-adopter firms and others. This may very well point to a correlation without causation, but another survey by the Economist Intelligence Unit had already found most strategic-minded executives aren’t willing to wait and take the risk of falling prey to more agile upstarts.

Exec AI Attitude by Economist

Against this complex backdrop, here are our Top 10 2018 predictions.  Some are for all practical purposes continuations of associated 2017 predictions, others yet are brand new calls:

 

  • PREDICTION #1:

Many more enterprise #MachineLearning dabblers will mature into true believers.


F. Chollet on ML Over:Underestimation

In 2018, ML (“Machine Learning”) maturity will be the main theme for thousands of companies that have been dabbling in ML with limited pilot projects. According to McKinsey’s survey, more than half of the enterprises that invested in AI/ML haven’t seen their investments pay off yet. As the wonderment about the technology gives way to a more objective outlook that has digested the pros and cons, business leaders will collaborate more intently with their technical counterparts in order to double down on the existing efforts.

That ML isn’t necessarily plug-and-play is a reality most practitioners have come to learn through first-hand experience, but it is still unknown to many executives in the real economy. Moonshot-type initiatives in particular almost always take considerably longer than hoped for. So whether business executives will be patient enough to see real returns from such projects remains anybody’s guess. Meanwhile, more down-to-earth efforts targeting low-hanging fruit will thrive.

 

  • PREDICTION #2:

Data do-over projects leave behind data lakes in favor of feature repositories.


The Death of Data Lakes

Despite what you may have heard about new techniques rendering feature engineering obsolete, experienced ML practitioners know better. Having realized that a big part of failed ML/AI initiatives have to do with expensive data lakes of dubious value-add, CIOs and Chief Data Officers will pull the plug and instead accelerate data engineering efforts to create feature engineering repositories to support high-value predictive use cases. Some companies will go even further to augment internal efforts by licensing and/or outsourcing to domain-specific third-party feature repository companies. Easier-to-use shared feature repositories will be great success stories, as they remove the need to re-invent the wheel with each bespoke application and instead foster cross-departmental collaboration. Add on top ML platforms offering more mature self-serve data wrangling capabilities, and a much wider analytical audience in organizations will feel empowered to explore new use cases. This scenario still implies additional investment, but it will be economical enough to give rise to well-thought-out Machine Learning green shoots that ultimately justify the spend.

 

  • PREDICTION #3:

#MachineLearning talent rush will be balanced by re-skilling employees.


ML Reskilling

Recent years saw a Black Friday type rush for academic talent and research scientists with highly cited publications. Surely businesses with access to serious R&D dollars are still looking to fill their ranks with qualified candidates for what they consider industry-disrupting projects. In many cases, this means Deep Learning specialists, e.g., experts in reinforcement learning for robotics applications. As captivating as deep learning is, it’s still difficult to use and experts are few.

Therefore, in 2018, most CIOs and Chief Data Officers will look to re-skill their existing workforce to achieve broader ML-literacy. The average Data Scientist will be hard pressed to justify complex models as easily consumable core-ML platforms deliver higher quality baselines with much less effort. Data Scientists will not disappear, but may not be the sexiest job come year-end 2018.

 

  • PREDICTION #4:

Black box #MachineLearning approaches will overpromise and underdeliver.


Black Box vs. White Box ML

To repeat our 2017 prediction, skilled humans will still be central to decision making despite further Machine Learning adoption. Fortune Global 2000 technologists will realize that neither hiring expensive consultants nor bringing in top academic talent will be a replacement for subject matter expertise in the form of a detailed understanding of both the business context and the value chain dynamics in their industries. The simple economics of Machine Learning point out that as the cost of predictions nears zero, human judgment will be in more demand.

The tools that promise fully automated end-to-end ML overreach: they limit the ways experts can intervene in the ML process by trying to fit every problem into the straitjacket of basic classification or regression modeling with hyperparameter tuning. Impactful Machine Learning does not consist of comparing a bunch of similar algorithms based on garden-variety performance metrics, yet some companies will be finding that out the hard way.

 

  • PREDICTION #5:

MLaaS platforms will emerge as the “AI-backbone” for legacy companies.


MLaaS Uptake

MLaaS platform adoption will accelerate, starting in “True Private Clouds” inside larger companies and in multi-tenant public cloud environments for medium-sized businesses and startups. The advantageous cost structure of such platforms as compared to expensive consultancy and custom applications, combined with their right level of abstraction (i.e., ML building blocks and primitives at the right level of atomization to achieve the Lego effect), will make these platforms well suited for developers and ML engineers to design and deploy point applications at scale and much faster. Cloud Machine Learning platforms in particular will democratize Machine Learning by:

  • significantly lowering costs by eliminating complexity or front-loaded vendor contracts
  • offering preconfigured frameworks that package the most effective algorithms
  • abstracting the complexities of infrastructure setup and management from the end user
  • providing easy integration, workflow automation and deployment options through REST APIs and bindings.

 

  • PREDICTION #6:

More developers than data scientists will introduce #MachineLearning into their companies.


Wikibon - AI Form Factors for Developers

Developers will have a wealth of tools to leverage and yet little in the way of meaningful benchmarks, which will create some confusion and interoperability issues causing tensions with ML-specialists if and when they are available in the organization.  There is no winner in this argument as the much-required learning process continues. As the dust settles, Machine Learning and Software Engineering best practices will start fusing together to avoid technical debt and result in more precisely engineered and predictable end-user experiences.

The trend will be further accelerated by the appearance of a growing number of specialized toolkits and SDKs optimized for vertical solutions (e.g., IoT meets ML with lightweight local predictions favoring simpler models such as anomaly detection or reinforcement learning) that get developers closer to an end-to-end smart app deployment experience with less and less handholding.

 

  • PREDICTION #7:

Throw everything and the kitchen sink open source approach to MLaaS will accrue toxic amounts of technical debt.


Too Many ML Libraries

The number of possible ML techniques plus the variation and length of commercial ML pipelines together bring about an exponential number of possible combinations of algorithms. Even with massive computational power (e.g., thousands of servers), one can only ever try out, let alone make work, a tiny fraction of these. The truth is computational power will never truly replace cleverness (by either the algorithm or the expert) when searching through this space of possibilities. The current fashion of packaging together many disparate open source libraries and coding paradigms into a loosely integrated “Machine Learning suite”, as promoted by major cloud service providers, will certainly please some data scientists. They will feel at home in some of those “checkboxes” as they are simply cloud versions of the exact desktop or on-prem artifacts they are accustomed to. Unfortunately, this myopic stance will fail to usher in the era of truly collaborative and inclusive enterprise Machine Learning due to its inherent complexity.

 

  • PREDICTION #8:

#MachineLearning model “Interpretability” is the new “Performance”.

Tom Dietterich on Auto ML Risk

Some in the Machine Learning community treat interpretability as a nice-to-have in an effort to maximize other metrics related to model accuracy. However, this viewpoint poses serious risks in the business context, as interpretability is the best debugging tool there is. Algorithmic bias can easily creep in if we blindly trust the black boxes that we build. For example, a recent news story about Amazon’s same-day service in the U.S. has revealed that even seemingly anonymized data can generate predictions that contain (in this case racial) bias in subtle ways through proxy variables.

In 2018, we expect more of these issues will make the headlines as the European Union’s GDPR (General Data Protection Regulation) goes into effect on May 25, 2018. GDPR is expected to have a major effect on current “Data Science” practices, with strict requirements that include the right to explanation (e.g., Can your Deep Learning model explain why this customer was denied credit?) as well as the prevention of bias and discrimination. This means model transparency will become more and more important, both for users’ peace of mind and for legal/ethical reasons.

 

  • PREDICTION #9:

Deep Learning research will keep advancing linearly, but enterprise mass-adoption will lag due to costs and lack of talent.


F. Chollet on DL

Depending on the type of data you work with and the specific predictive use case, Deepnets may be the only game in town or an unnecessary and costly detour. At BigML, we are of the opinion that Deepnet models should be part of the Machine Learning arsenal, hence our support for them in the platform. Nevertheless, in 2018, the undeniable hype about DL research will likely do enterprise early adopters a disservice by pulling their attention away from more efficient and cost-effective baseline models and causing them to pour resources into specialized hardware and complex and/or unproven neural network architectures that are hard to operationalize and difficult to maintain, even if access to rare DL experts is secured.

 

  • PREDICTION #10:

#MachineLearning will go global, but more talent will choose to stay local.


AI Going Global

It’s no secret that countries like the U.S., Canada, Australia, and China have significant Machine Learning chops. But we predict that a more diverse group of Machine Learning technology and service providers that have long been overshadowed by the big tech nerve centers such as Silicon Valley, NYC, Boston and the Chinese megapolises will heat up the global competition against the likes of IBM and Accenture with more straightforward approaches able to deliver ROI quicker in their respective geographies. This, in turn, will accelerate the global Machine Learning awakening for all types of organizations in Asia, Europe, and Latin America. One resulting effect will be to slow down, and partially turn, the brain drain toward tech nerve centers that are suffering from affordability crises of their own.

Hope you enjoyed our mere mortal attempt to describe what might unfold in our industry later this year. Do you agree? What other trends are out there that we may have missed? Let us know your thoughts and experiences, either in support of or countering the undercurrents we have summarized above, and we’ll gladly learn from them.

Farewell to DEV Mode

Each new year is a new beginning inspiring us to think about new ways to improve BigML’s usability, which makes our platform broadly appealing to multiple user types. As we worked hard to finalize our product roadmap for 2018, part of that discussion naturally involved better ways to organize our user environments. Perhaps the most visible item we have decided to address on that front has to do with “DEV mode”. Effective Monday, January 22, 2018, Development mode will be removed from the BigML Dashboard and API, since we already support “Projects” to keep groups of related resources separate as needed. Anyone registering after the effective date won’t have to worry about this change, but we have some more information below for existing users.

To nip the obvious question in the bud, you will still have full access to your DEV resources as we will move DEV resources to the same environment as PROD resources under a new project named “BigML DEV Resources Archive”. That also means if you had a FREE BigML account or have been benefiting from a free account under our Education program, you will keep having uninterrupted access to all your resources without charge. If you already had DEV resources organized in various projects you’ve named yourself, we will preserve those projects as well. Furthermore, to avoid potential collisions in terms of project names between DEV and PROD environments, we will tag your migrated DEV projects as “dev-archive” so you can filter them easily by using the keywords “tags:dev-archive”.

Long-time users will remember that we launched DEV mode all the way back in 2012. The idea was simple in that we wanted to clearly demarcate a development sandbox of sorts, where users could quickly experiment with their new ideas free of charge. After all, as most of you know, Machine Learning is not a “one-shot and done” process that can be meticulously planned well in advance. On the contrary, it is a bit messy (as most creative endeavors are) and requires that you take a few steps in a new direction with each try. In hindsight, those were the days when Machine Learning in the cloud sounded like “Life on Mars” does today.

BigML Free Version

Fast forward to 2018, and the world has changed quite a bit. Donald Trump is the president, Brad Pitt and Angelina Jolie are no longer together, and North Korea has nuclear weapons — oh my! We now have a variety of subscriptions including a FREE subscription that supports datasets up to 16MB and 2 parallel tasks. Sure, 16MB is not meant for a big database dump, you don’t get an SLA, your jobs can be queued, and your resources might be deleted after a while, but it still is an excellent low cost (as in “Nada!”) way of experiencing BigML with your own data. In fact, thousands of users have been taking advantage of it all along.

Given this, we have been observing that more people are getting confused between the DEV mode and the FREE plan, asking questions like “If I start free, can I move my resources from DEV mode to PROD later?” or “What happens to my resources if I stop paying for my subscription?” (In case you’re wondering, we don’t immediately delete your resources in the latter case.) The removal of DEV mode will clear all these concerns and further streamline the progression from a newly signed up user to a paid subscriber deriving significant business value from insights discovered on BigML. As for nomenclature, FREE is free (Duh!) and our paid plans are now referred to as PRIME. Easy enough!

So, there you have it: we will no longer support the DEV mode as of Monday, January 22, 2018. We’ll do our best to make this as smooth a transition as possible for all our users, as outlined above. However, if you do have questions we may not have covered, please feel free to reach out to us at support@bigml.com anytime.

Grading our Machine Learning Predictions for 2017

Some say the easiest way to make a fool of yourself is to try and predict the evolution of technology. As such, making predictions about a field as fast-moving as Machine Learning is definitely not for the faint of heart.  Nevertheless, in our line of business, it’s essential to anticipate the trends that will shape the future impact of Machine Learning across industries. So we’ll continue the tradition for 2018. However, to level set, we’ll start by grading our 2017 predictions and see if our 10 Offbeat Machine Learning Predictions held any water.

Grading Predictions

  • PREDICTION #1:

“Big Data” soul searching leads to the gates of #MachineLearning.

The premise was the underreporting of failed Big Data projects due to technical complexity and the resulting dubious ROI. Although Big Data as a buzzword has not completely disappeared from planet Earth, it certainly has lost its luster in the conference and thought leadership circuit in 2017. However, Machine Learning (and AI) have remained hot topics of interest throughout the year, pretty much dominating the airwaves and digital channels in the business media. The industry is past the need to label today’s data “Big Data”, much like we as a society are way past calling household electricity “high-voltage alternating current”.

No surprises here, so we have gotten this one R-I-G-H-T!

ML vs. Big Data

  • PREDICTION #2:

VCs investing in algorithm-based startups are in for a surprise.

2017 saw continued VC interest in all things Machine Learning, but for all practical purposes the habit of throwing money at any startup that mentions the term has been abandoned by now. A new type of wisdom on the role of VC money for AI/ML startups is starting to take shape, and we feel the shift in favor of Machine Learning as an enabler took place more rapidly than we originally thought. This, no doubt, is a good thing for the longer term viability of the space, which needs to move forward with actual products and real-life applications rather than slide decks and unverified claims of some world-beating algorithm. In general, it’s healthy to remain skeptical of any such algorithm that is not published and peer-reviewed.

The verdict: we may have misjudged the speed of the funding dollars moving away from pure algorithmic outfits, but the fact that there have been no significant exits from such companies in 2017 makes us think we got this trend partially R-I-G-H-T!

Benedict Evans on Machine Learning

  • PREDICTION #3:

#MachineLearning talent arbitrage will continue at full speed.

It’s safe to say that the media frenzy has shifted towards Bitcoin and other cryptocurrencies in the second half of the year giving AI and Machine Learning a breather.  AI/ML wouldn’t be able to match the roller coaster ride of Cagecoin even if they tried anyway. With that said, the talent hunt for more experienced academics and practitioners in the space is as heated as ever. Nearly all job market predictions for 2018 and beyond show a growing interest in such profiles by the likes of major corporates or their research labs.

So let’s put a check mark next to this prediction: R-I-G-H-T!

Machine Learning Talent Arbitrage

  • PREDICTION #4:

Top down #MachineLearning initiatives built on Powerpoint slides will end with a whimper.

We’re happy to report that 2017 had its fair share of PowerPoint slides showing a cyborg hand shaking a human one (heartfelt congratulations to the fellow who took that stock photo, by the way…he’s a winner!).  The suits on stage reciting how many minutes of video are uploaded to YouTube every minute or which distant star we would have reached by now if we stacked all the hard drives AWS uses in its data centers? Not so much.  By now most top-down efforts have been recognized for what they are: dead ends.  The industry is prioritizing getting its hands dirty with its own data and investing further in its infrastructure in the hope of turning things around in 2018.

This prediction has fulfilled its mission: R-I-G-H-T!

Top Down Data Science Consulting Fail

  • PREDICTION #5:

#DeepLearning commercial success stories will be few and far in between.

The Deep Learning frenzy continued unabated in 2017. There are more courses on the subject and a greater number of open source tools and packages by the day.  Many bright minds still sing the praises of its virtues and it continues to dominate research budgets and academic grants.  Part of this has to do with the recent advances in AlphaGo-like systems conquering more board and video games. Beating the best Go-playing system in quick succession or beating expert human poker players are no easy feats for sure, and they fully deserve the attention they enjoy even though some experts question whether they represent real breakthroughs. However, beyond those controlled experiments, there have been few blockbuster use cases for Deep Learning in the enterprise so far. This is not to say corporations aren’t doubling down on Deepnets and exploring new areas to apply this somewhat enigmatic approach that still requires lots of trial and error.  Maybe 2018 will showcase tangible strides in interpretability issues or unsupervised approaches. Deep Learning is here to stay, but 2017 didn’t necessarily register as a banner year for Deep Learning in the enterprise.  It was more of a “dip your toe in and see if the water is warm” kind of experience for many companies that aren’t called Google, especially because such systems call for specialized and expensive hardware even if the talent scarcity issue is addressed.

We’ll go with partially R-I-G-H-T!

Deep Learning Hype

  • PREDICTION #6:

Exploration of reasoning and planning under uncertainty will pave the way to new #MachineLearning heights.

Perhaps related to the eminence of Deep Learning, no serious diffusion was observed in a wide enough section of enterprises when it came to advances and applications in reasoning and planning.

Let’s just say that we were too early with this one: W-R-O-N-G!

Mark Zuckerberg's Jarvis AI

  • PREDICTION #7:

Humans will still be central to decision making despite further #MachineLearning adoption.

Despite all the hoopla about fully autonomous systems and the media’s unbelievable knack for covering news predicting doomsday scenarios with killer robots in charge, we’re still a long way from either being our daily reality.  Skilled humans are still heavily involved in every aspect of Machine Learning systems, from data wrangling, exploration, and model training to evaluation, deployment, monitoring, and maintenance.  Consider the highly visible example of self-driving cars. We still haven’t been able to make them a reality even though significant strides have been made in the level of autonomy these vehicles have achieved. Full autonomy is a tricky goal as it has implications beyond the algorithms such as human factors, regulations, economic incentives, and externalities.

This prediction definitely held true in 2017: R-I-G-H-T!

s. Machine Intelligence

  • PREDICTION #8:

Agile #MachineLearning will quietly take hold beneath the cacophony of AI marketing speak.

Agile Machine Learning is not necessarily a buzzword at the moment, but a lot of companies new to the practice have seen the value of starting small with low-hanging predictive use cases in their context instead of launching way too ambitious “AI strategies” without legs.  Many have already come to the realization that the iterative nature of Machine Learning projects is better suited to an experimental approach. Breakthrough moments ultimately take place if one steadfastly pursues the business objectives, yet they are seldom delivered in a linear fashion. This situation makes it even more important to a) prioritize the right problems for the business, and b) arm more of your subject matter experts with the right analytics capabilities so that they can conduct many experiments in parallel. There certainly were such valuable lessons learned in 2017, but the agile approach to ML is not yet the dominant model for all, as the wonderment and hype phase wasn’t fully behind us as of year-end 2017.

So we’ll grade this prediction as partially R-I-G-H-T!

Lean, Agile Data Science Stack

  • PREDICTION #9:

MLaaS platforms will emerge as the “AI-backbone” for enterprise #MachineLearning adoption by legacy companies.

In 2017, more technology companies have gone on to reveal what makes their internal MLaaS platforms tick and why they’ve invested in them as much as they did.  Without a doubt, the common denominator between these companies is their understanding of the links between Machine Learning and their ability to constantly improve their core products and processes.  Hundreds if not thousands of legacy businesses with limited development resources have started evaluating cloud ML solutions such as BigML.  BigML’s service is now utilized by over 60,000 people and we do get regular inquiries from new users looking to adopt BigML companywide or across adjacent departments.

Still, we realize that we’re in the early stages of this sea change so we’ll give it a partially R-I-G-H-T!

Developer-driven Machine Learning

  • PREDICTION #10:

Data Scientists or not, more Developers will introduce #MachineLearning into their companies.

Have you noticed what was common in the developer events for AWS, Microsoft, Google, Salesforce etc.?  Yes, the need to get developers into Machine Learning!  More and more, it feels like the moment we as BigML can interject “We told you so!” However, we’ll let it pass as we’re happy to see this trend take hold and become the industry norm, as it will go a long way to ease the Machine Learning talent bottleneck for many businesses.

We were very much R-I-G-H-T on this one.

Machine Learning Platforms for Developers

Phew! This covers all our 2017 predictions.  Our total score: 7/10.  Not too bad for a first shot.  Do you agree?  Were we too generous with our self-grading?  Let us know in the comments or on Twitter (@bigmlcom).
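
For readers who want to check the tally: one way to arrive at 7/10 is to count each R-I-G-H-T as a full point and each partially R-I-G-H-T as half a point. That half-point weighting is an assumption (the scoring rule is not stated in the grades above), and the names `POINTS` and `total_score` are purely illustrative. A minimal sketch in Python:

```python
# Grades as assigned to each 2017 prediction in this post.
grades = {
    1: "right",
    2: "partial",
    3: "right",
    4: "right",
    5: "partial",
    6: "wrong",
    7: "right",
    8: "partial",
    9: "partial",
    10: "right",
}

# ASSUMPTION: a partially right prediction is worth half a point.
POINTS = {"right": 1.0, "partial": 0.5, "wrong": 0.0}


def total_score(grades):
    """Sum the points earned across all graded predictions."""
    return sum(POINTS[grade] for grade in grades.values())


score = total_score(grades)
print(f"{score:g}/{len(grades)}")
```

Under this weighting, five full points plus four half points add up to the 7 out of 10 reported above; a stricter rule that gave no credit for partial grades would yield 5/10 instead.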

In the next post, we’ll delve into our Top 10 Predictions for 2018, so stay tuned!

Reflecting on BigML’s 2017 in Numbers

It’s hard to believe how fast 2017 has gone by here at BigML. It has been a banner year with many firsts thanks to the Machine Learning freight train running at full speed across the global economy. Gone are the days when we often found ourselves describing what Machine Learning is and why it matters for businesses. Instead, here we are in the closing days of 2017 exchanging ideas with business leaders on new use cases Machine Learning can be applied to. When things happen so fast, one can sometimes find it a challenge to stop and reflect on milestones and achievements. So below are the highlights of what made 2017 a special year for BigML.

First off, the BigML platform crossed the 50,000 registered customer mark in 2017 and is already past 60,000. We also took the opportunity to launch a milestones page to better summarize the progression of BigML since the early days in 2011. Our users are all over the world making a difference in their workplaces, government agencies as well as educational institutions. Some of them have gone on to share their positive experiences and the impact Machine Learning has made in their contexts by extending testimonials.

Notably, 2017 also saw BigML add an institutional investor to its board as SAIC Motor took a strategic stake in BigML.  SAIC Motor Corporation Limited is the largest auto company on China’s A-share market, and its business covers the research, production, and sales of both passenger cars and commercial vehicles.

3 Major Releases + 15 Enhancements

In 2017, we brought BigML users many new sought-after features that made the platform as a whole more versatile than ever. Deepnets (Summer 2017 Release) and Boosted Trees (Winter 2017 Release) techniques were added to an already impressive array of supervised learning resources for tackling classification and regression problems, while Time Series (Spring 2017 Release) gave the platform a whole new dimension to help users gain insights from their time-based data for use cases such as demand forecasting.

Our most recent Deepnets Release deserves a special mention as it coupled this sophisticated algorithm with an unprecedented level of ease of use, so that even complete beginners can train effective models matching or exceeding those from experts. This became possible thanks to the pioneering automatic network structure search options, which have shown impressive benchmark performances that our VP of Machine Learning Algorithms, Dr. Charles Parker, made public.

Deepnets Benchmark

Aside from those, we also made 15 smaller but noteworthy improvements to the BigML platform including but not limited to: Reify Complex Workflows, Email Notifications for Scripts, Resources Configuration Information and Evaluation Curves for Classification Models.  You can find a full list of enhancements on our What’s New page in case you’d like to try out the ones you may have missed.

Finally, we’ve also kept BigML Tools updated to make sure insights from BigML resources find their way to different platforms.  One such example is the newest version of our Predict App for Zapier, which we announced back in August 2017.

684,896 Code Changes via 115 Deploys!

All our releases and the new features were made possible by some serious heavy lifting by our product development group, which makes up a large percentage of our 31-FTE-strong BigML Team.  Just to give a glimpse of the level of non-stop activity, our team updated our backend codebase 293,891 times, our API codebase 83,552 times, and our Web codebase 287,453 times.  These improvements and additions were carried out through 115 production deployments dotting the entire year.

36 Events in 5 Continents

As great as it is to have an online offering that can be experienced for free with a simple sign-up process, nothing can match the excitement of connecting with BigMLers at real-life events to hear their stories and receive their feedback.

This year we continued the tradition of organizing and delivering Machine Learning schools with VSSML17 and BSSML17 with record attendance.

Industry events in Europe, North America, and Latin America led the way, but 2017 also took our team members to locations in Asia and Australia with Smartcon 2017, WSBI Innovation Workshop, and IJCAI 2017.

Specifically, BigML sponsored or co-organized 2ML, a series of three events about the importance of protecting Intellectual Property in Spain, PAPIs Boston and São Paulo, where a variety of topics on the applications of real-world Machine Learning were showcased. For a full list of events that we participated in, feel free to visit our events page.

202 Ambassador Applications, 10 Certification Rounds

Our Education Program saw more growth in 2017, with the addition of 202 new applicants to help promote Machine Learning on their campuses. BigML ambassadors span the globe and include students as well as professors. Stay tuned for more ways to engage with fellow ambassadors in 2018 as we plan to devote time to taking the initiative to the next level. Adding to the list of BigML ambassadors contributing to the Machine Learning cause, 2017 proved to be a fruitful start for BigML’s Internship Program with our first 5 interns “graduating” with honors. We must also mention that in only its first year of existence, our team of instructors completed 10 rounds of BigML’s Certification Program, passing on their deep expertise to newly minted BigML Engineers.  Not bad, eh?

77 Blog Posts (so far)

We’re proud that BigML remains one of the “bloggiest” Machine Learning companies around. 2017 saw 77 new posts added to our blog, which is recognized as one of the Top 20 Machine Learning blogs on the Web. Below is a selection of posts that drew a healthy level of attention and shares on social channels in case you’re interested in some utilitarian holiday reading.

Looking Forward to 2018

Hope this post gave a good tour of what’s been happening around our neck of the woods. Given the acute need for positive ROI Machine Learning projects to justify data investments in companies, we expect that 2018 will be our busiest year in existence. As part of our commitment to democratize Machine Learning by making it simple and beautiful for everyone, we will be sharing more of our insights, customer success stories and all the new features we will bring you with each new release in 2018. Thanks for being part of BigML’s journey. As always, be sure to reach out to us with your ideas at support@bigml.com no matter how crazy they seem!

Come One, Come All! BigML Customers Share Testimonials With Our ML Community

As the adoption of Machine Learning as a Service continues to pick up, we want to take a moment to thank all our customers who have joined our mission to bring Machine Learning to everyone. We are excited to announce a BigML Customers page that shares testimonials from various users who have chosen BigML as their preferred tool for Machine Learning. From CEOs and Chief Scientists to analysts, software developers, and students, BigML enables anyone to become a master of their data, regardless of whether they have prior experience in Machine Learning.

Today, BigML proudly serves over 57,000 customers from 161 countries around the world. Whether you have been with us for years, or you are new to BigML, we are glad to have you as a part of our global community of practitioners. 

BigML Customers

BigML not only helps large companies and organizations of all kinds, but we also actively support the teaching of Machine Learning with our Education Program. Over 600 universities worldwide use BigML and nearly 200 BigML Ambassadors actively promote the BigML platform on their campuses. The program continues to grow and help more students and educators every day. BigML’s Machine Learning Schools are another key component of our effort to bring Machine Learning to everyone. In addition to the events we host, BigML regularly participates in industry events worldwide, such as the upcoming Machine Learning Prague 2018 conference. See our dedicated events page for the full list.

Education Program and Events

With all that said, BigML’s growth wouldn’t be possible without YOU, so thanks for helping us democratize Machine Learning one step at a time! If you are interested in being featured on our Customers page and providing a testimonial, please contact us at marketing@bigml.com. If you are starting to learn Machine Learning with BigML or are working on a project, let us know how we can help at support@bigml.com. We always appreciate your feedback and like to hear how you are using the BigML platform.

 

Brazilian Entrepreneurs Meet at the BSSML17 and Bet on ML to Increase Business Competitiveness

The BigML Team continues traveling around the world to help democratize Machine Learning across geographies, industries, and organizations of all kinds. BigML, together with Sebrae and Telefónica Open Future_, brought to Curitiba, Paraná, the fifth edition of our series of international Machine Learning Summer Schools, the second in Brazil, to prepare companies specialized in Information Technology to see business opportunities.

The one-day crash course took place this week, on November 29, and gave 45 entrepreneurs, programmers and students, from Paraná and other states in Brazil, a quick and practical introduction to Machine Learning. Subjects included supervised and unsupervised learning techniques, data transformations and feature engineering, as well as more advanced topics to learn how to automate Machine Learning workflows. The workshop provided practical knowledge of fast and useful techniques for using Machine Learning to improve performance and increase the competitiveness of companies. You can check the packed agenda on BigML’s SlideShare account and visit the BSSML17 photo albums on Facebook and Google+ to see more pictures from the event.

The Chief Infrastructure Officer of BigML, Poul Petersen, conducted all sessions and said that despite the technical theory, which involves mathematical calculations and statistical knowledge, the course balanced the content with the educational and practical aspects that the tool entails. “We wanted to give a sense of how things work, in a simple way. Another relevant aspect is that the focus is not on the immediate result, but the long term … to prepare companies and professionals to succeed in the next five to 10 years,” he said.

A common and relevant question among participants and organizers was, “Are Machine Learning concepts applicable to small businesses?” Petersen’s response was emphatic: “Yes, they certainly are. Many companies that participated in the course were startups that often compete in the market with much larger and well-structured companies. And there they were gaining the knowledge to innovate and increase their competitiveness in the disputed Brazilian market. These companies, if they continue investing in knowledge, will certainly bother, in a good sense, the big players in the market,” he added.

“The concepts of Machine Learning are not restricted to technology companies. They will be incorporated into products and services in all branches of business. They make data correlations to project, with greater precision, what will happen in the future. That is why they are strategic and very important tools to generate competitiveness, boost growth and improve results,” summarized Julio Agostini, Sebrae/PR Operations Director. Regarding the Machine Learning Summer School in Curitiba, the BSSML17, Julio Agostini commented that “the difference of this course is that the attendees participate in a workshop, where they had practical experiences of how to apply the concepts to be able to evaluate the use in their businesses and to see opportunities for innovation, modernization of processes and development of new products and services to expand their activities.”

Pedro Riviere, head of strategic partnerships of Wayra at Telefónica Open Future_, said he was happy to contribute to the viability of the event. “Thanks to this global Telefónica project, Open Future_, which circulates in many countries and has already invested in projects of more than 700 startups in the world, including BigML, we managed to include Brazil in the script. It was a great learning opportunity for the Paraná entrepreneurs to have access to the content of BigML, the company that pioneered the creation of Machine Learning as a Service (MLaaS).”

Among the attendees were relevant companies from Brazil such as Cinq, DP6, and Escotta Consulting, which wanted to delve deeper into leading Machine Learning tools like BigML to get insights from their data with ease and keep their businesses competitive in the marketplace. This great spirit only served to further validate BigML’s enthusiasm to continue organizing more Machine Learning Schools, so a big thank you to everyone for coming and following our educational activities! For more information on future Machine Learning Summer Schools please visit the dedicated page and stay tuned for future announcements!

BigML for Education: Getting to Know the BigML Ambassadors

BigML is actively being used in many educational institutions across the globe thanks to our Education Program. We would like to present several personal stories of our ambassadors on how they inspire their students or classmates who are looking to become more data-driven with a solid understanding of Machine Learning with BigML. Today we start with Iván Robles, a telecommunications engineer with a Bachelor’s degree in Mathematics who teaches Machine Learning at the EAE Business School and at ICEMD Business School in Madrid, Spain, while working for the telecom company Orange. Let’s get to know Iván a bit more!


BigML: How did you get into working with data and Machine Learning in particular?

Iván Robles: I love Mathematics and just after I finished my degree, back in 2006, a friend of mine told me about a company that used math to solve real-world problems. The idea of using math and statistics in my daily life was pretty exciting to me, so I applied for an open position and started working with them. Before 2006 I did not know what Machine Learning was, but since that moment I haven’t stopped working and learning in this field. Machine Learning has the perfect mix between math and programming, which really got my attention, especially when I could see that I was helping solve real-world problems in several areas such as marketing, networks, finance, among others. Naturally, this is what I currently teach to my students.

BigML: How do you see that Machine Learning is transforming the world?

Iván Robles: My students come from different companies, some of them are working for big corporations and others are entrepreneurs that are building their own company, and in both scenarios, they do invest resources in applying Machine Learning techniques to learn how to make decisions based on data instead of human intuitions alone. This, as well as all the news related to governments from many countries investing in Machine Learning, tells me we are on the right path to transforming all types of organizations into data-driven companies. This was non-existent a few years ago, which tells us that Machine Learning is already transforming many industries. Some good examples are self-driving cars, chatbots that answer questions automatically, and human-robot interactions, but in my opinion, this is just the beginning.

BigML: How do you currently apply Machine Learning?

Iván Robles: I teach Machine Learning in two business schools located in Madrid, the EAE Business School, and ICEMD. My students have different profiles, so my goal is to showcase different domains where they can apply Machine Learning. For instance, in my classes I use BigML’s time series models to find out the number of calls that a given call center will receive per day, to prepare budgets, and to forecast sales of a given product, among other examples. With classification models like BigML’s decision trees and ensembles, we analyze and predict churn, as well as what clients will buy a certain product based on their characteristics, and other similar use cases.

BigML: What’s the biggest advantage of applying Machine Learning in your field?

Iván Robles: As a professor who teaches Machine Learning, the main advantage that I see is that many non-experts can actually enjoy the benefits of Machine Learning without having to figure out exactly how the math behind the algorithms works. For example, Machine Learning allows you to analyze in minutes a multitude of data and the relationships among them, which would take years without it. This inspires my students and it certainly has a very positive impact on their business life.

BigML: What is your goal using BigML?

Iván Robles: I use BigML in my classes because unlike other Machine Learning platforms, BigML is very intuitive and accessible. It is built to not only make data scientists more productive, but to enable anyone to harness the potential of Machine Learning. I always like to showcase real use cases to my students that can be solved by applying Machine Learning techniques, therefore, at EAE Business School and ICEMD we take advantage of the BigML Education Program to accomplish that goal.

BigML: How do you find BigML different from other Machine Learning platforms?

Iván Robles: BigML is very easy to use, understand, and interpret, thanks to your powerful visualizations. This is obviously very much linked to your mission of democratizing Machine Learning. I can tell this clearly works in my classes because some of my students don’t have any technical background, yet they do understand BigML and they see how they can use the results obtained with the Machine Learning models they create. They really like the fact that they don’t need to compute or calculate anything, as BigML does it for them. They only need to drag and drop their data and in as little as a few clicks get the results they are looking for to continue building their projects, which is pretty awesome.

For that matter, I do see the tremendous added value of the BigML platform and totally identify it with your vision of democratizing Machine Learning. The ease of use allows everyone to be able to use Machine Learning in their projects. And the powerful visualizations generate a “wow” effect. They are very interactive, and really help us see how the problems are solved by the rules suggested by the Machine Learning algorithms. Indeed, it’s very impressive!

BigML: What is the reaction of your students when they use BigML?

Iván Robles: My students love BigML! They also have other subjects like programming languages for instance, but especially the business profiles find these subjects more difficult. However, with BigML it is different because the non-technical students often mention that BigML is their saviour, as they get the expected results very easily without being an expert in Machine Learning programming. On the other hand, the programmers also enjoy using BigML because they find that with your platform they don’t need to invest as much time and effort as they do with other tools. In fact, for both types of students, when they need to work on their final projects, they prefer easier and capable tools like BigML.

BigML: Any advice you would give to our readers to get started with Machine Learning?

Iván Robles: Start with BigML, and maybe you don’t need anything else! I mean it, BigML covers a large variety of use cases that can easily solve real-world problems with very good results applicable to plenty of organizations.

We hope Iván’s story inspired you to become a BigML Ambassador. Stay tuned for future blog posts to get to know more BigML community members and their personal stories. If you are not part of the BigML community yet you can change that right now, simply register here for free! Also, for those working in academia or still studying, feel free to join our Education Program and apply here to become a BigML Ambassador which also empowers you to promote our platform on your campus. Thanks for helping us in delivering #MachineLearning made beautifully simple for everyone!

BSSML17: Machine Learning Summer School in Curitiba

If you follow the BigML blog, you may be familiar with our popular Machine Learning Schools held globally throughout the year. These crash courses are key to contributing to the democratization of Machine Learning and to producing a much larger group of ML-literate professionals such as developers, analysts, managers, and subject-matter experts, to help them remain competitive in the marketplace.

We are proud to announce the next course, the second edition of our Machine Learning Summer School in Brazil that will take place on November 29, 2017, in Curitiba, Paraná. The BigML Chief Infrastructure Officer, Poul Petersen, will be giving a one-day course ideal for industry practitioners, advanced undergraduates, as well as graduate students seeking a quick, practical, and hands-on introduction to Machine Learning. The Summer School, co-organized by Sebrae, BigML, and Telefónica Open Future_, will serve as a good introduction to the kind of work that students can expect if they enroll in a Machine Learning master’s program.

Where?

Sebrae Building: Caeté Street, 150 – Prado Velho, Curitiba, Paraná, Brazil.

When?

November 29, 2017, from 8:30 AM to 6:30 PM BRST.

Apply now!

Find more details about the program and register here to attend the event before it’s fully booked, as space is limited.

BigML’s ML Schools Evolution

Since we ran the first edition of our Machine Learning Schools in Valencia, Spain, the interest of many ML practitioners has increased over the years. The best proof of that is in the applicant and attendee stats since the first edition.

In September 2015, at the very first Valencian Summer School in Machine Learning, we welcomed 95 attendees coming from 7 different countries, and out of those, 92 were from Europe. 42 of them came from academia representing 13 universities and the remaining 53 attendees came from 40 organizations.

One year later, in September 2016, we celebrated the second edition and went from the 95 attendees we had in the first edition to 142 attendees from 19 different countries. We realized that the world was ready to learn and apply Machine Learning techniques more than ever. Out of the 142 attendees, 125 of them were from Europe, and out of those, 109 from Spain. We also had more business profiles join the event compared to the first edition. 39 attendees represented 21 universities versus the 82 remaining representing 53 organizations.

The positive response from the audience encouraged us to do more; that’s when we decided to organize the first Brazilian Summer School in Machine Learning. That event took place in São Paulo, in December 2016. We were surrounded by 202 Brazilian attendees coming from 6 different states: Minas Gerais, São Paulo, Rio de Janeiro, Paraná, Santa Catarina, and Rio Grande do Sul. And again, many more attendees, 186, came from private companies versus the 16 attendees from academic institutions.

This year, in September 2017, the BigML Team ran the third edition of our Valencian Summer School in Machine Learning, which brought together 204 attendees from 14 countries, 183 of them from Europe, mostly from Spain. Among the crowd, there were 45 attendees from 28 universities, and 159 from 92 organizations.

Now, with your help, it’s time to beat the records with the second edition of our Machine Learning School in Brazil, this year in Curitiba!
