
BigML Customer Success Highlights – Part 2

In this post, we continue revealing BigML customer success stories that we kicked off with our last post detailing how a number of startups are basing their smart applications and services on the BigML platform. Those companies have profited from adopting BigML rather than taking the costly and risky approach of trying to build their own Machine Learning infrastructure that could divert their attention away from their core predictive use cases.

Today, we get into a potpourri of business problems tackled with the help of the BigML platform by large multi-national businesses. We see multiple scenarios play out as businesses with global footprints go about consuming Machine Learning. This also holds true for the sample of predictive use cases outlined in this post as we give you a glimpse of the motivation behind solving each reference application.

BigML Customers

Industry-specific use cases

Every industry contains a portfolio of data-rich workflows as part of the associated core operations and standard practices. Hard-coded business rules or knowledge-based approaches tend to govern many of those processes leaving room for further improvement with the introduction of Machine Learning approaches that frequently yield dramatic increases in productivity.

  • Rabobank, one of the largest banks in The Netherlands, is a great example of such a use case. Rabobank faced the challenge of manually analyzing a very large volume of payment transactions to guard against potential financial transaction fraud. A set of heuristics and business rules existed, but they were difficult and time-consuming to manage, and the team tasked with the monitoring was overwhelmed by the number of payments flagged by existing systems. There had to be a smarter way to deal with this situation without losing the gains made so far or continually multiplying headcount. As a result, Rabobank chose to focus its efforts on a new Machine Learning-driven approach, letting the algorithms do the hard work of sifting through hundreds of thousands of transactions to reveal the highly anomalous ones. The resulting models pinpointed problematic transactions so accurately that they were eventually embedded in Rabobank’s commercial fraud detection point solution. Fraud detection is not a “one and done” type of problem, so the models are continually monitored for covariate shift and automatically refreshed as new data arrives in order to stay ahead of the fraudsters.
  • In a somewhat similar vein, Seagate, the world-renowned manufacturer of computer hardware headquartered in Silicon Valley, routinely manufactures and services millions of parts such as hard drives that are covered under the company’s product warranty programs. Such programs can at times be abused by fraudsters, who are always looking to game them with schemes like returning counterfeit parts in the hope of receiving the genuine article in return. BigML-based fraud detection models have successfully identified suspicious return patterns, helping Seagate’s customer service and security teams focus their limited attention on truly anomalous instances while minimizing false positives that could negatively affect customer satisfaction metrics.

Enterprise support functions

Modern enterprises have complex ways to organize themselves into a multitude of functions, e.g., finance, marketing, sales, operations, legal, HR and more. Some of these functions are considered ‘core’, such as operations, while others can be portrayed as ‘support’ functions. Because most companies that begin investing in Machine Learning have done so by creating central teams of experts with advanced technical degrees, they tend to concentrate on a few use cases revolving around the core activities. This results in an imbalanced picture that starves ‘non-core’ functions of any Machine Learning capabilities save for basic ones baked into standard third-party SaaS tools.

  • Experiencing a similar challenge with their Human Resources function, ABN-AMRO chose to get on board with BigML to predict key employee metrics, e.g., the likelihood of employees vacating positions in upcoming periods. With positive results supporting ongoing retention efforts, this use case has proven that, with a Machine Learning platform like BigML and some training, any enterprise function or department can reskill its employees and adopt a self-serve analytical approach: creating custom workflows and optionally integrating the resulting predictions into relevant IT systems to better adapt to the challenges it faces.

B2B platform use cases

  • In addition to the above, certain situations involve embedding predictive capabilities in platforms whose primary offering is B2B services. In such instances, the need for automation is paramount, as is the ability to offer the analytical end-users of client businesses ways to visually interpret, in a self-serve manner, the underlying custom models they can build on the B2B platform. Dun & Bradstreet represents such a scenario: they have chosen to integrate BigML’s resources into their Analytics-as-a-Service B2B platform, gaining time-to-market and scale while controlling cost by fully automating workflows on behalf of their clients.

There are too many use cases to list here among those explored on the BigML platform either by our Private Deployment customers or by more than 100,000 registered users on our multi-tenant cloud platform offering a wide spectrum of subscription choices.

The main lesson learned here is that the Machine Learning consumption behavior of large organizations cannot be pigeonholed into a few perfunctory scenarios, e.g., build vs. buy. The shades of grey do matter here. However, we can make some broad-based recommendations. Presented with such a foundational piece of technology that has the potential to eventually touch every operational process, businesses can benefit from a longer-term strategic approach to ML adoption rather than solely a use case-specific outlook that saves the day with incremental improvements.

The latter approach may at times be satisfactorily implemented through third-party point solutions that bake in some predictive capabilities, generally based on the standard data models such tools contain, e.g., predictive features built into a CRM tool. Nonetheless, this piecemeal approach may fall short if further customization is desirable to better leverage custom data sources and may, in fact, result in unwanted system integration costs, leaving host businesses with siloed bespoke systems.

If you have a business problem similar to the above or an idea for a new and potentially game-changing analytical use case in your industry, be sure to get in touch with us. We can swiftly match you with a BigML expert, who can help you better formulate your approach by advising you on your data strategy, your modeling (and evaluation) strategy, as well as your run-time prediction and deployment strategies.

In short, the BigML team is ready to help you have a merry Machine Learning-filled new year in 2020!

Registration Open for 2nd Edition of Machine Learning School in Seville: March 26-27, 2020

Based on the successful reception of our First Edition, BigML, in collaboration with the EOI Business School, is launching the Second Edition of our Machine Learning School in Seville, which will take place on March 26 and 27, 2020. The #MLSEV will be an introductory two-day course optimized for learning the basic Machine Learning concepts and techniques that are impacting all economic sectors. This training event is ideal for professionals who wish to solve real-world problems by applying Machine Learning in a hands-on manner, e.g., analysts, business leaders, industry practitioners, and anyone looking to do more with fewer resources by leveraging the power of automated data-driven decision making.

Machine Learning School in Seville, 2nd Edition

Besides the basic concepts, the course will cover a selection of state-of-the-art techniques with relevant business-oriented examples such as smart applications, real-world use cases in multiple industries, practical workshops, and much more.


EOI Andalucía, Leonardo da Vinci Street, 12. 41092. Cartuja Island, Seville, Spain. See the map here.


2-day event: March 26-27, 2020 from 8:30 AM to 6:30 PM CET.


Please complete this form to apply. After your application is processed, you will receive an invitation to purchase your ticket. We recommend that you register soon, as space is limited and, per previous editions, the event may sell out quickly.


Lecturer details can be accessed here and the full agenda will be published as the event nears.

Beyond Machine Learning

In addition to the core sessions of the course, we wish to get to know all attendees better. As such, we’re organizing the following activities for you and will be sharing more details shortly:

  • Genius Bar. A one-on-one appointment to help you with your questions regarding your business, use cases, or any ideas related to Machine Learning. If you’re coming to the Machine Learning School in Seville and would like to share your thoughts with the BigML Team, be sure to book your 30-minute slot by contacting us.
  • Fun runs. We will also go for a healthy and fun 30-minute run after the sessions. Details on the meeting point and time will follow. Stay tuned!
  • International networking. Meet the lecturers and attendees during the breaks. We expect hundreds of local business leaders and other experts coming from several regions of Spain as well as from other countries around the globe.

We look forward to your participation in our first Machine Learning school of the next decade!


BigML Customer Success Highlights – Part 1

Our post on the 100,000 registered customers milestone this summer included an infographic of sample use cases being explored by BigML users, which naturally span many different sectors and industries. Today, we’d like to start a series of posts that further highlight a subset of those business problems to give our readers some clues on how a comprehensive platform like ours can be utilized in different business contexts, in case they’re considering new Machine Learning solutions.

BigML Use Cases

There are many ways to organize use cases, e.g., by industry, function, geography. In this post, we will focus on startups and SMBs as we give you a glimpse of the motivation behind solving each reference use case. In a later post, we’ll concentrate on large multinational companies also finding success with the BigML platform.

Startups and SMBs have good reasons to prefer the BigML platform because it lets them affordably step into Machine Learning with ample room to further scale efforts as data volumes and the number of implemented use cases grow over time. Some startups have products and services that cannot even be launched without Machine Learning at their core (e.g., sensor-based medical diagnosis), whereas others grow into Machine Learning as they realize they are sitting on top of a hard-to-replicate and/or completely unique dataset that can fuel high-value predictive use cases and help differentiate their existing products.

Once useful models are in place, there are multiple systems integration and deployment choices. On the lighter side, predictions can be served in real-time through the BigML REST API and included in a customer-facing user interface, say, in a module for product or next-best-action recommendations. On the other hand, if the end-user is expected to interact with and interpret the models first hand (rather than just consuming their predictions), the visualizations from BigML models can be made available in the host application in a white-label manner.
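As a sketch of that lighter-weight path, serving a real-time prediction amounts to a single authenticated HTTP POST against the BigML REST API. The snippet below uses only the Python standard library; the credentials, model id, and input field names are placeholders for illustration, not a working account:

```python
import json
import urllib.request

def prediction_url(username, api_key):
    # BigML authenticates REST calls via the username and API key
    # passed as query-string parameters.
    return f"https://bigml.io/prediction?username={username};api_key={api_key}"

def create_prediction(username, api_key, model_id, input_data):
    # POST a prediction request against a previously trained model
    # and return the decoded JSON response.
    payload = json.dumps({"model": model_id, "input_data": input_data}).encode()
    request = urllib.request.Request(
        prediction_url(username, api_key),
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

if __name__ == "__main__":
    # Placeholder credentials, model id, and input fields.
    result = create_prediction("my_username", "my_api_key",
                               "model/123456789012345678901234",
                               {"plan": "basic", "tenure_months": 12})
    print(result["prediction"])
```

A production integration would add error handling and retries, but the request shape stays the same, which is what makes embedding predictions in a user-facing module straightforward.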

Predictive use case examples at ML-driven startups

Startup ML Use Cases

  • Juriblox B.V. is a European startup active in the legal services space. Their SaaS solution takes care of an oft-overlooked aspect of legal contract review and management: non-disclosure agreements (aka NDAs). The Juriblox service, named NDALynn, can quickly grade any NDA uploaded by its subscribers, letting them know the overall aggressiveness of the NDA and highlighting specific clauses that are likely to cause problems down the road. All of these predictive capabilities baked into their web user interface are made possible by a number of BigML models tapped into via the BigML API. Juriblox’s achievement is especially remarkable given that they didn’t have a data scientist or other highly-paid dedicated analytical expert on staff. This example shows that a group of subject-matter experts with access to relevant data, armed with a good understanding of their customers’ context and coupled with a developer team, can deploy sophisticated Machine Learning systems that are core to their offering.
  • Another BigML customer, Faraday, helps B2C businesses optimize demand generation by combining their customers’ CRM data with their national database containing key traits on over 125 million U.S. households. Faraday customers have been able to attribute as much as 1/3 of their sales to ML-driven cross-channel campaigns addressing all stages of the B2C revenue lifecycle from customer acquisition to upsell and retention, e.g., social media advertisement performance comparable to the best targeting that Facebook ML models can support.
  • On the other hand, Frogtek helps Mexican micro-retailers to better control and grow their businesses as the company’s point-of-sale (POS) systems register every transaction. This data is a boon for Consumer Packaged Goods companies that are starved for visibility into consumer behavior and preferences to optimize operational efficiencies such as inventory management with Machine Learning.
  • The potential applications of ML to automate accounting are many. For example, Anfix, a Spanish startup, can help clients predict the correct expense account that a given invoice belongs to. Before Machine Learning, this process could only be performed by an accountant with an in-depth understanding of the company operations. The automation of such bookkeeping tasks allows financial professionals to use their time to focus on other activities that either result in more value to their customers or help them find new customers. Additional predictive efforts include determining in advance whether a given company will run out of money at some point in time, allowing scenario planning based on short-, mid-, or long-term funding options. Knowing this information, the company can anticipate negotiations with a bank to get a loan under more advantageous terms.


We hope these use cases give you some ideas about the wide range of Machine Learning opportunities in your setting. Please stay tuned as we will follow up this one with use case examples from large multinational companies in our next post.

Are you a manager or professional at a startup (or SMB) evaluating your options to better take advantage of your proprietary data sources by implementing Machine Learning systems that integrate predictions into your value proposition? Be sure to get in touch with us. We can swiftly match you with a BigML expert, who can help you better formulate your approach by advising you on your data strategy, modeling (and evaluation) approaches, as well as your run-time prediction and deployment options.

Accelerating Machine Learning Adoption in the Automotive Industry

A few weeks ago, I had the chance to participate in the Ford Innovation Day organized by BigML partner Thirdware. The two-day event included innovation projects ranging from conversational agents to predictive maintenance systems leveraging Machine Learning.


My presentation shared its title with this blog post, mainly concentrating on the prospects of Machine Learning for automotive companies. In some ways, the automotive industry is not that different from many other industries of the global economy, as it has been struggling to find its footing when it comes to putting Machine Learning front and center at scale. That’s not necessarily just due to a lack of meaningful investment either.

Let’s take a step back and look at the broader trends first. McKinsey Global Institute finds in its forward-looking Vision 2030 report on the automotive industry that the next decade will bring slow (2%) growth in traditional vehicle sales and related aftermarket services. To boot, most of this growth in the traditional business segments will occur in emerging markets driven by demographic and macroeconomic factors. Yet global automotive industry revenue is expected to increase by $1.5T (+30%) thanks to new business models such as shared mobility and connectivity services materializing by 2030. As a side note, shared mobility examples include car-sharing and e-hailing, while data connectivity services include specialized applications such as entertainment, remote services, and subscription-based software upgrades. In fact, 10% of cars sold in 2030 are expected to be shared vehicles, adding to special purpose fleets and mobility-as-a-service solutions popular in dense urban areas. Various flavors (e.g., hybrid, plug-in, battery-electric, fuel cell) of Electric Vehicles will make up as much as 50% of vehicles sold by that time!

To make this new landscape possible, new competing ecosystems with more diverse players will need to emerge to deliver a much more integrated customer experience. The common denominator of this future vision seems to be highly integrated intelligent software applications giving way to data-driven insights acting as the connective tissue in between. Permissioned data becomes the new currency of collaboration, software becomes much more central to everything, and it doesn’t take a genius to figure out that Machine Learning has a big role to play in this scenario.

Sounds great, right? Not so fast!

Back to today’s reality: another 2018 report, this time from CapGemini, found only modest gains in ML systems deployed at scale among automotive OEMs, suppliers, and dealers (quantified as an increase from 7% to 10% of surveyees year over year). Yet 80% of respondents mentioned Machine Learning as a strategic initiative. There’s a tremendous gap between 10% and 80% that’s worth re-emphasizing.

ML is Hard

Potential use cases for automotive companies are many, touching both core operations (e.g., Predictive Maintenance, Supplier Risk Management) as well as support functions like Sales, Marketing, Finance or HR. So what explains the slow uptake? Part of this outcome has been shaped by previous expensive and failed attempts by these companies at dealing with the inherent complexity of Machine Learning. In response, most industry players seem to have changed tack to apply a more measured approach in selecting use cases and projects. The surveyed companies that deployed at least three use cases at scale across their entire operations were dubbed “Scale Champions” in the report, which frankly is a pretty low bar considering the true upside. What’s more, the “champs” did a markedly better job with the re-skilling of their workforces and putting in place a Machine Learning governance process.

Tale of Two Innovations

There’s something to be said for this top-down approach, often defined by executive mandates and buttressed by committees that define and prioritize the use cases of interest, identifying risks and the rules of the road. Things then get handed down to IT teams and Data Scientists to implement and roll out to production in collaboration. It’s certainly possible to make headway through this waterfall-like modus operandi, albeit at greater cost and slower speed.

However, there’s also a newly emerging bottom-up approach that is synergistic. Thanks to a new set of easy-to-use MLaaS tools with low-code visual interfaces and built-in AutoML capabilities, subject matter experts, analysts, and even business folks can be upskilled faster to autonomously explore their own predictive ideas that would otherwise go unexplored altogether. In this decentralized model of embedding ML in many more business processes, standardized workflows and RESTful APIs play a critical role in deploying the worthy predictive models with high signal-to-noise ratios to production systems, thus eliminating the need to rewrite them from scratch with heavy IT involvement. As a bonus, a working ML governance framework carried over from previous waterfall projects makes this agile approach even more effective in managing related organizational risks.

This new way of thinking seems to be gathering steam with more thought leaders in the industry who are already singing the praises of an elevated level of accessibility. Take for instance Andrew Moore, who proclaimed:

“After years of hype around mysterious neural networks and the Ph.D. researchers who design them, we’re entering an age in which just about anyone can leverage the power of intelligent algorithms to solve the problems that matter to them. Ironically, although breakthroughs get the headlines, it’s accessibility that really changes the world. That’s why, after such an eventful decade, a lack of hype around machine learning may be the most exciting development yet.”

We wholeheartedly agree!




How Machine Learning will Transform the Automotive Industry

NOTE: The following is a guest blog post authored by Kristin Slanina, Chief Transformation Officer with the BigML sales and delivery partner, Thirdware.

When most people think of Machine Learning in automotive, it’s in relation to how it can help in plant operations – predictive maintenance, diagnostic predictions, process optimization, etc. Effective use of Machine Learning at plants can significantly save costs, improve quality and minimize downtime. All positive things!

Automotive Factory

Automakers are in the midst of an amazing opportunity to transform themselves. As exponential technologies change the way we move today and especially in the future, automakers are looking at new business models and services to help move people and goods in more novel and different ways. Machine Learning can play an important role in how this shakes out. The new service offerings OEMs are piloting are new areas of play for them, so Machine Learning insights on the market opportunity and target audience (e.g., which generation or generations) can guide them in employing the right business model for the right area.

For example, cities around the world are unique and differ in significant ways, with some common threads as well. A service that works well in one city might fail in another. Such data-driven insights can truly hone the investment and market strategy as well as the scaling of these services. Competition is fierce, and to the extent that Machine Learning can minimize the “trial and error” pilots that most OEMs are currently conducting in favor of fewer but primed-for-success innovation projects, it can be truly transformational.

On the other hand, enterprise support functions such as human resources can also gain significant new capabilities powered by Machine Learning. The war for talent is real, and predictive models can help automakers with hiring, predicting employee attrition, and personalizing employee benefits, all of which aid in attracting and retaining the right kind of talent for delivering on these new needs.

As the Automotive industry moves from a 100-year-old traditional product industry to redefine itself in this era of mobility services, and as companies and society as a whole collect more and more data, how to synthesize and utilize that data in real time will be key to success. Those that can figure out how to truly leverage Machine Learning will put themselves in a position to drive how future automotive services ecosystems will deliver value to consumers. Fasten your seat belts!

Automated Decision Engineering for Everyone: INFORM and BigML to enable Next Generation Data-driven Applications

We are happy to report that INFORM GmbH and BigML have agreed to terms on a preferred partner program to further ingrain best-practice machine-learned models into the daily fight against financial crime with RiskShield.

Inform Gmbh & BigML

We have collectively recognized great synergies between our respective technologies and will bring a unique approach to the fight against fraud, money laundering, and other financial crimes by seamlessly combining knowledge-based and machine-learned models. The financial crime-fighting space itself presents a multi-billion euro market opportunity. However, we have already jointly identified other application areas for the powerful RiskShield decision engine, with its expansive feature engineering capabilities, to address.

Powerful Real-Time Decision Engine Meets Beautifully Simple Machine Learning Environment

Under this agreement, INFORM will be developing and offering access to RiskShield ML, powered by BigML. RiskShield ML serves as an enhancement to the current RiskShield machine learning offering, making model creation easy and accessible to all organizational functions of its customer base. Customers can develop and train their own models, which can be validated and applied in real-time for detecting new modus operandi of financial criminals. INFORM’s proprietary interface to BigML allows for the seamless integration of data regarding, among other things, transaction decisions, which will be used to dynamically train and update the financial crime-fighting models. INFORM will also serve as a sales and delivery partner for the standalone BigML platform.

BigML’s promise to bring machine learning to everyone in a beautifully simple environment combined with RiskShield’s powerful decision engine further enables financial institutions to take a hybrid Artificial Intelligence (AI) approach to their fraud and money laundering prevention efforts. On the one hand, users benefit from knowledge-based methods such as mixed logic rule sets, fuzzy logic scorecards, dynamic profiling, and blacklists. On the other hand, RiskShield ML, powered by BigML, offers users the ability to take historical transactional data and learn from the decisions using both supervised and unsupervised learning methods.

INFORM and BigML will take the stage together at the upcoming RiskShield International Networking Event on November 28 and 29 to present their new joint solution. 


INFORM GmbH is a global company in advanced optimization software systems and a leader in providing intelligent, customer-centric fraud prevention and AML compliance solutions. With RiskShield we offer a multi-channel platform that detects and manages suspicious activities, minimizing losses and optimizing efficiencies using advanced analytics, machine learning, and intuitive rule management controls. RiskShield provides a robust solution with proven fraud detection results that are reliable, fast and responsive. More than 1,000 companies worldwide benefit from using advanced optimization software systems by INFORM in industries such as financial services, insurance, health care, transport logistics, airport resource management, and production planning. INFORM employs over 750 staff from more than 40 countries.

Launching the BigML Certified Architect course: October 15, 2019 (First Wave)

BigML’s education initiatives are an integral part of what makes the platform useful and popular as they continually generate a new class of autonomous power users of Machine Learning that can creatively explore predictive problems to serve their customers.

BigML Certifications

For almost two years, we’ve been offering our BigML Certified Engineer program, which has produced an impressive 23 waves of graduates to date. At this point, the interest from the existing pool of Certified Engineers has made us decide to launch our brand new BigML Architect Certification program that requires the successful completion of the Certified Engineer program. Effective immediately, you can sign up for the first wave that starts on the week of October 15, 2019!

This new Architect course is aimed at advanced BigML users and BigML Certified Engineers who want to learn how to design, architect, and implement end-to-end Machine Learning applications. The students will learn how to make the best decisions for their smart application depending on the volume of data, the Machine Learning tasks to be automated, and the specific requirements of the problem being solved. 

The Architect certification process consists of 8 online classes of 1.5 hours each. The evaluation will be based on solving a set of theoretical questions and exercises presented during the course. The sessions detailed below will be delivered in pairs as online classes.

  1. Machine Learning Engineering
    • Real-world Machine Learning
    • Building end-to-end Machine Learning applications  
    • How to size and address your project
  2. BigML Predictions
    • How to generate thousands of predictions per second 
    • How to store predictions for further analyses
    • How to implement robust predictions
  3. Model Risk Management
    • Local models vs. remote models 
    • How to use and operate models
    • How to monitor your models
  4. Machine Learning Models: How to Automatically Create Models
    • Automated model and parameter selection 
    • When good is “good enough”
    • What your actual test set tells you about your model
  5. Model Retraining: When and How to Retrain Models 
    • Tracking models over time. You can learn from everything. 
    • Automating covariate shift detection
    • Active Learning
  6. Building Datasets for Machine Learning 
    • Diversity vs. volume
    • Detecting biases
    • Detecting blind spots
  7. Automatically Preparing Your Data for Machine Learning
    • Choice of data engineering tools
    • Automating feature selection
    • Automating feature generation
  8. Putting It All Together
    • Anatomy of a robust Machine Learning application
    • Lessons learned and best practices
    • Design patterns: beyond lessons learned and best practices

This program is also a great opportunity for BigML delivery partners to demonstrate their mastery of the rapidly growing BigML Machine Learning-as-a-Service platform while further differentiating themselves from competitive analytical services organizations.

Not yet a BigML partner? Well, you can change that by contacting us today to find out more on how the new wave MLaaS platforms can help you deliver actionable insights and real-world smart applications to your clients within weeks.

AutoML: BigML Automated Machine Learning

Last year, BigML launched the OptiML resource for Automatic Model Optimization. Without a doubt, it has marked a milestone in our Machine Learning platform. Since then, many users have included OptiML in their Machine Learning toolboxes. However, some users are asking us to go further than model selection, so today we’re presenting BigML’s AutoML, an Automated Machine Learning tool for BigML.

automl icon

This first version of AutoML helps automate the complete Machine Learning pipeline, not only the model selection. To boot, it’s pretty easy to execute. Give it training and validation datasets and it will give you a Fusion with the best possible models using the least possible number of features: ready to predict!

The returned model will be the result of three AutoML stages: Feature Generation, Feature Selection and Model Selection. AutoML will also return the evaluation of the model, in order to show the user its performance.

AutoML is provided as a WhizzML script and a library. You can find it in the WhizzML public repository on GitHub.

AutoML steps

As mentioned, behind the scenes, BigML’s AutoML is performing three main operations: Feature Generation, Feature Selection, and Model Selection.

The first stage is Feature Generation. During this stage, new features, obtained by applying Unsupervised Learning models to the original datasets, are added to them. The new synthetic features come from:

  • Cluster Batch Centroids (Clustering)
  • Anomaly Scores (Anomaly Detection)
  • Batch Association Sets (Association Discovery): using the objective field from your dataset as the consequent and leverage and lift as the search_strategy
  • PCA Batch Projections (Principal Component Analysis)
  • Batch Topic Distributions (Topic Model): Created only when the dataset contains text fields.
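To make the idea of "appending synthetic features" concrete, here is a minimal, self-contained Python sketch: each unsupervised model contributes one extra column per row. The centroid assignment and distance-based anomaly score below are deliberately crude stand-ins for BigML's actual cluster and anomaly detector resources, and all the data is made up for illustration.

```python
# Sketch of AutoML-style Feature Generation: append synthetic columns
# (a cluster assignment and an anomaly-like score) to each row.
import math

rows = [[1.0, 2.0], [1.2, 1.9], [8.0, 8.5], [7.9, 8.1], [50.0, 1.0]]
centroids = [[1.1, 1.95], [7.95, 8.3]]  # pretend these came from clustering

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

augmented = []
for row in rows:
    dists = [distance(row, c) for c in centroids]
    cluster_id = dists.index(min(dists))  # "batch centroid" column
    anomaly_score = min(dists)            # crude "anomaly score" column
    augmented.append(row + [cluster_id, anomaly_score])

# The last row sits far from every centroid, so its score is the highest.
```

The point is simply that each generator widens the dataset with columns a supervised model can then exploit, which is also why the next stage has to prune them.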

The next stage is Feature Selection. Feature Generation usually leaves us with very wide datasets, so AutoML applies Recursive Feature Elimination to remove unimportant or redundant features. Those who need a refresher can revisit the post we shared a few months ago on the topic; the steps AutoML follows are identical to the ones shown in that blog post.
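A toy sketch of the elimination loop, assuming feature importances are known up front (in practice BigML recomputes field importances from a freshly trained model at each round, which this simplified version does not do):

```python
# Sketch of Recursive Feature Elimination: repeatedly drop the least
# important feature until only `keep` features remain.
def recursive_feature_elimination(features, importance_fn, keep):
    features = list(features)
    while len(features) > keep:
        weakest = min(features, key=lambda f: importance_fn(f, features))
        features.remove(weakest)  # a retrain-and-rescore happens here in practice
    return features

# Invented importances for illustration only.
toy_importances = {"age": 0.40, "plan": 0.35, "cluster_0": 0.05,
                   "anomaly_score": 0.15, "noise": 0.01}
kept = recursive_feature_elimination(
    toy_importances, lambda f, fs: toy_importances[f], keep=3)
# kept -> ['age', 'plan', 'anomaly_score']
```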

The final stage is Model Selection. Guess what? OptiML will help us with this task. The full power of Bayesian Optimization is leveraged at this step to arrive at the best possible models. At the end of the process, the best models evaluated by OptiML are combined into a Fusion (e.g., the top-performing Deepnet plus the top tree ensemble). The final Fusion is also evaluated against the validation dataset so that its performance can be recorded and displayed for the end user.
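The tail end of this stage can be sketched in a few lines of Python: rank candidate models by their evaluation score, keep the strongest ones, and average their predictions, which is roughly what a Fusion does for regression outputs. The model names, scores, and predictions below are invented for illustration.

```python
# Sketch of the end of Model Selection: keep the top-scoring candidates
# and fuse them by averaging their predictions.
candidates = [
    ("deepnet-1",  0.91, lambda x: 0.80),
    ("ensemble-3", 0.89, lambda x: 0.70),
    ("model-7",    0.62, lambda x: 0.10),  # weak model, left out
]
top = sorted(candidates, key=lambda m: m[1], reverse=True)[:2]

def fusion_predict(x):
    return sum(predict(x) for _, _, predict in top) / len(top)

# fusion_predict(...) averages the two strongest models: about 0.75
```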

It’s your turn now…

This is only the opening salvo of Automated Machine Learning within BigML. If you want to start using AutoML, check the WhizzML public repository on GitHub, where you will find all the information needed to install and run it. This is just one example of how WhizzML lets us extend BigML's capabilities, and anybody can do it: the entire AutoML code is public, so you can modify or extend it to fit your specific needs. You asked us to go one step further, and so we did. Now we'd love to receive your feedback on ways to improve it further. In subsequent blog posts, we will also showcase AutoML in action with some example use cases. Until then, stay tuned!

Bring Machine Learning to PostgreSQL with BigML

As of late, we've been using PostgreSQL quite a lot at BigML, and so have some of our customers. We love the features coming in the next release (which is in beta as I write), particularly the one that allows creating what they call generated columns. In this post, I'll explain what generated columns are and how they can use BigML's Machine Learning models to fill in any numeric or categorical field in your table.

What is a generated column?

A generated column is a special column that is defined as the result of a computation that involves any of the values of the regular columns in the row. Let’s see an example.

Say you have a table of contacts in your database where you store their first and last names plus their email. You might also want to keep the full name for output purposes, but of course you don't want that to be a column filled in independently. Here's where a generated column comes in handy:

CREATE OR REPLACE FUNCTION my_concat(text, VARIADIC text[])
RETURNS TEXT AS 'text_concat_ws' LANGUAGE internal immutable;

CREATE TABLE contacts (
    first_name TEXT,
    last_name TEXT,
    full_name TEXT GENERATED ALWAYS AS (
        my_concat(' ', first_name, last_name)) STORED,
    email TEXT);

The full_name column is defined as GENERATED ALWAYS, so you will not be able to insert values into that column directly. Instead, the column is automatically filled with a concatenation of the contents of first_name and last_name with a blank between them.

testdb=# INSERT INTO contacts (first_name, last_name, email)
    VALUES ('John', 'Doe', '');
testdb=# SELECT * FROM contacts;
 first_name | last_name | full_name | email 
------------+-----------+-----------+-------
 John       | Doe       | John Doe  | 
(1 row)

Ok, that’s not bad at all and will both ensure consistency and ease maintenance. However, the information in the table has not increased. The generated column is not telling us anything that we do not know in advance. What if we could use Machine Learning to add more information to our table?

Machine Learning insights

For those of you who are not familiar with Machine Learning, it's a branch of Artificial Intelligence that has proven very useful to the enterprise. The basic idea behind Machine Learning is having computers label things for us, given only a collection of previously labeled examples. The computer uses algorithms to learn from those examples and can then predict the label for new incoming cases.
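As a toy illustration of that contract (labeled examples in, predicted label out), here is a one-nearest-neighbor classifier in plain Python. Real platforms use far richer algorithms, and the data below is invented:

```python
# Toy 1-nearest-neighbor classifier: label a new case with the label of
# the closest previously labeled example.
def predict_label(examples, new_case):
    closest = min(examples, key=lambda ex: abs(ex[0] - new_case))
    return closest[1]

# (monthly support calls, outcome) pairs observed in the past
history = [(0, "stays"), (1, "stays"), (7, "churns"), (9, "churns")]
prediction = predict_label(history, 8)  # close to past churners
```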

For instance, imagine that you run a telecom. Some of your customers will churn, but which ones? Wouldn’t it be nice to be told who is likely to churn? Maybe you could offer a discount or other offer to convince them to stay.

That’s one of many things that Machine Learning can do for you. Based on the examples of customers that churned, the computer can learn which patterns lead to churn and predict whether a user calling your customer service line matches any of those patterns and is therefore at risk of churning (for more details, check this post by Chris Mohritz).

Back to our example, how could we add that label to our call center table?

Powering tables with AI

In order to predict the likelihood of a customer churning, Machine Learning algorithms build models. You can learn more about the different types of models and their uses in our videos. As this post is not focused on how to build a model, let's use an existing model for the telecom churn problem from our model gallery. You can easily clone it into your BigML account for free.

Churn telecom model

Feeding what we know about the user to the model (the total day minutes, voice mail plan, total day charge, total intl minutes, total intl calls, etc.) the model will tell us what we don’t know: whether the user is likely to churn. Could we add that information as one more column in our table?

The good news is that PostgreSQL offers extensions that allow you to define functions using several general-purpose languages. One of them is plpythonu, which lets you embed Python code in PostgreSQL functions. Also, BigML offers bindings for several languages (Python included) that know how to use Machine Learning models to create predictions for your input data. Let's put all of that together in five steps:

  1. Register or login to BigML so as to use its models.
  2. Clone the model available in BigML’s gallery to your account.
  3. Install the Python bindings.
  4. Create a function to generate the prediction.
  5. Create a table to store your input data and the generated column.

Step 1 is easily done by using the signup form, which will ask for your email and some basic information. Then you can follow the link to the model and click the buy link to copy it. From that moment on, you'll be able to use the model to make predictions. At this point, your model is stored in your private environment on BigML's servers.

The next step is installing the Python bindings, which include classes that can download that model to your local computer and use its information to predict the churn output for each set of inputs. The details of how to install them can be found in the bindings documentation, but basically, it means using pip:

pip install bigml

Now comes the time for defining the function that will predict whether the customer is going to churn in PostgreSQL:

CREATE OR REPLACE FUNCTION predict_churn(total_day_minutes REAL,
                                         voice_mail_plan TEXT,
                                         total_day_charge REAL,
                                         total_intl_minutes REAL,
                                         total_intl_calls REAL)
          RETURNS text
          AS $$
            from bigml.model import Model
            from bigml.api import BigML

            # ------ user's data -------------- #
            model_id = "model/52bc7fd03c1920e4a3000016" # model ID
            username = "my_username" # replace with your username
            api_key = "*****" # replace with your API key
            # ---------------------------------- #
            local_model = Model(model_id,
                                api=BigML(username, api_key,
                                          storage='./storage'))
            return local_model.predict( \
                {"total day minutes": total_day_minutes,
                 "voice mail plan": voice_mail_plan,
                 "total day charge": total_day_charge,
                 "total intl minutes": total_intl_minutes,
                 "total intl calls": total_intl_calls})
          $$ LANGUAGE plpythonu immutable;

The Python code uses the ID of the model, which can be retrieved from BigML’s dashboard, and your credentials (username and API key). Thanks to that, the Model class will download the model information to your computer the first time that function is called. The model will be stored in a ./storage folder, and from then on this local copy will be used to make the predictions. In order to use the function, we just need to create the table with a generated column as before:

CREATE TABLE churn (
    total_day_minutes REAL,
    voice_mail_plan TEXT,
    total_day_charge REAL,
    total_intl_minutes REAL,
    total_intl_calls REAL,
    churn_prediction TEXT GENERATED ALWAYS AS
                       (predict_churn(total_day_minutes,
                                      voice_mail_plan,
                                      total_day_charge,
                                      total_intl_minutes,
                                      total_intl_calls)) STORED);

And voilà! The next time a customer calls your call center, insert their information into the table:

INSERT INTO churn (total_day_minutes,
                   voice_mail_plan,
                   total_day_charge,
                   total_intl_minutes,
                   total_intl_calls)
VALUES (45,'yes',55,120,3);

INSERT INTO churn (total_day_minutes,
                   voice_mail_plan,
                   total_day_charge,
                   total_intl_minutes,
                   total_intl_calls)
VALUES (55,'no',50,100,12);

The model will immediately add the churn prediction:

SELECT churn_prediction FROM churn;
(2 rows)

Can’t wait to use it!

For those of you who want to try this right away, there's an alternative to generated columns: triggers. A trigger is a function that is called when a row is inserted, updated, or deleted. Triggers can be attached to tables so that tasks are performed before or after one of these designated events takes place.

To mimic our example, we could create a regular table with plain columns

CREATE TABLE plain_churn (
    total_day_minutes REAL,
    voice_mail_plan TEXT,
    total_day_charge REAL,
    total_intl_minutes REAL,
    total_intl_calls REAL,
    churn_prediction TEXT);

but add a trigger on insert or update, so that the content of churn_prediction is computed as the prediction based on the rest of the columns:

CREATE OR REPLACE FUNCTION predict_churn_trg()
          RETURNS TRIGGER
          AS $$
            from bigml.model import Model
            from bigml.api import BigML
            # ------ user's data -------------- #
            model_id = "model/52bc7fd03c1920e4a3000016" # model ID
            username = "my_username" # replace with your username
            api_key = "*****" # replace with your API key
            # ---------------------------------- #
            local_model = Model(model_id,
                                api=BigML(username, api_key,
                                          storage='./storage'))
            new_values = TD["new"] # values to be stored
            new_values["churn_prediction"] = local_model.predict( \
                {"total day minutes": new_values["total_day_minutes"],
                 "voice mail plan": new_values["voice_mail_plan"],
                 "total day charge": new_values["total_day_charge"],
                 "total intl minutes": new_values["total_intl_minutes"],
                 "total intl calls": new_values["total_intl_calls"]})
            return "MODIFY"
          $$ LANGUAGE plpythonu;

By associating the trigger with the previous table, the churn_prediction column will also be automatically generated when the rest of the values change.

CREATE TRIGGER churn_prediction_trg
    BEFORE INSERT OR UPDATE ON plain_churn
    FOR EACH ROW
    EXECUTE PROCEDURE predict_churn_trg();

So we are ready to go!

INSERT INTO plain_churn (total_day_minutes,
                         voice_mail_plan,
                         total_day_charge,
                         total_intl_minutes,
                         total_intl_calls)
VALUES (55,'no',50,100,12);

SELECT churn_prediction FROM plain_churn;
(1 row)

Cool, right? Let us know how your experience goes, and meanwhile, happy predicting!

Celebrating 100,000 Registered Customers!

It’s not every day that one comes across a commercial software platform hitting the 100,000-registrations mark in the Machine Learning world. After all, Machine Learning is only now shedding its reputation as a mostly academic endeavor and becoming a business imperative for large, mid-sized, and even small businesses across many industries, all looking to implement a wide variety of use cases.

BigML Use Cases

In the case of BigML, it took about 6 years to get to our first 50,000 registrations starting from our inception in 2011. However, it has taken less than 2 years to add the next 50,000, which is a testament to BigML’s staying power despite the existence of a dizzying array of Machine Learning tools, including highly specialized open source tools and libraries.

100k Customers

Naturally, one wonders what forces are driving the accelerating adoption of BigML given our recent experience. As you’d expect, while some reasons are exogenous, others are endogenous to BigML’s product design and go-to-market choices. With that stated, the following waves of change come to the fore:

  • Without a doubt, interest in Machine Learning has increased exponentially in the business world. The routine mentions of “Machine Learning” and/or “AI” by executives in public company earnings calls demonstrate how related initiatives are perceived to carry strategic implications for many industries.


  • The BigML platform has continually evolved and improved over the course of the last two years making it more comprehensive and able to handle many diverse use cases once out of its reach. It’s somewhat nostalgic to remember that the first version of BigML only featured decision trees as part of a very simple workflow that supported flat file imports and the ability to make form-based single predictions. Over time, BigML has evolved to not only support more algorithms but also multiple options for automation of workflows all the while abstracting infrastructure layer concerns from the analytical end-user in a scalable manner.


  • We must give a special attribution to our auto-ML capability OptiML, which has leveled the playing field for even the novices not familiar with the intricacies of hyperparameter tuning by automating the chore of picking just the best set of parameters for any classification or regression technique available on the platform. More competent models, in turn, mean higher potential business impact and even more interest in iterating with better features, more data, etc. Before you know it, it becomes a positive feedback loop!



  • Our insistence on skipping the online forms to fill out and the sales calls to sit through before an interested party can even experience BigML has also paid off handsomely. We like to refer to this as Free and Immediate Access: no large downloads, painful setup or installation routines, or, worse yet, credit card verifications are needed to tackle your first predictive use case. Just enter your email and kickstart your personal Machine Learning journey.


  • The next factor is what we can sum up as the human touch: a mix of affordable summer schools, certifications, and timely customer support provided to both paying and non-paying users, as well as customized assistance tailored to the desired predictive use cases and ML techniques.


The BigML Team

Late to the party? Get started today…

So regardless of your level of understanding of Machine Learning or the sophistication at your workplace about the matter, you have a spectrum of options to engage with the BigML platform to get real value in the shortest amount of time. We suggest you try any or all of the following routes as your first step and don’t hesitate to reach out to us anytime.

  • FREE Forever Subscription: If you haven’t done so, there’s never a better time than now to sign up for the FREE version of BigML. It only takes an email.
  • FREE Education videos: Unlike the typical advice on how to become a Data Scientist (HINT: take many online courses, read many books on statistics, etc.) you can find a comprehensive set of education videos on each BigML resource that assumes no prior Machine Learning background.
  • BigML Lite for Small Business or Pilot Projects: Larger businesses usually require their own dedicated instance of BigML due to internal rules or preferences but for SMBs or a single business unit of a large organization, it makes more sense to deploy BigML Lite for cost and speed to market reasons.

As the BigML Team, we’re proud to serve our community of early adopters and wish to add another 100,000 users in the next year!
