Celebrating BigML’s 10th Anniversary

Today marks a decade since a small group of Machine Learning and Software Engineering experts founded BigML back in 2011 with the audacious mission of making Machine Learning easy and beautiful for everyone. So we can’t help but proudly reminisce about ten years of hard work building our platform while helping deliver a multitude of customer projects. As we watched our vision gradually turn into reality, we gathered a decade of valuable lessons in Machine Learning platform evolution and industry adoption.


As they say, time is the best teacher and patience is the best lesson. If we set our controls to January 2011 and travel back in time to the state of Machine Learning adoption in the business world a decade ago, a few memories quickly come to mind:

  • First off, none of the players in today’s Machine Learning software category existed back then. Only earlier versions of a few open-source libraries (i.e., Weka, scikit-learn, Mahout) were around, and they all required users to download and install packages on their computers to analyze smaller, static datasets, mainly for research purposes. Google had a small project called the Google Prediction API, but it was deprecated a few years later, before any meaningful commercialization.
  • Machine Learning academics had not yet started their mass migration towards Silicon Valley employers and were primarily concerned with their research, which they thought only appealed to their academic peers. For instance, Professor Pedro Domingos of the University of Washington didn’t yet have a Twitter account. In 2013, when BigML published the guest blog post series, Everything You Wanted to Know About Machine Learning, But Were Too Afraid To Ask, it suddenly topped Hacker News. Capitalizing on the new business interest in Machine Learning in the 2010s, Professor Domingos went on to publish his much-praised book, The Master Algorithm, effectively simplifying the core concepts of Machine Learning for a much broader audience that was hungry for a better understanding of the technology’s potential impact on their respective industries.

ML Trends

  • Back then, VCs weren’t even curious about this thing called “Machine Learning.” Many didn’t know exactly how it contrasted with Artificial Intelligence or Deep Learning. They strongly believed that the recently open-sourced Hadoop and MapReduce were going to solve all the problems of the world. By the middle of the decade, “Big Data” was becoming all the rage on the back of the attention the open-source Hadoop library was getting from enterprise software developers and architects, fueled by ungodly amounts of VC dollars and a continuum of conferences. As seen above, it wasn’t until 2016 and 2017 that Machine Learning became more popular, which meant pure-play ML companies had to paddle through the tricky “Big Data” rapids and their noisy and often misleading ripple effects. In the end, as expectations and valuations came down to earth, even the largest VC-backed open-core companies had to consolidate to stay alive.
  • The number of attendees at NIPS (now NeurIPS) hadn’t yet surpassed even a measly 1,000. The mass hiring of successful ML academics by Big Tech later resulted in that same list of companies and their joint venture organizations dominating the top published research, in turn causing controversy.
  • After BigML’s beta version launched in early 2012, the majority of the people who signed up were Machine Learning specialists drawn to the elaborate visualizations the BigML Dashboard brought to the market. Over time, however, we started observing a different type of end user: developers looking to build smart applications without having to go back to school for a Master’s degree. Most came for the beautiful visualizations but stayed for the comprehensive REST API and accompanying developer tools that we’ve been offering as an “API-first” company since inception.
  • Google’s “mysterious X lab” hadn’t yet publicly shared the results of its headline-grabbing search for cat videos based on its “Artificial Brain.” In typical popular press style, the history of Deep Learning, and the fact that it was based on mid-20th-century concepts, was given little mention when the news came out. To this day, this unabashed pursuit of “Shiny Objects” by the popular press persists, diverting precious attention from real-world business applications, however unsexy they may be.

BigML Founders

In summary, at the beginning of the 2010s, the main motivations behind BigML founders’ charge to create a brand new Machine Learning platform from scratch were threefold:

  • There were simply no well-engineered frameworks to develop predictive applications, e.g., Weka crashed when fed datasets larger than 1GB.
  • The lack of automation was glaring, slowing down development efforts. There were no well-defined APIs to automate sophisticated Machine Learning workflows.
  • The existing tools were not only incomplete but also overly complex, as they were designed by scientists for scientists. This was perhaps fine for research purposes but sorely lacking in an enterprise setting, where aspects such as repeatability, traceability, and scalability were paramount.

How times change! Fast Forward to the Present Day…

Enough nostalgia; let’s change tune and briefly touch on how the world of enterprise Machine Learning today contrasts with the previous decade.

The Good

Unlike the early 2010s, awareness of the potential impact of Machine Learning in the business context has since leaped to much higher levels. Gone are the days when we had to define Machine Learning while being greeted with blank stares from business prospects. By now, almost every industry has some positive examples of data-driven predictive applications experts can point to, even if groundbreaking projects are still relatively few and far between compared to the technology’s full potential.

The Not So Good

The total cost of ownership of production-deployed applications still leaves room for improvement from a financial perspective. We expect that the number of smart applications and their underlying predictive workflows will multiply, calling into question the long-term viability of the high-touch, siloed processes that support existing implementations if we are to target massive expansion involving many more automated tasks. Unfortunately, the AI-hype machine fueled by the popular press keeps churning out articles that incorrectly extrapolate research achievements, but the tide is starting to turn as a more down-to-earth perspective, more apt for the enterprise audience, finally starts to receive some oxygen.

The Potentially Ugly

The K-shaped recovery is an often-discussed topic in these pandemic-stricken days, thanks to the unequal rate of recovery between sectors and even between companies within sectors. While those at the top keep enjoying unprecedented access to capital markets in the form of new debt issuance or stock market offerings, many other companies and sectors are starving at the edge of insolvency, with highly doubtful futures in the absence of massive central bank and government stimulus.

Unfortunately, we expect this trend to continue in the near future between companies that are still in the midst of their digital transformation and those that already have their data and analytics houses in good order. The market share shift from the former to the latter is likely to be of epic proportions in the coming years as the world’s economies strive to get back on track with their pre-pandemic growth trajectories. This contrast can be partially explained by the compounding nature of data-driven innovations.

The Silver Lining

As long as they are willing to make a serious go at it, nimble startups and mid-market companies now stand a much better chance of dominating their niches, thanks to predictive solutions built on top of mature Machine Learning platforms like BigML, readily available at a fraction of the cost of a single Data Scientist.

In the newly accelerating digitization wave, and against the backdrop of antitrust proceedings that may slow down Big Tech’s relentless march to digitize and rule all markets, new billion-dollar opportunities may be there for the taking for players that can reach their own version of the critical 88 mph innovation threshold and transform themselves into formidable regional or even global competitors by the end of the 2020s.

To conclude this quick tour, we’d like to thank our 138,000+ users for making BigML part of their Machine Learning journey so far. We’re looking forward to serving you, and the many more who will join the ride, for another decade!

Reviewing BigML’s 2020 in Numbers

There is no doubt that 2020 has been an intense year for all of us due to the unforeseen COVID-19 pandemic gripping the world. However, as the famous Queen tune goes, “The Show Must Go On”, and indeed the show went on at BigML despite hardships. This year we witnessed the further rise of interest in Machine Learning across all industries and sectors. In hindsight, it seems that as the pandemic forced businesses to make swift adjustments against the challenging backdrop, they turned in greater numbers to digitalization powered by the insights hidden in their data. This trend has manifested itself through more and more smart applications with Machine Learning models and timely predictions at their core. Having left behind another year in powering such mission-critical solutions for our customers and partners, here are the highlights that made 2020 a memorable (if a bit twisted) one in our journey.

BigML’s registered customers are the driving force behind the BigML Team’s efforts to keep improving our Machine Learning platform each year. We enthusiastically work on making our platform more complete and accessible for all types of business professionals and organizations: government agencies, educational institutions, big corporations, and small businesses alike. In this blog post, we present some examples of what we have been doing during 2020.

Product Enhancements

In 2020, the BigML platform added more options to import data directly from external databases, making it easier for our customers to work with their data wherever it is stored. We are well aware that gathering the data to start a Machine Learning project can be a hard and tedious process, so in line with our vision of making Machine Learning beautifully simple for everyone, our platform now supports MySQL, SQL Server, and Elasticsearch in addition to PostgreSQL.

Additionally, we have made a variety of improvements to the BigML platform, listed on our What’s New page, e.g., API request preview, WhizzML optional inputs, and the latest updates to BigML’s add-on for Google Sheets. All of this was made possible via 82 production deployments.

But the best is yet to come! We will shortly announce our next major release, which is going to provide a great upgrade to BigML’s existing capabilities. So stay tuned for future announcements to find out more!

Machine Learning Events and Webinars 

The worldwide lockdown caused by COVID-19 has had a huge impact on the events industry, which has drastically changed since March 2020. The sector had to adapt to the new situation as best it could, replacing in-person events and conferences with online ones. With everything ready and set for the live Machine Learning School that was to take place in Seville in March 2020, we decided days before to reboot the event and run it virtually for the safety of our attendees and speakers, who could then attend from the comfort of their homes. The response from the virtual audience was overwhelmingly positive, easily exceeding the attendance of prior BigML events and reaching 2,500 registrations from 89 countries! Feel free to visit the MLSEV event page to watch the video recordings of each presentation.

In addition to our Machine Learning Schools, we also enjoy organizing joint webinars and last October we had one together with our partner INFORM GmbH. The webinar “Machine Learning Fights Financial Crime” presented how to detect fraud with RiskShieldML, the Hybrid AI solution from INFORM powered by BigML. 

This year, BigML speakers, advisors, and board directors attended 16 business conferences to present and discuss how companies are implementing predictive applications, and to illustrate the latest product improvements of BigML’s comprehensive platform with other Machine Learning experts and practitioners.

We always look forward to organizing more events every year, whether in person or online, and we have already been focusing on our upcoming Machine Learning School for Business Schools that will take place early next year: a free, virtual event on February 17, 2021. Check out the event page for information on the speakers and the agenda. Remember to follow in the footsteps of the 700 people who have registered so far and reserve your spot to learn from the esteemed professors who are teaching the next generation of business leaders!

Real-World Use Cases: BigML in Action! 

More BigML customers are automating processes with smart applications, as seen in the example use cases below. We are looking forward to adding to this list in 2021 to help even more businesses gain efficiency.

Machine Learning for Healthcare

Machine Learning in the Retail Industry 

Machine Learning in Industrial Chemicals and Machinery

Machine Learning in Construction

Machine Learning in Business Services

Machine Learning for Marketing Professionals

General Machine Learning Blog Posts 

You can visit the BigML Blog for more use case examples in the new year.

Strategic Partner, Alliance, and Client Highlights 

2020 has brought interesting strategic partnerships to BigML that let us expand the reach of our technology. We are excited to highlight our alliance with Fundación ONCE, a Spanish foundation with a long history of improving the quality of life of people with disabilities. The two organizations are jointly evolving the BigML platform to make it more accessible and allow all people, with or without disabilities, to effectively use Machine Learning and build predictive applications. We have also solidified the BigML brand in South America with the special help of our partner, Panda ID Soluciones, the leading developer of smart solutions in Colombia and Venezuela. We are jointly helping enterprises in Central and South America adopt Machine Learning with customized predictive applications for their businesses.

In April, we announced our collaboration with Claire Global, a marketplace devoted to the global B2B food trading business. It is purpose-built to implement Machine Learning-driven solutions to optimize the buying and selling processes that are critical to the wide-reaching global supply chains of the food industry. 

Meanwhile, Jan Veldsink, Master of AI at long-time BigML customer Rabobank, was featured on the ‘AI in Banking’ Podcast last January. We invite you to listen to this illuminating episode about how Rabobank utilizes NLP and unsupervised learning techniques such as Topic Modeling to make sense of large collections of documents and their underlying textual content.

Teaching and Learning Machine Learning with BigML

Education is key to changing the world. Thus, in addition to our Machine Learning Schools and other events described above, we actively collaborate with educational institutions that wish to use BigML in their classrooms. Since we launched the Education Program back in 2016, we have kept adding to the number of universities and schools that use BigML to spread Machine Learning know-how in academia. This year we added 112 new educational institutions to our map, received 165 new ambassador applications, and took part in a large number of workshops and seminars presented by the BigML Team at universities around the world. Last but not least, we ran 8 new batches of our BigML Certification courses.

The BigML Team 

All the activities listed above have been possible thanks to the BigML Team, working remotely and distributed worldwide since 2011. 2020 was the year when plenty of companies discovered virtual meetings and understood that it is perfectly possible to work efficiently without forcing employees to commute to the same office. In that regard, we did not have to adapt our way of working, because BigML has operated that way since inception.

Once again, we are ready for more Machine Learning highlights to add to our Milestones page in 2021 and would love to have you and your company be a big part of it. Do you want to join the ride? Send us an email and let us know how we can help you!

Virtual Machine Learning School For Business Schools: Registrations are Open!

At BigML, we know that Machine Learning is transforming all kinds of industries, and education is key to making this transformation a reality. This is the main reason why we launched the BigML Education Program back in 2016: to spread the word and help democratize Machine Learning. Since then, more than 700 universities have been regularly teaching Machine Learning with BigML in their classrooms. To keep actively contributing to this wave of change in academia, BigML is organizing a one-day online event on February 17, 2021: our First International School on Teaching Machine Learning in Business Schools, presented by educators from Business Schools.

The conference will take place in a safe, virtual environment, and it is ideal for Machine Learning professors, business professionals, industry practitioners, and advanced undergraduates who wish to enroll in an MBA course to hone their data-driven business skills. The event will cover the different ways professors bring Machine Learning into the classroom, how best to teach the next generation of business leaders what Machine Learning is, how to use it in practice to optimize business processes, and the level of Machine Learning maturity they need to reach to make data-driven decisions on a daily basis. We invite you to join our virtual Machine Learning School and meet these innovative professors at the forefront of the changing business education landscape.

This crash course is organized by BigML in collaboration with several Universities and Business Schools such as Northeastern University in Silicon Valley (USA), the University of Portsmouth in the United Kingdom, Nyenrode Business University in The Netherlands, Prague University of Economics and Business in the Czech Republic, IE Business School, ESADE Business School, ICADE Business School, and EOI Business School in Spain, among others.


The event will be held virtually, so you can get acquainted with Machine Learning from the comfort of your home or office. All lectures will be delivered via live webinars from different parts of the world, and you will also have the opportunity to interact with our presenters live.


1-day event: February 17, 2021 from 9:00 AM to 6:00 PM CET.


This edition is quite different from our past Machine Learning Schools, as all lecturers are professors from several Universities and Business Schools that currently use BigML to teach Machine Learning to their students. 

If you wish to get to know the lecturers better and continue discussing their presentations, we invite you to join the Google Meet sessions that we will run during the breaks. To attend, you will first need to register for the event. As the event date approaches, registered users will receive an email with links to join each virtual room with the speakers.


Besides the core concepts of Machine Learning, our expert instructors will cover a wide variety of business-oriented use cases that they teach in their classrooms. We will also have an additional session on the hot topic of situational ethics, a hands-on workshop, as well as a panel with all participating lecturers to learn from their experience. Feel free to check the agenda for more details.


Please register here to join the event. Once your spot is reserved, you will receive an email with the relevant instructions on how to connect to the event.


The event is FREE of charge and no prior experience in Machine Learning is required, so please feel free to share the news with friends and colleagues that may be interested.

We look forward to seeing you all online at this unique event! Check out the event page for any updates.

Webinar Video: Machine Learning Fights Financial Crime

We are excited to share the video of the joint webinar we hosted yesterday with BigML’s partner, INFORM GmbH, where they shared their expertise in fighting financial crime with Hybrid AI. The webinar covered how Machine Learning, as a new tool for fraud detection, can augment the existing rule-based systems financial institutions employ to stop fraudulent transactions. In fact, the results are more promising when combining a systematic Machine Learning approach with knowledge-based methods such as mixed logic rule sets, fuzzy logic scorecards, or dynamic profiling.

This new way is now generally available, in essence, commercializing the Hybrid Artificial Intelligence concept in the form of RiskShieldML, a world-class risk assessment, fraud prevention, and Anti-Money Laundering (AML) compliance monitoring solution built on top of the BigML platform to more efficiently protect any organization against financial crime.

In this video, Kevin Nagel, Consultant and Data Scientist at INFORM, showcases the benefits of the Hybrid AI strategy in relation to alternative approaches and complements his presentation with a live demo. In case you could not make it to the live webinar yesterday, you can still catch up with the webinar recording we have just posted on the BigML Youtube channel. We also invite you to check out the accompanying presentation, available on the BigML SlideShare page.

Do not hesitate to contact us to leave your feedback and ask any follow-up questions specific to your company or data.

Stay tuned for future webinars with similar concrete examples of Machine Learning applications!

More Machine Learning in your Google Sheets

It’s been a while since the first version of BigML’s add-on for Google Sheets was released. The post announcing it described how one could add predictions to Google Sheets cells using BigML’s Decision Trees. It was also possible to segment the rows in a spreadsheet by tapping into Clustering models previously created in BigML.

Over the years, BigML has been adding new supervised and unsupervised models to its portfolio of native resources, and the add-on has been steadily updated to include most of them, like Logistic and Linear Regressions. However, until now it has not been possible to upload the information in a spreadsheet to BigML or download results from it. Instead, the models in BigML were downloaded to Google’s machines, where predictions were computed. That implied some limitations, because Google caps the size of the objects that can be downloaded; heavier models like Deepnets or Anomaly Detectors could therefore not be added to the add-on’s model list.

We’re happy to share that the latest version of BigML’s add-on overcomes these limits to provide more flexibility and options to users. This video gives a quick taste of the add-on functions that will be explained in this post.

Now, the add-on includes two more options: upload to and download from BigML.

When uploading information from Google Sheets to BigML, the result is a Source resource that contains the data dictionary describing how the data is parsed, i.e., the number of fields, their names, and their types. From that, a Dataset containing the values of the fields can be built. That opens up plenty of possibilities to extract insights from your data, because Datasets are the starting point for all Machine Learning procedures, like modeling, scoring, or evaluating.
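Since BigML is API-first, the same Source-to-Dataset step can also be scripted against the REST API rather than clicked through the add-on. A minimal sketch of the two request payloads (the CSV URL and resource id below are placeholders; the actual calls are authenticated POSTs to bigml.io carrying your username and API key in the query string):

```python
import json

def source_payload(remote_url, name="Sheet upload"):
    """Payload for POST /source: BigML fetches the remote file and infers
    the data dictionary (field names and types) from it."""
    return {"remote": remote_url, "name": name}

def dataset_payload(source_id):
    """Payload for POST /dataset: materializes the field values of a Source."""
    return {"source": source_id}

# Each create call returns a resource id (e.g., "source/5f0c...") that the
# next payload in the chain refers to.
print(json.dumps(source_payload("https://example.com/reviews.csv")))
print(json.dumps(dataset_payload("source/5f0c1234567890abcdef0123")))
```

Chaining resource ids this way is the same pattern the add-on follows behind the scenes, one resource at a time.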

Let’s learn about the new capabilities in the add-on by example. I was curious about Tenet, the latest film by Christopher Nolan, so I searched for tweets talking about Nolan’s films and created a small sample in a Google Sheet.

My goal is to predict the sentiment associated with each sentence so that I don’t need to read the opinions to know whether the movie is worth seeing. To do that, we need a large enough Dataset that contains sentences and the sentiment label (positive or negative) associated with each one. In BigML’s dataset gallery, we can easily find a Review Text Sentiment dataset that fortunately seems to fit that description:

We can clone the dataset to our account by clicking on the FREE label that you see in the top right corner. Once the dataset is cloned, we can inspect the kind of information that it contains.

There are two fields: sentiment, a categorical field that contains only two labels (positive or negative), and text, a text field that contains the sentences that have been previously labeled and will be used as training data. We can also see the kind of topics discussed by looking at the text field tag cloud.

We observe that the dataset contains opinions about movies that are already classified as positive or negative. That’s exactly what the algorithm needs to build a model that predicts the sentiment associated with a particular sentence. Therefore, we can create a Deepnet in 1-click:

The next step is to use the Deepnet trained on the Review Text Sentiment dataset to assign a label to our collected opinions. BigML’s add-on allows us to locate the Deepnet we just created: simply select the Start action in the add-on menu and search for Deepnets in the dropdown.

The list of your Deepnets will appear. Clicking on the link of the Review Text Sentiment Deepnet takes you to the predict view. When you press the predict button, the add-on sends each sentence to BigML, runs it through the model, and brings back the corresponding sentiment label and the confidence associated with each prediction.

Of course, this one-by-one process can be slow if you need to classify a lot of rows. In that case, a different approach is recommended: open the add-on menu and use the Upload to BigML action to upload the contents of the active sheet to BigML, where a Source will be created.

The Source’s view menu allows you to create a Dataset in 1-click, summarizing all the contents of your sheet.

At this point, you’re ready to go back to the Deepnet view, where the actions menu offers a Batch Prediction action.

It applies the selected model to each row of your dataset and adds a new column with the prediction results. Simply select the Dataset that was created after uploading your active sheet to BigML in the right-hand combo box and press the Predict button when it activates. The list of datasets appears when you type the first characters of the name of your active sheet.
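The Batch Prediction step can be scripted in the same spirit as the earlier resources; here is a sketch of the request payload that chains the Deepnet to the uploaded Dataset (the resource ids are illustrative, and any optional output settings should be checked against BigML’s API documentation):

```python
import json

def batch_prediction_payload(deepnet_id, dataset_id):
    """Payload for POST /batchprediction: applies the given model to every
    row of the dataset and appends a column with the predicted labels."""
    return {"deepnet": deepnet_id, "dataset": dataset_id}

print(json.dumps(batch_prediction_payload(
    "deepnet/5f0caaaaaaaaaaaaaaaaaaaa",
    "dataset/5f0cbbbbbbbbbbbbbbbbbbbb",
)))
```

Scoring all rows in one server-side request is what makes the batch route much faster than sending sentences one by one from the spreadsheet.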

There you are! A new dataset with a sentiment column appended is ready for you in BigML. You just need to download it to Google Sheets. To do this, open the add-on menu and select the Download from BigML action.

The newly created dataset should appear first in the list. Click the link to download the information.

A new Sheet will appear in your file with both the original sentences and the sentiment label associated with them.

Of course, the amount of data that can be uploaded or downloaded using the add-on is limited; Google sets different limits depending on the kind of account you are running. Still, you can always upload any amount of data by creating a CSV and dragging and dropping it into BigML. Similarly, any Batch Prediction can be downloaded from BigML directly as a CSV.

As you can see, the new options in BigML’s add-on for Google Sheets offer great ease of use. They also enrich your data with all the insights that can be drawn from the entire set of models and workflows available in BigML. What are you waiting for? Give it a try and let us know how you like it!

Registration Open for FREE Webinar: ‘Detecting Fraud with Hybrid AI’ (October 28, 2020)

In collaboration with BigML partner INFORM GmbH, we’re pleased to bring the BigML community a new educational webinar: Machine Learning Fights Financial Crime. This FREE virtual event will take place on October 28, 2020, at 8:00 AM PDT, and it’s the ideal learning opportunity for financial institutions, banking sector professionals, credit professionals, risk advisers, crime fighters, fraud professionals, and anyone interested in the latest financial crime-fighting and risk analysis strategies and trends.

Moving from Lab to Production

Financial institutions must innovate to stop the onslaught of fraudulent transactions, and the use of Machine Learning as a tool for fraud detection is trending. Combining Machine Learning with existing intelligent and dynamic rule sets produces a sustainable strategy to address this challenge. Hybrid Artificial Intelligence takes historical transactional data, learns from past decisions using both supervised and unsupervised learning algorithms, and combines this with knowledge-based methods such as mixed logic rule sets, fuzzy logic scorecards, and dynamic profiling. Financial institutions and professionals can benefit from this powerful combination, which detects new modi operandi in a digital environment.

The webinar will showcase the RiskShield Machine Learning solution, a world-class risk assessment, fraud prevention, and Anti-Money Laundering (AML) compliance monitoring solution built on top of the BigML platform to protect your organization against financial crime.



  • 1-hour event: Wed, Oct 28, 2020 8:00 AM – 9:00 AM PDT


  • Please follow this link and reserve your spot. We will do our best to accommodate as many participants as possible, however, we recommend that you register soon.


  • Speaker details can be accessed here as well as the full agenda.

We look forward to your participation!


Perspectives on Self-serve Machine Learning for Rapid Insights in Healthcare

BigML users keep inspiring us with their creativity every day. Many take on Machine Learning with little to no background or education in the field. Why? Because they have access to relevant data and they are smart professionals with the kind of intuition only years of study in a certain field can bring about. It’s no surprise, then, that many of them come to suspect that there must be a better way than the good old descriptive spreadsheet analytics or plain vanilla business intelligence tool reports to solve their business problems. A natural curiosity and a self-starter attitude to actively experiment don’t hurt either!

Dr. Patrick Gladding

Long-time BigML user Dr. Patrick Gladding is no exception. Dr. Gladding practices medicine in New Zealand, which is fortunately doing pretty well in its fight to eradicate COVID-19 these days. We approached Dr. Gladding to find out his motivation for picking up BigML in the first place, as well as his informed opinions on how healthcare professionals can transition into more self-sufficient “man + machine” teams leveraging the full power of Machine Learning in their day-to-day routines at hospitals, clinics, and healthcare research institutions across the globe.

  1. Can you please tell us about your background and how you first developed an interest in Machine Learning?
    • I’m a clinical cardiologist trained in echocardiography and with an interest in genetics. I was involved in some pharmacogenetics studies years ago, where a combinatorial approach was needed to analyze the influence of a number of genetic variants. We used simple neural networks back then, well over ten years ago, so that would have been my first experience with Machine Learning. Over time, the terminology has changed from predictive analytics to Machine Learning and artificial intelligence, and the technology has evolved to become more capable and accessible. It once took a Ph.D. and access to a computer science lab with proprietary algorithms, but now many of the most effective techniques are freely available and usable by anyone. In parallel, the digitization of health records means it is much easier to aggregate and link patient data to outcomes. This, along with the availability of excellent, user-friendly Machine Learning tools like BigML, has made it very easy to do some high-quality projects in AI.
  2. Can you tell us (more) about one of your Machine Learning projects and how it made an impact in your field?
    • Given the ample availability of data now, there are almost infinite questions that can be posed. We’ve looked at many conditions and outcomes in cardiovascular medicine to see whether they can be predicted with existing clinical data. This includes predicting mortality from routine blood tests and echocardiography results and we’ve had very similar results to studies in the USA performed at Geisinger Health.
    • Some things seem very hard to predict such as atrial fibrillation as this is a more chaotic, stochastic condition. There are some very strong signals for heart failure both in predicting the presence of the condition as well as predicting outcomes. Heart failure is a common condition that has very poor health outcomes and is a diagnosis that is sometimes missed. We’ve shown the possibility of predicting the presence of heart failure using simple blood results such as a complete blood count and electrocardiogram with BigML.
    • What I really appreciate about BigML is that it provides a wide array of Machine Learning methods, including more traditional logistic regression and decision trees as well as deep neural networks. It’s great that BigML provides explainability and transparency about the features that make up the predictive model. This allows me to vet the model for things that shouldn’t be included and to verify the importance of features that have previously been shown to predict outcomes. For instance, albumin is a strong feature in models of mortality, and this has been shown in previous population studies using standard biostatistics.
    • Similarly, hematocrit is a feature of heart failure predictions, which has been shown to be important in other studies. Using the unsupervised Machine Learning clustering features in BigML, we’ve been able to validate work by others (Shah et al.) in the clustering of heart failure subtypes, e.g., heart failure with preserved ejection fraction (HFpEF). This opens up the potential for mass screening of hospital patients and then subtyping the patients who might benefit from different treatments. This could improve the diagnosis and management of heart failure considerably, but it does carry implications for the downstream use of diagnostic testing, therapies, and cost, as well as the potential mislabelling of patients, as no predictive model is perfect. There are a lot of ethical and service issues we have yet to work through, but the benefit of using BigML is that everything is transparent, explainable, verifiable, and easily validated. It is quite easy, for instance, to upload a new large dataset, apply a Machine Learning algorithm, and get some results within minutes, without the need to buy an expensive high-performance computer.
  3. What are the top three challenges for doctors and healthcare professionals in adopting Machine Learning? What’s the easiest path to go from data to insights and predictions? Any words of caution?
    • First comes understanding what Machine Learning can do and its limitations, which includes not completely buying into the hype. Conventional biostatistics are still really important.
    • Second, data is sometimes difficult to access. It takes time, persistence, sound ethics, and privacy and security measures such as anonymizing data. Getting the data may require some coding abilities, e.g., SQL queries or Python scripting, but we have integrated a system called Qlik into our hospital, which means that data is more easily linked and exportable as a CSV file. The overall benefit of BigML, however, is that no coding experience is required to use it.
    • Third, a lot of data is required to generate models that cover the wide variability of human disease and interactions. This means collaborating with others, sharing data, and getting insight into a particular field. Most doctors have a wealth of knowledge and expertise. This means they know what the important questions are but working with Machine Learning still requires talking to data scientists, statisticians, and others to ensure that it’s applied appropriately. Combining a domain expert (doctor) with experts in Machine Learning is a very potent force, and BigML is a great platform to take on some of that role. Sharing health data can be a challenge, but it is really the only way forward for this field.
    • What is really important is to obtain quality data and spend time tidying it up. The axiom “garbage in, garbage out” still applies, regardless of whatever fancy new Machine Learning method is used. Machine Learning is a powerful tool, but the ease of use and automation in BigML is no excuse for laziness in evaluating, retesting, and validating models. The exciting thing about predictive algorithms is that they could have a very large impact and improve health. At the same time, a biased, poorly fitting model that makes bad predictions could worsen outcomes and health disparities. This includes models that incorporate racial bias, which has been demonstrated before by Google and others. The scale to do good is equal to the scale to do bad, so caution must be applied. This is also relevant to non-healthcare predictions such as loan defaults.
  4. Once you built models and uncovered key insights from your data, how did you use them to mobilize your organization around findings? What is the best way to collaborate with others that may not be much into Machine Learning?
    • BigML has great visualization features and built-in evaluations that demonstrate the sensitivity and specificity of predictions. These outputs are much more understandable to clinicians, though metrics such as phi, F-measure, and recall involve a bit of jargon that doctors need to get their heads around. Mobilizing our organization was easy once we showed people the BigML visualizations. BigML has an enterprise version, which means it can be run on-site within a hospital network so as to preserve the data security measures already in place.
    • Collaboration through BigML is a really good way of having a kind of escrow for health data as there is not a lot of data sharing that goes on. The reticence to share is probably due to a number of reasons like confidentiality and privacy around the use of health data but also the desire to commercialize results or preserve proprietary algorithms. Basically, if you have data it is now incredibly valuable, so even if you are not into Machine Learning there is no doubt that someone will want to talk to you about it if you have a lot of high-quality data. I would love to see a portal through BigML, where health care professionals could share either data or models or both.
  5. Do you have anything else to share regarding your experience with the BigML platform?
    • BigML is a great system for non-coding doctors and others who want an easy-to-use, understandable system. It took only about two hours of one-on-one tutoring, plus watching some YouTube videos, to get my head around the platform. The customer service is excellent, with friendly experts willing to help with any application. It’s great that BigML has integrated several Machine Learning methods, including logistic regression, which has been used for decades in the medical literature. It’s something doctors are very familiar with, and it is worth noting that it often predicts better than deep neural networks. So, despite the hype around deep learning, it is not the perfect tool for everything. Deep learning is, however, very good when applied to images and unstructured data, so I am looking forward to new applications coming from BigML, including AI image analysis.
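For readers new to the evaluation jargon that comes up in the answers above, the metrics Dr. Gladding cites (sensitivity/recall, specificity, F-measure, and the phi coefficient) all derive from the four counts of a 2x2 confusion matrix. A minimal sketch in plain Python, with made-up screening numbers purely for illustration:

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from a 2x2 confusion matrix:
    tp/fp/fn/tn = true/false positives and negatives."""
    recall = tp / (tp + fn)           # sensitivity: fraction of positives found
    specificity = tn / (tn + fp)      # fraction of negatives correctly rejected
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    # phi (Matthews correlation coefficient): balanced measure in [-1, 1]
    phi = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"recall": recall, "specificity": specificity,
            "precision": precision, "f_measure": f_measure, "phi": phi}

# Illustrative only: screening 1,000 patients for a condition.
m = binary_metrics(tp=80, fp=40, fn=20, tn=860)
print({k: round(v, 3) for k, v in m.items()})
```

BigML’s evaluation views report these same quantities; the point here is only that each metric is a simple function of the four confusion-matrix counts.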

Hope you enjoyed this interview and found something useful to directly apply in your projects. Do you have a similar Machine Learning adoption story you’d like to share with the BigML community? We’d be more than happy to spread the lessons learned for the greater good. Please don’t hesitate to contact us, and stay safe!

Panda ID Soluciones and BigML Join Forces to Lead the Adoption of Machine Learning in Business

We are thrilled to announce that Panda ID Soluciones, a leading company in the development of smart solutions, and BigML, the leading Machine Learning platform since 2011, are coming together to help organizations that are motivated to capitalize on the intrinsic value of their data to build and deploy intelligent Machine Learning applications to improve processes, maximize profits, and stand out in their sectors.

Panda ID Soluciones has spent 25 years developing specialized solutions for each business, based on the treatment of data and the identification of people, assets, and transactions. The company has always been characterized by solutions that add value to businesses by surfacing the information hidden in their data and transactions, helping clients understand the activities they are performing in real time. Panda ID teaches, in a practical way, how to identify threats, generate patterns, measure trends, detect anomalies, and propose strategies in a timely manner.

Currently, there are still companies that don’t know what Machine Learning is, the positive impact it could have on their businesses, or how to apply the appropriate Machine Learning techniques to optimize resources and processes and improve results. Consequently, they do not realize that adopting Machine Learning in their organizations can be a simple process. This lack of understanding leads companies to waste resources and effort on valuable projects that often never make it into production because they have been poorly managed.

This widespread misinformation about Machine Learning has been a key factor in Panda ID Soluciones’ decision to form this alliance with BigML. Our objective is to help organizations from all sectors understand that there are specific, proven Machine Learning applications for each business and each project. With our practical, results-oriented approach, those firms will uncover the hidden value in their data and rapidly realize quantifiable benefits.

BigML and Panda ID Soluciones will drive a change that leads organizations to prioritize the treatment of their data. Both companies will expertly guide businesses in each sector and industry, making clients active participants in the Machine Learning adoption process so they arrive at results that generate value, meaning, and greater benefits.

By introducing the BigML Machine Learning platform into its portfolio, Panda ID Soluciones, a multinational with headquarters in Colombia and Venezuela, will transform the way its clients see and understand the true meaning of Machine Learning. Business data grows every day, and the pressure to extract the best information from it has become paramount. This is why organizations must act now: competition is intensifying, requiring more precise models in order to create and identify new, profitable opportunities. The era of Machine Learning is here and now.

Alberto González, CEO of Panda ID Soluciones, highlights: “BigML is the most suitable company with which we could partner to offer Machine Learning solutions to the market. Among its main premises is offering real solutions to real problems that can be quickly implemented and put into production. This means sincerity towards clients, and that goes hand in hand with what we profess. But without a doubt, the biggest winners are the market and our clients, and that is a reason for great joy.”

Alberto Ariza, VP of Strategic Alliances at BigML, points out: “With Panda ID Soluciones we have found the ideal partner to lean on in Colombia and the entire geographical area of the Caribbean, to achieve a solid entry that allows us to apply in its business sector one of the most innovative technologies worldwide in the field of Machine Learning, which is exactly what we bring to the table at BigML.”

Machine Learning Fights Cannibalization in the Retail Industry

This guest post is authored by Olena Skarlat, Stefanie Pichler, Beatrice Bunjaku, and Pamela Martin, from Vertical Market Solutions at A1 Digital, a BigML Partner. 

Machine Learning has proven effective in providing insights into the data and processes that drive business decisions in any industry domain. However, the high volume and velocity of data make it challenging to get those insights both proactively, building on already established processes, and reactively, accounting for the unknown. In this blog post, we give you an overview of a methodology for identifying and addressing the problem of cannibalization of products caused by promotional campaigns in the retail domain. The use case discussed here aims to analyze the impact of promotional products, specifically to identify cannibalization effects, i.e., when promotions dramatically decrease sales of non-promotional products.

An example: a chain of supermarkets decides to run a promotion on 500-gram packs of chicken breasts. This promotion has significant effects on the sales figures of certain other products. In our example, sales of beef steaks and various turkey products dropped by more than 20% during the promotion period. However, many such associations are not discovered upfront. If retail management could anticipate this influence ahead of a promotion, they could adjust their demand planning to order more of the products that will be in demand and fewer of the products that will be less popular during the promotional period.

Therefore, in this use case, we predict the demand for products while accounting for promotional campaigns; as a result, stocks can be adjusted and waste is reduced. We have implemented this use case as a fully automated Machine Learning application that is capable of (i) learning Machine Learning models from data, (ii) providing valuable insights, and (iii) monitoring and assessing Machine Learning model competence over time.

The input data for this use case is from a retailer that has multiple supermarkets in different regions. It contains sales transactions for meat products over several years. Data also includes information on whether the product was on promotion over a certain period. The outputs of this use case can be used by two different roles: (i) a demand planner (operational outputs) and (ii) a business analyst (Machine Learning analytics). The operational output contains insights about the negative impacts of promotional products on other products. The purpose of getting such insights is to adjust the stock in supermarkets accordingly and not to order, for instance, foods that spoil fast if it is expected that their sales will be decreasing. The Machine Learning analytics output contains Machine Learning models, performance parameters, and evaluation results over time. These insights provide the opportunity to assess the performance of Machine Learning models over time by using new incoming daily sales transaction data while intermittently retraining those models when their performance takes a dive.

The methodology includes building Machine Learning models to extract novel promotional cannibalization insights and creating workflows to enable the Machine Learning model life cycle, monitoring, and evaluation. The vital part of this use case is the A1 Digital Machine Learning Platform powered by BigML. The platform fully automates the time-consuming work of hand-tuning Machine Learning models and executing the complex custom workflows they are part of. Figures 1 and 2 show, respectively, the life cycle of this use case and the main Machine Learning workflow. Other automated workflows deal with data transformation, receiving predictions, and performing daily evaluations of models.

Figure 1: Use case life cycle overview.

Figure 2: Machine Learning workflow overview.

Machine Learning models in this use case include regressions, association discovery, and anomaly detectors. Regression models predict the expected sales for each product, so we can estimate whether the actual sales of non-promotional products are decreasing or increasing because of promotional campaigns. Association discovery finds ‘significant’ associations, or so-called association rules, between promotional and non-promotional products. We are interested in identifying the negative impact of promotions, i.e., instances where actual sales fall more than 20% below the expected sales. For example, Figure 3 shows the associations generated for pairs of products with a considerable sales decline compared to the expected sales: the non-promotional product with the identifier id=141 is affected by the promotional product with id=89, the non-promotional products with id=44 and id=62 are affected by the promotional product with id=196, and so on.

Figure 3: Association rules between promotional and non-promotional products.

The results of the association discovery are converted into sales decrease percentages, i.e., showing how far sales of certain products will slip depending on the promotional products active during the promotion period. These results can be used to proactively analyze promotions during their planning phase and to adjust the impacted retail SKUs ahead of time. This is especially important when the products spoil fast, e.g., fresh food items and drinks. For example, as shown in Figure 4, the promotional product PRODUCT-45 is expected to negatively affect sales of products PRODUCT-98, PRODUCT-53, PRODUCT-144, and so on. This means it will be efficient to stock less of those non-promotional products during the promotional period of PRODUCT-45 to save money and reduce any possible waste.

Figure 4: The impact of promotional PRODUCT-45 on related non-promotional products.
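The conversion step described above can be sketched in a few lines of Python: given the regression’s expected sales and the actual sales observed during a promotion, flag the non-promotional products whose sales fell more than 20% below expectations. The product names and figures below are made up for illustration, not taken from the use case data:

```python
# Expected (regression forecast) vs. actual unit sales of non-promotional
# products during a promotional period. Illustrative numbers only.
expected = {"PRODUCT-98": 200, "PRODUCT-53": 150, "PRODUCT-144": 90, "PRODUCT-7": 120}
actual   = {"PRODUCT-98": 140, "PRODUCT-53": 110, "PRODUCT-144": 70, "PRODUCT-7": 118}

THRESHOLD = 0.20  # flag declines of more than 20% vs. expected sales

def cannibalized(expected, actual, threshold=THRESHOLD):
    """Return {product: decrease percentage} for products whose actual sales
    fell more than `threshold` below the regression's expected sales."""
    impact = {}
    for product, exp_sales in expected.items():
        decrease = (exp_sales - actual[product]) / exp_sales
        if decrease > threshold:
            impact[product] = round(decrease * 100, 1)
    return impact

print(cannibalized(expected, actual))
# PRODUCT-98 is 30.0% down, PRODUCT-53 26.7% down, PRODUCT-144 22.2% down;
# PRODUCT-7 (1.7% down) stays below the threshold and is not flagged.
```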

Once the association discovery model is created, it’s a good idea to monitor how well those association rules perform on a daily basis to constantly learn from the new incoming sales transaction data. Therefore, we also train anomaly detectors, a powerful tool for measuring the reliability of association rules. We build an anomaly detector every time the association rules are produced. Having quantified how anomalous the new daily sales transaction data is, we can get a sense of how different the new data is from the data used to produce the original association rules (see Figure 5). This approach tells Machine Learning analysts when to retrain the association rules. A high anomaly score over a certain period means that the association rules no longer apply well to the new sales transactions, perhaps due to changes in customer behavior or a major event, such as the Coronavirus outbreak, causing dramatic societal shifts. When the association rules are updated, it is once again advisable to allow for a testing period to evaluate whether the new rules perform better on the new incoming sales transaction data.

Figure 5: The rate of anomalous sales transactions data incoming every day.
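The retraining trigger just described can be reduced to a simple rule: score each day’s transactions with the anomaly detector, and retrain once the share of anomalous rows stays high for several consecutive days. A minimal sketch, where the per-row scores, thresholds, and window size are all made-up illustrative values rather than BigML defaults:

```python
from collections import deque

ANOMALY_THRESHOLD = 0.6  # per-row score above which a transaction counts as anomalous
RATE_THRESHOLD = 0.3     # retrain if more than 30% of a day's rows are anomalous...
WINDOW = 3               # ...for this many consecutive days

def should_retrain(daily_scores, history):
    """Score one day's transactions and decide whether the association rules
    look stale. `history` is a rolling window of recent daily anomaly rates.
    Returns (today's anomaly rate, retrain flag)."""
    rate = sum(s > ANOMALY_THRESHOLD for s in daily_scores) / len(daily_scores)
    history.append(rate)
    retrain = len(history) == WINDOW and all(r > RATE_THRESHOLD for r in history)
    return rate, retrain

history = deque(maxlen=WINDOW)
daily_batches = [[0.2, 0.3, 0.7],    # one list of per-row anomaly scores per day
                 [0.8, 0.9, 0.7],
                 [0.65, 0.7, 0.1],
                 [0.9, 0.8, 0.7]]
for day, scores in enumerate(daily_batches, start=1):
    rate, retrain = should_retrain(scores, history)
    print(f"day {day}: anomaly rate {rate:.2f}, retrain={retrain}")
```

Here the flag only fires on day 3, once the anomaly rate has exceeded the threshold for three days in a row, which keeps a single noisy day from triggering an unnecessary retrain.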

This use case is a good example of how Machine Learning can provide an objective overview of how various promotional campaigns affect sales of non-promotional products in the retail domain. Machine Learning comes to the rescue in identifying and tackling the negative effects of promotional campaigns on other products, helping retailers proactively adjust their stock, reduce waste, and, most importantly, protect their margins.

Let us know if you have a similar problem and stay tuned for more case studies in the near future!

Machine Learning in Retail and Wholesale: accurate and affordable Demand Forecasting by catsAi

This guest post is originally authored by Stephen Kinns, Founder and CEO of catsAi.

Many business decisions can be traced back to a simple question: ‘How much will we sell?’ Firms large and small rely on experience and historical trends to make that assessment, yet the accuracy of these approaches can be very poor, which in turn translates to missed efficiency savings throughout the business.

Machine Learning-based predictions can be much more accurate, but historically the cost and complexity of the technology have made it an uneconomical option for many firms. Despite this challenging backdrop, catsAi’s unique approach makes Machine Learning useful, easy to implement, and cheap for retail, wholesale, and other businesses. As such, we have created a lightweight, off-the-shelf solution for demand prediction, which drives easy and rapid adoption at firms of all sizes through supply chain intelligence. With the aid of state-of-the-art Machine Learning powered by BigML, catsAi offers reliable predictive sales numbers on a daily basis, for the week ahead, for each and every product.


Our existing clients and partners span a wide variety of retail firms, from the smallest high-street stores, such as bakeries, through to large global wholesale enterprises. The core challenge is that every client, every location, and every product is different. Therefore, being able to adapt to a wide variety of products and clients has proven key to catsAi’s burgeoning success.

From Raw Data to Production and Benchmarking

Machine Learning as a tool excels at exploring historical patterns. For many firms, the factors that influence sales patterns are diverse; no two firms are alike. Location, weather, cultural influences, and of course changing inventories may all affect likely sales. This means that should a firm wish to implement a Machine Learning solution itself, all this data must be acquired, cleaned, assessed, and analyzed. Datasets can come in a huge variety of shapes and sizes, ranging from a few thousand rows to 10+ million observations.

To handle this variety and keep costs down, catsAi continually builds bespoke datasets, trains models, evaluates them, and then deploys them automatically without any human intervention. When done, catsAi’s data pipelines paint a detailed picture of the influencing factors behind changing sales dynamics, complete with custom-developed features specific to each client.

The datasets are then securely sent to the BigML system to initiate the model training process, which we manage through the use of available tuning parameters, configuration options, and event handlers. catsAi assesses and evaluates the results of each training run before making a final decision on deployment for ‘live’ predictions in an agile manner.

Over the years, we have evolved from a neural network on a laptop to a full-fledged cloud-based system thanks to BigML’s support. The BigML suite of tools, both at the REST API and graphical Dashboard level, has considerably accelerated our deployment time-frames. We are now able to scale to match our clients’ expectations while simultaneously maturing our models iteratively.

Our approach replaces the typically complex and time-consuming data science process by breaking it into small, manageable pieces that can be executed automatically. This means customers can autonomously deploy the predictions swiftly and affordably whilst maintaining accuracy and control.

Although a few successful runs can help secure a client’s trust, we often need to prove the ongoing value of the predictive models, so we continually track some simple benchmarks. In the absence of Machine Learning, common prediction methods in a typical retail business are based either on the sales of a given product last week/month or on a moving average. At a minimum, we use those as easily relatable, effective benchmarks. As seen below, traditional methods tend to overshoot and suffer from a forecast accuracy standpoint in comparison with catsAi model predictions.

CatsAi Benchmark
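The two naive benchmarks mentioned above, last period’s sales and a moving average, are straightforward to compute alongside a standard forecast-accuracy metric such as MAPE. A minimal sketch with made-up daily sales (the post does not state which accuracy metric catsAi actually uses):

```python
def mape(actual, forecast):
    """Mean absolute percentage error; lower is better."""
    return 100 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

def naive_forecast(sales):
    """Benchmark 1: tomorrow's forecast is simply today's sales."""
    return sales[:-1]

def moving_average_forecast(sales, window=3):
    """Benchmark 2: forecast is the mean of the previous `window` days."""
    return [sum(sales[i - window:i]) / window for i in range(window, len(sales))]

# Made-up daily unit sales for one product.
sales = [100, 120, 90, 110, 130, 95, 105, 125, 88, 115]

naive = naive_forecast(sales)                  # predicts sales[1:]
ma = moving_average_forecast(sales, window=3)  # predicts sales[3:]

# Align each benchmark's forecasts with the actuals they predict.
print(f"naive MAPE:          {mape(sales[1:], naive):.1f}%")
print(f"moving-average MAPE: {mape(sales[3:], ma):.1f}%")
```

A model only earns its keep if its error beats both of these baselines on the same held-out days.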

Delivering Real Value

Indeed, our experience has shown that catsAi predictions are commonly between 85% and 94% accurate, often anywhere from 30% to a whopping 70% more accurate than the initial state or benchmarks. This translates into up to 80% less waste, depending on the category or SKU being analyzed.

Furthermore, our customers love that we can go from initial contact to the first set of predictions in as little as 48 hours, iterating from there. They also highly value the lightweight process and client journey, which can be summarized as: sales data in, on-the-mark predictions out. Did I mention no setup charges and low-cost subscription options? All of this means that, with Machine-Learning-as-a-Service platforms like BigML, everyone from the smallest high-street companies to global enterprises can easily deploy Machine Learning. This is no longer a wish-list item but the day-to-day reality of many early-adopter businesses willing to experiment with this foundational technology, which will determine whether their businesses can withstand the macro challenges of our world as well as increased competition.
