Webinar Video: Machine Learning Fights Financial Crime

We are excited to share the video of the joint webinar we hosted yesterday with BigML’s partner, INFORM GmbH, where they shared their expertise on fighting financial crime with Hybrid AI. The webinar covered how Machine Learning, as a new tool for fraud detection, can augment the existing rule-based systems that financial institutions employ to stop fraudulent transactions. In fact, the results are more promising when a systematic Machine Learning approach is combined with knowledge-based methods such as mixed logic rule sets, fuzzy logic scorecards, or dynamic profiling.

This new approach is now generally available as RiskShieldML, which in essence commercializes the Hybrid Artificial Intelligence concept. It is a world-class risk assessment, fraud prevention, and Anti-Money Laundering (AML) compliance monitoring solution built on top of the BigML platform to protect any organization against financial crime more efficiently.

In this video, Kevin Nagel, Consultant and Data Scientist at INFORM, showcases the benefits of the Hybrid AI strategy in relation to alternative approaches and complements his presentation with a live demo. In case you could not make it to the live webinar yesterday, you can still catch up with the webinar recording we have just posted on the BigML Youtube channel. We also invite you to check out the accompanying presentation, available on the BigML SlideShare page.

Do not hesitate to contact us to leave your feedback or ask any follow-up questions specific to your company or data.

Stay tuned for future webinars with similar concrete examples of Machine Learning applications!

More Machine Learning in your Google Sheets

It’s been a while since the first version of BigML’s add-on for Google Sheets. The post announcing it described how one could add predictions to Google Sheets cells by using BigML’s Decision Trees. It was also possible to apply segmentation to the rows in a spreadsheet by tapping into Clustering Models previously created in BigML.

Over the years, BigML has been adding new supervised and unsupervised models to its portfolio of native resources. All along, the add-on has been steadily updated to include most of them, like Logistic and Linear Regressions. However, until now it has not been possible to upload the information in the spreadsheet to BigML or to download information from BigML into it. Instead, the models in BigML were downloaded to Google machines, where predictions were computed. That implied some limitations, because Google limits the size of the objects that can be downloaded. Therefore, heavier models like Deepnets or Anomaly Detectors could not be added to the add-on model list.

We’re happy to share that the latest version of BigML’s add-on has overcome these limits to provide more flexibility and options to users. This video gives a quick taste of the add-on functions that will be explained in this post.

Now, the add-on includes two more options: upload to and download from BigML.

When you upload the information from Google Sheets to BigML, the result is a Source resource that contains the data dictionary describing how the data is parsed, i.e., the number of fields, their names, and their types. From that, a Dataset containing the values of the fields can be built. That opens up plenty of possibilities to extract insights from your data, because datasets are the starting point for all Machine Learning procedures, like modeling, scoring, or evaluating.

Let’s learn by example about the new capabilities in the add-on. I was curious about Tenet, the latest film by Christopher Nolan, so I searched for tweets talking about Nolan’s films and created a small sample in a Google Sheet.

My goal is to predict the sentiment associated with each sentence so that I don’t need to read the opinions to know whether the movie is worth seeing. In order to do that, we need a large enough Dataset that contains sentences and the sentiment label (positive or negative) associated with each one. In BigML’s datasets gallery, we can easily find a Review Text Sentiment dataset that fortunately seems to fit that description:

We can clone the dataset to our account by clicking on the FREE label that you see in the top right corner. Once the dataset is cloned, we can inspect the kind of information that it contains.

There are two fields: sentiment, a categorical field that contains only two labels (positive or negative), and text, a text field that contains the sentences that have been previously labeled and will be used as training data. We can also see the kind of topics discussed by looking at the text field tag cloud.

We observe that the dataset contains opinions about movies and they are already classified as positive or negative. That’s exactly what the algorithm needs to build a model to predict the sentiment associated with a particular sentence. Therefore, we can create a Deepnet in 1-click:

The next step is using BigML’s Review Text Sentiment dataset information to assign a label to those opinions. BigML’s add-on will allow us to locate the Deepnet we just created. Simply select the Start action in the add-on menu and search for Deepnets in the dropdown.

The list of your Deepnets will appear. Clicking on the link of the Review Text Sentiment Deepnet takes you to the predict view. When you press the predict button, the add-on sends every sentence to BigML, runs it through the model, and brings back the corresponding sentiment label along with the confidence associated with each prediction.

Of course, this one-by-one process can be slow if you need to classify a lot of rows. In this case, a different approach is recommended. Open the add-on menu and use the Upload to BigML action to upload the contents of the active Sheet to BigML, where a Source will be created.

The Source’s view menu allows creating a Dataset in 1-click, summarizing all the contents of your sheet.

At this point, you’re ready to go back to the Deepnet view, where the actions menu offers a Batch Prediction action.

It applies the selected model to each row of your dataset and adds a new column along with the prediction results. Simply select the Dataset that was created after uploading your active sheet to BigML in the right combo box and press the Predict button when activated. The list of datasets appears when typing the first characters of the name of your active sheet.

There you are! A new dataset with a sentiment column appended is ready for you in BigML. You just need to download it to Google Sheets. To do this, open the add-on menu and select the Download from BigML action.

The newly created dataset should appear first in the list. Click the link to download the information.

A new Sheet will appear in your file with both the original sentences and the sentiment label associated with them.

Of course, the size of data that can be uploaded or downloaded using the add-on is limited. Google sets different limits depending on the kind of account you are running on their site. Still, you can always upload any amount of data by creating a CSV and dragging and dropping it to BigML. Similarly, any Batch Prediction can be downloaded from BigML directly as a CSV.
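For sheets that exceed the add-on’s limits, the CSV route mentioned above can be scripted. The following is only a minimal sketch, assuming the sheet’s rows are already available as Python lists. The function name `rows_to_csv`, the `text` column, and the sample sentences are hypothetical and serve only to illustrate the kind of file that would then be dragged and dropped into BigML.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialize a header and data rows to CSV text.

    The csv module quotes fields that contain commas, quotes, or newlines,
    which matters for free-text columns such as review sentences.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample: a single text column, as in the sentiment example.
sample_header = ["text"]
sample_rows = [
    ["Loved it, would watch again"],
    ["Too long and too loud"],
]
csv_text = rows_to_csv(sample_header, sample_rows)
```

To produce a file, the returned text can be written with `open(path, "w", newline="", encoding="utf-8")`. Letting the `csv` module handle quoting avoids broken rows when a sentence contains a comma.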

As you can see, the new options in BigML’s add-on for Google Sheets offer great ease of use. They also enrich your data with all the insights that can be drawn from the entire set of models and workflows available in BigML. What are you waiting for? Give it a try and let us know how you like it!

Registration Open for FREE Webinar: ‘Detecting Fraud with Hybrid AI’ (October 28, 2020)

In collaboration with BigML partner INFORM GmbH, we’re pleased to bring the BigML community a new educational webinar: Machine Learning Fights Financial Crime. This FREE virtual event will take place on October 28, 2020, from 8:00 AM to 9:00 AM PDT, and it’s the ideal learning opportunity for financial institutions, banking sector professionals, credit professionals, risk advisers, crime fighters, fraud professionals, and anyone interested in finding out about the latest financial crime-fighting and risk analysis strategies and trends.

Moving from Lab to Production

Financial institutions must innovate to stop the onslaught of fraudulent transactions. The utilization of Machine Learning as a tool for fraud detection is trending. Combining Machine Learning with existing intelligent and dynamic rule sets produces a sustainable strategy to address this challenge. Hybrid Artificial Intelligence takes historical transactional data and learns from past decisions using both supervised and unsupervised learning algorithms and combines it with knowledge-based methods such as mixed logic rule sets, fuzzy logic scorecards, and dynamic profiling. Financial institutions and professionals can benefit from this powerful combination that detects new modus operandi in a digital environment.

The webinar will showcase the RiskShield Machine Learning solution, a world-class risk assessment, fraud prevention, and Anti-Money Laundering (AML) compliance monitoring solution built on top of the BigML platform to protect your organization against financial crime.



  • 1-hour event: Wed, Oct 28, 2020 8:00 AM – 9:00 AM PDT


  • Please follow this link and reserve your spot. We will do our best to accommodate as many participants as possible; however, we recommend that you register soon.


  • Speaker details can be accessed here as well as the full agenda.

We look forward to your participation!


Perspectives on Self-serve Machine Learning for Rapid Insights in Healthcare

BigML users keep inspiring us with their creativity every day. Many take on Machine Learning with little to no background or education in the field. Why? Because they have access to relevant data and they are smart professionals with the kind of intuition only years of study in a certain field can bring about. It’s no surprise, then, that many of them come to suspect that there must be a better way than the good old descriptive spreadsheet analytics or plain vanilla business intelligence tool reports to solve their business problems. A natural curiosity and a self-starter attitude to actively experiment don’t hurt either!

Dr. Patrick Gladding

Longtime BigML user Dr. Patrick Gladding is no exception. Dr. Gladding practices medicine in New Zealand, which is, fortunately, doing pretty well in its fight to eradicate COVID-19 these days. We approached Dr. Gladding to find out his motivation for picking up BigML in the first place, as well as his informed opinions on how healthcare professionals can transition into more self-sufficient “man + machine” teams leveraging the full power of Machine Learning in their day-to-day routines at hospitals, clinics, and healthcare research institutions across the globe.

  1. Can you please tell us about your background and how you first developed an interest in Machine Learning?
    • I’m a clinical cardiologist trained in echocardiography and with an interest in genetics. I was involved in some pharmacogenetics studies years ago, where a combinatorial approach was needed to analyze the influence of a number of genetic variants. We used simple neural networks then well over ten years ago, so that would have been my first experience with Machine Learning. Over time, the terminology has changed from predictive analytics to Machine Learning and artificial intelligence, and the technology has evolved to become more capable and accessible. It once took a Ph.D. and access to a computer science lab with proprietary algorithms but now many of the most effective techniques are freely available and usable by anyone. Simultaneous to this has been the digitization of health records, which means it is much easier to aggregate and link patient data to outcomes. This along with the availability of excellent, user-friendly Machine Learning tools like BigML has made it very easy to do some high-quality projects in AI.
  2. Can you tell us (more) about one of your Machine Learning projects and how it made an impact in your field?
    • Given the ample availability of data now, there are almost infinite questions that can be posed. We’ve looked at many conditions and outcomes in cardiovascular medicine to see whether they can be predicted with existing clinical data. This includes predicting mortality from routine blood tests and echocardiography results and we’ve had very similar results to studies in the USA performed at Geisinger Health.
    • Some things seem very hard to predict such as atrial fibrillation as this is a more chaotic, stochastic condition. There are some very strong signals for heart failure both in predicting the presence of the condition as well as predicting outcomes. Heart failure is a common condition that has very poor health outcomes and is a diagnosis that is sometimes missed. We’ve shown the possibility of predicting the presence of heart failure using simple blood results such as a complete blood count and electrocardiogram with BigML.
    • What is really appreciated about BigML is that it provides a wide array of Machine Learning methods including more traditional logistic regression, decision trees as well as deep neural networks. It’s great that BigML provides explainability and transparency of the features that make up the predictive model. This allows me to vet these for things that shouldn’t be included and verifies the importance of features that have previously been shown to predict outcomes. For instance, albumin is a strong feature in models of mortality and this has been shown in previous population studies using standard biostatistics.
    • Similarly, hematocrit is a feature of heart failure predictions, which has been shown to be important in other studies. Using the unsupervised Machine Learning clustering features in BigML, we’ve been able to validate work by others (Shah et al.) in the clustering of heart failure subtypes, e.g., heart failure with preserved ejection fraction (HFpEF). This opens up the potential for mass screening of hospital patients and then subtyping those who might benefit from different treatments. This could improve the diagnosis and management of heart failure considerably but does carry with it implications of altering downstream use of diagnostic testing, therapies, cost, and also potential mislabelling of patients as no predictive model is perfect. There are a lot of ethical and service issues we have yet to work through, but the benefit of using BigML is that everything is transparent, explainable, verifiable, and easily validated. It is quite easy for instance to upload a new large dataset, apply a Machine Learning algorithm, and get some results within minutes, without the need for buying an expensive high-performance computer.
  3. What are the top three challenges for doctors and healthcare professionals in adopting Machine Learning? What’s the easiest path to go from data to insights and predictions? Any words of caution?
    • First comes understanding what Machine Learning can do and its limitations, which includes not completely buying into the hype. Conventional biostatistics are still really important.
    • Second, data is sometimes difficult to access. It takes time, persistence, good ethics, privacy, and security measures such as anonymizing data. Getting the data may require some coding abilities, e.g., SQL queries or Python scripting but we have integrated a system called Qlik into our hospital, which means that data is more easily linked and exportable as a CSV file. However, the overall benefit of BigML is that no coding experience is required to use it.
    • Third, a lot of data is required to generate models that cover the wide variability of human disease and interactions. This means collaborating with others, sharing data, and getting insight into a particular field. Most doctors have a wealth of knowledge and expertise. This means they know what the important questions are but working with Machine Learning still requires talking to data scientists, statisticians, and others to ensure that it’s applied appropriately. Combining a domain expert (doctor) with experts in Machine Learning is a very potent force, and BigML is a great platform to take on some of that role. Sharing health data can be a challenge, but it is really the only way forward for this field.
    • What is really important is to obtain quality data, and spend time tidying it up. The axiom “garbage in, garbage out” still applies regardless of whatever fancy new Machine Learning method is used. Machine Learning is a powerful tool but the ease of use and automation in BigML should not become an excuse for laziness in terms of evaluating, retesting, and validating models. The exciting thing about predictive algorithms is that they could have a very large impact and improve health. At the same time, a biased, poorly fitting model that makes bad predictions could worsen outcomes and health disparities. This includes models that could include racial bias, which has been demonstrated before by Google and others. The scale to do good is equal to the scale to do bad so caution must be applied. This would also be relevant to non-healthcare predictions such as loan defaults.
  4. Once you built models and uncovered key insights from your data, how did you use them to mobilize your organization around findings? What is the best way to collaborate with others that may not be much into Machine Learning?
    • BigML has great visualization features and in-built evaluation models that demonstrate sensitivity and specificity for predictions. These outputs are much more understandable to clinicians, e.g., phi, F-measure, and recall. There is a bit of jargon in this field that doctors need to get their heads around. Mobilizing our organization was easy once we showed them the BigML visualizations. BigML has an enterprise version, which means it can be run on-site within a hospital network, so as to preserve the data security measures already in place.
    • Collaboration through BigML is a really good way of having a kind of escrow for health data as there is not a lot of data sharing that goes on. The reticence to share is probably due to a number of reasons like confidentiality and privacy around the use of health data but also the desire to commercialize results or preserve proprietary algorithms. Basically, if you have data it is now incredibly valuable, so even if you are not into Machine Learning there is no doubt that someone will want to talk to you about it if you have a lot of high-quality data. I would love to see a portal through BigML, where health care professionals could share either data or models or both.
  5. Do you have anything else to share regarding your experience with the BigML platform?
    • BigML is a great system for non-coding doctors and others who want an easy to use and understandable system. It took only about two hours of one-on-one tutoring to get my head around the platform, as well as watching YouTube videos. The customer service is excellent with friendly experts willing to give you help with any application. It’s great that BigML has integrated several Machine Learning methods, which include logistic regression which has been used for decades in the medical literature. It’s something doctors are very familiar with and it is worth noting that it often predicts better than deep neural networks. So despite the hype around deep learning, it is not the perfect tool for everything. Deep learning however is very good when applied to images and unstructured data so I am looking forward to new applications coming from BigML including AI image analysis.
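
For readers less familiar with the evaluation jargon mentioned in the answer to question 4 (sensitivity, specificity, recall, F-measure, and phi), the sketch below shows how these measures follow from the four counts of a binary confusion matrix. It is an illustration using the standard textbook definitions, not BigML’s own evaluation code. The function name and the counts are hypothetical, and the sketch assumes none of the denominators is zero.

```python
import math


def evaluation_measures(tp, fp, tn, fn):
    """Compute common binary-classification measures from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives, and false negatives. Assumes no denominator is zero.
    """
    sensitivity = tp / (tp + fn)  # also known as recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    phi_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    phi = (tp * tn - fp * fn) / phi_denominator
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "phi": phi,
    }


# Hypothetical counts, for illustration only.
measures = evaluation_measures(tp=40, fp=10, tn=45, fn=5)
```

Sensitivity and recall are the same quantity under two names, which is part of the jargon hurdle described above.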

We hope you enjoyed this interview and found something useful to apply directly in your projects. Do you have a similar Machine Learning adoption story you’d like to share with the BigML community? We’d be more than happy to spread the lessons learned for the greater good. Please don’t hesitate to contact us, and stay safe!

Panda ID Soluciones and BigML Join Forces to Lead the Adoption of Machine Learning in Business

We are thrilled to announce that Panda ID Soluciones, a leading company in the development of smart solutions, and BigML, the leading Machine Learning platform since 2011, are coming together to help organizations that are motivated to capitalize on the intrinsic value of their data to build and deploy intelligent Machine Learning applications to improve processes, maximize profits, and stand out in their sectors.

Panda ID Soluciones has been developing specialized solutions for each business for 25 years, based on the treatment of data and the identification of people, assets, and transactions. The company has always been characterized by solutions that add value to businesses by stressing the importance of data, transactions, and the information hidden in them, as well as the value of understanding in real time the activities that their clients are performing. Panda ID teaches in a practical way how to identify threats, generate patterns, measure trends, detect anomalies, and propose strategies in a timely manner.

Currently, there are still companies that don’t know what Machine Learning is, the positive impact it could have on their businesses, or how to optimize resources and processes to improve results by applying the appropriate Machine Learning techniques. Consequently, they do not foresee that the adoption of Machine Learning in their organizations can be a simple process. This lack of understanding can lead to these companies wasting resources, efforts, and valuable projects that often don’t go into production because they have been poorly managed.

Panda ID Soluciones has taken into account this widespread misinformation about Machine Learning, which has been a key factor in carrying out this alliance with BigML. Our objective with this alliance is to help organizations from all sectors understand that there are specific and successful Machine Learning applications for each business and for each project. With our new, practical, and result-oriented approach, those firms will find the hidden value that their data contains to rapidly realize quantifiable benefits.

BigML and Panda ID Soluciones will create change that leads organizations to prioritize the treatment of their data. Both companies will expertly guide businesses in each sector and industry, thus making clients actively participate in the Machine Learning adoption process to arrive at results that generate value, meaning, and greater benefits.

By introducing the BigML Machine Learning platform into its portfolio, Panda ID Soluciones, a multinational with headquarters in Colombia and Venezuela, will transform the way the true meaning of Machine Learning is seen and understood. Business data grows every day and the pressure to obtain the best information from it has become paramount. This is why organizations must act now: competition is intensifying, requiring more precise models to be built in order to identify and create new, profitable opportunities. The era of Machine Learning is here and now.

Alberto González, CEO of Panda ID Soluciones, highlights: “BigML is the most suitable company with which we could partner to offer Machine Learning solutions to the market. Among its main premises is offering real solutions to real problems that can be quickly implemented and put into production. This means sincerity towards clients and that goes hand in hand with what we profess. But without a doubt who wins more, is the market, our clients, and that is a reason for great joy.”

Alberto Ariza, VP of Strategic Alliances at BigML, points out: “With Panda ID Soluciones we have found the ideal partner to lean on in Colombia and the entire geographical area of the Caribbean, to achieve a solid entry that allows us to apply in its business sector one of the most innovative technologies worldwide in the field of Machine Learning, which is exactly what we bring to the table at BigML.”

Machine Learning Fights Cannibalization in the Retail Industry

This guest post is authored by Olena Skarlat, Stefanie Pichler, Beatrice Bunjaku, and Pamela Martin, from Vertical Market Solutions at A1 Digital, a BigML Partner. 

Machine Learning has proven effective in providing insights into data and processes that drive business decisions in any industry domain. However, the high volume and velocity of data make it challenging to get those insights both proactively, depending on already established processes, and reactively, accounting for the unknown. In this blog post, we give you an overview of a methodology for identifying and addressing the cannibalization of products caused by promotional campaigns in the retail domain. The use case discussed here aims to analyze the impact of promotional products, specifically to identify cannibalization effects, i.e., when promotions dramatically decrease the sales of non-promotional products.

An example: A chain of supermarkets decides to have a promotion on 500 grams of chicken breasts. This promotion has significant effects on the sales figures for certain other products. In our example, the sales of beef steaks and various turkey products dropped by more than 20% within the period of the promotion. However, many of those associations are not discovered upfront. If the retail management could tackle such an influence in advance of the promotion of other products, they could adjust their demand planning to order products that will be in demand and order fewer products that will be less popular during the promotional period.

Therefore, in this use case, we predict the demand for products while accounting for promotional campaigns so that stocks can be adjusted and waste reduced. We have implemented this use case as a fully automated Machine Learning application that is capable of (i) training Machine Learning models on data, (ii) providing valuable insights, and (iii) monitoring and assessing Machine Learning model competence over time.

The input data for this use case is from a retailer that has multiple supermarkets in different regions. It contains sales transactions for meat products over several years. Data also includes information on whether the product was on promotion over a certain period. The outputs of this use case can be used by two different roles: (i) a demand planner (operational outputs) and (ii) a business analyst (Machine Learning analytics). The operational output contains insights about the negative impacts of promotional products on other products. The purpose of getting such insights is to adjust the stock in supermarkets accordingly and not to order, for instance, foods that spoil fast if it is expected that their sales will be decreasing. The Machine Learning analytics output contains Machine Learning models, performance parameters, and evaluation results over time. These insights provide the opportunity to assess the performance of Machine Learning models over time by using new incoming daily sales transaction data while intermittently retraining those models when their performance takes a dive.

The methodology includes building Machine Learning models to get novel promotional cannibalization insights and creating workflows to enable the Machine Learning model life cycle, monitoring, and evaluation. The vital part of this use case is the A1 Digital Machine Learning Platform powered by BigML. The platform fully automates the time-consuming work of hand-tuning Machine Learning models and executing the complex custom workflows they are part of. Figures 1 and 2 show, respectively, the life cycle of this use case and the main Machine Learning workflow. Other automated workflows deal with data transformation, receiving predictions, and performing daily evaluations of models.

Figure 1: Use case life cycle overview.

Figure 2: Machine Learning workflow overview.

Machine Learning models in this use case include regressions, association discovery, and anomaly detectors. Regression models are used to predict expected sales for products, so that we can estimate whether the actual sales of non-promotional products are decreasing or increasing because of promotional campaigns. Association discovery finds ‘significant’ associations, or so-called association rules, between promotional and non-promotional products. We are interested in identifying the negative impact of promotions, i.e., instances where the expected sales of products are decreased by more than 20%. For example, Figure 3 shows the associations generated for the pairs of products with a considerable sales decline compared to the expected sales. A promotional product with the identifier id=141 is affected by the non-promotional product with id=89; non-promotional products with id=44 and id=62 are affected by the promotional product with id=196; and so on.

Figure 3: Association rules between promotional and non-promotional products.

The results of the association discovery are converted into sales decrease percentages, i.e., they show how much sales of certain products are expected to slip because of promotional products during the promotion period. These results can be used to proactively analyze promotions during their planning phase and to adjust the impacted retail SKUs ahead of time. This is especially important when the products spoil fast, e.g., food items and drinks. For example, as shown in Figure 4, the promotional product PRODUCT-45 is expected to negatively affect sales of products PRODUCT-98, PRODUCT-53, PRODUCT-144, and so on. This means that it will be efficient to stock less of those non-promotional products during the promotional period of PRODUCT-45 to save money and reduce any possible waste.

Figure 4: The impact of promotional PRODUCT-45 on related non-promotional products.
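
The conversion of expected versus actual sales into decrease percentages, together with the 20% cut-off mentioned earlier, can be illustrated with a short sketch. It is not the A1 Digital implementation: in the real application the expected sales come from regression models, whereas here the product names and all sales figures are hypothetical, as are the function names.

```python
CANNIBALIZATION_THRESHOLD = 20.0  # percent; the cut-off quoted in the post


def sales_decrease_percent(expected, actual):
    """Percentage by which actual sales fall short of expected sales."""
    if expected <= 0:
        raise ValueError("expected sales must be positive")
    return (expected - actual) / expected * 100.0


def cannibalized_products(expected_sales, actual_sales,
                          threshold=CANNIBALIZATION_THRESHOLD):
    """Return {product: decrease %} for products declining by more than the threshold."""
    flagged = {}
    for product, expected in expected_sales.items():
        decrease = sales_decrease_percent(expected, actual_sales[product])
        if decrease > threshold:
            flagged[product] = round(decrease, 1)
    return flagged


# Hypothetical unit sales of non-promotional products during a promotion.
expected = {"NON-PROMO-A": 200.0, "NON-PROMO-B": 150.0, "NON-PROMO-C": 80.0}
actual = {"NON-PROMO-A": 140.0, "NON-PROMO-B": 135.0, "NON-PROMO-C": 60.0}
flagged = cannibalized_products(expected, actual)
```

With these made-up numbers, only the products whose sales fall more than 20% below expectation are flagged, which is the list a demand planner would use to reduce orders for the promotional period.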

Once the association discovery model is created, it’s a good idea to monitor how well those association rules perform on a daily basis to constantly learn from the new incoming sales transaction data. Therefore, we also train anomaly detectors, which are powerful tools to measure the reliability of association rules. We build an anomaly detector every time the association rules are produced. Having quantified how anomalous the new daily sales transaction data distribution is, we can get a sense of how different the new data is from the data that was used to produce the original association rules (see Figure 5). This approach tells Machine Learning analysts when to retrain the association rules. A high anomaly score for a certain period means that the association rules do not apply well to the new sales transactions, perhaps due to changes in customer behavior or a major event such as the Coronavirus outbreak causing dramatic societal shifts. When the association rules are updated, it is once again advisable to allow for a testing period to evaluate if the new rules perform better on the new incoming sales transaction data.

Figure 5: The rate of anomalous sales transactions data incoming every day.
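
The retraining signal described above can be sketched as a simple rule on the daily share of anomalous transactions. This is an assumption-laden illustration, not the platform’s logic: it assumes each transaction already has an anomaly score between 0 and 1, and both thresholds (`score_threshold` and `rate_threshold`), the function names, and the sample scores are hypothetical.

```python
def anomalous_rate(scores, score_threshold=0.6):
    """Fraction of transactions whose anomaly score exceeds the threshold."""
    if not scores:
        return 0.0
    return sum(1 for score in scores if score > score_threshold) / len(scores)


def days_needing_retrain(daily_scores, score_threshold=0.6, rate_threshold=0.3):
    """Return the days on which the share of anomalous transactions is too high."""
    return [
        day
        for day, scores in daily_scores.items()
        if anomalous_rate(scores, score_threshold) > rate_threshold
    ]


# Hypothetical anomaly scores for the transactions of three days.
daily_scores = {
    "day-1": [0.35, 0.42, 0.38, 0.71],
    "day-2": [0.40, 0.45, 0.39, 0.41],
    "day-3": [0.72, 0.66, 0.44, 0.81],
}
retrain_days = days_needing_retrain(daily_scores)
```

In practice both thresholds would need tuning against historical data, and a sustained run of flagged days is probably a better trigger than a single spike.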

This use case is a good example of how Machine Learning can provide an objective overview of how various promotional campaigns affect sales of the non-promotional products in the retail domain. Machine Learning comes to the rescue to identify and tackle the negative effects of promotional campaigns on other products to more proactively adjust their stock, reduce waste, and most importantly protect the retailer’s margin.

Let us know if you have a similar problem and stay tuned for more case studies in the near future!

Machine Learning in Retail and Wholesale: accurate and affordable Demand Forecasting by catsAi

This guest post is originally authored by Stephen Kinns, Founder and CEO of catsAi.

Many business decisions can be traced back to a simple question: ‘How much will we sell?’. Firms, both large and small, widely rely on experience and historical trends to make that assessment, yet the accuracy of these approaches can be very poor, which in turn translates to missed efficiency savings throughout the business.

Machine Learning-based predictions can be much more accurate, but historically the cost and complexity of such technology have made this an uneconomical option for many firms. Despite this challenging backdrop, catsAi’s unique approach makes Machine Learning useful, easy to implement, and cheap for retail, wholesale and other businesses. As such, we have created a lightweight, off-the-shelf solution for demand prediction, which drives easy and rapid adoption at firms of all sizes through supply chain intelligence. With the aid of state-of-the-art Machine Learning powered by BigML, catsAi offers reliable predictive sales numbers on a daily basis, for the week ahead, on each and every product. 


Our existing clients and partners make up a wide variety of retail firms, from the smallest high-street stores such as bakeries through to large global wholesale enterprises. The core challenge is that every client, every location, and every product is different. Therefore, being able to adapt to a wide variety of products and clients has proven key to catsAi’s burgeoning success.

From Raw Data to Production and Benchmarking

Machine Learning as a tool excels in exploring historical patterns. For many firms, the factors that influence sales patterns are diverse; no two firms are alike. Location, weather, cultural influences, and of course changing inventories may all affect likely sales. This means that, should a firm wish to implement a Machine Learning solution itself, all this data must be acquired, cleaned, assessed, and analyzed. Datasets can come in a huge variety of shapes and sizes, ranging from a few thousand rows to 10+ million observations.

To handle this variety and keep costs down, catsAi continually builds bespoke datasets, trains models, evaluates them, and then deploys them automatically without any human intervention. When done, catsAi’s data pipelines paint a detailed picture of the influencing factors behind changing sales dynamics, complete with custom-developed features specific to the client.

The datasets are then securely sent to the BigML system to initiate the model training process, which we manage through the use of available tuning parameters, configuration options, and event handlers. catsAi assesses and evaluates the results of each training run before making a final decision on deployment for ‘live’ predictions in an agile manner.

Over the years, we have evolved from a neural network on a laptop, to a full-fledged cloud-based system thanks to BigML’s support. The BigML suite of tools, both at the REST API and graphical Dashboard level, has considerably accelerated our deployment time-frames. We are now able to scale to match our clients’ expectations while simultaneously maturing our models iteratively.

Our approach replaces the typical complex and time-consuming data science process by breaking it into small, manageable pieces that can be executed automatically. This means customers can autonomously deploy the predictions swiftly and affordably whilst maintaining accuracy and control.

Although a few successful runs can help secure the client’s trust, we often need to prove the ongoing value of the predictive models, so we continually set some simple benchmarks. In the absence of Machine Learning, common prediction methods in a typical retail business are based either on the sales of a given product last week or month, or on a moving average. At a minimum, we use those as easily relatable, effective benchmarks. As seen below, traditional methods tend to overshoot and deliver lower forecast accuracy than catsAi model predictions.

CatsAi Benchmark
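For readers who want to reproduce this kind of comparison on their own sales history, the two traditional benchmarks can be sketched as follows. This is a minimal illustration, not catsAi's actual pipeline: the function names, the 7-day window, the sample figures, and the choice of mean absolute percentage error (MAPE) as the error measure are all assumptions made for the example.

```python
def last_week_forecast(history, horizon=7):
    """Naive benchmark: repeat the most recent `horizon` days of sales."""
    return list(history[-horizon:])


def moving_average_forecast(history, window=7, horizon=7):
    """Flat benchmark: the mean of the last `window` days, repeated for each forecast day."""
    recent = history[-window:]
    return [sum(recent) / len(recent)] * horizon


def mean_absolute_percentage_error(actual, forecast):
    """MAPE computed over days with non-zero actual sales (zero-sales days are skipped)."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("need at least one day with non-zero actual sales")
    return sum(abs(a - f) / a for a, f in pairs) / len(pairs)


# Made-up daily unit sales for one product (two weeks of history, one week of actuals).
history = [12, 15, 14, 20, 25, 30, 18, 10, 20, 30, 40, 50, 60, 70]
actual_next_week = [11, 19, 33, 38, 52, 63, 66]

naive = last_week_forecast(history)
average = moving_average_forecast(history)

naive_error = mean_absolute_percentage_error(actual_next_week, naive)
average_error = mean_absolute_percentage_error(actual_next_week, average)
```

A model's forecast for the same week can be scored with the same error function, which keeps the comparison against both benchmarks like for like. One common convention, assumed here rather than stated in the post, is to report accuracy as 1 minus MAPE.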

Delivering Real Value

Indeed, our experience has shown that catsAi predictions are commonly between 85% and 94% accurate, often anywhere from 30% to a whopping 70% more accurate than the initial state or the benchmarks. This translates into up to 80% less waste, depending on the category or SKU being analyzed.

Furthermore, our customers love that we can go from initial contact to the first set of predictions in as little as 48 hours, iterating on from there. They also highly value the lightweight process and client journey, which can be summarized as sales data in, on-the-mark predictions out. Did I mention no setup charges and low subscription options? All of this means that, with Machine Learning-as-a-Service platforms like BigML, everyone from the smallest high-street companies to global enterprises can easily deploy Machine Learning. This is no longer a wish list item but a day-to-day reality for many early adopter businesses willing to experiment with this foundational technology, which will determine whether their businesses can withstand the macro challenges of our world as well as increased competition.

PlusVitech Uses Machine Learning for Drug Treatment to Win the #EUvsVirus Hackathon

This guest post is originally authored by Fran Guillen, CBO of PlusVitech, and Vicente Salinas, CEO of PlusVitech.

A Little Bit of Context

PlusVitech is a Spanish company that was founded in 2013 with the key mission to improve people’s quality of life by finding solutions for high-impact diseases such as cancer. In particular, our strategy has always been to search for treatments that already exist in the market, which can be used for cancer. This strategy has many advantages: the cost of development is much lower than that of new drugs, candidate drugs have already been shown to be safe when administered to humans, and they can be immediately available for the new indication after approval. This strategy is called repositioning in the pharma world and it has already taken place with Viagra or Propecia, among others. In our case, we have very promising evidence with complete remissions in different types of cancer, even in more advanced stages.

However, this past March, when the COVID-19 epidemic broke out, we realized that some of the solutions we had for cancer could also be useful in treating COVID-19 infections in some people. It is not the virus itself that causes fatalities, but rather the reactions that take place inside our body. After all, the mechanisms activated by the human body for different situations are very similar. In particular, for COVID-19, there is a cascaded lung inflammation, very similar to an allergic reaction, or to the pulmonary inflammation that occurs in lung cancer. Often, it’s this inflammation that generates severe lung damage and pneumonia that leads to death from Coronavirus. Therefore, according to our thesis, if we were able to solve the lung inflammation, we could also stop COVID-19 deaths and any of its mutations, which is what we have patented worldwide.

PlusVitech and BigML against COVID-19

About a month after our discovery, the European Commission, in collaboration with the EU Member States, held the Pan-European Hackathon #EUvsVirus to identify effective proposals towards curing the adverse effects of the pandemic. We decided to attend this call from the EU with our PVT-COVID project.

The PVT-COVID Project

The weekend was quite intense. For starters, two more people from the hackathon joined our PlusVitech team as volunteers, along with various mentors and experts. We developed the business model based on licensing the patent to pharma companies that already produce this type of drug, to ensure its availability immediately after approval by the regulatory agencies of each country. We also worked diligently on defining the clinical trial protocol needed to approve the treatment for COVID-19, and on the contacts to be made with hospitals and the Spanish Drug Agency.

However, in the process of designing the protocol, we found that each COVID-19 patient is in a different clinical state, which calls for different treatments to address individual needs. Some of these patients are at home, others are hospitalized, and the most severe cases are in the ICU, with various levels of oxygen saturation, cough, or fever. This scenario is quite challenging for health professionals, as the treatments need to be personalized in order to be effective. For instance, some patients are so severely ill that they are intubated and thus can’t take medication orally. This simple idea made us realize that we could do better than just having a single treatment. Instead, we focused on personalized treatments, with factors such as dosage, timing, and even combinations with other treatments to play with. Ideally, a hospital doctor could enter the patient’s data into an online system and obtain the most appropriate treatment for that patient in real time.

Preparing such a system, even if only a prototype, was too much for the few hours that remained before the hackathon ended at 9:00 AM on Monday morning. At night, while sleeping, Fran Guillen, CBO of PlusVitech, had a dream about using BigML to solve this problem! So he got up at 5:00 AM, opened a free BigML account, and, in a few hours, prepared an initial table in Google Sheets with the characteristics and clinical states of patients, crossing it with the preliminary results that we have from our treatment.

Sample of the table of anonymous patient data imported directly from Google Sheets.

Fran has almost no background in Machine Learning, but he was able to upload the table to BigML, generate a dataset with the 1-click option and, from it, a Model, again with the 1-click option.

One of the COVID-19 treatment models that we generated with the 1-click option.

When the rest of the team woke up a few hours later they were amazed! The system allowed us to generate predictions of what would be the best drug treatment to apply for different COVID-19 patient cases, as it considers each of their health characteristics.

Prediction from the COVID-19 model generated with the 1-click option.
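Fran did all of this by clicking through the BigML Dashboard, but the same source, dataset, model, and prediction chain can also be scripted. The sketch below is a rough programmatic equivalent, not what was used during the hackathon. It assumes the `api` argument is a connection object from the BigML Python bindings (`bigml.api.BigML`), that the table has been exported to a CSV file, and that the file name and field names shown in the comments are purely hypothetical.

```python
def one_click_prediction(api, csv_path, patient):
    """Chain source -> dataset -> model -> prediction using default settings.

    `api` is assumed to be a bigml.api.BigML connection. `patient` maps
    field names in the uploaded table to the values for one patient.
    """
    source = api.create_source(csv_path)
    api.ok(source)  # wait until the source has finished processing
    dataset = api.create_dataset(source)
    api.ok(dataset)
    model = api.create_model(dataset)
    api.ok(model)
    prediction = api.create_prediction(model, patient)
    api.ok(prediction)
    return prediction


# Usage against the real service (not run here; requires the bigml package
# and credentials; file and field names are hypothetical):
#   from bigml.api import BigML
#   api = BigML()  # reads BIGML_USERNAME and BIGML_API_KEY from the environment
#   result = one_click_prediction(api, "patients.csv", {"fever": "yes"})
#   api.pprint(result)
```

Passing the connection in as an argument keeps the function easy to try out with a stand-in object before pointing it at a real account.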

Just a few hours later, after a sleepless Sunday night, we presented the project a couple of hours before the hackathon deadline. The pitch deck included an explanation of the predictive system created using the BigML Dashboard.

The #EUvsVirus Hackathon Results

The #EUvsVirus hackathon has been the largest hackathon in history worldwide, surpassing even those held previously by Google or Facebook, among others. More than 20,900 participants presented 2,100 solutions to fight the COVID-19 pandemic, which were judged on the potential of their social impact, their scalability prospects, the real possibility of launching the prototype, and the coherence of their business plan. Fortunately, our project, PVT-COVID, turned out to be one of the winners in the Life and Health category and the single winner in the pharmacological area! Additionally, this award has had an extraordinary reception in the Spanish media, appearing in print and digital newspapers, radio, and television.

Subsequently, PVT-COVID was selected among the winners for the DemoDay, held last Thursday, May 21, where we presented our project to hundreds of European institutions and investors with the aim of finding partners and financing for the clinical trial needed to approve our treatment against COVID-19. PlusVitech is seeking funding to carry out the phase 2 clinical trial, which costs about €500,000, and we could have the treatment approved in just 2 months. We are also seeking another €500,000 to continue with the approval of the cancer treatment.

For all this, we want to thank the BigML team, which enables so many projects like ours to become reality, especially in a field as crucial as healthcare, which presents humanity with complex challenges like cancer and COVID-19. We hope you enjoyed our story on how Machine Learning helped us better predict the ideal drug treatment for COVID-19 patients. We will soon be authoring another article explaining the cancer treatment prediction system we are developing on top of BigML. So please stay tuned!

A1 Digital and BigML join forces to Support COVID-19 Research

We’re happy to report that BigML and our partner A1 Digital, an expert in digitization and part of Telekom Austria Group, are jointly taking the initiative to extend Machine Learning capacity to research institutions free of charge. With this timely offer, A1 Digital is making the full potential of BigML’s state-of-the-art analytics capabilities available to combat the COVID-19 pandemic.

The Machine Learning-as-a-Service offering, hosted in A1 Digital’s secure, EU-GDPR-compliant Exoscale Cloud, is available to medical as well as commercial and non-profit research institutions dealing with the economic and social consequences of the pandemic. It is limited to a number of qualified institutions in Europe. Interested institutions and research groups can find more information and an application form here.

A1 Digital COVID-19

Francis Cepero, Director Vertical Market Solutions at A1 Digital, commented “The current exceptional situation confronts us all with completely new challenges of a medical, political, economic and social nature. To get the pandemic under control, research institutions are showing unprecedented efforts. In doing so, they are generating large amounts of data that they must analyze as quickly as possible in order to arrive at relevant results. This is where our Machine Learning Platform powered by BigML offers the necessary support. We have analyzed how we can best support the efforts to overcome the current crisis and are therefore making the Machine Learning-as-a-Service offering available to qualified groups and institutions free of charge.”

BigML’s CEO, Francisco Martin, shared “When A1 Digital approached us with a proposal to make our Machine Learning platform available to qualified organizations free of charge to help fight the COVID-19 pandemic, we were thrilled. Our platform is particularly suited for this context as the need for streamlining Machine Learning workflows from raw data to insights and production models is paramount given the time pressure public healthcare professionals are under. BigML requires no programming knowledge or prior Machine Learning experience to produce interpretable models. It also fosters collaboration to engage domain experts, who need to weigh in on those results before models are deployed in the field. Finally, because the platform runs in A1 Digital’s Exoscale Cloud, it can be used immediately and users do not need to worry about data security and compliance with data protection regulations.”

In summary, we’re looking forward to positive contributions from interested research institutions towards alleviating the adverse effects of the COVID-19 pandemic on the world population as we believe Machine Learning is the perfect 21st-century tool to accelerate serendipitous discoveries in these challenging times.

Machine Learning in Industrial Chemicals: Process Quality Optimization

This post is the last in our series of 5 blog posts highlighting use case presentations from the 2nd Edition of Seville Machine Learning School (MLSEV). You may also check out the previous posts about the 6 Challenges of Machine Learning, Predicting Oil Temperature Anomalies in a Tunnel Boring Machine, Optimization of Passenger Waiting Time for Elevators, or Applying Topic Modeling to improve Call Center Operations.

Today, we delve into a use case from the chemicals industry originally presented by José Cárdenas, Technical Services Manager at Indorama Ventures. Headquartered in Bangkok, Thailand, Indorama Ventures started its journey in 1994 specializing in the production of worsted wool yarn, which is typically used in tailored garments and textiles such as suits. Gradually, the company completed its global expansion with acquisitions in the United States and Europe, eventually becoming a global PET (Polyethylene Terephthalate) producer as well as a sizable player in the PTA (Purified Terephthalic Acid) business. Today, Indorama operates production sites in 31 countries on five continents – in Africa, Americas, Asia, Europe & Eurasia.

Indorama Ventures

PET is manufactured in the form of pellets that are then melted to produce packaging material (food and beverage containers, bottles) and various polyester fibers consumed in industries from automotive to medical. The specific Machine Learning project Indorama tackled involved the carboxylic acid process, which has a critical role in PET production of various grades, e.g., hot fill, high/low intrinsic viscosity, quick heat, general grade.

The Carboxylic Acid Process from Xylene to PET

In this project, the Indorama team was mainly concerned with the optimization of the above process, which translates into achieving cost reductions without compromising on key PET quality parameters. In other words, it’s the age-old “do more with less” challenge, whereby the inputs to the process are used more efficiently than they are in the status quo.

Fortunately, the project team had access to 2.5 years’ worth of detailed chemical process data containing more than 6 million data points. This highly technical dataset described many aspects of the complex chemical process such as throughput, catalyst material concentration, feed to air ratio, oxygen level, and more. In order to pick out the best signals, technical domain experts turned to anomaly detection, outlier elimination, and association discovery by applying BigML’s handy unsupervised methods. Understanding the variable correlations during the data exploration phase was also key in feature selection to further eliminate noise.

The custom BigML workflow for the project

The Indorama team went through multiple iterations to improve their classification model metrics, such as recall, to acceptable levels. The team of experts used BigML’s Partial Dependence Plot (PDP) visualizations to analyze the fine-grained impact of combinations of process variables on PET quality and yield. In return for all the hard work, such close-up model inspection resulted in discoveries that even long-time chemical process experts were not previously familiar with. These days they are hard at work making the necessary changes and upgrades to the underlying chemical processes to mimic the higher efficiency modes of operation predicted by their best performing BigML models, some of which were built by OptiML, BigML’s popular ‘AutoML’ capability.
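For readers unfamiliar with partial dependence, the idea behind a PDP can be sketched in plain Python: force one input to a fixed value for every row, average the model's predictions, and repeat across a grid of values. The sketch below is a generic illustration for a single variable. It is not BigML's implementation, and the toy model, its coefficients, and the field names are made up for the example.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over `rows` with `feature` forced to each grid value.

    `predict` is any function mapping a row (dict of field -> value) to a number.
    Returns a list of (grid value, average prediction) pairs.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)  # copy so the original row is left untouched
            modified[feature] = value
            total += predict(modified)
        curve.append((value, total / len(rows)))
    return curve


def toy_quality_model(row):
    """Made-up stand-in for a trained model; not a real process relationship."""
    return 2 * row["oxygen_level"] + row["throughput"]


# Hypothetical process readings for illustration only.
readings = [
    {"oxygen_level": 1, "throughput": 10},
    {"oxygen_level": 3, "throughput": 20},
]
curve = partial_dependence(toy_quality_model, readings, "oxygen_level", [0, 5])
```

Extending the same loop to a grid over two variables gives the kind of two-way view needed to study combinations of process variables.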

With that, let’s jump into the corresponding MLSEV video describing exactly how Indorama went about implementing their custom Machine Learning workflow and how the subsequent iterations helped them gain actionable insights:

Do you have a similar Industrial Process Optimization challenge?

Depending on your specific needs, BigML provides expert help and consultation services in a wide range of formats including Customized Assistance to turn-key smart application delivery — all built on BigML’s market-leading Machine Learning platform. Do not hesitate to reach out to us anytime to discuss your specific use case at
