
PreSeries’ VC-in-a-Box Crowned at the 6th AI Startup Battle in São Paulo

The 6th Artificial Intelligence Startup Battle came to an end on June 21 in São Paulo in fully automated fashion. The jury of this unique battle, PreSeries’ algorithms, named Dataholics the startup most likely to succeed, with a score of 96.50 out of 100. Dataholics captures and structures millions of data points about people from social networks such as Facebook, LinkedIn, Google and Twitter, as well as from Google search results, blogs, web portals and online services. Their algorithm creates a unified profile for each person, based on behavioral, professional and demographic indicators, from their email, cell phone number, name or ID.

From left to right: Renato Valente – Country Manager – Telefonica Open Future_ & Wayra Brasil; João Gabriel Souza – Co-Founder & CEO – Mr. Descartes; Eduardo D. Martucci – Founder and CEO – Voice Commerce; Daniel Mendes – Founder and CEO – Dataholics; Dhiogo Corrêa – Data Architect – Itera; Rafael Libardi – Public Relations Executive – Data H.

In the battle, each of the five startups introduced their company in a 5-minute pitch. PreSeries’ AI then asked every contender questions about key aspects of their business. The exchange took place through a voice-assistant device on stage (hence the name ‘VC-in-a-box’).

Itera came in 2nd with a score of 86.81. Itera is a technology company founded in 2008 and based in São Carlos/SP, which aims to build innovative solutions for its clients. It is now investing in ALICE, a machine learning platform for text mining that currently focuses on finance and marketing case studies.

Mr. Descartes took 3rd place with a score of 62.13. The company provides a chatbot that helps cities improve their waste management and sustainability. It works with local governments, businesses and members of the community to generate data, educate the public and build lasting partnerships.

Voice Commerce was 4th in the ranking with a score of 62.12. Voice Commerce is a voicebot that gives anyone a simple, objective and secure online purchasing experience through voice commands, making it especially well suited to people with visual impairments who buy goods and services online.

Finally, Data H took 5th place with a score of 62.10. DATA H is a startup focused on creating intelligent products and on outsourced artificial intelligence research and development. DATA H has created its own ecosystem to enable artificial intelligence projects across a diverse set of sectors.

After the event, BigML’s CEO & Co-founder and President of PreSeries, Francisco J. Martin, said: “Having organized our 6th AI Startup Battle in only the last year and a half across the globe, it is amazing to us that humans are surprisingly open and adaptable in trusting PreSeries algorithms to assess the future prospects of startups. What started as a crazy idea has come to be seen as an obvious need. This can be attributed to the investment professionals being overwhelmed with mountains of new data created every day, which in turn highlights the acute need for objective assistance and automation.”

This edition of the battle took place on June 21 in São Paulo, Brazil, at the PAPIs Connect conference, Latin America’s 1st conference on real-world Machine Learning applications. Our next AI Startup Battle will be in Boston (Microsoft N.E.R.D. – MIT) for PAPIs ’17 (Oct. 24-25); stay tuned on Twitter with #AIStartupBattle and @PreSeries.

Machine Learning: Past, Present and Future by Tom Dietterich

BigML Chief Scientist Professor Tom Dietterich gave one of the keynotes at the recent 2ML event held in Madrid, Spain. The event was jointly organized by the consultancy Barrabés and BigML, and gathered an audience of 400 decision makers, technology professionals and industry practitioners. By popular demand, we have posted on our YouTube channel the video recording of Dr. Dietterich’s presentation, which covers the evolution of Machine Learning since its inception:

The corresponding slide deck can be accessed on the BigML SlideShare page. It covers present-day Machine Learning challenges such as Automated Decision Making, Perceptual Tasks and Anomaly Detection, and concludes with key future themes that will keep the discipline occupied for years to come: Detecting and Correcting for Bias, Risk-sensitive Optimization, Explanation of Black Box Systems, Verification and Validation, and Integrating ML Components into Larger Software Systems. Enjoy!

AI Startup Battle in São Paulo – Meet the Contenders!

PreSeries, the joint venture between Telefónica Open Future_ and BigML, will be hosting a brand new AI Startup Battle at PAPIs Connect on June 21 in São Paulo. PAPIs Connect is Latin America’s 1st conference on real-world Machine Learning applications and will feature talks from BigML, Nubank, Uber, IBM and many more.

But what makes the AI Startup Battle so special? Well, it is the absence of human involvement in selecting the eventual winner. Indeed, a human jury is no longer needed thanks to PreSeries’ AI. Our voice-controlled AI communicates with the contenders live on stage and generates scores to rank the startups and choose the winner. In our AI Startup Battles, our Artificial Intelligence is made available through a little device on stage. Our little “VC-in-a-box” asks the contenders a set of questions and chooses its follow-up questions based on the answers given to previous ones. It naturally focuses on the questions that, in its own bias-free opinion, have the most predictive power. In the end, the startup with the highest score is announced as the winner.
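PreSeries’ actual models are not described in this post, so the following is only a hypothetical Python sketch of one common way follow-up questions can depend on earlier answers: walking a decision tree. In a tree learned from historical data, the questions near the root are the ones the learning algorithm found most predictive, and each answer selects which question comes next. The `Leaf`, `Question` and `interview` names, the questions and the scores are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    score: float  # hypothetical success score on a 0-100 scale


@dataclass
class Question:
    text: str
    branches: Dict[str, "Node"]  # maps each possible answer to the next node


Node = Union[Leaf, Question]


def interview(root: Node, answers: Dict[str, str]) -> Tuple[float, List[str]]:
    """Ask only the questions on the path selected by earlier answers."""
    asked: List[str] = []
    node = root
    while isinstance(node, Question):
        asked.append(node.text)
        node = node.branches[answers[node.text]]
    return node.score, asked


# Hypothetical questions and scores, for illustration only.
TREE = Question("Do you have paying customers?", {
    "yes": Question("Is monthly revenue growing?", {"yes": Leaf(90.0), "no": Leaf(60.0)}),
    "no": Question("Have the founders built a company before?", {"yes": Leaf(55.0), "no": Leaf(30.0)}),
})
```

A contender who answers “yes” and then “no” is asked only two questions and receives a score of 60.0; the questions on the other branch are never asked.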

Meet the contenders!

At this point, you may be wondering who will be competing in the battle, so let’s get to know the contenders.


Dataholics

Dataholics captures and structures millions of data points about people from social networks such as Facebook, LinkedIn, Google and Twitter, as well as from Google search results, blogs, web portals and online services. Their algorithm creates a unified profile for each person, based on behavioral, professional and demographic indicators, from their email, cell phone number, name or ID.

Voice Commerce

Voice Commerce is a voicebot that gives anyone a simple, objective and secure online purchasing experience through voice commands. It is especially well suited to people with visual impairments who buy goods and services online.

Data H

DATA H is a company focused on creating intelligent products and on outsourced artificial intelligence research and development. DATA H has created its own ecosystem to enable artificial intelligence projects across a diverse set of sectors.

Mr. Descartes

Mr. Descartes provides a chatbot to help cities improve their waste management and sustainability. They work in collaboration with local governments, businesses, and people from the community in order to generate data, educate the public and build lasting partnerships.


Itera

Itera is a technology company founded in 2008 and based in São Carlos/SP, which aims to build innovative solutions for its clients. It is now investing in ALICE, a machine learning platform for text mining that currently focuses on finance and marketing case studies.

Stay tuned!

Be sure to stay tuned, as the winner will be announced right after the event on social media (on Twitter with #AIStartupBattle) as well as on our blog. For more details, please follow us on LinkedIn, Google+, Facebook or Twitter. The countdown starts now!

Results of 5th AI Startup Battle in Palma de Mallorca

PreSeries hosted the 5th edition of the AI Startup Battle at the exclusive Global Tourism Innovation Summit, held in Palma de Mallorca, Spain, on June 9. The event served as a meeting point for top international professionals discussing topics such as Tourism Innovation / AI / IoT / VR / Smart Destinations / Hotels / OTA / Tour Operator / Airlines / Connectivity / Big Data / New Technologies / Machine Learning / Airports / Ports / Infrastructure / Smart Mobility / Smart Management & Solutions / Dynamic And Immersive / Video Wall Experiences / Digital Signage / Mobile / Apps. The summit was organized by Agora Next Telefónica Open Future_, Telefónica’s first global tourism innovation program and a strategic partner for companies and entrepreneurs who aspire to become global references in Tourism 4.0. A large crowd of tourism experts and decision makers came to witness the power of the PreSeries Machine Learning algorithms.

Sr. D. Iñigo Valenzuela (center) – CEO of Smartvel – Winner of the 5th edition, receiving his prize alongside Valentín Fernández, Global Director of Business Development and Partnerships at Telefónica Open Future_ (center left) and Kemel Kharbichi, CEO and President of Agora Next (center right).

With a score of 96.51, Smartvel, a SaaS Supplier of Digital Destination Content, was crowned winner of the 5th edition of the AI Startup Battle. They have built a unique tool combining three types of content:

  • dynamic content, like an up-to-date travel agenda
  • traditional destination content gathered through geolocalization, e.g., restaurants, points of interest, sights and attractions
  • geocoded layers of content recommended by their clients to promote and cross-sell passes, places or events.

The second-place finisher was Apartool, with a score of 86.82. Apartool is the bridge between apartment blocks, aparthotels and tour operators. This intermediation project aims to give travel agencies the smartest way to offer a global service to all their clients. Through its platform, Apartool offers the first booking solution specialized in booking entire buildings for tourism purposes.

Finally, the third position, with 67.82 points, was for, which has been working for more than a decade in the implementation of a Wi-Fi network in Palma that allows citizens to connect at any time and at no cost. The project, which has been carried out jointly with the City of Palma, has allowed the simultaneous connection of up to ten million mobile devices in the last year.

From left to right: PreSeries’ VC-in-a-Box, Fabien Durand (Marketing Manager at PreSeries), Sr. D. Iñigo Valenzuela (CEO at Smartvel), Sr. D. Mauricio Socias (Founder & CEO at, Sr. D. Marc Vilar (CEO at Apartool) and Julian Vinué (Director of Wayra Barcelona).

We were proud to host another impressive group of startups at this latest edition of our AI battle. If you are interested in competing in our next AI Startup Battle in São Paulo, Brazil (June 21), please apply here and stay tuned!

Machine Learning Challenges and Impact: an interview with Thomas Dietterich

BigML’s Co-founder and Chief Scientist, Professor Thomas Dietterich was recently interviewed by National Science Review, a peer-reviewed journal aimed at reviewing cutting-edge developments across science and technology in China and around the world.

ML for Ecosystem Management

The piece touches on many contemporary topics that are a source of much debate in the AI/Machine Learning community, as well as on his own projects:

  • Expanding application areas of Machine Learning, e.g., anomaly detection techniques that identify unusual transactions and present them to a human analyst for law enforcement, or reinforcement learning applied to improve the management of forest fires in Oregon.

  • The impact of deep learning and its pros and cons, with specific emphasis on the brain drain caused by academics who specialize in the topic migrating to large technology companies.

  • Interpretations of alternative future scenarios involving advanced AI systems, the technological singularity and so-called superintelligence, i.e., their impact on humanity as a whole from economic, cultural and moral perspectives.

For extra credit, we also highly recommend the presentation below, which Professor Dietterich gave in Valencia at The Age of Machine Learning event sponsored by BigML. It does an excellent job of bringing everyone up to speed on the roots and evolution of Machine Learning and on the future challenges facing technologists like us and society as a whole. Enjoy!

Embracing Machine Learning: How to get two steps ahead of everyone else.


I am certain you have heard of Artificial Intelligence.

So, now that you have heard about it, you might be wondering what Artificial Intelligence can actually do for your company. Or is it all just hype?


Well, a lot of it is hype (I’m looking at you, killer robots). As Andrew Ng said, “Fearing a rise of killer robots is like worrying about overpopulation on Mars”. But even when the discussion about AI is not dominated by unwarranted fears, there is still a misinformation epidemic that engenders unrealistic expectations. Consider that the state of the art in AI is nowhere near as advanced as countless movies like to portray it. Even something like the home assistant in “Why Him?” is easily still a decade away.

But it’s not all hype either. It is not a coincidence that the five biggest companies in the world are all in technology AND are all heavily investing in AI.

However, if you look closely, what these companies are mostly investing in is not killer robots but rather one aspect of Artificial Intelligence called Machine Learning. Essentially, Machine Learning allows you to program computers to do complex tasks using data instead of hard-coded rules.
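To make “using data instead of hard-coded rules” concrete, here is a minimal Python sketch under invented assumptions: a hand-written churn rule next to a one-feature threshold (a decision stump) learned from labeled examples. The feature (support tickets), the data and the function names are hypothetical; real platforms learn far richer models, but the principle of fitting the rule to data is the same.

```python
from typing import Optional, Sequence


def hard_coded_rule(support_tickets: int) -> bool:
    # A rule someone wrote by hand: flag customers with 3 or more tickets.
    return support_tickets >= 3


def learn_threshold(xs: Sequence[float], ys: Sequence[int]) -> Optional[float]:
    """Pick the threshold t that minimizes training errors for 'predict 1 if x >= t'."""
    best_t, best_errors = None, None
    for t in sorted(set(xs)):
        # Count examples where the rule x >= t disagrees with the label.
        errors = sum((x >= t) != bool(y) for x, y in zip(xs, ys))
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Hypothetical history: support tickets per customer, and whether they churned.
tickets = [0, 1, 1, 2, 5, 6, 7, 9]
churned = [0, 0, 0, 0, 1, 1, 1, 1]
```

On this toy history, `learn_threshold(tickets, churned)` picks 5 rather than the hand-picked 3, and rerunning it on new data updates the rule without anyone rewriting code.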

And while Machine Learning isn’t really new, it is only in recent years that cheap computation, readily available data and easy-to-use tools like BigML have combined to finally make Machine Learning practical.

So while the stories that grab the headlines are things like computers learning to drive or mastering Go well enough to beat a top human player, the applications of Machine Learning are much broader than this. For example, it is now possible to turn your company’s data into insights like:

  1. Predict if a customer will like a specific product (recommender).
  2. Tell you if a customer might cancel your service before they do (churn).
  3. Find fraudulent charges in a high volume transactional system (fraud).
  4. Predict the advertising method to which a specific customer will respond positively (marketing mix).
  5. Help you find the optimal price for an asset (sales).
  6. And more

It’s unlikely you are looking at that list and thinking “I don’t need any of that”. But even if I’m wrong, take note: Your company needs Machine Learning ASAP!

If you are not using it now, your competition most likely is, and that’s going to give them a quantifiable business edge that will be expensive to ignore. In some fields like finance, Machine Learning is even emerging as a requirement for certification.

Maybe you already knew that. You are reading a Machine Learning blog post after all! Even better, maybe you are currently designing your company’s Machine Learning initiative. If so, congratulations!

But no matter where you are in the process of adopting Machine Learning, there is one critical thing you need to know if you want to avoid a mess of false starts and get ahead of everyone else.

Your Company Needs an ML Platform!

Yes, like BigML, of course. But here’s why:

Adoption Speed

The whole idea of a platform is that, if implemented correctly, all aspects of the workflow are accounted for and work well together.

We worked with a mobile developer who wanted to build a recommender for their mobile application. It took three days from the time they emailed us with a few questions until they had a working system with BigML.

Yes, three days and it was in production!

Compare this to other companies I’ve spoken with that decided instead to roll their own solution with open source tools: there, the timeline tends to be more like one year. And even once they finally have it working, they are not done, because there is no easy way to put the models they have hand-crafted into production.

To be clear, open source is NOT the problem in itself. There are lots of great open source tools; in fact, BigML also relies on open source for certain aspects. The problem is that open source tools are typically focused on one thing, so you end up with a big puzzle of incompatible pieces that have to be glued together with custom code that likely won’t stand the test of time.

And you get to write the glue.

With BigML, you get powerful Machine Learning, a full API, visualizations that make exploration and rapid prototyping easy, and white-box models that can be quickly put into production.

Everything is already put together, saving you all that time and any future headaches due to accumulating technical debt.


You are probably familiar with Pareto’s Principle. It comes up a lot, and I assure you it applies to your employees as well. You know that Machine Learning solution your team assures you will be no problem to put together even though they’ve already been working on it for six months with very little to show?

Well, Pareto’s Principle would warn us that 80% of that system has been produced by just 20% of your team. So, in a team of five people, there is almost certainly one critical employee: the one person without whom no one left on the team would know how to finish or maintain the system.

You can probably already picture who that one person is.

And guess what? That one person is in huge demand. Are you certain you can keep them long enough to finish the project?

For that matter, are you sure what they are building has been tested?

When you adopt a Machine Learning platform like BigML, you are joining a community of more than 45,000 users in over 120 countries who have built hundreds of millions of models. You can rest easy knowing that our platform has been thoroughly tested and has survived the ravages of real-world data in all its varieties.

We’ve even started educating the world through the first Machine Learning engineer certification program, free Summer Schools and our educational program, which brings free ML and other perks to universities around the world.

This means that every day it gets easier to find someone who knows how to use BigML.

And of course, we have an awesome support team. Come try us out.

Ease of Use

Speaking of a roll-your-own solution, even if it works, how many of your employees are going to understand how to use it?

If you are thinking “No problem, we have a team that will build all the models”, then you are missing a critical aspect of the future of Machine Learning.

Machine Learning needs to be for everyone.

Machine Learning is the modern spreadsheet for massive data, and everyone needs to be able to use it. You wouldn’t hire a team of Excel experts and expect all of your company spreadsheets to be managed by only them, right?

And BigML is no longer the only company that foresees this:

That should be the vision of your company as well. Except you don’t need to spend millions of dollars and years of research to build your own easy-to-use ML tools.

BigML was founded in 2011, and from the beginning we believed that Machine Learning needs to be simple enough for everyone to use.

It is a core principle of everything we do.


Speaking of everyone using Machine Learning, adopting a Machine Learning platform has another significant advantage: it makes collaboration easy.

Resources such as models can be shared through a secret link: you send someone a URL, and clicking it lets them interact with the model you built and then just as easily use it to make predictions.

Commonly used resources, like a dataset, can also be shared in a gallery making it possible for a small team to curate data and then share it for everyone to use. In a private deployment, which allows your company to use BigML in a private cloud or even on-premises, these resources can be shared privately with everyone within your organization.


Despite how easy BigML makes Machine Learning, there are often other steps that need to be performed, like transforming your data, filtering, augmenting with new features, etc. It is extremely rare that a real-world problem will be solvable without implementing a workflow composed of such steps.

The good news is that these workflows are often reusable, running the same series of steps over and over with new data. The bad news is that if you roll your own solution, you end up rebuilding these workflows every time.

BigML, on the other hand, has created tools like Flatline, a data transformation language, and WhizzML, a workflow automation language, that make it possible to separate the workflow logic from the data.

The beauty of this is that these workflows can then be easily shared and reused, extending the functionality of the platform.
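The following is not Flatline or WhizzML. It is a plain Python sketch, with hypothetical field and function names, of the general idea of separating workflow logic from data: the steps (filter out rows with missing values, add a derived feature) are defined once as a reusable list, then applied unchanged to each new batch of rows.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]
Step = Callable[[List[Row]], List[Row]]


def drop_missing(field: str) -> Step:
    # Filtering step: keep only rows where `field` has a value.
    return lambda rows: [r for r in rows if r.get(field) is not None]


def add_ratio(name: str, num: str, den: str) -> Step:
    # Feature step: add a new field num/den (0.0 when the denominator is 0).
    def step(rows: List[Row]) -> List[Row]:
        return [{**r, name: r[num] / r[den] if r[den] else 0.0} for r in rows]
    return step


def run_workflow(steps: List[Step], rows: List[Row]) -> List[Row]:
    # The workflow knows nothing about a specific dataset; it just chains steps.
    for step in steps:
        rows = step(rows)
    return rows


# Defined once, reused on every new batch of (hypothetical) call records.
WORKFLOW = [drop_missing("minutes"), add_ratio("minutes_per_call", "minutes", "calls")]
```

Because `WORKFLOW` is ordinary data, it can be shared and rerun on next week’s rows without rebuilding anything, which is the property the paragraph above describes.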

WhizzML is a really big deal!


In the early phases of a Machine Learning initiative, it’s easy to get bogged down in the unknowns that cloud your path:

  • What problems do you want to solve?
  • What data do you have?
  • What data do you need?
  • Will it even work?
  • How will you measure success?

With all of these questions, it’s super easy to overlook something even more important: Once you build a solution, how will you automate all the steps?

The importance of automation cannot be overstated.

Your data is not static; it will change, and you need to build a system that can adapt along with it.

I remember talking to a telecommunications company that already had a process in place for building models to solve a particular marketing question. The problem was, most of the steps required manual processing. We asked how long it took to refresh the models, and the answer was six months!

By the time they built the models, they were no longer relevant!

We explained, as gently as possible, that by automating the entire workflow with BigML they could refresh the models every day if they wanted.

This is possible because, as a platform, we’ve already built an API into BigML. In fact, our API is in some ways our core product. Even our beautiful UI uses our API, the same one we expose to customers.

This means that every single thing you can do in BigML, you can do programmatically. And we even provide bindings in many languages to make it as easy as possible to get started programming.
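As a minimal sketch of what an automated refresh could look like, the function below chains source, dataset and model creation. It assumes an object with the `create_source`, `create_dataset`, `create_model` and `ok` methods of the BigML Python bindings’ `bigml.api.BigML` class (where `ok` waits for a resource to finish and returns whether it succeeded). The `api` object is passed in rather than created inside, so the same workflow can be scheduled or tested with a stand-in; the function name and CSV path are hypothetical.

```python
def refresh_model(api, csv_path: str):
    """Rebuild a model from a fresh data file: source -> dataset -> model.

    `api` is assumed to behave like bigml.api.BigML from the BigML Python
    bindings; it is injected so the workflow can run on a schedule.
    """
    def wait(resource, kind: str):
        # api.ok blocks until the resource is finished and reports success.
        if not api.ok(resource):
            raise RuntimeError(f"{kind} creation failed")
        return resource

    source = wait(api.create_source(csv_path), "source")
    dataset = wait(api.create_dataset(source), "dataset")
    return wait(api.create_model(dataset), "model")
```

With the real bindings, this might be called as `refresh_model(BigML(), "customers.csv")` after `from bigml.api import BigML`, with credentials read from the `BIGML_USERNAME` and `BIGML_API_KEY` environment variables. Running it daily (for example from cron) gives the kind of daily refresh described above instead of a six-month manual cycle.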

Handling Resistance

Hopefully after reading the previous section, you can see the benefits of bringing a Machine Learning platform, like BigML, to your company. However, now I have to warn you about something.

We find that this platform message resonates with innovators, the doers in a company who want actionable results, and, more often than not, the people specifically tasked with evaluating new technologies for their company.

This is because these are the very people who can see past the hype, past the excitement of the latest and greatest tool, and understand the bigger picture. And they understand the benefits that adopting a platform like BigML can bring to their company.

However, not everyone understands this yet, it’s still the early days of Machine Learning. It reminds me of the early days of e-commerce sites, when everyone who wanted a shopping cart would hack together some CGI and HTML into a custom system. And the people that could code those monstrosities were in high demand, and paid handsomely for their effort. Sound familiar?

But who does that now? Well, no one.

That’s because the entire process has been commoditized. And this is a good thing, because those early days saw a lot of repetitive work and wasted time. The same thing is happening with Machine Learning right now.

And if someone is in the trenches, wrangling custom solutions, and you come along and say “stop that, let’s use a Machine Learning platform instead”, you are threatening their existence.

Even worse, playing with the latest and greatest tools is more fun than solving business problems, like actually answering the question “how can we improve our conversion by 10%?”

But it should be clear which choice is more important to the success of your company.

This resistance will change eventually, but by then everyone will be using Machine Learning and you will be counted among the laggards.

If you want to accelerate your company past Machine Learning’s Wild West stage and solidify your competitive edge NOW, then…

…you need BigML’s Machine Learning Platform to get off to a great start in your journey.

3rd Valencian Summer School in Machine Learning is on the Horizon!

Since 2011, BigML has been at the forefront of the Machine Learning revolution, which is now in full swing. Big and small companies alike are seeing the tremendous payoff from using Machine Learning to make data-driven decisions. As all industries are transformed by predictive applications, BigML’s mission to democratize Machine Learning is more relevant than ever.


In our continued effort to make Machine Learning beautifully easy for everyone, BigML is proud to announce the third edition of our summer school, which will take place on September 14 and 15 in Valencia, Spain. This two-day event provides a hands-on, crash-course introduction to Machine Learning for advanced students, industry practitioners and business leaders. BigML provides the foundation you need in Machine Learning concepts, tools and techniques to become a master of your data. Through two days of intensive training, you will gain practical experience with real datasets and learn to build your first predictive application. This event serves as a good introduction to the kind of work students can expect when enrolling in a Machine Learning master’s program.

Last year’s event was very well received, attracting a global audience with over 220 applicants and 140 attendees representing 53 companies and 21 academic organizations from 19 different countries. We look forward to breaking last year’s record with this new edition.

The 2017 summer school has a nominal fee of 30 euros and is by invitation only. The deadline to apply is August 15. Applications will be processed as they are received, and invitations will be granted right after individual confirmations to allow for travel plans.

Ready to begin your Machine Learning journey with this fun, challenging event? Please register soon since space is limited. Given the success of our previous summer schools in Valencia, Spain and São Paulo, Brazil, we are greatly looking forward to this upcoming event!

Anomaly Detection, Benchmarks, and WhizzML

Anomaly detectors are a useful tool for any machine learning practitioner, whether for data cleaning, fraud detection, or as an early warning for concept drift. While there are many algorithms for detecting anomalies, there is a lack of publicly available anomaly detection benchmark datasets for comparing these techniques.

This is what our Chief Scientist, Professor Tom Dietterich, and his research group at Oregon State University set out to remedy with their paper “Systematic Construction of Anomaly Detection Benchmarks from Real Data”. They devised a way to sample real-world supervised learning datasets to produce benchmarks that vary along three dimensions: point difficulty, relative frequency and semantic variation. This blog post won’t dive into the details of those dimensions, but varying them lets us push anomaly detectors to their limits in a variety of ways, giving us a robust set of tests for comparison. The benchmark datasets have points labeled as “anomalous” and “normal”, so the detectors can be scored against them (specifically, using AUC).
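For readers less familiar with AUC as a detector score, here is a minimal Python sketch (function name hypothetical) of the standard pairwise definition: the probability that a randomly chosen “anomalous” point receives a higher anomaly score than a randomly chosen “normal” one, with ties counted as one half. The quadratic loop is chosen for clarity; a rank-based formula gives the same number faster on large benchmarks.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[str]) -> float:
    """AUC of anomaly scores against "anomalous"/"normal" labels."""
    pos = [s for s, y in zip(scores, labels) if y == "anomalous"]
    neg = [s for s, y in zip(scores, labels) if y == "normal"]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal point")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # anomaly correctly ranked above a normal point
            elif p == n:
                wins += 0.5   # ties count half
    return wins / (len(pos) * len(neg))
```

A perfect detector scores 1.0 and random scoring hovers around 0.5; for example, scores `[0.9, 0.8, 0.3, 0.1]` with labels anomalous, normal, anomalous, normal rank three of the four anomalous/normal pairs correctly, for an AUC of 0.75.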

Using the flexibility of WhizzML (our Lisp-inspired DSL for Machine Learning workflows), BigML has replicated their process. With the click of a button, a single supervised “mother” dataset can generate hundreds of child datasets defined by these dimensions and ready for benchmarking.

Before we generate these child datasets, however, we need to label some of the rows as “anomalous” and some as “normal”. This is the fundamental operation in transforming a supervised learning dataset into an anomaly detection benchmark. But how should we split the dataset? If a dataset has a numeric objective field, we can simply test whether the objective value is above or below the median, labelling one side as “anomalous” and the other as “normal”. For binary classification tasks it’s even easier: one objective field value is chosen as “anomalous”, the other as “normal”.
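A minimal Python sketch of the two simple labelling cases just described, with hypothetical function names. Which side of the median counts as “anomalous” is an arbitrary choice; here values strictly above the median are labelled anomalous, so values equal to the median fall on the “normal” side.

```python
from statistics import median
from typing import Hashable, List, Sequence


def label_numeric(values: Sequence[float]) -> List[str]:
    # Regression objective: split at the median (above = "anomalous" by assumption).
    m = median(values)
    return ["anomalous" if v > m else "normal" for v in values]


def label_binary(values: Sequence[Hashable], anomalous_class: Hashable) -> List[str]:
    # Binary objective: one class is "anomalous", the other "normal".
    return ["anomalous" if v == anomalous_class else "normal" for v in values]
```

For example, `label_numeric([1, 2, 3, 4])` splits at the median 2.5 into two normal and two anomalous rows, and `label_binary(["spam", "ham", "spam"], "spam")` marks the two “spam” rows as anomalous.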

Things are a bit more complicated if our objective has multiple classes. We could arbitrarily group classes into two sets and make those “normal” and “anomalous”. But we want to make our benchmark datasets hard on our anomaly detectors. We can do this by ensuring that both the “normal” and “anomalous” groups have a diversity of classes assigned to them. In other words, we want to make sure that very similar classes are assigned to different groups.

So how do we go about defining class similarity and assigning the classes to our “normal” and “anomalous” groups? To illustrate, let’s say we have a 10-class problem where the task is recognizing handwritten digits (like MNIST).

First, we train a model on part of the dataset and evaluate it on the remaining portion. We use the confusion matrix from that evaluation to build a graph whose edges represent the confusion between each pair of classes. That is, each node of the graph is a class, and each edge weight represents how often the two classes were mistaken for each other. For our MNIST-like example, we might end up with the graph below.
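A minimal Python sketch of this step, assuming `matrix[i][j]` counts instances of class i predicted as class j. It builds the same map shape the WhizzML code later in this post expects: “nodes” (maps with an “id”) and “edges” (maps with “ends” and “weight”, where the weight sums the two off-diagonal counts). Zero-weight edges are kept so the graph stays complete and always has a spanning tree; that is a choice of this sketch, and the `name` key is a hypothetical extra.

```python
from typing import Dict, List, Sequence


def confusion_graph(classes: Sequence[str], matrix: Sequence[Sequence[int]]) -> Dict[str, list]:
    """Turn a confusion matrix into a weighted, undirected class graph."""
    nodes = [{"id": i, "name": c} for i, c in enumerate(classes)]
    edges: List[dict] = []
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            # Confusion in both directions: i mistaken for j plus j mistaken for i.
            weight = matrix[i][j] + matrix[j][i]
            edges.append({"ends": [i, j], "weight": weight})
    return {"nodes": nodes, "edges": edges}
```

For three classes with confusion matrix `[[5, 2, 0], [1, 6, 3], [0, 0, 7]]`, the edges are (0, 1) with weight 3, (0, 2) with weight 0 and (1, 2) with weight 3; the diagonal (correct predictions) is ignored.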


Now we create a maximum spanning tree for the graph:
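As a Python counterpart to this step (the post’s own WhizzML version appears further down), here is a minimal sketch of Kruskal’s algorithm in its maximum-weight form, using a union-find structure over the same “nodes”/“edges” map. Edges are taken heaviest first, and an edge is kept only if it joins two different subgraphs, so the most-confused class pairs are linked whenever that does not create a cycle. The function name is hypothetical.

```python
from typing import Dict, List


def max_spanning_tree(graph: Dict[str, list]) -> List[dict]:
    """Kruskal's algorithm, heaviest edges first, on {"nodes", "edges"} maps."""
    parent = {node["id"]: node["id"] for node in graph["nodes"]}

    def find(x):
        # Follow parent links to the subgraph root, halving paths as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree: List[dict] = []
    for edge in sorted(graph["edges"], key=lambda e: e["weight"], reverse=True):
        a, b = (find(end) for end in edge["ends"])
        if a != b:              # ends lie in different subgraphs: keep the edge
            parent[a] = b       # merge the two subgraphs
            tree.append(edge)
            if len(tree) == len(parent) - 1:
                break
    if len(tree) != len(parent) - 1:
        raise ValueError("Graph is disconnected")
    return tree
```

For four classes with edge weights 5, 4, 3, 2 and 1, the tree keeps the edges weighing 5, 4 and 3, since both remaining edges would close a cycle.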


And finally, we two-color the tree to create our “anomalous” and “normal” groups:
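A tree has no cycles, so it can always be two-colored so that every edge joins nodes of different colors. Here is a minimal Python sketch using breadth-first search (function name hypothetical). Which color is called “normal” is arbitrary, so this sketch starts each component as “normal”.

```python
from collections import deque
from typing import Dict, Hashable, List, Sequence


def two_color(node_ids: Sequence[Hashable], tree_edges: List[dict]) -> Dict[Hashable, str]:
    """Assign alternating "normal"/"anomalous" labels along the tree's edges."""
    adjacency: Dict[Hashable, list] = {n: [] for n in node_ids}
    for edge in tree_edges:
        a, b = edge["ends"]
        adjacency[a].append(b)
        adjacency[b].append(a)
    color: Dict[Hashable, str] = {}
    for start in node_ids:              # loop handles a forest, not just one tree
        if start in color:
            continue
        color[start] = "normal"
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adjacency[node]:
                if nb not in color:
                    # Neighbours always get the opposite group.
                    color[nb] = "anomalous" if color[node] == "normal" else "normal"
                    queue.append(nb)
    return color
```

Applied to the tree with edges (0, 1), (2, 3) and (0, 2), classes 0 and 3 become “normal” and classes 1 and 2 “anomalous”, so the heaviest-confusion pair (0, 1) ends up split across the two groups.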


By partitioning the graph this way, we help maximize the difficulty of our benchmarks: classes that are hard to tell apart (frequently mistaken for one another) tend to be placed in separate groups.

This is only part of the overall process for generating benchmarks (our full implementation is here). But now that we’ve introduced this, we want to walk through the WhizzML function for finding a maximum spanning tree. It’s a good example of the flexibility WhizzML offers when building Machine Learning workflows.

WhizzML has no native graph object, so we defined a graph as a map with the keys “nodes” and “edges”. The value of “nodes” is the list of classes (each represented as a map with an “id”), and the value of “edges” is a list of edge maps with the keys “ends” (an ordered pair of node ids) and “weight” (the sum of the corresponding values from the confusion matrix). Using a form of Kruskal’s algorithm, we can find the maximum spanning tree. Here is the WhizzML implementation:

(define (max-span-tree graph)
  ;; Kruskal's algorithm, maximum variant: consider the heaviest edges first.
  (let (edges (get graph "edges")
        nodes (get graph "nodes")
        sorted (reverse (sort-by-key "weight" edges))
        ;; Follow parent links until reaching a root (marked with -1);
        ;; the root's id identifies the node's subgraph.
        find-subgraph (lambda (node-id subgraphs)
                        (let (parent (get subgraphs (str node-id)))
                          (if (= parent -1)
                            node-id
                            (find-subgraph parent subgraphs)))))
    (loop (span-edges []
           ;; Every node starts as the root of its own subgraph.
           subgraphs (make-map (map (lambda (x) (str (get x "id"))) nodes)
                               (repeat (count nodes) -1))
           list-of-edges sorted)
      (cond (= (count span-edges) (- (count nodes) 1)) span-edges
            (empty? list-of-edges) "Graph is disconnected"
            (let (edge (head list-of-edges)
                  rest (tail list-of-edges)
                  ends (get edge "ends")
                  subgraph1 (find-subgraph (head ends) subgraphs)
                  subgraph2 (find-subgraph (last ends) subgraphs))
              (if (= subgraph1 subgraph2)
                ;; Same subgraph: adding this edge would create a cycle.
                (recur span-edges subgraphs rest)
                ;; Different subgraphs: keep the edge and merge the subgraphs.
                (recur (append span-edges edge)
                       (assoc subgraphs (str subgraph1) subgraph2)
                       rest)))))))

As in many WhizzML scripts, the first line gives the name of the function (max-span-tree) and the parameter it takes (graph, a graph of the form just discussed). The next lines are the let bindings, which create local variables used only within this function. We define “nodes” and “edges”, the values stored under the graph’s two keys, and then define “sorted” as the edges ordered from highest to lowest weight. Since the sort function orders from smallest to largest by default, we need to reverse the list. Lastly, we define “find-subgraph”, which is itself a function: it follows a node’s parent links until it reaches the root of the subgraph containing that node.

Now we get to the meat of the script: a loop over three variables (span-edges, subgraphs, and list-of-edges) whose body is a conditional statement. If span-edges holds one fewer edge than there are nodes, the tree is complete and we return span-edges. If list-of-edges is empty, the graph is disconnected and we can’t make a maximum spanning tree. Otherwise, we look at the first edge in list-of-edges. If both of its ends are already part of the same subgraph, adding it would create a cycle, so we run the loop again with the tail of list-of-edges. If its ends are in separate subgraphs, we run the loop again after adding that edge to span-edges, merging the two subgraphs, and taking the tail of list-of-edges. Follow the Wikipedia link for a nice picture of the process.
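To make the loop easier to follow outside WhizzML, here is a minimal Python sketch of the same Kruskal-style maximum spanning tree, written as an illustration rather than as the authors' implementation. It assumes the graph map format described above, with node records carrying an "id". Two small departures are deliberate: find_subgraph walks parent links with a while loop instead of recursion, and a disconnected graph raises ValueError instead of returning the string "Graph is disconnected".

```python
def max_span_tree(graph):
    """Kruskal-style maximum spanning tree over a {"nodes", "edges"} graph.

    Edges are taken from heaviest to lightest; an edge is kept only when its
    two ends lie in different subgraphs, so no cycle is ever formed.
    """
    nodes = graph["nodes"]
    # Heaviest edges first.
    sorted_edges = sorted(graph["edges"], key=lambda e: e["weight"], reverse=True)
    # Every node starts as the root of its own subgraph; -1 marks a root.
    subgraphs = {node["id"]: -1 for node in nodes}

    def find_subgraph(node_id):
        # Follow parent links until reaching the root of this subgraph.
        while subgraphs[node_id] != -1:
            node_id = subgraphs[node_id]
        return node_id

    span_edges = []
    for edge in sorted_edges:
        if len(span_edges) == len(nodes) - 1:
            break  # a spanning tree on n nodes has exactly n - 1 edges
        first, last = edge["ends"]
        root1, root2 = find_subgraph(first), find_subgraph(last)
        if root1 != root2:
            span_edges.append(edge)
            subgraphs[root1] = root2  # merge the two subgraphs
    if len(span_edges) != len(nodes) - 1:
        raise ValueError("Graph is disconnected")
    return span_edges


graph = {
    "nodes": [{"id": i} for i in range(4)],
    "edges": [
        {"ends": [0, 1], "weight": 14},
        {"ends": [0, 2], "weight": 5},
        {"ends": [1, 2], "weight": 1},
        {"ends": [2, 3], "weight": 9},
        {"ends": [1, 3], "weight": 7},
    ],
}
tree = max_span_tree(graph)
# Keeps the edges weighted 14, 9 and 7; the lighter ones would close cycles.
```

Raising an exception keeps the return type uniform (always a list of edges), which is usually easier to handle in Python than checking for a sentinel string.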

This script demonstrates that WhizzML is more than just a scripting language: it is a fully formed programming language, capable of complex calculations as well as creating and fetching resources. With this power at your fingertips, you can be confident that you can implement cutting-edge Machine Learning research.

Calling Startups to Compete at the Sixth Edition of the AI Startup Battle at PAPIs Connect in São Paulo

PAPIs Connect is a series of localized events that run between the annual editions of PAPIs, the International Conference on Predictive Applications and APIs. This year, PAPIs Connect 2017, Latin America’s 1st conference on real-world Machine Learning applications, comes to São Paulo, Brazil, on June 21-22, 2017, and will host the Sixth Edition of our Artificial Intelligence Startup Battle.

The audience, mainly decision makers and developers interested in the latest technology for building real-world intelligent applications, will witness the power of PreSeries, a predictive application built on top of BigML’s Machine Learning platform. PreSeries provides fact-based insights and many other investment- and traction-related metrics to help investors foresee which companies warrant a potential investment. PreSeries’ predictive models are trained with diverse sets of public and private data on more than 370,000 companies worldwide.

The sixth edition of the AI Startup Battle is powered by PreSeries, the joint venture between Telefónica Open Future_ and BigML.

Apply now! 

If your startup uses AI as a core enabler, this is your chance to participate in the 6th edition of PreSeries’ AI Startup Battle. The startups selected to compete will pitch on stage, make connections at PAPIs Connect, and gain unique exposure to a highly distinguished audience at Latin America’s 1st conference on real-world Machine Learning applications. The winner of the battle will be considered by the Wayra Academy and may be invited to Telefónica Open Future’s acceleration initiatives and services, which include training, coaching, a global network of talent, and the opportunity to reach many Telefónica enterprises in Brazil and abroad.


Wednesday, June 21, 2017 at 4:30 PM BRT.


Telefônica Auditorium: R. Martiniano de Carvalho, 851 – Bela Vista, São Paulo – SP, 01321-001, Brazil.

Application Deadline?

To compete in the battle, please fill out this form before June 11, 2017, and send a short presentation about your company (up to 2 MB) to

For more information on previous AI Startup Battles, we recommend visiting the dedicated page, which includes highlights of all the battles held so far.


Automating your Predictions using BigML and Zapier



At BigML we are always striving to give our users the best options for integrating machine learning into their workflows, be it through our bindings for the major programming languages, our BigMLer command-line tool, the BigML add-on for Google Sheets, or our proprietary Domain-Specific Language, WhizzML. Our latest effort in this direction brings the BigML platform to Web-based automation tools such as Zapier, a web service that allows end users to create workflows across different web applications, including Google Drive, Salesforce, Gmail, and many others.

Today, we’d like to give you a preview of the new BigML Predict Action for Zapier, which will allow users to make predictions using Models, Ensembles, Logistic Regressions, or Clusters in their BigML account as part of a larger Zapier workflow. Imagine the following scenarios:

  • An IoT device monitors a patient’s insulin level, glucose level, and blood pressure, and periodically (e.g., every hour) sends its data to a remote storage service, such as a Google Sheet or an email server. Whenever new data comes in, a Zapier trigger reads it and passes it to the BigML Predict action, which predicts the likelihood of diabetes. The prediction outcome is then used to trigger an email whenever the confidence of a diabetes diagnosis is higher than a given threshold.
  • An e-commerce service stores all processed orders in Salesforce, along with data about the buyer, the payment, and anything else significant about the transaction, such as whether the delivery was disputed, the product was returned, or a refund was required. For each new order, you could trigger a prediction using a BigML model to evaluate the likelihood of that transaction failing for whatever reason. If the prediction confidence is higher than a given threshold, you can use another Zapier action to flag the transaction in Salesforce as requiring ad-hoc tracking by a human controller.
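Both scenarios hinge on the same conditional step after the prediction: act only when the model predicts the outcome of interest with confidence above a threshold. The Python sketch below illustrates that rule; in a real Zap the condition would more likely be configured in the Zapier UI than written as code. The Prediction class, its field names, the function needs_follow_up, the labels "fail" and "ok", and the 0.8 threshold are all hypothetical, not the output format of the BigML Predict action.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical prediction result handed from one workflow step to the next."""
    outcome: str       # predicted class, e.g. "diabetes" or "fail"
    confidence: float  # model confidence in that class, between 0 and 1


def needs_follow_up(pred: Prediction, target: str, threshold: float) -> bool:
    """True when the model predicts `target` with confidence above `threshold`."""
    return pred.outcome == target and pred.confidence > threshold


# E-commerce scenario: flag only orders confidently predicted to fail.
orders = {
    "A-100": Prediction("fail", 0.91),
    "A-101": Prediction("ok", 0.88),
    "A-102": Prediction("fail", 0.55),
}
flagged = [order_id for order_id, p in orders.items()
           if needs_follow_up(p, "fail", 0.8)]
# Only "A-100" is flagged; "A-102" predicts failure but with low confidence.
```

Checking the predicted class as well as the confidence matters: a confident "ok" prediction should not trigger the follow-up action.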

These scenarios represent just a couple of the possible ways to integrate the BigML Predict action into a Zapier workflow. The following images show a simple Zapier workflow that polls a Google Sheet for new input data, uses those inputs to make a prediction with a given model, and stores the prediction in a separate sheet.


Using the Zapier UI, you can easily map the data in your Google Sheet to the inputs of a BigML Predict action:


Once you enable your Zapier workflow, it will check the input Google Sheet for new data and automatically store the prediction in a second Google Sheet:
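Zapier performs the polling itself, but the behaviour of this workflow (predict only for rows that have not been seen before, then write the result to a second sheet) can be illustrated with a small in-memory Python sketch. This is not Zapier's or BigML's implementation: plain lists stand in for the two Google Sheets, and poll_once and toy_predict are hypothetical names, with toy_predict being a toy rule in place of a real BigML model.

```python
def poll_once(input_rows, processed, predict, output_rows):
    """Run one polling pass over the input sheet.

    input_rows:  list of row dicts (stand-in for the input Google Sheet)
    processed:   set of row positions already handled, kept between polls
    predict:     any callable mapping a row dict to a prediction
    output_rows: list receiving results (stand-in for the second sheet)
    Returns the number of rows predicted in this pass.
    """
    handled = 0
    for position, row in enumerate(input_rows):
        if position in processed:
            continue  # already predicted on an earlier poll
        output_rows.append({**row, "prediction": predict(row)})
        processed.add(position)
        handled += 1
    return handled


def toy_predict(row):
    # Hypothetical stand-in for a BigML model: a toy threshold, not a real model.
    return "diabetes" if row["glucose"] > 140 else "healthy"


sheet_in = [{"glucose": 95}, {"glucose": 180}]
sheet_out, seen = [], set()
first = poll_once(sheet_in, seen, toy_predict, sheet_out)   # both rows are new
second = poll_once(sheet_in, seen, toy_predict, sheet_out)  # nothing new
sheet_in.append({"glucose": 150})
third = poll_once(sheet_in, seen, toy_predict, sheet_out)   # only the new row
```

Keeping the set of processed rows between polls is what prevents the same input from being predicted, and written to the output sheet, more than once.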


The BigML Predict action for Zapier is available as a beta to any BigML customer who is willing to try it out. Just get in touch with us and we will give you access to our new Zapier action.
