
PreSeries Receives Partnership Award at Telco Data Analytics Conference

The Telco Data Analytics conference series is the perfect occasion to witness leading industry players showcasing their innovations, success stories and strategies for the future of telecommunications. It is the place to go to uncover new trends, network with potential partners and stay abreast of opportunities and challenges facing the market. The latest edition of the European tour took place in Madrid on October 25 and 26. Major operators like Telefónica, Orange, and Swisscom, and other industry players such as Huawei, EMC, ip.access, Netscout, Solvatio and SAP were in attendance.

One of the highlights was the award ceremony, where BigML won the Partnership Award for its collaboration with Telefónica Open Future_ in creating PreSeries. PreSeries' mission is to take advantage of the latest innovations in Machine Learning to transform startup financing from its current subjective form into a highly objective, data-driven practice.


Amir Tabakovic (VP Business Development at BigML) holding the prize

Thanks to Telco Data Analytics for this award. We look forward to building even more fruitful partnerships in the future!

First Summer School in Machine Learning in São Paulo!

A bit of context

Machine Learning is making its presence felt on the worldwide stage as a major driver of digital business success. Good proof of that was the recently completed second edition of the Valencian Summer School in Machine Learning, held in September 2016 in Spain. Over 140 attendees representing 53 companies and 21 academic organizations from 19 countries travelled to Valencia for a crash course in Machine Learning, and it was a great success!


What are the next steps?

Encouraged by the level of interest and motivated by our mission to democratize Machine Learning, we continue spreading Machine Learning concepts with this series of courses, this time in São Paulo, Brazil, on December 8 and 9. BigML, in collaboration with VIVO and Telefónica Open Future_, will be holding a two-day, hands-on summer school perfect for business leaders, advanced undergraduates, graduate students, and industry practitioners who are interested in boosting their productivity by applying Machine Learning techniques.


All lectures will take place at the VIVO Auditorium from 8:30 AM to 6:00 PM BRST on December 8 and 9. You will be guided through this Machine Learning journey starting with basic concepts and techniques, and proceeding to the more advanced topics you need to know to become the master of your data with BigML. Check out the program here!

Special closure

To complete this Summer School, at the closing of the event on Friday, December 9, we will showcase real-world companies built on Machine Learning technology. We will also present the Artificial Intelligence Startup Battle, where the jury is a Machine Learning algorithm that predicts the probability of success of early stage startups, with no human intervention.

If your startup applies Artificial Intelligence and Machine Learning as a core component of its offering, submit your application to compete in the AI Startup Battle! Read this blog post for more details.

Join us at the Brazilian Summer School in São Paulo!

The Brazilian Summer School in Machine Learning is FREE, but by invitation only. The application deadline is Friday, December 2, at 9:00 PM BRST. Applications will be processed as they are received, and invitations will be granted right after individual confirmations to allow time for travel plans. Make sure that you register soon since space is limited!


Co-organized by:


AI Startup Battle coming to São Paulo on December 9, at the BSSML16

The Artificial Intelligence Startup Battle where the jury is an algorithm keeps on traveling around the world! We are glad to announce our next stop: São Paulo, Brazil, on December 9, 2016. This will be the third edition of a series of very exciting battles, where five AI startups compete with each other and are judged by a Machine Learning algorithm, with no human involvement.

The world's first Artificial Intelligence Startup Battle took place in Valencia, Spain, in March 2016, when Telefónica Open Future_ and BigML presented PreSeries to the world. The reaction of the audience was extremely positive! So much so that the PreSeries team and Telefónica Open Future_ decided to continue with a series of battles spread worldwide. That is why we launched the second edition in Boston on October 12 at PAPIs '16. But we still want more! The third edition takes us to São Paulo, Brazil.

Apply to compete!

The worldwide interest in Machine Learning makes Brazil's economic capital, São Paulo, the perfect South American stop for the next AI Startup Battle. If your startup applies Artificial Intelligence and Machine Learning as a core component of its offering, we invite you to compete in the battle! Submit your application and, if you are selected, you'll be able to pitch on stage, make connections at the conference, and get unique exposure to a highly distinguished audience.

Five Artificial Intelligence startups will be selected to present their projects on stage on December 9 at the closing of the Brazilian Summer School in Machine Learning 2016. These five startups will be automatically judged by the Machine Learning algorithm of PreSeries that will predict the probability of success for each startup.

The winner of the battle will be invited to Telefónica Open Future_'s acceleration initiatives and will enjoy access to the Wayra Academy (for up to six months) and to Wayra services and contacts, such as training, coaching, and a global network of talent, as well as the opportunity to reach many Telefónica enterprises, in Brazil and abroad. After six months the winning company will be evaluated and may apply for a full Wayra acceleration process, including a convertible note loan of up to US $50,000 (against a possible 7 to 10% equity stake).

Competition details


  • Artificial Intelligence Startup Battle.


  • Friday, December 9, 2016 from 06:00 PM to 07:30 PM BRST.


  • VIVO Auditorium (Telefónica Building). Av. Engenheiro Luís Carlos Berrini, 1376 – Brooklin, São Paulo – SP, 04571-000, Brazil.


The Machine Learning technology behind PreSeries

The battle is powered by PreSeries, a joint venture between Telefónica Open Future_ and BigML that uses a diverse set of public and private data about early stage startups to find patterns that help investors foresee which companies warrant a potential investment. PreSeries predictive models are built on top of BigML's Machine Learning platform and facilitate agile, data-driven decisions in combination with the worldwide funding experience of Telefónica Open Future_.


The AI Startup Battle will meet its select audience in a very innovative context, the Brazilian Summer School in Machine Learning 2016. This event is part of a series of Machine Learning courses organized by BigML in collaboration with VIVO and Telefónica Open Future_. Read this blog post for more details.

BSSML16 is a two-day course for industry practitioners, advanced undergraduates, and graduate students seeking a fast-paced, practical, and hands-on introduction to Machine Learning. The Summer School will also serve as the ideal introduction to the kind of work that students can expect if they enroll in an advanced Machine Learning master's program.

BigML Certifications are Here!

At BigML, we believe that the best way to add business value is by showing and not just telling what is possible via Machine Learning techniques. This has been the main reason why we prefer to give our user community free and easy access to our full-featured platform without having to fill out endless online forms to even get a glimpse. BigML has also been practicing what it preaches on the “getting hands-on” front when it comes to actively helping our customers launch their first Machine Learning use cases built on our platform.

BigML Certifications

As happy as we are to see customers grow the application areas of Machine Learning in their organizations, we can't help but notice that many more customers are requesting that BigML get involved in their projects. With this heightened awareness, BigML has set its sights on systematically addressing the need to certify our partners, which has led to today's announcement of BigML Certifications. This is a great opportunity for BigML partners to demonstrate their mastery of the rapidly growing BigML Machine Learning-as-a-Service platform while further differentiating themselves from competing analytical services organizations that offer more arcane methods relying on traditional statistical analysis tools.

Not yet a BigML partner? Well, what are you waiting for? Contact us today to find out more about how new-wave Machine Learning-as-a-Service platforms can help you deliver actionable insights and real-world smart applications to your clients in days or weeks, not months or years!

BigML Certifications come in two flavors: BigML Certified Engineer and BigML Certified Architect. In order to be eligible to enroll in the BigML Certified Engineer courses you must show a certain level of proficiency in Machine Learning, the BigML Dashboard, the BigML API, and WhizzML. The following getting-started assets will help you get up and running in no time: Tutorials, API documentation, and WhizzML.

BigML Certified Engineer

This certification track prepares analysts, scientists, and software developers to become BigML Certified Engineers. Topics covered include:

  • Advanced Data Transformations for Machine Learning (3 hours)
  • Advanced Modeling (3 hours)
  • Advanced API (3 hours)
  • Advanced WhizzML (3 hours)
  • EXAM (3 hours)

BigML Certified Architect

This certification track prepares BigML Certified Engineers to become BigML Certified Architects. Once you've successfully passed the BigML Certified Engineer Exam, you are eligible to enroll in the BigML Certified Architect Courses. Topics covered include:

  • Designing Large-Scale Machine Learning Solutions (3 hours)
  • Measuring the Impact of Machine Learning Solutions (3 hours)
  • Using Machine Learning to Solve Machine Learning Problems (3 hours)
  • Lessons Learned Implementing Machine Learning Solutions (3 hours)
  • EXAM (3 hours)

Be sure to check out the certifications page for more on pricing and to pre-order yours.  As always, let us know if you have any special needs or feedback.

AI Startup Battle in Boston – The AI has spoken!

It is now well established that advances in Artificial Intelligence (AI) technology have opened up new markets and new opportunities. So much so that hearing early-stage investors preach how AI will automate everything is neither a surprise nor a far-fetched idea anymore. Even though investors are keen to admit the disruptive power of AI, they have a harder time admitting the same when it comes to the venture capital industry itself. The idea of automating early-stage investments is slowly gaining ground in the VC community, but a lot of convincing still needs to be done. That said, and knowing the industry's appetite for competition, what better way to make the case than a startup contest where the jury is an AI? That is why we created the AI Startup Battle, our best attempt to show the world that even VCs can be disrupted, one advancement at a time.

The latest edition of the AI Startup Battle took place last Wednesday (October 12, 2016) during PAPIs '16, the 3rd International Conference on Predictive Applications and APIs. Four startups competed on stage at the Microsoft New England Research & Development Center (MIT Campus), and an impartial AI, powered by PreSeries and Telefónica Open Future_, chose the winner without humans influencing the outcome.


Joonko, winner of the AI Startup Battle at PAPIs ’16 – Represented by Ilit Raz, CEO & CoFounder (left) and Guy Grinwald, CTO & CoFounder (right)

After being questioned live on stage by the algorithm, each of the four startups received a score from 0 to 100 representing their long-term likelihood of success. This edition's winner, with a total score of 89.24, is Joonko, which provides data-driven solutions to help companies improve the diversity and inclusion of their workforce. Joonko will be offered an investment of up to $50,000, an incredible place to work, access to mentors, business partners, and a global network of talent, as well as the opportunity to reach millions of customers.

Second place went to Cognii with a close score of 83.84; they are dedicated to improving the quality, affordability and scalability of education with the help of Artificial Intelligence. Third place went to Heartbeat Ai Technologies with a score of 70.73; they aim to design emotionally intelligent technologies and tools to help machines understand people's feelings, needs and motivations. Finally, fourth place went to Palatine Analytics with a score of 70.71; they help companies evaluate the current and future performance of their employees by using Artificial Intelligence and Predictive Analytics.


From left to right: Poul Petersen (CIO at BigML), Lana Novikova (CEO at Heartbeat Ai Technologies), Miguel Suarez (Strategic Advisor at BigML), Guy Grinwald (CTO & CoFounder at Joonko), Ilit Raz (CEO & CoFounder at Joonko), Archil Cheishvili (Founder at Palatine Analytics), and Dharmendra Kanejiya (Founder & CEO at Cognii)

Following the event, Francisco J. Martin, President of PreSeries, and Cofounder and CEO of BigML said “Today was further testament to the increasing level of interest in a quantifiable, data-driven approach to evaluating early stage startups. We have been continuously improving the models that make PreSeries possible as evidenced by the variety of questions ranging from team experience and depth to prior investor interest, as well as intellectual property and current traction. Many traditional investors were skeptical when we started this journey, but we are now witnessing that a growing number of institutional investors are starting to see the merit in PreSeries’ approach. It’s safe to say our ‘crazy idea’ will move onward with an emboldened spirit”.

Stay tuned for updates on the next AI Startup Battle!

AI Startup Battle in Boston – Meet the contenders!

If you are somewhat familiar with the world of startups today, you have probably noticed how startup competitions keep popping up everywhere. From the biggest competitions to the most modest ones, every early-stage venture can now find its way under the spotlight. But despite their growing number, startup contests still mostly rely on the same approach: carefully selected companies pitching in front of carefully selected juries.

By design, a competition's result will reflect its jury's subjectivity, even though decades of research in psychology and behavioral economics show that putting human bias at the center of a selection process might not be the best solution. For lack of better alternatives, humans are believed to be the best and only option. Yet when it comes to predicting the success of a startup, a jury will often provide you with as many opinions as there are members of the jury. In the end, the result is a consensus of opinions based on five-minute presentations and a handful of slides.


Luckily, there is still hope for a more scientific approach that does not take the fun out of the competition! Our solution? The AI Startup Battle.

The AI Startup Battle is a startup contest powered by PreSeries, a joint venture between BigML and Telefónica Open Future_ with the objective of creating the world's first platform to automate early stage investments. The second edition of the Battle will take place on October 12 as part of the PAPIs '16 conference on predictive applications and APIs. Join us at the Microsoft N.E.R.D. center on the MIT campus, where you'll see a real-world, high-stakes AI judging startups live.

The first edition was held in Valencia in March 2016, where PreSeries' impartial algorithm crowned Novelti, a company that uses online machine learning algorithms to turn IoT data streams into actionable intelligence. Novelti will be presenting on stage this week to kick off the contest for this year's participants.

Let’s have a quick look at the contenders:

Cognii: Cognii is developing leading-edge assessment technology to evaluate essay-type answers for online learning platforms. Their exclusive natural language processing technology can also give customized feedback, not just a score, to engage students in an active learning process and improve their knowledge retention. Cognii's solution is offered through an API for all online learning platforms, including LMSs (Learning Management Systems), MOOCs (Massive Open Online Courses), and more.



Joonko: The first data-driven solution for workforce diversity. It integrates into companies’ SaaS platforms and analyzes real actions in real time. The data collected is unbiased – this way, organizations can ensure that all employees get an equal opportunity to succeed in a safe, non-judgmental way. Diversity is a business problem, not just an HR one.


Palatine Analytics: Palatine helps companies evaluate the current and future performance of their employees by using AI and Predictive Analytics. With Palatine, you can collect reliable data points by incentivizing employees through Palatine's real-time, AI-driven feedback system, which captures the accuracy of evaluations, recognizes employees' strengths and weaknesses, and uses predictive analytics to predict their future performance.


Heartbeat Ai Technologies: The mission of Heartbeat Ai is to design emotionally intelligent technologies and tools to help machines understand people’s feelings, needs and motivations, and ultimately improve our emotional wellbeing. How? Language uniquely enables the differentiation of fine-grained emotions. The approach first teaches machines to understand fine-grained emotions from language and context. Then, it builds a broader understanding of human needs, desires and motivations.


Good luck to all participants!

BigML Summer 2016 Release Webinar Video is Here!

Many thanks for the enthusiastic feedback on BigML’s Summer 2016 Release webinar that formally introduced Logistic Regression to the BigML Dashboard. We had a number of inquiries from those that missed the broadcast, so we’re happy to share that you can now watch the entire webinar on the BigML Youtube channel:

As for more study resources, we recommend that you visit the Summer Release page, which contains all related resource links including:

  • The Logistic Regression documentation that goes into detail on both the BigML Dashboard and the BigML API implementations of this supervised learning technique.
  • The series of six blog posts covering everything from the basics to how you can fully automate your Logistic Regression workflows with WhizzML.

As a parting reminder, BigML offers a special education program for those students or lecturers that want to actively spread the word about Logistic Regression and other Machine Learning capabilities in their institutions. We are proud that we currently have more than 80 ambassadors and over 600 universities around the world enjoying our PRO subscription plans for FREE for a full year. Thanks for your hand in making the BigML community great!

Hype or Reality? Stealing Machine Learning Models via Prediction APIs

Wired magazine just published an article with the interesting title How to Steal an AI, where the author explores the topic of reverse engineering Machine Learning algorithms based on a recently published academic paper: Stealing Machine Learning Models via Prediction APIs.

How to Steal an AI

BigML was contacted by the author via email prior to the publication and within 24 hours we responded via a lengthy email that sums up our stance on the topic. Unfortunately, the article incorrectly stated that BigML did not respond. We are in the process of helping the author correct that omission. Update: the Wired article has now been updated and includes a short paragraph that summarizes BigML’s response. In the meanwhile, to set the record straight, we are publishing the highlights of our response below for the benefit of the BigML community as we take any security and privacy related issue very seriously:

WIRED Author:

“I’d really appreciate if anyone at BigML can comment on the security or privacy threat this [ways of “stealing” machine learning models from black-box platforms] might represent to BigML’s machine learning platform, given that it seems certain models can be reverse engineered via a series of inputs and outputs by anyone who accesses them on BigML’s public platform.”


  • Models built using BigML’s platform are only accessible to their owners who already have complete and white-box access to them, so this research does not expose or represent any security or privacy threat to BigML’s platform at all.

  • BigML’s users can access the underlying structure of their own models. This means that they can not only introspect their models using BigML’s visualizations but also fully download their models and use them in their own applications as they wish. BigML does not charge users for making predictions with their own models, so there is no need to reverse-engineer them, as might be the case when you use Azure ML or Amazon ML. Those services charge the owners of the models for making predictions with their own models.

  • BigML allows users to share models with other BigML users either in a white-box mode or in a black-box mode. In the latter case, if a user wanted to monetize her model by charging another user for predictions, the user being charged might try to reproduce the model to avoid continuing to pay for predictions. There is currently no BigML user charging for predictions. Again, this research does not expose or represent any security or privacy threat to BigML’s platform at all.

On Obviousness

  • Anyone versed in Machine Learning can see that many of the results of the publication are obvious. Any machine-learned model that is made available becomes a “data labeling API”, so it can, unsurprisingly, be used to label enough data to reproduce the model to some degree.  These researchers are focused on elaborate attacks that learn decision trees exactly (which does seem interesting academically), but far simpler algorithms will and always have been able to generate a lossy reproduction of a machine-learned model.  In fact, this is the exact trick that Machine Learning itself pulls on human experts: The human provides labeled data and the machine learns a model that replicates (to the degree made possible by the data) the modeling process of the human labeler.  It is therefore utterly unremarkable that this also works if it is a machine providing the labeled data.

  • As an instructive example, imagine you want to reverse-engineer the pricing strategy of an airline. It is unimportant how the airline’s model was created: with a Machine Learning API, an open-source ML package, or a collection of rules provided by experts. If one looks up the prices for enough flights, days, and lead times, one will soon have enough data to replicate the pricing strategy.

On Charging for Predictions:

  • BigML does not charge customers for predictions with their own models.  We think that this research might be relevant for services like Amazon ML or Azure ML, since they are charging users for predictions. Users of those services could try to reproduce the model or simply cache model responses to avoid being charged. Selling predictions is not a long-term money-making proposition unless you keep improving the classifier so that your predictions keep improving too. In other words, this shows how charging for predictions is a poor business strategy, and how BigML’s business model (charging for overall computational capacity to build many models for many different predictive use cases in an organization) is therefore more reasonable.

  • In BigML, this research would only be significant in the scenario where a BigML user publicly offers their trained model for paid predictions but wants to keep it secret.  We do not currently  have any customers exposing black-box models (except the ones created by these researchers).  But if that were the case, a user can guarantee that reconstructing the model will have a prohibitive cost by setting a higher price for each prediction.

On Applicability:

  • Some models are easier to reproduce while others are considerably harder. This research shows that their most elaborate method is only useful for single trees.  When the confidence level of a prediction is provided, the difficulty of the learning problem decreases.  However, when the models are more complex (such as Random Decision Forests) the process to replicate a model is not amenable to many of the techniques described in the paper, so models can only be approximated via the method we describe above.

  • If we wanted to offer a monetized black-box prediction platform in a serious way (and we are sure that we do not), we would encourage users to use complex models rather than individual trees. We can easily detect and throttle the kind of systematic search across the input space that would be required to efficiently reconstruct a complex classifier.

On Machine Learning APIs:

  • One thing is very clear to us, though: Machine Learning APIs help researchers in many areas to start experimenting with machine-learned models in a way that other tools have never allowed. Mind you, this is coming from a team with backgrounds in ML research. In fact, the research these folks carried out would be far more difficult to pursue using old-fashioned Machine Learning tools such as R or SAS, which are tedious and complicated.

Finally, some comments in defense of other Machine Learning services that are potentially subject to this issue.

On Legality: 

  • We assume that to a researcher in security trying to find things on which to publish a paper, everything looks like a “security issue”. Putting things in the same category as data privacy or identity theft issues makes them sound dangerous and urgent. However, the vast majority of the paper describes security issues closer in nature to defeating copy protection in commercial software, or to developing software that functions exactly like an existing commercial product. While this sort of security breach is certainly unfortunate and something to be minimized, it is important to distinguish things that are often dangerous to the public at large from those that, in the vast majority of cases, do not pose as big a threat.

  • Software theft and reverse engineering are not new or unique to Machine Learning as a Service, and society typically relies on the legal system to provide incentives against such behavior. Said another way, even if stealing software were easy, there is still an important disincentive to do so in that it violates intellectual property law. To our knowledge, there has been no major IP litigation to date involving the compromise of machine-learned models, but as machine learning grows in popularity the applicable laws will almost certainly mature and offer some recourse against the exploits that the authors describe.

Logistic Regression versus Decision Trees

The question of which model type to apply to a Machine Learning task can be a daunting one, given the immense number of algorithms available in the literature. It can be difficult to compare the relative merits of two methods, as one can outperform the other for a certain class of problems while consistently coming in behind for another class. In this post, the last in our series about Logistic Regression, we’ll explore the differences between Decision Trees and Logistic Regression for classification problems and try to highlight scenarios where one might be recommended over the other.

Decision Boundaries

Logistic Regression and trees differ in the way that they generate decision boundaries, i.e., the lines that are drawn to separate different classes. To illustrate this difference, let’s look at the results of the two model types on the following 2-class problem:

Decision Trees bisect the space into smaller and smaller regions, whereas Logistic Regression fits a single line to divide the space exactly into two. Of course for higher-dimensional data, these lines would generalize to planes and hyperplanes. A single linear boundary can sometimes be limiting for Logistic Regression. In this example where the two classes are separated by a decidedly non-linear boundary, we see that trees can better capture the division, leading to superior classification performance. However, when classes are not well-separated, trees are susceptible to overfitting the training data, so that Logistic Regression’s simple linear boundary generalizes better.

Lastly, the background color of these plots represents the prediction confidence. Each node of a Decision Tree assigns a constant confidence value to the entire region that it spans, leading to a rather patchwork appearance of confidence values across the entire space. On the other hand, prediction confidence for Logistic Regression can be computed in closed-form for any arbitrary input coordinates, so that we have an infinitely more fine-grained result and can be more confident in our prediction confidence values.


Although the last example was designed to give Logistic Regression a performance advantage, its resulting f-measure did not exactly beat the Decision Tree’s by a huge margin. So what else is there to recommend Logistic Regression? Let’s look at the tree model view in the BigML web interface:


When a tree consists of a large number of nodes, it can require a significant amount of mental effort to comprehend all the splits that lead up to a particular prediction. In contrast, a Logistic Regression model is simply a list of coefficients:


At a glance, we are able to see that an instance’s y-coordinate is just over three times as important as its x-coordinate for determining its class, which is corroborated by the slope of the decision boundary from the previous section. An important caveat concerns scale. If, for example, x and y were given in units of meters and kilometers respectively, we should expect their coefficients to differ by a factor of 1000 in order to represent equal importance in a real-world, physical sense. Because Logistic Regression models are fully described by their coefficients, they are attractive to users who have some familiarity with their data and are interested in knowing the influence of particular input fields on the objective.
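As a quick sanity check of that reading, here is a sketch with illustrative coefficient names (b0 for the intercept, b1 and b2 for the x and y coefficients; the symbols are not taken from the model above). The class probability and the resulting decision boundary are:

    p(class | x, y) = 1 / (1 + exp(-(b0 + b1*x + b2*y)))
    boundary: b0 + b1*x + b2*y = 0   =>   y = -(b0/b2) - (b1/b2)*x

So a y coefficient roughly three times the x coefficient corresponds to a boundary whose slope has magnitude roughly 1/3, which is consistent with the plot in the previous section.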

Source Code

The code for this blog post consists of a WhizzML script to train and evaluate both Decision Tree and Logistic Regression models, plus a Python script which executes the WhizzML and draws the plots. You can view it on GitHub.
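As a rough idea of what that workflow can look like in WhizzML (a sketch, not the actual script from the repository; the train-id and test-id dataset variables and the resource keys passed to the evaluations are assumptions):

    ;; sketch: train a tree and a Logistic Regression on the same training set,
    ;; then evaluate both against the same held-out test set
    (define tree (create-and-wait-model {"dataset" train-id}))
    (define lr (create-and-wait-logisticregression {"dataset" train-id}))
    (define tree-eval (create-and-wait-evaluation {"model" tree "dataset" test-id}))
    (define lr-eval (create-and-wait-evaluation {"logisticregression" lr "dataset" test-id}))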

Learn more about Logistic Regression on our release page. You will find documentation on how to use Logistic Regression with the BigML Dashboard and the BigML API. You can also watch the webinar, see the slideshow, and read the other blog posts of this series about Logistic Regression.

Automating Logistic Regression Workflows


Continuing with our series of posts about Logistic Regression, in this fifth post we will focus on the point of view of a WhizzML user. WhizzML is BigML’s popular domain-specific language for Machine Learning, which provides programmatic support for all the resources you work with in BigML. You can use WhizzML scripts to create a Logistic Regression, or to create a prediction or batch prediction based on a Logistic Regression.

Let’s begin with the easiest one: if you want to create a Logistic Regression with all the default values, you just need to create a script with the following source code:
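A minimal WhizzML sketch of that call might look like this (assuming the dataset ID is available as a script input named ds1, as in the execution example later in this post):

    ;; sketch: create a Logistic Regression from a dataset with default settings
    ;; `ds1` is assumed to be declared as a dataset input of the script
    (create-logisticregression {"dataset" ds1})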


As BigML’s API is asynchronous, the create call will probably return a response before the Logistic Regression is completely built. Thus, if you want to use the Logistic Regression to make predictions, you should wait until the creation process has finished. If you want to stop the code from proceeding until the Logistic Regression is finished, you can use the “create-and-wait-logisticregression” directive.
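A sketch of that blocking variant, using the directive just mentioned:

    ;; sketch: same creation call, but wait until the Logistic Regression is ready
    (create-and-wait-logisticregression {"dataset" ds1})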


To modify the default value of a Logistic Regression property you can simply add it to the properties map as a pair: “<property_name>” <property_value>. For instance, when creating a Logistic Regression from a dataset that contains missing values, BigML’s default behavior is to replace them with the mean. However, if you want to replace them with zero, you should add default_numeric_value and set it to “zero”. The source code would be as follows:
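A sketch of that variation (the property name comes from the BigML API documentation):

    ;; sketch: replace missing numeric values with zero instead of the mean
    (create-logisticregression {"dataset" ds1
                                "default_numeric_value" "zero"})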


You can modify any configuration option in similar fashion. The BigML API documentation contains detailed information about those properties.

What if you have an existing Logistic Regression and you want to get the code needed to recreate it with WhizzML? No problem; programmer or not, BigML has a solution for you. Say you have already tuned a Logistic Regression in BigML and you want to repeat the process on a new source that you just uploaded to the service. You can simply use the scriptify utility, which will generate a script that runs the exact steps needed to reproduce the Logistic Regression. Just navigate to the Logistic Regression you want to replicate and click the “SCRIPTIFY LOGISTIC REGRESSION” link.


If you want to create a prediction from your Logistic Regression with WhizzML, the code is also short and easy. You just need the ID of the Logistic Regression you want to use and the input data of the new instance that you want to predict for. In the input_data collection, the field ID is used as the key. Here’s an example:
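A sketch of such a prediction call (the field IDs and values in input_data are purely illustrative, and the logisticregression resource key is assumed from the API documentation):

    ;; sketch: single prediction with an existing Logistic Regression
    ;; lr-id holds the logisticregression/... ID; field IDs and values are made up
    (create-prediction {"logisticregression" lr-id
                        "input_data" {"000000" 2.5 "000001" "red"}})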


If you need predictions not for a single instance but for a whole set of new instances, you will need to create a batch prediction from your Logistic Regression using WhizzML.
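A sketch of the batch variant, assuming the new instances have already been uploaded and turned into a dataset:

    ;; sketch: batch prediction over a whole dataset of new instances
    (create-batchprediction {"logisticregression" lr-id
                             "dataset" new-instances-id})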


Once your source code is in place, how do you execute your script?

  • Using the BigML Dashboard, look for the script you just created. Opening the script view will reveal the available inputs, and you will be able to select their new values, after which you can start the execution. For instance, the first script in this post looks as follows, and it expects you to select the dataset you want to create the Logistic Regression from.
  • If you want to execute the script through the API, you need to know the ID of the script you previously created. To follow the same example, the dataset you want to create the Logistic Regression from (input “ds1”) should be included in the list of inputs. The corresponding request to the BigML API would be as below:

    curl "$BIGML_AUTH"
           -X POST
           -H 'content-type: application/json'
           -d '{"script": "script/55f007d21f386f5199000003",
                "inputs": [["ds1", "dataset/55f007d21f386f5199000000"]]}'

These Logistic Regressions should execute swiftly, while you reach for your coffee.

If you have any doubts or want to learn more about Logistic Regression, please check out our release page for documentation on how to use Logistic Regression with the BigML Dashboard and the BigML API. You can also watch the webinar, see the slideshow, and read the other blog posts of this series about Logistic Regression.
