
Spanish Companies are Convinced: Machine Learning is the Present and the Future

As we have been covering in our previous blog posts, 2ML, the premier Machine Learning event in Madrid, took place on May 11. Having experienced a great day filled with presentations from distinguished Machine Learning experts, the 2ML audience of 300 Spanish decision makers is now keenly aware that the time to apply Machine Learning across their diverse set of industries has already arrived. It is increasingly obvious that companies disregarding this trend will gradually lose their competitive edge in the coming years.

Panel with the speakers of 2ML’s morning sessions. From left to right: Sari Heinonen from Barrabés Napapiiri (moderator), Fernando Vidiella from Banco Santander, Poul Petersen from BigML, Ed Fernández from NAISS.IO, Susanna Pirttikangas from the University of Oulu and Luis Martín from Barrabés.

For starters, Alexa (the virtual assistant by Amazon) gave the starting signal of Madrid Machine Learning to the audience gathered at the Fundación Pablo VI by explaining what Machine Learning is. Then came Carlos Barrabés, president of Barrabés.biz, and Francisco J. Martín, Co-Founder and CEO of BigML, who gave the opening remarks, emphasizing the importance of Machine Learning in the current economic climate.

During the morning sessions, Machine Learning experts such as Susanna Pirttikangas (University of Oulu), Ed Fernández (NAISS.IO), Poul Petersen (BigML) and Fernando Vidiella (Banco Santander) focused on the potential of Machine Learning by sharing their own experiences and explaining how to apply this technology according to the specific context and needs of each company. This was followed by BigML’s Poul Petersen, who showcased three live demos with Alexa. This was further testament to the emergence of a new generation of APIs serving as the building blocks for developers to seamlessly construct smart applications encompassing custom Machine Learning models.

The speech by Luis Martín, CEO of Barrabés.biz, strengthened the point that it is impossible to “abstract from the digital environment” today, so companies need to adapt by employing Machine Learning in a way that augments existing human expertise.

In the afternoon, we had four parallel tracks that focused on the application of Machine Learning in select sectors: Finance; Telecommunications and Technology; Marketing, Sales and Sports; and Industry. In the Finance track, Jean-Marc Guillard (Stats4Trade), Auke IJsselstein (ABN AMRO) and Hans Hjersing Dalsgaard (Danske Bank) spoke about investments, Human Resources and credits. In Telecommunications and Technology, Joan Serra (Telefónica), Libor Morkovsky (Avast) and Darren Brown (VMware) discussed the topics of Deep Learning, malware and event analysis. In Marketing, Seamus Abshere (faraday.io), Raúl Peláez (FC Barcelona) and Nick Mote (Vacasa) talked about other sectors that are benefitting from Machine Learning-driven insights, such as football and vacation rentals. In Industry, with Perttu Laurinen (Indalgo), Laura Viñolas (Gestamp) and Dragos D. Margineantu (Boeing) as speakers, key aspects such as scalability were discussed.

For more on this Machine Learning event, please follow 2ML on Facebook and Google+.

The BigML Team and Barrabés are already looking forward to the next edition!

Fact-Based Human Resources at ABN AMRO Bank

Today’s edition of our blog post series, written by the speakers giving talks at the upcoming 2ML event, covers how ABN AMRO applies Machine Learning in its Human Resource Management department. Auke IJsselstein, the HR Analytics Lead at ABN AMRO, provides a summary of his talk for us. To discover the insights from his full speech, we invite you to attend his presentation session at 2ML Madrid Machine Learning on May 11.

Exciting times for the Human Resource Management department at ABN AMRO! This month, the HR Analytics department launched its all-new, improved proposition into the organization. For four years they have been serving the HR organization and Senior Management teams with their products HR Analytics and Strategic Workforce Planning. The HR Analytics Team views the fact that their department has doubled in size this year as a huge compliment for their work and also as a sign that the organization strongly believes in the power of data analytics and working in a fact-based manner. During these four years ABN AMRO has learned a great deal, sometimes the hard way, as they admit. With the new integral approach to Fact-Based Human Resource Management, the HR Analytics Team feels that they can make good use of their lessons learned and, having optimized their products, processes and tooling, are able to focus fully on driving impact on their business goals.

ABN AMRO has been performing predictive analyses within the HR field since 2013. The main focus is to provide management with relevant insights to make better decisions about how to optimize their workforce in order to reach the bank’s goals. In the early days, they did this mainly through classical forms of analysis such as multivariate regression. In 2015, the HR Analytics Team was introduced to the BigML tooling by iNostix (now iNostix by Deloitte). They started using Machine Learning techniques such as Decision Trees, Random Forests and Clustering alongside the more traditional analysis methods. This gives them more flexibility and knowledge to better match analysis methods to the business questions and the characteristics of the data at hand.

The aim is, of course, to be able to predict certain business and HR performance outcomes, but even more important for the HR Analytics Team is to better understand the drivers of these predicted outcomes. Most of their research focuses on explaining the human factors in reaching company goals such as client satisfaction and financial performance. For example, when they find that cultural aspects (e.g. communication, collaboration or leadership style) have a direct impact on customer satisfaction, they are able to give advice on where and how to focus culture and training interventions.

Another trend they are witnessing and exploring is that, in the near future, performing (basic) analyses will no longer be solely reserved for analysts. With the development of analytical tools that combine complex algorithms with user-friendly interfaces and comprehensive visualization possibilities, analysis is slowly becoming accessible to non-expert users. The Human Resource Management department at ABN AMRO is not there yet, but they are already offering on-the-spot analyses while sitting at the table with senior managers. Of course this comes with risks and requires good preparation, but they believe this is the way analysis will evolve, with insights becoming available to larger groups of people.

At ABN AMRO Bank, they could not have achieved their current level of maturity in the field of People Analytics without their partners from iNostix by Deloitte, who helped them build new capabilities and perform analyses, and BigML, who introduced ABN AMRO to the world of Machine Learning by providing the ML platform. Therefore, Auke IJsselstein, Lead HR Analytics at ABN AMRO, is proud to present at the 2ML conference their views on fact-based HR, their proposition to transform the HR organization with hard-learned lessons, and examples of how they were able to bring valuable insights to senior leaders through Machine Learning and data mining.

Want to know more about ML in Finance?

You can visit the dedicated event page and register for it today, as 2ML will take place this week, on May 11 in Madrid, Spain. There will be two more companies in the finance sector showcasing how they apply Machine Learning: Stats4Trade and Danske Bank.

We hope to see you at #2ML17!

Bringing Machine Learning to the Vacation Rental Industry

This blog post is the second of our series of guest posts authored by the speakers to present at 2ML: Madrid Machine Learning in Madrid, Spain, on May 11. The first blog post covered how Stats4Trade applies Machine Learning to help active investment fund managers select stocks and make buy/sell decisions using a new software-driven approach.

This post introduces a different application area of Machine Learning, namely Marketing and Sales. Nick Mote, the Director of Innovation at Vacasa, will be presenting how his company determines the value of accommodations and how it automates such tasks by using Machine Learning. Below is a high-level overview of his upcoming talk. For more on the presenters and topics, please visit the 2ML event site.

Much like a hotel operator, Vacasa provides housekeeping and marketing services to create a hotel-like experience across a diverse portfolio of more than 5,000 vacation rentals. However, unlike the hotel industry, Vacasa faces many challenges resulting from managing individual properties that do not operate under the same roof.

For instance, pricing vacation rentals in the same market is much harder than pricing a hotel room, since Vacasa’s homes are geographically spread throughout the market. The old adage “location, location, location” is absolutely true when determining the value of accommodations. Neighborhood and proximity to attractions have a huge impact on pricing within a given market, and qualities such as direct beach access add much greater value compared to a similar home three blocks from the same beach.

A hotel also has the advantage of pricing mostly identical rooms, whereas Vacasa faces the challenging task of determining the value of many completely unique properties with non-obvious comparables. Two houses in the same neighborhood can have entirely different amenities and quality, which prevents easily grouping and learning from even similar units. For example, how does a six-bedroom house on the outskirts of town compare to a one-bedroom cottage on the beach? Finding a way to automate the consideration of these factors when setting prices is no easy task, but a critical one for vacation rental managers to tackle at large scale.

At Vacasa, the analytics team knew that they could use Machine Learning to find high-value correlations that can automatically be adapted to specific markets to predict the appropriate price for any given day of the year. In other words, the Machine Learning algorithms pick the relevant features that matter most in driving the optimal prices. Armed with this understanding, Vacasa launched the second version of their yield management algorithm, code-named Alan (after computer scientist Alan Turing), to automate the pricing of their entire portfolio of properties based on past and current market conditions.

However, Vacasa’s use of Machine Learning technology doesn’t stop at pricing. Machine Learning techniques are also used to predict the time it takes to drive to and clean different-sized homes, as well as to predict fraudulent transactions before they occur.

Want to know more about ML in Marketing and Sales?

Don’t miss the opportunity to learn more about how Vacasa is applying Machine Learning to the vacation rental industry at the upcoming 2ML event on May 11 in Madrid, Spain. If you are interested in these business sectors, you may also attend Faraday.io’s presentation. Stay tuned for future blog posts.

If you haven’t yet done so, there’s still time to register for #2ML17. We hope to see you there!

Beating Benchmark Indices with Machine Learning

In less than two weeks, on May 11, many business and technical decision makers will gather at 2ML: Madrid Machine Learning in Madrid, Spain, to discover the impact that ML is going to have on their business. This ML event will be divided into two parts: keynotes before lunch and four vertical-oriented ML sessions after the lunch break, covering Finance; Telecom and Technology; Marketing, Sales and Sports; and Industry.

This guest post, written by Michael Allan Stafne, Co-Founder and Vice-President of Stats4Trade, and Jean-Marc Guillard, CEO at Stats4Trade, reveals some highlights of their presentation. If you are curious to hear the full story, join 2ML to attend the event and discover all the details of how S4T applies ML to help active investment fund managers select stocks and make buy/sell decisions using a new software-driven approach.

Imagine that you are a typical private investor. You wish to invest some of your money in stocks. However, you are risk-averse. Therefore, you want to minimize volatility and any potential losses while maximizing gains. What options are available to you?

Well, you could simply adopt a buy-and-hold approach and invest your money in a low-cost passive fund that tracks a broad index. Such funds are cheap because they simply mimic indices’ portfolios. Or you could invest your money in a traditional active fund that strives to beat a benchmark index. The managers of such funds use a combination of experience and research in an attempt to select winning stocks. Finally, you could adopt a “do-it-yourself” approach and trade stocks with your own expertise and research.

Nonetheless, like any investment approach, each of these options comes with trade-offs. For example, the passive approach with index funds assumes ever-increasing markets over time; however, this is not always a viable assumption in Japanese and some European markets. On the other hand, the two active approaches rely on humans. Unfortunately, humans (even seasoned fund managers) are frequently emotional, subjective and hence irrational, which often leads to poor investment decisions. For proof, just look at the rather dismal performance of actively-managed funds versus benchmarks. Moreover, active funds charge a wide variety of annual and one-time fees that are oftentimes quite exorbitant.

But what if there was another option that combines the diversification and low cost of passive funds with the index-beating performance of active funds, all the while allowing private investors the freedom to design and simulate their own data-driven investment strategies and ultimately trade on their own? Thanks to modern Machine Learning tools and low-cost cloud-computing platforms, such an approach now exists.

The advent of modern Machine Learning technologies and cloud-based platforms now allows the development of data-driven software applications that automatically generate buy/sell signals for a wide range of equity markets. The applications allow individual investors the opportunity to design and simulate various investment scenarios and ultimately choose an optimal strategy that fits their risk-return needs. Investors can then choose to actively trade with no-fee brokerages.

The key to any Machine Learning approach is to statistically increase the odds of making a winning stock trade from around chance (50%) to a higher level (say, at least 60%). Of course, statistics implies uncertainty and some trades will indeed incur losses. Nonetheless, in the long term, with enough trades to become statistically relevant (at least thirty), this increase in odds yields better performance than benchmark indices with respect to both return and risk.
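
To make the arithmetic concrete, here is a small illustrative simulation; the win rates, the number of trades and the per-trade gain/loss figures are assumptions for the example, not Stats4Trade’s actual numbers:

```python
import random

random.seed(7)

def cumulative_return(win_rate, n_trades=100, gain=0.02, loss=0.02):
    """Grow a unit of capital over n_trades, each winning `gain` or losing `loss`."""
    capital = 1.0
    for _ in range(n_trades):
        capital *= (1 + gain) if random.random() < win_rate else (1 - loss)
    return capital

print("50% win rate:", cumulative_return(0.50))  # hovers around break-even
print("60% win rate:", cumulative_return(0.60))  # tends to end clearly above 1.0
```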

None of this is magic however. The process of developing trading applications that integrate machine-learning technologies requires much time in both acquiring and preparing input data and then building, training and testing data models that balance accuracy and consistency over different time-spans both short and long.

The general process starts with data acquisition in the form of historical price-data for stocks in a particular market like the Dow Jones 30 in the United States or the CAC 40 in France. Today price-data for many markets over multiple decades is widely available electronically at low cost and updated daily.

These price data then undergo a time-consuming transformation into time-implicit datasets that convert the raw price data into forms more conducive to multi-frequency statistical analyses. In fact, the transformations are analogous to the Fourier transforms used in digital signal analysis. During this step a balance must be struck between precision and consistency in the transformed data to prevent overfitting.
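
As a hedged illustration only (the actual Stats4Trade transformations are not spelled out here), one simple way to make a price series “time-implicit” is to replace dated prices with returns computed over several look-back windows, so each row mixes short- and longer-horizon behavior:

```python
prices = [100.0, 101.5, 100.8, 102.3, 103.1, 102.7, 104.0, 105.2]  # toy price history

def lagged_return(series, i, lag):
    """Relative price change over the previous `lag` observations."""
    return (series[i] - series[i - lag]) / series[i - lag]

rows = []
for i in range(4, len(prices)):
    rows.append({
        "return_1": lagged_return(prices, i, 1),   # short-horizon move
        "return_2": lagged_return(prices, i, 2),   # medium-horizon move
        "return_4": lagged_return(prices, i, 4),   # longer-horizon move
    })
print(rows[0])
```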

Next, statistical models are created using tools from recognized providers such as BigML. These models map the underlying and often obscure statistical relationships between the transformed price-data – called “training data” – and user-defined outputs such as stock-price movements over different forecast time-periods. In effect the models are trained to recognize the statistical relationships between prices at one period in time and price movements later in time.

Lastly, the models are tested over varying time-spans using “test data” in the form of price data that was not used to train the models. Overall, the goal during testing is to refine the models with respect to precision and consistency over any time period and verify their validity. Only by conducting robust testing can the models be optimized for the two metrics that most interest investors: return and risk (i.e., volatility and its statistical counterpart, variance).

The foregoing description for creating trading applications that embed Machine Learning technologies admittedly skims over several interesting technical details. However the ultimate objective for investors remains the same – namely increase the odds of making winning trades to at least 60%. Over time and with enough trades the relative frequency of gains to losses begins to increase. In doing so we are able to gradually and steadily outperform benchmark indices with higher returns and less risk as measured by variance.

Find out more details by watching this Stats4Trade – BigML Case Study:

Want to know more about ML in Finance?

Join us at #2ML17 on May 11 in Madrid, Spain, and discover the impact of Machine Learning in the financial sector presented by Stats4Trade. Other companies presenting their use cases will be ABN AMRO and Danske Bank, which we will also cover in upcoming blog posts.

If you have any questions about this use case and/or the Machine Learning behind it, please do not hesitate to contact BigML at info@bigml.com or STATS4TRADE directly. We’re looking forward to hearing from you!

2ML: Discover the Applications of Machine Learning for your Business

Machine Learning is fast impacting all business sectors. In fact, the right application of Machine Learning can significantly transform any business data into actionable insights, which helps enterprises grow their businesses as decision makers can more consistently make the right decisions at the right time.

The continuous evolution of predictive applications allows companies to foresee what’s about to happen next. Yet this is only half the challenge, as businesses also need to decide what to do with the distilled insights. In some cases, highly skilled humans are replaced by machines that can perform certain complicated tasks better and more efficiently than humans. Other times, the application of Machine Learning in combination with humans can lead to the best outcomes.

To raise awareness about the optimal ways to incorporate Machine Learning in your business processes, the innovative consultancy Barrabés and BigML are co-organizing 2ML, the Machine Learning event for decision makers, technology professionals, and other industry practitioners who are interested in boosting their work productivity by applying Machine Learning techniques.

2ML will bring together 400 attendees who will gather to hear from some of the brightest minds in the Machine Learning field, such as Professor Thomas Dietterich or Susanna Pirttikangas from the University of Oulu, among others.

2ML Agenda

The conference will start with a global view of the origin, present and future of Machine Learning in the corporate environment, explaining how and why Machine Learning will have an impact on your business. In the afternoon, after the lunch break, we will continue with four parallel sessions on the application of Machine Learning in various industries: Finance; Telecom and Technology; Marketing, Sales and Sports; and Industry. Each vertical will have two different presentations and a panel, where the speakers will discuss the impact of Machine Learning in their sector and answer questions from the audience.

For more information on the talks, speakers and panels from the morning sessions and the four verticals, please visit the dedicated event page.

Want to know more about #2ML17?

Discover the impact that Machine Learning is going to have on your business and find out how you can take advantage of it to take your business to the next level. Join us at #2ML17 on May 11 in Madrid, Spain. Please note that purchasing your ticket before April 24 will get you a 30% discount!

PAPIs Connect 2017 – Call for Proposals

PAPIs Connect, Latin America’s first conference on real-world Machine Learning applications, goes to São Paulo, Brazil, on June 21-22, 2017. We are now calling for proposals to select the best talks, ideas and applications to be shown at the Telefonica Auditorium.

PAPIs Connect is a series of more localized events that run in between the annual editions of PAPIs, the International Conference on Predictive Applications and APIs. PAPIs Connect is followed by decision makers and developers who are interested in building real-world intelligent applications and want to find out about the latest technology.

Are you passionate about technology and predictive applications? Would you like the world to know of your contribution to the practice of Artificial Intelligence? Then, this is your place to be!

We are always excited to see practical presentations about:

  • Innovative Machine Learning use cases
  • Challenges and lessons learnt in integrating Machine Learning into various applications / processes / businesses and new areas; this can include technical and domain-specific challenges, as well as those related to fairness, accountability, transparency, privacy, etc.
  • Techniques, architectures, infrastructures, pipelines, frameworks, API design to create better predictive / intelligent applications (from embedded to web-scale)
  • Tools to democratize Machine Learning and make it easier to build into products
  • Needs, trends, opportunities in this space
  • Tutorials that teach a specific and valuable skill

If you think you would be a good candidate, please submit your proposal before April 23, 12:00 AM (São Paulo BRT / GMT-3) and share your story with a great audience that will appreciate innovation as much as you do.

AI Startup Battle

The attendees at PAPIs Connect will also enjoy the 5th edition of our Artificial Intelligence Startup Battle, where there will not be any human intervention. The jury is an algorithm that will predict the probability of success of the early-stage startups competing on stage. To learn more about the format of the battle, please check our previous AI Startup Battles: the world premiere took place in Valencia, Spain, in March 2016 at PAPIs Connect; the second in Boston, US, in October 2016 at PAPIs; the third in São Paulo, Brazil, in December 2016; and the fourth in Barcelona, Spain, at 4YFN.

These competitions have been powered by PreSeries, the automated platform to discover, evaluate, and monitor early-stage investments, a joint venture between BigML and Telefónica Open Future_. We will announce all the details for the fifth edition soon. Stay tuned!

Previous PAPIs and PAPIs Connect Conferences

The PAPIs conference series started in November 2014 in Barcelona. Since then, PAPIs and PAPIs Connect have traveled around the world, providing interesting talks to a distinguished audience in Boston, Sydney, Barcelona, Paris and Valencia. The next stops will be São Paulo in June 2017 and Boston in October 2017. Please visit the PAPIs website for further details.

SAIC Motor takes a strategic stake in BigML

Today, we are happy to share that BigML has secured a strategic investment from SAIC Capital, the corporate venture arm of SAIC Motor Corporation Limited (SAIC Motor), the $110B company that leads the automotive design and manufacturing industry in China. As part of the investment, Tao Wang, Director of Investment, is joining BigML’s board of directors.

SAIC & BigML

This is an important milestone in BigML’s journey, which started back in 2011 in Corvallis, Oregon, the home of Oregon State University (OSU). Since our inception, we have been making Machine Learning easy and beautiful for everyone by steadily advancing our platform. BigML now reaches over 40,000 analysts, developers and scientists in 120 countries around the world who are discovering the hidden insights in their data and building intelligent applications.

SAIC Motor’s investment in BigML further proves that leading global enterprises see Machine Learning as a key enabler of their future competitive performance. The future belongs to the predictive businesses with operations that are increasingly run by automated processes that are powered by Machine Learning. It is now clearer than ever that this is not a matter of choice, but an imperative across a broad swath of industries.

In 2016, SAIC Motor sold 6.4 million vehicles, continuing to lead the Chinese market. SAIC Motor is a Fortune Global 500 company, having risen to 46th place in the 2015 ranking; it marked the 12th time that the company had made it onto the list of the world’s largest companies. Today, SAIC Motor has also set its sights on an automotive future that is evolving through electrification, autonomous driving, intelligent human interfaces, and big data analytics capabilities increasingly defined by Machine Learning.

About SAIC Motor

SAIC Motor Corporation Limited (SAIC Motor) is the largest auto company on China’s A-share market (Stock Code: 600104). SAIC Motor’s business covers the research, production and sales of both passenger cars and commercial vehicles. It also covers components, including engines, gearboxes, powertrains, chassis, interior and exterior parts and miscellaneous electronic components, as well as logistics, vehicle telematics, second-hand vehicle transactions and auto finance services. Major vehicle companies under the SAIC Motor umbrella include SAIC GM, SAIC VW, SAIC Motor Passenger Vehicle Company, SAIC Motor Commercial Vehicle Company, SAIC-GM Wuling Automobile Co and others. SAIC Motor’s North America business includes a division based in Michigan providing logistics and supply chain services, an investment division in Menlo Park, CA, and an advanced technology research and development center based in San Jose, CA.

BigML Winter 2017 Release Webinar Video is Here!

As announced in our latest blog posts, Boosted Trees is the new supervised learning technique that BigML offers to help you solve your classification and regression problems. It is now up and running as part of our set of ensemble-based strategies available through the BigML Dashboard and our REST API.

If you missed the webinar broadcast yesterday, here is another chance to catch our latest addition. In fact, you can play it anytime you wish, since it’s available on the BigML YouTube channel.

Please visit our dedicated Winter 2017 Release page for more learning resources, including:

  • The Boosted Trees documentation to learn how to create, interpret and make predictions with this algorithm, from both the BigML Dashboard and the BigML API.
  • The series of six blog posts that guide you through the Boosted Trees journey step by step: starting with the basic concepts of this algorithm and the differences from the other ensembles we offer; continuing with a use case and several examples of how to use Boosted Trees through the Dashboard and the API, and how to automate the workflows with WhizzML and the Python bindings; and finally wrapping up with the more technical side of how Boosted Trees work behind the scenes.

Many thanks for your attention, your questions, and the positive feedback after the webinar. We cannot wait to announce the next release!

The Down Low on Boosting

If you’ve been following our blog posts recently, you know that we’re about to release another variety of ensemble learner, Boosted Trees. Specifically, we’ve implemented a variation called gradient boosted trees.

Let’s quickly review our existing ensemble methods. Decision Forests take a dataset with either a categorical or numeric objective field and build multiple independent tree models using samples of the dataset (and/or its fields). At prediction time, each model gets to vote on the outcome. The hope is that the mistakes of each tree will be independent of one another, so that, in aggregate, their predictions converge on the correct answer. In ML parlance, this is a way to reduce the variance.

[Figure: Decision Forest diagram]
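
A toy sketch of that intuition (the numbers are made up for illustration): if each tree is right 65% of the time and the trees err independently, a majority vote over many trees is right far more often than any single tree.

```python
import random

random.seed(42)

SINGLE_TREE_ACCURACY = 0.65
N_TREES = 51
N_EXAMPLES = 10000

correct_votes = 0
for _ in range(N_EXAMPLES):
    # Each tree independently classifies the example correctly with probability 0.65
    right_trees = sum(random.random() < SINGLE_TREE_ACCURACY for _ in range(N_TREES))
    if right_trees > N_TREES // 2:   # the majority of trees got it right
        correct_votes += 1

print(correct_votes / N_EXAMPLES)    # well above 0.65 when the errors are independent
```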

With Boosted Trees the process is significantly different. The trees are built serially, and each tree tries to correct the mistakes of the previous ones. When we make a prediction for a regression problem, the outputs of the individual Boosted Trees are summed to find the final prediction. For classification, we sum up pseudo-probabilities for each class and run those results through the softmax function to create the final class probabilities.

[Figure: Boosted Trees diagram]
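
A rough sketch of that combination step for classification; the class names and per-tree scores below are made up, and this illustrates the idea rather than BigML’s exact internals:

```python
import math

def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical per-class scores from three boosting iterations
tree_outputs = [
    {"churn": 0.8, "stay": -0.2},
    {"churn": 0.3, "stay": 0.1},
    {"churn": -0.1, "stay": 0.4},
]
classes = ["churn", "stay"]
summed = [sum(tree[c] for tree in tree_outputs) for c in classes]
print(dict(zip(classes, softmax(summed))))  # e.g. {'churn': 0.67, 'stay': 0.33}
```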

Each iteration makes our boosted meta-model more complex. That additional complexity can really pay off for datasets with nuanced interactions between the input fields and the objective. It’s more powerful, but with that power comes the danger of overfitting, as boosting can be quite sensitive to noise in the data.

Many of the parameters for boosting are tools for balancing the power versus the risk of overfitting. Sampling at each iteration (BigML’s ‘Ensemble Sample’), the learning rate, and the early holdout parameters are all tools to help find that balance. That’s why boosting has a lot of parameters and the need to tune them is one of the downsides of the technique. Luckily, we have a solution on the way. We’ll be connecting Boosted Trees to our Bayesian parameter optimization library (a variant of SMAC), and then we’ll describe how to automatically pick boosting parameters in a future blog post.

Another downside to Boosted Trees is that they’re a black box. It’s pretty easy to inspect a decision tree in one of our classic ensembles and understand how it splits up the training data. With boosting, each tree fits a residual of the previous trees, making the trees nearly impossible to interpret individually in a meaningful way. However, just like with our other tree methods, you can get a feel for what the Boosted Trees are doing by inspecting the field importance measurements. As part of BigML’s prediction service, not only do we build global field importance measures, we also report which fields were most important on a per-prediction basis.

[Figure: per-prediction field importances]

On the advantageous side, BigML’s Boosted Trees support the missing data strategies available with our other tree techniques. If you have data that contains missing values and if those have inherent meaning (e.g. someone decided to leave ‘age’ unanswered in a personals ad), then you may explicitly model the missing values regardless of the field’s type (numeric, categorical, etc.). But if missing values don’t have any meaning, and just mean ‘unknown’, you can use our proportional prediction technique to ignore the impact of the missing fields. This technique is what we use when building our Partial Dependence Plots (or PDPs), which evaluate the Boosted Trees right in your browser to help visualize the impact of the various input fields on your predictions.

[Figure: Partial Dependence Plot]

We think our Boosted Trees are already a strong addition to the BigML toolkit, but we’ll continue expanding the service to make it even more interpretable via fancier PDPs, easy to use with parameter optimization, and more powerful thanks to customized objective functions.

Want to know more about Boosted Trees?

We recommend that you visit the dedicated release page for more documentation on how to create Boosted Trees, interpret them, and predict with them through the BigML Dashboard and the API; as well as the six blog posts of this series, the slides of the webinar, and the webinar video.

Boosted Trees with WhizzML and Python Bindings


In this fifth post about Boosted Trees, we want to adopt the point of view of a user who feels comfortable using some programming language. If you follow this blog, you probably know about WhizzML or our bindings, which allow for programmatic usage of all the BigML platform resources.


In order to easily automate the use of BigML’s Machine Learning resources, we maintain a set of bindings which allow users to work with the platform in their favorite language. Currently, there are 9 bindings for popular languages like Java, C#, Objective-C, PHP or Swift. In addition, last year we released WhizzML to help developers create sophisticated Machine Learning workflows and execute them entirely in the cloud, thus avoiding network problems, memory issues or lack of computing capacity, while taking full advantage of WhizzML’s built-in parallelization. In the past, we wrote about using WhizzML to perform Gradient Boosting, and now we are making it even easier with our Winter 2017 release.

In this post, we will show how to use Boosted Trees through both the bindings and WhizzML. In the bindings examples we will use our popular Python binding, but the operations described here are available in all the bindings. Let’s wrap up the preambles and see how to create Boosted Trees without specifying any particular option, just with all default settings. We need to start from an existing Dataset to create any kind of model in BigML, so our call to the API will need to include the ID of the dataset we want to use. In addition, we’ll need to provide the boosting-related parameters. For now, let’s just use the default ones. This is achieved by setting the boosting attribute to an empty map in JSON. We would do that in WhizzML as below,

[WhizzML snippet: creating a Boosted Trees ensemble with the default boosting settings]
where ds1 should be a dataset ID. This ID should be provided as input to execute the script.

This is the same way you would create a decision tree ensemble, with the difference being the addition of the “boosting” parameter.

In the Python bindings, the equivalent code is:

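A minimal sketch of what that call looks like, assuming the standard BigML Python bindings and an illustrative dataset ID:

```python
from bigml.api import BigML

api = BigML()  # reads BIGML_USERNAME and BIGML_API_KEY from the environment
dataset = "dataset/58f0dd2eb95b3972a4000000"  # illustrative dataset ID

# An empty "boosting" map requests Boosted Trees with all default settings
boosted_ensemble = api.create_ensemble(dataset, {"boosting": {}})
api.ok(boosted_ensemble)  # wait until the ensemble has finished building
```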

Let’s now see how to customize the options of Boosted Trees. For a list of all the properties that BigML offers to customize the gradient boosting algorithm, please visit the ensembles page in the API documentation section. In a WhizzML script, the code should include the settings we want to use in a map format. For instance, if we want to adjust all available properties, the code should be:

[WhizzML snippet: creating Boosted Trees with customized boosting settings]

The equivalent code in the Python bindings would read:

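A hedged sketch of customized settings; the parameter names below follow the BigML ensembles API, but please check the API documentation for the authoritative list and value ranges:

```python
boosting_args = {
    "boosting": {
        "iterations": 100,        # maximum number of boosting iterations
        "learning_rate": 0.1,     # shrinkage applied to each tree's contribution
        "early_holdout": 0.2,     # fraction of data held out to stop boosting early
    },
    "ensemble_sample": {"rate": 0.8, "replacement": True},  # per-iteration sampling
}
boosted_ensemble = api.create_ensemble(dataset, boosting_args)
api.ok(boosted_ensemble)
```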

Creation arguments in the Python bindings are structured as a dictionary, consistent with the natural dictionary representation of JSON objects in the language.

When we talked about creating Boosted Trees, we explained some of the applicable parameters that can help you improve your results with proper tuning. It’s very easy to evaluate your Boosted Trees either through WhizzML or the Python bindings: you just need to set the ensemble to evaluate and the test dataset to be used for the evaluation.

[WhizzML snippet: evaluating the Boosted Trees ensemble against a test dataset]

Similarly, we can use the Python syntax as follows:

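In the Python bindings this boils down to a single call; the sketch below assumes a held-out dataset with an illustrative ID:

```python
test_dataset = "dataset/58f0dd2eb95b3972a4000001"  # illustrative test dataset ID
evaluation = api.create_evaluation(boosted_ensemble, test_dataset)
api.ok(evaluation)

# The evaluation resource stores the ensemble's metrics alongside baseline results
print(evaluation["object"]["result"]["model"])
```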

Next up, let’s see how to obtain single predictions from our Boosted Trees once we are past the evaluation stage. For this, we need the ensemble ID and some input data, which should be provided via the “input_data” parameter. Here’s an example:

[WhizzML snippet: a single prediction from the Boosted Trees ensemble]

The equivalent code in the Python bindings would be:

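A minimal sketch of the remote prediction call (the input field names are hypothetical):

```python
input_data = {"plan": "premium", "tenure_months": 14}  # hypothetical input fields
prediction = api.create_prediction(boosted_ensemble, input_data)
api.ok(prediction)
print(prediction["object"]["output"])  # the predicted class or numeric value
```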

In addition to this prediction, calculated and stored on BigML’s servers, the Python bindings also allow you to instantly create single local predictions on your computer. The ensemble information is downloaded to your computer the first time it is used, and since predictions are then computed on your machine, there are no additional costs or latency involved. Here is the straightforward code snippet for that:

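A sketch of the local flavor, using the Ensemble class from the bindings (input fields are again hypothetical):

```python
from bigml.ensemble import Ensemble

# The first call downloads and caches the ensemble JSON; later predictions are local
local_ensemble = Ensemble(boosted_ensemble, api=api)
print(local_ensemble.predict({"plan": "premium", "tenure_months": 14}))
```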

You can create batches of local predictions by using the predict method in a loop. Alternatively, you can upload to BigML the new dataset you want to predict for. In this case, results will be stored in the platform when the batch prediction process finishes. Let’s see how to do this latter option, first in Python:

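A minimal sketch with an illustrative dataset ID for the new data to score:

```python
new_dataset = "dataset/58f0dd2eb95b3972a4000002"  # illustrative dataset to score
batch_prediction = api.create_batch_prediction(boosted_ensemble, new_dataset,
                                               {"all_fields": True})
api.ok(batch_prediction)
api.download_batch_prediction(batch_prediction, filename="predictions.csv")
```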

The equivalent code to complete this batch prediction by using WhizzML can be seen below:

[WhizzML snippet: creating a batch prediction for the Boosted Trees ensemble]

A batch prediction comes with configuration options related to the input format, such as fields_map, which can be used to map the dataset fields to the ensemble fields, especially if they are not identical. Other options affect the output format, like header or separator. You can provide any of these arguments at creation time following the syntax described in the API documents. We recommend that our readers check out all the batch prediction options in the corresponding API documentation section.
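
As a hedged example of such arguments in the Python bindings (the field IDs are hypothetical, and the exact semantics of each option are described in the API documents):

```python
batch_args = {
    "fields_map": {"000001": "000003"},  # relate ensemble fields to dataset fields
    "header": True,                      # include a header row in the output file
    "separator": ";",                    # use semicolons in the output CSV
}
batch_prediction = api.create_batch_prediction(boosted_ensemble, new_dataset, batch_args)
api.ok(batch_prediction)
```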

We hope this post has further encouraged you to start using WhizzML or some of our bindings to more effectively analyze and take action with your data in BigML. We are always open to community contributions to our existing bindings or to any new ones that you think we may not yet support.

Don’t miss our next post if you would like to find out what’s happening behind the scenes of BigML’s Boosted Trees.

To learn more about Boosted Trees, or to send us your questions about WhizzML or the bindings, please visit the dedicated release page for more documentation on how to create Boosted Trees, interpret them, and predict with them through the BigML Dashboard and the API; as well as the six blog posts of this series, the slides of the webinar, and the webinar video.
