Predictive Lead Scoring is a crucial task to maximize the efforts of any sales organization.
There are a few applications on market today like Fliptop, KXEN, or Infer that already allow you to score your sales leads. These offerings have validated both the importance and market appetite for predictive scoring solutions.
However, you may want to have greater flexibility in choosing your CRM system, or perhaps you want to build your own predictive model to do the scoring, or you might want to integrate the scoring process within related services in your organization. In this post, I’m going to show you how to build an application to score your Salesforce.com leads using Talend Open Studio and BigML—three great tools working together to build a flexible predictive solution for a common business problem!
To complement this post I’ve created a complete step-by-step tutorial that will guide you on the implementation for this use case. No matter if you are a developer, a business analyst or a data scientist, this tutorial is made for you :-). At the end of each section of this post, you will find references to the related parts in the tutorial.
The Example
To illustrate the post, I am going to use a fictitious company named AllYouCanBuy that uses Salesforce and wants to prioritize their sales leads automatically. The objective is to provide AllYouCanBuy’s sales team with an automated solution that provides a panel like the one below where leads can be sorted by priority score. Each lead should be automatically labeled with a score so that the top priority leads (green bars on the picture) represent the leads with higher confidence of becoming customers.
To implement a fully automated solution, we need to accomplish the following tasks:
- Automatically generate scores for sales leads. I’ll solve this task with Machine Learning. We’ll use historical data on which leads converted into customers to predict future scores. We need an engine or service that allows us to programmatically build predictive models and generate predictions. Obviously, I am going to use BigML but another machine learning package could be used in a similar fashion.
- Automatically extract historical data from the CRM and return it with new scores. I’ll solve this task using an ETL tool. ETL tools are great helpers on the integration of disparate sources of data and services. They help Extract data from a multitude of sources including Salesforce, Transform them with using a programmable toolkit with many pre-built functions and techniques, and finally Load the results into other system or service. For this example, I am going to use the Talend Open Studio tool, from the Talend Company
The following picture shows a high-level architecture of the predictive lead scoring solution:
Next I’m going to describe each of the high-level components of the architecture with a little more detail, but first let me tell you about more Predictive Lead Scoring.
Predictive Lead Scoring
Predictive Lead Scoring is one of the most active fields today within the set of problems that can be solved using Machine Learning algorithms, and therefore with the use of BigML.
This technique seeks to improve the results obtained by sales teams during the qualification stages of their leads, helping them to predict leads that have a higher probability of success. This will improve sales results by focusing efforts first on the most important leads rather on those with lower chances of success—thereby helping teams organize their time and work. When one considers the cost associated with lead qualification it is very logical to spend time looking for ways to optimize the process.
Imagine a company that buys a database of 5,000 new leads. Without a tool that allows them to set priorities, they would be calling from first to last in the list without having any idea what will happen. Ad hoc intuition would be the guiding light on lead prioritization. What a waste of time, right?
However, with solutions like we are detailing here, you can solve this kind of business challenge quite easily—helping your sales team be more effective through the power of machine learning (as opposed to gut instinct).
Salesforce.com
AllYouCanBuy (our fictitious company) fortunately has been operating for some time and has been storing data about prior leads using Salesforce. Their data includes the following types of custom fields:
- Input fields: fields that hold information about each lead (city, sector, number of interactions, etc). These are the fields that we will use as input data for making prediction. This set of features can be as reach as you want and can use both specific data about the lead as well as data about the interactions kept with the lead.
- Fact field: this is field that the sales team has been using to label if a lead became a new customer or not. This is the label (in supervised learning parlance) that will be used to train and evaluate the predictive models. In a more sophisticated application, this field could be automatically computed.
- Output fields: these are the fields that we will use to store the output of the model, the confidence of the prediction returned by the model, and a priority field (the lead score). We compute the priority field based on the output of the model and its confidence. For example, leads with a ‘true’ prediction and a high confidence will become our top priority and leads with a ‘false’ prediction and high confidence will become our low priority leads. Priority fields will allow for a more user-friendly representation of the predictions.
In the tutorial, you can read more on how to create a Salesforce.com Developer Account, how create custom fields of leads object, how to customize leads objects in Salesforce.
BigML
BigML not only provides a great set of 1-click functions and visualizations for predictive modeling but also a REST API to programmatically run diverse sophisticated predictive modeling workflows.
To simplify the first version of our predictive lead scoring app, I am going to create the model directly in BigML and use it to make predictions. In a second iteration, I’ll automate the model creation too. So basically, we only need to export Salesforce data to a CSV file, upload the file to BigML and let it do the data modeling. I will use BigML’s 1-Click Ensemble to create a very robust predictive model ready to make predictions in a matter of seconds.
I the tutorial you will find more details on how to create an account in BigML and how to create a predictive model in BigML
Talend Open Studio
Talend Open Studio provides an extensible, highly-performant, open source set of tools to access, transform and integrate data from any business system in real time or batch to meet both operational and analytical data integration needs. It has more than 800 connectors, and can help you integrate almost any data source. The broad range of use cases addressed includes: massive scale integration (big data/ NoSQL), ETL for business intelligence and data warehousing, data synchronization, data migration, data sharing, data services, and now predictions!!!
We will use Talend Open Studio to not only perform the data transformations requested but also to communicate with Salesforce and BigML. The transformations will be only simple metadata mapping between the respective output data and input data of both services as in our case with AllYouCanBuy, all the information about our leads actually come from the same place: Salesforce. However, the transformations can be more sophisticated for more complex applications—for in the real world, you may want to get information from other internal and external sources that can help create richer predictive models.
Talend allows you to use a high-level visual component to design complex ETL processes without writing a single line of code! You can see what the Talend ETL process looks like below:
BigML has developed a Talend Component named tBigMLPredict that you can download here and incorporate in your own installation of Talend. This component will help you make predictions with a predictive model you have previously created in BigML.
Once installed, this component will be visible in the Palette of components, inside the Business Intelligence – BigML Components category.
The tBigMLPredict component allows us to set the following configuration parameters:

In the tutorial, you can read more on how to download and install Talend Open Studio, how to download and configure the BigML Talend Components, how to design the integration job in Talend, and how to execute it in Talend Open Studio.
Summary
We have outlined in this blog (and the tutorial + related documents!) how you can orchestrate a flow that automatically scores sales leads in Salesforce using Talend and BigML. It shouldn’t be difficult to create similar flows for other CRM services or using other ETL platforms–please let us know which ETL and CRM tools we should work on next!
I hope this post inspired you to start building your own predictive lead scoring application or more sophisticated predictive flows.