BigML’s upcoming release on Wednesday, January 31, 2018, will present two new features: operating thresholds for classification models, and organizations. This post gives a quick introduction to operating thresholds and opens a series of 6 blog posts that offer a detailed perspective on the operating thresholds part of the release. Today’s post explains the basic concepts, and the next one will walk through an example use case. Three more posts will then focus on how to use operating thresholds through the BigML Dashboard, the API, and WhizzML and the Python Bindings for automation. The sixth and final post of the series will give a technical view of how operating thresholds for classification models work behind the scenes. After that, we will conclude with an extra post on how your company can collectively benefit from our new organizations feature.
Understanding operating thresholds
Say you are building a classification model to predict loan risk, so you can decide which loan applications should be approved and which ones should be denied because they represent a higher default risk than your company would be willing to underwrite. As usual, after many iterations and some clever feature engineering, your best model evaluation yields an F1-score of 0.85 (85%). Is this good or bad? Should you present your results to your management without hesitation? Of course, you can keep iterating forever, but this is real life and there are deadlines!
Well, in the case of most financial security portfolios, even a few percentage points of defaults can turn an otherwise profitable outfit into a red ink generator. In this instance, chances are your management will lose some sleep over the use of Machine Learning in deciding the fate of their company. Is it better, then, to scrap the whole idea of being more data-driven with advanced analytical tools and go back to good old rule-based loan approvals?
Not so fast! Luckily, operating thresholds can come to the rescue in a situation like this by letting you fine-tune your classification model to match your company’s risk appetite and the implied cost structure. In this particular example, assuming the “positive class” is bad loans (Approve? = No), your company has a much greater risk exposure in missing bad loans by incorrectly predicting that they should be approved (i.e., False Negatives) than it has in rejecting what would otherwise be good loans (i.e., False Positives).
In the former case, you could lose hundreds of thousands of dollars (or even millions) with a single bad decision, perhaps wiping away your entire loan portfolio profits. In the latter, you may be leaving some money on the table by turning away a good loan (i.e., opportunity cost), but that will probably be measured in thousands (or tens of thousands) of dollars depending on the loan amount. So there is at least an order of magnitude of difference between the two scenarios.
The trick then is to adjust the tradeoff between False Positives and False Negatives, lessening the chances of the costlier False Negatives occurring, by adjusting your model’s operating threshold. This technique is especially useful for imbalanced datasets, where one or a few classes account for most of the instances. As in our loan risk example, unadjusted classification models tend to predict the majority classes at the expense of the minority (positive) class, which is usually the class of interest. Keep in mind that the usefulness of thresholds is mainly about false positive and false negative costs being imbalanced, regardless of the class distribution. It so happens that imbalanced datasets in the real world tend to have asymmetric costs. But even if you have a balanced dataset, chances are still very high that you’ll need some kind of threshold, because the costs are rarely totally symmetric.
BigML lets you adjust thresholds easily with a simple slider in the evaluation view of models, ensembles, deepnets or logistic regressions. The operating threshold value that minimizes your risk and costs can change from one model to another, so it’s best to judge the appropriate values on a case-by-case basis. However, if you have a pretty good idea of the costs involved with each of the classes, the search for the optimal threshold can be automated with WhizzML. More on this option will be covered in the remainder of our blog series.
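To make the idea of a cost-driven search concrete, here is a minimal sketch in plain Python. It is not BigML’s or WhizzML’s implementation: the function names, the cost figures and the six scored loans are all made up for illustration. It assumes you already have, for each evaluated loan, the predicted probability of the positive class (bad loan) and the actual outcome, and it simply tries a grid of thresholds and keeps the cheapest one.

```python
def total_cost(probs, is_positive, threshold, fn_cost, fp_cost):
    """Sum misclassification costs when the positive class is predicted
    only if its probability is strictly greater than the threshold."""
    cost = 0.0
    for p, actual in zip(probs, is_positive):
        predicted = p > threshold
        if actual and not predicted:
            cost += fn_cost  # bad loan approved (False Negative)
        elif predicted and not actual:
            cost += fp_cost  # good loan denied (False Positive)
    return cost


def best_threshold(probs, is_positive, fn_cost, fp_cost, steps=100):
    """Grid search over thresholds 0.00, 0.01, ..., 1.00.
    Ties are resolved in favor of the lowest threshold."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda t: total_cost(probs, is_positive, t, fn_cost, fp_cost),
    )


# Hypothetical evaluation data: P(bad loan) and whether the loan was bad.
probs = [0.90, 0.60, 0.35, 0.20, 0.10, 0.45]
is_bad = [True, True, True, False, False, False]

# Hypothetical costs: a missed bad loan hurts far more than a denied good one.
FN_COST = 100_000
FP_COST = 5_000

threshold = best_threshold(probs, is_bad, FN_COST, FP_COST)
print(threshold, total_cost(probs, is_bad, threshold, FN_COST, FP_COST))
print(0.5, total_cost(probs, is_bad, 0.5, FN_COST, FP_COST))
```

With these made-up numbers, the search settles on a threshold well below 50%, accepting one denied good loan in exchange for catching every bad one. In practice the costs would come from your own business case, and the search should run on held-out evaluation data rather than on the training set.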
Pick from three types of thresholds
In BigML, operating thresholds are applicable when evaluating or predicting with your models. Classification models in BigML always return a confidence and/or a probability for each prediction, i.e., a percentage between 0% and 100% that measures the certainty of the prediction. Deepnets also return a probability measure. When you evaluate or predict with a probability or confidence threshold set for a selected (positive) class, the model predicts the positive class only if its probability or confidence is greater than the threshold; otherwise, it predicts the negative class.
In the simple example above, without setting any threshold, just by looking at the probabilities for each predicted class, Applicants #1 and #3 would be granted a loan. However, if we select “Approve=NO” as the positive class and we decide to set a probability threshold of 30%, the loan applications from Applicants #2 and #3 will both be denied, and only Applicant #1 will be approved.
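The rule can be sketched in a few lines of plain Python. This is an illustration only, not BigML code: the function name is hypothetical, and the probabilities below are invented values chosen to be consistent with the outcomes described for the three applicants (the original figures are not reproduced here). For a two-class problem, "no threshold" is treated as picking the more probable class, which is equivalent to a 50% threshold.

```python
def predict_with_threshold(prob_positive, threshold,
                           positive_class="No", negative_class="Yes"):
    """Return the positive class only if its probability is strictly
    greater than the threshold; otherwise return the negative class.
    The classes are the values of the 'Approve?' field."""
    return positive_class if prob_positive > threshold else negative_class


# Hypothetical P(Approve = No) for each applicant (illustrative values).
prob_deny = {
    "Applicant #1": 0.20,
    "Applicant #2": 0.70,
    "Applicant #3": 0.40,
}

# Without an operating threshold: the more probable class wins.
default = {name: predict_with_threshold(p, 0.5) for name, p in prob_deny.items()}

# With "Approve = No" as the positive class and a 30% probability threshold.
adjusted = {name: predict_with_threshold(p, 0.3) for name, p in prob_deny.items()}

print(default)   # Applicants #1 and #3 get "Yes" (approved)
print(adjusted)  # only Applicant #1 gets "Yes" (approved)
```

Applicant #3 is the interesting case: a 40% chance of being a bad loan is not enough to lose the default vote, but it is above the 30% threshold, so the application flips from approved to denied.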
In addition to confidence and probability thresholds, for decision forests, you can also set a vote threshold, i.e., a threshold based on the percentage of models in the ensemble voting for the positive class. The type of threshold (confidence, probability or vote threshold) for non-boosted ensembles can be configured before creating your evaluation or when making predictions.
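A vote threshold follows the same pattern, except that the score being compared is the share of models voting for the positive class. The sketch below is a plain Python illustration under stated assumptions, not BigML’s implementation: the function names and the ten votes are hypothetical, and it assumes a two-class problem where each model in the ensemble casts exactly one vote.

```python
def vote_share(votes, positive_class):
    """Fraction of models in the ensemble voting for the positive class."""
    return sum(1 for v in votes if v == positive_class) / len(votes)


def predict_by_votes(votes, threshold,
                     positive_class="No", negative_class="Yes"):
    """Predict the positive class only if its share of votes is strictly
    greater than the threshold; otherwise predict the negative class."""
    if vote_share(votes, positive_class) > threshold:
        return positive_class
    return negative_class


# Hypothetical decision forest of 10 models scoring one loan application:
# 4 models vote to deny ("No"), 6 vote to approve ("Yes").
votes = ["No"] * 4 + ["Yes"] * 6

majority = predict_by_votes(votes, threshold=0.5)  # plain majority vote
cautious = predict_by_votes(votes, threshold=0.3)  # 30% vote threshold

print(vote_share(votes, "No"), majority, cautious)
```

Here the plain majority approves the loan, while a 30% vote threshold on the "No" class denies it, because 4 out of 10 dissenting models is already enough to trigger the positive class.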
Want to know more about operating thresholds?
If you have any questions or you’d like to learn more about how operating thresholds work, please visit the dedicated release page. It includes a series of six blog posts about operating thresholds, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.