Logistic Regression versus Decision Trees
The question of which model type to apply to a Machine Learning task can be a daunting one, given the immense number of algorithms available in the literature. It can be difficult to compare the relative merits of two methods, as one can outperform the other on a certain class of problems while consistently coming in behind on another. In this post, the last in our series about Logistic Regression, we’ll explore the differences between Decision Trees and Logistic Regression for classification problems, and try to highlight scenarios where one might be recommended over the other.
Logistic Regression and trees differ in the way that they generate decision boundaries, i.e., the lines drawn to separate different classes. To illustrate this difference, let’s look at the results of the two model types on the following 2-class problem:
Decision Trees repeatedly bisect the space into smaller and smaller regions, whereas Logistic Regression fits a single line to divide the space exactly in two. Of course, for higher-dimensional data, these lines generalize to planes and hyperplanes. A single linear boundary can sometimes be limiting for Logistic Regression. In this example, where the two classes are separated by a decidedly non-linear boundary, we see that trees can better capture the division, leading to superior classification performance. However, when classes are not well separated, trees are susceptible to overfitting the training data, so that Logistic Regression’s simple linear boundary generalizes better.
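The plots in this post were produced with BigML via WhizzML, but the same qualitative comparison can be sketched with scikit-learn on a synthetic non-linear dataset. Note that the dataset, model parameters, and accuracy figures below are illustrative assumptions, not the ones used in the post:

```python
# Sketch: compare a Decision Tree and Logistic Regression on a
# deliberately non-linear 2-class problem (two interleaving half-moons).
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 2-D data with a curved class boundary
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# A single linear boundary vs. recursive axis-aligned splits
lr = LogisticRegression().fit(X_train, y_train)
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)

lr_acc = lr.score(X_test, y_test)
tree_acc = tree.score(X_test, y_test)
print(f"Logistic Regression accuracy: {lr_acc:.2f}")
print(f"Decision Tree accuracy:       {tree_acc:.2f}")
```

On a curved boundary like this, the tree typically scores higher; on noisy but roughly linearly separable data, the comparison tends to reverse.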
Lastly, the background color of these plots represents the prediction confidence. Each node of a Decision Tree assigns a constant confidence value to the entire region that it spans, leading to a rather patchwork appearance of confidence values across the space. Prediction confidence for Logistic Regression, on the other hand, can be computed in closed form for any arbitrary input coordinates, giving an arbitrarily fine-grained result and confidence values we can place more trust in.
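That closed form is simply the logistic (sigmoid) function applied to a linear combination of the inputs. A minimal sketch for a 2-D problem, with made-up coefficients:

```python
import math

def lr_confidence(x, y, b0, bx, by):
    """Closed-form Logistic Regression class probability at point (x, y)."""
    z = b0 + bx * x + by * y           # linear combination of the inputs
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes z into (0, 1)

# Hypothetical coefficients: the probability varies smoothly with position,
# unlike a tree's piecewise-constant confidence per region.
p = lr_confidence(0.0, 0.0, b0=0.0, bx=1.0, by=3.0)
print(p)  # a point exactly on the boundary gets probability 0.5
```

Because this function is defined everywhere, the confidence surface changes continuously across the whole input space rather than jumping at region borders.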
Although the last example was designed to give Logistic Regression a performance advantage, its resulting F-measure beat the Decision Tree’s by only a modest margin. So what else is there to recommend Logistic Regression? Let’s look at the tree model view in the BigML web interface:
When a tree consists of a large number of nodes, it can require a significant amount of mental effort to comprehend all the splits that lead up to a particular prediction. In contrast, a Logistic Regression model is simply a list of coefficients:
At a glance, we are able to see that an instance’s y-coordinate is just over three times as important as its x-coordinate for determining its class, which is corroborated by the slope of the decision boundary from the previous section. An important caveat here concerns scale. If, for example, x and y were given in units of meters and kilometers respectively, we should expect their coefficients to differ by a factor of 1000 in order to represent equal importance in a real-world, physical sense. Because Logistic Regression models are fully described by their coefficients, they are attractive to users who have some familiarity with their data and are interested in knowing the influence of particular input fields on the objective.
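The scale caveat follows directly from the form of the model: the decision function depends on the product of coefficient and feature value, so expressing a feature in units 1000 times larger must scale its fitted coefficient by 1000 to describe the same boundary. A toy illustration, with invented numbers:

```python
def decision_value(coef, feature):
    """Contribution of one field to the linear part of the model."""
    return coef * feature

# The same physical distance, expressed in two different units
distance_m = 1500.0               # meters
distance_km = distance_m / 1000.0  # kilometers

# Invented coefficient for the meters representation
b_m = 0.002
# To keep the decision function (and hence the boundary) identical,
# the kilometers coefficient must be 1000x larger
b_km = b_m * 1000.0

print(decision_value(b_m, distance_m))    # same contribution either way
print(decision_value(b_km, distance_km))
```

This is why coefficient magnitudes should only be read as importances once the fields are on comparable scales.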
The code for this blog post consists of a WhizzML script to train and evaluate both Decision Tree and Logistic Regression models, plus a Python script which executes the WhizzML and draws the plots. You can view it on GitHub.
Learn more about Logistic Regression on our release page. You will find documentation on how to use Logistic Regression with the BigML Dashboard and the BigML API. You can also watch the webinar, see the slideshow, and read the other blog posts in this series about Logistic Regression.