If you’re a data scientist or an aspiring one, you have probably asked yourself this question: is a more complex learning algorithm better for making predictions?
Let’s assume we have a supervised learning task at hand that involves predicting the average selling price of houses in different neighbourhoods of Toronto, projected over 5 years, 10 years, and so on. We have access to vast geographic, demographic, socioeconomic, and historical data, from which we have extracted features that we believe influence real estate prices. We can think of these features as our independent variables or predictors, and the selling price as our dependent variable. Since we are trying to predict the average selling price of a house (i.e., a continuous value) as a function of a set of features, this is a regression problem.
What next? Should we start with deep learning, which has been getting a lot of attention over the last few years, or with support vector regression, in an attempt to achieve maximum performance? To address this question, we need to look at two things: (a) the size and quality of our data set, and (b) our end goals.
Much has been written about the importance of data in the context of machine learning. If we have good data and a sizeable amount of it, we often do not need complex algorithms. More data allows us to see more: we can perceive clearer patterns in the data beyond the noise, which in turn can guide us towards simpler algorithms for modeling the data, reducing the need for something more complex. In other words, complex algorithms may not provide a good return on investment, in terms of performance, compared to simpler ones that require fewer modeling decisions. Garrett Wu expands on these points in more detail in this blog post.
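To make the "more data lets us see past the noise" point concrete, here is a minimal sketch using only the Python standard library. Everything in it is invented for illustration: the single feature (number of new retail stores), the "true" slope of 15, the noise level, and the helper names `fit_slope` and `average_slope_error`. It fits the same simple straight-line model on samples of increasing size and reports how far, on average, the estimated slope lands from the true one.

```python
import random
from statistics import mean

TRUE_SLOPE = 15.0  # assumed "true" effect of one extra store (illustrative only)


def fit_slope(x, y):
    """Least-squares slope for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def average_slope_error(n, trials=200, seed=0):
    """Average absolute slope error over repeated synthetic samples of size n."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        # Synthetic neighbourhoods: a store count and a noisy price
        stores = [rng.uniform(0, 20) for _ in range(n)]
        prices = [500 + TRUE_SLOPE * s + rng.gauss(0, 20) for s in stores]
        errors.append(abs(fit_slope(stores, prices) - TRUE_SLOPE))
    return mean(errors)


for n in (10, 100, 1000):
    print(n, round(average_slope_error(n), 3))
```

Under these assumptions the average error shrinks as the sample grows, even though the model itself never gets more complex. Real housing data will be messier, so treat this as an intuition aid rather than a benchmark.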
Speaking to the second point, what are our end goals? In the context of our example, are we interested in good prediction performance alone, or do we want to capture relationships between features and the selling price of a house? If performance is what really matters to us, then complex nonlinear methods offer more flexibility in finding a function that fits our training data better (assuming we have taken measures to control for overfitting). However, this comes at the cost of interpretability. The more complex the method, the more likely it is that we lose track of how our features relate to the dependent variable. But if we wish to understand how the number of up-and-coming retail stores in a neighbourhood affects home prices in that neighbourhood, a simpler algorithm such as linear regression is far more useful, because its fitted coefficients describe that relationship directly.
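The trade-off can be sketched with a small, self-contained Python example. The data are synthetic and the numbers (a base price of 500 and an effect of 15 per new store, in arbitrary units) are assumptions made up for illustration. The helper `fit_line` is a plain least-squares fit for one predictor, and `knn_predict` is a simple k-nearest-neighbours average standing in for a "more flexible" method. Neither is meant as a recommended implementation.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def knn_predict(x_train, y_train, x_new, k=5):
    """Flexible alternative: average the prices of the k most similar neighbourhoods."""
    nearest = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x_new))[:k]
    return mean(price for _, price in nearest)


rng = random.Random(0)
# Synthetic data (assumption): each new store adds about 15 to the price
stores = [rng.randint(0, 20) for _ in range(200)]
prices = [500 + 15 * s + rng.gauss(0, 20) for s in stores]

slope, intercept = fit_line(stores, prices)
print(f"linear model: price ~ {intercept:.0f} + {slope:.1f} * stores")
print(f"k-NN prediction at 10 stores: {knn_predict(stores, prices, 10):.0f}")
```

Both approaches produce a prediction, but only the linear fit hands back a single number (the slope) that answers "how much does one more store change the price?". With the nearest-neighbour method, that relationship has to be inferred indirectly, for example by comparing predictions at different store counts.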
So, coming back to our question, is a more complex learning algorithm better for making predictions? The answer depends on how we define “better”. Once that is settled, we need to find the right balance between performance and interpretability and choose an algorithm accordingly. James, Witten, Hastie, and Tibshirani capture the essence of this point with an excellent figure (Figure 2.7) and a helpful explanation in Chapter 2 of their book, “An Introduction to Statistical Learning”, freely downloadable here.