Friday, March 9, 2018

Regression vs. Classification

Machine learning generates a lot of buzz because it's applicable across such a wide variety of use cases. That versatility comes from the fact that machine learning is really a set of many different methods, each uniquely suited to answering a different kind of question about a business. To better understand machine learning algorithms, it’s helpful to separate them into groups based on how they work.
We’ve done this before through the lens of whether the data used to train the algorithm should be labeled or not (see our posts on supervised, unsupervised, or semi-supervised machine learning), but there are also inherent differences in these algorithms based on the format of their outputs. Looking at them this way, two popular types of machine learning methods rise to the top: classification and regression.

Classification

Classification algorithms are used when the desired output is a discrete label. In other words, they’re helpful when the answer to your question about your business falls under a finite set of possible outcomes. Many use cases, such as determining whether an email is spam or not, have only two possible outcomes. This is called binary classification.
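To make that concrete, here's a minimal sketch of binary spam classification using scikit-learn's LogisticRegression. The handful of example emails is invented purely for illustration.

# Binary classification: the model's output is one of two discrete labels.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy data invented for illustration: 1 = spam, 0 = not spam.
emails = [
    "win a free prize now",
    "limited time offer click here",
    "meeting rescheduled to 3pm",
    "please review the attached report",
]
labels = [1, 1, 0, 0]

# Turn raw text into word-count features, then fit the classifier.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
model = LogisticRegression()
model.fit(X, labels)

# The prediction is a discrete label: spam (1) or not spam (0).
print(model.predict(vectorizer.transform(["claim your free prize"])))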
Multiclass classification, where the output can be any one of three or more labels, captures everything else; it's useful for customer segmentation, audio and image categorization, and text analysis for mining customer sentiment. If these are the questions you’re hoping to answer with machine learning in your business, consider algorithms like naive Bayes, decision trees, logistic regression, kernel approximation, and K-nearest neighbors.
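All of these classifiers expose the same fit/predict interface in scikit-learn, so here's a short multiclass sketch using K-nearest neighbors on the library's built-in iris dataset; any of the algorithms named above could be swapped in.

# Multiclass classification: three flower species, so three possible labels.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)

# Each prediction is one of a finite set of labels (0, 1, or 2),
# and score() reports accuracy on the held-out test split.
print(clf.score(X_test, y_test))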

Regression

On the other hand, regression is useful for predicting outputs that are continuous. That means the answer to your question is represented by a quantity that can be flexibly determined based on the inputs of the model rather than being confined to a set of possible labels. Regression problems with time-ordered inputs are called time-series forecasting problems; methods like ARIMA allow data scientists to explain seasonal patterns in sales, evaluate the impact of new marketing campaigns, and more.
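As a rough sketch of what that looks like in code, here's an ARIMA forecast using statsmodels; the synthetic monthly sales series and the (1, 1, 1) order are stand-ins for illustration, not a tuned model.

# Time-series forecasting: continuous outputs over time-ordered inputs.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic "monthly sales" data: 36 months with an upward drift.
rng = np.random.default_rng(0)
sales = 100 + np.cumsum(rng.normal(2, 5, size=36))

# The (p, d, q) = (1, 1, 1) order is an arbitrary starting point;
# a real model would be chosen by inspecting the series.
model = ARIMA(sales, order=(1, 1, 1))
fitted = model.fit()

# The output is continuous: a forecast of the next three months.
print(fitted.forecast(steps=3))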
Linear regression is by far the most popular example of a regression algorithm. Though it’s often underrated because of its relative simplicity, it’s a versatile method that can be used to predict housing prices, the likelihood of customers to churn, or the revenue a customer will generate. For use cases like these, regression trees and support vector regression are good algorithms to consider if you’re looking for something more sophisticated than linear regression.
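Here's what the housing-price case might look like as a minimal linear regression sketch with scikit-learn; the square-footage and price figures are made up to show the shape of the problem, not real market data.

# Regression: the model predicts a continuous quantity, not a label.
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: one feature (square footage) and a continuous target (price).
sqft = np.array([[800], [1200], [1500], [2000], [2400]])
price = np.array([150_000, 210_000, 260_000, 330_000, 390_000])

model = LinearRegression()
model.fit(sqft, price)

# Unlike classification, the prediction can land anywhere on a continuum.
print(model.predict([[1800]]))  # roughly 300,000 for this toy data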
Choosing an algorithm is a critical step in the machine learning process, so it’s important that it truly fits the use case of the problem at hand. Make sure data scientists and business users at your organization align early to avoid the common pitfalls of building predictive models.
