## Uplift Modeling: Using machine learning to improve marketing strategies

By: Bronwyn Dumbleton

One of the key goals of a business’s marketing campaign is to maximise the number of products sold to customers, while minimising the cost of marketing. Machine learning can easily be used to predict how likely a customer is to purchase a product. Uplift modeling goes one step further by attempting to model the incremental effect that the marketing intervention has on a customer.

In this blog post, we give a high-level overview of uplift modeling. We’ll cover the following topics:

1. Propensity models vs uplift models
2. Methodology
3. Uplift modeling techniques
4. Challenges

### 1. Propensity models vs uplift models

By making use of a customer’s past behaviour, a propensity model can be built in order to predict their future behaviour. In other words, given a set of features x, the goal is to predict a target y. Thus, you could consider a propensity model as a “typical” predictive machine learning model.

This type of model is highly valuable for determining how likely a customer is to purchase a product or for identifying customers who are at risk of churning. Based on the outputs of the model, marketers are able to intervene accordingly. As an example, consider a telecommunications company that identifies a customer who is 80% likely to cancel their contract (churn). The company would then be able to intervene, perhaps with an attractive offer, in the hope of preventing the customer from churning.

However, propensity models do not give any insight into the influence that the intervention had on the customers’ likelihood of purchase or churn. Enter uplift!

Uplift models aim to determine how positively the customer is likely to respond given a campaign or treatment (Olaya, Coussement and Verbeke, 2020). That is, we want to estimate the impact that a treatment t has on the probability of purchase y, given a set of features x. Building on the example of the telecommunications company, the uplift model might show that the customer is 60% less likely to cancel their contract if they receive a campaign email. Figure 1 provides a visual comparison of a propensity model and an uplift model.

Figure 1. Comparison of the output of a propensity model and an uplift model.

The difference between a customer’s action with a treatment and without a treatment is predicted by the uplift model. These models attempt to answer the following kinds of questions:

• Was there a positive impact due to contact with the customer? In other words, did the customer purchase the product as a result of advertising?
• Was there a negative impact due to contacting the customer? That is, did the probability of purchase decrease because of the contact?
• Was money wasted by contacting the customer as they were already planning on purchasing the product?

Figure 2. Uplift model matrix.

Figure 2 shows how customers can be divided into one of four groups. To ensure that a marketing campaign is as successful as possible, marketers should focus their attention on customers who respond positively to interventions. This group of people is referred to as ‘Persuadables’. Marketers need to avoid spending time and money on customers who would purchase regardless of intervention (‘Sure Things’), customers who do not purchase with or without intervention (‘Lost Causes’), and those who respond negatively to intervention (‘Do-Not-Disturb’).
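The four groups in Figure 2 can be read as a lookup on a customer's two hypothetical outcomes. The sketch below is purely illustrative: the function name `segment` and its boolean arguments are assumptions, and in practice both outcomes are never observed for the same customer.

```python
def segment(buys_if_treated: bool, buys_if_untreated: bool) -> str:
    """Map a customer's two hypothetical outcomes to a segment name."""
    if buys_if_treated and not buys_if_untreated:
        return "Persuadable"      # buys only because of the intervention
    if buys_if_treated and buys_if_untreated:
        return "Sure Thing"       # buys regardless of the intervention
    if not buys_if_treated and not buys_if_untreated:
        return "Lost Cause"       # never buys
    return "Do-Not-Disturb"       # buys only if left alone
```

Only the ‘Persuadable’ branch corresponds to a customer for whom the intervention pays off.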

### 2. Methodology

Ideally, we want to determine the increase in the likelihood of a customer’s action with an intervention or treatment, compared to their action without the treatment. This is known as the causal effect and is represented mathematically by

$$\tau_i = Y_i^1 - Y_i^0,$$

where  $$Y_i^1$$ denotes person $$i$$’s action when they receive a treatment and  $$Y_i^0$$ denotes person $$i$$’s action when they do not receive treatment.

The biggest challenge in uplift modeling is that we cannot both apply a treatment to a customer and withhold it from that same customer in order to compare the results. Thus, there is no ground truth for an individual customer’s causal effect.

In order to develop an uplift model, an experimental setting is therefore required, where individuals are randomly divided into treatment and control groups (Haupt, 2020). Treatment groups are targeted by the campaign, whereas control groups are not. The response of the individuals in each of the groups is recorded and can be used to determine the response rate. The response rate is defined as the number of individuals who responded divided by the total number of individuals in the group (Devriendt, Moldovan and Verbeke, 2018). If the response rate for the treatment group is higher than that of the control group, the marketing campaign had a positive effect on average.
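As a minimal sketch of this comparison, the snippet below computes the response rate of each group and their difference. The outcome lists are made-up values for illustration, and the function and variable names are assumptions rather than part of any library.

```python
def response_rate(responses):
    """Proportion of individuals in a group who responded (1) rather than not (0)."""
    return sum(responses) / len(responses)

# Hypothetical outcomes: 1 = purchased, 0 = did not purchase.
treatment_responses = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # targeted by the campaign
control_responses = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]    # not targeted

treatment_rate = response_rate(treatment_responses)
control_rate = response_rate(control_responses)
average_uplift = treatment_rate - control_rate

print(f"treatment={treatment_rate:.2f} control={control_rate:.2f} uplift={average_uplift:.2f}")
# prints: treatment=0.40 control=0.20 uplift=0.20
```

This gives a single average effect for the whole campaign; it says nothing yet about which individual customers drove the difference.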

Uplift models are designed to identify individuals who fall into the ‘Persuadable’ category. The model accomplishes this by using the information of the control group to explain the variation in the response rate. The uplift model therefore estimates the conditional average treatment effect (CATE) as a function of the features. We can define CATE, or uplift, as the difference in response rate between treated and control customers who share the same features. Mathematically, we have

\begin{eqnarray*}
\widehat{CATE} & = & uplift \\
& = & E[Y_i | X_i = x, W_i =1] - E[Y_i | X_i = x, W_i =0],
\end{eqnarray*}

where $$E$$ denotes the expected value calculated by the uplift model. Additionally,  $$W_i \in \{0, 1\}$$, with $$1$$ indicating that customer $$i$$ receives the treatment, and $$0$$ indicating that customer $$i$$ receives no treatment (control group). Finally, $$X_i$$ is a vector of features associated with customer $$i$$.
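As a worked illustration with made-up numbers, suppose that among customers with features $$x$$, 30% of those who received the treatment purchased and 10% of those in the control group purchased. The estimated uplift for these customers is then

$$\widehat{CATE}(x) = 0.30 - 0.10 = 0.20,$$

that is, the treatment is estimated to raise the probability of purchase by 20 percentage points for customers with features $$x$$.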

### 3. Uplift modeling techniques

Uplift modeling can be categorised into data preprocessing and data processing approaches, as seen in Figure 3 (Shevchenko and Elisova, 2020).

Figure  3. Uplift modeling techniques.

In the data preprocessing approach, the data is first modified in some way and a typical machine learning algorithm is then applied. The two types of data preprocessing approaches are the transformation approach and the variable selection approach. In the transformation approach, the target variable is modified, while in the variable selection approach the set of features is extended.
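One widely used instance of the transformation approach is the class variable transformation, sketched below. It assumes a binary outcome and a randomised experiment with equally sized treatment and control groups; the function names are illustrative and not taken from any library.

```python
def transform_target(y: int, w: int) -> int:
    """Return 1 for treated responders and control non-responders, else 0."""
    return y * w + (1 - y) * (1 - w)

def uplift_from_transformed(p_z: float) -> float:
    """Convert P(Z = 1 | x) into an uplift estimate.

    Valid only under the assumed 50/50 treatment/control split.
    """
    return 2.0 * p_z - 1.0

# Hypothetical (outcome, treatment) pairs for a handful of customers.
observations = [(1, 1), (0, 1), (1, 0), (0, 0)]
z = [transform_target(y, w) for y, w in observations]
print(z)  # [1, 0, 0, 1]
```

Any standard classifier can then be trained to predict the transformed target from the features, and its predicted probability is converted to an uplift estimate with `uplift_from_transformed`.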

On the other hand, data processing methods require the development of new learning methods. The category can be subdivided into direct and indirect models. In the case of direct modeling, a single, standard machine learning algorithm is modified to estimate the treatment effect directly by optimising a measure of uplift. Tree-based models are typically adapted by modifying the splitting criteria and pruning techniques.
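To make the idea of a modified splitting criterion concrete, the sketch below scores a candidate split by the absolute difference in uplift between its two child nodes, so that a good split separates customers who respond differently to treatment. This is one simple criterion among several described in the literature; the data layout (a list of `(y, w)` pairs per node) and the function names are assumptions made for illustration.

```python
def node_uplift(rows):
    """Uplift in a node: treated response rate minus control response rate.

    `rows` is a list of (y, w) pairs with y = outcome and w = treatment flag.
    """
    treated = [y for y, w in rows if w == 1]
    control = [y for y, w in rows if w == 0]
    if not treated or not control:
        raise ValueError("node needs both treated and control customers")
    return sum(treated) / len(treated) - sum(control) / len(control)

def split_score(left_rows, right_rows):
    """Score a candidate split by how far apart the child uplifts are."""
    return abs(node_uplift(left_rows) - node_uplift(right_rows))

# Hypothetical child nodes produced by one candidate split.
left = [(1, 1), (1, 1), (0, 0), (0, 0)]   # treated customers buy, control do not
right = [(1, 1), (0, 1), (1, 0), (0, 0)]  # treatment makes no difference
print(split_score(left, right))  # 1.0
```

A tree learner using this criterion would evaluate `split_score` for every candidate split and keep the one with the highest score, instead of using a purity measure such as Gini impurity.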

A two-model approach is used for indirect modeling. Instead of using a single model to optimise a measure of uplift, separate models are developed for the treatment group and the control group, and the uplift is estimated as the difference between their predictions. While this method is typically faster to implement than direct modeling, it tends to give less accurate results.
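The two-model approach can be sketched as follows. To keep the example self-contained, the "model" here is a toy that simply memorises the response rate per customer segment; in practice it would be replaced by any standard classifier. The class, function names, and data are all hypothetical.

```python
from collections import defaultdict

class SegmentRateModel:
    """Toy model: predicts the observed response rate for each segment value."""

    def fit(self, segments, outcomes):
        totals = defaultdict(lambda: [0, 0])  # segment -> [responders, count]
        for s, y in zip(segments, outcomes):
            totals[s][0] += y
            totals[s][1] += 1
        self.rates_ = {s: r / n for s, (r, n) in totals.items()}
        return self

    def predict(self, segment):
        return self.rates_[segment]

def fit_two_model(segments, outcomes, treatments):
    """Fit one model on treated customers and one on control customers."""
    rows = list(zip(segments, outcomes, treatments))
    treated = [(s, y) for s, y, w in rows if w == 1]
    control = [(s, y) for s, y, w in rows if w == 0]
    model_t = SegmentRateModel().fit(*zip(*treated))
    model_c = SegmentRateModel().fit(*zip(*control))
    return model_t, model_c

def predict_uplift(model_t, model_c, segment):
    """Uplift = treated-model prediction minus control-model prediction."""
    return model_t.predict(segment) - model_c.predict(segment)

# Hypothetical experiment data (1 = treated / purchased, 0 = otherwise).
segments = ["new", "new", "new", "new", "loyal", "loyal", "loyal", "loyal"]
treatments = [1, 1, 0, 0, 1, 1, 0, 0]
outcomes = [1, 1, 0, 1, 1, 1, 1, 1]

model_t, model_c = fit_two_model(segments, outcomes, treatments)
print(predict_uplift(model_t, model_c, "new"))    # 0.5
print(predict_uplift(model_t, model_c, "loyal"))  # 0.0
```

In this made-up data, ‘loyal’ customers purchase with or without treatment, so their estimated uplift is zero, while ‘new’ customers show a positive uplift. Because each model is fitted to predict the response rather than the difference, errors in the two models can compound when subtracted, which is one reason this approach may be less accurate.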

### 4. Challenges

As previously mentioned, we cannot see the effects of applying and not applying a treatment to a customer at the same time. Ground truth labels are therefore only available for synthetic data, making it difficult to train and evaluate the model. A further constraint is that tooling for uplift modeling is less mature than for standard predictive modeling, although dedicated open-source libraries such as scikit-uplift (Shevchenko and Elisova, 2020) do exist. There is still room for improvement here, as these models have the potential to transform the manner in which marketing campaigns are conducted.

### 5. Conclusion

In this post, we gave an overview of how uplift modeling can be used by a marketer to choose an appropriate treatment for a particular customer. The treatment applied will depend on which segment the customer falls into. This allows marketing campaigns to be directed at the correct clients. The methodology of uplift modeling and the different modeling approaches were also discussed. Uplift modeling is a promising research area that could prove highly beneficial for increasing the profits from marketing campaigns.

### References

Devriendt, F., Moldovan, D. and Verbeke, W. (2018). A Literature Survey and Experimental Evaluation of the State-of-the-Art in Uplift Modeling: A Stepping Stone Toward the Development of Prescriptive Analytics. Big Data, [online] 6(1), pp.13–41. Available at: https://www.liebertpub.com/doi/pdfplus/10.1089/big.2017.0104

Haupt, J.S. (2020). Machine Learning for Marketing Decision Support. [online] Available at: https://d-nb.info/1213721776/34#page=105

Olaya, D., Coussement, K. and Verbeke, W. (2020). A survey and benchmarking study of multitreatment uplift modeling. Data Mining and Knowledge Discovery, 34(2), pp.273–308.

Shevchenko, M. and Elisova, I. (2020). User Guide for Uplift Modeling. [online] Available at: https://www.uplift-modeling.com/en/latest/user_guide/models/classification.html