Problems with current retention models

Customer retention is the practice of keeping customers paying for and using your services. For subscription-based businesses such as phone service providers, retention accounts for a significant share of profit. Acquiring a customer is expensive and difficult. Retaining one is typically easier, and every additional month a customer stays makes the original acquisition more profitable.

Typical retention programs are informed by the work of multiple data scientists and several years of statistical calculation and model building. This kind of human engineering is often expensive and time-consuming. Automated AI modeling does away with these steps of human engineering and data analytics. By making the modeling process one automated step, we perform years of work in minutes, and the complicated mathematics involved is carried out not only faster but also more accurately. Machine learning effectively replaces standard human engineering as the cheaper, faster, and more accurate solution.

Flexibility is another strength, as we offer several different ways to inform retention strategy. Depending on your data, its format, and your modeling goals, different models can be created and optimized for your use. The two examples used here are a grocery store trying to retain customers and a phone company trying to keep people subscribed.

Solution A: Forecasting

Forecasting works by using a customer’s historical data to predict their future actions. In the case of a grocery store, our historical data is the number of transactions each customer made per month. This serves as a proxy for retention, because a customer who is still making transactions is still a customer:


Our table includes three customer IDs, the month in which the transactions occurred, and how many transactions each customer made in that month. With this information, a forecasting model learns how each customer behaves by observing their transactions per month. It determines how much each customer buys on average and when their transactions tend to increase and decrease.

Note that not every customer has transactions in every month. A gap could mean that they have stopped being a customer, or perhaps that they were on vacation. Either way, it is important to train the forecasting model on every customer’s data, whether they left or not. Customers with gaps provide examples of downward trends throughout the year, which lets the forecast identify months that are problematic for transactions, or simply months when many people go on vacation.

Since we are using the number of transactions as the deciding factor for retention programs, we want to be alert to when that number approaches 0 for a customer. Transactions that decrease month after month suggest that the customer may be on the way to leaving. With this information, our grocery store can monitor how each customer’s transactions fluctuate and move customers whose transactions have dropped significantly into retention programs.
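A minimal sketch of that monitoring step, assuming monthly counts are already available per customer. The function name, the `drop_ratio` cutoff, and the sample counts are all hypothetical and only illustrate the idea of flagging a significant drop:

```python
from statistics import mean

def flag_declining_customers(history, drop_ratio=0.5):
    """Return IDs of customers whose latest month fell below
    drop_ratio times their average over the earlier months."""
    flagged = []
    for customer_id, monthly_counts in history.items():
        if len(monthly_counts) < 2:
            continue  # not enough history to compare against
        baseline = mean(monthly_counts[:-1])
        latest = monthly_counts[-1]
        if baseline > 0 and latest < drop_ratio * baseline:
            flagged.append(customer_id)
    return flagged

# Hypothetical monthly transaction counts, oldest month first.
history = {
    "A": [8, 9, 7, 8],   # stable
    "B": [10, 8, 5, 2],  # dropping steadily
    "C": [4, 4, 3, 0],   # no transactions in the latest month
}
flagged = flag_declining_customers(history)
```

What counts as a "significant" drop is a business decision, so the cutoff is left as a parameter rather than fixed.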

There are multiple settings to choose when customizing your forecast: whether its range should be long or short, how long the forecast should run, and how often it should be updated all need to be specified. For detailed information on how to set each one for your specific case, see the Forecasting guide.

After training, our forecasting model is ready to be used to inform retention programs, but there is still one thing to set: its range. It might seem that the simple solution is a one-month forecast because we have monthly data, but such a short range would be prone to problems. Customer transaction history fluctuates from month to month, so a drop in activity that looks like a serious retention problem might simply be part of the customer’s natural trend. A forecast range of three months gives a more complete picture of where the customer’s transaction activity is headed and whether it is truly heading towards 0 or just fluctuating. However, a longer forecast requires more data.
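To make the difference between a real decline and a fluctuation concrete, here is a small sketch that inspects a multi-month forecast. It assumes the forecast is already available as a list of predicted monthly transaction counts; the function name, the `floor` cutoff, and the numbers are hypothetical:

```python
def heading_toward_zero(last_observed, forecast, floor=1.0):
    """Return True when activity falls at every step of the forecast
    and the final forecast value is at or below `floor`."""
    path = [last_observed] + list(forecast)
    falls_every_month = all(
        later < earlier for earlier, later in zip(path, path[1:])
    )
    return falls_every_month and path[-1] <= floor

# Hypothetical three-month forecasts for two customers who both
# made 6 transactions in the last observed month.
steady_decline = heading_toward_zero(6, [4.0, 2.5, 0.8])
temporary_dip = heading_toward_zero(6, [3.0, 5.0, 6.0])
```

Both customers drop in the first forecast month, so a one-month range could not tell them apart. Only the three-month path shows that the second customer recovers.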

As a general rule, forecasting works best when your historical data covers a long period and includes every recorded customer. Our grocery store table should be viewed as a small section of what should be a much larger table, possibly including hundreds of customers and stretching back several years. A large history for every customer allows the forecast to evaluate its own predictions and gain confidence that they are accurate. Imagine trying to predict what will happen in June without knowing anything about previous Junes. You can do it, but you have no way of knowing whether your prediction is any good. With that customer’s data from past Junes, every prediction has earlier examples to be checked against.
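The idea of checking predictions against past Junes can be sketched as a simple backtest. The predictor below (the average of all earlier years) is a deliberately naive stand-in chosen for illustration, not the forecasting model described in this post, and the sample values are hypothetical:

```python
from statistics import mean

def backtest_same_month(values_by_year):
    """Predict each year's value from the average of all earlier
    years and return the mean absolute error of those predictions.

    values_by_year holds one customer's transaction counts for the
    same calendar month (for example June), oldest year first.
    """
    if len(values_by_year) < 2:
        raise ValueError("need at least two years to evaluate a prediction")
    errors = []
    for i in range(1, len(values_by_year)):
        prediction = mean(values_by_year[:i])
        errors.append(abs(prediction - values_by_year[i]))
    return mean(errors)

# Hypothetical June transaction counts for three consecutive years.
june_error = backtest_same_month([10, 12, 11])
```

With only one June on record the function refuses to run, which mirrors the point above: without earlier examples there is nothing to evaluate the prediction against.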

Solution B: Time-Series Classification

Most phone companies operate on a subscription basis, where people choose to renew their phone service at set intervals. In this example, customers decide at the end of each month whether or not to renew their subscription. It is currently August, and we want to know who will renew for September. To predict this with time-series classification, we need an ordered time-series dataset running from as far back as we have records (April in this example) up to July. We do not include August, the current month, in the training data: customers have not yet decided whether to renew, so August has no outcome for the model to learn from. Our April to July dataset will look something like this:


Our table contains basic customer data such as the number of calls, data usage, and the number of text messages. These are aggregated into a monthly total for each month the customer is subscribed, and the last column records whether the customer renewed their subscription at the end of that month. Three customers have data up to the point where we stopped recording (July in this example). Notice that not every customer has data for every month, and that when a customer’s data ends early, it ends in the month they chose not to resubscribe. These are customers who have left, so we have no further data on them. They are included in the training data because they show the model how a customer behaves in the month they leave. This information is extremely important for predicting which customers might leave.
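A minimal sketch of separating the labeled history from the current month, assuming each row is a dictionary. The column names (`customer_id`, `month`, `calls`, `renewed`) and the sample rows are hypothetical, and months are plain integers here, which only works within a single year:

```python
def split_for_training(rows, current_month):
    """Split rows into labeled history (before current_month) and
    the current month's rows that still need a prediction."""
    training = sorted(
        (r for r in rows if r["month"] < current_month),
        key=lambda r: (r["customer_id"], r["month"]),  # keep time order
    )
    to_predict = [r for r in rows if r["month"] == current_month]
    return training, to_predict

# Hypothetical rows: month 8 (August) has no renewal decision yet.
rows = [
    {"customer_id": 1, "month": 7, "calls": 42, "renewed": "yes"},
    {"customer_id": 1, "month": 8, "calls": 39, "renewed": None},
    {"customer_id": 2, "month": 6, "calls": 12, "renewed": "no"},
    {"customer_id": 1, "month": 6, "calls": 45, "renewed": "yes"},
]
training, to_predict = split_for_training(rows, current_month=8)
```

Customer 2 stops at month 6 with `renewed` set to "no", so they stay in the training rows as an example of a customer who left.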

This data will be used to train a classification model that predicts whether a customer will resubscribe. Each prediction is informed both by that customer’s full history and by examples of customers who have left, so the model can check whether a customer fits the pattern of past leavers. Once the model is trained, it can be applied to our August data to predict for the month of September:


Notice that these are customers who are currently subscribed in August. Their August data has been recorded, but the decision to resubscribe has been left blank because it is unknown and is what we are trying to predict. The trained model will look at this data and predict whether each customer will resubscribe, based not only on the current row but also on all the historical data it has observed. Its predictions will be in the same format used to train it (yes or no), each with a confidence level (given as a percentage) indicating how likely that outcome is.

A final aspect to consider is how to allocate retention resources and where to set action thresholds. Because binary classification is effectively deciding whether or not to act, and that decision can never be 100% certain, it is important to decide ahead of time when your company should pursue retention. Of the customers rated 60% likely to leave, about 40% will stay anyway, whereas of those rated 90% likely to leave, only about 10% will stay. If resources are limited, it is more cost-effective to target high-risk customers at 90% rather than lower-risk ones. Where the threshold should sit depends on the resources available for each retention program and company.
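A small sketch of applying such a threshold, assuming the model’s output has been collected into a mapping from customer ID to predicted probability of leaving. The function name, the `budget` parameter, and the probabilities are hypothetical:

```python
def select_for_retention(churn_probabilities, threshold=0.9, budget=None):
    """Return customer IDs at or above the threshold, highest risk
    first, optionally capped at `budget` customers."""
    at_risk = [
        (customer_id, p)
        for customer_id, p in churn_probabilities.items()
        if p >= threshold
    ]
    at_risk.sort(key=lambda pair: pair[1], reverse=True)
    if budget is not None:
        at_risk = at_risk[:budget]
    return [customer_id for customer_id, _ in at_risk]

# Hypothetical predicted probabilities of leaving.
predictions = {"A": 0.95, "B": 0.60, "C": 0.91, "D": 0.10}
all_high_risk = select_for_retention(predictions, threshold=0.9)
within_budget = select_for_retention(predictions, threshold=0.9, budget=1)
```

Lowering `threshold` widens the program to less certain cases, while `budget` caps how many customers the program can reach regardless of the threshold.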

Solution C: Classification with Feature Engineering

Many companies prefer to transform their data in ways that are more meaningful to them, or simply because that is how the company has always done it. In the process, they often condense a customer’s entire history into one row, which makes it well suited to a classification model. Using the example of a phone company trying to get people to renew their monthly subscription, our data compares how many calls each customer made with how many calls were made on average over six-, three-, and one-month periods. The ratio of the customer’s figure to the average was then recorded in columns. Put into a table, it looks like this:


Because the formatted data is effectively a transformed version of a customer’s historical data, it can be used to predict their future in much the same way a forecast model looks at historical trends. A classification model trained on examples of customers’ aggregated data and their subscription choices can then predict the subscription choice of any customer once their call ratios have been calculated and recorded.
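The ratio columns described above could be derived along these lines. This is a sketch under assumptions: the column names (`call_ratio_6m` and so on), the choice to compare window means, and the sample counts are illustrative and not taken from the original table:

```python
from statistics import mean

def call_ratio_features(customer_calls, average_calls, windows=(6, 3, 1)):
    """Build one row of ratio features for a customer.

    Both inputs are monthly call counts, oldest month first, and
    must cover at least the longest window. average_calls is the
    average across all customers for the same months.
    """
    features = {}
    for window in windows:
        customer_mean = mean(customer_calls[-window:])
        overall_mean = mean(average_calls[-window:])
        features[f"call_ratio_{window}m"] = customer_mean / overall_mean
    return features

# Hypothetical data: the customer's calls fall while the average stays flat.
customer_calls = [30, 30, 30, 20, 20, 10]
average_calls = [20, 20, 20, 20, 20, 20]
features = call_ratio_features(customer_calls, average_calls)
```

A ratio above 1 means the customer calls more than average over that window. Here the ratios shrink from the six-month window to the one-month window, which encodes the customer’s downward trend in a single row.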

The main advantage of automated classification is that it requires no data tuning and no decision about which classification model to use. Built-in meta-learning decides which parameters are meaningful and selects the best-suited model for your data. Because the modeling process is automated, the time and resources a team would typically need to create a model are significantly reduced. The only requirement is that data tables keep the same format and column names between training and later use. In our example, a model trained on this table must only be used with future tables that have exactly the same format and columns. For more information on classification formatting and tasks, see our Classification Guide.
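A simple guard for that requirement can be sketched as follows. This is an illustrative check written for this post, not part of any product API, and the column names are hypothetical:

```python
def check_same_format(training_columns, new_columns):
    """Return True when the new table has exactly the training
    table's columns in the same order, otherwise raise ValueError."""
    training_columns = list(training_columns)
    new_columns = list(new_columns)
    if training_columns == new_columns:
        return True
    missing = [c for c in training_columns if c not in new_columns]
    unexpected = [c for c in new_columns if c not in training_columns]
    raise ValueError(
        "Columns differ from training (or are in a different order): "
        f"missing={missing}, unexpected={unexpected}"
    )

# Hypothetical column names for the ratio table.
trained_on = ["customer_id", "call_ratio_6m", "call_ratio_3m",
              "call_ratio_1m", "renewed"]
ok = check_same_format(trained_on, list(trained_on))
```

Running a check like this before every prediction batch catches renamed or dropped columns early, before they produce misleading predictions.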
