Financial institutions often need to run campaigns in order to sell a product to potential customers. These campaigns cost time and money, and they inconvenience the people contacted when the product offered is a poor match. Inconvenienced customers can develop negative sentiment toward the bank, leading to reputational damage and the profit loss that follows from diminished long-term customer value. In addition, most customers will ultimately say no, which makes identifying those who will say yes particularly challenging.
In the parlance of machine learning, this is known as an imbalanced classification problem. In this project, I solve this problem for the case of a Portuguese bank running telephone campaigns to sell long-term deposits. While this problem is challenging, much headway can be made to help the bank reach the right customers, save tens of thousands of dollars in concrete costs, and protect its reputation.
The data for this project was collected between May 2008 and June 2013 by a Portuguese banking institution and is available through the UCI Machine Learning Repository. There are 45,211 observations with features covering bank client data (age, job type, marital status, education, housing and loan status), information regarding the last contact (including a leaked variable, duration, which is dropped), information pertaining to previous campaigns, and some social and economic context variables (such as the 3-month interbank borrowing rate).
The goal is to predict whether the person contacted will subscribe to the long-term deposit, a product that grants the depositor a higher interest rate than a traditional savings account and grants the bank guaranteed use of the depositor’s funds for a fixed period (such as 12 months). In the data, only about 11% of the people contacted end up subscribing, making this classification problem highly imbalanced.
I tried several approaches to address the imbalance: using the data as is, rebalancing with the SMOTE algorithm, and simple rebalancing by sampling the minority class at a higher rate. In addition, I tried both XGBoost and random forest classification in R. Finally, careful attention must be paid to the choice of metric, which I discuss in more detail below.
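The modeling itself was done in R, and that code is not shown here; purely as an illustration, the following Python sketch shows what the "simple rebalancing" idea amounts to — resampling the minority class with replacement until the classes are even. The function name, data shape, and label values are hypothetical.

```python
import random

def oversample_minority(rows, label_key="y", minority="yes", seed=42):
    """Naive rebalancing: resample the minority class with replacement
    until it matches the majority-class count."""
    rng = random.Random(seed)
    minority_rows = [r for r in rows if r[label_key] == minority]
    majority_rows = [r for r in rows if r[label_key] != minority]
    extra = [rng.choice(minority_rows)
             for _ in range(len(majority_rows) - len(minority_rows))]
    return majority_rows + minority_rows + extra

# Toy data mimicking the ~11% positive rate in the campaign data
data = [{"y": "yes"} for _ in range(11)] + [{"y": "no"} for _ in range(89)]
balanced = oversample_minority(data)
```

After rebalancing, both classes appear 89 times, so the classifier no longer gains accuracy simply by predicting "no" everywhere.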
Accuracy is not an acceptable metric for this problem: classifying every customer as a ‘No’ achieves 89% accuracy while providing no insight into our group of interest, the ‘Yes’ customers. Since the positive class is the ‘Yes’ class, a false positive amounts to predicting that a person will say yes when they will in fact say no. I would like to keep these errors low to minimize inconvenience to customers, the resulting reputational damage to the bank, and the time lost contacting customers who will decline.
However, I would tolerate some false positives to capture more yeses, so I would not maximize precision, the ratio of true positives to true positives plus false positives, per se. A false negative occurs when one predicts that a customer will say no when they would in fact say yes. The cost of this error is a lost client, something one clearly wants to avoid in a sales setting. The ratio of true positives to true positives plus false negatives is known as recall, and it is the more important quantity for this problem.
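These two definitions can be written out directly. The sketch below uses illustrative counts chosen to match figures quoted later in this post (a precision of 3 out of 8, and 63% of yeses reached); they are not taken from a confusion matrix shown here.

```python
def precision(tp, fp):
    # Of all customers predicted to say yes, the share who actually do.
    return tp / (tp + fp)

def recall(tp, fn):
    # Of all customers who would say yes, the share the model finds.
    return tp / (tp + fn)

# Illustrative counts: 3 yeses per 8 calls, and 63 of every 100 yeses found
p = precision(3, 5)    # 0.375, i.e. 3 of 8
r = recall(63, 37)     # 0.63,  i.e. 63% of all yeses
```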
Nonetheless, I would like to strike a balance between precision and recall by using the area under the ROC curve (AUC) as the optimization metric. This metric balances the desire for few false positives against the desire for few false negatives and, combined with the other tools used here, leads to the highest recall I was able to achieve.
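For intuition, AUC has a simple probabilistic reading: it is the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The model training used R's built-in tooling for this; the pure-Python sketch below computes AUC from that pairwise definition (fine for small examples, though an O(n log n) rank-based method is preferable at scale).

```python
def roc_auc(scores, labels):
    """AUC = probability that a random positive outscores a random
    negative; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# One misranked pair out of four -> AUC of 0.75
auc = roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
```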
The final combination of rebalancing, model, and metric that I selected was simple rebalancing, a random forest model, and the ROC metric. XGBoost was particularly prone to overfitting on this data, SMOTE seemed to introduce too much extra noise, and other metrics (such as maximizing recall directly) did not work as well as maximizing the area under the ROC curve. I addressed the overfitting issue by requiring a reasonable minimum node size (in this case, at least 40 observations in each terminal node).
The final model achieved an ROC AUC of .775 and identified a group of clients particularly likely to respond positively (3 of every 8 clients flagged would say yes). The group that can be reached by following this model’s recommendations corresponds to 63% of all the people who would say yes. The concern, of course, is that this may not be enough for the bank. I addressed it by lowering the classification threshold from the default .5 to .3, which allows 80% of the yes customers to be reached at the cost of somewhat more noes. I discuss the business value of these models below.
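Lowering the threshold trades precision for recall: more customers clear the bar, so more of the true yeses are caught, along with more noes. A minimal sketch of that mechanism, with made-up probabilities purely for illustration:

```python
def recall_at(probs, actual, threshold):
    """Share of true yeses caught when flagging everyone whose
    predicted probability clears the threshold."""
    flagged = [p >= threshold for p in probs]
    tp = sum(1 for f, a in zip(flagged, actual) if f and a)
    return tp / sum(actual)

# Toy predicted probabilities and true outcomes (1 = yes)
probs  = [0.9, 0.6, 0.4, 0.35, 0.2]
actual = [1,   1,   1,   1,    0]

r_default = recall_at(probs, actual, 0.5)   # 0.5: half the yeses caught
r_lowered = recall_at(probs, actual, 0.3)   # 1.0: all yeses caught
```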
Suppose the bank obtains 100,000 records of potential customers and would like to determine which of these people to contact. Assuming the customers likely to say yes are uniformly distributed within this data, the following table summarizes three possible approaches to contacting customers along with their corresponding costs.
Note that in both the ML and no-ML cases, the goal is to reach 63% of all the yeses in the data. Once this target is reached, the bank’s agents and telemarketers stop calling potential customers. Since the ML strategy tells the bank which customers to prioritize, it saves both money and the intangible costs that would otherwise be incurred by needlessly contacting the noes. The lower-bound calculations assume a $10.00 per hour rate (converted from euros) for telemarketers’ time, and the upper-bound calculations assume $20.00 per hour for bank employees’ time.
The actual hourly salaries of these two groups are a little lower, $8.08 and $16.00 respectively, but I assume workers need some time between the yes/no calls (perhaps for non-responding customers or data lookup and entry) and therefore base the rates on time spent on call.

A natural concern is that 63% is not good enough to meet the bank’s objectives. In that case, lowering the classification threshold for a yes to .30 allows 80% of all the yeses to be reached, albeit at a higher cost.
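The cost comparison can be reconstructed from the figures quoted in this post: 100,000 records, an 11% base yes rate, a 63% target, and a hit rate of 3 yeses per 8 ML-guided calls. One input is not stated and is an assumption here: a pace of 10 calls per hour, paired with the lower-bound $10.00/hour rate.

```python
# Figures from the text: 100,000 records, ~11% would say yes,
# target = 63% of all yeses, ML finds a yes in 3 of every 8 calls,
# no-ML calling hits yeses at the 11% base rate.
# Assumptions (not from the text): 10 calls/hour, $10.00/hour.
RECORDS, YES_RATE, TARGET = 100_000, 0.11, 0.63
CALLS_PER_HOUR, WAGE = 10, 10.00

target_yeses = RECORDS * YES_RATE * TARGET    # ~6,930 yeses to reach
ml_calls = target_yeses / (3 / 8)             # ~18,480 calls with the model
no_ml_calls = target_yeses / YES_RATE         # ~63,000 calls without it

ml_cost = ml_calls / CALLS_PER_HOUR * WAGE    # ~$18,480
no_ml_cost = no_ml_calls / CALLS_PER_HOUR * WAGE  # ~$63,000
savings = no_ml_cost - ml_cost                # ~$44,520
```

Under these assumptions the model saves tens of thousands of dollars in telemarketer time alone, consistent with the claim made at the start of this post; the exact figure scales directly with the assumed call pace and wage.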
After reaching the likely responders, I would suggest the bank use the time saved by either strategy to target a different product to the customers unlikely to say yes. It could also be that the bank determines long-term deposits to be the most profitable product it can offer its customers, in which case it could simply call all of its potential customers and accept the higher costs. In the end, this is as far as machine learning can take us: the bank would need to conduct A/B tests of each of the three strategies before deploying the best one on all of its potential clients.
R Shiny App and Conclusions
I developed an R Shiny app to help bank employees determine whether they should offer a long-term deposit to a customer. The intended context is an in-bank meeting or a telephone call about a different issue: the employee can enter the customer’s information into the app to see whether they should also pitch the long-term deposit. The app reports the probability that the customer will say yes and suggests offering the product when that probability exceeds .5 for a conservative agent or .3 for an agent willing to take a bigger risk.
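The app itself is written in R Shiny; the decision rule it applies can be sketched in a few lines. The function name below is hypothetical and the two cutoffs are the ones described above.

```python
def should_pitch(prob_yes, conservative=True):
    """Recommend pitching the deposit when the model's predicted
    probability clears the agent's chosen threshold:
    .5 for a conservative agent, .3 for a risk-tolerant one."""
    threshold = 0.5 if conservative else 0.3
    return prob_yes >= threshold

# A customer scored at 0.4 is pitched only by the risk-tolerant agent
conservative_call = should_pitch(0.4, conservative=True)    # False
aggressive_call   = should_pitch(0.4, conservative=False)   # True
```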
I believe that had the data been collected around 2022, there would be more features available for building a much stronger predictive model. For example, the bank could apply NLP to transcripts of agent/client interactions to determine which agent behaviors lead to higher customer conversion rates. More broadly, in this era of expanding data collection there are almost certainly other features one could obtain to build an even stronger model, and the bank should consult experts in this regard. In addition, the financial crisis occurred during the data collection period.
However, timestamps are not available in the data and cannot be unambiguously inferred from other variables with a time component (such as the 3-month European interbank borrowing rate). Had the timestamps been available, I could have trained a model only on data points collected outside the 2008 financial crisis, likely yielding a stronger model. Finally, as briefly mentioned above, it is imperative to A/B test these machine learning models before using them in production and to monitor carefully for data and model drift once they are deployed.
Portuguese bank employee and call center employee salary information:
salaryexplorer.com and https://www.erieri.com/salary/job/call-center-agent/Portugal