A store can target campaigns at its customers to generate more revenue or to maintain customer loyalty, for example. But which customers should it target with which campaigns? Knowing which customers are stars and which are disengaging also helps the store decide where to put its efforts, both in campaigns and in other areas, such as the types of goods it carries and the quality of customers’ in-store or online shopping experience. The purpose of this study is to examine the store’s revenue and campaign effectiveness data in order to uncover such insights. The Kaggle data set of the store’s customers and purchases spans 796 days (2012-2014) and consists of 2240 individual observations on
- Customer characteristics (year of birth, education, income, number of children, etc.)
- Products (aggregated amounts spent on wines, fruit, sweets, etc.)
- Promotion acceptance for 5 campaigns
- Number of purchases made on the web, through the catalog, or in store, as well as the total number of web visits
It contains about 16 missing income observations and a few extreme outliers in income and age. Using this data, I run a clustering analysis to learn more about the store’s customers and offer suggestions targeted at each of the clusters. In the end, I identify two particularly important clusters, Cluster 1 and Cluster 5.
Cluster 5 consists of recent high-income customers who value the in-store experience and a polished catalog. They are not interested in deals, but will respond to a well-executed campaign. When revenue is normalized by the length of time spent as customers, this cluster brings the most revenue to the company.
Cluster 1 consists of high-income loyal customers who are responsible for most of the historic revenue brought to the company. Like Cluster 5, they value the in-store experience and a polished catalog, are not responsive to deals, but will respond to a well-structured campaign.
The following graph provides a glimpse at the store’s main revenue drivers, namely wine and meat products.
From the graph below, it is clear that the greatest number of purchases occurs in-store, followed by web purchases. There are also many web visits that are not necessarily tied to a purchase.
While these visualizations provide some insights, a much richer picture emerges from the clustering analysis discussed in the following sections.
Since I had no additional background on the store and its customers, I had no prior knowledge of an appropriate number of clusters for the store’s customers. Machine learning can help here, however, as clusters can be discovered based on geometric closeness in the feature space. Since categorical features generally do not have a clear associated notion of distance, I used only numeric features for clustering. However, once cluster labels are added to the data, it is possible to look at the way both numeric and categorical features vary among clusters. After performing missing value imputation, outlier removal, and feature scaling, I used the elbow method with KMeans to choose 6 as the number of clusters.
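A minimal sketch of this pipeline is shown below. It runs on synthetic data, and the column names, the median imputation, and the 1.5 × IQR outlier rule are assumptions made for illustration; the exact choices in the analysis may differ.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the numeric customer features (hypothetical column names).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "Income": rng.normal(50_000, 15_000, n),
    "MntWines": rng.gamma(2.0, 150.0, n),
    "NumStorePurchases": rng.poisson(5, n).astype(float),
})
df.loc[df.sample(5, random_state=0).index, "Income"] = np.nan  # simulate missing income
df.loc[0, "Income"] = 666_666.0  # simulate one extreme outlier

# 1. Impute missing income with the median (an assumed choice).
df["Income"] = df["Income"].fillna(df["Income"].median())

# 2. Drop extreme outliers with a 1.5 * IQR fence on income (an assumed rule).
q1, q3 = df["Income"].quantile([0.25, 0.75])
iqr = q3 - q1
clean = df[df["Income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# 3. Scale features so that no single feature dominates Euclidean distances.
X = StandardScaler().fit_transform(clean)

# 4. Elbow method: record the within-cluster sum of squares (inertia) for each k.
inertias = {}
for k in range(1, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(k, round(inertia, 1))
```

On real data, one would plot the inertia values against k and pick the point where the curve flattens. The synthetic data here has no built-in cluster structure, so it will not necessarily show a clear elbow.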
One could also apply a log-transform to the long-tailed features before feature scaling. This yields results that are qualitatively similar to the ones in this analysis. In addition to KMeans, I tried DBSCAN, which did not yield useful results for this data. DBSCAN is based on the notion of density, so it is good at separating dense regions from sparse ones. It struggles with high-dimensional data, however, so reducing the data to the most important features and retrying DBSCAN could perhaps yield better results. As future work, it could also be useful to try KModes clustering to see if additional insights can be generated once categorical features are taken into account.
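As a rough sketch of these two alternatives, the snippet below log-transforms long-tailed features and then runs DBSCAN. The data is synthetic, and the `eps` and `min_samples` values are illustrative assumptions that would need tuning on the real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Synthetic long-tailed "amount spent" features (stand-ins for the real columns).
spend = rng.lognormal(mean=4.0, sigma=1.0, size=(400, 2))

# log1p compresses the long right tail and is well defined at zero spend.
X = StandardScaler().fit_transform(np.log1p(spend))

# eps and min_samples are illustrative values, not tuned for any real data set.
labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(X)

# DBSCAN marks points in sparse regions as noise with the label -1.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(n_clusters, n_noise)
```

If DBSCAN returns a single large cluster plus noise, as it may on data without clear density gaps, that matches the kind of unhelpful result described above.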
Initial Observations and Revenue Analysis
From the graphs below, it is evident that while Clusters 2-4 are the largest clusters by number of customers, Clusters 1, 5, and 0 bring in the most revenue.
These higher-spending clusters are also the highest income earners, as the following graph indicates.
Here is another look at cluster incomes:
A crucial pattern emerges: If we look at total amounts spent by cluster, it appears that Cluster 1 is the most important, followed by Clusters 5 and 0.
However, if we normalize by the length of time each customer has been with the store, Cluster 5 spends the most, followed by Clusters 1 and 0.
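The normalization can be sketched as follows. The numbers are made up purely to illustrate how the ranking can flip and are not taken from the data set; the column names and the `as_of` reference date are also assumptions.

```python
import pandas as pd

# Toy data, invented for illustration only (not values from the data set).
customers = pd.DataFrame({
    "cluster": [1, 1, 5, 5],
    "total_spent": [1800.0, 1500.0, 1200.0, 900.0],
    "enrolled": pd.to_datetime(
        ["2012-08-01", "2012-10-15", "2014-01-10", "2014-03-01"]
    ),
})

# Hypothetical reference date marking the end of the observation window.
as_of = pd.Timestamp("2014-07-01")

# Tenure in days, then spending per day of tenure.
customers["days_as_customer"] = (as_of - customers["enrolled"]).dt.days
customers["spend_per_day"] = customers["total_spent"] / customers["days_as_customer"]

# Compare raw and tenure-normalized spending by cluster.
summary = customers.groupby("cluster")[["total_spent", "spend_per_day"]].mean()
print(summary)
```

In this toy example the longer-tenured cluster has the higher total spend, while the more recent cluster has the higher spend per day.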
I explored several dimensions along which Clusters 1 and 5 might differ (age, education, marital status, etc.) and found that the key distinguishing characteristic of Cluster 5 is the length of time spent as customers: Cluster 5 customers are more recent. The graph below shows this difference.
A key objective for the store is to keep Cluster 5 engaged and re-engage Cluster 1. Cluster 0 is similar to Clusters 1 and 5, but has somewhat lower income and more children. I’ll suggest a strategy to target it in the next section. I’m not providing the graphs here for conciseness, but Clusters 2-4 have lower income, are more likely to have children, and spend less. In what follows, I’ll look at campaign effectiveness and other means of engaging the customers.
Campaign Effectiveness and Other Observations
From the graph below, it is evident that campaigns were only moderately successful: In every cluster, the majority of the customers did not accept any campaign.
Campaigns 1 and 5 were more successful with Clusters 1 and 5, which suggests that the store may have tried to replicate parts of the first campaign in the last one.
Campaign 4 did better with Cluster 0, while other campaigns (not shown here) had a more uniform low acceptance rate.
It would be desirable to obtain more detail on Campaigns 1, 4, and 5 in order to determine how to run improved versions of them.
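Acceptance rates like the ones discussed above can be computed with a simple group-by once cluster labels are attached to the data. The sketch below uses toy data, and the column names (`cluster`, `AcceptedCmp1`, and so on) are assumptions about how the flags are stored.

```python
import pandas as pd

# Toy data with assumed column names; each flag is 1 if the customer accepted.
df = pd.DataFrame({
    "cluster":      [0, 0, 1, 1, 5, 5],
    "AcceptedCmp1": [0, 0, 1, 0, 1, 1],
    "AcceptedCmp4": [1, 0, 0, 0, 0, 1],
    "AcceptedCmp5": [0, 0, 1, 1, 1, 0],
})
campaign_cols = ["AcceptedCmp1", "AcceptedCmp4", "AcceptedCmp5"]

# The mean of a 0/1 flag within a cluster is that cluster's acceptance rate.
rates = df.groupby("cluster")[campaign_cols].mean()

# Share of customers in each cluster who accepted none of the listed campaigns.
df["accepted_none"] = df[campaign_cols].sum(axis=1).eq(0)
none_share = df.groupby("cluster")["accepted_none"].mean()

print(rates)
print(none_share)
```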
In terms of where customers make their purchases, Cluster 5 favors the store, followed by the catalog and the web.
Cluster 0, followed by Clusters 2-4, is the most responsive to deals, so the store should target its deals at these customers. For example, if the store wishes to retire a product or create more inventory space, targeting deals at Cluster 0 would be a good strategy.
Conclusions and Future Work
The store should concentrate on running well-structured campaigns to engage Cluster 1 even better and should monitor the engagement of Cluster 5. A pleasant in-store shopping experience and a polished catalog are critical for these customers, but the store should also study this demographic in detail to earn even more of their business.
Cluster 0 customers have higher than average income but more children (hence probably less disposable income), and they are responsive to deals and to an occasional campaign. It’s best to target deals at these customers.
Clusters 2-4 have lower than average income, accept deals at a higher rate, and visit the company’s website. The store can target deals and website promotions at this demographic.
In the future, the store can use the suggestions above to run A/B tests that target different deals at different customer clusters, with a focus on profitability. Furthermore, it is crucial to keep Cluster 5 customers engaged (through targeted campaigns, a positive in-store experience, and other measures yet to be identified) and to boost the future engagement of Cluster 1 customers through similar or even better targeted measures.
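One simple way to evaluate such an A/B test is a two-proportion z-test on acceptance rates. The sketch below uses made-up counts; the function name is hypothetical, and acceptance rate stands in for a profitability metric, which would need its own test.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants have the same rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up counts: deal A accepted by 30 of 200 customers, deal B by 48 of 200.
z, p = two_proportion_z_test(conv_a=30, n_a=200, conv_b=48, n_b=200)
print(round(z, 2), round(p, 3))
```

Running the test separately within each cluster keeps the comparison between deals from being confounded by differences between clusters.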
For future work, it would be helpful to have more information about the data set and the store in order to answer the following questions and take the following next steps:
- What do we know about the rationale behind each campaign? What distinguishes the campaigns?
- What more can we learn about our customers? Specifically, are there factors that differentiate Cluster 5 that are not in the data?
- What else can be learned about how the store makes customers’ in-store and online shopping experience pleasant? Are there ways to improve it?
- Can the store learn to do profitable business with Clusters 2-4?
- Do A/B testing to judge the effectiveness of recommendations
- Get more granular data on store purchases
- Get more data on profitability rather than just revenue, as profitability is the store’s key objective