dc.description.abstracten |
Modeling customer lifetime value (CLV) in retail is a complicated task because of limited access to historical purchase data, the difficulty of identifying customers, and the difficulty of linking purchase history to a particular customer. In this research, historical transactional data from twelve North American brick-and-mortar grocery stores were used to compare different approaches to CLV modeling in terms of segmentation and forecasting. A data engineering pipeline was applied to the raw transactional data to transform it into modeling-ready datasets, with the logic behind each derived feature explained. The K-Means, Gaussian Mixture Model (GMM), and DBSCAN clustering algorithms were applied to customer segmentation. The best clustering outputs were later tested in CLV modeling. Unexpectedly, K-Means outperformed both GMM and DBSCAN. For CLV modeling, two main approaches were considered: a probabilistic Markov chain model of purchase behavior changing over time, and an econometric time series revenue forecast combined with survival analysis lifespan estimates. Suggestions on CLV estimation for the offline retail business case were derived by comparing the results, given the advantages and limitations of each approach. The Markov chain model was suggested for assessing the general picture of ongoing processes from a long-term perspective. Time series revenue forecasting with survival analysis lifespan estimates, on the other hand, could be used to check expectations for the near future. Moreover, the business value of CLV estimates and their applications was shown with examples derived from the results of both models: identifying promising clusters, checking their stability and how they were formed, and determining which customers were at risk of churning. |
uk |