Journal: IPSI Transactions on Internet Research


Addressing Class Imbalance in Customer Response Modeling Using Random and Clustering-Based Undersampling and SVM

Authors: Kašćelan, Ljiljana and Vuković, Sunčica



Abstract

The main challenge in machine learning-based customer response models is class imbalance, i.e., the small number of respondents compared to non-respondents. To overcome this issue, this paper tests an approach that preprocesses the training data using a Support Vector Machine (SVM) trained either on a balanced sample obtained by random undersampling (B-SVM) or on a balanced sample obtained by clustering-based undersampling (CB-SVM). Several classifiers are then trained on the resulting balanced dataset and their predictive performance is compared. The results demonstrate that the approach effectively preprocesses the training data, reducing noise and overcoming the class imbalance problem. It achieves better predictive performance than standard training data balancing techniques such as undersampling and SMOTE. CB-SVM gives better sensitivity, while B-SVM gives a better balance between sensitivity and specificity. Organizations can use this approach to balance training data simply and automatically, and to select more efficiently the customers who should be targeted in upcoming direct marketing campaigns.
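As an illustration only, the following sketch shows the two building blocks the abstract refers to: random undersampling of the majority class, and the sensitivity and specificity measures used to compare models. It is not the authors' implementation. The function names, the toy data, and the 5% response rate are assumptions made for the example, and the SVM and clustering steps are omitted because they would require a machine learning library.

```python
import random


def random_undersample(rows, labels, seed=0):
    """Balance a binary dataset by randomly dropping majority-class rows.

    Labels are assumed to be 1 (respondent) or 0 (non-respondent).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row and an equally sized random majority subset.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels 1/0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical toy data: 5 respondents among 100 customers.
toy_labels = [1] * 5 + [0] * 95
toy_rows = [[i] for i in range(100)]
balanced_rows, balanced_labels = random_undersample(toy_rows, toy_labels, seed=42)
```

Here `balanced_labels` holds the same number of respondents and non-respondents, which is the kind of balanced sample an SVM would be trained on in the B-SVM variant. Sensitivity is the share of actual respondents that a model identifies, and specificity is the share of non-respondents it correctly excludes.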


Keywords

customer response model, data imbalance, data preprocessing, clustering, support vector machine


Published in: IPSI Transactions on Internet Research (Volume: 20, Issue: 2)
Publisher: IPSI, Belgrade

Date of Publication: July 1, 2024

Open Access: CC-BY-NC-ND
DOI: 10.58245/ipsi.tir.2402.00

Pages: 1-9

ISSN: 1820-4503





Ljiljana Kašćelan

Ljiljana Kašćelan, a full professor, has been at the Faculty of Economics, University of Montenegro, since 1992. Her main research areas are business databases (relational databases and SQL) and business intelligence (data warehouses, OLAP, big data, data mining, and machine learning with applications in business).
Email: ljiljak@ucg.ac.me, ORCID: 0000-0001-9831-7599


Sunčica Vuković

Sunčica Vuković is a teaching assistant at the Faculty of Economics, University of Montenegro. Her research interests are marketing analytics, direct and digital marketing, and data mining applications in business.
Email: suncica@ucg.ac.me


Cite this article

Kašćelan, Ljiljana and Vuković, Sunčica
"Addressing Class Imbalance in Customer Response Modeling Using Random and Clustering-Based Undersampling and SVM",
IPSI Transactions on Internet Research, vol. 20(2), pp. 1-9, 2024. https://doi.org/10.58245/ipsi.tir.2402.00