2025, Vol. 14, No. 04, pp. 49-56
Credit Card Default Risk Assessment and Prediction Based on Stacking Model Fusion
Foundation: Major Teaching Research Project of the Anhui Provincial Quality Engineering Program for Higher Education Institutions (2023jyxm0151)
Email: djheahnu@163.com
DOI: 10.19943/j.2095-3070.jmmia.2025.04.06
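Abstract (Chinese):

This paper uses a Stacking ensemble algorithm as the final early-warning model to predict the probability that a credit card user defaults in the following month. Data preprocessing and feature engineering are first applied to process the dataset thoroughly and select features, after which the data are balanced. Several machine learning algorithms are then trained and optimized, with hyperparameters tuned by five-fold cross-validation and grid search to find the best parameter combination for each model. Four evaluation metrics are introduced to compare the performance of the classification models. Comparing the metric values shows that random forest, AdaBoost, XGBoost, and LightGBM give the best predictions; AdaBoost, XGBoost, and LightGBM are therefore used as the base models and random forest as the meta-model of a two-layer Stacking ensemble. The results show that the Stacking ensemble classifies better than the individual classification models.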
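The abstract names 5-fold cross-validation, grid search, data balancing and four evaluation metrics, but it does not say which balancing method or which four metrics were used. The following is a minimal pure-Python sketch of such a tuning loop under stated assumptions: random oversampling of the minority (default) class stands in for the balancing step, accuracy, precision, recall and F1 are assumed as the four metrics, and a k-nearest-neighbour classifier stands in for the tuned models, with its `k` as the only grid parameter. All function names are hypothetical.

```python
import random

def four_metrics(y_true, y_pred):
    """Assumed metrics: accuracy, precision, recall, F1 (default = label 1 is positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

def random_oversample(X, y, seed=0):
    """Assumed balancing step: duplicate random minority rows until class counts match."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(X), list(y)  # nothing to duplicate
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    return [X[i] for i in idx], [y[i] for i in idx]

def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def knn_predict(train_X, train_y, test_X, k):
    """Stand-in classifier: majority vote of the k nearest training rows (use odd k)."""
    preds = []
    for x in test_X:
        near = sorted(zip(train_X, train_y),
                      key=lambda row: sum((a - b) ** 2 for a, b in zip(x, row[0])))[:k]
        votes = sum(t for _, t in near)
        preds.append(1 if 2 * votes > k else 0)
    return preds

def grid_search_cv(X, y, grid, k_folds=5, score="f1"):
    """Return (best_param, mean_cv_score) over the grid using k-fold cross-validation."""
    folds = kfold_indices(len(y), k_folds)
    best = None
    for k in grid:
        scores = []
        for f, val_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            # Balance the training folds only, so duplicates never reach validation.
            tr_X, tr_y = random_oversample([X[i] for i in train_idx],
                                           [y[i] for i in train_idx])
            preds = knn_predict(tr_X, tr_y, [X[i] for i in val_idx], k)
            scores.append(four_metrics([y[i] for i in val_idx], preds)[score])
        mean = sum(scores) / len(scores)
        if best is None or mean > best[1]:
            best = (k, mean)
    return best

if __name__ == "__main__":
    X = [[float(i)] for i in range(40)]
    y = [1 if i >= 30 else 0 for i in range(40)]  # imbalanced: 10 defaults, 30 non-defaults
    print(grid_search_cv(X, y, grid=[1, 3, 5]))
```

One design choice worth noting: the sketch oversamples only the training folds inside each cross-validation split, so duplicated minority rows never appear in the validation fold and inflate the score. The abstract does not state whether the paper balanced the data before or after splitting.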

Abstract:

With the rapid development of China's economy, the credit card business has grown quickly, and with it the risk of credit card delinquency faced by financial institutions. In this paper, we use a Stacking ensemble as the final early-warning model to predict the probability that a credit card user defaults in the following month. Data preprocessing and feature engineering are first applied to process the dataset thoroughly and select features, and the data are then balanced. Several machine learning algorithms are trained and optimized, with hyperparameters tuned by 5-fold cross-validation and grid search to find the best parameter combination for each model. Four evaluation metrics are introduced to compare the performance of the classification models. Comparing the metric values shows that random forest, AdaBoost, XGBoost, and LightGBM give the best predictions. AdaBoost, XGBoost, and LightGBM are therefore used as the base models and random forest as the meta-model of a two-layer Stacking ensemble. The results show that the Stacking ensemble classifies better than the individual classification models.
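The two-layer Stacking scheme described above can be sketched as follows, assuming binary labels (1 = default). XGBoost, LightGBM and scikit-learn are not in the Python standard library, so hypothetical stand-ins replace the paper's models: a decision stump, a nearest-centroid classifier and a 3-nearest-neighbour classifier act as the three base models, and another decision stump acts as the meta-model in place of the random forest. The sketch is meant to show the Stacking mechanics only: each base model's prediction for a training sample comes from a model fitted without that sample (out-of-fold), the meta-model is trained on those predictions, and the base models are then refitted on all the training data. All function names are hypothetical.

```python
import random

def fit_stump(X, y):
    """Decision stump: the single-feature threshold rule with the fewest training errors."""
    best = None  # (errors, feature, threshold, label predicted at or above threshold)
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for above in (0, 1):
                errors = sum((above if row[j] >= thr else 1 - above) != t
                             for row, t in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, thr, above)
    _, j, thr, above = best
    return lambda x: above if x[j] >= thr else 1 - above

def fit_centroid(X, y):
    """Nearest class centroid (stand-in base model)."""
    cents = {}
    for label in (0, 1):
        rows = [row for row, t in zip(X, y) if t == label]
        if rows:
            cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return lambda x: min(cents, key=lambda label: sq_dist(x, cents[label]))

def make_knn(k):
    """Return a fit function for a k-nearest-neighbour majority vote (use odd k)."""
    def fit(X, y):
        data = list(zip(X, y))
        def predict(x):
            near = sorted(data, key=lambda d: sum((p - q) ** 2 for p, q in zip(x, d[0])))[:k]
            return 1 if 2 * sum(t for _, t in near) > k else 0
        return predict
    return fit

def stacking_fit(X, y, base_fits, meta_fit, k=5, seed=0):
    """Two-layer Stacking: the meta-model learns from out-of-fold base predictions."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    oof = [[0] * len(base_fits) for _ in y]  # row i: base-model predictions for sample i
    for f, held_out in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        for m, fit in enumerate(base_fits):
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in held_out:
                oof[i][m] = model(X[i])  # predicted by a model that never saw sample i
    meta = meta_fit(oof, y)                   # layer 2: trained on out-of-fold predictions
    bases = [fit(X, y) for fit in base_fits]  # layer 1: refit on all training data
    return lambda x: meta([b(x) for b in bases])

if __name__ == "__main__":
    X = [[float(i), float(i % 3)] for i in range(30)]
    y = [1 if i >= 20 else 0 for i in range(30)]
    model = stacking_fit(X, y, [fit_stump, fit_centroid, make_knn(3)], meta_fit=fit_stump)
    print(model([0.0, 0.0]), model([29.0, 2.0]))
```

With the libraries available, scikit-learn's `StackingClassifier` (its `cv` parameter controls the out-of-fold split) performs the same procedure, and `xgboost.XGBClassifier` and `lightgbm.LGBMClassifier` are scikit-learn-compatible estimators that could be passed as base models. Note that `StackingClassifier` by default feeds predicted probabilities rather than hard labels to the meta-model when the base models support `predict_proba`; the abstract does not say which the paper used.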

Basic information:

CLC number: F832.2; TP18

Citation:

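He Daojiang (何道江), Mu Yuanyuan (母远缘). Credit card default risk assessment and prediction based on Stacking model fusion[J]. Mathematical Modeling and Its Applications (数学建模及其应用), 2025, 14(04): 49-56. DOI:10.19943/j.2095-3070.jmmia.2025.04.06.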
