Analysis of Distinct Feature Groups in the Credit Scoring Problem

Abstract

Registration and financial data have been successfully used for the credit scoring problem. However, even slight improvements in the reliability of the scores positively impact financial companies, so exploring new features is a strategic task. This work analyzes the importance of new feature groups not commonly employed for the credit scoring task alongside groups already in use. We categorized the features of open credit scoring datasets, such as German and Australian, and compared their groups with those of the company dataset used in this work. Our dataset contains unusual feature groups, such as historical, geolocation, web behavior, and demographic data. In our analyses, we first conducted bivariate tests with each feature pair to assess their individual importance. Second, we ran the XGBoost machine learning model with each feature group to evaluate each group's importance. Next, we employed correlation tests to find intra- and inter-group correlation among the feature groups. Finally, we used the full dataset and employed the AdaBoost, Multilayer Perceptron, and XGBoost algorithms to find the best model for the task. We analyzed the results with different metrics and compared them with the company's results. Our main finding was that the unusual features added a slight improvement over the standard dataset. We also identified the historical group as the most promising feature group and noticed that the tuned XGBoost performed better than the company solution in three of the four metrics evaluated.
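
The intra- and inter-group correlation step can be illustrated with a minimal sketch. The abstract does not say which correlation test or aggregation was used, so the sketch assumes Pearson correlation summarized as the mean absolute value over feature pairs. The feature names, group names, and values are hypothetical and do not come from the company dataset.

```python
import math
from itertools import combinations, product


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant feature carries no linear association
    return cov / (norm_x * norm_y)


def mean_abs_corr(pairs, data):
    """Mean absolute Pearson correlation over an iterable of feature-name pairs."""
    values = [abs(pearson(data[a], data[b])) for a, b in pairs]
    return sum(values) / len(values) if values else 0.0


def intra_group_corr(group, data):
    """Average correlation among the features inside one group."""
    return mean_abs_corr(combinations(group, 2), data)


def inter_group_corr(group_a, group_b, data):
    """Average correlation between the features of two different groups."""
    return mean_abs_corr(product(group_a, group_b), data)


# Hypothetical toy data: two "historical" features and one "geolocation" feature.
data = {
    "past_loans": [1, 2, 3, 4, 5, 6],
    "past_defaults": [2, 4, 6, 8, 10, 12],
    "region_code": [1, -1, 0, 0, -1, 1],
}
groups = {
    "historical": ["past_loans", "past_defaults"],
    "geolocation": ["region_code"],
}

intra = intra_group_corr(groups["historical"], data)
inter = inter_group_corr(groups["historical"], groups["geolocation"], data)
print(f"intra-group (historical): {intra:.3f}")
print(f"inter-group (historical vs geolocation): {inter:.3f}")
```

A high intra-group value suggests redundant features within a group, while a high inter-group value suggests that two groups carry overlapping information.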

Publication
JOURNAL OF INFORMATION AND DATA MANAGEMENT (JIDM'21)
Rodrigo Lira
Professor

Rodrigo Lira is a professor at IFPE with interests in swarm intelligence, machine learning, and IoT.
