Analysis of Distinct Feature Groups in the Credit Scoring Problem

Aug 21, 2021

Luiz F. V. Verçosa

Rodrigo Lira

Rodrigo P. Monteiro

Kleber D. M. Silva

Jailson O. Liberato

Alexandre M. A. Maciel

Byron L. D. Bezerra

Carmelo J. A. Bastos-Filho

Abstract

Registration and financial data have been successfully used for the credit scoring problem. However, even slight improvements in the reliability of the scores positively impact financial companies. Therefore, exploring new features is a strategic task. This work analyzes the importance of new feature groups not commonly employed for the credit scoring task, alongside others already in use. We categorized features from open credit scoring datasets, such as German and Australian, and compared their groups with those of a company dataset used in this work. Our dataset contains unusual feature groups, such as historical, geolocation, web behavior, and demographic data. In our analyses, we first conducted bivariate tests with each feature pair to assess their individual importance. Second, we ran an XGBoost machine learning model with each feature group to evaluate each group's importance. Next, we employed correlation tests to find inner- and inter-correlation among the feature groups. Finally, we used the full dataset and employed the AdaBoost, Multilayer Perceptron, and XGBoost algorithms to find the best model for the task. We analyzed the results with different metrics and compared them with the company's results. Our main finding was that the unusual features added a slight improvement over the standard dataset. We also identified the historical group as the most promising feature group and noticed that the tuned XGBoost performed better than the company solution in three out of four deployed metrics.
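As a rough illustration of the inner- and inter-group correlation step mentioned in the abstract, the sketch below computes a mean absolute correlation within each feature group and between each pair of groups. The abstract does not state which correlation test was used, so Pearson correlation is an assumption here, as is the choice of a mean absolute coefficient as the summary. The function names, group names, and toy data are all hypothetical and are not taken from the paper or the company dataset.

```python
from itertools import combinations, product
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined; treated as 0 here
    return cov / sqrt(var_x * var_y)


def mean_abs_corr(pairs, columns):
    """Average |r| over the given feature-name pairs (0.0 if there are none)."""
    values = [abs(pearson(columns[a], columns[b])) for a, b in pairs]
    return sum(values) / len(values) if values else 0.0


def group_correlations(columns, groups):
    """Return (inner, inter) mean absolute correlations for feature groups."""
    # Inner-correlation: every pair of features inside the same group.
    inner = {
        name: mean_abs_corr(combinations(feats, 2), columns)
        for name, feats in groups.items()
    }
    # Inter-correlation: every feature of one group against every feature of another.
    inter = {
        (g1, g2): mean_abs_corr(product(groups[g1], groups[g2]), columns)
        for g1, g2 in combinations(groups, 2)
    }
    return inner, inter


# Toy data, invented for illustration only (not from the paper's dataset).
columns = {
    "income": [1.0, 2.0, 3.0, 4.0, 5.0],
    "debt": [2.0, 4.0, 6.0, 8.0, 10.0],
    "page_views": [5.0, 3.0, 4.0, 1.0, 2.0],
}
groups = {"financial": ["income", "debt"], "web_behavior": ["page_views"]}

inner, inter = group_correlations(columns, groups)
```

A high inner value suggests redundancy among the features of a group, while a low inter value suggests that two groups may carry complementary information. For categorical features, a different association measure would be needed.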

Type

Journal article

Publication

JOURNAL OF INFORMATION AND DATA MANAGEMENT (JIDM'21)

Last updated on Aug 21, 2021

Authors

Luiz F. V. Verçosa

Rodrigo Lira

Professor

Professor at the Instituto Federal de Educação, Ciência e Tecnologia de Pernambuco (IFPE), with a PhD in Computer Engineering from the Universidade de Pernambuco (2025) in the area of Swarm Intelligence and Machine Learning. He holds a Master's degree (2014) and a Bachelor's degree (2013) in Computer Engineering from the same institution. He is conducting postdoctoral research in Systems Engineering at UPE. He is a member of the Conselho Superior (CONSUP) of IFPE and the current coordinator of the Technology in Systems Analysis and Development program at the Paulista Campus, and he also has experience as coordinator of the Research and Extension Division.

He is a member of the Sociedade Brasileira de Computação (SBC), IEEE, and the Complexity Systems Society. He participates or has participated in technological innovation projects with the Rede Nacional de Ensino e Pesquisa (RNP), the Universidade de Pernambuco, CESAR, and SENAI. He has coordinated projects at IFPE in partnership with institutions such as FACEPE, SiDi, IPA, SOFTEX, NIC.BR, and the Prefeitura de Paulista.