Analysis of Distinct Feature Groups in the Credit Scoring Problem

Aug 21, 2021
Luiz F. V. Verçosa, Rodrigo Lira, Rodrigo P. Monteiro, Kleber D. M. Silva, Jailson O. Liberato, Alexandre M. A. Maciel, Byron L. D. Bezerra, Carmelo J. A. Bastos-Filho
Abstract
Registration and financial data have been successfully used for the credit scoring problem. However, even slight improvements in the reliability of the scores positively impact financial companies, so exploring new features is a strategic task. This work analyzes the importance of feature groups not commonly employed for credit scoring alongside others already in use. We categorized features from open credit scoring datasets, such as the German and Australian datasets, and compared their groups with those of a company dataset used in this work. Our dataset contains unusual feature groups, such as historical, geolocation, web behavior, and demographic data. In our analyses, we first conducted bivariate tests with each feature pair to assess individual feature importance. Second, we ran the XGBoost machine learning model on each feature group to evaluate each group's importance. Next, we employed correlation tests to find intra- and inter-group correlations among the feature groups. Finally, we used the full dataset and employed AdaBoost, Multilayer Perceptron, and XGBoost algorithms to find the best model for the task. We analyzed the results with different metrics and compared them with the company's results. Our main finding was that the unusual features added a slight improvement over the standard dataset. We also identified the historical group as the most promising feature group and observed that the tuned XGBoost performed better than the company solution in three of the four metrics used.
Type
Publication
JOURNAL OF INFORMATION AND DATA MANAGEMENT (JIDM'21)
Rodrigo Lira
Authors
Professor

Professor at the Instituto Federal de Educação, Ciência e Tecnologia de Pernambuco (IFPE), with a PhD in Computer Engineering from the Universidade de Pernambuco (2025) in the area of Swarm Intelligence and Machine Learning. He holds a Master's degree (2014) and a Bachelor's degree (2013) in Computer Engineering from the same institution. He is conducting postdoctoral research in Systems Engineering at UPE. He is a member of the IFPE Superior Council (CONSUP) and the current coordinator of the Technologist program in Systems Analysis and Development at the Paulista Campus, and he also has experience as coordinator of the Research and Extension Division.

He is a member of the Sociedade Brasileira de Computação (SBC), IEEE, and the Complexity Systems Society. Since 2023, he has participated in technological innovation projects of the Rede Nacional de Ensino e Pesquisa (RNP). He has coordinated research and extension projects at IFPE in partnership with institutions such as FACEPE, SiDi, IPA, SOFTEX, NIC.BR, and the Paulista City Hall.