A Review of Machine Learning Techniques for Risk Evaluation in Healthcare and Insurance Systems
Abstract
Financial institutions need accurate estimates of loan default risk in order to reduce credit losses
and sustain lending. This study proposes a robust stacking-based machine learning framework that integrates Knowledge Graph
Embedding (KGE) for semantic feature enrichment with XGBoost as the final predictive model. The approach is evaluated on the
Home Credit Default Risk (HCDR) dataset, comprising diverse financial, demographic, and behavioral attributes of loan applicants.
A comprehensive preprocessing pipeline, including imputation, normalization, one-hot encoding, and correlation-based feature
selection, ensures data quality and model generalizability. The proposed KGE-XGBoost model captures both structured tabular and
relational semantics by transforming borrower-entity relationships into dense embeddings, which are concatenated with original
features to form a unified representation. Experimental results demonstrate superior performance with 96.79% accuracy (ACC),
80.83% precision (PRE), 78.75% recall (REC), and an F1-score (F1) of 79.00%. The proposed model
outperforms the baseline models (Random Forest: 94.20% ACC; neural network (NN): 89% ACC; decision tree (DT): 73% ACC),
particularly in scenarios with class imbalance. The KGE integration contributes substantially to feature expressiveness,
and the approach offers a scalable and promising credit risk assessment solution for real-world financial applications.
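
The feature-fusion step described in the abstract (dense entity embeddings concatenated with the original tabular features) can be illustrated with a minimal sketch. This is not the authors' implementation: the embedding table, the entity names, the embedding dimension, and the choice of mean pooling over linked entities are all assumptions made for illustration, and the numeric values are placeholders rather than trained embeddings.

```python
from typing import Dict, List

# Hypothetical pre-trained entity embeddings (illustrative values only,
# not taken from the paper or from any trained KGE model).
EMBEDDINGS: Dict[str, List[float]] = {
    "occupation:teacher": [0.12, -0.40, 0.33],
    "occupation:driver": [-0.25, 0.10, 0.05],
}
EMBED_DIM = 3  # assumed embedding size for this sketch


def unified_features(tabular: List[float], entities: List[str]) -> List[float]:
    """Concatenate tabular features with the mean embedding of linked entities.

    Mean pooling is an assumption; the abstract only states that embeddings
    are concatenated with the original features.
    """
    vectors = [EMBEDDINGS[e] for e in entities if e in EMBEDDINGS]
    if vectors:
        # Average each embedding dimension across the known entities.
        pooled = [sum(col) / len(vectors) for col in zip(*vectors)]
    else:
        # Fall back to a zero vector when no linked entity has an embedding.
        pooled = [0.0] * EMBED_DIM
    return list(tabular) + pooled


# One applicant with two (already normalized) tabular features.
row = unified_features([0.5, 1.0], ["occupation:teacher"])
```

The resulting vector has the tabular features first and the pooled embedding after them, so its length is the number of tabular features plus `EMBED_DIM`. A row built this way could then be passed to a gradient-boosted classifier such as XGBoost, as the abstract describes.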
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.