data-analytics-portfolio


Loan Default Risk Prediction

This notebook develops a machine learning workflow to predict the likelihood of loan defaults using borrower demographic, financial, and credit data. The model can help financial institutions assess applicant risk, minimize credit losses, and improve lending strategies.


Table of Contents

  1. Import Libraries
  2. Load the Dataset
  3. Exploratory Data Analysis (EDA)
  4. Preprocessing
  5. Principal Component Analysis (PCA)
  6. Model Training and Evaluation
  7. Business Insights
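
As a rough illustration of how steps 4 to 6 might fit together, the sketch below chains scaling, PCA, and a Random Forest in a single scikit-learn pipeline. It is not the notebook's actual code: the data is a synthetic stand-in generated with `make_classification`, and the feature count, class balance, 95% variance threshold, and model settings are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the loan data (assumption): 10 numeric features
# and an imbalanced target where roughly 12% of rows are "defaults".
X, y = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=6,
    weights=[0.88, 0.12],
    random_state=0,
)

# Stratify so the train and test sets keep the same default rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipeline = Pipeline([
    # Step 4: put all features on a comparable scale before PCA.
    ("scale", StandardScaler()),
    # Step 5: keep enough components to explain 95% of the variance.
    ("pca", PCA(n_components=0.95, svd_solver="full")),
    # Step 6: class_weight="balanced" upweights the rarer default class.
    ("model", RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    )),
])

pipeline.fit(X_train, y_train)

recall = recall_score(y_test, pipeline.predict(X_test))
n_components = pipeline.named_steps["pca"].n_components_
print(f"PCA kept {n_components} components; test recall = {recall:.3f}")
```

Wrapping the steps in one `Pipeline` means the scaler and PCA are fitted on the training split only, which avoids leaking test-set statistics into preprocessing.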

Notebook

Open Jupyter Notebook


Dataset

The dataset comes from Coursera’s Loan Default Prediction Challenge, hosted on Kaggle. It contains 255,347 records with borrower demographic, financial, and credit history features. All column names and descriptions are outlined below:


Key Results


Conclusion

This project demonstrates how machine learning can support credit risk management by identifying borrowers with a high likelihood of default. The optimized Random Forest model achieves strong recall, making it well-suited for early risk detection. Financial institutions could apply such models to screen applications, refine risk-based pricing, and reduce portfolio losses, ultimately improving lending strategies.
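
To make the recall claim concrete: recall is the share of actual defaulters that the model flags, TP / (TP + FN), so a high value means few risky borrowers slip through. The minimal sketch below computes recall and precision from label lists; the helper name `recall_precision` and the example labels are hypothetical and are not results from this project.

```python
def recall_precision(y_true, y_pred):
    """Return (recall, precision) for binary labels where 1 means default."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaults caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defaults missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Hypothetical labels for eight borrowers (illustration only).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 1, 0, 0, 1]

recall, precision = recall_precision(actual, predicted)
print(f"recall={recall:.2f}, precision={precision:.2f}")
```

In this toy case the model catches three of four defaulters and raises one false alarm. In a screening setting, a missed default is usually costlier than a false alarm, which is why recall is the headline metric here.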