This notebook develops a machine learning workflow to predict the likelihood of loan defaults using borrower demographic, financial, and credit data. The model can help financial institutions assess applicant risk, minimize credit losses, and improve lending strategies.
The dataset comes from Coursera’s Loan Default Prediction Challenge, hosted on Kaggle. It contains 255,347 records with borrower demographic, financial, and credit history features. All column names and descriptions are outlined below:
LoanID
- A unique identifier for each loan

Age
- The age of the borrower

Income
- The annual income of the borrower

LoanAmount
- The amount of money being borrowed

CreditScore
- The credit score of the borrower

MonthsEmployed
- The number of months the borrower has been employed

NumCreditLines
- The number of credit lines the borrower has open

InterestRate
- The interest rate for the loan

LoanTerm
- The term length of the loan in months

DTIRatio
- The Debt-to-Income ratio

Education
- The highest level of education attained by the borrower

EmploymentType
- The type of employment status of the borrower

MaritalStatus
- The marital status of the borrower

HasMortgage
- Whether the borrower has a mortgage

HasDependents
- Whether the borrower has dependents

LoanPurpose
- The purpose of the loan

HasCoSigner
- Whether the loan has a co-signer

Default
- Indicates whether the loan defaulted or not

This project demonstrates how machine learning can support credit risk management by identifying borrowers with a high likelihood of default. The optimized Random Forest model achieves strong recall, making it well-suited for early risk detection. Financial institutions could apply such models to screen applications, refine risk-based pricing, and reduce portfolio losses, ultimately improving lending strategies.
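
To make the emphasis on recall concrete, here is a minimal sketch of how recall and precision are computed for the `Default` target, and how lowering the decision threshold trades one for the other. The labels and predicted probabilities are made up for illustration; they are not outputs of this notebook's model. The helper names `confusion_counts` and `recall_precision` are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 means the loan defaulted."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def recall_precision(y_true, scores, threshold=0.5):
    """Flag a loan as a predicted default when its score meets the threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    # Recall: share of actual defaults that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: share of flagged loans that actually defaulted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Illustrative values only (assumption): 1 = default, 0 = repaid.
y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
scores = [0.80, 0.32, 0.55, 0.45, 0.10, 0.65, 0.40, 0.20, 0.35, 0.45]

for threshold in (0.5, 0.3):
    recall, precision = recall_precision(y_true, scores, threshold)
    print(f"threshold={threshold:.1f}  recall={recall:.2f}  precision={precision:.2f}")
```

On this toy data, lowering the threshold catches every default at the cost of flagging more borrowers who would have repaid. That trade-off is why a recall-oriented model suits early risk detection, where a missed default is usually costlier than an extra manual review. In practice, scikit-learn's `recall_score` and `precision_score` compute the same quantities from label arrays.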