data-analytics-portfolio


House Prices Prediction Using Machine Learning

Overview

This project is my machine learning submission for the Kaggle competition House Prices: Advanced Regression Techniques. The objective is to predict the sale price of houses in Ames, Iowa, from 79 explanatory variables describing various aspects of residential homes. The project combines feature engineering with ensemble modeling and stacking to achieve competitive performance. The final submission uses a stacked regressor ensemble, with a cross-validation RMSE (log scale) of approximately 0.123 and a test RMSE of 0.129.
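For reference, the log-scale RMSE quoted above can be computed as in the sketch below. It assumes the competition's convention of comparing the natural logarithms of predicted and observed prices. The function name `rmse_log` and the example prices are illustrative and are not taken from the notebook.

```python
import numpy as np


def rmse_log(y_true, y_pred):
    """RMSE between the natural logs of observed and predicted prices."""
    log_true = np.log(np.asarray(y_true, dtype=float))
    log_pred = np.log(np.asarray(y_pred, dtype=float))
    return float(np.sqrt(np.mean((log_pred - log_true) ** 2)))


# Hypothetical prices, for illustration only.
observed = [200_000, 150_000, 320_000]
predicted = [210_000, 140_000, 300_000]
score = rmse_log(observed, predicted)
print(f"log-scale RMSE: {score:.4f}")
```

Because the error is measured on logarithms, a 10% miss on a cheap house costs about as much as a 10% miss on an expensive one.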

View Notebook

Dataset

The dataset consists of two CSV files:

Approach

The workflow follows a structured pipeline:

1. Data Loading and Splitting: Load the training data and split it into train/validation sets (80/20) before any preprocessing, so that no statistics from the validation set leak into the fitted transformations.

2. Exploratory Data Analysis (EDA):

3. Missing Value Handling:

4. Feature Engineering:

5. Feature Transformation:

6. Categorical Encoding:

7. Modeling:

8. Submission: Predict on the test set and generate the submission CSV.
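The early steps of the pipeline above can be sketched as follows. This is a minimal illustration and not the notebook's code: the data is synthetic, the three column names are only stand-ins, and the choices of median imputation, one-hot encoding, a log1p target, and a Ridge model are assumptions made for the example. The point it shows is that splitting first and fitting all preprocessing inside a scikit-learn Pipeline keeps validation rows out of the fitted statistics.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the training data (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "GrLivArea": rng.normal(1500, 400, n).clip(400),
    "LotFrontage": rng.normal(70, 20, n).clip(20),
    "Neighborhood": rng.choice(["A", "B", "C"], n),
})
df.loc[rng.choice(n, 20, replace=False), "LotFrontage"] = np.nan
df["SalePrice"] = 50_000 + 80 * df["GrLivArea"] + rng.normal(0, 10_000, n)

X = df.drop(columns="SalePrice")
y = np.log1p(df["SalePrice"])  # assumed log transform of the target

# Step 1: split before anything is fitted (80/20).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Steps 3 and 6: imputation and encoding are learned from X_train only.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["GrLivArea", "LotFrontage"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Neighborhood"]),
])
model = Pipeline([("preprocess", preprocess), ("ridge", Ridge(alpha=1.0))])
model.fit(X_train, y_train)

residuals = model.predict(X_val) - y_val.to_numpy()
val_rmse = float(np.sqrt(np.mean(residuals ** 2)))
print(f"validation RMSE (log scale): {val_rmse:.4f}")
```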

Visualizations include:

Models and Performance

Base Models Evaluation

| Model | CV RMSE | Test RMSE |
|---|---|---|
| Ridge 🥈 | 0.134726 | 0.134243 |
| RandomForest 🥉 | 0.139180 | 0.144454 |
| GradientBoosting 🥇 | 0.125139 | 0.134765 |
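A comparison like the one in the table can be produced with k-fold cross-validation. The sketch below uses synthetic data and default-style hyperparameters, so its numbers will not match the table; the fold count, seeds, and settings are assumptions, and only the three model families come from this README.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data; y stands in for the log-transformed sale price.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

models = {
    "Ridge": Ridge(alpha=1.0),
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_rmse = {}
for name, model in models.items():
    # scikit-learn reports negated RMSE so that higher is better; flip the sign.
    scores = cross_val_score(
        model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
    )
    cv_rmse[name] = float(-scores.mean())

# Print models from best (lowest RMSE) to worst.
for name, value in sorted(cv_rmse.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.4f}")
```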

Stacked Ensemble
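As a rough illustration of stacking, the sketch below combines the three base model families with scikit-learn's StackingRegressor. The meta-learner (RidgeCV), the fold count, the hyperparameters, and the synthetic data are all assumptions; the notebook's actual configuration may differ.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import train_test_split

# Synthetic data; y stands in for the log-transformed sale price.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

stack = StackingRegressor(
    estimators=[
        ("ridge", Ridge(alpha=1.0)),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=RidgeCV(),  # assumed meta-learner
    cv=5,  # base-model predictions for the meta-learner come from held-out folds
)
stack.fit(X_train, y_train)
val_pred = stack.predict(X_val)
```

Training the meta-learner on out-of-fold predictions keeps it from simply rewarding whichever base model overfits the training rows most.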

Top Features (Across Models):

Residual analysis indicates mild heteroscedasticity at higher prices, with Gradient Boosting showing the most stable predictions.
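One simple way to check for this kind of heteroscedasticity is to compare the spread of residuals across price ranges. The helper below is a hypothetical sketch (the name `residual_spread_by_bin` and the synthetic data are not from the notebook): it sorts observations by true price, splits them into equal-sized bins, and reports the residual standard deviation in each.

```python
import numpy as np


def residual_spread_by_bin(y_true, y_pred, n_bins=4):
    """Standard deviation of residuals within equal-sized bins of true price."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    order = np.argsort(y_true)  # lowest prices first
    return [float(np.std(chunk)) for chunk in np.array_split(residuals[order], n_bins)]


# Synthetic example where the noise grows with price.
rng = np.random.default_rng(0)
prices = rng.uniform(100_000, 500_000, 2_000)
preds = prices + rng.normal(0, 0.05 * prices)
spreads = residual_spread_by_bin(prices, preds)
print([round(s) for s in spreads])
```

A spread that rises from the first bin to the last suggests the errors widen for more expensive homes.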

Insights

Sample submission predictions:

| Id | SalePrice |
|---|---|
| 1461 | 118,987 |
| 1462 | 158,082 |
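A submission file in this shape can be assembled with pandas. The sketch below assumes the model predicts on a log1p scale and that predictions are converted back with expm1; both are assumptions for illustration. The two log-scale values are constructed so that the back-transform reproduces the sample rows above.

```python
import numpy as np
import pandas as pd

# Hypothetical log-scale model outputs for two test rows.
test_ids = [1461, 1462]
log_preds = np.log1p([118_987.0, 158_082.0])

submission = pd.DataFrame({
    "Id": test_ids,
    "SalePrice": np.expm1(log_preds),  # undo the assumed log1p transform
})

# to_csv returns the CSV text when no path is given; pass a path to write a file.
csv_text = submission.to_csv(index=False)
print(csv_text)
```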

Requirements

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Contributions and issues are welcome! Please open a pull request or issue on GitHub.