This dataset offers a detailed collection of house listings from various cities and regions in Bangladesh, with a specific focus on Dhaka and Chittagong. It covers essential details such as location, property type, size, amenities, and price. The dataset is regularly updated to provide an accurate and current picture of the housing market in Bangladesh.
The dataset caters to diverse purposes, serving as a valuable resource for researchers, data scientists, real estate professionals, and investors. Some key use cases include:
- Analyzing regional price trends
- Identifying popular neighborhoods and amenities
- Training machine learning models for predicting housing prices
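The first use case, analyzing regional price trends, can be sketched with the Python standard library. The snippet computes the median listing price per location. The sample rows are hypothetical values for illustration, not taken from the dataset. In practice they would be read from the CSV file.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (location, price) rows for illustration only; not real listings.
listings = [
    ("Dhaka", 12_000_000),
    ("Dhaka", 8_500_000),
    ("Dhaka", 15_000_000),
    ("Chittagong", 6_000_000),
    ("Chittagong", 7_200_000),
]

def median_price_by_location(rows):
    """Group listings by location and return the median price for each location."""
    prices = defaultdict(list)
    for location, price in rows:
        prices[location].append(price)
    return {location: median(values) for location, values in prices.items()}

print(median_price_by_location(listings))
```

The median is used here instead of the mean because a few very expensive listings can pull a mean far away from a typical price.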
The following regression models have been applied to analyze and predict housing prices using this dataset:
- Linear Regression:
  - A fundamental model that assumes a linear relationship between the input features and housing prices.
- XGBRegressor (Extreme Gradient Boosting):
  - The regression estimator of XGBoost, a gradient boosting library known for its speed and predictive performance.
- LGBMRegressor (LightGBM):
  - The regression estimator of LightGBM, another gradient boosting framework designed for efficient, fast training.
- Random Forest:
  - An ensemble model that averages the predictions of many decision trees, each trained on a bootstrap sample of the data. It is often robust to noise and performs well on a wide range of datasets.
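The linear regression idea can be illustrated with a minimal sketch. It uses ordinary least squares on a single feature (size), fitted in closed form, and reports root mean squared error (RMSE) as an evaluation metric. The (size, price) pairs are hypothetical, and RMSE is an assumed metric choice. The notebook may report different metrics.

```python
from math import sqrt
from statistics import mean

def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares with one feature; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def rmse(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error, in price units."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))

# Hypothetical (size, price) pairs for illustration only.
sizes = [1000, 1200, 1500, 1800, 2000]
prices = [5_200_000, 5_900_000, 7_400_000, 9_100_000, 10_000_000]

intercept, slope = fit_simple_linear_regression(sizes, prices)
predictions = [intercept + slope * s for s in sizes]
print(f"price ~ {intercept:.0f} + {slope:.1f} * size, RMSE = {rmse(prices, predictions):.0f}")
```

XGBRegressor, LGBMRegressor, and scikit-learn's RandomForestRegressor all follow the scikit-learn estimator interface (`fit(X, y)` and then `predict(X)`). The same train-then-evaluate pattern therefore carries over to them once the features are encoded into a numeric matrix. For a fair comparison, compute metrics on held-out data rather than on the training rows as in this toy example.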
The dataset is structured with the following columns:
- Location: The geographical location of the property.
- Property Type: Categorization of the property (e.g., apartment, house).
- Size: Size or area of the property.
- Amenities: Features and facilities associated with the property.
- Price: The listed price of the property.
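The following minimal sketch reads these columns with Python's `csv` module. It makes three assumptions, and the actual file may differ on any of them:

- the CSV header uses exactly the column names listed above,
- Size and Price are stored as plain numbers, and
- Amenities is a comma-separated string.

Missing numeric values are kept as `None` so they can be handled during preprocessing. The sample text is hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

@dataclass
class Listing:
    location: str
    property_type: str
    size: Optional[float]   # None when the Size cell is empty
    amenities: list[str]
    price: Optional[float]  # None when the Price cell is empty

def to_float(value):
    """Convert a CSV cell to float, returning None for empty or missing cells."""
    value = (value or "").strip()
    return float(value) if value else None

def parse_listings(csv_text):
    """Parse CSV text (assumed header names) into Listing records."""
    listings = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        listings.append(Listing(
            location=row["Location"].strip(),
            property_type=row["Property Type"].strip(),
            size=to_float(row["Size"]),
            # Assumed format: amenities separated by commas within one cell.
            amenities=[a.strip() for a in (row["Amenities"] or "").split(",") if a.strip()],
            price=to_float(row["Price"]),
        ))
    return listings

# Hypothetical sample mirroring the assumed column layout.
sample = (
    "Location,Property Type,Size,Amenities,Price\n"
    'Dhaka,Apartment,1200,"Lift, Parking",8500000\n'
    "Chittagong,House,,Garden,6000000\n"
)
rows = parse_listings(sample)
```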
Before applying the regression models, the dataset underwent the following preprocessing steps:
- Handling Missing Data:
  - Missing values in crucial columns were handled either by imputation or by removing the affected rows.
- Encoding Categorical Variables:
  - Categorical variables such as "Property Type" were encoded numerically so that the regression models can use them.
- Feature Scaling:
  - Numerical features were scaled to comparable ranges. This mainly benefits Linear Regression. The tree-based models (Random Forest, XGBRegressor, LGBMRegressor) are largely insensitive to feature scaling.
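The three preprocessing steps can be sketched in plain Python. This sketch assumes median imputation, one-hot encoding, and z-score standardization, which are common choices picked for illustration. The notebook may use other techniques, such as mean imputation or label encoding. The sample values are hypothetical.

```python
from statistics import mean, median, pstdev

def impute_median(values):
    """Replace missing (None) entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def one_hot(categories):
    """One-hot encode category labels; returns (sorted column names, rows of 0/1)."""
    levels = sorted(set(categories))
    return levels, [[1 if c == level else 0 for level in levels] for c in categories]

def standardize(values):
    """Scale to zero mean and unit population standard deviation (z-scores)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical sample values for illustration only.
sizes = [1200, None, 1500, 900]
types = ["Apartment", "House", "Apartment", "House"]

sizes = impute_median(sizes)        # None -> median of 1200, 1500, 900 = 1200
columns, encoded = one_hot(types)   # columns: ['Apartment', 'House']
scaled_sizes = standardize(sizes)
```

In a real pipeline, compute the median, mean, and standard deviation on the training split only and reuse them on the test split. Otherwise, information from the test data leaks into training.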
To get started:
- Dataset Access:
  - Download the dataset in CSV format for your analysis.
- Run the Models:
  - Explore the implementation of Linear Regression, Random Forest, XGBRegressor, and LGBMRegressor in the notebook here.
- Customization:
  - Adapt the models or dataset features to your specific research questions or objectives.
Keep the following considerations in mind:
- Heterogeneity: Property listings can vary considerably from one another, which calls for careful consideration during analysis.
- Outliers: Extreme values in price or property size can distort model fits, so the way they are handled may affect model performance.
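One common way to flag price outliers is Tukey's interquartile-range (IQR) rule, sketched below with `statistics.quantiles`. The threshold `k=1.5` and the sample prices are assumptions for illustration. Whether to drop, cap, or keep extreme listings is an analysis decision, and this sketch is not necessarily how the notebook treats them.

```python
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Tukey fences: (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles; the middle value is the median
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(values, k=1.5):
    """Keep only values inside the Tukey fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]

# Hypothetical prices; the last one is an extreme listing.
prices = [5.5e6, 6.0e6, 6.2e6, 7.0e6, 7.4e6, 8.1e6, 9.0e6, 95.0e6]
kept = drop_outliers(prices)
```

Because the fences are based on quartiles rather than the mean, a single extreme listing does not shift them much. This makes the rule more robust than a fixed number of standard deviations.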
If you encounter issues or have suggestions for improvement, please open an issue or submit a pull request.
Happy analyzing!