- Overview
- Features
- Project Structure
- Installation
- Usage
- Dependencies
- Configuration
- Contributing
- License
- Contact
The Trustpilot Housing Reviews Analysis project aims to extract and analyze Trustpilot reviews for various housing providers. The project consists of two main components:
- Data Extraction: Automates the retrieval of Trustpilot reviews for specified housing providers.
- Classification: Processes and classifies the extracted reviews to identify key issues and sentiment trends.
This analysis helps in understanding tenant satisfaction, common complaints, and areas requiring improvement for housing providers.
- Funding: This project was funded by a Campion Grant awarded by Manchester Statistical Society. See https://manstatsoc.org/ for more information.
- Report: The report accompanying this project will be linked here once published: MSS REPORT LINK WHEN PUBLISHED.
- Automated Data Extraction: Scrapes Trustpilot reviews for selected housing providers.
- Data Cleaning and Preprocessing: Cleans the extracted data for accurate analysis.
- Text Classification: Categorizes reviews into predefined categories (e.g., Maintenance, Customer Service).
- Reporting: Generates summary reports and visualizations of findings.
SocialHousing/
│
├── Housing Association Review Classification and Theme Visualization.ipynb # Notebook for classifying reviews and visualizing themes in housing association data.
├── Keyword Analysis 1 Star Reviews.ipynb # Notebook for analyzing keywords within 1 star housing association reviews.
├── LICENSE # Project license file.
├── README.md # Repository overview, setup instructions, and usage guidelines.
├── Themes2D.xlsx # Excel file containing theme data, with keywords for HACT UK Data Standards classes, for visualization and further analysis.
├── Trustpilot Review Single Page Extractor.ipynb # Notebook for scraping reviews from a single Trustpilot page.
└── Trustpilot Review Extraction Compilation.ipynb # Notebook for systematically extracting across multiple Trustpilot pages.
- Python 3.8+: Ensure you have Python installed. You can download it from https://www.python.org/downloads/.
git clone https://github.com/yourusername/trustpilot-housing-reviews-analysis.git
cd trustpilot-housing-reviews-analysis
The data extraction component scrapes Trustpilot for reviews related to specified housing providers.
Edit the extraction.py file to specify the housing providers you want to analyze.
# Example
housing_providers = [
"msvhousing",
"clarionhousing",
"onehousing",
"onward",
"yourhousinggroup",
"jigsawhomes",
"placesforpeople",
"guinnesspartnership"
]
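As a rough sketch of how a list like this could drive extraction, the snippet below builds paginated review URLs for each provider. The URL template, the `page` query parameter, and the helper name `build_review_urls` are assumptions made for illustration and are not taken from the notebooks; check them against the actual Trustpilot pages before relying on them.

```python
from urllib.parse import urlencode

# Assumed base URL; a provider's real review page may use a different identifier.
BASE_URL = "https://www.trustpilot.com/review"


def build_review_urls(provider, max_pages):
    """Return one URL per results page for a single provider (hypothetical helper)."""
    urls = []
    for page in range(1, max_pages + 1):
        query = urlencode({"page": page})
        urls.append(f"{BASE_URL}/{provider}?{query}")
    return urls


housing_providers = ["msvhousing", "clarionhousing"]

# Map each provider to the list of page URLs a scraper would visit.
urls_by_provider = {p: build_review_urls(p, max_pages=2) for p in housing_providers}

for provider, urls in urls_by_provider.items():
    print(provider, urls)
```

Keeping URL construction separate from the fetching step makes it easy to inspect which pages would be requested before any scraping takes place.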
You can run the data extraction tool using the provided script or via a Jupyter notebook.
Using Jupyter Notebook:
Open Trustpilot Review Extraction Compilation.ipynb and run the cells sequentially.
The classification component processes the extracted reviews and categorizes them based on predefined criteria.
Using Jupyter Notebook:
Open Housing Association Review Classification and Theme Visualization.ipynb and run the cells sequentially.
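To give a feel for what categorizing reviews against predefined criteria can look like, here is a minimal keyword-matching sketch. It may differ from the approach used in the notebook. The themes, keywords, sample reviews, and the function name `classify_review` are all hypothetical; the project's real keywords live in Themes2D.xlsx and are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical themes and keywords, for illustration only.
THEME_KEYWORDS = {
    "Maintenance": ["repair", "leak", "damp", "boiler"],
    "Customer Service": ["rude", "phone", "email", "response"],
}


def classify_review(text, theme_keywords=THEME_KEYWORDS):
    """Return the themes whose keywords appear as whole words in the review."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return [
        theme
        for theme, keywords in theme_keywords.items()
        if any(keyword in words for keyword in keywords)
    ]


# Made-up example reviews, not real Trustpilot data.
reviews = [
    "The boiler broke and nobody came to repair it.",
    "Staff were rude on the phone.",
    "Reported a leak by email and got no response.",
]

# Count how many reviews mention each theme.
theme_counts = Counter(
    theme for review in reviews for theme in classify_review(review)
)
print(theme_counts)
```

A review can match more than one theme, and matching is on exact words only, so a plural such as "repairs" would not match "repair" without stemming or a longer keyword list.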
Required Python packages are:
- Requests: HTTP library for fetching web pages.
- BeautifulSoup4: HTML parsing for web scraping.
- pandas: Data manipulation and analysis.
- scikit-learn: Machine learning for classification.
- Matplotlib: Data visualization.
- Seaborn: Data visualization.
- Jupyter Notebook: Interactive development.
Contributions are welcome! Please follow these steps:
- Fork the Repository
- Create a Feature Branch
  git checkout -b feature/YourFeature
- Commit Your Changes
  git commit -m "Add some feature"
- Push to the Branch
  git push origin feature/YourFeature
- Open a Pull Request
This project is licensed under CC0-1.0; see the LICENSE file for details.
For any questions or suggestions, please open an issue or contact guy@fuza.co.uk.
Disclaimer: This project is not affiliated with Trustpilot or any of the housing providers mentioned. It is intended for educational and analytical purposes only.