This project is a web scraper that automates the process of extracting LinkedIn alumni data from a specific university. It collects information such as names, job titles, profile links, experience, education, and certifications.
- Automated Login: Supports auto login that able to bypass LinkedIn's bot detection.
- Manual Login: Safer option to bypass LinkedIn's bot detection.
- Dynamic Scrolling: Loads more profiles dynamically for comprehensive data extraction.
- Profile Scraping: Extracts detailed information such as work experience, education, and certifications.
- Data Storage: Saves extracted data into a CSV file for further analysis.
- City-Based Search: Scrapes alumni based on city names from a predefined list.
Before running the scraper, ensure you have the following dependencies installed:
pip install -r requirements.txt
selenium>=4.10.0
– Automates web interactions.webdriver-manager>=4.0.1
– Manages WebDriver installations.beautifulsoup4>=4.12.0
– Parses HTML content.lxml>=4.9.0
– Faster XML and HTML parsing.
pandas>=2.1.0
– Data manipulation and analysis.numpy>=1.25.0
– Numerical computing.openpyxl>=3.1.2
– Reads and writes Excel files.python-dotenv>=1.0.0
– Loads environment variables.
-
Install Dependencies
pip install -r requirements.txt
-
Set Up Credentials Create a
.env
file in the project directory and add your LinkedIn credentials:LINKEDIN_EMAIL=your-email@example.com LINKEDIN_PASSWORD=your-password
-
Download ChromeDriver Ensure you have the appropriate version of ChromeDriver installed. The script will attempt to download it automatically using
webdriver-manager
. -
Prepare City List Ensure that the
Data/Person Locations/indonesia_cities.csv
file contains a list of cities in a column namedCity
. -
Prepare Class Code
('div', { 'class': 'YqprdwMdlHkSDMqLRuVsNMDuqpfpOSlCY EUugwXMAWHNSsJUZCvVoLYGTUzCejokiBUPPY aDbiGyAraCVAtqkDKUGRiLuhDZgkXmYiMA' # Make sure this Code is UP TO DATE })
Ensure that theClass
, code from your Linkedin is Up To Date, the Class Code on the program might be different due to Linkedin Dynamic SectionClass
Code.💡Tips: Place your Cursor in the Border of the Section While Inspect With Cursor
Run the script with the following command:
python main.py
- The script will prompt you to log in manually to LinkedIn.
- After logging in, press Enter in the terminal to continue.
- The script will automatically login to your LinkedIn, ensure your Email and Password on
.env
are correct. - Don't do to much, otherwise the Linkedin Anti-Scraping System will notice unusual request and your account can get restriction.
- Press Enter to continue scraping the next profile.
- Type
next
to skip to the next city. - Type
exit
to stop the script immediately.
The extracted data will be saved in:
Data/LinkedIn_SCU_Alumni.csv
with the following fields:
- City
- Name
- Job Title
- LinkedIn Profile Link
- Profile Picture URL
- Experience
- Education
- Licenses & Certifications
- Scraping LinkedIn data is against their terms of service; use this tool responsibly.
- Avoid running the script too frequently to prevent detection.
- Ensure your LinkedIn account is in good standing before scraping.
This project is for educational purposes only. Use at your own risk.
Author: Faiz Noor Adhytia
Contact: faizadhytia24@gmail.com