A project done for the course CSE3024 - Web Mining under Dr. A. Bhuvaneswari
- AKASH R 20BCE1501 Github: akash-r34
- ABRAR AHAMED 20BCE1437 Github: Abrar-Ahamed
- AKSHAY GIRISH 20BCE1573 Github: Akshaykviit023
- Python 3.x
- NumPy
- Pandas
- Matplotlib
- Seaborn
- Requests
Getting data from an API can be considered a type of web mining; specifically, it can be classified as web data mining or web content mining.
Web data mining refers to the process of extracting data from various sources on the World Wide Web, including web pages, web documents, and web databases. In this case, the API serves as a web-based data source that can be accessed through standard web protocols such as HTTP.
Web content mining, on the other hand, refers to the process of extracting useful information from the content of web pages, such as text, images, and videos. While an API may not necessarily provide access to raw web page content, it can still be used to extract structured data, such as transit schedules or real-time vehicle locations.
Therefore, while getting data from an API is not a traditional form of web mining, it can still be classified as a type of web data mining or web content mining, depending on the nature of the data being retrieved.
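A minimal sketch of how such an API could be polled with Requests and each reading appended to `Dataset/data.csv`. The README does not name the transit API, so `API_URL`, the JSON field `occupancy_pct` and the CSV column names are hypothetical placeholders. The interval of 60 / 2.2 ≈ 27 seconds simply matches the roughly 2.2 samples per minute used in this project.

```python
import csv
import time
from datetime import datetime, timezone
from pathlib import Path

import requests

# Hypothetical endpoint and field name: the real transit API is not named in this README.
API_URL = "https://example.com/api/vehicles/weekend-rit-hotel"
CSV_PATH = Path("Dataset/data.csv")
FIELDS = ["timestamp", "occupancy_pct"]
INTERVAL_S = 60 / 2.2  # about 27 s between requests, i.e. ~2.2 samples per minute


def fetch_sample(url: str = API_URL, timeout: float = 10.0) -> dict:
    """Request one reading and stamp it with the current UTC time."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on HTTP 4xx/5xx instead of storing bad data
    payload = response.json()
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "occupancy_pct": float(payload["occupancy_pct"]),  # assumed JSON field
    }


def append_sample(sample: dict, path: Path = CSV_PATH) -> None:
    """Append one row to the CSV file, writing the header if the file is new."""
    path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(sample)


def poll(duration_s: float, url: str = API_URL, path: Path = CSV_PATH) -> None:
    """Poll the API for `duration_s` seconds, skipping samples that fail."""
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        try:
            append_sample(fetch_sample(url), path)
        except (requests.RequestException, KeyError, TypeError, ValueError) as exc:
            print(f"skipped sample: {exc}")
        time.sleep(INTERVAL_S)
```

Calling `poll(5 * 60 * 60)` would cover a 5-hour evening session. Failed requests are logged and skipped rather than stopping the run, so a brief network outage only leaves a gap that the later per-minute averaging can absorb.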
It is possible to sample the data of many buses and bus routes and store it in a database. For this project, we concentrate on a single data sample for a single bus on a single route. The sample was collected in March over roughly 5 hours (6 PM to 10:40 PM) in the evening on the Weekend RIT (Rochester Institute of Technology) Hotel bus; data collection was limited at the time because of the Coronavirus pandemic. The route has 6 distinct stops, and a full round trip takes 35 minutes. The bus service runs from 7:00 AM until 1:40 AM. You may get the complete bus timetable here.
Samples are taken at an approximate rate of 2.2 per minute and saved in a CSV file (Dataset/data.csv). The data is first averaged into 1-minute bins, so that every minute of the roughly 5-hour sampling period has a single passenger occupancy value. The data is also averaged into 5-minute bins and, because a round trip takes 35 minutes, into 35-minute bins. The 35-minute bins serve as the basis for optimising the timetable.
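The binning step maps naturally onto pandas' `resample`. The sketch below assumes the CSV has `timestamp` and `occupancy_pct` columns (hypothetical names) and that empty minutes are filled by linear interpolation, which the README does not specify. The 5- and 35-minute bins are computed from the 1-minute series so that every minute carries equal weight, and `origin="start"` aligns the bins with the first sample (for example 6 PM) rather than with midnight.

```python
from typing import Dict

import pandas as pd


def load_samples(path: str = "Dataset/data.csv") -> pd.Series:
    """Load raw samples; assumes 'timestamp' and 'occupancy_pct' columns."""
    df = pd.read_csv(path, parse_dates=["timestamp"])
    return df.set_index("timestamp")["occupancy_pct"].sort_index()


def bin_mean(series: pd.Series, minutes: int) -> pd.Series:
    """Average values over consecutive bins of `minutes`, starting at the first sample."""
    return series.resample(f"{minutes}min", origin="start").mean()


def multi_resolution(samples: pd.Series) -> Dict[int, pd.Series]:
    """Return 1-, 5- and 35-minute averages of the occupancy samples."""
    # One value per minute; empty minutes are filled by linear interpolation
    # (an assumption: the README does not say how gaps were handled).
    per_minute = bin_mean(samples, 1).interpolate()
    return {
        1: per_minute,
        5: bin_mean(per_minute, 5),
        35: bin_mean(per_minute, 35),  # one bin per 35-minute round trip
    }
```

With the project data, `multi_resolution(load_samples())[35]` would give one average occupancy value per round-trip slot.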
To determine whether a bus should operate based on passenger occupancy data, you would need to establish a minimum threshold for passenger occupancy that is required for a bus to be considered financially viable to operate. This threshold will likely vary depending on the specific circumstances of the bus route, such as the cost of fuel and labor, the price of tickets, and any subsidies or government funding that may be available. Once you have established this threshold, you would then need to analyze the passenger occupancy data for each bus route to determine whether the number of passengers on a given trip meets or exceeds the minimum threshold. If the occupancy level is below the threshold, you would need to consider whether there are any factors that may be contributing to the low ridership, such as poor scheduling or routing, lack of marketing or outreach, or competition from other modes of transportation.
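One way to turn these cost factors into a number is a break-even calculation: a trip pays for itself when fare revenue covers the part of its operating cost that subsidies do not. The sketch below is illustrative only. The function names are hypothetical, it treats each occupied seat as one paying passenger per trip (a simplification, since occupancy is an average load rather than a count of boardings), and the figures in the test are made up, not taken from the project.

```python
from typing import List


def break_even_threshold(cost_per_trip: float, subsidy_per_trip: float,
                         fare: float, capacity: int) -> float:
    """Minimum occupancy (fraction of capacity) at which fares cover the unsubsidised cost.

    Simplification: every occupied seat is treated as one paying passenger per trip.
    """
    if fare <= 0 or capacity <= 0:
        raise ValueError("fare and capacity must be positive")
    uncovered_cost = max(cost_per_trip - subsidy_per_trip, 0.0)
    passengers_needed = uncovered_cost / fare
    return min(passengers_needed / capacity, 1.0)  # cap at a full bus


def meets_threshold(occupancy: List[float], threshold: float) -> List[bool]:
    """Flag each trip whose occupancy (fraction of capacity) reaches the threshold."""
    return [occ >= threshold for occ in occupancy]
```

Trips flagged `False` are the candidates for the further analysis described above (scheduling, routing, outreach) before any service is cut.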
If you determine that a bus route is not meeting the minimum occupancy threshold and there are no obvious solutions to increase ridership, it may be necessary to consider reducing the frequency of service or eliminating the route altogether. However, it's important to carefully weigh the potential impacts on passengers and the broader transportation network before making such decisions.
Because passenger occupancy is very low outside peak hours, particularly in the evening once most classes have finished, a single threshold (for example, 10%) can leave long stretches with no bus scheduled at all. Setting separate thresholds for normal working/class hours (9 AM to 7 PM) and non-working hours (7 PM onward) keeps the timetable lean while still guaranteeing a convenient service for customers. In the second row of the above picture, applying a threshold of 5% to all time slots from roughly 7 PM onward increases the number of yellow blocks, each of which marks a time slot in which the bus will run.
The final row of the following diagram demonstrates how dynamic thresholding, which uses 10% during working hours and 5% outside of them, can be used to improve bus schedules. Green blocks mark the time slots in which the bus will run, and red blocks mark the slots in which it will not. For this 5-hour data sample, the optimised schedule has the bus running in only 4 of 8 time slots, or 50% of the operating window. Even at this reduced frequency, the bus does not reach its maximum capacity, so overcrowding is avoided.
As seen in the above graphic, the dynamic thresholding approach has the bus run for half as long as it usually does over this sample. The Weekend RIT Hotel bus covers around 10 miles per round trip, so skipping 4 of the 8 round trips reduces the total distance travelled over the 5-hour period by about 40 miles.
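The dynamic thresholding rule can be sketched as a vectorised comparison over the 35-minute slot averages. This is a minimal sketch, not the project's code: `dynamic_schedule` is a hypothetical name, occupancy is assumed to be in percent, and each slot is classified by the hour of its start time, so a slot starting at 6:50 PM still uses the working-hours threshold. The occupancy values in the test are invented for illustration.

```python
import numpy as np
import pandas as pd


def dynamic_schedule(slot_occupancy: pd.Series,
                     working_pct: float = 10.0, off_hours_pct: float = 5.0,
                     work_start_hour: int = 9, work_end_hour: int = 19) -> pd.DataFrame:
    """Decide, slot by slot, whether the bus runs.

    `slot_occupancy` holds the mean occupancy (in percent) of each 35-minute slot,
    indexed by the slot's start time. Slots starting in working hours use
    `working_pct`; all other slots use `off_hours_pct`.
    """
    hours = slot_occupancy.index.hour
    in_working_hours = (hours >= work_start_hour) & (hours < work_end_hour)
    threshold = np.where(in_working_hours, working_pct, off_hours_pct)
    values = slot_occupancy.to_numpy()
    return pd.DataFrame(
        {
            "occupancy_pct": values,
            "threshold_pct": threshold,
            # NaN occupancy compares False, so slots with no data are not run.
            "run": values >= threshold,
        },
        index=slot_occupancy.index,
    )
```

`plan["run"].mean()` then gives the fraction of slots in which the bus operates, which is the 50% figure quoted above for the project's own sample.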
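The distance figure follows from simple arithmetic: the 6 PM to 10:40 PM window is 280 minutes, which holds 8 round-trip slots of 35 minutes, and skipping 4 of them at about 10 miles each removes about 40 miles. The helper name below is hypothetical.

```python
from datetime import datetime


def distance_saved(start: str, end: str, round_trip_minutes: int,
                   slots_run: int, round_trip_miles: float) -> tuple:
    """Return (number of round-trip slots in the window, miles not driven)."""
    fmt = "%H:%M"
    window = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    window_minutes = int(window.total_seconds() // 60)
    total_slots = window_minutes // round_trip_minutes  # one round trip per slot
    skipped_slots = total_slots - slots_run
    return total_slots, skipped_slots * round_trip_miles


total_slots, miles = distance_saved("18:00", "22:40", round_trip_minutes=35,
                                    slots_run=4, round_trip_miles=10.0)
print(total_slots, miles)  # 8 40.0
```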
An optimized bus schedule can help transportation planners make data-driven decisions that improve the efficiency and sustainability of bus services, and ultimately the mobility and quality of life of passengers. By matching service to demand, transit agencies can run their operations more efficiently, reduce fuel consumption, and cut operating costs. Passengers can also benefit from more reliable service and shorter wait times when service is concentrated where demand is highest. The approach has been tested on real data from a working bus route, which is a promising sign, although its effectiveness should also be verified on other routes and at other times.
This project also has the potential to help the environment in several ways. By optimizing bus schedules to reduce fuel consumption and operating costs, transit agencies can also reduce their carbon footprint and other harmful environmental impacts. According to the US Department of Energy, the average fuel economy of a transit bus is only 3.26 miles per gallon, so each mile not driven saves roughly 0.3 gallons of fuel; an optimized schedule does not improve fuel economy itself, but by cutting the miles driven it reduces the total fuel burned. Additionally, more reliable service makes public transit a more attractive option for commuters, which can reduce car use and the greenhouse gas emissions from cars and trucks.
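Combining the Department of Energy figure with the roughly 40 miles saved in the 5-hour sample gives a sense of scale. This sketch assumes the 3.26 miles-per-gallon average applies to the Weekend RIT Hotel bus, which the README does not confirm; the function name is hypothetical.

```python
def fuel_saved_gallons(miles_saved: float, mpg: float = 3.26) -> float:
    """Gallons of fuel not burned when `miles_saved` miles are removed from the schedule."""
    return miles_saved / mpg


print(f"{fuel_saved_gallons(1.0):.2f} gallons per mile")     # 0.31 gallons per mile
print(f"{fuel_saved_gallons(40.0):.1f} gallons per session")  # 12.3 gallons per session
```

Because the saving scales linearly with distance, the same function can be applied to a full day or week of optimized service once those mileage reductions are known.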