Earnings and Leave Statements (pay stubs) from https://www.employeeexpress.gov are available only as PDFs, in either HTML or Text format, not as downloadable spreadsheets. This program scrapes the relevant information from the Text-format PDF of each statement: pay (gross and net), deductions (federal and state taxes, health/dental/vision, TSP, retirement, etc.), benefits paid by the government (e.g., TSP matching), and annual and sick leave balances. It also captures items that do not appear in every biweekly statement, such as annual or sick leave used and bonuses (cash or leave) earned.
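As a rough illustration of the scraping step, the sketch below pulls labeled dollar amounts out of statement text with regular expressions and records `None` for items that are missing from a given statement. It assumes the PDF text has already been extracted into a string. The field labels, the sample layout, and the `parse_statement` function are hypothetical and are not taken from the program itself, so the real statements may need different patterns.

```python
import re
from decimal import Decimal

# Hypothetical labels; the actual statement layout may differ.
FIELDS = {
    "gross_pay": "GROSS PAY",
    "net_pay": "NET PAY",
    "federal_tax": "FEDERAL TAX",
    "cash_award": "CASH AWARD",  # example of an item absent from most statements
}

# A dollar amount such as 3,456.78 (optionally negative).
AMOUNT = r"(-?[\d,]+\.\d{2})"


def parse_statement(text):
    """Return a dict mapping each field to a Decimal, or None if absent."""
    result = {}
    for key, label in FIELDS.items():
        match = re.search(re.escape(label) + r"\s+" + AMOUNT, text)
        if match:
            result[key] = Decimal(match.group(1).replace(",", ""))
        else:
            result[key] = None  # item not present in this statement
    return result


# Made-up sample text standing in for one extracted statement.
sample = """\
GROSS PAY      3,456.78
FEDERAL TAX      412.30
NET PAY        2,100.05
"""
parsed = parse_statement(sample)
```

Returning `None` for absent items keeps every statement's record the same shape, which makes it straightforward to stack the records into one table later.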
The second part of the program imports and cleans personal expenses from year-end Chase and American Express credit card statements through 2020. This supports analysis of top spending categories, changes in spending by category across years, and net savings relative to pay.
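The kind of analysis described here can be sketched as follows. This is a minimal example under stated assumptions: the rows are made-up data, the function names `top_categories` and `net_savings` are hypothetical, and net savings is assumed to mean net pay minus total card spending for the year, which may not match the program's exact definition.

```python
from collections import defaultdict
from decimal import Decimal

# Made-up cleaned rows: (year, category, amount).
expenses = [
    (2019, "Groceries", Decimal("120.50")),
    (2019, "Travel", Decimal("300.00")),
    (2019, "Groceries", Decimal("80.25")),
    (2020, "Groceries", Decimal("210.00")),
    (2020, "Travel", Decimal("50.00")),
]


def top_categories(rows, year, n=3):
    """Return the n largest (category, total) pairs for one year."""
    per_category = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row_year, category, amount in rows:
        if row_year == year:
            per_category[category] += amount
    ranked = sorted(per_category.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


def net_savings(net_pay, rows, year):
    """Assumed definition: net pay for the year minus card spending."""
    spent = sum((amt for y, _, amt in rows if y == year), Decimal("0"))
    return net_pay - spent


top_2019 = top_categories(expenses, 2019, n=1)
saved_2019 = net_savings(Decimal("1000.00"), expenses, 2019)
```

Using `Decimal` rather than floats avoids rounding drift when many small transactions are summed.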
Input data (pay stubs and expense statements) are not included in the GitHub repository, to protect personal information. However, a subset of the combined credit card statements from both financial institutions is included in the Output folder, along with an output graph of expenses by category.