The following is a data analysis task completed as part of a volunteering activity at Amazon. Special thanks to the Catalogue Specialist team for providing the opportunity to showcase this work on GitHub.
- Update (Dec 2021): The Python code was shortened and simplified using a smarter regex and a function-based approach (version 2.0). Additionally, merchant IDs (tokens ranging from 10 to 15 characters) are now included in the search pattern.
Use a Python script (run in the JupyterLab IDE) to extract specific text from a collection of computer logs stored in an Excel file. We use a pandas DataFrame and string pattern matching to accomplish this task.
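
The general approach can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the column name `log`, the sample log lines, the helper `extract_tokens`, and the exact pattern are all assumptions. In particular, it assumes that ASINs and merchant IDs appear as standalone runs of 10 to 15 uppercase letters and digits.

```python
import re

import pandas as pd

# Assumption: identifiers are standalone runs of 10-15 uppercase
# letters/digits. The real notebook may use a different pattern.
TOKEN_PATTERN = re.compile(r"\b[A-Z0-9]{10,15}\b")


def extract_tokens(text):
    """Return every candidate identifier found in one log entry."""
    if not isinstance(text, str):
        # Empty Excel cells arrive as NaN/None rather than strings.
        return []
    return TOKEN_PATTERN.findall(text)


# In practice the logs would be loaded with pd.read_excel(...);
# a small in-memory frame with made-up entries stands in here.
logs = pd.DataFrame(
    {
        "log": [
            "Email received for ASIN B0EXAMPLE1 from merchant A1B2C3D4E5F6G",
            "No identifiers in this entry",
            None,
        ]
    }
)

# Apply the extractor row by row and keep the matches in a new column.
logs["tokens"] = logs["log"].apply(extract_tokens)
print(logs["tokens"].tolist())
```

Wrapping the pattern in a single function keeps the matching logic in one place, so widening the search (for example, to a different token length) only requires changing the regex. Note that this simple pattern would also match any all-uppercase word of 10 to 15 letters, so real logs may need a stricter expression.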
Name | Purpose |
---|---|
Email_asins_extract.ipynb | Jupyter notebook containing the extraction code (version 1.0) |
asin_exctrator.ipynb | Jupyter notebook containing the extraction code (version 1.1) |
Script guide.pdf | A PDF guide explaining how the script works |
script_result.png | Image showing the script output when the latest notebook is run (truncated to fit) |
A notebook environment with a Python 3 kernel, or an IDE that supports one (recommended: JupyterLab)