
PictoDocReader Wiki

Abhijeet Praveen edited this page Jan 23, 2022 · 2 revisions

What inspired us

Because of COVID-19, many students like us have become accustomed to doing their schoolwork, projects and even hackathons remotely. This has led students to rely on the online resources at their disposal to manage their workload at home. One of the most used tools is “Ctrl+F”, which lets the user quickly locate any text within a document. However, we realised that no comparably accurate method exists for locating an image within a document. This led to the birth of our project, “PictoDocReader”.

What we learned

We learned how to use Dash to create a seamless user interface in Python. We also studied several 2D and 3D pattern matching algorithms, such as Knuth-Morris-Pratt, Bird-Baker, Karp-Rabin and Aho-Corasick, but implemented only those that gave the fastest and most accurate results. Furthermore, we learned how to convert PDFs to images (.png), which taught us about the colour profiles of images and how to manipulate the RGB values of any image using the numpy and matplotlib libraries. We also learned how to use threading in Python to run tasks concurrently, and how to use Google Cloud Storage to let users store their images and documents on the cloud.

How we built our project

The only dependencies we required to create the project were PIL, matplotlib, numpy, dash and Google Cloud.

PIL - Used for converting a PDF file to a list of .png files and manipulating the colour profiles of an image.

matplotlib - Used to plot an image and convert it to its corresponding matrix of RGB values.

numpy - Used for data manipulation on RGB matrices.

dash - Used to create an easy-to-use, seamless user interface.

Google Cloud - Used to enable users to store their images and documents on the cloud.

All the algorithms and techniques used to parse and validate pixels were programmed by the team members. Because these algorithms do not depend on any external library, we had full control to adapt them to any scenario we encountered.
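To make the idea of “Ctrl+F for images” concrete, here is a minimal brute-force sketch, not the team's code: it slides the query image over a page's pixel matrix and reports every position where all pixels match exactly. The name `find_subimage` is hypothetical, and the sketch assumes both inputs are numpy arrays with the same channel layout (for example, both RGB).

```python
import numpy as np


def find_subimage(page: np.ndarray, query: np.ndarray) -> list[tuple[int, int]]:
    """Return the (row, col) of every top-left corner where `query` appears exactly in `page`.

    Both arrays are pixel matrices of shape (H, W) or (H, W, C) with the same channel layout.
    """
    if page.ndim != query.ndim or page.shape[2:] != query.shape[2:]:
        raise ValueError("page and query must have the same channel layout")
    ph, pw = page.shape[:2]
    qh, qw = query.shape[:2]
    matches = []
    # Try every placement of the query that fits entirely inside the page.
    for r in range(ph - qh + 1):
        for c in range(pw - qw + 1):
            if np.array_equal(page[r:r + qh, c:c + qw], query):
                matches.append((r, c))
    return matches


# Example: a 2x2 red square drawn at row 2, column 3 of a black 5x6 RGB page.
page = np.zeros((5, 6, 3), dtype=np.uint8)
page[2:4, 3:5] = [255, 0, 0]
query = np.full((2, 2, 3), [255, 0, 0], dtype=np.uint8)
print(find_subimage(page, query))  # [(2, 3)]
```

For an H×W page and an h×w query this costs O(H·W·h·w) pixel comparisons in the worst case, which is why faster 2D pattern matching algorithms are worth the effort on full-resolution PDF pages.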

Built With

Python version 3.9

matplotlib version 3.4.3

numpy version 1.20.3

PIL version 2020.9.1

dash version 2.0.0

Google Cloud

Challenges we faced

The first challenge we faced was the inconsistency between the pixel matrices produced for different documents: some contained three channels per pixel (RGB), while others contained four (RGBA). This led to inconsistent results when we traversed the matrices. We solved the problem with numpy slicing, keeping only the first three (RGB) channels so that every matrix had a uniform shape.
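A minimal sketch of that kind of fix, assuming the matrices are numpy arrays of shape (H, W, 3) or (H, W, 4) with the alpha channel last; `to_rgb` is a hypothetical helper name, not taken from the project.

```python
import numpy as np


def to_rgb(pixels: np.ndarray) -> np.ndarray:
    """Return the RGB channels of an (H, W, 3) RGB or (H, W, 4) RGBA pixel matrix."""
    if pixels.ndim != 3 or pixels.shape[2] not in (3, 4):
        raise ValueError(f"expected shape (H, W, 3) or (H, W, 4), got {pixels.shape}")
    # Keep channels 0, 1, 2 (red, green, blue); for RGBA input this drops the alpha channel.
    # Basic slicing returns a view, so no pixel data is copied.
    return pixels[:, :, :3]


rgba_page = np.zeros((4, 4, 4), dtype=np.uint8)  # e.g. a page rendered with transparency
rgb_query = np.zeros((2, 2, 3), dtype=np.uint8)
print(to_rgb(rgba_page).shape, to_rgb(rgb_query).shape)  # (4, 4, 3) (2, 2, 3)
```

Dropping alpha this way ignores transparency: a semi-transparent pixel keeps its raw RGB values instead of being blended with the background. If that matters, the image could be composited onto a white background before slicing.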

Another challenge was finding the 2D and 3D pattern matching algorithms with the best time complexity. Most of the algorithms we researched were designed for square images and square documents, whereas we needed to handle images and documents of any size. We therefore had to experiment with the algorithms and modify them to work well for our application.
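Karp-Rabin, one of the algorithms mentioned earlier, extends to rectangular patterns of any size. The sketch below is not the project's implementation: it hashes every width-w window of each row, then rolls a height-h hash down each column of those row hashes, and verifies every hash hit pixel by pixel so that collisions cannot produce false matches. The bases, the modulus and the names `find_2d_karp_rabin`, `window_hashes` and `pack_rgb` are illustrative assumptions.

```python
import numpy as np

MOD = (1 << 61) - 1     # Mersenne prime modulus for all rolling hashes
ROW_BASE = 1_000_003    # illustrative base for hashing along a row
COL_BASE = 998_244_353  # illustrative base for combining row hashes down a column


def pack_rgb(pixels: np.ndarray) -> np.ndarray:
    """Pack an (H, W, 3) uint8 RGB matrix into an (H, W) matrix of 24-bit integers."""
    p = pixels.astype(np.int64)
    return (p[:, :, 0] << 16) | (p[:, :, 1] << 8) | p[:, :, 2]


def window_hashes(seq: list[int], width: int, base: int) -> list[int]:
    """Karp-Rabin hash of every length-`width` window of `seq`, computed by rolling."""
    top = pow(base, width - 1, MOD)  # weight of the element that leaves the window
    h = 0
    for v in seq[:width]:
        h = (h * base + v) % MOD
    hashes = [h]
    for j in range(width, len(seq)):
        h = ((h - seq[j - width] * top) * base + seq[j]) % MOD
        hashes.append(h)
    return hashes


def find_2d_karp_rabin(text: np.ndarray, pattern: np.ndarray) -> list[tuple[int, int]]:
    """Return (row, col) of every exact occurrence of a rectangular 2D `pattern` in `text`."""
    n1, n2 = text.shape
    m1, m2 = pattern.shape
    if m1 == 0 or m2 == 0:
        raise ValueError("pattern must be non-empty")
    if m1 > n1 or m2 > n2:
        return []
    t = text.tolist()  # Python ints: no int64 overflow in the modular arithmetic
    p = pattern.tolist()

    # Pattern hash: hash each full row, then hash the column of row hashes.
    p_hash = window_hashes([window_hashes(row, m2, ROW_BASE)[0] for row in p], m1, COL_BASE)[0]

    # Step 1: hash every width-m2 window in every text row.
    row_hashes = [window_hashes(row, m2, ROW_BASE) for row in t]

    matches = []
    # Step 2: for each column offset, roll a height-m1 hash down the row hashes.
    for c in range(n2 - m2 + 1):
        column = [row_hashes[r][c] for r in range(n1)]
        for r, h in enumerate(window_hashes(column, m1, COL_BASE)):
            # Verify hash hits exactly so a collision can never report a false match.
            if h == p_hash and all(t[r + i][c:c + m2] == p[i] for i in range(m1)):
                matches.append((r, c))
    return sorted(matches)  # row-major order


# Example: a non-square 2x3 pattern inside a 4x7 page.
text = np.zeros((4, 7), dtype=np.int64)
text[1:3, 2:5] = [[1, 2, 3], [4, 5, 6]]
pattern = np.array([[1, 2, 3], [4, 5, 6]])
print(find_2d_karp_rabin(text, pattern))  # [(1, 2)]
```

For an n1×n2 page and an m1×m2 pattern, the hashing work is O(n1·n2) in expectation, plus O(m1·m2) to verify each hash hit, instead of O(n1·n2·m1·m2) for brute force. An RGB page can be searched by passing `pack_rgb(page)` and `pack_rgb(query)`, which turns the 3D pixel matrices into 2D ones.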

When we worked with large PDF files, the program searched for the image in each page one by one, which made scanning slow. To bring the scanning time of large PDFs down to a viable level, we introduced threading so that pages were scanned concurrently rather than sequentially. However, we have since realised that threading is not ideal: in CPython, the global interpreter lock prevents threads from executing Python bytecode in parallel, so CPU-bound scanning gains little from it. In an ideal world we would use parallel processing (multiprocessing) instead, although its speed-up is limited by the number of CPU cores on the user's system.
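A sketch of scanning pages with processes rather than threads, using the standard library's `concurrent.futures`. The per-page check is a brute-force placeholder, `page_contains` and `scan_pages` are hypothetical names rather than the project's code, and the `use_processes` switch exists only to compare the two approaches.

```python
import itertools
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Optional

import numpy as np


def page_contains(page: np.ndarray, query: np.ndarray) -> bool:
    """Hypothetical per-page check: brute-force exact search for `query` in one page."""
    qh, qw = query.shape[:2]
    for r in range(page.shape[0] - qh + 1):
        for c in range(page.shape[1] - qw + 1):
            if np.array_equal(page[r:r + qh, c:c + qw], query):
                return True
    return False


def scan_pages(pages: list[np.ndarray], query: np.ndarray,
               use_processes: bool = True, max_workers: Optional[int] = None) -> list[int]:
    """Return the indices of the pages that contain `query`, checking pages in parallel.

    With processes (the default), each worker runs in its own interpreter, so CPU-bound
    checks are not serialised by the GIL. max_workers=None lets the executor choose a
    worker count based on the machine's CPU count.
    """
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map preserves input order, so result i belongs to page i.
        found = list(pool.map(page_contains, pages, itertools.repeat(query)))
    return [i for i, hit in enumerate(found) if hit]
```

With processes, `scan_pages(pages, query)` should be called from inside an `if __name__ == "__main__":` block, which the `spawn` start method (the default on Windows and macOS) requires. Each page is pickled and sent to a worker, so the gain is largest when the per-page search is slow compared with that copying, and it cannot exceed the number of CPU cores available.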
