help.txt

Welcome to PDFViewer help!

PDFViewer is a GUI application written in Python 3 and tkinter that lets you view PDF and image files.

Here is a short description of the tools PDFViewer provides.


Open Files:

The 'Open Files' function allows you to open one or more files in PDFViewer.

On clicking the button, you're prompted with a dialog box where you can select all the files you want to open. Once you're done, PDFViewer cycles through the documents in the list and lets you view each one.

PDF files are opened directly in the viewer. However, when an image file is opened, Google's OCR engine, Tesseract, is run on the image to extract readable text. Once this is done, PDFViewer converts the image file into a PDF file containing the extracted text and loads this PDF into the viewer.
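
As a rough illustration of the rule above (PDFs are opened directly, images go through OCR first), here is a minimal Python sketch. The function name plan_open and the list of image extensions are assumptions made for this example; they are not taken from PDFViewer's source.

```python
from pathlib import Path

# Hypothetical set of image extensions; the formats PDFViewer
# actually accepts are not listed in this help file.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def plan_open(paths):
    """Split selected paths into PDFs (opened directly) and images (OCR first).

    Files with any other extension are ignored in this sketch.
    """
    pdfs, images = [], []
    for p in paths:
        suffix = Path(p).suffix.lower()
        if suffix == ".pdf":
            pdfs.append(p)
        elif suffix in IMAGE_SUFFIXES:
            images.append(p)
    return pdfs, images
```

For example, plan_open(["a.pdf", "scan.PNG", "notes.txt"]) puts "a.pdf" in the first list and "scan.PNG" in the second.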


Open Directory:

The 'Open Directory' function works the same way as the 'Open Files' function, but it opens all the files inside a chosen directory in one step.


Clear:

The 'Clear' option lets you clear any modifications you have made to the current page.


Search Text:

The 'Search Text' option allows you to search for text on the current page.

On clicking the button, you are prompted to enter the text you want to search for. PDFViewer then searches the entire page and marks every word that contains that text.
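
The matching rule described above (mark every word that contains the search text) can be sketched in a few lines of Python. The function name find_matches is hypothetical, and the case-insensitive comparison is an assumption; this help file does not say whether PDFViewer's search is case-sensitive.

```python
def find_matches(words, query):
    """Return the indices of the words that contain the query as a substring.

    Assumption: matching ignores case. An empty query matches nothing.
    """
    q = query.lower()
    if not q:
        return []
    return [i for i, word in enumerate(words) if q in word.lower()]
```

With the words "Invoice", "total" and "voice", searching for "voice" marks the first and third words, because "Invoice" contains the search text.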


Extract Text:

The 'Extract Text' option allows you to draw a bounding box on the current page and get the text that you marked with the bounding box.

On clicking the button, the viewer becomes drawable. Draw a bounding box around the text you want to extract from the page. Once done, click on the 'Extract Text' button again.

PDFViewer then shows you the word that's closest to the bounding box you drew.
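
One way to picture "the word closest to the bounding box" is to compare the center of the box you drew with the center of each word's box. The Python sketch below does that. It is only an illustration under stated assumptions: the function names, the (x0, y0, x1, y1) box layout and the center-to-center distance are all hypothetical, and PDFViewer may measure closeness differently.

```python
def center(box):
    """Return the center point of a box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def closest_word(words, drawn_box):
    """Return the text of the word whose box center is nearest to the drawn box.

    words is a list of (text, (x0, y0, x1, y1)) pairs.
    Returns None when the page has no words.
    """
    if not words:
        return None
    cx, cy = center(drawn_box)

    def squared_distance(item):
        wx, wy = center(item[1])
        return (wx - cx) ** 2 + (wy - cy) ** 2

    return min(words, key=squared_distance)[0]
```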


Run OCR:

The 'Run OCR' option allows you to run Google's Optical Character Recognition (OCR) engine, Tesseract, on the current PDF document.

This comes in handy when an image file is opened using PDFViewer.

Another scenario where you may need OCR is when a PDF document contains embedded images and you need to extract text from them.

On clicking the button, Tesseract is run on the current PDF and the extracted text is stored as another PDF file, which you can then save to your system.


Next/Previous File:

The 'Next File' and 'Previous File' options allow you to cycle through the list of files that were opened using PDFViewer.
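
The idea of cycling through the opened files can be sketched as a list with a current position. The class name FileList is hypothetical, and wrapping around from the last file to the first (and back) is an assumption; this help file does not say what PDFViewer does at the ends of the list.

```python
class FileList:
    """Keeps track of the opened files and which one is currently shown."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.index = 0

    def current(self):
        """Return the current file, or None if no files are open."""
        return self.paths[self.index] if self.paths else None

    def next(self):
        """Move to the next file, wrapping to the first after the last (assumed)."""
        if self.paths:
            self.index = (self.index + 1) % len(self.paths)
        return self.current()

    def previous(self):
        """Move to the previous file, wrapping to the last before the first (assumed)."""
        if self.paths:
            self.index = (self.index - 1) % len(self.paths)
        return self.current()
```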


Viewer:

The PDF Tool Bar gives you functions to navigate and adjust the view of your PDF documents.
- Use the 'Next Page' and 'Previous Page' buttons to move through the pages in your PDF.
- Use the 'First Page' and 'Last Page' buttons to jump directly to the first or last page.
- Use the 'Zoom In', 'Zoom Out' and 'Fit-To-Screen' buttons to make the current page bigger or smaller.
- Use the 'Rotate' button to rotate PDF pages.
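
The zoom, fit-to-screen and rotate buttons above can be thought of as changes to a small piece of view state. The Python sketch below illustrates that idea. All names are hypothetical, and the zoom step of 1.25 and the 90-degree rotation step are assumptions chosen for the example, not PDFViewer's actual values.

```python
ZOOM_STEP = 1.25      # assumed zoom factor per click
ROTATE_STEP = 90      # assumed rotation per click, in degrees


class ViewState:
    """Holds the scale and rotation applied to the current page."""

    def __init__(self):
        self.scale = 1.0
        self.rotation = 0

    def zoom_in(self):
        self.scale *= ZOOM_STEP

    def zoom_out(self):
        self.scale /= ZOOM_STEP

    def fit_to_screen(self, view_w, view_h, page_w, page_h):
        """Pick the largest scale at which the whole page fits in the view."""
        self.scale = min(view_w / page_w, view_h / page_h)

    def rotate(self):
        """Rotate by one step and wrap back to 0 after a full turn."""
        self.rotation = (self.rotation + ROTATE_STEP) % 360
```

In this sketch, 'Fit-To-Screen' uses the smaller of the two width and height ratios so that neither dimension of the page is cut off.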