GitHub - wearethoughtfox/amnesty-urgent-action-pdfs: Parse PDFs and display as HTML

The brief

In the wilds of .org, the Urgent Action (UA) PDFs would need to be extracted and displayed, initially on the pages below. If possible, this means finding and converting the first PDF URL listed in the select box or, for a single-language UA, the PDF linked from the button. The conversion would need to happen during page load without excessive delay (UAs are quite small files).

The script requires pdftotext and the pdftotextjs wrapper on the machine running the code. pdftotext was installed via brew install xpdf.
In the NodeJS script we have defined a set of labels based on the sample PDF provided.
We need to know all possible labels, but not all labels need to be in all PDFs.
The script uses pdftotext through the pdftotextjs wrapper to convert the sample PDF to a string.
Then it looks through that string for all the labels defined and splits the string up into separate key value pairs based on the labels.
It outputs these as an object.
This object is written to a file.
This file is loaded into index.html.
It populates a simple template with minimal styling as a proof of concept.
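
The label-splitting step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function name splitByLabels, the label names and the sample text are all made up, and the real script takes its text from pdftotext rather than from a hard-coded string.

```javascript
// Hypothetical labels for illustration only; the real set is based on the sample PDF.
const LABELS = ['Summary', 'Please write', 'Contact'];

// Split `text` into { label: value } pairs.
// Labels that do not occur in the text are simply left out of the result.
function splitByLabels(text, labels) {
  // Locate the first occurrence of each label, drop missing ones,
  // and order the hits by their position in the text.
  const found = labels
    .map((label) => ({ label, index: text.indexOf(label) }))
    .filter((hit) => hit.index !== -1)
    .sort((a, b) => a.index - b.index);

  const result = {};
  found.forEach((hit, i) => {
    // A value runs from the end of its label to the start of the next label.
    const start = hit.index + hit.label.length;
    const end = i + 1 < found.length ? found[i + 1].index : text.length;
    result[hit.label] = text.slice(start, end).trim();
  });
  return result;
}

// Made-up sample text standing in for the pdftotext output.
const sample = 'Summary\nA short case summary.\nContact\nSomeone, Some Street';
console.log(JSON.stringify(splitByLabels(sample, LABELS), null, 2));
```

Because missing labels are filtered out before splitting, a PDF that lacks some labels still produces a valid object. A label that also appears inside body text would be matched at its first occurrence, so this sketch assumes labels are distinctive enough not to collide with ordinary content.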

targetContactDetails doesn’t have any line breaks for some reason.
The page numbers “1” and “2” are shown when perhaps we don’t need them.
Links to https://www.amnesty.org/en/documents/ could be marked up in some way, but this gets into the way you display the data.
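
Two of the notes above (stray page numbers and links to the documents section) could be handled with small post-processing helpers along these lines. This is only a sketch under assumptions: the function names are hypothetical, page numbers are assumed to sit on lines of their own, and the example URL path is invented.

```javascript
// Drop lines that consist solely of digits, on the assumption that such
// lines are page numbers. A genuine value that is just a number on its
// own line would be removed as well.
function stripPageNumberLines(text) {
  return text
    .split('\n')
    .filter((line) => !/^\s*\d+\s*$/.test(line))
    .join('\n');
}

// Wrap URLs under https://www.amnesty.org/en/documents/ in anchor tags.
// Assumes the surrounding text is already safe to insert as HTML.
function linkDocumentUrls(text) {
  const pattern = /https:\/\/www\.amnesty\.org\/en\/documents\/[^\s<>"]*/g;
  return text.replace(pattern, (match) => {
    // Keep trailing sentence punctuation outside the link.
    const url = match.replace(/[.,;:)]+$/, '');
    const trailing = match.slice(url.length);
    return `<a href="${url}">${url}</a>${trailing}`;
  });
}

// Made-up input for illustration.
const raw = 'First page\n1\nSee https://www.amnesty.org/en/documents/abc/123/en/.\n2';
console.log(linkDocumentUrls(stripPageNumberLines(raw)));
```

Whether the links should be marked up at this stage or left to the template is a display decision, so a helper like linkDocumentUrls may belong in the front-end code rather than in the parsing script.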

css
javascripts
sample
.gitignore
index.html
package-lock.json
package.json
pdf2json-test.js
pdf2json-test.json
pdfoutput.json
pdftotext-alternative.js
pdftotext-test.js
readme.md