ferguson-grand-jury-transcripts

We have converted the 5,000-page Ferguson Grand Jury transcript PDF into something more readable. The conversion makes the transcript searchable and linkable. We hope these features add to the national discussion about institutional racism and police brutality. We're using the SayIt service by mySociety.

Conversion

The Ferguson Grand Jury testimony PDF is 5,000 pages long. It was transcribed over 100 days by different people, so the formatting is inconsistent. There is also a lot of redacted content. We've done the best we can to get these transcripts up and live. Done is better than perfect. If anything looks wrong, compare it to the officially released PDF.

How You Can Help

Share the transcripts with your local media.
Find and share the most important parts of the transcripts.
Review the transcripts and write down any issues.
Improve our converter by writing code to fix issues.

Tasks

OCR the transcripts
Use pdf2txt.py (PDFMiner) to convert the PDF to XML
Use parse_transcript_xml.py to convert the XML to formatted text
Use the converted_text_to_akoma_ntoso.py script to convert the formatted text to Akoma Ntoso
Upload to SayIt
Promote to media and activists
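The Akoma Ntoso step turns speaker-attributed text into a structured debate document. The sketch below is not converted_text_to_akoma_ntoso.py; it is a minimal illustration assuming the formatted text has already been split into (speaker, paragraphs) pairs. The helper names `person_ref` and `speeches_to_akoma_ntoso` and the id scheme are hypothetical, and the namespace and `<meta>`/`<references>` block a complete document would carry are deliberately left out.

```python
import re
import xml.etree.ElementTree as ET

def person_ref(name):
    # Hypothetical id scheme: lowercase the name and collapse any run of
    # non-alphanumeric characters into a single hyphen.
    return "#" + re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

def speeches_to_akoma_ntoso(title, speeches):
    """Return a simplified Akoma Ntoso debate document as a string.

    `speeches` is a list of (speaker, [paragraph, ...]) tuples. The namespace
    and the <meta>/<references> block (which would declare each speaker that
    a `by` attribute points to) are omitted to keep the sketch short.
    """
    root = ET.Element("akomaNtoso")
    debate = ET.SubElement(root, "debate", name="debate")
    body = ET.SubElement(debate, "debateBody")
    section = ET.SubElement(body, "debateSection")
    ET.SubElement(section, "heading").text = title
    for speaker, paragraphs in speeches:
        # One <speech> per turn, with the speaker label in <from>.
        speech = ET.SubElement(section, "speech", by=person_ref(speaker))
        ET.SubElement(speech, "from").text = speaker
        for text in paragraphs:
            ET.SubElement(speech, "p").text = text
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    # Hypothetical input; real speaker labels come from the parsed transcript.
    print(speeches_to_akoma_ntoso("Volume 1", [("Prosecutor", ["Good morning."])]))
```

Building the tree with ElementTree rather than string concatenation means characters such as `&` and `<` in testimony are escaped automatically.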

Raw Transcripts

http://graphics8.nytimes.com/newsgraphics/2014/11/24/ferguson-assets/grand-jury-testimony.pdf

OCR Transcripts

https://www.dropbox.com/s/67unqhdrb8jhgr0/Ferguson%20Grand%20Jury%20Testimony.pdf?dl=0

PDF to text script

parse_transcript_xml.py is a script that parses the XML files created by running pdf2txt.py (PDFMiner) on the OCR Transcripts.

The XML files generated by PDFMiner preserve formatting information, including where each line of text sits on the page. By examining text indentation, we can more accurately identify text attributed to individuals whose names have been redacted from the transcript.
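The indentation idea can be sketched with Python's standard library. PDFMiner's XML output nests `<textline>` elements (each with a `bbox="x0,y0,x1,y1"` attribute) inside `<textbox>` elements, and writes one `<text>` element per character. This is not the logic of parse_transcript_xml.py: the rule that a line starting at or right of a hypothetical `threshold` begins a new speaker turn, while lines further left continue the previous turn, is an assumption for illustration, and the helper names are made up.

```python
import xml.etree.ElementTree as ET

def textlines_with_indent(root):
    """Yield (page_id, x0, text) for every <textline> in PDFMiner XML."""
    for page in root.iter("page"):
        for line in page.iter("textline"):
            # bbox is "x0,y0,x1,y1"; x0 is the line's left edge (its indent).
            x0 = float(line.get("bbox").split(",")[0])
            # PDFMiner writes one <text> element per character, so join them.
            text = "".join(t.text or "" for t in line.iter("text")).strip()
            if text:
                yield page.get("id"), x0, text

def group_turns(lines, threshold):
    """Merge lines into turns using a hypothetical indentation threshold."""
    turns = []
    for _page_id, x0, text in lines:
        if x0 >= threshold or not turns:
            turns.append(text)        # indented line: assume a new turn starts
        else:
            turns[-1] += " " + text   # left-margin line: continuation
    return turns

if __name__ == "__main__":
    # Path from the pdf2txt.py command in this README.
    tree = ET.parse("files/ferguson_grand_jury_testimony.xml")
    for turn in group_turns(textlines_with_indent(tree.getroot()), threshold=90):
        print(turn)
```

In practice the threshold would need tuning per volume, since the transcripts were produced by different people with different formatting.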

The XML file of the full transcript can be generated with the command:

    pdf2txt.py -o files/ferguson_grand_jury_testimony.xml Ferguson\ Grand\ Jury\ Testimony.pdf

Final Transcripts

http://ferguson.sayit.mysociety.org/

Twitter Discussion

https://twitter.com/steiny/status/537297171255943168

ondrae/ferguson-grand-jury-transcripts