-
Notifications
You must be signed in to change notification settings - Fork 0
/
todo.txt
executable file
·101 lines (72 loc) · 5.16 KB
/
todo.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
- Generate report on what was done
- table of illustrations procesed in the form id|fileName|width|caption(line breaks converted to <br/>)
- outline of chapter/sections
id|CHAPTER I
...id|SECTION I
- Add option to squash paragraphs to one line .. not sure this would be useful?
- New feature (maybe a whole new tool in itself) to help resolve -* issues
- generate report on all -* words
- use google ngram api to get statistics on hyphen-ated vs hyphenated
https://github.com/econpy/google-ngrams
http://www.culturomics.org/Resources/get-ngrams
- search for usage withing text to get statistics
- generate report
- autofix?
- Automate section id naming
- Check that --keeporiginal works with all operations (doesnt with --footnotes for example)
- More comprehensive error check on pgdp markup
- Add configuration file to allow for more customizability
- Templates? Could just be config file
- Convert UTF8?
- New application that performs all checks in "make errorcheck" and generates an html report with the full text that was scanned included in the report. All issues are hyperlinked to the line/column in document that they reference. Maybe split view with error report on one side and document on other (like guiguts but in a browser). Have original be editable in browser so corrections could be made as well would be bonus. Another bonus would be to map errors from scanned doc (book-utf8) back into ppgen source file (book-src.txt) and be able to navigate/make corrections directly to source file.
- Generate index?
- Add --fixup option that performs guiguts fixup function:
• Remove spaces at end of line.
• Remove blank lines at end of pages.
• Remove spaces on either side of hyphens.
• Remove space before periods.
• Remove space before exclamation points.
• Remove space before question marks.
• Remove space before commas.
• Remove space before semicolons.
• Remove space before colons.
• Remove space after opening and before closing brackets. () [] {}
• Remove space after open angle quote and before close angle quote.
• Remove space after beginning and before ending double quote.
• Ensure space before ellipses except after period.
• Format any line that contains only 5 *s and whitespace to be the standard 5 asterisk thought break.
• Convert multiple space to single space.
• Fix obvious l<-->1 problems.
You can also specify whether to skip text inside the /* */ markers or not.
- Add support for guiguts style formatting markup /p /f .. /toc? /*[4] /#[6,53] /#[4.6,70]
- Split off general "fixup" type functions that are not ppgen specific into a seperate application?
- Change design so that instead of parsing the whole file per operation do one pass. Change functions to take line instead of file buffer.. (for single line conversions..).. maybe pass group of lines?
- Would be nice to have .li HTML in a separate file for easier edit workflow (edit/refresh), add feature that parses out .li HTML snippets and .de statements and creates an HTML file out of them.
- Add option to convert tables, use guiguts style markup or autodetect (-----+ within /* */). Output .li t (original text in /* */) .li h rst2htmlb, .de statements
- Option to convert TOC, LOI. Autodetect with regex "(.+) {6,}(\d+)" and replace with #\1:Page_\2#|#\2#, or use user supplied .sr style regex.
- Detect diacritics in utf8 conversions and add necesary .cv statements
- Add --autodetect for tables/toc, maybe remove --tables and --toc and just process marked blocks by default, or use generic --processoutoflineformatting
- Add fix for nested [] issue inside [Sidenote (already fixed in [Footnote)
- Footnote anchors in a footnote (footnote to a footnote).. Manual resolution? Unsure how common or difficulty (possible?) to handle programatically
- Add the ability to reorganize ppgen formatted footnotes.
- Add --report=(txt,html,csv) to generate a report on the input file containing:
toc outline
table of footnotes
table of illustrations (in ppgimg too?)
list of proofer notes
markup errors with associated line numbers
list each usage of all <i>, <cite>, <i><lang>, ...
list each usage of all <b>, <sc>, ...
- Recognize .pn 1 type statements r"\.pn (?<!\+)([IXVCL0-9]+)". Map pn to scanpagenum (overkill? is delta enough?). Use this data for TN generation.
- Add option --tngen to generate TN statement from [**] markup.
[**original=>change] becomes
• #Page 018:Page_18#: original → change
[**Footnotes have been renumbered and moved to the end of the book.] becomes
Footnotes have been renumbered and moved to the end of the book.
- Remove -d option (dryrun)? Not so useful now that dp2ppgen does not modify input file.
- Verify that nested OOLF markup is handled properly
- Handle -*</i> .. pb .. *<i> case inside spanned footnotes
- Redesign application with a parse object to reduce all the redundant checks for things like (am I inside OOLF markup block?)
- Add option to automatically relocate *[Illustration *[Sidenote statements
move outside of paragraph (always before?)
leave ppgen comment at original location and another comment at new location