Make deskewing efficient+robust, and add orientation #84

Open
bertsky opened this issue Feb 12, 2021 · 1 comment

bertsky commented Feb 12, 2021

As outlined a while ago,

# TODO: make zoomable, i.e. interpolate down to max 300 DPI to be faster
# TODO: sweep through angles very coarse, then hill climbing for precision
# TODO: try with shear (i.e. simply numpy shift) instead of true rotation
# TODO: use square of difference instead of variance as projection score
# (more reliable with background noise or multi-column)
# TODO: offer flip (90°) test (comparing length-normalized projection profiles)
# TODO: offer mirror (180°, or + vs - 90°) test based on ascender/descender signal
# (Latin scripts have more ascenders e.g. bhdkltſ than descenders e.g. qgpyj)

there are plenty of opportunities to improve ocrd-cis-ocropy-deskew:

  • downscale during estimation when pixel density (or resolution) is large
  • use hill-climbing approach with increasing precision (and interpretable performance/quality trade-off parameter) to find best angle (instead of exhaustive linear sweep)
  • approximate expensive rotation by cheap shear operation during estimation
  • use square of difference between rows of projection profile as score (instead of variance) to be more robust against noise and non-aligning text columns
  • clip/delete extremely large connected components during estimation to be more robust against separators, images and borders
  • also detect orientation (³) (multiples of 90°) by:
    • comparing horizontal and vertical projection profiles after length normalization: if the (best) vertical profile scores much better than the (best) horizontal profile, then the image needs to be rotated by 90°, because the text lines of an upright page align horizontally, causing maximal fg/bg variance in the horizontal profile (¹)
    • comparing the steepness of the foreground/background transition flanks in the horizontal profile: if (going from top to bottom) the gradient from bg to fg is (sufficiently) steeper than from fg to bg, then the image needs to be rotated by 180°, because ascenders are much more frequent than descenders in Latin-based scripts, so on an upright page the flank above each line should be less steep than the one below (²)
  1. This would of course have to be inverted for vertical text like traditional Chinese or Japanese
  2. This might of course not work for other scripts. Preliminary OCR might be the only choice there.
  3. Both steps can be combined: -90° = 90°+180°
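
To make some of the points above concrete, here is a minimal sketch (not the current ocrd-cis-ocropy-deskew code) of three of them: shear instead of rotation, squared row differences as the score, and a coarse-to-fine angle search, plus the 90° check via the transposed image. All function names, parameters and defaults (`max_angle`, `coarse_step`, `precision`, `factor`) are hypothetical, and the input is assumed to be a binarized NumPy array with foreground as nonzero. The search is a plain grid refinement rather than true hill climbing.

```python
import numpy as np

def sheared_profile(binary, angle):
    """Row-wise foreground fraction after undoing a skew of `angle` degrees.

    Rotation is approximated by a shear: each column is shifted vertically.
    Positive angles mean text lines descend to the right (y axis points down).
    """
    h, w = binary.shape
    ys, xs = np.nonzero(binary)
    if ys.size == 0:
        return np.zeros(h)
    shift = np.round((xs - w / 2) * np.tan(np.radians(angle))).astype(int)
    rows = ys - shift
    counts = np.bincount(rows - rows.min())
    return counts / w  # normalize by line length

def score(binary, angle):
    """Sum of squared differences between adjacent profile rows, per image row."""
    profile = np.pad(sheared_profile(binary, angle), 1)
    return float(np.sum(np.diff(profile) ** 2)) / binary.shape[0]

def estimate_skew(binary, max_angle=10.0, coarse_step=1.0, precision=0.05):
    """Coarse sweep first, then repeatedly refine around the best angle."""
    best, span, step = 0.0, max_angle, coarse_step
    while True:
        candidates = np.arange(best - span, best + span + step / 2, step)
        scores = [score(binary, a) for a in candidates]
        best = float(candidates[int(np.argmax(scores))])
        if step <= precision:
            return best
        span, step = step, step / 4

def needs_quarter_turn(binary, factor=2.0):
    """True if the best vertical profile clearly beats the best horizontal one."""
    horizontal = score(binary, estimate_skew(binary))
    transposed = binary.T
    vertical = score(transposed, estimate_skew(transposed))
    return vertical > factor * horizontal
```

Downscaling could be prototyped by subsampling before calling `estimate_skew` (e.g. `binary[::2, ::2]`). The 180° test and the removal of large connected components are not covered here. The `factor` threshold is an arbitrary placeholder that would need tuning on real data.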

bertsky commented Feb 13, 2021

One obstacle on that road is sufficient ground truth data for testing, though.

Does anyone know of a suitable dataset (with differently oriented regions or pages, and/or with skewed regions or pages), correctly represented in PAGE-XML or ALTO-XML?

Looking at PRImA's Layout Analysis Dataset for reference, there seem to be serious issues with the very definition for PAGE-XML, too.
