As outlined a while ago in ocrd_cis/ocrd_cis/ocropy/common.py (lines 111 to 118 in c3fad1a):
```python
# TODO: make zoomable, i.e. interpolate down to max 300 DPI to be faster
# TODO: sweep through angles very coarse, then hill climbing for precision
# TODO: try with shear (i.e. simply numpy shift) instead of true rotation
# TODO: use square of difference instead of variance as projection score
# (more reliable with background noise or multi-column)
# TODO: offer flip (90°) test (comparing length-normalized projection profiles)
# TODO: offer mirror (180°, or + vs - 90°) test based on ascender/descender signal
# (Latin scripts have more ascenders e.g. bhdkltſ than descenders e.g. qgpyj)
```
There are plenty of opportunities to improve `ocrd-cis-ocropy-deskew`:

- downscale during estimation when the pixel density (resolution) is high
- use a hill-climbing approach with increasing precision (and an interpretable performance/quality trade-off parameter) to find the best angle, instead of an exhaustive linear sweep
- approximate the expensive rotation by a cheap shear operation during estimation
- use the squared difference between rows of the projection profile as score (instead of the variance), to be more robust against noise and non-aligning text columns (the first sketch below combines this with the previous three points)
- clip/delete extremely large connected components during estimation, to be more robust against separators, images and borders (see the second sketch below)
- also detect orientation (multiples of 90°)³ (see the third sketch below) by:
  - comparing horizontal and vertical projection profiles after length normalization: if the (best) vertical profile scores much better than the (best) horizontal profile, then the image needs to be rotated by 90° – because on a straight page the text lines align horizontally, causing maximal fg/bg variance in the horizontal profile¹
  - comparing the steepness of the foreground/background transition flanks: if (going from top to bottom) the gradient is (sufficiently) larger from bg to fg than from fg to bg, then the image needs to be rotated by 180° – because ascenders are much more frequent than descenders in Latin-based scripts, and thus the gradient above the line is supposed to be less steep than below²

¹ This would of course have to be inverted for vertical text like traditional Chinese or Japanese.
² This might of course not work for other scripts; preliminary OCR might be the only choice there.
³ Both steps can be combined: −90° = 90° + 180°.
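A minimal sketch of how the estimation-side points above could fit together, assuming a binarized page image as a NumPy array with foreground = 1; the names and defaults (`shear_score`, `estimate_skew`, `coarse_step`, `fine_step`) are hypothetical, not part of ocrd_cis:

```python
import numpy as np

def shear_score(binary, angle):
    """Score a candidate angle by shearing rows horizontally (a cheap
    substitute for true rotation) and measuring how 'peaky' the horizontal
    projection profile becomes."""
    height, _ = binary.shape
    # per-row horizontal shift approximating a rotation by `angle` degrees;
    # np.roll wraps around, which is tolerable for small angles
    shifts = np.round(np.tan(np.radians(angle)) * np.arange(height)).astype(int)
    sheared = np.empty_like(binary)
    for y, shift in enumerate(shifts):
        sheared[y] = np.roll(binary[y], shift)
    profile = sheared.sum(axis=1).astype(float)
    # squared difference between adjacent rows of the projection profile
    # (instead of plain variance): sharp line/interline transitions score
    # high, while uniform background noise contributes little
    return np.sum(np.diff(profile) ** 2)

def estimate_skew(binary, max_angle=5.0, coarse_step=0.5, fine_step=0.05):
    """Coarse sweep over [-max_angle, max_angle], then hill-climb around the
    best coarse angle with ever smaller steps (fine_step acts as the
    performance/quality trade-off parameter)."""
    angles = np.arange(-max_angle, max_angle + coarse_step, coarse_step)
    scores = [shear_score(binary, angle) for angle in angles]
    best_angle = float(angles[int(np.argmax(scores))])
    best_score = max(scores)
    step = coarse_step / 2
    while step >= fine_step:
        for candidate in (best_angle - step, best_angle + step):
            score = shear_score(binary, candidate)
            if score > best_score:
                best_angle, best_score = candidate, score
        step /= 2
    return best_angle
```

Downscaling (e.g. to at most 300 DPI, as the TODO suggests) would happen before calling `estimate_skew`; the sign convention of the returned angle depends on how the correction is applied downstream.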
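For the connected-component point, a sketch using `scipy.ndimage.label`; the area threshold is a guess and would need tuning:

```python
import numpy as np
from scipy import ndimage

def clip_large_components(binary, max_fraction=0.05):
    """Zero out foreground components whose area exceeds max_fraction of the
    page area (separators, borders, images), so they do not dominate the
    projection profile during estimation."""
    labels, count = ndimage.label(binary)
    if count == 0:
        return binary
    areas = np.bincount(labels.ravel())
    areas[0] = 0  # label 0 is background
    too_big = areas > max_fraction * binary.size
    return np.where(too_big[labels], 0, binary)
```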
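And a sketch of the two orientation tests, again with hypothetical names and thresholds, operating on a binarized, already deskewed image (a real implementation would compare the profiles at their respective best skew angles, as the "(best)" in the description implies):

```python
import numpy as np

def profile_score(profile):
    """Squared-difference score, divided by the profile length so that the
    row and column profiles of a non-square page stay comparable."""
    profile = profile.astype(float)
    return np.sum(np.diff(profile) ** 2) / len(profile)

def needs_90_rotation(binary, margin=1.5):
    """If the vertical (column) profile scores much better than the
    horizontal (row) profile, the text lines presumably run vertically and
    the page should be rotated by 90°; the margin is a guess."""
    horizontal = profile_score(binary.sum(axis=1))
    vertical = profile_score(binary.sum(axis=0))
    return vertical > margin * horizontal

def needs_180_rotation(binary):
    """Compare the steepness of bg->fg flanks (entering a line from above)
    with fg->bg flanks (leaving it below): on an upright Latin-script page,
    ascenders smear the top flank, so it should be less steep than the
    bottom flank; if the opposite holds, the page is upside down."""
    profile = binary.sum(axis=1).astype(float)
    gradient = np.diff(profile)        # going from top to bottom
    rising = gradient[gradient > 0]    # bg -> fg (top flanks)
    falling = -gradient[gradient < 0]  # fg -> bg (bottom flanks)
    if rising.size == 0 or falling.size == 0:
        return False
    return rising.mean() > falling.mean()
```

Per footnote ³, a −90° page can then be handled as the 90° result combined with the 180° result.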
One obstacle on that road, though, is the lack of sufficient ground truth data for testing.
Does anyone know of a suitable dataset (with differently oriented regions or pages, and/or with skewed regions or pages), correctly represented in PAGE-XML or ALTO-XML?
Looking at PRImA's Layout Analysis Dataset for reference, there seem to be serious issues with the very definition in PAGE-XML, too.