Make deskewing efficient+robust, and add orientation #84

bertsky · 2021-02-12T23:07:22Z

As outlined a while ago,

Lines 111 to 118 in c3fad1a

    
```python
# TODO: make zoomable, i.e. interpolate down to max 300 DPI to be faster
# TODO: sweep through angles very coarse, then hill climbing for precision
# TODO: try with shear (i.e. simply numpy shift) instead of true rotation
# TODO: use square of difference instead of variance as projection score
#       (more reliable with background noise or multi-column)
# TODO: offer flip (90°) test (comparing length-normalized projection profiles)
# TODO: offer mirror (180°, or + vs - 90°) test based on ascender/descender signal
#       (Latin scripts have more ascenders e.g. bhdkltſ than descenders e.g. qgpyj)
```

there are plenty of opportunities to improve ocrd-cis-ocropy-deskew:

- downscale during estimation when the pixel density (resolution) is large
- use a hill-climbing approach with increasing precision (and an interpretable performance/quality trade-off parameter) to find the best angle, instead of an exhaustive linear sweep
- approximate the expensive rotation by a cheap shear operation during estimation
- use the square of the difference between adjacent rows of the projection profile as score (instead of its variance), to be more robust against noise and non-aligning text columns
- clip/delete extremely large connected components during estimation, to be more robust against separators, images and borders
- also detect orientation (³) (multiples of 90°) by:
  - comparing the horizontal and vertical projection profiles after length normalization: if the (best) vertical profile scores much better than the (best) horizontal profile, then the image needs to be rotated by 90°, because the text lines of upright pages align horizontally, causing maximal fg/bg variance in the horizontal profile (¹)
  - comparing the steepness of the foreground/background transition flanks: if (going from top to bottom) the gradient is (sufficiently) larger from bg to fg than from fg to bg, then the image needs to be rotated by 180°, because ascenders are much more frequent than descenders in Latin-based scripts, so the gradient above a line should be less steep than the one below it (²)

(¹) This would of course have to be inverted for vertical text, as in traditional Chinese or Japanese.

(²) This might of course not work for other scripts; preliminary OCR might be the only option there.

(³) Both steps can be combined: -90° = 90° + 180°.
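The first four points (downscaling, a search with increasing precision, shear instead of rotation, and the squared-difference score) can be sketched together in plain NumPy. This is a hypothetical sketch, not the ocrd-cis implementation: all function names and defaults (`max_dpi=300`, the ±5° range, the step and precision values) are assumptions, the input is assumed to be a binarized array with foreground as `True`, and a simple coarse-to-fine grid refinement stands in for hill climbing (`shrink` and `precision` act as the performance/quality trade-off).

```python
import numpy as np


def downscale(img, dpi, max_dpi=300):
    """Block-average `img` so that its resolution is at most `max_dpi`.

    Skew angles are invariant under isotropic scaling, so estimating on the
    smaller image only trades some precision for speed.
    """
    factor = max(1, int(np.ceil(dpi / max_dpi)))
    img = np.asarray(img, dtype=float)
    if factor == 1:
        return img
    h = (img.shape[0] // factor) * factor
    w = (img.shape[1] // factor) * factor
    return img[:h, :w].reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def sheared_profile(img, angle_deg):
    """Horizontal projection profile after shifting column x down by
    round(x * tan(angle)) rows: a cheap shear instead of a true rotation."""
    h, w = img.shape
    shifts = np.round(np.arange(w) * np.tan(np.radians(angle_deg))).astype(int)
    shifts -= shifts.min()  # keep all target rows non-negative
    rows = np.arange(h)[:, None] + shifts[None, :]  # target row of every pixel
    return np.bincount(rows.ravel(), weights=img.ravel(), minlength=h + shifts.max())


def profile_score(profile):
    """Sum of squared differences between adjacent profile bins
    (used instead of the variance of the profile)."""
    return float(np.sum(np.diff(profile) ** 2))


def estimate_skew(img, dpi=300, max_angle=5.0, step=1.0, precision=0.05, shrink=4.0):
    """Coarse-to-fine search for the shear angle (degrees) maximizing the score.

    Each round re-samples +/- one old step around the current best angle
    with a `shrink` times finer step, until the step is below `precision`.
    """
    small = downscale(img, dpi)
    lo, hi = -max_angle, max_angle
    while True:
        angles = np.arange(lo, hi + step / 2, step)
        scores = [profile_score(sheared_profile(small, a)) for a in angles]
        best = float(angles[int(np.argmax(scores))])
        if step <= precision:
            return best
        lo, hi, step = best - step, best + step, step / shrink
```

Only the profile is needed, so the shear is applied implicitly by `np.bincount` over shifted row indices and no sheared image is ever materialized. The returned value is the shear angle that aligns the lines (column x shifted by x·tan θ rows); its sign would still have to be matched to whatever rotation is applied afterwards.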
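Removing extremely large connected components before estimation could look like the following, using `scipy.ndimage.label` and `scipy.ndimage.find_objects`. Judging components by their bounding box relative to the page size, the `max_frac=0.5` cutoff, and the name `drop_huge_components` are all assumptions for illustration.

```python
import numpy as np
from scipy import ndimage


def drop_huge_components(binary, max_frac=0.5):
    """Return a copy of `binary` without connected components whose bounding
    box spans more than `max_frac` of the image height or width.

    Hypothetical helper: such components are typically separators, borders
    or images, and would otherwise dominate the projection profile.
    """
    binary = np.asarray(binary, dtype=bool)
    labels, count = ndimage.label(binary)
    h, w = binary.shape
    keep = np.ones(count + 1, dtype=bool)
    keep[0] = False  # label 0 is the background
    # find_objects returns one (row_slice, col_slice) per label, starting at 1
    for label, box in enumerate(ndimage.find_objects(labels), start=1):
        if box is None:
            continue
        height = box[0].stop - box[0].start
        width = box[1].stop - box[1].start
        if height > max_frac * h or width > max_frac * w:
            keep[label] = False
    return keep[labels]
```

The filtered image is only meant for angle estimation; the rotation itself would still be applied to the original image.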
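A minimal sketch of the 90° test, assuming the image has already been deskewed (so both profiles are taken at angle 0 rather than at their respective best angles). "Length normalization" is interpreted here, as an assumption, as scaling each profile to a per-bin foreground fraction and averaging the squared differences over the number of bins; the `ratio=2.0` margin standing in for "much better" and the function names are hypothetical.

```python
import numpy as np


def normalized_score(profile, extent):
    """Squared-difference score, normalized for profile length and height.

    `extent` is the length of the other axis, so `profile / extent` is the
    foreground fraction per bin; taking the mean removes the dependence on
    the number of bins (an assumed reading of "length normalization").
    """
    fraction = np.asarray(profile, dtype=float) / extent
    return float(np.mean(np.diff(fraction) ** 2))


def needs_90_rotation(binary, ratio=2.0):
    """True if the vertical profile scores `ratio` times better than the
    horizontal one, i.e. the text lines appear to run vertically."""
    img = np.asarray(binary, dtype=float)
    h, w = img.shape
    horizontal = normalized_score(img.sum(axis=1), w)  # one bin per row
    vertical = normalized_score(img.sum(axis=0), h)    # one bin per column
    return vertical > ratio * horizontal
```

On an upright page the row profile alternates sharply between lines and gaps while the column profile is smeared out; transposing the page swaps the two.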
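The 180° test compares the rising (bg to fg) and falling (fg to bg) flanks of the horizontal profile, read from top to bottom. For a profile that starts and ends in background, the total rise equals the total fall, so here the sum of squared differences on each side is used as a steepness measure: a gradual flank spread over several rows yields a smaller sum than one abrupt step of the same height. That measure, the `ratio=1.2` margin and the function name are assumptions; the image is again assumed to be deskewed and binarized.

```python
import numpy as np


def needs_180_rotation(binary, ratio=1.2):
    """True if bg->fg flanks are (sufficiently) steeper than fg->bg flanks,
    which suggests the page is upside down (for Latin-based scripts)."""
    profile = np.asarray(binary, dtype=float).sum(axis=1)  # rows, top to bottom
    d = np.diff(profile)
    rising = float(np.sum(d[d > 0] ** 2))   # entering a line from above
    falling = float(np.sum(d[d < 0] ** 2))  # leaving a line below
    return rising > ratio * falling
```

For an upright line with more ascender than descender pixels, the drop from x-height to the sparse descender zone is the largest single step, so `falling` dominates; flipping the image upside down swaps the two sums.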


bertsky · 2021-02-13T13:46:39Z

One obstacle on that road, though, is the lack of sufficient ground truth data for testing.

Does anyone know of a suitable dataset (with differently oriented regions or pages, and/or with skewed regions or pages), correctly represented in PAGE-XML or ALTO-XML?

Looking at PRImA's Layout Analysis Dataset for reference, there also seem to be serious issues with the very definition of orientation in PAGE-XML itself.
