Implement Bentley-Ottmann algorithm for detecting grids #16
When it's ready, we should replace @jeremybmerrill's TableGuesser.find_tables with Bentley-Ottmann.
Instead of implementing Bentley-Ottmann, let's do it naively (O(n^2)) and see what happens.
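A minimal sketch of the naive pairwise approach. It assumes the rulings have already been split into axis-aligned horizontal and vertical segments given as `(x1, y1, x2, y2)` tuples. `naive_intersections` and its `tol` parameter are hypothetical names, not Tabula's code. Every horizontal is tested against every vertical, hence the quadratic cost.

```python
from typing import List, Tuple

Segment = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned
Point = Tuple[float, float]


def naive_intersections(horizontals: List[Segment],
                        verticals: List[Segment],
                        tol: float = 1.0) -> List[Point]:
    """Return every point where a horizontal ruling crosses a vertical one.

    Brute force: each horizontal is tested against each vertical, so the cost
    is O(len(horizontals) * len(verticals)). The hypothetical `tol` lets
    nearly-touching lines count as intersecting.
    """
    points: List[Point] = []
    for hx1, hy, hx2, _ in horizontals:
        left, right = min(hx1, hx2), max(hx1, hx2)
        for vx, vy1, _, vy2 in verticals:
            top, bottom = min(vy1, vy2), max(vy1, vy2)
            # The vertical's x must lie within the horizontal's span (+/- tol)
            # and the horizontal's y within the vertical's span (+/- tol).
            if left - tol <= vx <= right + tol and top - tol <= hy <= bottom + tol:
                points.append((vx, hy))
    return points
```

For a grid with two horizontal and three vertical rulings this returns the six crossing points; the tolerance is what lets slightly short lines still meet.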
See this algorithm in Anssi Nurminen's master's thesis:
Cf. also http://www.drdobbs.com/database/the-maximal-rectangle-problem/184410529 and "convex hulls". Whatever I/we write that more efficiently transforms a collection of lines into (a) the tables and (b) their constituent cells should replace TableGuesser.find_tables and Spreadsheet.new.
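For reference, the maximal-rectangle problem asks for the largest axis-aligned rectangle made only of "usable" cells in a boolean grid. Below is a sketch of one standard O(rows × cols) solution (per-row column heights plus a monotonic stack); we have not checked that it matches the article's presentation. The grid itself is an assumption: for example, a rasterized page where `True` marks blank space.

```python
from typing import List, Tuple


def maximal_rectangle(grid: List[List[bool]]) -> Tuple[int, Tuple[int, int, int, int]]:
    """Return (area, (top, left, bottom, right)) of the largest all-True rectangle.

    Row by row, heights[c] counts consecutive True cells ending at the current
    row in column c; the largest rectangle under that histogram is found with
    a stack of column indices whose heights increase. Bounds are inclusive.
    If the grid has no True cell, the area is 0 and the bounds are empty.
    """
    if not grid or not grid[0]:
        return 0, (0, 0, -1, -1)
    cols = len(grid[0])
    heights = [0] * cols
    best: Tuple[int, Tuple[int, int, int, int]] = (0, (0, 0, -1, -1))
    for r, row in enumerate(grid):
        for c in range(cols):
            heights[c] = heights[c] + 1 if row[c] else 0
        stack: List[int] = []
        for c in range(cols + 1):
            h = heights[c] if c < cols else 0  # height-0 sentinel flushes the stack
            while stack and heights[stack[-1]] >= h:
                height = heights[stack.pop()]
                left = stack[-1] + 1 if stack else 0
                width = c - left
                if height * width > best[0]:
                    best = (height * width, (r - height + 1, left, r, c - 1))
            stack.append(c)
    return best
```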
I'd note that it might be better to detect cells (i.e., minimal rectangles) first and then piece together a Spreadsheet from them; that would let us deal more elegantly with oddly shaped tables. I've started implementing the Nurminen algorithm in
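A simplified sketch of the cells-first idea: treat each ruling intersection as a candidate top-left corner and accept the nearest rectangle whose other three corners are also intersections. This is our own simplification, not the algorithm from Nurminen's thesis; in particular it does not check that ruling lines actually connect the corners. It assumes the intersection points were already snapped so equal coordinates compare equal, with y growing downward (PDF page convention). `find_cells` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Point = Tuple[float, float]
Cell = Tuple[float, float, float, float]  # (left, top, right, bottom)


def find_cells(points: Iterable[Point]) -> List[Cell]:
    """Find minimal rectangles whose four corners are all intersection points."""
    pts: Set[Point] = set(points)
    xs_on_row: Dict[float, List[float]] = defaultdict(list)
    ys_on_col: Dict[float, List[float]] = defaultdict(list)
    for x, y in pts:
        xs_on_row[y].append(x)
        ys_on_col[x].append(y)
    for xs in xs_on_row.values():
        xs.sort()
    for ys in ys_on_col.values():
        ys.sort()

    cells: List[Cell] = []
    # Visit candidate top-left corners top-to-bottom, then left-to-right.
    for x, y in sorted(pts, key=lambda p: (p[1], p[0])):
        rights = [rx for rx in xs_on_row[y] if rx > x]
        belows = [ly for ly in ys_on_col[x] if ly > y]
        # Nearest bottom first, then nearest right: the first rectangle whose
        # bottom-right corner exists is taken as the (minimal) cell.
        cell = next(((x, y, right, bottom)
                     for bottom in belows
                     for right in rights
                     if (right, bottom) in pts), None)
        if cell is not None:
            cells.append(cell)
    return cells
```

Because a missing intersection simply pushes the search further right, a merged (spanning) cell comes out as one wider rectangle rather than two.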
Awesome!
Nurminen algo implemented in
WHOOOOOOO! (Manuel Aristarán, replying by email to Jeremy B. Merrill's message of Tue, Dec 3, 2013 at 12:25 AM)
Oh, I realized I still have to write the algorithm to transform cells (minimal rectangles) into Spreadsheets (maximal rectangles or just Areas). Do you know of any smart implementations of this anywhere? Otherwise I'll just write something naive that hopefully works.
Probably gonna implement this one: http://stackoverflow.com/questions/13746284/merging-multiple-adjacent-rectangles-into-one-polygon but may also refer to: http://www.cs.mcgill.ca/~cs644/Godfried/2005/Fall/sdroui9/p4_algorithm.html
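A sketch of our reading of the Stack Overflow approach linked above, which uses the idea the thread credits to O'Rourke: an orthogonal polygon is determined by its vertices. Keep only rectangle corners shared by an odd number of cells, then pair the surviving vertices in sorted order along each column (vertical edges) and each row (horizontal edges). This is not Tabula's implementation; `outline_edges` is a hypothetical name, and it assumes the cells do not overlap and no two cells touch at only a single corner.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Edge = Tuple[Point, Point]


def outline_edges(rects: List[Rect]) -> Set[Edge]:
    """Return the boundary edges of the union of adjacent, non-overlapping rects."""
    # A corner shared by an even number of rects (2 side by side, or 4 around
    # an interior point) is not a vertex of the union; odd counts are.
    counts: Counter = Counter()
    for x0, y0, x1, y1 in rects:
        counts.update([(x0, y0), (x1, y0), (x0, y1), (x1, y1)])
    vertices = [p for p, n in counts.items() if n % 2 == 1]

    columns: Dict[float, List[float]] = defaultdict(list)
    rows: Dict[float, List[float]] = defaultdict(list)
    for x, y in vertices:
        columns[x].append(y)
        rows[y].append(x)

    edges: Set[Edge] = set()
    # Vertical edges: in each column, sorted by y, pair vertices 0-1, 2-3, ...
    for x, ys in columns.items():
        ys.sort()
        for i in range(0, len(ys) - 1, 2):
            edges.add(((x, ys[i]), (x, ys[i + 1])))
    # Horizontal edges: same pairing along each row, sorted by x.
    for y, xs in rows.items():
        xs.sort()
        for i in range(0, len(xs) - 1, 2):
            edges.add(((xs[i], y), (xs[i + 1], y)))
    return edges
```

For three unit cells arranged in an L this yields the six edges of the L-shaped outline. Chaining the edges into a closed polygon is left out of this sketch.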
See more details in the commit: ccbf671
Oh, and rectalgo is obsolete and can be deleted. It's merged into spreadsheets.
Awesome, Jeremy. Thanks! Can you write a short test method when you have the chance? (I'd like to figure out how to integrate the spreadsheet extractor into the UI workflow.)
Yeah, totally. I'll write the test tomorrow when I get to work. I think it'll work pretty well everywhere, but my tests so far are pretty limited. As far as actually using it is concerned, most of the work is done for free. There are efficiency gains to be had (e.g., lazy table extraction), but the simplest script looks just like this:
BTW, O'Rourke's algorithm is beautiful.
That McGill demo, in particular, is pretty cool. Today was one of those days when I had Ruby, Java, JRuby and Python docs open all at once, trying to convert the Python example into (J)Ruby. As I said, the implementation is a little hairy because of the nearly-intersecting-lines problem, but, hey, it works for now.
Aren't those the best days? ;)
Idea: now that we're getting serious with computational geometry algos, we should start considering snap rounding for lines.
This'd fix #38? (Replying by email to Manuel Aristarán's message of Tue, Dec 3, 2013 at 11:08 PM.)
It might; I'm not sure. Also, if we're going to snap-round lines, we first need to determine the size of the "pixel". If I'm not mistaken, an upper bound on that value would be the (average?) width of the lines in our area of interest. We'd need to track that value, which we don't do now.
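A sketch of the simplest version of this idea, under loudly stated assumptions: each ruling carries a stroke `width` (hypothetical, since that value isn't tracked yet), the "pixel" is taken as the average width per the guess above, and every endpoint is rounded to the nearest multiple of it. This is plain grid snapping, not full snap rounding, which also rounds intersection points and reroutes segments through the "hot pixels" they pass through. `Ruling`, `pixel_size`, `snap` and `snap_rulings` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass(frozen=True)
class Ruling:
    x1: float
    y1: float
    x2: float
    y2: float
    width: float  # stroke width: hypothetical, not tracked by Tabula today


def pixel_size(rulings: List[Ruling]) -> float:
    """Estimate the snapping 'pixel' as the average stroke width (an assumption)."""
    return mean(r.width for r in rulings)


def snap(value: float, pixel: float) -> float:
    """Round a coordinate to the nearest multiple of `pixel`."""
    return round(value / pixel) * pixel


def snap_rulings(rulings: List[Ruling]) -> List[Ruling]:
    """Snap every endpoint to the grid defined by the estimated pixel size."""
    pixel = pixel_size(rulings)
    return [
        Ruling(snap(r.x1, pixel), snap(r.y1, pixel),
               snap(r.x2, pixel), snap(r.y2, pixel), r.width)
        for r in rulings
    ]
```

Grid snapping can still separate two nearly equal coordinates that straddle a grid boundary: with a 2.0 pixel, 10.9 snaps to 10.0 but 11.1 snaps to 12.0, which is part of why real snap rounding is more involved.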
Nope. That would be a good reason to retain the old stuff in bin/tabula.... :P |
LOL. Ok, so I guess we should have a |
Yeah, working on that now.
Better now: fe88cb4. I didn't write the heuristic, though. I haven't profiled the spreadsheets stuff; if it's quick enough, we can just do all (or most) of that work, check whether a handful of Spreadsheet objects were detected, and if so, use the SpreadsheetExtractor method.
I'm going to merge What do you think? |
Fine by me. Only objection is that |
Aight, new merge candidate is now Closing this one. |
If we detect that a set of horizontal and vertical lines form a grid, we can construct the set of cells using Bentley-Ottmann.
See:
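A sketch of a sweep line restricted to horizontal and vertical segments. This is the classic orthogonal segment-intersection sweep, a special case of the idea behind Bentley-Ottmann rather than the general algorithm (which handles arbitrary slopes and schedules newly discovered crossings as events); since grid rulings are axis-aligned, this special case may be enough here. The segment format and `sweep_intersections` are assumptions, not Tabula's code. With a balanced search tree for the active set it runs in O((n + k) log n) for k crossings; the plain list with `bisect` used here makes each insertion O(n) in the worst case.

```python
import bisect
from typing import List, Tuple

Segment = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned
Point = Tuple[float, float]


def sweep_intersections(horizontals: List[Segment],
                        verticals: List[Segment]) -> List[Point]:
    """Report horizontal/vertical crossings with a left-to-right sweep line.

    A horizontal enters the active set at its left end and leaves at its right
    end; a vertical queries the active set for the ys within its span. Ties at
    the same x are ordered insert < query < remove so that crossings at
    segment endpoints are still reported.
    """
    INSERT, QUERY, REMOVE = 0, 1, 2
    events = []
    for x1, y, x2, _ in horizontals:
        events.append((min(x1, x2), INSERT, y))
        events.append((max(x1, x2), REMOVE, y))
    for x, y1, _, y2 in verticals:
        events.append((x, QUERY, (min(y1, y2), max(y1, y2))))
    # Sort on (x, kind) only, so mixed payload types are never compared.
    events.sort(key=lambda e: (e[0], e[1]))

    active: List[float] = []  # sorted ys of horizontals under the sweep line
    points: List[Point] = []
    for x, kind, payload in events:
        if kind == INSERT:
            bisect.insort(active, payload)
        elif kind == REMOVE:
            active.pop(bisect.bisect_left(active, payload))
        else:
            lo, hi = payload
            start = bisect.bisect_left(active, lo)
            end = bisect.bisect_right(active, hi)
            points.extend((x, y) for y in active[start:end])
    return points
```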