From 7f7f1a25c1f281edce2317aea9e9aa4fd9929976 Mon Sep 17 00:00:00 2001
From: fscelliott <42477011+fscelliott@users.noreply.github.com>
Date: Fri, 7 Mar 2025 09:48:37 -0700
Subject: [PATCH] multicolumn publish
---
.../2000 - field-query-object/1200 - method.md | 2 +-
.../1300 - document-range.md | 5 ++---
.../1660 - paragraph.md | 10 +++-------
.../5000 - preprocessors/1055 - multicolumn.md | 17 +----------------
4 files changed, 7 insertions(+), 27 deletions(-)
diff --git a/readme-sync/v0/senseml-reference/2000 - field-query-object/1200 - method.md b/readme-sync/v0/senseml-reference/2000 - field-query-object/1200 - method.md
index 7c02fe701..a37425383 100644
--- a/readme-sync/v0/senseml-reference/2000 - field-query-object/1200 - method.md
+++ b/readme-sync/v0/senseml-reference/2000 - field-query-object/1200 - method.md
@@ -27,7 +27,7 @@ The following global parameters are available to all methods:
| typeFilters | array of [Types](doc:types) | Filters out the specified types from the method results. For example, for a target box containing a delivery date, a street address, and delivery notes, you can filter out the lines containing Date and Address types in order to extract the delivery notes. Note that less strict types, such as Name and Currency types, are less useful in this filter than stricter types such as the Phone Number type.
For an example, see the Examples section. |
| wordFilters | string array | Filters out the specified strings from the method results. |
| whitespaceFilter | `spaces`, `all` | Remove extra whitespaces.
`spaces` - remove solely extra spaces.
`all` - remove all whitespace characters, including newlines. |
-| xRangeFilter | object | Defines left and right boundaries in which to capture lines. For example, in combination with the Document Range method, the X Range Filter parameter defines a "column" that's bounded at the top and bottom by text matches. This column excludes any lines that partially fall outside the defined rectangular region. Contains the following parameters:
`start` - `right`,`left` - Defines the starting point of the "column" at either the right or left boundary of the anchor line.
`offsetX` - Adjusts the horizontal position of the starting point defined by the Start parameter.
`width` - The width of the page region to capture, in inches.
For an example, see the Examples section. |
+| xRangeFilter | object | Defines left and right boundaries in which to capture lines. For example, in combination with the Document Range method, the X Range Filter parameter defines a "column" that's bounded at the top and bottom by text matches. This column excludes any lines that partially fall outside the defined rectangular region. Contains the following parameters:
`start` - `right`,`left` - Defines the starting point of the "column" at either the right or left boundary of the anchor line.
`offsetX` - Adjusts the horizontal position of the starting point defined by the Start parameter.
`width` - The width of the page region to capture, in inches.
As an alternative to this parameter, use the [Multicolumn](doc:multicolumn) preprocessor.
For an example, see the Examples section. |
| **(Deprecated)** xMajorSort | boolean | **Deprecated:** Use the Sort Lines parameter instead. |
| sortLines | `readingOrderLeftToRight` | Set this parameter to `readingOrderLeftToRight` to sort lines whose height and vertical position are misaligned. For example, with misaligned handwritten text, slight jitter in the vertical positions of lines can cause Sensible to incorrectly sort lines that a human reader interprets as following left to right. The Sort Lines parameter corrects this problem by sorting lines by their likely reading order. |
diff --git a/readme-sync/v0/senseml-reference/2000 - layout-based-methods/1300 - document-range.md b/readme-sync/v0/senseml-reference/2000 - layout-based-methods/1300 - document-range.md
index c897158a3..9c4ce6999 100644
--- a/readme-sync/v0/senseml-reference/2000 - layout-based-methods/1300 - document-range.md
+++ b/readme-sync/v0/senseml-reference/2000 - layout-based-methods/1300 - document-range.md
@@ -4,6 +4,8 @@ hidden: false
---
Extracts consecutive lines succeeding the anchor line, for example, paragraphs of legal text. For the full definition of "succeeding", see [Line sorting](doc:lines#line-sorting).
+The Document Range method extracts all the text between an upper and a lower bound. To extract text from columns, use this method in combination with the [Multicolumn](doc:multicolumn) preprocessor, or use the [Paragraph](doc:paragraph) method as an alternative.
+
Or, use this method to return the coordinates of regions containing images.
[**Parameters**](doc:document-range#parameters)
@@ -272,8 +274,5 @@ To extract images, set `"includeImages":true` for the Document Range method. Sen
- Extract a partial bitmap defined by the PPI coordinates of the image from the rendered page.
- Encode the bitmap to bytes in the image format of your choice.
-Document range versus paragraphs
-----
-The Document Range method extracts all the text between an upper and a lower bound. If you instead want to extract paragraphs, for example in a two-column format, then use the [Paragraph](doc:paragraph) method.
diff --git a/readme-sync/v0/senseml-reference/2000 - layout-based-methods/1660 - paragraph.md b/readme-sync/v0/senseml-reference/2000 - layout-based-methods/1660 - paragraph.md
index bdf6e0648..5e6466d8b 100644
--- a/readme-sync/v0/senseml-reference/2000 - layout-based-methods/1660 - paragraph.md
+++ b/readme-sync/v0/senseml-reference/2000 - layout-based-methods/1660 - paragraph.md
@@ -13,9 +13,9 @@ Parameters
**Note:** For additional parameters available for this method, see [Global parameters for methods](doc:method#global-parameters-for-methods). The following table shows parameters most relevant to or specific to this method.
-| key | value | description |
-| ----------------- | ----------- | ----------- |
-| id (**required**) | `paragraph` | |
+| key | value | description |
+| ----------------- | ----------- | ------------------------------------------------------------ |
+| id (**required**) | `paragraph` | This method uses document layout, including columns, to detect paragraphs. To format the extracted paragraph with newlines, use this method in combination with the [Paragraph](doc:types#paragraph) type. |
Examples
@@ -85,8 +85,4 @@ The following image shows the example document used with this example config:
```
-Notes
-====
-
-This method uses document layout to detect paragraphs. In contrast, the Document Range method extracts all the text between an upper and a lower bound.
diff --git a/readme-sync/v0/senseml-reference/5000 - preprocessors/1055 - multicolumn.md b/readme-sync/v0/senseml-reference/5000 - preprocessors/1055 - multicolumn.md
index d335e523e..d70e58928 100644
--- a/readme-sync/v0/senseml-reference/5000 - preprocessors/1055 - multicolumn.md
+++ b/readme-sync/v0/senseml-reference/5000 - preprocessors/1055 - multicolumn.md
@@ -1,25 +1,10 @@
---
title: "Multicolumn"
-hidden: true
+hidden: false
---
-*TODO -*
-
- *contrast this to:*
-
-- *x range filter*
-- *document range*
-- *paragraph*
-- *readingOrderLeftToRight (? test how it works...shouldn't have an effect)*
-
-*ohh! Add a MULTIMODAL QUERY FOR THE CHARTS!!!*
-
-
-
Use this preprocessor for documents containing columns of text. Ensures that Sensible [sort lines](doc:lines#line-sorting) into columns when present, rather than the default behavior of sorting lines left to right across the page.
-
-
[**Parameters**](doc:multicolumn#parameters)
[**Examples**](doc:multicolumn#examples)
[**Notes**](doc:multicolumn#notes)