Commit

llms
fscelliott committed Feb 28, 2025
1 parent 8912696 commit ecb85b4
Showing 2 changed files with 2 additions and 2 deletions.
@@ -48,7 +48,7 @@ Parameters
| llmEngine | object | Configures the LLM model to which Sensible submits the full prompt. Contains the following parameters:<br/><br/>`provider`: Contains the following options:<br/> - `open-ai` (default): Sensible uses OpenAI's models. <br/> - `anthropic`: Sensible uses Anthropic's models. Select this option to troubleshoot situations in which Sensible correctly identifies the part of the document that contains the answers to your prompts, but the LLM's answer contains problems. For example, Sensible returns an LLM error because the answer isn't properly formatted, or the LLM doesn't follow instructions in your prompt. <br/><br/>`mode`: Contains the following options:<br/>- `fast` (default): Sensible uses a faster LLM model (GPT-4o mini or Claude 3.5 Haiku, depending on the provider). <br/>- `thorough`: Sensible uses a slower and more powerful LLM model (GPT-4o or Claude 3.5 Sonnet, depending on the provider). Sensible can take several minutes to return the list. Use this option if the `fast` option results in incomplete extractions for multi-page lists.<br/>- `long`: Sensible uses a faster LLM model (GPT-4o mini or Claude 3.5 Haiku, depending on the provider). If you set this value, then Sensible can output lists extracted from up to 100 potentially nonconsecutive, 1-page source [chunks](doc:list#notes). Otherwise Sensible by default extracts from 20 1-page chunks. If the list in the document is longer than the number of source chunks, Sensible truncates the list.<br/>For more information, see [Notes](#notes). | If you set the Mode parameter to Long, then Sensible sets the Chunk Count parameter to 100. |
| singleLLMCompletion | boolean. default: false | If Sensible returns incomplete or duplicate results in a list that's under ~20 pages long, set this parameter to True to troubleshoot.<br/> If true, Sensible concatenates the top-scoring chunks into a single batch to send as context to the LLM, instead of batching calls to the LLM. By avoiding splitting context, you can avoid problems such as the LLM failing to recognize the end of one list and the start of another. See Notes for more information.<br/><br/>- The following limits apply:<br/>&nbsp;&nbsp; - If the extracted list exceeds the output limit for a single API response from an LLM engine (~16k tokens or ~20 pages for all supported LLM engines), Sensible truncates the list.<br/>&nbsp;&nbsp;- Rarely, Sensible fails to truncate a list at ~16k token output and returns an error. This can happen if the source text in the [context](doc:list#notes) has a very small font size. To avoid this type of error, set this parameter to false. | Don't set this parameter to true if you set the LLM Engine parameter to Long. The Single LLM Completion parameter limits output to ~16k tokens, so it'll truncate longer lists. |
| | | ***FIND CONTEXT*** | |
| source_ids | array of field IDs in the current config | If specified, prompts an LLM to extract data from another field's output. For example, if you extract a field `_checking_transactions` and specify it in this parameter, then Sensible answers the prompts `rank deposits by size` and `reformat withdrawals with a minus sign if they're formatted with parentheses` using `_checking_transactions` rather than searching the whole document to locate the [context](doc:prompt#notes). Note that the `_checking_transactions` field must precede the `transactions_frequencies` field in the fields array in this example. <br/>Use this parameter to:<br/>- reformat or otherwise transform the outputs of other fields. <br/> - narrow down the [context](doc:prompt#notes) for your prompts to a specific part of the document. <br/>- troubleshoot or simplify complex prompts that aren't performing reliably. Break the prompt into several simpler parts, and chain them together using successive Source Ids in the fields array. <br/>For an example, see [Examples](doc:list#example-extract-data-from-other-fields). | If you configure this parameter, then generally don't configure: <br/>- LLM Engine parameter<br/>- Single LLM Completion parameter<br/><br/>- If you configure this parameter, Sensible doesn't allow you to specify the following parameters:<br/>- Context Description parameter<br/>- Multimodal Engine parameter <br/>- Chunk Scoring Text parameter<br/>- Search By Summarization parameter<br/>- Page Hinting parameter<br/>- Chunk Count parameter<br/>- Chunk Size parameter<br/>- Chunk Overlap Percentage parameter<br/>- Page Range parameter |
| source_ids | array of field IDs in the current config | If specified, prompts an LLM to extract data from another field's output. For example, if you extract a field `_checking_transactions` and specify it in this parameter, then Sensible answers the prompts `rank deposits by size` and `reformat withdrawals with a minus sign if they're formatted with parentheses` using `_checking_transactions` rather than searching the whole document to locate the [context](doc:prompt#notes). Note that the `_checking_transactions` field must precede the `transactions_frequencies` field in the fields array in this example. <br/>Use this parameter to:<br/>- reformat or otherwise transform the outputs of other fields. <br/> - narrow down the [context](doc:prompt#notes) for your prompts to a specific part of the document. <br/>- troubleshoot or simplify complex prompts that aren't performing reliably. Break the prompt into several simpler parts, and chain them together using successive Source Ids in the fields array. <br/>For an example, see [Examples](doc:list#example-extract-data-from-other-fields). | If you configure this parameter, then generally don't configure: <br/>- LLM Engine parameter<br/>- Single LLM Completion parameter<br/><br/>- If you configure this parameter, Sensible doesn't support the following parameters:<br/>- Anchor parameter in the field<br/>- Context Description parameter<br/>- Multimodal Engine parameter <br/>- Chunk Scoring Text parameter<br/>- Search By Summarization parameter<br/>- Page Hinting parameter<br/>- Chunk Count parameter<br/>- Chunk Size parameter<br/>- Chunk Overlap Percentage parameter<br/>- Page Range parameter |
| searchBySummarization | boolean. default: false | Set this to true to troubleshoot situations in which Sensible misidentifies the part of the document that contains the answers to your prompts. <br/>This parameter is compatible with documents up to 1,280 pages long.<br/>When true, Sensible uses a [completion-only retrieval-augmented generation (RAG) strategy](https://www.sensible.so/blog/embeddings-vs-completions-only-rag): Sensible prompts an LLM to summarize each page in the document, prompts a second LLM to return the pages most relevant to your prompt based on the summaries, and extracts the answers to your prompts from those pages. | If you set this parameter to true, then Sensible sets the following for chunk-related parameters and ignores any configured values:<br/><br/>- Chunk Size parameter: 1<br/>- Chunk Overlap Percentage parameter: 0<br/>- Chunk Count parameter: 5 |
| | | ***GLOBAL PARAMETERS*** | |
| contextDescription | | For information about this parameter, see [Advanced LLM prompt configuration](doc:prompt#parameters) | |
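
The `searchBySummarization` row above says that Sensible overrides the chunk-related parameters and ignores any configured values. A minimal Python sketch of that rule follows. The key names `chunkSize`, `chunkOverlapPercentage`, and `chunkCount` are assumptions inferred from the parameter names in the table, and `effective_chunk_settings` is a hypothetical helper, not part of Sensible's API.

```python
# Values the table says Sensible applies when searchBySummarization is true.
# The camelCase key names are assumed, not confirmed by this diff.
SUMMARIZATION_OVERRIDES = {
    "chunkSize": 1,
    "chunkOverlapPercentage": 0,
    "chunkCount": 5,
}


def effective_chunk_settings(method: dict) -> dict:
    """Return the chunk settings in effect for a hypothetical method config."""
    # Keep only the chunk-related keys that the config actually sets.
    configured = {k: method[k] for k in SUMMARIZATION_OVERRIDES if k in method}
    if method.get("searchBySummarization", False):
        # Configured chunk values are ignored in favor of the fixed overrides.
        return dict(SUMMARIZATION_OVERRIDES)
    return configured
```

For example, a config that sets `chunkCount` to 12 alongside `searchBySummarization: true` would resolve to a chunk count of 5 under this sketch.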
@@ -64,7 +64,7 @@ Parameters
| id (**required**) | `queryGroup` | | |
| queries | array of objects | An array of query objects, where each extracts a single fact and outputs a single field. Each query contains the following parameters:<br/>`id` (**required**) - The ID for the extracted field. <br/>`description` (**required**) - A free-text question about information in the document. For example, `"what's the policy period?"` or `"what's the client's first and last name?"`. For more information about how to write questions (or "prompts"), see [Query Group](https://docs.sensible.so/docs/query-group-tips) extraction tips. | |
| | | ***FIND CONTEXT*** | |
| source_ids | array of field IDs in the current config | If specified, prompts an LLM to extract data from another field's output. For example, if you extract a field `_checking_transactions` and specify it in this parameter, then Sensible searches for the answer to `what is the largest transaction?` in `_checking_transactions`, rather than searching the whole document to locate the [context](doc:prompt#notes). Note that the `_checking_transactions` field must precede the `largest_transaction` field in the fields array in this example. <br/><br/>Use this parameter to:<br/> - narrow down the [context](doc:prompt#notes) for your prompts to a specific part of the document. <br/>- reformat or otherwise transform the outputs of other fields. For example, you can use this as an alternative to types such as the [Compose](doc:types#compose) type with prompts such as `if the context includes a date, return it in mm/dd/yyyy format`.<br/>- troubleshoot or simplify complex prompts that aren't performing reliably. Break the prompt into several simpler parts, and chain them together using successive Source ID parameters in the fields array. <br/>To extract repeating data, such as a list, specify the Source Ids parameter for the [List](doc:list#parameters) method rather than for the Query Group method. <br/><br/>For an example, see [Examples](doc:query-group#example-transform-fields). | If you configure this parameter, then:<br/><br/>- In the field, Sensible doesn't support the Anchor parameter.<br/><br/> In the query group:<br/>- Sensible doesn't support confidence signals.<br/>- Sensible doesn't allow you to specify the following parameters:<br/><br/>- Context Description parameter<br/>- Multimodal Engine parameter <br/>- Chunk Scoring Text parameter<br/>- Search By Summarization parameter<br/>- Page Hinting parameter<br/>- Chunk Count parameter<br/>- Chunk Size parameter<br/>- Chunk Overlap Percentage parameter<br/>- Page Range parameter |
| source_ids | array of field IDs in the current config | If specified, prompts an LLM to extract data from another field's output. For example, if you extract a field `_checking_transactions` and specify it in this parameter, then Sensible searches for the answer to `what is the largest transaction?` in `_checking_transactions`, rather than searching the whole document to locate the [context](doc:prompt#notes). Note that the `_checking_transactions` field must precede the `largest_transaction` field in the fields array in this example. <br/><br/>Use this parameter to:<br/> - narrow down the [context](doc:prompt#notes) for your prompts to a specific part of the document. <br/>- reformat or otherwise transform the outputs of other fields. For example, you can use this as an alternative to types such as the [Compose](doc:types#compose) type with prompts such as `if the context includes a date, return it in mm/dd/yyyy format`.<br/>- troubleshoot or simplify complex prompts that aren't performing reliably. Break the prompt into several simpler parts, and chain them together using successive Source ID parameters in the fields array. <br/>To extract repeating data, such as a list, specify the Source Ids parameter for the [List](doc:list#parameters) method rather than for the Query Group method. <br/><br/>For an example, see [Examples](doc:query-group#example-transform-fields). | If you configure this parameter, then the following parameters aren't supported:<br/>- Anchor parameter in the field<br/>- Confidence Signals<br/>- Context Description parameter<br/>- Multimodal Engine parameter <br/>- Chunk Scoring Text parameter<br/>- Search By Summarization parameter<br/>- Page Hinting parameter<br/>- Chunk Count parameter<br/>- Chunk Size parameter<br/>- Chunk Overlap Percentage parameter<br/>- Page Range parameter |
| chunkScoringText | string | Use this parameter to narrow down the page location of the answer to your prompt. For details about context and chunks, see the Notes section.<br/>A representative snippet of text from the part of the document where you expect to find the answer to your prompt. For example, if your prompt has multiple candidate answers, and the correct answer is located near unique or distinctive text that's difficult to incorporate into your question, then specify the distinctive text in this parameter.<br/>If specified, Sensible uses this text to score chunks' relevancy. If unspecified, Sensible uses the prompt to score chunks.<br/>Sensible recommends that the snippet is specific to the target chunk, semantically similar to the chunk, and structurally similar to the chunk. <br/>For example, if the chunk contains a street address formatted with newlines, then provide a snippet with an example street address that contains newlines, like `123 Main Street\nLondon, England`. If the chunk contains a street address in a free-text paragraph, then provide an unformatted street address in the snippet. | If you set the Search By Summarization parameter to true, Sensible ignores any configured value for this parameter for the queries in the group. |
| searchBySummarization | boolean. default: false | Set this to true to troubleshoot situations in which Sensible misidentifies the part of the document that contains the answers to your prompts. <br/>This parameter is compatible with documents up to 1,280 pages long.<br/>When true, Sensible uses a [completion-only retrieval-augmented generation (RAG) strategy](https://www.sensible.so/blog/embeddings-vs-completions-only-rag): Sensible prompts an LLM to summarize each page in the document, prompts a second LLM to return the pages most relevant to your prompt based on the summaries, and extracts the answers to your prompts from those pages. | If you set this parameter to true, then Sensible sets the following for chunk-related parameters and ignores any configured values:<br/><br/>- Chunk Size parameter: 1<br/>- Chunk Overlap Percentage parameter: 0<br/>- Chunk Count parameter: 5 <br/>- Chunk Scoring Text parameter<br/> |
| | | ***EXTRACT FROM IMAGES*** | |
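
The `source_ids` rows above note that a source field such as `_checking_transactions` must precede the field that references it in the fields array. The Python sketch below checks that ordering. It assumes a simplified config shape in which each field is a dict with an optional `id` and a `method` dict that may hold `source_ids` and `queries`; `find_source_order_errors` is a hypothetical helper, not a Sensible API.

```python
def output_ids(field: dict) -> list[str]:
    """Collect the IDs a field outputs: its own id plus any query ids."""
    ids = []
    if "id" in field:
        ids.append(field["id"])
    for query in field.get("method", {}).get("queries", []):
        ids.append(query["id"])
    return ids


def find_source_order_errors(fields: list[dict]) -> list[str]:
    """Report every source_ids entry that isn't defined earlier in the array."""
    seen: set[str] = set()
    errors = []
    for field in fields:
        label = ", ".join(output_ids(field)) or "<unnamed field>"
        for source_id in field.get("method", {}).get("source_ids", []):
            if source_id not in seen:
                errors.append(
                    f"{label}: source field '{source_id}' must appear earlier"
                )
        # Register this field's outputs only after checking its own sources.
        seen.update(output_ids(field))
    return errors


# Hypothetical config mirroring the example in the table.
fields = [
    {"id": "_checking_transactions", "method": {"id": "list"}},
    {
        "method": {
            "id": "queryGroup",
            "source_ids": ["_checking_transactions"],
            "queries": [
                {
                    "id": "largest_transaction",
                    "description": "what is the largest transaction?",
                }
            ],
        }
    },
]
print(find_source_order_errors(fields))
```

Reversing the two fields makes the query group reference `_checking_transactions` before it exists, which this sketch reports as one error.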