docs: update warning on early truncation
Konrad Gerlach committed Feb 11, 2025
1 parent 0db38f1 commit 892feda
Showing 1 changed file with 1 addition and 1 deletion.
docs/source/text_environments.md (1 addition, 1 deletion)
@@ -114,7 +114,7 @@ Let's decompose the settings:
| `max_tool_response`| The tool response is truncated to this number to avoid running out of model context.|
| `max_length` | The maximum number of tokens to allow in an episode. |
| `generation_kwargs`| Generation settings used by the language model. |
- | `use_cache` | Cache keys and values between segment generation. Known limitations: when using caching, [`TextEnvironment`] is not suited for training use, i.e. backpropagation through the generated graph; use with trl Trainers is of course still possible. Furthermore, caching requires that there be no calculation dependencies between examples at inference time, so when using `BatchNorm` the model should be in eval mode. Caching is not guaranteed to produce results identical to running without it, and you should test for yourself whether it is suited to your needs, model, and `generation_kwargs`. In some cases, when a history (including padding) exceeds `max_length`, all other histories are truncated as well. Compatibility with encoder-decoder architectures is untested. Incompatible with `num_beams` > 1 as a generation kwarg. `use_cache` may currently be incompatible with `torch.compile` due to a possible issue in the transformers library's `generate` method; see this comment in [_get_initial_cache_position](https://github.com/huggingface/transformers/blob/2e752ead46a8845e8a160d2043c1336447895690/src/transformers/generation/utils.py#L1582).|
+ | `use_cache` | Cache keys and values between segment generation. Known limitations: when using caching, [`TextEnvironment`] is not suited for training use, i.e. backpropagation through the generated graph; use with trl Trainers is of course still possible. Furthermore, caching requires that there be no calculation dependencies between examples at inference time, so when using `BatchNorm` the model should be in eval mode. Caching is not guaranteed to produce results identical to running without it, and you should test for yourself whether it is suited to your needs, model, and `generation_kwargs`. In some cases, when a history (including padding) exceeds `max_length`, all other histories are truncated as well. This behavior also occurs when not using caching but is exacerbated by the additional padding caused by caching. Compatibility with encoder-decoder architectures is untested. Incompatible with `num_beams` > 1 as a generation kwarg. `use_cache` may currently be incompatible with `torch.compile` due to a possible issue in the transformers library's `generate` method; see this comment in [_get_initial_cache_position](https://github.com/huggingface/transformers/blob/2e752ead46a8845e8a160d2043c1336447895690/src/transformers/generation/utils.py#L1582).|
| `save_logits` | Whether to save logits for the generated tokens in the returned histories. Mainly intended to help the user test caching for their use case. Backpropagation through logits is not supported. |

You can customize the environment to your needs and add custom tools and settings. Let's see how you can use the environment to have the model interact with the available tools!
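
To make the settings table above concrete, here is a minimal sketch of how they might be wired into a [`TextEnvironment`]. It is an illustration, not the documented API: the model id, tool, and reward function are placeholders, and passing `use_cache` and `save_logits` (as well as the other keyword names) directly to the constructor is an assumption based on the table, which may differ from the signature in your installed version of trl.

```python
# Minimal sketch, not the official example: wiring the documented settings into a
# TextEnvironment. The model id, tool, and reward function are placeholders, and the
# keyword names (notably `use_cache` and `save_logits`) are assumed from the table above.
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, TextEnvironment

model_id = "gpt2"  # placeholder; any causal LM works for the sketch
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token


def calculator(expression: str) -> str:
    """Toy tool: evaluate a simple arithmetic expression and return the result as text."""
    try:
        return str(eval(expression, {"__builtins__": {}}, {}))
    except Exception:
        return "error"


def reward_fn(responses, **kwargs):
    """Placeholder reward: no learning signal, just the expected list-of-tensors shape."""
    return [torch.tensor(0.0) for _ in responses]


generation_kwargs = {"max_new_tokens": 32, "do_sample": True, "top_k": 0, "top_p": 1.0}

env = TextEnvironment(
    model=model,
    tokenizer=tokenizer,
    tools={"Calculator": calculator},
    reward_fn=reward_fn,
    prompt="Answer the question. You can use the Calculator tool.\n",
    max_turns=2,
    max_tool_response=100,   # truncate tool output to avoid exhausting the model context
    max_length=512,          # cap on tokens per episode
    generation_kwargs=generation_kwargs,
    use_cache=True,          # cache KV between segments; inference only, see caveats above
    save_logits=True,        # keep logits of generated tokens to compare cached vs. uncached runs
)

# queries, responses, masks, rewards, histories = env.run(["What is 13 + 29?"])
```

If results with `use_cache=True` look off, comparing the saved logits against a run with caching disabled is exactly the kind of check the `save_logits` setting is meant to support.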
