
Improve agenda review and add custom LLM support #21

Merged — 15 commits merged into main on Apr 17, 2024

Conversation

@bakaburg1 (Owner) commented Apr 17, 2024

Enhancements

  • Improve user interaction for agenda review: the speech_to_summary_workflow function now displays the generated agenda in the console, facilitating review and reducing the need to locate the agenda file separately. (Commit: 3bed1cc).
  • Add support for custom language model endpoints: replace use_local_llm() with use_custom_llm() to send requests to custom (local or remote) language model endpoints compatible with the OpenAI API specification, also allowing the use of API keys. (Commit: 0fdfe57).
  • Dynamic output_length parameter as default: dynamically set summarization_output_length in the workflow function based on whether a multipart summary is generated (shorter outputs) or not (longer outputs), optimizing the readability and relevance of summaries. (Commit: 2511287).
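To picture what an OpenAI-compatible custom endpoint call involves, here is a minimal sketch in R. It is illustrative only: `call_custom_llm` and its arguments are invented for this example and are not the package's actual `use_custom_llm()` signature.

```r
library(httr)
library(jsonlite)

# Minimal sketch of a chat-completion request against an
# OpenAI-compatible endpoint. `endpoint` and `api_key` are
# illustrative placeholders, not package arguments.
call_custom_llm <- function(prompt, endpoint, api_key = NULL) {
  headers <- c("Content-Type" = "application/json")
  if (!is.null(api_key)) {
    # Services that require authentication expect a Bearer token
    headers <- c(headers, Authorization = paste("Bearer", api_key))
  }
  body <- list(
    model = "local-model",  # placeholder model name
    messages = list(list(role = "user", content = prompt))
  )
  res <- POST(
    url = endpoint,
    add_headers(.headers = headers),
    body = toJSON(body, auto_unbox = TRUE)
  )
  content(res)$choices[[1]]$message$content
}
```

Because local servers (e.g. those exposing an OpenAI-compatible route) and hosted services share this request shape, one function with an optional key can cover both cases.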

Fixes

  • Fix output_length usage: output_length was not being considered in the summarization process. (Commit: 08e95d1).
  • Fix agenda file validation: update the validation logic for the 'agenda' argument in the speech_to_summary_workflow function to handle character type inputs correctly and provide clearer error messages. (Commit: d200a55).
  • Fix agenda validation: add checks for empty agenda elements and missing required items, improve error messages for invalid time formats, and update tests. (Commit: 6d870ee).

Documentation

  • Fix message typos: correct typos in user-facing messages. (Commit: 0fdfe57).

Summary

This pull request focuses on improving the user experience and adding support for custom language model endpoints. It enhances the agenda review process, ensures dynamic output length adjustment, fixes agenda validation, and adds documentation typo corrections.

Summary by CodeRabbit

  • New Features

    • Enhanced interrogate_llm function to improve error handling and added generation speed calculation.
    • Updated summarization functions to handle variable output lengths and display summarization approach.
    • Introduced use_custom_llm function allowing integration with custom language model endpoints, including optional API key support.
  • Bug Fixes

    • Corrected minor typo in the use_openai_llm function description.
  • Documentation

    • Updated documentation to reflect dynamic settings for summarization output based on conditions.
    • Added detailed documentation for the new use_custom_llm function.
  • Refactor

    • Renamed use_local_llm to use_custom_llm and enhanced its functionality.
    • Improved validation logic across various functions to handle different scenarios more effectively.

This change allows users to provide custom LLM calling functions.
- Modified the `tictoc::toc()` function call to store the elapsed time in a variable named `elapsed`.
- Added a new line to calculate and display the generation speed in tokens per second.
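The timing change described above can be sketched with `tictoc` as follows; the token count is a placeholder (in practice it would come from the API response's usage data):

```r
library(tictoc)

tic()
# ... LLM request happens here ...
elapsed <- toc(quiet = TRUE)  # capture elapsed time instead of only printing it
secs <- elapsed$toc - elapsed$tic

tokens <- 512  # placeholder: normally read from the response's usage field
message(sprintf("Generation speed: %.1f tokens/s", tokens / secs))
```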
This commit introduces a new function `use_custom_llm()` that allows users to send requests to custom (local or remote) language model endpoints compatible with the OpenAI API specification. The function accepts an API key as an optional argument for services that require it. This function replaces `use_local_llm()`, which did not support API keys.
- Add checks for empty agenda elements and missing required items
- Convert some stop statements to warning statements
- Improve error messages for invalid time formats
- Update tests to reflect changes
Added a call to set_prompts() at the beginning of the entity_extractor function to ensure that the prompts are initialized before processing the text. This change ensures that any customizations to the prompts are applied correctly throughout the text processing workflow.
- Modified the `speech_to_summary_workflow` function to dynamically set the `summarization_output_length` based on whether a multipart summary is being generated. This change ensures that for multipart summaries, the output length per part is set to 1 page, while for single summaries it remains at 3 pages, optimizing the readability and relevance of generated summaries.
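The conditional default described above amounts to a small branch; variable names here are illustrative, not the workflow function's internals:

```r
# Shorter per-part output for multipart summaries, longer for a
# single overall summary (sketch of the described behaviour).
summarization_output_length <- if (isTRUE(multipart_summary)) {
  1L  # one page per part
} else {
  3L  # three pages for a single summary
}
```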
The validation logic for the 'agenda' argument in the speech_to_summary_workflow function has been updated to handle character type inputs correctly. Previously, the function stopped if the 'agenda' list had more than one element, which was not handling cases where 'agenda' was a character vector representing file paths. The updated condition now checks if 'agenda' is a character vector and has more than one element, providing a clearer error message that only one agenda file can be provided.
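A sketch of the updated check (the exact condition and message in the package may differ): the function should stop only when a character vector of file paths contains more than one entry, rather than rejecting any multi-element agenda.

```r
# Reject multiple agenda *files*, while still allowing a multi-element
# agenda list parsed from a single file.
if (is.character(agenda) && length(agenda) > 1) {
  stop("Only one agenda file can be provided.")
}
```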
The speech_to_summary_workflow function now explicitly prompts the user to review the generated agenda by displaying it in the console. This enhancement aims to facilitate the review process by providing immediate visibility of the agenda content, reducing the need for users to manually locate and open the agenda file separately.
Improve error handling in LLM calls by checking for different error message formats and extracting the error message accordingly.
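As a rough illustration of tolerant error extraction, the helper below probes a few common payload shapes; the field names are assumptions about typical OpenAI-style error responses, not the package's actual code:

```r
# Sketch: try the usual places an OpenAI-style API puts its error text.
extract_error_message <- function(parsed) {
  if (!is.null(parsed$error$message)) return(parsed$error$message)
  if (is.character(parsed$error)) return(parsed$error)
  if (!is.null(parsed$message)) return(parsed$message)
  "Unknown error returned by the LLM endpoint."
}
```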
@bakaburg1 added the bug (Something isn't working) and enhancement (New feature or request) labels on Apr 17, 2024
coderabbitai bot (Contributor) commented Apr 17, 2024

Walkthrough

The recent updates encompass refining error handling, enhancing customization options for language model endpoints, and improving the user interface in R functions. Changes include better validation mechanisms, dynamic parameter adjustments, and clearer user messages. These modifications aim to provide a more robust, user-friendly, and flexible toolset for handling language models and data processing tasks in R.

Changes

| File(s) | Change Summary |
| --- | --- |
| R/LLM_calls.R | Enhanced error handling, added custom LLM endpoint support, and improved time tracking. |
| R/data_management.R | Updated summarization output handling and user feedback. |
| R/summarization.R | Included `output_length` in function arguments and added prompt initialization. |
| R/validation.R | Improved validation logic and user warnings. |
| man/..._workflow.Rd, man/use_custom_llm.Rd, man/use_openai_llm.Rd, man/validate_agenda.Rd | Updated documentation to reflect changes in functions and added new guides for custom LLM usage. |

Poem

🐇✨
In the land of code and script,
A rabbit hopped and tweaks it flipped.
Errors tamed and features new,
Paths to explore, in R we drew.
Hop along, the changes gleam,
In the coder's crafted dream. 🌟📜


Recent Review Details

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 2cb1bd5 and d8c5a50.
Files selected for processing (1)
  • NEWS.md (1 hunks)
Additional comments not posted (4)
NEWS.md (4)

5-8: The enhancements section clearly outlines the new features and improvements with specific commit references, enhancing traceability.


10-13: The fixes section effectively addresses key issues with detailed commit references, which is good for maintaining a clear history of changes.


15-16: Documentation updates are succinctly described, focusing on improving clarity and correctness in user-facing messages.


18-19: The summary effectively encapsulates the main enhancements and fixes introduced in this release, providing a clear overview of the PR's impact.



coderabbitai bot (Contributor) left a comment
Actionable comments posted: 3

DESCRIPTION — comment resolved
R/validation.R — comment resolved
```diff
@@ -191,7 +191,7 @@ minutes if the "rolling" method is used. See \code{summarise_transcript} for
 more details.}

 \item{summarization_output_length}{An indication to the LLM regarding the
-length of the output. See \code{summarise_transcript} for more details.}
+length of the output in pages. See \code{summarise_transcript} for more details.}
```
Tip

Codebase Verification

The expanded search in the Markdown files has successfully located a mention of the feature related to specifying the output length in terms of pages within the README.md file. This confirms that the feature is documented, albeit not in a formal "release notes" section, but rather as part of the general documentation in the README.

Given this information, it appears that the feature is indeed documented, though perhaps not in the specific location expected by the original review comment. This suggests that while the feature is mentioned, it might not be highlighted in a dedicated section for release updates, which could affect how users are informed about new features.

Result:
The feature regarding the output length specification in pages is documented in the README.md file. However, it is not explicitly mentioned in a section labeled as "release notes." It would be beneficial to ensure that such significant features are also highlighted in the release notes for better visibility and clarity for the users.

Analysis chain

The documentation now specifies the output length in pages, which aligns with the dynamic output length feature. This change should be clearly communicated in the release notes to inform users about the new output format.

Scripts Executed

The following scripts were executed for the analysis:


Script:

```shell
#!/bin/bash
# Check if the release notes mention the new output length specification.
rg --type md 'output length in pages'
```

Length of output: 39



Script:

```shell
#!/bin/bash
# Broaden the search to include variations and check if the feature is mentioned in any Markdown files.
rg --type md -i 'output length|pages' -C 3
```

Length of output: 516

@bakaburg1 bakaburg1 merged commit 8fe8e60 into main Apr 17, 2024