Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

fix evaluation script #1819

Merged
merged 1 commit into from
Feb 24, 2025
Merged

fix evaluation script #1819

merged 1 commit into from
Feb 24, 2025

Conversation

LawyZheng
Copy link
Collaborator

@LawyZheng LawyZheng commented Feb 24, 2025

Important

Add load_dotenv() to load environment variables in evaluation scripts and rename eval_webvoyager_cruise.py to eval_webvoyager_task_v2.py.

  • Environment Setup:
    • Add load_dotenv() to create_webvoyager_evaluation_result.py, create_webvoyager_task_v2.py, create_webvoyager_workflow.py, and eval_webvoyager_task_v2.py to load environment variables.
  • File Rename:
    • Rename eval_webvoyager_cruise.py to eval_webvoyager_task_v2.py for consistency with other scripts.

This description was created by Ellipsis for 143f2fd. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 143f2fd in 3 minutes and 2 seconds

More details
  • Looked at 81 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 12 drafted comments based on config settings.
1. evaluation/script/create_webvoyager_task_v2.py:15
  • Draft comment:
    Remove duplicate dotenv initialization - environment variables are already loaded in the main server entry point.

  • dotenv initialization (main.py)

  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    This script (create_webvoyager_task_v2.py) appears to be a standalone script that can be run independently. The other dotenv initialization is in main.py which is only executed when running the server. These are separate entry points, so both might need their own dotenv initialization. Removing it from this script could break it when run directly.
    The comment assumes this script is only run as part of the server, but it has its own main block suggesting it's meant to be run independently.
    Since this is a standalone script with its own entry point, it needs its own environment initialization to work correctly when run directly.
    The comment is incorrect - this dotenv initialization is necessary for when the script is run independently, and is not actually duplicate functionality.

2. evaluation/script/create_webvoyager_workflow.py:17
  • Draft comment:
    This dotenv initialization is duplicated. Consider importing and using the existing initialization instead.

  • dotenv initialization (main.py)

  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    While it's true that load_dotenv() is called in both files, this isn't necessarily a problem. load_dotenv() is idempotent - calling it multiple times is safe. These are separate entry points in different packages, and each entry point should be responsible for its own environment setup. Having load_dotenv() in each entry point is actually a good practice for modularity.
    Maybe there's a central initialization module that both packages should use? The duplication could lead to inconsistent environment loading if the calls are configured differently.
    The benefits of each entry point being self-contained outweigh the minor duplication. If there were specific configuration issues, those would be worth addressing, but just calling load_dotenv() twice isn't harmful.
    This comment should be deleted. The duplication of load_dotenv() is not problematic and follows good practices for modular code organization.

3. evaluation/script/create_webvoyager_evaluation_result.py:11
  • Draft comment:
    load_dotenv() is added correctly to load environment variables before execution.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
4. evaluation/script/create_webvoyager_task_v2.py:15
  • Draft comment:
    load_dotenv() is properly invoked early; consider centralizing dotenv loading if shared across scripts.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
5. evaluation/script/create_webvoyager_workflow.py:17
  • Draft comment:
    load_dotenv() usage is appropriate here to ensure env variables are loaded before use.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
6. evaluation/script/eval_webvoyager_task_v2.py:12
  • Draft comment:
    load_dotenv() is called at the start, ensuring environment variables are available for subsequent operations.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
7. evaluation/script/create_webvoyager_evaluation_result.py:6
  • Draft comment:
    Good use of dotenv – loading environment variables at module level. Consider moving load_dotenv() into the main guard if side effects on import are a concern.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
    None
8. evaluation/script/create_webvoyager_task_v2.py:7
  • Draft comment:
    Added load_dotenv() to initialize environment variables. Consider centralizing dotenv loading if many scripts require it.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
    None
9. evaluation/script/create_webvoyager_workflow.py:8
  • Draft comment:
    Using load_dotenv() here ensures all env variables are available. If this pattern repeats across multiple scripts, consider centralizing dotenv initialization.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
    None
10. evaluation/script/eval_webvoyager_task_v2.py:7
  • Draft comment:
    Load dotenv is added here as well; the practice is consistent. Consider centralizing this setup if applicable to reduce redundancy.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
    None
11. evaluation/script/eval_webvoyager_task_v2.py:1
  • Draft comment:
    File renamed from eval_webvoyager_cruise.py to eval_webvoyager_task_v2.py, which improves naming consistency with other task_v2 scripts.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
12. evaluation/script/eval_webvoyager_task_v2.py:69
  • Draft comment:
    On line 69, the printed message ends with two periods ('csv..'). Consider removing one period for consistent and clear punctuation.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_e5ZswHclTSwGkFnI


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.


from evaluation.core import Evaluator, SkyvernClient
from skyvern.forge.sdk.workflow.models.workflow import WorkflowRunStatus

load_dotenv()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dotenv initialization is duplicated. Consider centralizing environment variable loading in one place.

@LawyZheng LawyZheng merged commit 148693a into main Feb 24, 2025
7 checks passed
@LawyZheng LawyZheng deleted the lawy/fix-evalutaion-script branch February 24, 2025 05:17
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants