fix evaluation script #1819

LawyZheng · 2025-02-24T05:03:25Z

Important

Add load_dotenv() to load environment variables in evaluation scripts and rename eval_webvoyager_cruise.py to eval_webvoyager_task_v2.py.

Environment Setup:
- Add load_dotenv() to create_webvoyager_evaluation_result.py, create_webvoyager_task_v2.py, create_webvoyager_workflow.py, and eval_webvoyager_task_v2.py to load environment variables.
File Rename:
- Rename eval_webvoyager_cruise.py to eval_webvoyager_task_v2.py for consistency with other scripts.

^{This description was created by}^{for 143f2fd. It will automatically update as commits are pushed.}

ellipsis-dev

❌ Changes requested. Reviewed everything up to 143f2fd in 3 minutes and 2 seconds

More details

Looked at 81 lines of code in 4 files
Skipped 0 files when reviewing.
Skipped posting 12 drafted comments based on config settings.

1. evaluation/script/create_webvoyager_task_v2.py:15

Draft comment:
Remove duplicate dotenv initialization - environment variables are already loaded in the main server entry point.
dotenv initialization (main.py)
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
This script (create_webvoyager_task_v2.py) appears to be a standalone script that can be run independently. The other dotenv initialization is in main.py which is only executed when running the server. These are separate entry points, so both might need their own dotenv initialization. Removing it from this script could break it when run directly.
The comment assumes this script is only run as part of the server, but it has its own main block suggesting it's meant to be run independently.
Since this is a standalone script with its own entry point, it needs its own environment initialization to work correctly when run directly.
The comment is incorrect - this dotenv initialization is necessary for when the script is run independently, and is not actually duplicate functionality.

2. evaluation/script/create_webvoyager_workflow.py:17

Draft comment:
This dotenv initialization is duplicated. Consider importing and using the existing initialization instead.
dotenv initialization (main.py)
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
While it's true that load_dotenv() is called in both files, this isn't necessarily a problem. load_dotenv() is idempotent - calling it multiple times is safe. These are separate entry points in different packages, and each entry point should be responsible for its own environment setup. Having load_dotenv() in each entry point is actually a good practice for modularity.
Maybe there's a central initialization module that both packages should use? The duplication could lead to inconsistent environment loading if the calls are configured differently.
The benefits of each entry point being self-contained outweigh the minor duplication. If there were specific configuration issues, those would be worth addressing, but just calling load_dotenv() twice isn't harmful.
This comment should be deleted. The duplication of load_dotenv() is not problematic and follows good practices for modular code organization.

3. evaluation/script/create_webvoyager_evaluation_result.py:11

Draft comment:
load_dotenv() is added correctly to load environment variables before execution.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

4. evaluation/script/create_webvoyager_task_v2.py:15

Draft comment:
load_dotenv() is properly invoked early; consider centralizing dotenv loading if shared across scripts.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

5. evaluation/script/create_webvoyager_workflow.py:17

Draft comment:
load_dotenv() usage is appropriate here to ensure env variables are loaded before use.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

6. evaluation/script/eval_webvoyager_task_v2.py:12

Draft comment:
load_dotenv() is called at the start, ensuring environment variables are available for subsequent operations.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

7. evaluation/script/create_webvoyager_evaluation_result.py:6

Draft comment:
Good use of dotenv – loading environment variables at module level. Consider moving load_dotenv() into the main guard if side effects on import are a concern.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%
None

8. evaluation/script/create_webvoyager_task_v2.py:7

Draft comment:
Added load_dotenv() to initialize environment variables. Consider centralizing dotenv loading if many scripts require it.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%
None

9. evaluation/script/create_webvoyager_workflow.py:8

Draft comment:
Using load_dotenv() here ensures all env variables are available. If this pattern repeats across multiple scripts, consider centralizing dotenv initialization.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%
None

10. evaluation/script/eval_webvoyager_task_v2.py:7

Draft comment:
Load dotenv is added here as well; the practice is consistent. Consider centralizing this setup if applicable to reduce redundancy.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%
None

11. evaluation/script/eval_webvoyager_task_v2.py:1

Draft comment:
File renamed from eval_webvoyager_cruise.py to eval_webvoyager_task_v2.py, which improves naming consistency with other task_v2 scripts.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

12. evaluation/script/eval_webvoyager_task_v2.py:69

Draft comment:
On line 69, the printed message ends with two periods ('csv..'). Consider removing one period for consistent and clear punctuation.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_e5ZswHclTSwGkFnI

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev · 2025-02-24T05:07:10Z

evaluation/script/eval_webvoyager_task_v2.py


 from evaluation.core import Evaluator, SkyvernClient
 from skyvern.forge.sdk.workflow.models.workflow import WorkflowRunStatus

+load_dotenv()


This dotenv initialization is duplicated. Consider centralizing environment variable loading in one place.

dotenv initialization (main.py)

dotenv initialization (create_webvoyager_evaluation_result.py)

fix evaluation script

143f2fd

ellipsis-dev bot reviewed Feb 24, 2025

View reviewed changes

wintonzheng approved these changes Feb 24, 2025

View reviewed changes

LawyZheng merged commit 148693a into main Feb 24, 2025
7 checks passed

LawyZheng deleted the lawy/fix-evalutaion-script branch February 24, 2025 05:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix evaluation script #1819

fix evaluation script #1819

LawyZheng commented Feb 24, 2025 •

edited by ellipsis-dev bot

Loading

ellipsis-dev bot left a comment

ellipsis-dev bot Feb 24, 2025

fix evaluation script #1819

fix evaluation script #1819

Conversation

LawyZheng commented Feb 24, 2025 • edited by ellipsis-dev bot Loading

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot Feb 24, 2025

Choose a reason for hiding this comment

LawyZheng commented Feb 24, 2025 •

edited by ellipsis-dev bot

Loading