Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

task v2 refactor part 6 - observer_cruise_id -> task_v2_id #1817

Merged
merged 2 commits into from
Feb 24, 2025

Conversation

wintonzheng
Copy link
Contributor

@wintonzheng wintonzheng commented Feb 23, 2025

Important

Refactor codebase to replace observer_cruise_id with task_v2_id, updating functions, logs, and database interactions to support task_v2.

  • Refactor:
    • Rename observer_cruise_id to task_v2_id across multiple files and functions, including task_v2_service.py, async_executor.py, and agent_protocol.py.
    • Update function names from create_cruise to create_task_v2 and run_observer_task to run_task_v2 in task_v2_service.py and async_executor.py.
    • Change log messages and error handling to reflect the new task_v2 terminology.
  • Functionality:
    • Modify initialize_observer_task and run_observer_task in task_v2_service.py to handle task_v2 logic.
    • Update database interactions to use task_v2_id in models.py and db/client.py.
  • Miscellaneous:
    • Adjust logging in forge_log.py to include task_v2_id in context.
    • Rename create_webvoyager_observer.py to create_webvoyager_task_v2.py and update its content accordingly.

This description was created by Ellipsis for cb1f754. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on f7911fd in 1 minute and 48 seconds

More details
  • Looked at 52 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 drafted comments based on config settings.
1. evaluation/core/__init__.py:58
  • Draft comment:
    Good refactor; ensure the response JSON schema for task_v2 returns a 'task_id' field as expected.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
2. evaluation/script/create_webvoyager_task_v2.py:45
  • Draft comment:
    Inconsistent attribute name: using cruise.observer_cruise_id while the refactor renames it to task_v2_id. Verify whether it should be updated to cruise.task_id.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    The comment raises a valid point about naming consistency, but several things make me hesitant: 1) We don't have access to the cruise object's class definition to know if observer_cruise_id is the correct attribute name 2) The comment is speculative with "Verify whether..." 3) The comment suggests cruise.task_id but we have no evidence this attribute exists 4) This could be an intentional mapping of an old attribute name to a new key name.
    Without seeing the cruise object's class definition, we can't be certain if this is actually an issue. The mapping could be intentional as part of a gradual refactoring.
    The comment violates our rules by asking for verification and being speculative. We don't have strong evidence that this is incorrect.
    Delete this comment as it's speculative, asks for verification, and we lack the context to be certain there's an actual issue.
3. evaluation/core/__init__.py:55
  • Draft comment:
    Ensure consistency in the refactor: create_task_v2 asserts for 'task_id' but the returned ObserverTask may still expose the old 'observer_cruise_id'. Update the model/attribute to use the new naming (e.g. task_v2_id or task_id) throughout.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    Without seeing the ObserverTask model definition, we can't verify if there's actually an inconsistency. The code looks well-structured - it first validates the presence of task_id, then uses proper model validation. The comment is speculative since it says "may still expose" without evidence. This feels like asking for confirmation rather than pointing out a clear issue.
    I could be wrong about the model validation - perhaps there is a real inconsistency that would cause runtime errors. The comment might be trying to prevent a real issue.
    However, the model validation would fail immediately if there was a mismatch between the API response and the model schema. This is exactly what Pydantic validation is for. The comment is asking for verification rather than pointing out a concrete issue.
    The comment should be deleted because it's speculative and asks for verification without evidence of an actual problem. If there was a schema mismatch, it would fail at runtime.
4. evaluation/script/create_webvoyager_task_v2.py:45
  • Draft comment:
    Update property access: Instead of accessing cruise.observer_cruise_id, use the new field (e.g. cruise.task_id) to match the refactored naming.
  • Reason this comment was not posted:
    Comment was on unchanged code.
5. evaluation/script/create_webvoyager_task_v2.py:15
  • Draft comment:
    Rename the function: Consider renaming 'create_observer_cruise' to something like 'create_task_v2' for full consistency with the refactor.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. evaluation/core/__init__.py:126
  • Draft comment:
    Typo: In the assert message on line 126, 'but got NONE:' should be changed to 'but got None:' for consistency with typical Python conventions.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_3EuH2NLHQASDFqEB


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to f7911fd in 2 minutes and 36 seconds

More details
  • Looked at 1108 lines of code in 15 files
  • Skipped 0 files when reviewing.
  • Skipped posting 28 drafted comments based on config settings.
1. skyvern/forge/sdk/services/task_v2_service.py:113
  • Draft comment:
    When setting context.task_v2_id, use same naming across services for clarity. Consider renaming variable if refactored.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
2. skyvern/forge/sdk/services/task_v2_service.py:215
  • Draft comment:
    Error handling in run_observer_task: if observer_task is not found, logging then raising TaskV2NotFound is good, but consider including additional context in the exception message.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. skyvern/forge/sdk/executor/async_executor.py:160
  • Draft comment:
    When updating observer cruise to queued, ensure consistency in naming. The parameter is 'task_v2_id', not observer_cruise_id.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
4. skyvern/forge/sdk/forge_log.py:35
  • Draft comment:
    Ensure logging key names match new parameter names (e.g. task_v2_id) to ease transition.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
5. skyvern/forge/sdk/routes/agent_protocol.py:1258
  • Draft comment:
    In observer_task endpoint, ensure task_v2_id is passed properly from observer_task object. Some endpoints may still use legacy naming.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    The comment is asking the author to ensure that a specific parameter is passed properly, which violates the rule against asking the author to ensure behavior. It does not provide a specific suggestion or point out a clear issue with the code.
6. skyvern/forge/sdk/db/id.py:159
  • Draft comment:
    Naming for task_v2_id generation looks correct. Confirm that the prefix TASK_V2_ID = 'oc' remains intentional.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%
    This comment violates the rule about not asking for confirmations or verifications. It's asking the author to confirm their intention, which is explicitly listed as not useful. The prefix value itself being 'oc' vs something else is an implementation detail that the author clearly chose deliberately during the rename.
    Maybe keeping 'oc' as the prefix for TASK_V2_ID is confusing since it no longer relates to "observer cruise" and could cause maintenance issues later.
    While the prefix might be suboptimal, asking for confirmation doesn't add value. If there was a strong case for changing it, the comment should directly suggest a better alternative instead of asking for confirmation.
    Delete this comment as it violates the rule about not asking for confirmations or verifications of intent.
7. skyvern/forge/sdk/models.py:572
  • Draft comment:
    ObserverCruiseModel default now uses generate_task_v2_id. Verify existing integrations and consumers of observer_cruise_id are updated to use task_v2_id.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. skyvern/forge/sdk/core/skyvern_context.py:15
  • Draft comment:
    Adding task_v2_id field in context is good. Confirm that it is consistently used throughout the refactor.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    This comment is asking the PR author to confirm consistency in the use of a field throughout a refactor. It is not making a specific code suggestion or pointing out a specific issue, but rather asking for confirmation of consistency, which violates the rule against asking the author to confirm their intention or double-check things.
9. skyvern/forge/sdk/routes/agent_protocol.py:2174
  • Draft comment:
    In TaskV2Block execution, ensure observer task is updated with new field naming. Check that all DB updates use task_v2_id consistently.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
10. skyvern/forge/sdk/services/task_v2_service.py:113
  • Draft comment:
    In initialize_observer_task, context.task_v2_id is set from observer_task.observer_cruise_id. To fully complete the refactor, consider renaming the field in ObserverTask (and corresponding DB model) to 'task_v2_id' so that the terminology is consistent.
  • Reason this comment was not posted:
    Comment was on unchanged code.
11. skyvern/forge/sdk/db/id.py:40
  • Draft comment:
    The constant TASK_V2_ID is defined as 'oc', which seems legacy (possibly from observer cruise). Consider using a more descriptive prefix (e.g. 't2') to better reflect its use for task v2.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    While the comment correctly identifies a potential naming inconsistency, changing ID prefixes in a database system is a risky operation that could affect existing data and require migrations. The comment doesn't demonstrate strong evidence that this change is necessary or worth the risk. It's more of a nice-to-have suggestion than a critical issue.
    The comment might be identifying a real technical debt issue that could cause confusion for future developers. Consistent naming conventions are important for maintainability.
    However, changing ID prefixes in a production system is a major operation that requires careful consideration of existing data. Without strong evidence that this causes actual problems, it's better to leave it as is.
    The comment should be removed as it suggests a potentially risky change without demonstrating clear necessity or benefit that outweighs the risks.
12. skyvern/forge/sdk/db/models.py:570
  • Draft comment:
    ObserverCruiseModel defines 'observer_cruise_id' with a default using generate_task_v2_id. Since the refactor is shifting to task_v2 naming, consider renaming this column (and its foreign key references) to 'task_v2_id'.
  • Reason this comment was not posted:
    Comment was on unchanged code.
13. skyvern/forge/sdk/executor/async_executor.py:160
  • Draft comment:
    In the execute_cruise method, log messages and variable naming still refer to 'observer cruise' while the parameter is task_v2_id. Update log messages and terminology to maintain consistency with the new task_v2 nomenclature.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
14. skyvern/forge/sdk/workflow/models/block.py:2119
  • Draft comment:
    The TaskV2Block class still refers to observer_cruise_id (e.g. when calling initialize_observer_task). For clarity and consistency with the refactor, update these references to use task_v2_id instead.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
15. evaluation/core/__init__.py:126
  • Draft comment:
    Typo: Consider replacing 'NONE' with 'None' in the assert message, to maintain consistent capitalization.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
16. evaluation/script/create_webvoyager_task_v2.py:55
  • Draft comment:
    There's a minor typographical/grammatical issue in the print statement on line 55. The message 'Queued {cnt} cruises to launch webvoyager evaluation test. saving the records file in {record_file_path}' could be rephrased for better clarity and consistency. Consider something like: 'Queued {cnt} cruises for webvoyager evaluation test and saved the records file in {record_file_path}'.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
17. skyvern/agent/local.py:70
  • Draft comment:
    Consider updating the comment on line 70 from "mark observer cruise as queued" to refer to "task_v2" instead, for consistency with the code changes.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    While the suggestion technically aligns with the code changes, updating comments to match parameter names is a relatively minor concern. The existing comment still accurately describes what the code does functionally. The parameter rename seems to be part of a larger refactoring effort but doesn't impact the code's behavior or clarity.
    The comment could help maintain consistency in terminology throughout the codebase. Inconsistent terminology between comments and code can sometimes be confusing for future maintainers.
    However, the existing comment is still functionally accurate and clear. This kind of minor comment update suggestion creates noise in the PR review without adding significant value.
    The comment should be removed as it suggests a purely cosmetic change that doesn't meaningfully improve code quality or prevent potential issues.
18. skyvern/agent/local.py:171
  • Draft comment:
    Consider updating the exception message on line 171 from "Observer cruise missing workflow run id" to use the new terminology (e.g., "Task v2 missing workflow run id") to match the refactored code.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
19. skyvern/exceptions.py:391
  • Draft comment:
    Typo in the error message for UnsupportedActionType: currently it says 'Unsupport action type', but it should read 'Unsupported action type'.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
20. skyvern/exceptions.py:529
  • Draft comment:
    Typo in the class name 'NoElementBoudingBox': 'Bouding' should be corrected to 'Bounding' to maintain consistency and clarity.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
21. skyvern/forge/sdk/db/client.py:2033
  • Draft comment:
    Typographical issue: In the get_totp_codes function, the variable used to hold the list of TOTP codes (assigned to 'totp_code') is then also used as the loop variable in the list comprehension (i.e., 'for totp_code in totp_code'). Consider renaming one of them (e.g., using 'totp_codes' for the list and 'code' for the iteration variable) to improve clarity.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
22. skyvern/forge/sdk/db/id.py:18
  • Draft comment:
    Typo: In the comment at line 18, 'Bit shits (left)' should be corrected to 'Bit shifts (left)'.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
23. skyvern/forge/sdk/db/models.py:237
  • Draft comment:
    Typo detected: In the comment near line 237, 'workfow runs with parent_workflow_run_id...' should be corrected to 'workflow runs with parent_workflow_run_id...' to improve clarity.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
24. skyvern/forge/sdk/executor/async_executor.py:162
  • Draft comment:
    The error message "No observer cruise or no workflow run associated with observer cruise" still references 'observer cruise'. As we are now using 'task_v2_id', please update the message for consistency.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    The parameter rename from observer_cruise_id to task_v2_id appears to be an API-level change, but internally the code still works with "observer cruise" objects. The error message is actually describing what's missing - an observer cruise record or its workflow run. The message is technically accurate as written. Changing it might actually make it less clear what's actually missing.
    I might be wrong about the scope of the rename - perhaps there's a larger refactoring happening across the codebase to move away from "observer cruise" terminology entirely.
    Without access to the broader codebase context, we should assume the error message is intentionally referring to the actual data structure. The message describes the technical problem accurately.
    The error message should be kept as is since it correctly describes the underlying data structure that's missing, regardless of the parameter name change.
25. skyvern/forge/sdk/executor/async_executor.py:164
  • Draft comment:
    The inline comment '# mark observer cruise as queued' still indicates 'observer cruise'. Please update it to reflect the new naming convention, e.g., 'mark task_v2 as queued'.
  • Reason this comment was not posted:
    Comment was on unchanged code.
26. skyvern/forge/sdk/services/task_v2_service.py:187
  • Draft comment:
    Typo: The comment says "update oserver cruise". It should be "update observer cruise".
  • Reason this comment was not posted:
    Comment was on unchanged code.
27. skyvern/forge/sdk/services/task_v2_service.py:333
  • Draft comment:
    Typo: The log message uses "max_iterationss_override" with an extra 's'. It should be "max_iterations_override".
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
28. skyvern/forge/sdk/services/task_v2_service.py:882
  • Draft comment:
    Typo: The comment contains "wofklow" instead of "workflow".
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_cykj0hUj5oUw8sKG


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

<!-- ELLIPSIS_HIDDEN -->

> [!IMPORTANT]
> Refactor codebase to rename `observer_cruise_id` to `task_v2_id` for consistency with task v2 naming conventions.
>
>   - **Renaming**:
>     - Rename `observer_cruise_id` to `task_v2_id` in function parameters, logging, and database operations across multiple files.
>     - Update related variable names and log messages to reflect the new naming convention.
>   - **Files and Functions**:
>     - `cloud_async_executor.py`: Update `execute_cruise()` and related functions.
>     - `observer_cruises.py`: Update `reset_observer_cruise()`.
>     - `run_task_v2.py`: Update `run_observer_task()` and related functions.
>   - **Miscellaneous**:
>     - Update `RunObserverCruiseParams` dataclass in `temporal_workflow_parameters.py`.
>     - Adjust logging and error messages to use `task_v2_id`.
>     - Minor updates to URLs in `run_evals.py`.
>
> <sup>This description was created by </sup>[<img alt="Ellipsis" src="https://img.shields.io/badge/Ellipsis-blue?color=175173">](https://www.ellipsis.dev?ref=Skyvern-AI%2Fskyvern-cloud&utm_source=github&utm_medium=referral)<sup> for 6da00bc389af50cd5ee444261138328ea602526b. It will automatically update as commits are pushed.</sup>

<!-- ELLIPSIS_HIDDEN -->
<!-- ELLIPSIS_HIDDEN -->

> [!IMPORTANT]
> Refactor codebase to rename `observer_cruise_id` to `task_v2_id` for consistency with task v2 naming conventions.
>
>   - **Renaming**:
>     - Rename `observer_cruise_id` to `task_v2_id` in function parameters, logging, and database operations across multiple files.
>     - Update related variable names and log messages to reflect the new naming convention.
>   - **Files and Functions**:
>     - `cloud_async_executor.py`: Update `execute_cruise()` and related functions.
>     - `observer_cruises.py`: Update `reset_observer_cruise()`.
>     - `run_task_v2.py`: Update `run_observer_task()` and related functions.
>   - **Miscellaneous**:
>     - Update `RunObserverCruiseParams` dataclass in `temporal_workflow_parameters.py`.
>     - Adjust logging and error messages to use `task_v2_id`.
>     - Minor updates to URLs in `run_evals.py`.
>
> <sup>This description was created by </sup>[<img alt="Ellipsis" src="https://img.shields.io/badge/Ellipsis-blue?color=175173">](https://www.ellipsis.dev?ref=Skyvern-AI%2Fskyvern-cloud&utm_source=github&utm_medium=referral)<sup> for 6da00bc389af50cd5ee444261138328ea602526b. It will automatically update as commits are pushed.</sup>

<!-- ELLIPSIS_HIDDEN -->
@wintonzheng wintonzheng force-pushed the shu/task_v2_refactor_part6 branch from f7911fd to cb1f754 Compare February 23, 2025 12:18
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on cb1f754 in 1 minute and 55 seconds

More details
  • Looked at 124 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 13 drafted comments based on config settings.
1. evaluation/core/__init__.py:26
  • Draft comment:
    Good use of replacing base URL for v2 endpoints.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
2. evaluation/core/__init__.py:58
  • Draft comment:
    Renamed method to create_task_v2 and updated assertions; looks good. Consider ensuring consistent naming in return values.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
3. evaluation/core/__init__.py:216
  • Draft comment:
    queue_skyvern_cruise still uses 'cruise' naming even though it calls create_task_v2; consider renaming to queue_skyvern_task_v2 for consistency.
  • Reason this comment was not posted:
    Comment was on unchanged code.
4. evaluation/script/create_webvoyager_evaluation_result.py:51
  • Draft comment:
    Updated references from observer_cruise to observer_task; change is consistent.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
5. evaluation/script/create_webvoyager_task_v2.py:15
  • Draft comment:
    Function renamed from create_observer_cruise to create_task_v2; consider also renaming local variables to avoid confusion with 'cruise'.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
6. evaluation/script/create_webvoyager_task_v2.py:45
  • Draft comment:
    Mismatch: dumping 'task_v2_id' from cruise.observer_cruise_id; consider using a consistent attribute, e.g. cruise.task_id.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    The rename of the file and function suggests an intentional migration from 'observer' to 'task' terminology. The comment suggests using 'task_id', but the code specifically uses 'task_v2_id' which aligns with the new file name. Without seeing the schema for ObserverTaskRequest or knowing the full context of this migration, we can't be certain this is actually a problem.
    I might be missing important context about the data model and whether observer_cruise_id should also be renamed as part of this migration. The suggested 'task_id' might be the correct term according to some schema I haven't seen.
    While there might be more context, the systematic renaming in the diff suggests this is an intentional transition to v2 terminology, and the comment's suggestion of 'task_id' doesn't align with the clear v2 naming pattern.
    Delete the comment. The inconsistency might be intentional during a migration, and without more context about the data model, we can't be confident this is actually a problem that needs fixing.
7. evaluation/script/eval_webvoyager_cruise.py:35
  • Draft comment:
    Updated references from observer_cruise to observer_task are consistent; no issues found.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
8. evaluation/core/__init__.py:27
  • Draft comment:
    Ensure base_url contains '/api/v1' before replacement. Otherwise, v2_base_url may be incorrect.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    The comment points out a potential issue - if base_url doesn't contain '/api/v1', the replacement won't work as intended. However, this is a speculative issue ("if X then Y could be wrong"). The comment also starts with "Ensure that..." which is a red flag. There's no strong evidence that this is actually causing problems in practice.
    The comment identifies a real potential edge case that could cause issues. If base_url doesn't contain '/api/v1', the v2 URL would be incorrect.
    While technically correct, this falls into the category of asking the author to verify/ensure something rather than pointing out a definite issue. We don't have evidence that this is causing actual problems.
    The comment should be deleted as it's speculative and starts with "Ensure that..." which violates our commenting guidelines.
9. evaluation/core/__init__.py:58
  • Draft comment:
    Verify that response fields and model attributes are updated. The method asserts 'task_id' but later uses legacy naming.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 40% <= threshold 50%
    The comment is asking the author to verify something, which is against the rules. However, it also points out a potential inconsistency in naming, which could be useful. I need to determine if the comment is more about verification or if it provides a specific suggestion or observation that could be useful.
10. evaluation/core/__init__.py:216
  • Draft comment:
    Consider renaming 'queue_skyvern_cruise' to match the new task nomenclature (e.g. 'queue_skyvern_task') for consistency.
  • Reason this comment was not posted:
    Comment was on unchanged code.
11. evaluation/script/create_webvoyager_task_v2.py:45
  • Draft comment:
    Update property access: use cruise.task_id (or corresponding new field) instead of cruise.observer_cruise_id for clarity.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    While the code is being renamed from observer to task_v2 theme, there's no evidence that the cruise object's attribute name has changed. The comment assumes the existence of cruise.task_id without proof. We can't verify if this suggestion is correct without seeing the cruise object's class definition.
    The code might be in an intermediate state of a larger refactoring where cruise.task_id will be added later. The comment could be pointing out an inconsistency in the renaming pattern.
    Without seeing the cruise class definition or related changes, we can't verify if cruise.task_id exists or will exist. We should err on the side of removing speculative comments.
    Delete the comment as it makes assumptions about the existence of cruise.task_id without clear evidence, and we can't verify its correctness from the available context.
12. evaluation/script/create_webvoyager_evaluation_result.py:59
  • Draft comment:
    The print message on line 59 uses double periods at the end ('csv..'). Please check if this is intentional or if it should be a single period for consistency.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
13. evaluation/script/create_webvoyager_task_v2.py:55
  • Draft comment:
    In the print statement at line 55, consider refining the text for clarity and proper capitalization. For example, instead of "Queued {cnt} cruises to launch webvoyager evaluation test. saving the records file in {record_file_path}", you might update it to "Queued {cnt} cruises to launch webvoyager evaluation test. Saved the records file in {record_file_path}".
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_Rdwjf6qZY4R2g5Aj


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@wintonzheng wintonzheng merged commit ffbc95e into main Feb 24, 2025
7 checks passed
@wintonzheng wintonzheng deleted the shu/task_v2_refactor_part6 branch February 24, 2025 00:03
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant