Miguel/observe a11y #412

Merged
merged 28 commits into from
Jan 20, 2025

Collaborator

@miguelg719 miguelg719 commented Jan 17, 2025

why

This PR adds a new format for gathering website context: accessibility (a11y) trees. A11y trees expose structured, semantic data that is available through CDP/Playwright, which opens up a different way of processing context for web agents. The approach aims to improve interaction fidelity, reduce token cost, speed up inference, and improve contextual awareness when LLMs perform tasks in web-based environments through text.

what changed

Observe tasks can now opt in to the new context with the useAccessibilityTree flag, which switches DOM processing to a11y trees. The existing DOM function is still used for backward compatibility with selector maps, but each result now also includes a backendNodeId, which offers another way to interact with elements directly through CDP.
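To illustrate why an a11y tree can be cheaper to send to an LLM than raw DOM, here is a minimal sketch of serializing a tree into compact text. It is not the code from this PR: the AXNodeLite shape is a simplified, nested stand-in for what CDP's Accessibility.getFullAXTree returns, and formatA11yTree and the list of structural roles are hypothetical names and assumptions chosen for illustration.

```typescript
// Simplified, nested stand-in for an accessibility node (assumption:
// the real CDP payload is a flat list with richer fields).
interface AXNodeLite {
  role: string;
  name?: string;
  backendNodeId?: number;
  children?: AXNodeLite[];
}

// Roles assumed to carry no semantic information on their own.
const STRUCTURAL_ROLES = new Set(["generic", "none", "presentation"]);

// Render the tree as indented "[backendNodeId] role: name" lines,
// skipping unnamed structural wrappers but keeping their children.
function formatA11yTree(node: AXNodeLite, depth = 0): string[] {
  const isNoise = STRUCTURAL_ROLES.has(node.role) && !node.name;
  const lines: string[] = [];
  if (!isNoise) {
    const id = node.backendNodeId !== undefined ? `[${node.backendNodeId}] ` : "";
    const name = node.name ? `: ${node.name}` : "";
    lines.push(`${"  ".repeat(depth)}${id}${node.role}${name}`);
  }
  // Children of a skipped wrapper stay at the wrapper's depth.
  const childDepth = isNoise ? depth : depth + 1;
  for (const child of node.children ?? []) {
    lines.push(...formatA11yTree(child, childDepth));
  }
  return lines;
}

// Hypothetical sample data for illustration only.
const sampleTree: AXNodeLite = {
  role: "banner",
  backendNodeId: 12,
  children: [
    {
      role: "generic",
      children: [{ role: "link", name: "Sample link", backendNodeId: 60 }],
    },
  ],
};

console.log(formatA11yTree(sampleTree).join("\n"));
```

Keeping the backendNodeId on each line is one way the model's answer could be mapped back to a concrete element afterwards.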

Sample usage:

 const observations = await stagehand.page.observe({
   instruction: "Find all the links on the header section",
   useAccessibilityTree: true
 });

Sample output:

[
  {
    description: 'Sample link',
    selector: 'xpath=/html/body[1]/div[1]...',
    backendNodeId: 60
  },
...
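As a rough sketch of how a backendNodeId from the output above could be used to act on an element through CDP, the helper below resolves the node and clicks it. DOM.resolveNode and Runtime.callFunctionOn are standard CDP methods; the CDPSend type, the ObservedElement interface, and clickByBackendNodeId are hypothetical names, and this is not necessarily how the PR or any later act integration does it.

```typescript
// Any function that forwards a CDP command, for example a Playwright
// CDPSession's send method. The type name is hypothetical.
type CDPSend = (
  method: string,
  params?: Record<string, unknown>
) => Promise<any>;

// Field names taken from the sample output; the interface name is assumed.
interface ObservedElement {
  description: string;
  selector: string;
  backendNodeId: number;
}

async function clickByBackendNodeId(
  send: CDPSend,
  el: ObservedElement
): Promise<void> {
  // DOM.resolveNode turns a backend node id into a remote object handle.
  const { object } = await send("DOM.resolveNode", {
    backendNodeId: el.backendNodeId,
  });
  // Runtime.callFunctionOn runs the function with `this` bound to that object.
  await send("Runtime.callFunctionOn", {
    objectId: object.objectId,
    functionDeclaration: "function () { this.click(); }",
  });
}
```

Passing the send function in as a parameter keeps the helper independent of how the CDP session is created.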

test plan

  • Test against existing observe evals
  • Generate new evals (new approach) for various observe tasks (currently only 5)
  • Test against new evals and quantify improvement in fidelity and cost/speed vs current approaches
  • Include act functions for direct compatibility
  • Test with act


changeset-bot bot commented Jan 17, 2025

🦋 Changeset detected

Latest commit: 0861e9c

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

| Name | Type |
| --- | --- |
| @browserbasehq/stagehand | Minor |


(fullPage: boolean) =>
fullPage ? window.processAllOfDom() : window.processDom([]),
fullPage,
const cdpClient = await this.stagehandPage.context.newCDPSession(
Contributor


Can we expose raw CDP as a stagehandPage function? I want to do something like this.stagehandPage.sendCDP(cmd, args) instead.
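One possible shape for the suggested helper, as a sketch only: the wrapper below opens a CDP session lazily and reuses it for every command. CDPSessionLike and ContextLike are structural stand-ins for Playwright's CDPSession and BrowserContext (whose newCDPSession(page) and send(method, params) calls they mirror), and PageWithCDP is a hypothetical class name, not the actual StagehandPage.

```typescript
// Structural stand-ins so the sketch is self-contained (assumptions).
interface CDPSessionLike {
  send(method: string, params?: Record<string, unknown>): Promise<unknown>;
}

interface ContextLike<P> {
  newCDPSession(page: P): Promise<CDPSessionLike>;
}

class PageWithCDP<P> {
  private context: ContextLike<P>;
  private page: P;
  // Cache the promise so concurrent callers share a single session.
  private session: Promise<CDPSessionLike> | null = null;

  constructor(context: ContextLike<P>, page: P) {
    this.context = context;
    this.page = page;
  }

  // Forward one raw CDP command, opening the session on first use.
  async sendCDP<T = unknown>(
    method: string,
    params?: Record<string, unknown>
  ): Promise<T> {
    if (!this.session) {
      this.session = this.context.newCDPSession(this.page);
    }
    const session = await this.session;
    return (await session.send(method, params)) as T;
  }
}
```

With something like this in place, handlers would call sendCDP("Accessibility.getFullAXTree") rather than creating their own sessions.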

@kamath kamath self-requested a review January 20, 2025 02:33
Contributor

@kamath kamath left a comment


🚀 🚢 :shipit:

@miguelg719 miguelg719 merged commit 4aa4813 into main Jan 20, 2025
10 checks passed
@github-actions github-actions bot mentioned this pull request Jan 20, 2025

2 participants