feat: adding crawlee's EnqueueStrategy config #176

Open: muzafferkadir wants to merge 3 commits into base: main

@muzafferkadir muzafferkadir commented Sep 6, 2024

PR Description

Summary:
This pull request adds a crawlStrategy option to the configuration schema and passes it through to the crawling logic, so users can choose which Crawlee EnqueueStrategy is applied to discovered links. For more information: https://crawlee.dev/api/core/enum/EnqueueStrategy#All
Changes Made:

  1. Updated Configuration Schema (config.ts):

    • Added crawlStrategy field to the configuration schema.
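      • This field selects the Crawlee EnqueueStrategy, which decides which discovered URLs are enqueued by comparing part of each URL (origin, hostname, or domain) against the URL of the page on which it was found.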
      • Possible values are "all", "same-origin", "same-hostname", and "same-domain".
      • This field is optional.
  2. Updated Crawling Logic (core.ts):

    • Integrated the crawlStrategy configuration into the PlaywrightCrawler setup.
      • The strategy parameter in enqueueLinks now uses the config.crawlStrategy value if provided.
      • This ensures that the crawling strategy defined in the configuration is applied during the crawling process.
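
To illustrate the two changes above, here is a minimal, dependency-free sketch. It is not the code from this PR: `CrawlConfig`, `EnqueueOptions`, `isCrawlStrategy`, and `buildEnqueueOptions` are hypothetical names, and the real config.ts may define the field with a schema library instead of a plain type. Only the field name crawlStrategy and its four values come from the description.

```typescript
// The four values listed in the PR description.
const CRAWL_STRATEGIES = ["all", "same-origin", "same-hostname", "same-domain"] as const;
type CrawlStrategy = (typeof CRAWL_STRATEGIES)[number];

// Hypothetical stand-in for the project's config type (assumption, not the PR's schema).
interface CrawlConfig {
  url: string;
  match: string;
  crawlStrategy?: CrawlStrategy; // optional, as stated in the PR
}

// Type guard for values that arrive untyped (for example from JSON or CLI flags).
function isCrawlStrategy(value: unknown): value is CrawlStrategy {
  return typeof value === "string" && CRAWL_STRATEGIES.some((s) => s === value);
}

// Hypothetical shape of the options object handed to enqueueLinks.
interface EnqueueOptions {
  globs: string[];
  strategy?: CrawlStrategy;
}

// Only set `strategy` when the config provides one, so the library's own
// default applies otherwise.
function buildEnqueueOptions(config: CrawlConfig): EnqueueOptions {
  const options: EnqueueOptions = { globs: [config.match] };
  if (config.crawlStrategy !== undefined) {
    options.strategy = config.crawlStrategy;
  }
  return options;
}
```

Leaving the key out when the field is unset, rather than passing `undefined` or a hard-coded fallback, keeps existing configurations behaving as they did before the change.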

Impact:

  • These changes give users greater control over crawling behavior: they can restrict which links are followed based on each link's origin, hostname, or domain.

Examples:

  • When crawlStrategy is set to "same-origin", the crawler will only follow links within the same origin.
  • When crawlStrategy is set to "all", the crawler will follow all links regardless of their origin.
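
The difference between the strategies can be sketched as a plain URL comparison. This is an illustration only, not Crawlee's implementation: `wouldFollow` and `naiveDomain` are hypothetical helpers, and the "same-domain" branch uses a naive last-two-labels rule that is wrong for suffixes such as "co.uk" (a real implementation needs public-suffix data).

```typescript
type Strategy = "all" | "same-origin" | "same-hostname" | "same-domain";

// Naive approximation (assumption): treat the last two hostname labels as the domain.
function naiveDomain(hostname: string): string {
  return hostname.split(".").slice(-2).join(".");
}

// Returns true if a link found on pageUrl would be followed under the given strategy.
function wouldFollow(strategy: Strategy, pageUrl: string, linkUrl: string): boolean {
  const page = new URL(pageUrl);
  const link = new URL(linkUrl);
  switch (strategy) {
    case "all":
      return true; // no restriction
    case "same-origin":
      return page.origin === link.origin; // scheme + host + port must match
    case "same-hostname":
      return page.hostname === link.hostname; // scheme and port may differ
    case "same-domain":
      return naiveDomain(page.hostname) === naiveDomain(link.hostname); // subdomains allowed
  }
}
```

For a page at https://example.com/docs, a link to http://example.com/page is rejected by "same-origin" (different scheme) but accepted by "same-hostname", while a link to https://blog.example.com/ is accepted only by "same-domain" and "all".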

@steve8708
Contributor

thanks @muzafferkadir ! looks like build is failing, so will need that in to merge. otherwise, great update

@muzafferkadir
Author

> thanks @muzafferkadir ! looks like build is failing, so will need that in to merge. otherwise, great update

thanks, i updated

Author

@muzafferkadir muzafferkadir left a comment

Any update?

@steve8708
Contributor

steve8708 commented Mar 7, 2025

sorry @muzafferkadir - looks like theres a merge conflict. i can hop on this once green again

@muzafferkadir
Author

> sorry @muzafferkadir - looks like theres a merge conflict. i can hop on this once green again

i updated

@steve8708
Contributor

looks like build not passing @muzafferkadir

2 participants