
Web driver is not installed, but seems to be required #92

Closed
mabraham opened this issue Oct 8, 2024 · 6 comments · Fixed by #93

Comments

@mabraham
Contributor

mabraham commented Oct 8, 2024

In a CI container where I have not installed a browser or web driver, I am trying to run urlchecker, but I get an error message like this:

$ urlchecker check --files docs/html/index.html --save urlcheck.csv --exclude-patterns html-full,html-user,html-lib,.tar.gz,_sources --file-types "*.html" --serial .
WARNING:urlchecker.core.urlproc:Issue with driver, results will be improved if you have it! Please match your version from https://googlechromelabs.github.io/chrome-for-testing
           original path: .
              final path: /builds/gromacs/gromacs/build-docs
               subfolder: None
                  branch: main
                 cleanup: False
                  serial: True
              file types: ['*.html']
                   files: ['docs/html/index.html']
               print all: True
                 verbose: False
           urls excluded: []
   url patterns excluded: ['html-full', 'html-user', 'html-lib', '.tar.gz', '_sources']
  file patterns excluded: []
          no check certs: False
              force pass: False
             retry count: 2
                    save: urlcheck.csv
                 timeout: 5
Traceback (most recent call last):
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/urlchecker/core/urlproc.py", line 282, in check_urls
    if needs_driver_check and driver and driver.check(url):
                                         ^^^^^^^^^^^^^^^^^
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/urlchecker/core/webdriver.py", line 83, in check
    self.get_browser()
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/urlchecker/core/webdriver.py", line 102, in get_browser
    self.browser = webdriver.chrome.webdriver.WebDriver(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/chrome/webdriver.py", line 45, in __init__
    super().__init__(
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/chromium/webdriver.py", line 66, in __init__
    super().__init__(command_executor=executor, options=options)
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 212, in __init__
    self.start_session(capabilities)
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 299, in start_session
    response = self.execute(Command.NEW_SESSION, caps)["value"]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 354, in execute
    self.error_handler.check_response(response)
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/remote/errorhandler.py", line 229, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: session not created: Chrome failed to start: exited normally.
  (session not created: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /root/.cache/selenium/chrome/linux64/129.0.6668.89/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Stacktrace:
#0 0x55d814f9602a <unknown>
#1 0x55d814c7c5e0 <unknown>
#2 0x55d814cb4921 <unknown>
#3 0x55d814cb02c5 <unknown>
#4 0x55d814cfcdf6 <unknown>
#5 0x55d814cfc446 <unknown>
#6 0x55d814cf08c3 <unknown>
#7 0x55d814cbe6b3 <unknown>
#8 0x55d814cbf68e <unknown>
#9 0x55d814f60a2b <unknown>
#10 0x55d814f649b1 <unknown>
#11 0x55d814f4d225 <unknown>
#12 0x55d814f65532 <unknown>
#13 0x55d814f3238f <unknown>
#14 0x55d814f84f28 <unknown>
#15 0x55d814f850f3 <unknown>
#16 0x55d814f94e7c <unknown>
#17 0x7f8e82b09a94 <unknown>
#18 0x7f8e82b96c3c <unknown>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/root/.local/bin/urlchecker", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/urlchecker/client/__init__.py", line 208, in main
    main(args=args, extra=extra)
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/urlchecker/client/check.py", line 90, in main
    check_results = checker.run(
                    ^^^^^^^^^^^^
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/urlchecker/core/check.py", line 228, in run
    results[file_name] = check_task(**kwargs)
                         ^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/urlchecker/core/check.py", line 263, in check_task
    checker.check_urls(
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/urlchecker/core/urlproc.py", line 287, in check_urls
    if driver and driver.check(url):
                  ^^^^^^^^^^^^^^^^^
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/urlchecker/core/webdriver.py", line 83, in check
    self.get_browser()
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/urlchecker/core/webdriver.py", line 102, in get_browser
    self.browser = webdriver.chrome.webdriver.WebDriver(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/chrome/webdriver.py", line 45, in __init__
    super().__init__(
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/chromium/webdriver.py", line 66, in __init__
    super().__init__(command_executor=executor, options=options)
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 212, in __init__
    self.start_session(capabilities)
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 299, in start_session
    response = self.execute(Command.NEW_SESSION, caps)["value"]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 354, in execute
    self.error_handler.check_response(response)
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/remote/errorhandler.py", line 229, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: session not created: Chrome failed to start: exited normally.
  (session not created: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /root/.cache/selenium/chrome/linux64/129.0.6668.89/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Stacktrace:
#0 0x563e6845602a <unknown>
#1 0x563e6813c5e0 <unknown>
#2 0x563e68174921 <unknown>
#3 0x563e681702c5 <unknown>
#4 0x563e681bcdf6 <unknown>
#5 0x563e681bc446 <unknown>
#6 0x563e681b08c3 <unknown>
#7 0x563e6817e6b3 <unknown>
#8 0x563e6817f68e <unknown>
#9 0x563e68420a2b <unknown>
#10 0x563e684249b1 <unknown>
#11 0x563e6840d225 <unknown>
#12 0x563e68425532 <unknown>
#13 0x563e683f238f <unknown>
#14 0x563e68444f28 <unknown>
#15 0x563e684450f3 <unknown>
#16 0x563e68454e7c <unknown>
#17 0x7facaa0e7a94 <unknown>
#18 0x7facaa174c3c <unknown>

That makes it look like a driver is actually required.
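For readers puzzled by the two tracebacks: "During handling of the above exception, another exception occurred" is what Python prints when a new exception is raised while an earlier one is still being handled, for example inside an `except` block. Here the `driver.check(url)` call at urlproc.py line 282 fails first, and the call at line 287, reached while handling that failure, fails again in the same way. The sketch below reproduces that shape under stated assumptions: `BrokenDriver`, `SessionError` and `check_urls_like` are hypothetical stand-ins, and the except-and-retry structure is inferred from the traceback rather than copied from urlchecker.

```python
class SessionError(Exception):
    """Hypothetical stand-in for selenium's SessionNotCreatedException."""


class BrokenDriver:
    """Hypothetical driver stub: the object exists (and is truthy), but every check fails."""

    def check(self, url: str) -> bool:
        raise SessionError(f"Chrome failed to start while checking {url}")


def check_urls_like(driver, url: str) -> bool:
    """Mimics the two driver.check() call sites seen in the traceback (hypothetical)."""
    try:
        # First attempt, analogous to urlproc.py line 282
        return bool(driver and driver.check(url))
    except Exception:
        # Second attempt inside the handler, analogous to line 287; if this
        # raises too, Python links it to the first error via __context__.
        return bool(driver and driver.check(url))


try:
    check_urls_like(BrokenDriver(), "https://example.com")
except SessionError as err:
    # The second failure carries the first one as its implicit context
    print(type(err.__context__).__name__)  # SessionError
```

With `driver=None` both conditions short-circuit and no exception is raised, which is why returning `None` from a failed sanity check avoids the crash.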

If a driver is intended to be optional, then I think https://github.com/urlstechie/urlchecker-python/blob/master/urlchecker/core/urlproc.py#L161 should set driver = None so that https://github.com/urlstechie/urlchecker-python/blob/master/urlchecker/core/urlproc.py#L282 will not choke on an invalid driver.
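A minimal sketch of the caller-side behaviour proposed here, assuming the driver is only a fallback for URLs that fail a plain HTTP check (roughly what the `if needs_driver_check and driver and driver.check(url)` condition in the traceback suggests). `check_with_optional_driver`, the `Driver` protocol and the `http_check` callable are hypothetical names, not part of urlchecker's API; passing `None` for the driver simply skips the browser step.

```python
from typing import Callable, Optional, Protocol


class Driver(Protocol):
    """Anything with a check(url) -> bool method (hypothetical interface)."""

    def check(self, url: str) -> bool: ...


def check_with_optional_driver(
    url: str,
    http_check: Callable[[str], bool],
    driver: Optional[Driver] = None,
) -> bool:
    """Return True if the URL passes; the browser driver is only a fallback.

    A plain HTTP check runs first, and the driver is consulted only when
    that fails AND a validated driver was actually provided.
    """
    if http_check(url):
        return True
    # With driver=None (no usable browser) this returns False safely
    # instead of touching a half-initialised driver object.
    return driver is not None and driver.check(url)


# Usage with a hypothetical stub driver and stubbed HTTP checks
class AlwaysOk:
    def check(self, url: str) -> bool:
        return True


print(check_with_optional_driver("https://example.com", lambda u: False))  # False
print(check_with_optional_driver("https://example.com", lambda u: False, AlwaysOk()))  # True
```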

@SuperKogito
Member

Yes, you are totally right: the driver should be set to None to avoid the error. This is being done in

When the exception is raised, the returned value should be None, unless line 156 changes the driver value, which would mean you do have a driver but it doesn't pass the sanity check. @vsoch might have a better explanation for this.
Also, can you provide a bit more information on your setup, please?
A simple fix is to replace the return statement with two different ones: one under try and one under except.
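The two-return variant suggested here might look like the following minimal sketch. It is not urlchecker's code: `make_driver` and `probe_url` are hypothetical parameters (the real function takes `port` and `timeout`), and the stub drivers exist only so the example runs without Selenium or Chrome.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def get_driver(
    make_driver: Callable[[], Any], probe_url: str = "https://google.com"
) -> Optional[Any]:
    """Return a driver only if it survives the sanity check, otherwise None.

    `make_driver` is a hypothetical factory standing in for
    WebDriver(port=port, timeout=timeout), so the sketch runs without Selenium.
    """
    try:
        driver = make_driver()
        driver.check(probe_url)  # raises if the browser cannot be started
        return driver  # only reached when the sanity check did not raise
    except Exception:
        logger.warning("Issue with driver, continuing without it")
        return None  # never hand back a driver that failed the check


# Usage with two hypothetical stubs
class GoodDriver:
    def check(self, url: str) -> bool:
        return True


class BadDriver:
    def check(self, url: str) -> bool:
        raise RuntimeError("session not created: Chrome failed to start")


print(get_driver(GoodDriver) is not None)  # True
print(get_driver(BadDriver))  # None
```

Returning inside `try` and inside `except` keeps the success and failure paths visibly separate, so a driver that raised during the check can never leak out to the caller.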

@vsoch
Collaborator

vsoch commented Oct 8, 2024

When it crashes like that, it's a mismatch between the Chrome version you have and the driver.

@mabraham
Contributor Author

mabraham commented Oct 8, 2024

Thanks for the prompt replies!

This was running in a Docker container based on Ubuntu 24.04, customized for building and linting some static HTML pages. There is no browser or similar, except whatever might have been brought in by pipx install urlchecker. So I don't know what kind of driver urlchecker might have found :-(

> A simple fix is to replace the return statement with two different ones: one under try and one under except.

Yes, or to set the driver to None if an exception is caught.

@SuperKogito
Member

> So I don't know what kind of driver urlchecker might have found :-(

This is easy to test: simply run this from within your Docker container:

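from urlchecker.core.webdriver import WebDriver

# Same defaults as get_driver(): no fixed port, 5 second timeout
port = None
timeout = 5

driver = WebDriver(port=port, timeout=timeout)
print(driver)

# Do a sanity check of the driver; this raises if Chrome cannot start
print(driver.check("https://google.com"))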

A more robust version of the function could be the following.

def get_driver(self, port: Optional[int] = None, timeout: Optional[int] = 5):
    """
    Get a selenium web driver for a check session, if possible.
    Requires selenium driver to exist; otherwise fall back to not using one.
    """
    detected_driver = None
    try:
        from .webdriver import WebDriver

        driver = WebDriver(port=port, timeout=timeout)

        # Do a sanity check of the driver; only keep it if this succeeds
        driver.check("https://google.com")
        detected_driver = driver
    except Exception:
        logger.warning(
            "Issue with driver, results will be improved if you have it! Please match your version from https://googlechromelabs.github.io/chrome-for-testing"
        )
    return detected_driver

Feel free to submit a PR for this.

@vsoch
Collaborator

vsoch commented Oct 9, 2024

You can also just mimic what we do in our own Docker image:

https://github.com/urlstechie/urlchecker-python/blob/master/Dockerfile

@mabraham
Contributor Author

mabraham commented Oct 9, 2024

I dropped in

            print(driver)
            print(driver.driver)
            print(driver.browser)

and saw

<urlchecker.core.webdriver.WebDriver object at 0x78f602cd3320>
Chrome
None

Yet dpkg reports no Chrome installed. So I guess my original problem arose when Selenium reported itself as a Chrome web driver, but the actual request then failed the sanity check for some reason, and urlchecker went on to use the unchecked driver anyway.
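That output is consistent with the traceback: `WebDriver.check()` calls `self.get_browser()` (webdriver.py lines 83 and 102), so constructing the wrapper does not start Chrome, and `browser` stays `None` until the first check. The sketch below uses a hypothetical `LazyDriver` class, not urlchecker's implementation, to show why such an object is still truthy after a failed sanity check, which is what lets `if driver and driver.check(url)` reach the failing call again.

```python
from typing import Optional


class LazyDriver:
    """Hypothetical wrapper that only launches its browser on first use."""

    def __init__(self, driver_name: str = "Chrome") -> None:
        self.driver = driver_name  # known up front, like "Chrome" above
        self.browser: Optional[object] = None  # nothing launched yet

    def get_browser(self) -> None:
        # Stand-in for starting Chrome via Selenium; here it always fails,
        # as it did in the CI container.
        raise RuntimeError("session not created: Chrome failed to start")

    def check(self, url: str) -> bool:
        if self.browser is None:
            self.get_browser()
        return True


d = LazyDriver()
print(d.driver, d.browser)  # Chrome None
try:
    d.check("https://google.com")  # the sanity check fails...
except RuntimeError:
    pass
print(bool(d))  # True: the object is still truthy, so `if driver and ...` proceeds
```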

I'll make a PR

mabraham added a commit to mabraham/urlchecker-python that referenced this issue Oct 9, 2024
Without this, a driver that failed the sanity check could be returned. This leads to attempting to use it when checking URLs, which led to avoidable failures.

Fixes urlstechie#92
vsoch pushed a commit that referenced this issue Oct 10, 2024
* Return driver object only when valid

Without this, a driver that failed the sanity check could be returned. This leads to attempting to use it when checking URLs, which led to avoidable failures.

Fixes #92
vsoch closed this as completed in #93 on Oct 10, 2024

3 participants