Render action randomly fails when exporting to PDF: ERROR: Couldn't find open server #45

rogerbramon · 2022-09-13T13:24:22Z

Occasionally, quarto-actions/render@v2 fails with ERROR: Couldn't find open server. I've only experienced this problem when rendering to PDF; HTML output seems fine.

Has anyone experienced it?

Tested on:
Ubuntu 20.04 and 22.04
Quarto: latest version (1.1.189) and a pinned one (1.0.37)

Workflow example:

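name: Generate PDF
on:
  pull_request:
    types: [opened, synchronize, reopened, labeled]

jobs:
  generate-pdf:
    runs-on: ubuntu-22.04  # assumed from "Tested on"; also seen on ubuntu-20.04
    steps:
      # Assumed: the repository has to be checked out before rendering
      - name: Check out repository
        uses: actions/checkout@v3

      - name: Setup Quarto
        uses: quarto-dev/quarto-actions/setup@v2
        with:
          tinytex: true

      - name: Quarto check
        run: |
          quarto check --log-level info

      - name: Render Quarto Project
        uses: quarto-dev/quarto-actions/render@v2
        with:
          to: pdf
          path: input/P123-qmd-file.qmd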

This is the output of all Quarto actions in the workflow when it fails:

cderv · 2022-09-13T14:02:07Z

This error does not ring a bell for me. @cscheid have you already encountered such an error?

@rogerbramon can you share more about the .qmd content where this error happens? Do you have a link to the project?

cscheid · 2022-09-13T14:18:34Z

This is the first time I see this error. Thank you for the report!

rogerbramon · 2022-09-13T14:19:05Z

> @rogerbramon can you share more about the .qmd content where this error happens? Do you have a link to the project?

Content is basically pure Markdown and a mermaid diagram using ```{mermaid} format.
Unfortunately, it's a private repo, but I could try to create a test one and try to reproduce it there.

cderv · 2022-09-13T14:22:06Z

> And a mermaid diagram using ```{mermaid} format.

I think the error could come from that content. For PDF output, the diagram needs to be rendered to an image using Chrome or Chromium, and it seems this step is what fails (with the server not found).

Google Chrome should be installed on GHA runners, but it seems something is not working as expected.

> I could try to create a test one and try to reproduce it there.

A test repo that reproduces the problem would be really useful for investigating.

rogerbramon · 2022-09-13T14:44:32Z

Here is the test repo. I was able to reproduce the problem on the 3rd attempt (same code, just re-running the jobs).

Successful attempt: https://github.com/rogerbramon/test-quarto/actions/runs/3045981181/attempts/2
Failed attempt: https://github.com/rogerbramon/test-quarto/actions/runs/3045981181

cderv · 2022-09-13T14:57:23Z

Thanks a lot!

> I was able to reproduce the problem at the 3rd attempt (same code, just re-run jobs).

It is not a good sign that this works on some runs and not on others... It looks like an internal issue in the runners when connecting to headless Chrome. 🤔

rogerbramon · 2022-09-13T17:13:40Z

Could it be that Google Chrome sometimes takes a bit longer to spin up? If I get it right, you have a timeout of only 3 seconds on waitForServer.

https://github.com/quarto-dev/quarto-cli/blob/ed57d10eba33c34b5e1df2c22ed372a7e28da5d0/src/core/cri/cri.ts#L80-L90

I tried adding a preceding step that calls headless Chrome, and the problem does not occur then. That's why a timeout is my guess... Any thoughts?

    - name: Check chrome
      run: |
        echo $(which google-chrome)
        $(which google-chrome) --headless --single-process https://www.chromestatus.com

cscheid · 2022-09-13T17:23:36Z

Somewhat disconcertingly, the only hit on Google for that error is another open issue on Quarto: quarto-dev/quarto-cli#1822

cderv · 2022-09-13T18:08:17Z

Oh interesting. Thanks @cscheid !

I am still trying to find a clean environment to reproduce the issue, as it is now working in my WSL after I installed the google-chrome deb package. I just don't know what changed. Before that I could reproduce it every time, though...

Here it is also an error on Ubuntu in GHA. Maybe this will help us find the cause if I manage to debug it in the workflow directly.

rogerbramon · 2022-09-14T08:34:23Z

@cderv, you may use the action-tmate action to get access to the runner system via SSH and debug there.

rogerbramon · 2022-09-19T12:29:24Z

Just to add a bit of info, I've also experienced this problem on macOS a couple of times. But it's much more difficult to reproduce.

Not sure whether this adds noise or is a different issue, but besides Couldn't find open server, I sometimes also get this on macOS:

ERROR: No inspectable targets

It's not easy to reproduce, but it's easier to get the error when there's no instance of Google Chrome opened.
HTH

cderv · 2022-09-19T16:25:04Z

So I managed to reproduce quarto-dev/quarto-cli#1822 locally. The issue there is systematic, not occasional, and is probably due to missing system requirements for the Chromium that we install from Puppeteer. I believe GitHub Actions runners have all of them, so that is not the issue here, since this one is only occasional.

> Somewhat disconcertingly, the only hit on google for that error is another open issue on quarto

By the way, this is the only hit because this is an error thrown by Quarto:
https://github.com/quarto-dev/quarto-cli/blob/ed57d10eba33c34b5e1df2c22ed372a7e28da5d0/src/core/cri/cri.ts#L91

When https://localhost:<port> can't be reached within the timeout limit, we throw an error. This can happen if there is an issue launching Chrome or with Chrome once it is running. Currently we throw only the error, without further detail.

I have made a PR in Quarto so that we get more information when the error comes from a failed attempt to run Chrome.
See quarto-dev/quarto-cli#2499

> If I get it right, you have a timeout of only 3 seconds on waitForServer.

We could indeed look into the timeout. However, I think we already attempt 60 times with 50 ms between each attempt.
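The retry behaviour described here (a fixed number of attempts with a short pause in between) can be sketched in shell. This is only an illustration, not Quarto's implementation: `wait_for` is a hypothetical helper, and the 60 attempts with a 0.05 s pause are assumed values chosen to add up to the 3 seconds mentioned earlier in the thread.

```shell
# Hypothetical helper: retry a probe command until it succeeds or the
# attempts run out.
# Usage: wait_for <max_tries> <interval_seconds> <command> [args...]
wait_for() {
  wf_max_tries="$1"
  wf_interval="$2"
  shift 2
  wf_attempt=1
  while [ "$wf_attempt" -le "$wf_max_tries" ]; do
    # Stop as soon as the probe command succeeds.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$wf_interval"
    wf_attempt=$((wf_attempt + 1))
  done
  # Every attempt failed: the server never became reachable.
  return 1
}

# Example (not run here; port 9222 is an assumption for illustration):
# wait_for 60 0.05 curl --silent --fail http://localhost:9222/json/version
```

With these numbers the total budget is about 3 seconds, so a browser that needs longer to start would be reported as unreachable even though it is still coming up.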

Maybe quarto-dev/quarto-cli#2499 will reveal an issue with running Chrome itself rather than with the timeout. It should be merged soon and be available in a pre-release.

> ERROR: No inspectable targets

@rogerbramon this error is also thrown by Quarto, when the targets are somehow not valid for the Chrome remote interface. It means Chrome was launched correctly and Quarto connected correctly to the remote debugging port, but something else is wrong in the interaction with the headless browser. If you have an example to share where this happens, that would help.

rogerbramon · 2022-09-20T08:13:28Z

Thanks @cderv for your time. I tried the latest pre-release version (1.2.134) on the test repo and, unfortunately, I've not been able to get the error because the render step now hangs (3 of 7 attempts). I had to cancel the workflow after 5 minutes.

https://github.com/rogerbramon/test-quarto/actions/runs/3088361081/jobs/4994835499

Regarding the No inspectable targets error, I have no further info to share. It occurred when running the same render command on macOS.

cderv · 2022-09-20T08:28:57Z

That is bad. It is probably a different issue than quarto-dev/quarto-cli#1822. I'll look into that next.

rogerbramon · 2023-05-10T14:23:02Z

Hi @cderv, not sure if it's related to that but with version 1.3 sometimes it hangs forever.

cderv · 2023-05-11T13:12:39Z

Oh no... sorry about that.

We now print a stack trace by default when there is an error. Do you have more error output to share, or a link to an action log?
It is possibly related to Chrome and our usage of Puppeteer within the GitHub Actions context. A log could probably confirm that.

Is this happening every time?

rogerbramon · 2023-05-12T04:35:00Z

It happens randomly like before, but now it doesn't fail but keeps running, and you need to cancel it. I'm not able to see any log.

Using the same test repo, I just updated it to use the latest version. You can see that Attempt #1 and Attempt #3 got stuck and I had to cancel them, while Attempt #2 succeeded.

I enabled debug logging on the latest attempt, but it doesn't give much insight.

Thanks.

cderv · 2023-05-12T09:02:36Z

That will not be easy to debug. I am surprised we get no log at all, no trace. Thanks again for the report.

I don't think that will change anything, because the cause is probably not the action itself but something on GHA runners with quarto render.

Sorry for the inconvenience. I will try to investigate, but I am not sure where to look exactly.

rogerbramon · 2023-05-25T09:40:11Z

Thanks @cderv, what I've noticed is that this issue seems to disappear when adding a step that calls chromium before running quarto:

    - name: Check chromium
      run: |
        echo $(which chromium-browser)
        $(which chromium-browser) --headless https://www.chromestatus.com

With this step, the action ran successfully for 10 times in a row. However, as soon as I removed it, it started freezing again.

HTH
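The warm-up steps quoted in this thread hard-code one binary name (google-chrome in one, chromium-browser in the other). A minimal sketch of a variant that picks whichever browser is installed follows. `find_browser` is a hypothetical helper and the candidate names are assumptions about what a runner provides. The example call is bounded with coreutils `timeout`, because an unbounded headless launch can itself hang.

```shell
# Hypothetical helper: print the path of the first candidate binary that
# exists on PATH; return non-zero if none is found.
find_browser() {
  for fb_candidate in "$@"; do
    if command -v "$fb_candidate" >/dev/null 2>&1; then
      command -v "$fb_candidate"
      return 0
    fi
  done
  return 1
}

# Example (not run here): warm up the browser for at most 60 seconds.
# browser="$(find_browser google-chrome chromium-browser chromium)" &&
#   timeout 60 "$browser" --headless https://www.chromestatus.com
```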

cderv · 2023-05-25T09:47:41Z

Oh, a really interesting debugging step. 🤔 Can you help me understand your testing further?

> a step that calls chromium before running quarto

Does this step only call Chromium and close it? Or do you think it leaves it open for the next step?

I wonder what effect this command could have. I don't know if you have the environment needed, but did you observe the freezing on a non-GHA Ubuntu machine?

I'll read through the code in Quarto with this new information in mind. @cscheid if you have ideas, feel free to chime in.

cscheid · 2023-05-25T16:11:25Z

@rogerbramon That is super fascinating. I wonder if that check causes some deferred library loading that takes a while, and prevents an eventual race condition. I would be happy keeping the check in our actions. In fact, I wonder if this check would also fix some of the hard-to-track bugs we've been seeing with chromium in Linux on the quarto-cli repo!

@cderv What do you think about simply adding that action to our render step?

cderv · 2023-05-25T16:16:33Z

Oh, great idea! Not sure why it did not occur to me 😅

I guess it costs nothing to do it, just in case someone needs Chrome with Quarto. Sounds good!

cscheid · 2023-05-25T16:31:25Z

> I guess it costs nothing to do it, just in case someone needs Chrome with Quarto. Sounds good!

It does cost the time to run it, but that really shouldn't be much. Let's try that!

cderv · 2023-05-26T10:03:54Z

I have made a new v2 release of the action including this fix, for Ubuntu only right now. macOS and Windows runners need some adjustment.

Adapt for macOS runner
Adapt for Windows runner
Add to quarto-publish or even setup-quarto instead

rogerbramon · 2023-05-26T10:42:37Z

Thank you guys for looking into this. I don't have more info at the moment to answer your questions; I'll need to invest more time. So far, I've only experienced this issue on GHA. Locally, I use a Mac. I can try to use devcontainers or Codespaces to see if this issue can be reproduced.

> @cderv What do you think about simply adding that action to our render step?

Would it make sense to add this workaround to the Setup action instead of the render one? I'm saying this because the Publish action also renders by default and sometimes, depending on the parameters you need, you have to use a shell step to run quarto render directly.

cderv · 2023-05-26T12:38:23Z

> Would it make sense to add this workaround to the Setup action instead of the render one? I'm saying this because the Publish action also renders by default and sometimes, depending on the parameters you need, you have to use a shell step to run quarto render directly.

Oh indeed... It would probably make sense to add it to the setup action instead so that it covers render and publish.
Otherwise I can add it to publish as well.

Thanks for the feedback !

rogerbramon · 2023-05-26T13:39:09Z

In my case, I don't use the render action because I need to pass extra parameters to the render command, which is not possible using the action, so I would need to add the step manually if it's not included in the Setup.

Not sure if anyone is using the render or publish action without the setup, but in my case I find it very useful. I don't want to force anything, just explaining my use case.

cderv · 2023-05-26T13:41:18Z

> I don't use the render action because I need to pass extra parameters to the render command, which is not possible using the action

Maybe we should allow that too?

> Not sure if anyone is using the render or publish action without the setup, but in my case I find it very useful. I don't want to force anything, just explaining my use case.

That makes sense. We could probably expect someone using the publish or render action to have used the setup action in the first place. Maybe we should also document this chromium trick.

rogerbramon · 2023-05-29T13:43:49Z

> > I don't use the render action because I need to pass extra parameters to the render command, which is not possible using the action
>
> Maybe we should allow that too?

It would be handy to have a free-form parameter to pass whatever parameters you need.

rogerbramon · 2023-06-19T09:36:04Z

Hey, just to let you know that I've been using this workaround for a while, but unfortunately we still experience the problem randomly.

cderv · 2023-06-19T10:40:24Z

> but unfortunately we still experience the problem randomly.

So even on Ubuntu this is not enough?

It seems really related to GHA runners. We are looking at a new Chrome development that may improve things for Quarto: https://developer.chrome.com/blog/chrome-for-testing/

sebffischer · 2023-11-07T10:07:33Z

I just wanted to report that we were experiencing something similar in the mlr3 book, i.e. the action just times out randomly.

When rendering with the --execute-debug flag, it turned out that the postprocessing failed for one of the chapters in our book. While other chapters ended with:

  |............................................| 100%                     	 
                                                                                                               	 
output file: preprocessing.knit.md

[knitr engine]: writing results
[knitr engine]: exiting
[knitr engine]: postprocess
[knitr engine]: writing results
[knitr engine]: exiting

the chapter that uses mermaid ended with:

 |                                              	 
  |.............................................| 100%                    	 
                                                                                                              	 
output file: advanced_technical_aspects_of_mlr3.knit.md

[knitr engine]: writing results
[knitr engine]: exiting
Error: The operation was canceled.

One CI run where this happened can be found here.

We are rendering to both html and pdf.
This also happened randomly (around 50% of the runs) so it was not so easy to track this down as we also don't get any log output really.

What we also observed (not with 100% certainty, as the error is stochastic) is that rendering to html and pdf in two separate CI steps (quarto render book/ --cache-refresh --to html and then quarto render book/ --cache-refresh --to pdf) made the error disappear. When we removed the mermaid diagram, the error also seemed to disappear.
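The split into one render call per format can be sketched as a small loop. This is an illustration under assumptions: `render_each_format` is a hypothetical helper, and `QUARTO_BIN` is an invented override that only exists here so the commands can be printed instead of executed. The `--cache-refresh` and `--to` flags are the ones quoted above.

```shell
# Hypothetical helper: render one target once per output format, so each
# format gets its own quarto invocation.
# QUARTO_BIN is an invented override; set it to `echo` for a dry run.
render_each_format() {
  rf_target="$1"
  shift
  for rf_fmt in "$@"; do
    "${QUARTO_BIN:-quarto}" render "$rf_target" --cache-refresh --to "$rf_fmt" || return 1
  done
}

# Example (not run here):
# render_each_format book/ html pdf
```

Stopping at the first failing format keeps the log short, and makes it obvious which format triggered the problem.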

We also included the installation of chromium in our CI:

      - name: Install headless chromium
        run: quarto tools install chromium

cderv · 2023-11-07T10:24:27Z

Thanks a lot for the detailed explanation.

> Error: The operation was canceled.

There was a hang in the CI, and the job was cancelled because "The job running on runner GitHub Actions 7 has exceeded the maximum execution time of 360 minutes."
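Until the cause of the hang is known, one way to avoid waiting for the 360-minute limit is to bound the render command itself. The sketch below relies on GNU coreutils `timeout`, which exits with status 124 when the limit is reached. `run_with_limit` is a hypothetical wrapper and the 900-second value in the example is an arbitrary assumption.

```shell
# Hypothetical wrapper: run a command with a wall-clock limit in seconds
# and report on stderr when the limit was hit.
run_with_limit() {
  rl_limit="$1"
  shift
  rl_rc=0
  timeout "$rl_limit" "$@" || rl_rc=$?
  # GNU timeout uses exit status 124 for "time limit reached".
  if [ "$rl_rc" -eq 124 ]; then
    echo "command exceeded ${rl_limit}s and was stopped: $*" >&2
  fi
  return "$rl_rc"
}

# Example (not run here): give the render 15 minutes instead of 6 hours.
# run_with_limit 900 quarto render book/ --to pdf
```

This does not fix the hang. It only turns it into a quick, visible failure that can be re-run.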

I am not surprised that the chapter with mermaid is the issue. We think this issue is related to using Chrome on GHA runners, but we really don't know what happens or how to solve it.

It seems that initiating Chromium beforehand helped for a while (#45 (comment)), but the error still appears.

Maybe the version we install with quarto tools install chromium is not good enough for this usage on CI. I know of a dedicated action to install chromium: https://github.com/browser-actions/setup-chrome

It could be worth a try to see whether you still encounter the issue.

sebffischer · 2023-11-07T10:34:34Z

Thanks for the quick response! For now I will just render the mermaid diagram once and include it as a figure, until it is clear what the bug was and how it can be solved.

What was also surprising in retrospect is that the --execute-debug flag gives information about the knitr engine.
When running quarto help, this flag is explained as

 --execute-debug                     - Show debug output for Jupyter kernel.

In our particular case the additional information about the knitr engine allowed us to track down the bug.

cderv · 2023-11-07T12:16:09Z

> is why the --execute-debug flag gives information about the knitr engine?

This is mainly a documentation issue. This flag sets a debug variable to TRUE, which also applies to knitr rendering.
Added a while back: quarto-dev/quarto-cli@9c86e98

sebffischer · 2023-11-07T12:18:35Z

Thanks! I have created an issue about this: quarto-dev/quarto-cli#7502

cderv · 2024-08-29T22:01:38Z

FWIW, a Chrome update is causing some problems, and the previous "fix" discussed above (#45 (comment)) is now creating issues (it is hanging the workflow).

It seems using headless Chrome in CI is not that easy. I am probably going to revert the change that added the above line by default in the render action. It is currently causing hangs in all actions using render.

cderv self-assigned this Sep 19, 2022

cderv added the bug Something isn't working label May 12, 2023

sebffischer mentioned this issue Nov 7, 2023

fix(ci): include mermaid diagrams as png and svg (technical) mlr-org/mlr3book#794

Merged

sebffischer mentioned this issue Nov 7, 2023

Rendering mermaid diagrams to pdf causes random time-outs mlr-org/mlr3book#795

Open

cderv added a commit that referenced this issue Aug 29, 2024

let's remove this chrome check workaround

7a8d480

we need to deal with initial issue considering new chrome update #45

cderv added a commit to quarto-dev/quarto-cli that referenced this issue Aug 29, 2024

ff-matrix, gha - try without setting chrome checking "hack"

a6a5778

Initially done because of quarto-dev/quarto-actions#45

This was referenced Aug 30, 2024

Remove chrome checking #116

Merged

GitHub Actions stuck at "Warning: vkCreateInstance: Found no drivers!" #115

Closed

coatless added a commit to coatless-textbooks/c4ds that referenced this issue Oct 4, 2024

Disable PDF to avoid triggering a headless chrome session as the CI b…

1efdadf

…uild path doesn't support it right now. quarto-dev/quarto-actions#45
