Pandoc memory issue in amd64 docker container emulated on M1 Mac #2716

Closed
4 tasks done
jimjam-slam opened this issue Oct 4, 2022 · 16 comments
Labels
bug Something isn't working

@jimjam-slam

Bug description

This might be better directed to @rocker-org, but I'll start here! I've endeavoured to set up a VSCode devcontainer for our Quarto projects based on the R community devcontainer template (which in turn uses the rocker/r-ver:4.2 Docker image). Quarto is included in the image, but attempts to render a test case lead to a Pandoc error:

[image]

$ quarto render index.qmd

Killed
Error in strsplit(info, "\n")[[1]] : subscript out of bounds
Calls: .main ... pandoc_available -> find_pandoc -> lapply -> FUN -> get_pandoc_version
In addition: Warning message:
In system(paste(shQuote(path), "--version"), intern = TRUE) :
  running command ''/usr/local/bin/pandoc' --version' had status 137
Execution halted

Even just running pandoc --version, using the version of Pandoc that Quarto appears to be calling, fails:

$ pandoc --version

Killed

I should say that the test case renders fine with my own local Quarto installation!

Checklist

  • Please include a minimal, fully reproducible example in a single .qmd file? Please provide the whole file rather than the snippet you believe is causing the issue.
  • Please format your issue so it is easier for us to read the bug report.
  • Please document the RStudio IDE version you're running (if applicable), by providing the value displayed in the "About RStudio" main menu dialog?
  • Please document the operating system you're running. If on Linux, please provide the specific distribution.
@jimjam-slam added the bug label Oct 4, 2022
@jimjam-slam
Author

Rendering a doc in the same Docker image on its own doesn't seem to trigger this problem, so perhaps it's specific to the community dev container spec:

$ docker run rocker/r-ver:4.2 echo -e '---\ntitle: Hello world\n---\n\nOh hi there' > index.qmd; cat index.qmd; quarto render index.qmd
---
title: Hello world
---

Oh hi there
pandoc 
  to: html
  output-file: index.html
  standalone: true
  section-divs: true
  html-math-method: mathjax
  wrap: none
  default-image-extension: png
  
metadata
  document-css: false
  link-citations: true
  date-format: long
  lang: en
  title: Hello world
  
Output created: index.html

Does Quarto typically use an internally bundled version of Pandoc on Linux, or one that's available to the system?
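
One caveat about the one-liner above, offered as a sketch rather than a diagnosis: as written, the host shell parses the `>` redirect and the `;` separators before `docker run` sees them, so only `echo` runs in the container, while `cat` and `quarto render` run on the host. Passing the whole script to `sh -c` as a single quoted argument keeps every step in the same shell. The `write_doc` variable and `run_script` helper below are hypothetical names, and the commented `docker run` line assumes Docker is installed and that `quarto` is on the image's PATH.

```shell
# The document-writing step, kept in single quotes so the calling shell passes
# it through untouched; the redirect and `;` are parsed by the inner `sh`.
write_doc='printf "%s\n" "---" "title: Hello world" "---" "" "Oh hi there" > index.qmd; cat index.qmd'

# Run a script through `sh -c`, optionally behind a prefix command.
# (Hypothetical helper: first argument is the script, the rest is the prefix.)
run_script() {
  script=$1
  shift
  "$@" sh -c "$script"
}

# Host-side dry run (no prefix, so this is just `sh -c "$write_doc"`):
#   run_script "$write_doc"
# Inside the container (assumes docker, and quarto on the image's PATH):
#   run_script "$write_doc; quarto render index.qmd" docker run --rm rocker/r-ver:4.2
```

Without a bind mount, the rendered index.html stays inside the container's filesystem, which is enough for checking whether the render succeeds.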

@jimjam-slam
Author

Okay, I'm very confused - after doing some testing with Docker, I can now re-open the devcontainer in VSCode and things work fine (including pandoc --version). Not sure what happened here! Might try to scrap and rebuild a couple of times to verify.

@jimjam-slam
Author

Trying to run the simpler test case works, and pandoc --version works, but if I then go back to trying to render the original test case, Quarto fails again (although pandoc --version keeps working). Very confusing 🤔

@jimjam-slam
Author

jimjam-slam commented Oct 4, 2022

Oops! It looks like exit status 137 from Pandoc means the process was killed for running out of memory. Increasing the RAM available to Docker from 8 GB to 12 GB did the trick.
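
For context, a shell reports a process ended by a signal as 128 plus the signal number, so status 137 corresponds to signal 9 (SIGKILL), which is what the kernel's out-of-memory killer sends. A minimal sketch of that arithmetic, using a hypothetical helper name `decode_status`:

```shell
# Decode a shell exit status: values above 128 mean "ended by signal (status - 128)".
decode_status() {
  status=$1
  if [ "$status" -gt 128 ]; then
    sig=$((status - 128))
    # `kill -l N` prints the name of signal number N
    echo "killed by signal $sig ($(kill -l "$sig"))"
  else
    echo "exited with status $status"
  fi
}

decode_status 137   # prints: killed by signal 9 (KILL)
```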

I am a little surprised, though: there is quite a long delay, which I don't get locally, before quarto render index.qmd starts producing output. Monitoring memory usage with htop during the render, I see it climb from a baseline of 1.8 GB to 8.8 GB before receding (and only after it recedes do I start getting console output from Quarto). What could be going on to spike memory usage like that?
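
One way to put a number on a spike like this without watching htop is to poll the kernel's high-water mark for the process. The sketch below is Linux-only and assumes /proc is mounted; `peak_rss_kb` is a hypothetical helper, and under emulation the figure may include whatever the emulator itself allocates, so treat it as a rough measurement rather than Pandoc's own footprint.

```shell
# Print the peak resident set size (VmHWM, in kB) that a command reached.
# Linux-only: reads /proc/<pid>/status while the command is still running.
peak_rss_kb() {
  "$@" >/dev/null 2>&1 &          # run the command in the background, output discarded
  pid=$!
  peak=0
  while kill -0 "$pid" 2>/dev/null; do
    # VmHWM is the kernel's running maximum, so coarse polling still sees the peak so far
    hwm=$(awk '/^VmHWM:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
    if [ -n "$hwm" ] && [ "$hwm" -gt "$peak" ]; then
      peak=$hwm
    fi
    sleep 1
  done
  wait "$pid" 2>/dev/null
  echo "$peak"
}

# Example usage (assumes pandoc is on PATH):
#   peak_rss_kb pandoc --version
```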

@jimjam-slam changed the title from "Pandoc not working in rocker dev container" to "Memory issue working in rocker dev container" Oct 4, 2022
@cscheid
Collaborator

cscheid commented Oct 4, 2022

We've observed that pandoc uses a lot of memory when self-contained: true is set. I don't understand why it's so high; my suspicion is unnecessary laziness in the Haskell code when dealing with large strings.

@jimjam-slam
Author

Interesting! This could be a separate issue: when I use self-contained: false (or omit it), the memory spikes to 9 GB but then falls again just before I see processing file: index.qmd. If I use self-contained: true, it spikes but then remains high until the render finishes.

@jimjam-slam
Author

Not sure if this helps, but this is what I see in htop when memory is spiking (sorted descending by memory use). Maybe I'm running the container on the wrong platform?

[image: htop screenshot]

@jimjam-slam
Author

jimjam-slam commented Oct 4, 2022

In fact, just running pandoc --version causes a similar memory spike:

[image]

With the expected output about 10 seconds later:

$ pandoc --version
pandoc 2.19.2
Compiled with pandoc-types 1.22.2.1, texmath 0.12.5.2, skylighting 0.13,
citeproc 0.8.0.1, ipynb 0.2, hslua 2.2.1
Scripting engine: Lua 5.4
User data directory: /home/rstudio/.local/share/pandoc
Copyright (C) 2006-2022 John MacFarlane. Web:  https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.

@jimjam-slam
Author

I should note that I'm on an M1 Mac, and this appears to be an x86_64 container, so this could be a weird Docker/Pandoc/platform edge case 🤷🏻‍♂️

$ uname -a
Linux 9137e99d112a 5.10.104-linuxkit #1 SMP PREEMPT Thu Mar 17 17:05:54 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
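
Since `uname -m` inside the container reports the container's architecture rather than the host's, it can help to translate that value into the platform string Docker uses. A small sketch, with `platform_for` as a hypothetical helper that covers only the two architectures discussed here:

```shell
# Map a machine name as printed by `uname -m` to a Docker platform string.
platform_for() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *)             echo "unknown" ;;
  esac
}

# Report the platform of the environment this script runs in.
platform_for "$(uname -m)"
```

Seeing linux/amd64 here on an Apple Silicon host indicates the container is running under emulation.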

@dragonstyle added this to the v1.3 milestone Oct 7, 2022
@eitsupi
Contributor

eitsupi commented Oct 8, 2022

This issue has nothing to do with the Rocker Project, so I recommend changing the title.
I think it is related to Pandoc and Docker Desktop's amd64 emulation on Arm Macs.

@jimjam-slam
Author

@eitsupi It could be that the sensible thing is to close this issue altogether and open it with @pandoc, but I'll leave it for now (or at least until we confirm that it appears in Docker containers generally!). I don't intend it as a slight against Rocker :)

@eitsupi
Contributor

eitsupi commented Oct 10, 2022

Sorry, I am not expressing this well.
This problem always occurs with Docker Desktop for Mac on arm64, because there is no arm64 build of quarto-cli.
There are multiple related issues (#190, #781), and I do not think this title is appropriate because it does not mention arm or Docker and emphasizes the Rocker Project and DevContainer, which are unrelated.

@jimjam-slam
Author

No worries, @eitsupi. I just wanted to explicitly test a non-Rocker container before I came to any conclusions! I've reproduced it on an Ubuntu container (again, amd64 being emulated on my M1 Mac), so I'm satisfied it isn't a Rocker problem. I haven't been able to test on a native arm64 container yet (just trying to find one!).

@jimjam-slam changed the title from "Memory issue working in rocker dev container" to "Pandoc memory issue in amd64 docker container emulated on M1 Mac" Oct 11, 2022
@eitsupi
Contributor

eitsupi commented Oct 11, 2022

I don't think there is an arm64 image that includes quarto-cli because there is no arm64 build of quarto-cli.

pandoc has arm64 builds, so opening the issue in the pandoc repository may not get a sympathetic response...

@jimjam-slam
Author

Yeah, I can't imagine them being too interested in tackling a memory issue that is specific to amd64 emulation on arm64 when they already put an arm64 build out 😅 I might need to just swallow this one until there are Linux arm64 Quarto builds out!

@cscheid
Collaborator

cscheid commented Nov 15, 2022

@jimjam-slam looks like this isn't a quarto bug, right? I'm going to go ahead and close this one, but feel free to reopen it if you narrow this down to something on our side that we can control.
