
Docker Version 17.03.0-ce-mac1 (15583) crashing systematically - not an issue in 1.13.1 #1378

Closed
bgrieder opened this issue Mar 3, 2017 · 14 comments

Comments

bgrieder commented Mar 3, 2017

Since the update this morning, the Docker VM will systematically crash after a couple of minutes with the following error:

Version 17.03.0-ce-mac1 (15583)
Channel: stable
e5a07a1a69

 Mar  3 10:23:40 BGR Docker[5204]: Assertion failed: (remaining == 0 || i < n), function vmn_read, file src/lib/pci_virtio_net_vpnkit.c, line 564.
 Mar  3 10:23:40 BGR Docker[5204]: virtio-net-vpnkit: initialising, opts="uuid=81bc8471-8f17-4825-abc4-fe75fb8d9c1e,path=/Users/bgrieder/Library/Containers/com.docker.docker/Data/s50,macfile=/Users/bgrieder/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/mac.0"
 Mar  3 10:23:40 BGR Docker[5204]: Interface will have uuid 81bc8471-8f17-4825-abc4-fe75fb8d9c1e
 Mar  3 10:23:40 BGR Docker[5204]: Connection established with MAC=c0:ff:ee:c0:ff:ee and MTU 8000
 Mar  3 10:23:40 BGR Docker[5204]: virtio-9p: initialising path=/Users/bgrieder/Library/Containers/com.docker.docker/Data/s40,tag=db
 Mar  3 10:23:40 BGR Docker[5204]: virtio-9p: initialising path=/Users/bgrieder/Library/Containers/com.docker.docker/Data/s51,tag=port
 Mar  3 10:23:40 BGR Docker[5202]: Fatal unexpected exception: Socket.Closed
 Mar  3 10:23:40 BGR Docker[5204]: linkname /Users/bgrieder/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty
 Mar  3 10:23:40 BGR Docker[5204]: COM1 connected to /dev/ttys000
 Mar  3 10:23:40 BGR Docker[5204]: COM1 linked to /Users/bgrieder/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty
 Mar  3 10:23:40 BGR Docker[5204]: Hypervisor subprocess has shutdown
 Mar  3 10:23:40 BGR Docker[626]: VM shutdown at 2017-03-03 10:23:40 +0100 CET
 Mar  3 10:23:40 BGR Docker[620]: Reap com.docker.driver.amd64-linux (pid 5204): exit status 0
 Mar  3 10:23:40 BGR Docker[620]: Reap com.docker.osxfs (pid 5202): exit status 1
 Mar  3 10:23:40 BGR Docker[5203]: Socket.Pipe.writev unix:/Users/bgrieder/Library/Containers/com.docker.docker/Data/s50: caught broken pipe returning Eof
 Mar  3 10:23:41 BGR Docker[620]: Stop 1 children with order 1: com.docker.driver.amd64-linux (pid 5204)
 Mar  3 10:23:41 BGR Docker[620]: Stop 2 children with order 2: com.docker.osxfs (pid 5202), com.docker.slirp (pid 5203)
 Mar  3 10:23:41 BGR Docker[620]: Signal terminated to com.docker.slirp (pid 5203)
 Mar  3 10:23:41 BGR Docker[620]: Reap com.docker.slirp (pid 5203): signal: terminated

This issue did not exist in the previous stable release, 1.13.1.

rogaha commented Mar 3, 2017

@bgrieder are you able to upload the log files via the Diagnose and Feedback menu?

bgrieder (Author) commented Mar 3, 2017

@rogaha Ah - I just rolled back to 1.13.0 (I could not find how to download 1.13.1). Let me switch again.

bgrieder (Author) commented Mar 3, 2017

@rogaha Done - it crashed, and I ran Diagnose and Feedback, which says everything is OK (...)

The Diagnostic ID is: CE054807-E6B0-4CFE-922A-A039A3C6BCFF

  Docker for Mac: version: 17.03.0-ce-mac1 (e5a07a1a6)
  macOS: version 10.12.3 (build: 16D32)
  logs: /tmp/CE054807-E6B0-4CFE-922A-A039A3C6BCFF/20170303-111829.tar.gz
  [OK]     vmnetd
  [OK]     dns
  [OK]     driver.amd64-linux
  [OK]     virtualization VT-X
  [OK]     app
  [OK]     moby
  [OK]     system
  [OK]     moby-syslog
  [OK]     db
  [OK]     env
  [OK]     virtualization kern.hv_support
  [OK]     slirp
  [OK]     osxfs
  [OK]     moby-console
  [OK]     logs
  [OK]     docker-cli
  [OK]     menubar
  [OK]     disk

djs55 (Contributor) commented Mar 3, 2017

This looks like a duplicate of #1228

Could you describe what you're doing when it crashes? Are you running a network-heavy workload?

rogaha commented Mar 3, 2017

@bgrieder it looks like you are having the same issue as in #1228. The logs show that the app crashed 3 days ago:

2017-02-28 12:43:12+01 BGR Docker[com.docker.slirp][56255] <Error>: PPP.listen callback caught socket is not connected
2017-02-28 12:45:22+01 BGR Docker[com.docker.slirp][56255] <Error>: Socket.TCPv4.write tcp:37.187.124.42:30303: caught broken pipe returning Eof
2017-02-28 12:45:22+01 BGR Docker[com.docker.slirp][56255] <Error>: PPP.listen callback caught socket is not connected
2017-02-28 12:45:27+01 BGR Docker[com.docker.osxfs][56254] <Error>: Fatal unexpected exception: Socket.Closed
2017-02-28 12:45:27+01 BGR Docker[com.docker.driver.amd64-linux][56256] <Notice>: Assertion failed: (remaining == 0 || i < n), function vmn_read, file src/lib/pci_virtio_net_vpnkit.c, line 564.
2017-02-28 12:45:27+01 BGR Docker[com.docker.driver.amd64-linux][56256] <Notice>: virtio-net-vpnkit: initialising, opts="uuid=76d57016-3765-45ce-b9f1-c408a9f9f821,path=/Users/bgrieder/Library/Containers/com.docker.docker/Data/s50,macfile=/Users/bgrieder/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/mac.0"
2017-02-28 12:45:27+01 BGR Docker[com.docker.driver.amd64-linux][56256] <Notice>: Hypervisor subprocess has shutdown

Let's continue the discussion on the older ticket. I'm closing this in favor of #1228

rogaha closed this as completed Mar 3, 2017
bgrieder (Author) commented Mar 3, 2017

Sure.

(3 days ago, I may have tried the beta channel and rolled back)

I launch a docker-compose setup with multiple processes, two in particular: a Java process and a blockchain client, geth (from Ethereum).

On startup, these two processes create a pretty intensive load, both in CPU and in disk I/O. The Java process usually completes its startup while the geth process does not. geth also connects to a P2P network and establishes multiple outbound network connections.

Let me know if you want more details.

bgrieder (Author) commented Mar 3, 2017

@djs55 I confirm I can reproduce this issue by just running the Ethereum geth client. I can give you my Dockerfile if you are interested.

djs55 (Contributor) commented Mar 3, 2017

@bgrieder yes please -- a Dockerfile would be very helpful!

bgrieder (Author) commented Mar 3, 2017

@djs55 Here you go. I had the issue running the image with:

docker run -v /opt/geth.data/data:/opt/geth.data/data -p 8545:8545 GETH_IMAGE

I guess you do not have to do more than run the image; the geth client will start to establish connections.

In case this is not sufficient, you may have to download the Ethereum Wallet from the Ethereum site and connect it to port 8545 of your running container to create an account (this runs in the Ethereum test environment, no worries).

Dockerfile for GETH_IMAGE

FROM ubuntu
EXPOSE 8545
EXPOSE 8546
EXPOSE 18002
VOLUME /opt/geth.data

RUN apt-get update && \
    apt-get install -y supervisor vim && \
    apt-get install -y software-properties-common

RUN add-apt-repository -y ppa:ethereum/ethereum && \
    apt-get update && \
    apt-get install -y ethereum

RUN rm -rf /var/lib/apt/lists/* && \
    mkdir -p /var/log/supervisor && \
    mkdir -p /opt/geth.data/data


## Supervisord conf
ADD supervisord.conf /etc/supervisor/conf.d/supervisord.conf

# geth run script
ADD geth-run.sh /opt/geth-run.sh
RUN chmod 755 /opt/geth-run.sh

#Supervisor in foreground
CMD ["/usr/bin/supervisord","-c","/etc/supervisor/conf.d/supervisord.conf"]

geth-run.sh

#!/bin/sh

geth \
     --testnet \
     --fast \
     --lightserv 15 \
     --datadir "/opt/geth.data/data" \
     --rpc --rpcaddr "0.0.0.0" --rpcport "8545" --rpcapi "admin,net,miner,personal,eth,web3,debug" --rpccorsdomain "*" \
     --ws --wsaddr "0.0.0.0" --wsport "8546" --wsapi "admin,miner,net,personal,eth,web3" --wsorigins "*" \
     --ipcpath "/opt/geth.data/data/testnet/geth.ipc" --ipcapi "admin,miner,eth,net,web3,personal,debug" \
     --port "18002" \
     --cache 1024 \
     --autodag \
     --nat "any"
#     -verbosity 5

supervisord.conf

[supervisord]
nodaemon=true

[program:geth]
command=/opt/geth-run.sh
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
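
Not part of the original report, but for reference, a minimal sketch of how these three files could be assembled into the image used in the docker run command above, assuming they all sit in the same build context. The lowercase tag is just a placeholder standing in for GETH_IMAGE (Docker image names must be lowercase), and the host path must exist and be shared with Docker for Mac:

# assumed layout: Dockerfile, geth-run.sh and supervisord.conf in the current directory
docker build -t geth-image .
docker run -v /opt/geth.data/data:/opt/geth.data/data -p 8545:8545 geth-image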

djs55 (Contributor) commented Mar 4, 2017

@bgrieder thanks very much for the Dockerfile. Looking at the error logs I suspect there's a bug in the MTU negotiation which allows the host to send an over-large ethernet frame to the VM, which triggers the assert.

The bug did reproduce for me, and then I adjusted the MTU setting:

cd ~/Library/Containers/com.docker.docker/Data/database/
git reset --hard                                         # the settings database is a git repository; discard any uncommitted state
echo -n 1500 > com.docker.driver.amd64-linux/slirp/mtu   # write the MTU override
git add com.docker.driver.amd64-linux/slirp/mtu
git commit -s -m 'force 1500 MTU'                        # commit so the change is recorded in the database

After I restarted the app and re-ran the test, it ran for a lot longer. In my logs I can see:

$ syslog -k Sender Docker
...
... Creating slirp server peer_ip:192.168.65.2 local_ip:192.168.65.1 domain_search: mtu:1500

confirming that the correct MTU was set.
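
For reference (not from the original thread): if the override later needs to be undone, writing the previous value back through the same git-backed settings database should work. A minimal sketch, assuming the 8000 MTU seen in the earlier logs was the value in use before:

cd ~/Library/Containers/com.docker.docker/Data/database/
git reset --hard
echo -n 8000 > com.docker.driver.amd64-linux/slirp/mtu   # assumption: 8000 was the previously negotiated value
git add com.docker.driver.amd64-linux/slirp/mtu
git commit -s -m 'restore 8000 MTU'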

Could you try changing the MTU configuration and let me know if it gets better?

bgrieder (Author) commented Mar 6, 2017

@djs55 Sorry for the late reply; I was busy this weekend.

The changes prevent the crash but nevertheless seem to break the application, since frames are being dropped:

Mar  3 13:41:44 BGR Docker[9419] <Notice>: Creating slirp server peer_ip:192.168.65.2 local_ip:192.168.65.1 domain_search: mtu:8000
Mar  3 15:12:01 BGR Docker[10574] <Error>: Dropping over-large ethernet frame, length = 1571, mtu = 1500
Mar  3 15:13:59 BGR Docker[10574] <Error>: Dropping over-large ethernet frame, length = 1571, mtu = 1500
Mar  3 15:24:10 BGR Docker[10574] <Error>: Dropping over-large ethernet frame, length = 1648, mtu = 1500
Mar  3 15:26:24 BGR Docker[10574] <Error>: Dropping over-large ethernet frame, length = 1559, mtu = 1500
Mar  3 18:13:16 BGR Docker[10574] <Error>: Dropping over-large ethernet frame, length = 1558, mtu = 1500
Mar  3 18:17:22 BGR Docker[10574] <Error>: Dropping over-large ethernet frame, length = 1569, mtu = 1500
Mar  3 18:18:22 BGR Docker[10574] <Error>: Dropping over-large ethernet frame, length = 1645, mtu = 1500
Mar  3 18:27:10 BGR Docker[10574] <Error>: Dropping over-large ethernet frame, length = 1560, mtu = 1500
Mar  6 10:19:35 BGR Docker[24741] <Notice>: Creating slirp server peer_ip:192.168.65.2 local_ip:192.168.65.1 domain_search: mtu:8000
Mar  6 10:20:38 BGR Docker[24741] <Notice>: slirp/mtu changed to 1500 in the database: restarting
Mar  6 10:20:39 BGR Docker[24783] <Notice>: Creating slirp server peer_ip:192.168.65.2 local_ip:192.168.65.1 domain_search: mtu:1500
Mar  6 10:21:34 BGR Docker[25146] <Notice>: Creating slirp server peer_ip:192.168.65.2 local_ip:192.168.65.1 domain_search: mtu:1500
Mar  6 10:26:55 BGR Docker[25146] <Error>: Dropping over-large ethernet frame, length = 1565, mtu = 1500

What are the side effects of increasing the MTU?

bgrieder (Author) commented Mar 6, 2017

@djs55 OK. Looks like I misread that assertion; increasing the MTU to, say, 8000 leads to an application crash on that assertion.

bgrieder (Author) commented Mar 6, 2017

@djs55 Sorry for the multiple messages. Scratch that: with the MTU set to 1500, and despite the dropped frames, the application "seems" to work, at least on one leg. I will stay on these settings for now.

After a while, all sockets start failing and the application hangs:

Mar  6 16:08:04 BGR Docker[27288]: Socket.TCPv4.write tcp:78.192.117.13:30303: caught broken pipe returning Eof
Mar  6 16:08:05 BGR Docker[27288]: Socket.TCPv4.write tcp:47.91.138.220:30303: caught broken pipe returning Eof
Mar  6 16:08:09 BGR Docker[27288]: Socket.TCPv4.write tcp:167.114.247.130:30303: caught broken pipe returning Eof
Mar  6 16:08:15 BGR Docker[27288]: Socket.TCPv4.write tcp:138.68.104.147:30303: caught broken pipe returning Eof
Mar  6 16:08:32 BGR Docker[27288]: Socket.TCPv4.write tcp:52.184.162.64:30388: caught broken pipe returning Eof
Mar  6 16:08:32 BGR Docker[27288]: PPP.listen callback caught socket is not connected
Mar  6 16:08:34 BGR Docker[27288]: Socket.TCPv4.write tcp:192.241.159.215:30303: caught broken pipe returning Eof
Mar  6 16:08:35 BGR Docker[27288]: Socket.TCPv4.write tcp:37.187.146.34:30388: caught broken pipe returning Eof
Mar  6 16:08:45 BGR Docker[27288]: Socket.TCPv4.write tcp:138.68.143.220:30303: caught broken pipe returning Eof

Also, this still does not explain why hyperkit and osxfs use about 110% + 40% ≈ 150% of my CPU, although the load inside the VM is much smaller:

Mem: 7511832K used, 136832K free, 167344K shrd, 24880K buff, 2261400K cached
CPU:   7% usr   6% sys   0% nic  64% idle  20% io   0% irq   0% sirq
Load average: 1.48 1.40 0.71 3/478 3505
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
 2788  2781 root     S    3763m  50%   3   8% geth --testnet --fast --lightserv

but I guess that is a different subject :)

docker-robott (Collaborator)

Closed issues are locked after 30 days of inactivity.
This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle locked

docker locked and limited conversation to collaborators Jun 23, 2020