Communication with containers is not always possible

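I am running several services in a Docker swarm on a single Docker host. All services run in the same overlay network. Each service publishes a different port on which its web server is reachable. The Docker host runs CoreOS (1520.0.0, Alpha channel).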

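Sometimes requests made to http://docker-host.local:<port> time out. When I log in to docker-host and make the request to localhost:<port>, it times out as well. However, requests to the service from a shell in a different container succeed without any problem.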

`docker service ls` shows the correct port mappings.
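A small filter could pull the published ports out of that listing, so a probe loop tests exactly the ports swarm reports rather than a hardcoded list. This is a sketch that assumes the PORTS column uses the `*:<published>-><target>/tcp` form (e.g. `*:10081->80/tcp`); `published_ports` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: read `docker service ls` style text on stdin and print
# each published port on its own line. Assumes entries look like
# "*:10081->80/tcp"; several entries per line (comma separated) are handled.
published_ports() {
    # Match ":<digits>->" and keep only the digits
    grep -oE ':[0-9]+->' | sed -E 's/[^0-9]//g'
}
```

For example, `docker service ls | published_ports` should print 10081 through 10089 for the test stack, and each of those can then be fed to `curl`.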

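Which service is unreachable appears to be random. Sometimes everything works, sometimes one service is unreachable, and sometimes the problem resolves itself after a while.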

I checked the Docker networks; they do not conflict with the host network.
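One way to make that check repeatable is a pure-bash IPv4 CIDR overlap test. This is a minimal sketch (the function names are hypothetical); it relies on bash's 64-bit arithmetic and does not handle IPv6.

```bash
#!/bin/bash
# ip_to_int 10.255.0.1 -> prints the address as a 32-bit integer
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# cidr_range 10.0.0.0/24 -> prints "<first> <last>" addresses as integers
cidr_range() {
    local ip=${1%/*} prefix=${1#*/}
    local addr mask first
    addr=$(ip_to_int "$ip")
    # Netmask with the top $prefix bits set, truncated to 32 bits
    mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    first=$(( addr & mask ))
    echo "$first $(( first | (~mask & 0xFFFFFFFF) ))"
}

# cidr_overlap A B -> exit status 0 if the two ranges share any address
cidr_overlap() {
    local a_first a_last b_first b_last
    read -r a_first a_last <<< "$(cidr_range "$1")"
    read -r b_first b_last <<< "$(cidr_range "$2")"
    (( a_first <= b_last && b_first <= a_last ))
}
```

The subnets can be taken from `docker network inspect -f '{{range .IPAM.Config}}{{.Subnet}} {{end}}' <network>` and compared against the host's routes from `ip -4 route`. The swarm `ingress` network is worth including: on many installations it defaults to 10.255.0.0/16, and if that holds here it would contain the node address 10.255.11.40 shown in the `docker info` output below (`cidr_overlap 10.255.0.0/16 10.255.11.40/32` succeeds).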

I can reproduce this by creating a stack of nginx services that serve the default web page. File: `docker-compose-test.yml`

```yaml
version: '3.1'
services:
  nginx1:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10081:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
  nginx2:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10082:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
  nginx3:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10083:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
  nginx4:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10084:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
  nginx5:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10085:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
  nginx6:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10086:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
  nginx7:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10087:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
  nginx8:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10088:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
  nginx9:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10089:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
networks:
  test:
```
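The nine services differ only in name and published port, so the compose file could be generated, which makes it easy to vary the number of services while narrowing down the reproduction. A sketch, with `generate_compose` as a hypothetical helper; service *i* publishes host port 10080+*i*, matching the original file for a count of 9.

```bash
#!/bin/bash
# Print a compose file with <count> identical nginx services (default 9),
# all attached to the "test" overlay network.
generate_compose() {
    local count=${1:-9} i
    echo "version: '3.1'"
    echo "services:"
    for (( i = 1; i <= count; i++ )); do
        # Unquoted heredoc, so ${i} and $(( )) are expanded
        cat <<EOF
  nginx${i}:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "$(( 10080 + i )):80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
EOF
    done
    echo "networks:"
    echo "  test:"
}
```

Usage: `generate_compose 9 > docker-compose-test.yml`.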

This script deploys the stack, tests availability and takes the stack down again, repeating until the error occurs. File: `test-docker-swarm.sh`

```bash
#!/bin/bash
# Usage: ./test-docker-swarm.sh <docker-host>
DOCKER_HOST=$1
fail=0

while [[ ${fail} -eq 0 ]]; do
    docker -H "${DOCKER_HOST}" stack deploy -c docker-compose-test.yml test
    sleep 15

    # Probe each published port (10081 .. 10089)
    for i in $(seq 1 9); do
        request="http://${DOCKER_HOST}:1008${i}"
        echo "making request: ${request}"
        if ! curl -s -o /dev/null --max-time 2 "${request}"; then
            echo "request failed: ${request}"
            fail=1
        fi
    done

    # All ports answered: remove the stack and wait for its network to disappear
    if [[ ${fail} -eq 0 ]]; then
        docker -H "${DOCKER_HOST}" stack down test
        while [[ $(docker -H "${DOCKER_HOST}" network ls --filter 'name=^test_' | wc -l) -ne 1 ]]; do
            echo "waiting for stack to go down"
            sleep 2
        done
    fi
done
```
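Because failures come and go, probing each port several times could help tell a port that stays dead from one that only flaps. A minimal sketch; `probe_repeatedly` is a hypothetical helper that takes any command, so it works with the same `curl` invocation the script uses.

```bash
#!/bin/bash
# Run <command> <attempts> times; print the number of failed attempts on stdout
# and a timestamped line per failure on stderr (handy for matching daemon logs).
# Usage: probe_repeatedly <attempts> <command> [args...]
probe_repeatedly() {
    local attempts=$1 failures=0 i
    shift
    for (( i = 1; i <= attempts; i++ )); do
        if ! "$@" > /dev/null 2>&1; then
            failures=$(( failures + 1 ))
            echo "$(date '+%Y-%m-%dT%H:%M:%S') attempt ${i} failed: $*" >&2
        fi
    done
    echo "${failures}"
}
```

For example, `probe_repeatedly 10 curl -s -o /dev/null --max-time 2 http://docker-host.local:10081`. A result of 10 points to a port that is consistently unreachable; anything between 1 and 9 points to intermittent loss.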

Run it with the Docker host as the only argument: `./test-docker-swarm.sh <docker-host>`

I don't know what steps I can take to debug and resolve this. Any pointers are appreciated.

docker version

```
Client:
 Version: 17.06.1-ce
 API version: 1.30
 Go version: go1.8.2
 Git commit: 874a737
 Built: Tue Aug 29 23:50:27 2017
 OS/Arch: linux/amd64

Server:
 Version: 17.06.1-ce
 API version: 1.30 (minimum version 1.12)
 Go version: go1.8.2
 Git commit: 874a737
 Built: Tue Aug 29 23:50:09 2017
 OS/Arch: linux/amd64
 Experimental: false
```

docker info

```
Containers: 9
 Running: 9
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 17.06.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: x06mlhlwqyo3dg4lmigy18z1q
 Is Manager: true
 ClusterID: qy022nd3bjn1157sxcc6qzr9n
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Root Rotation In Progress: false
 Node Address: 10.255.11.40
 Manager Addresses:
  10.255.11.40:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: v0.13.2 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.13.0-rc7-coreos
Operating System: Container Linux by CoreOS 1520.0.0 (Ladybug)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 5.776GiB
Name: fqfs-development
ID: RCNI:3ZUR:LTDA:ABIB:EYEW:HCIY:H2RC:XDNT:LC77:BMQH:FKXI:T6YZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
```

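There is an open issue on GitHub that matches the symptoms you are seeing. I suggest following up there and giving the developers your own logs, so they can see whether the various reports have anything in common.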
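To gather those logs consistently, a small helper could write each diagnostic command's output, plus its exit status, to its own file in one directory that can be attached to the issue. A sketch; `collect_diagnostics` is a hypothetical name, and the choice of commands is up to you.

```bash
#!/bin/bash
# Save the output of one command as <dir>/<name>.txt, including stderr,
# the command line itself and its exit status.
# Usage: collect_diagnostics <dir> <name> <command> [args...]
collect_diagnostics() {
    local dir=$1 name=$2
    shift 2
    mkdir -p "$dir"
    {
        echo "\$ $*"
        "$@" 2>&1
        echo "exit status: $?"
    } > "${dir}/${name}.txt"
}
```

For example, right after a failed run: `collect_diagnostics diag info docker info`, `collect_diagnostics diag services docker service ls`, `collect_diagnostics diag ingress docker network inspect ingress` and `collect_diagnostics diag daemon-log journalctl -u docker --since "1 hour ago" --no-pager`.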