I've run into a problem doing a rolling update of our website, which runs in a container in our cluster, named website-cluster. The cluster contains two pods: one pod has a container running our production website, and the other runs a staging version of the same site. Here is the yaml for the production pod's replication controller:
```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  # These labels describe the replication controller
  labels:
    project: "website-prod"
    tier: "front-end"
    name: "website"
  name: "website"
spec: # specification of the RC's contents
  replicas: 1
  selector:
    # These labels indicate which pods the replication controller manages
    project: "website-prod"
    tier: "front-end"
    name: "website"
  template:
    metadata:
      labels:
        # These labels belong to the pod, and must match the ones immediately above
        project: "website-prod"
        tier: "front-end"
        name: "website"
    spec:
      containers:
        - name: "website"
          image: "us.gcr.io/skywatch-app/website"
          ports:
            - name: "http"
              containerPort: 80
          command: ["nginx", "-g", "daemon off;"]
          livenessProbe:
            httpGet:
              path: "/"
              port: 80
            initialDelaySeconds: 60
            timeoutSeconds: 3
```
We made a change that adds a new page to the site. After deploying it to production, we started getting intermittent 404s while testing the production site. We used the following commands to update the pods (assume version 95.0 was running at the time):
```shell
packer build website.json
gcloud docker push us.gcr.io/skywatch-app/website
gcloud container clusters get-credentials website-cluster --zone us-central1-f
kubectl rolling-update website --update-period=20s --image=us.gcr.io/skywatch-app/website:96.0
```
Here is the output of these commands:
```
==> docker: Creating a temporary directory for sharing data...
==> docker: Pulling Docker image: nginx:1.9.7
    docker: 1.9.7: Pulling from library/nginx
    docker: d4bce7fd68df: Already exists
    docker: a3ed95caeb02: Already exists
    docker: a3ed95caeb02: Already exists
    docker: 573113c4751a: Already exists
    docker: 31917632be33: Already exists
    docker: a3ed95caeb02: Already exists
    docker: 1e7c116578c5: Already exists
    docker: 03c02c160fd7: Already exists
    docker: f852bb4464c4: Already exists
    docker: a3ed95caeb02: Already exists
    docker: a3ed95caeb02: Already exists
    docker: a3ed95caeb02: Already exists
    docker: Digest: sha256:3b50ebc3ae6fb29b713a708d4dc5c15f4223bde18ddbf3c8730b228093788a3c
    docker: Status: Image is up to date for nginx:1.9.7
==> docker: Starting docker container...
    docker: Run command: docker run -v /tmp/packer-docker358675979:/packer-files -d -i -t nginx:1.9.7 /bin/bash
    docker: Container ID: 0594bf37edd1311535598971140535166df907b1c19d5f76ddda97c53f884d5b
==> docker: Provisioning with shell script: /tmp/packer-shell010711780
==> docker: Uploading nginx.conf => /etc/nginx/nginx.conf
==> docker: Uploading ../dist/ => /var/www
==> docker: Uploading ../dist => /skywatch/website
==> docker: Uploading /skywatch/ssl/ => /skywatch/ssl
==> docker: Committing the container
    docker: Image ID: sha256:d469880ae311d164da6786ec73afbf9190d2056accedc9d2dc186ef8ca79c4b6
==> docker: Killing the container: 0594bf37edd1311535598971140535166df907b1c19d5f76ddda97c53f884d5b
==> docker: Running post-processor: docker-tag
    docker (docker-tag): Tagging image: sha256:d469880ae311d164da6786ec73afbf9190d2056accedc9d2dc186ef8ca79c4b6
    docker (docker-tag): Repository: us.gcr.io/skywatch-app/website:96.0
Build 'docker' finished.

==> Builds finished. The artifacts of successful builds are:
--> docker: Imported Docker image: sha256:d469880ae311d164da6786ec73afbf9190d2056accedc9d2dc186ef8ca79c4b6
--> docker: Imported Docker image: us.gcr.io/skywatch-app/website:96.0
[2016-05-16 15:09:39,598, INFO] The push refers to a repository [us.gcr.io/skywatch-app/website]
e75005ca29bf: Preparing
5f70bf18a086: Preparing
5f70bf18a086: Preparing
5f70bf18a086: Preparing
0b3fbb980e2d: Preparing
40f240c1cbdb: Preparing
673cf6d9dedb: Preparing
5f70bf18a086: Preparing
ebfc3a74f160: Preparing
031458dc7254: Preparing
5f70bf18a086: Preparing
5f70bf18a086: Preparing
12e469267d21: Preparing
ebfc3a74f160: Waiting
031458dc7254: Waiting
12e469267d21: Waiting
5f70bf18a086: Layer already exists
673cf6d9dedb: Layer already exists
40f240c1cbdb: Layer already exists
0b3fbb980e2d: Layer already exists
ebfc3a74f160: Layer already exists
031458dc7254: Layer already exists
12e469267d21: Layer already exists
e75005ca29bf: Pushed
96.0: digest: sha256:ff865acd292409f3b5bf3c14494a6016a45d5ea831e5260304007a2b83e21189 size: 7328
[2016-05-16 15:09:40,483, INFO] Fetching cluster endpoint and auth data.
kubeconfig entry generated for website-cluster.
[2016-05-16 15:10:18,823, INFO] Created website-8c10af72294bdfc4d2d6a0e680e84f09
Scaling up website-8c10af72294bdfc4d2d6a0e680e84f09 from 0 to 1, scaling down website from 1 to 0 (keep 1 pods available, don't exceed 2 pods)
Scaling website-8c10af72294bdfc4d2d6a0e680e84f09 up to 1
Scaling website down to 0
Update succeeded. Deleting old controller: website
Renaming website-8c10af72294bdfc4d2d6a0e680e84f09 to website
replicationcontroller "website" rolling updated
```
This all looks fine, but after the update completed we got random 404s on the new page. When I ran kubectl get pods, I found three pods running instead of the expected two:
```
NAME                                                     READY     STATUS    RESTARTS   AGE
website-8c10af72294bdfc4d2d6a0e680e84f09-iwfjo           1/1       Running   0          1d
website-keys9                                            1/1       Running   0          1d
website-staging-34caf57c958848415375d54214d98b8a-yo4sp   1/1       Running   0          3d
```
Using the kubectl describe pod command, I determined that pod website-8c10af72294bdfc4d2d6a0e680e84f09-iwfjo was running the new version (96.0), while pod website-keys9 was running the old version (95.0). We were getting 404s because incoming requests were being randomly served by the old version of the site. When I manually deleted the pod running the old version, the 404s went away.
Does anyone know under what circumstances a rolling update fails to delete the pod running the old version of the website? Is there something I need to change in the yaml or in the command to guarantee that the pod running the old version always gets deleted?
Thanks for any help or suggestions.
This is Kubernetes bug #27721. But even if it weren't, you would still have a window during which user traffic is being served by both the old and the new pods. For most applications that's fine, but in your case it's undesirable, since it causes unexpected 404s. I'd suggest creating the new pods with a label set that differs from the old one, for example by putting the image version in a label. Then you can update the service to select the new labels; that will quickly (not atomically, but quickly) switch all traffic from the old service backends to the new ones.
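A minimal sketch of that approach, assuming you add a `version` label (the label name and the `website-96` RC name here are illustrative, not from your config) to both the pod template and the RC selector:

```yaml
# Hypothetical second replication controller for 96.0.
# Its pods carry a version label that the 95.0 pods do not have.
apiVersion: v1
kind: ReplicationController
metadata:
  name: "website-96"
spec:
  replicas: 1
  selector:
    project: "website-prod"
    tier: "front-end"
    name: "website"
    version: "96.0"
  template:
    metadata:
      labels:
        project: "website-prod"
        tier: "front-end"
        name: "website"
        version: "96.0"
    spec:
      containers:
        - name: "website"
          image: "us.gcr.io/skywatch-app/website:96.0"
```

Once the new pods are ready, repoint the service at them, e.g. `kubectl patch service website -p '{"spec":{"selector":{"version":"96.0"}}}'` (this merges the `version` key into the existing selector, so the unversioned 95.0 pods immediately stop receiving traffic), then scale down and delete the old controller.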
But it's probably easier to just switch to using a Deployment.
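For reference, a minimal Deployment sketch for the same container (field values are illustrative; at the time of writing Deployments live in the `extensions/v1beta1` API group). A Deployment does the rolling update server-side and cleans up the old pods itself, instead of relying on the client-side `kubectl rolling-update` loop:

```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: website
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # keep the site serving during the rollout
      maxSurge: 1         # allow one extra pod while the new version comes up
  template:
    metadata:
      labels:
        project: "website-prod"
        tier: "front-end"
        name: "website"
    spec:
      containers:
        - name: "website"
          image: "us.gcr.io/skywatch-app/website:96.0"
          ports:
            - containerPort: 80
```

Updating then becomes `kubectl apply -f deployment.yaml` with a new image tag (or `kubectl set image deployment/website website=us.gcr.io/skywatch-app/website:96.0`) instead of `kubectl rolling-update`.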