Kubernetes node metrics endpoint returns 401

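I have a GKE cluster that, for simplicity's sake, runs only Prometheus, which monitors each member node. I recently upgraded the API server to 1.6 (which introduces RBAC) without any problems. Then I added a new node running version 1.6 of the kubelet. Prometheus cannot access the metrics API of this new node.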

[Screenshot: Prometheus targets page]

So I added a ClusterRole, a ClusterRoleBinding and a ServiceAccount to my namespace, and configured the deployment to use the new ServiceAccount. Then, for good measure, I deleted the pod:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: prometheus
    ---
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRole
    metadata:
      name: prometheus
    rules:
    - apiGroups: [""]
      resources:
      - nodes
      - services
      - endpoints
      - pods
      verbs: ["get", "list", "watch"]
    - apiGroups: [""]
      resources:
      - configmaps
      verbs: ["get"]
    - nonResourceURLs: ["/metrics"]
      verbs: ["get"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus
    subjects:
    - kind: ServiceAccount
      name: prometheus
      namespace: default
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: prometheus
      namespace: default
    secrets:
    - name: prometheus-token-xxxxx
    ---
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      labels:
        app: prometheus-prometheus
        component: server
        release: prometheus
      name: prometheus-server
      namespace: default
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: prometheus-prometheus
          component: server
          release: prometheus
      strategy:
        rollingUpdate:
          maxSurge: 1
          maxUnavailable: 1
        type: RollingUpdate
      template:
        metadata:
          labels:
            app: prometheus-prometheus
            component: server
            release: prometheus
        spec:
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          serviceAccount: prometheus
          serviceAccountName: prometheus
          ...

But nothing changed.

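The metrics endpoint returns `HTTP/1.1 401 Unauthorized`. When I modify the deployment to include an additional container with bash and curl installed and make the request manually, I get: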

    # curl -vsSk -H "Authorization: Bearer $(</var/run/secrets/kubernetes.io/serviceaccount/token)" https://$NODE_IP:10250/metrics
    * Trying $NODE_IP...
    * Connected to $NODE_IP ($NODE_IP) port 10250 (#0)
    * found XXX certificates in /etc/ssl/certs/ca-certificates.crt
    * found XXX certificates in /etc/ssl/certs
    * ALPN, offering http/1.1
    * SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
    * server certificate verification SKIPPED
    * server certificate status verification SKIPPED
    * common name: node-running-kubelet-1-6@000000000 (does not match '$NODE_IP')
    * server certificate expiration date OK
    * server certificate activation date OK
    * certificate public key: RSA
    * certificate version: #3
    * subject: CN=node-running-kubelet-1-6@000000000
    * start date: Fri, 07 Apr 2017 22:00:00 GMT
    * expire date: Sat, 07 Apr 2018 22:00:00 GMT
    * issuer: CN=node-running-kubelet-1-6@000000000
    * compression: NULL
    * ALPN, server accepted to use http/1.1
    > GET /metrics HTTP/1.1
    > Host: $NODE_IP:10250
    > User-Agent: curl/7.47.0
    > Accept: */*
    > Authorization: Bearer **censored**
    >
    < HTTP/1.1 401 Unauthorized
    < Date: Mon, 10 Apr 2017 20:04:20 GMT
    < Content-Length: 12
    < Content-Type: text/plain; charset=utf-8
    <
    * Connection #0 to host $NODE_IP left intact
  • Why doesn't this token give me access to the resource?
  • How can I check which permissions have been granted to a ServiceAccount?

I ran into the same problem and opened https://github.com/prometheus/prometheus/issues/2606 for it. Following the discussion there, the configuration example was updated via PR https://github.com/prometheus/prometheus/pull/2641.

You can see the updated relabeling for the kubernetes-nodes job at https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L76-L84

Copied here for reference:

    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - target_label: __address__
      replacement: kubernetes.default.svc:443
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics
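To make the effect of these three rules concrete, here is a minimal Python model of what they do to a single node target. It is a sketch, not Prometheus code: it assumes Prometheus's documented semantics (default action `replace`, fully anchored regexes, `labelmap` renaming to capture group 1), and the function name `relabel_node_target` and the sample node labels are hypothetical.

```python
import re


def relabel_node_target(labels: dict) -> dict:
    """Apply the three kubernetes-nodes relabel rules to one target's labels."""
    out = dict(labels)

    # Rule 1, action: labelmap. Every label whose name matches the regex is
    # copied to a new label named after the first capture group ($1).
    for name, value in labels.items():
        m = re.fullmatch(r"__meta_kubernetes_node_label_(.+)", name)
        if m:
            out[m.group(1)] = value

    # Rule 2, default action "replace" with no source_labels: the default
    # regex (.*) matches the empty input, so __address__ is always overwritten.
    out["__address__"] = "kubernetes.default.svc:443"

    # Rule 3: replace __metrics_path__ only if the node name is non-empty,
    # substituting the captured name for ${1}.
    m = re.fullmatch(r"(.+)", labels.get("__meta_kubernetes_node_name", ""))
    if m:
        out["__metrics_path__"] = f"/api/v1/nodes/{m.group(1)}/proxy/metrics"

    return out


target = relabel_node_target({
    "__address__": "10.128.0.5:10250",  # made-up kubelet address
    "__meta_kubernetes_node_name": "gke-node-1",  # made-up node name
    "__meta_kubernetes_node_label_kubernetes_io_hostname": "gke-node-1",
})
print(target["__address__"])       # kubernetes.default.svc:443
print(target["__metrics_path__"])  # /api/v1/nodes/gke-node-1/proxy/metrics
```

The point of the rewrite is that Prometheus no longer talks to the kubelet on port 10250 at all; it scrapes the API server, which forwards the request to the node.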

For RBAC itself, you need to run Prometheus with its own service account:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: prometheus
      namespace: default

Make sure to pass that service account to the pod with the following pod spec:

    spec:
      serviceAccount: prometheus

Kubernetes then needs the appropriate RBAC role and binding so that the prometheus service account can access the required API endpoints; see https://github.com/prometheus/prometheus/blob/master/documentation/examples/rbac-setup.yml

Copied here for reference:

    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRole
    metadata:
      name: prometheus
    rules:
    - apiGroups: [""]
      resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
      verbs: ["get", "list", "watch"]
    - nonResourceURLs: ["/metrics"]
      verbs: ["get"]
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: prometheus
      namespace: default
    ---
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus
    subjects:
    - kind: ServiceAccount
      name: prometheus
      namespace: default
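The key difference from the ClusterRole in the question is `nodes/proxy`. A GET on `/api/v1/nodes/NAME/proxy/metrics` is authorized by the API server as verb `get` on resource `nodes` with subresource `proxy`, so a role that lists only `nodes` does not cover it. The sketch below is a deliberately simplified, hypothetical model of that check (cluster-scoped core resources only, GET only, no `*` wildcards), not the real RBAC authorizer; `Rule`, `parse_core_path` and `is_allowed` are made-up names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Rule:
    """One PolicyRule from a ClusterRole (wildcards such as "*" are not modeled)."""
    verbs: List[str]
    api_groups: List[str] = field(default_factory=list)
    resources: List[str] = field(default_factory=list)
    non_resource_urls: List[str] = field(default_factory=list)


def parse_core_path(path: str) -> Optional[Tuple[str, str, str]]:
    """Split a cluster-scoped core path like /api/v1/nodes/NAME/proxy/...
    into (resource, name, subresource); anything else counts as non-resource."""
    parts = path.strip("/").split("/")
    if len(parts) < 3 or parts[:2] != ["api", "v1"]:
        return None
    name = parts[3] if len(parts) > 3 else ""
    subresource = parts[4] if len(parts) > 4 else ""
    return parts[2], name, subresource


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Decide whether an HTTP GET on `path` is permitted by `rules`."""
    parsed = parse_core_path(path)
    for rule in rules:
        if parsed is None:
            # Non-resource URLs (e.g. /metrics) match nonResourceURLs entries.
            if path in rule.non_resource_urls and "get" in rule.verbs:
                return True
            continue
        resource, name, subresource = parsed
        # A GET on a named object is the "get" verb; on a collection, "list".
        verb = "get" if name else "list"
        wanted = f"{resource}/{subresource}" if subresource else resource
        if "" in rule.api_groups and wanted in rule.resources and verb in rule.verbs:
            return True
    return False


# Rules transcribed from the question's ClusterRole and from rbac-setup.yml.
QUESTION_ROLE = [
    Rule(["get", "list", "watch"], [""], ["nodes", "services", "endpoints", "pods"]),
    Rule(["get"], [""], ["configmaps"]),
    Rule(["get"], non_resource_urls=["/metrics"]),
]
ANSWER_ROLE = [
    Rule(["get", "list", "watch"], [""],
         ["nodes", "nodes/proxy", "services", "endpoints", "pods"]),
    Rule(["get"], non_resource_urls=["/metrics"]),
]

proxy_path = "/api/v1/nodes/gke-node-1/proxy/metrics"  # hypothetical node name
print(is_allowed(QUESTION_ROLE, proxy_path))  # False: nodes/proxy is missing
print(is_allowed(ANSWER_ROLE, proxy_path))    # True
```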

Replace the namespace in all manifests with the namespace you run Prometheus in, then apply the manifests with an account that has cluster-admin privileges.

I haven't tested this in a cluster without the ABAC fallback, so the RBAC role may still be missing something essential.

As per @JorritSalverda's discussion on the ticket: https://github.com/prometheus/prometheus/issues/2606#issuecomment-294869099

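Since GKE doesn't let you obtain client certificates that would allow you to authenticate with the kubelet, the best solution for GKE users seems to be to use the Kubernetes API server as a proxy for requests to the nodes.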
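Outside of Prometheus, the same proxied request can be reproduced from inside a pod, which is a handy way to test the service account's access before touching the scrape config. This is a sketch under assumptions: it uses the standard in-pod service account mount (`token` and `ca.crt` under `/var/run/secrets/kubernetes.io/serviceaccount`), assumes the API server certificate is valid for `kubernetes.default.svc`, and the helper names are hypothetical.

```python
import ssl
import urllib.request
from pathlib import Path

# Standard location of the mounted service account credentials in a pod.
SA_DIR = Path("/var/run/secrets/kubernetes.io/serviceaccount")


def build_node_metrics_request(node_name: str, token: str,
                               api_server: str = "https://kubernetes.default.svc"
                               ) -> urllib.request.Request:
    """Build a GET for a node's kubelet metrics, proxied via the API server."""
    url = f"{api_server}/api/v1/nodes/{node_name}/proxy/metrics"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_node_metrics(node_name: str) -> str:
    """Only works inside a pod: authenticate with the pod's own token and
    verify the API server certificate against the cluster CA."""
    token = (SA_DIR / "token").read_text().strip()
    ctx = ssl.create_default_context(cafile=str(SA_DIR / "ca.crt"))
    req = build_node_metrics_request(node_name, token)
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return resp.read().decode("utf-8")
```

A 403 here (rather than the kubelet's 401) would point at a missing RBAC permission such as `nodes/proxy`, since the token itself was then accepted by the API server.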

To do this (quoting @JorritSalverda):

"For my Prometheus server running inside GKE, I now have it running with the following relabeling:

    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - target_label: __address__
      replacement: kubernetes.default.svc.cluster.local:443
    - target_label: __scheme__
      replacement: https
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics

And the following ClusterRole, bound to the service account used by Prometheus:

    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRole
    metadata:
      name: prometheus
    rules:
    - apiGroups: [""]
      resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
      verbs: ["get", "list", "watch"]

Because the GKE cluster still has an ABAC fallback in case RBAC fails, I'm not yet 100% sure this covers all the required permissions."
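The quoted relabeling also sets `__scheme__` to `https`. Prometheus's default scrape scheme is `http` (and the default metrics path is `/metrics`), so without that rule or a job-level `scheme: https`, Prometheus would speak plain HTTP to port 443 of the API server. The sketch below is a simplified, hypothetical model of how the final scrape URL is composed from the special labels; it ignores `__param_*` query parameters, and `scrape_url` is a made-up name.

```python
# Prometheus's job-level defaults: scheme "http" and metrics_path "/metrics".
DEFAULT_SCHEME = "http"
DEFAULT_METRICS_PATH = "/metrics"


def scrape_url(labels: dict) -> str:
    """Compose the URL Prometheus would scrape from the post-relabeling
    special labels, falling back to the job defaults."""
    scheme = labels.get("__scheme__", DEFAULT_SCHEME)
    path = labels.get("__metrics_path__", DEFAULT_METRICS_PATH)
    return f"{scheme}://{labels['__address__']}{path}"


after_relabel = {
    "__address__": "kubernetes.default.svc.cluster.local:443",
    "__scheme__": "https",
    "__metrics_path__": "/api/v1/nodes/gke-node-1/proxy/metrics",  # hypothetical node
}
print(scrape_url(after_relabel))
# https://kubernetes.default.svc.cluster.local:443/api/v1/nodes/gke-node-1/proxy/metrics
```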