K8s Cluster Monitoring with Kube Prometheus (Prometheus Operator)

Kube Prometheus project:

https://github.com/coreos/kube-prometheus

Helm chart for the project:

https://github.com/helm/charts/blob/master/stable/prometheus-operator

Prometheus website:

https://prometheus.io/

Prometheus Operator project:

https://github.com/coreos/prometheus-operator/

A deployment example:

https://github.com/coreos/kube-prometheus/blob/master/examples/example-app/

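What is Prometheus Operator

Prometheus Operator is a monitoring and alerting tool that runs on Kubernetes. When deploying it, you do not write or edit Prometheus configuration files; everything is done by creating the operator's own resource objects. Changes to the monitoring configuration take effect almost immediately, without restarting Prometheus by hand.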

Custom resources of Prometheus Operator (CustomResourceDefinitions, CRDs)

  • Prometheus: defines a Prometheus deployment.
  • ServiceMonitor: monitors a set of Services. The Services must expose metrics for Prometheus to scrape.
  • PodMonitor: monitors a set of Pods.
  • PrometheusRule: a Prometheus rule file, including alerting rules.
  • Alertmanager: defines an Alertmanager deployment.
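Of these resources, Alertmanager is the only one this article never shows. As a minimal sketch, an Alertmanager object could look like the following; the name `example` and the replica count are illustrative assumptions, and the Alertmanager configuration itself (receivers, routes) is supplied separately.

```yaml
# Minimal Alertmanager sketch; the name "example" is a hypothetical choice
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: example
spec:
  # Number of Alertmanager pods the operator runs as a cluster
  replicas: 3
```

A Prometheus object sends alerts to it through `spec.alerting.alertmanagers` (a Service name, namespace and port), as in the `alerting` block of the kube-prometheus Prometheus object shown further down.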

QuickStart

Clone the kube-prometheus project:

git clone https://github.com/coreos/kube-prometheus.git

Then, from the kube-prometheus directory, run:

# Create the namespace and CRDs, and then wait for them to be available before creating the remaining resources
kubectl create -f manifests/setup
# Wait until the setup step has finished (i.e. the ServiceMonitor CRD can be queried)
until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
kubectl create -f manifests/

Removing Kube Prometheus

Run:

kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup

Accessing the dashboards

The dashboards can be accessed via port forwarding.

Accessing Prometheus

$ kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090

Accessing Grafana

$ kubectl --namespace monitoring port-forward svc/grafana 3000

Accessing Alertmanager

$ kubectl --namespace monitoring port-forward svc/alertmanager-main 9093

The services are then reachable on localhost at the forwarded ports (9090, 3000 and 9093 respectively).

Note: to make them reachable on other addresses, add the --address flag, for example:

$ kubectl --namespace monitoring port-forward --address 0.0.0.0 svc/prometheus-k8s 9090

Deploying Prometheus Operator manually

The steps above use Kube Prometheus, which ships a ready-made set of Prometheus Operator resource definitions and can be installed in one step.

Prometheus Operator can also be deployed by hand.

Installing Prometheus Operator

  1. Clone the Prometheus Operator project with Git:
git clone https://github.com/coreos/prometheus-operator.git
  2. From the repository root, run the following command to create the Prometheus Operator and the related CRDs:
kubectl apply -f bundle.yaml
  3. Enable RBAC rules for the Prometheus resource object

Create a ServiceAccount:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus

Create a ClusterRole:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]

Create a ClusterRoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
  4. Create the Prometheus resource object
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  podMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false

serviceMonitorSelector and podMonitorSelector determine which ServiceMonitor and PodMonitor objects are used. An empty selector ({}) matches all objects.

  5. Deploy your own application.
    Here is an example:

Create a Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: example-app
        image: fabxc/instrumented_app
        ports:
        - name: web
          containerPort: 8080

Here we assume the application exposes its metrics on port 8080.

Next, create a Service through which the metrics can be reached.

kind: Service
apiVersion: v1
metadata:
  name: example-app
  labels:
    app: example-app
spec:
  selector:
    app: example-app
  ports:
  - name: web
    port: 8080
  6. Create a ServiceMonitor

In this step we want Prometheus to scrape the metrics exposed by the Service created in the previous step. This is done with a ServiceMonitor.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web

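Notes:

  • The selector here must match the Service created in the previous step.
  • The endpoints port must be the name of a port defined in the Service; a port number cannot be used.
  • Make sure that the serviceMonitorSelector and serviceMonitorNamespaceSelector of the Prometheus object match the ServiceMonitor created in this step.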
  7. Expose the Prometheus port

Perform this step if Prometheus needs to be reachable from outside the cluster.

apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30900
    port: 9090
    protocol: TCP
    targetPort: web
  selector:
    prometheus: prometheus

This creates a Service of type NodePort, so Prometheus is reachable on port 30900 of any cluster node.

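Prometheus resource object

The Prometheus resource object acts as the configuration center of the whole Prometheus setup.

The Prometheus resource object created by kube-prometheus looks like this (as read back from the cluster, so metadata also contains fields filled in by the API server, such as resourceVersion and uid):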

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  creationTimestamp: "2020-02-12T04:38:38Z"
  generation: 1
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
  resourceVersion: "3745"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/prometheuses/k8s
  uid: 3d66375e-b8fb-453b-bcd2-a9ef1fd75387
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.15.2

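Four settings limit the scope of monitoring:

  • podMonitorNamespaceSelector: which namespaces are scanned for PodMonitors. An empty selector ({}) scans all namespaces; if the field is omitted, only the namespace of the Prometheus object is scanned.
  • serviceMonitorNamespaceSelector: which namespaces are scanned for ServiceMonitors. An empty selector ({}) scans all namespaces; if the field is omitted, only the namespace of the Prometheus object is scanned.
  • podMonitorSelector: a label selector choosing which PodMonitors are used. An empty selector ({}) selects all PodMonitors.
  • serviceMonitorSelector: a label selector choosing which ServiceMonitors are used. An empty selector ({}) selects all ServiceMonitors.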
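For instance, to stop the kube-prometheus instance from scanning every namespace, the two namespace selectors could be narrowed to namespaces carrying a label. This is a sketch: the label `monitoring: enabled` is a hypothetical convention, not something kube-prometheus defines, and only the relevant part of `spec` is shown.

```yaml
spec:
  # Only namespaces labelled monitoring=enabled (hypothetical label) are scanned
  serviceMonitorNamespaceSelector:
    matchLabels:
      monitoring: enabled
  podMonitorNamespaceSelector:
    matchLabels:
      monitoring: enabled
  # Within those namespaces, every ServiceMonitor / PodMonitor is accepted
  serviceMonitorSelector: {}
  podMonitorSelector: {}
```

A namespace would then opt in with `kubectl label namespace default monitoring=enabled`. Depending on how its RBAC is set up, the Prometheus service account may also need read permissions on Services, Endpoints and Pods in each opted-in namespace.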

In addition, there is ruleSelector: only PrometheusRule objects matching this selector are loaded. So if we use the default kube-prometheus Prometheus configuration, our own PrometheusRule objects need these two labels:

prometheus: k8s
role: alert-rules

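Configuring remote storage for Prometheus

In production, Prometheus's monitoring data should be persisted to an external database.

InfluxDB is recommended here: of the backends supported by the remote storage adapter described below, it is the only one that supports both reads and writes.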

Installing InfluxDB

InfluxDB website: https://www.influxdata.com/

Download and install it, then start the service.

# Start InfluxDB
systemctl start influxdb

# Open the InfluxDB shell
influx

Create a database named prometheus:

curl -XPOST http://localhost:8086/query --data-urlencode "q=CREATE DATABASE prometheus"

Building and running the remote storage adapter

To use InfluxDB as remote storage, Prometheus needs remote_storage_adapter. remote_storage_adapter supports Graphite, InfluxDB and OpenTSDB; of these, only InfluxDB supports both read and write.

Source code: https://github.com/prometheus/prometheus/tree/master/documentation/examples/remote_storage/remote_storage_adapter

After cloning the source with Git, run go build in the remote_storage_adapter directory to compile it.

Then run the remote storage adapter:

./remote_storage_adapter --influxdb-url=http://localhost:8086/ --influxdb.database=prometheus --influxdb.retention-policy=autogen

Note: 8086 is InfluxDB's default port, and the database used is prometheus.

Configuring the Prometheus resource object

The relevant fields are:

  • remoteRead: the URL from which data is read
  • remoteWrite: the URL to which data is written

Edit the Prometheus resource object and add:

spec:
  remoteRead:
    - url: "http://localhost:9201/read"
  remoteWrite:
    - url: "http://localhost:9201/write"

Note: 9201 is the port remote_storage_adapter listens on by default.
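One caveat: `localhost` in these URLs is resolved inside the Prometheus pod, so they only work if the adapter runs as a container in that same pod. If the adapter runs elsewhere (for example on the InfluxDB host, as started above), the URLs need an address the pod can reach. A sketch, assuming a hypothetical Service named `remote-storage-adapter` in the `monitoring` namespace that forwards to the adapter's port 9201:

```yaml
spec:
  remoteRead:
    # Hypothetical in-cluster Service in front of remote_storage_adapter
    - url: "http://remote-storage-adapter.monitoring.svc:9201/read"
  remoteWrite:
    - url: "http://remote-storage-adapter.monitoring.svc:9201/write"
```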

PS: in a native Prometheus configuration file, the equivalent settings are:

# Remote write configuration (for Graphite, OpenTSDB, or InfluxDB).
remote_write:
  - url: "http://localhost:9201/write"

# Remote read configuration (for InfluxDB only at the moment).
remote_read:
  - url: "http://localhost:9201/read"

ServiceMonitor resource object

A ServiceMonitor configures Prometheus to read metrics from a Service.

First, configure a Service that specifies the port on which metrics are exposed.

kind: Service
apiVersion: v1
metadata:
  name: example-app
  labels:
    app: example-app
spec:
  selector:
    app: example-app
  ports:
  - name: web
    port: 8080

Metrics are exposed on port 8080 of the pods selected by this Service.

Then create a ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web

The port field must use a named port.
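The ServiceMonitor above relies on defaults for everything except the port. As an illustrative sketch, the same object with a few commonly used fields spelled out might look like this; the `/metrics` path and 30s interval are assumptions about the example app, and `namespaceSelector` is only needed when the Service lives in a different namespace from the ServiceMonitor (`default` here is a hypothetical choice).

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend          # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: example-app      # must match the labels of the Service
  namespaceSelector:
    matchNames:
    - default               # hypothetical namespace of the Service
  endpoints:
  - port: web               # named port of the Service, not a number
    path: /metrics          # assumed metrics path
    interval: 30s           # assumed scrape interval
```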

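PodMonitor

A PodMonitor configures Prometheus to read metrics directly from Pods.

Note: the snippet below only lists the main fields of a PodMonitor spec; it is not a complete manifest that can be applied as is.

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-pod-monitor
  namespace: default
  labels:
    app: example
spec:
  podMetricsEndpoints:   # endpoints (port, path, interval, ...) to scrape on each pod
  selector:              # label selector choosing the pods to monitor
  podTargetLabels:       # pod labels copied onto the scraped targets
  sampleLimit:           # per-scrape limit on the number of samples accepted
  jobLabel:              # pod label whose value is used as the job name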
PrometheusRule

Used to configure alerting rules.

Example:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: prometheus-k8s-rules
  # same namespace as the kube-prometheus Prometheus object
  namespace: monitoring
spec:
  groups:
  - name: k8s.rules
    rules:
    - alert: KubeletDown
      annotations:
        message: Kubelet has disappeared from Prometheus target discovery.
      expr: |
        absent(up{job="kubelet"} == 1)
      for: 15m
      labels:
        severity: critical
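Following the same pattern, a rule for the example app could be added as its own PrometheusRule. This is a sketch: it assumes the scrape job is named after the Service (`example-app`), which as far as we know is the operator's default for ServiceMonitor targets, and that rules are only read from the Prometheus object's namespace; the `for` duration and severity are arbitrary choices. The two labels are what the default kube-prometheus `ruleSelector` requires.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-app-rules
  namespace: monitoring        # assumed: same namespace as the Prometheus object
  labels:
    prometheus: k8s            # required by the default ruleSelector
    role: alert-rules
spec:
  groups:
  - name: example-app.rules
    rules:
    - alert: ExampleAppDown
      annotations:
        message: No example-app target has been up for 5 minutes.
      # Fires when no series up{job="example-app"} equals 1
      expr: |
        absent(up{job="example-app"} == 1)
      for: 5m
      labels:
        severity: warning
```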

Using with Ingress

Besides exposing the Prometheus service outside the cluster with a NodePort, we can also expose it through an Ingress.

The Ingress configuration is as follows:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: monitoring
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: "/$1"
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: prometheus
          servicePort: 9090
        path: /prometheus/(.*)

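This Ingress maps /prometheus/ to the prometheus Service, so the Prometheus server can be reached at http://hostname/prometheus/. There is a problem, though: the page's static assets fail to load, because the UI requests them from paths without the /prometheus/ prefix.

To solve this, the Prometheus server needs to be configured with a context path.

The Prometheus object has an externalUrl field that also provides the context path; it must be set to the full externally visible URL, as shown below: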

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: main
spec:
  replicas: 2
  version: v2.15.2
  externalUrl: http://hostname/prometheus/
  resources:
    requests:
      memory: 400Mi

For more details, see:
https://coreos.com/operators/prometheus/docs/latest/user-guides/exposing-prometheus-and-alertmanager.html
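The Ingress above uses `networking.k8s.io/v1beta1`, which was removed in Kubernetes 1.22. On clusters that serve the GA `networking.k8s.io/v1` API, a rough equivalent is sketched below. The ingress class name `nginx` is an assumption about how the NGINX Ingress Controller is installed, and the regex path is given `pathType: ImplementationSpecific` so the controller interprets it.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: monitoring
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: "/$1"
spec:
  ingressClassName: nginx            # assumption: class of the NGINX controller
  rules:
  - http:
      paths:
      # Regex path: the captured group $1 is what rewrite-target forwards
      - path: /prometheus/(.*)
        pathType: ImplementationSpecific
        backend:
          service:
            name: prometheus
            port:
              number: 9090
```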

Usage example

https://github.com/coreos/kube-prometheus/blob/master/examples/example-app/

References

https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/getting-started.md

https://coreos.com/operators/prometheus/docs/latest/user-guides/exposing-prometheus-and-alertmanager.html
