Kube Prometheus project
https://github.com/coreos/kube-prometheus
Helm chart for the project
https://github.com/helm/charts/blob/master/stable/prometheus-operator
Prometheus official website
https://prometheus.io/
Prometheus Operator project
https://github.com/coreos/prometheus-operator/
A sample deployment
https://github.com/coreos/kube-prometheus/blob/master/examples/example-app/
What is Prometheus Operator
Prometheus Operator is a monitoring and alerting tool that runs on top of Kubernetes. Deploying it does not require creating or editing Prometheus configuration files; every operation is carried out by creating Prometheus-specific resource objects, and changes to the monitoring configuration take effect almost immediately.
Custom resources of Prometheus Operator (CustomResourceDefinitions, CRDs)
- Prometheus: defines a deployment of the Prometheus monitoring system.
- ServiceMonitor: monitors a group of Services. The Services must expose metrics for Prometheus to scrape.
- PodMonitor: monitors a group of Pods.
- PrometheusRule: a Prometheus rule file, which contains alerting rules.
- Alertmanager: defines a deployment of Alertmanager.
QuickStart
Download the kube-prometheus project:
git clone https://github.com/coreos/kube-prometheus.git
Then run the following from the repository root:
# Create the namespace and CRDs, and then wait for them to be available before creating the remaining resources
kubectl create -f manifests/setup
# The loop below waits until the setup step has finished
until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
kubectl create -f manifests/
Removing Kube Prometheus
Run:
kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup
Accessing the dashboards
The dashboards can be reached with port forwarding.
Accessing Prometheus
$ kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090
Accessing Grafana
$ kubectl --namespace monitoring port-forward svc/grafana 3000
Accessing Alertmanager
$ kubectl --namespace monitoring port-forward svc/alertmanager-main 9093
Each service is then reachable on localhost at the forwarded port.
Note: to reach a service through an address other than localhost, add the --address flag. For example:
$ kubectl --namespace monitoring port-forward --address 0.0.0.0 svc/prometheus-k8s 9090
Deploying Prometheus Operator manually
The steps above use Kube Prometheus, which bundles a set of preconfigured Prometheus Operator resource objects and allows a one-step installation.
Prometheus Operator can also be deployed by hand.
Installing Prometheus Operator
- Clone the Prometheus Operator project with Git
git clone https://github.com/coreos/prometheus-operator.git
- From the repository root, create the prometheus-operator objects and the related CRDs
kubectl apply -f bundle.yaml
- Enable RBAC rules for the Prometheus resource object
Create the ServiceAccount:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
Create the ClusterRole:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
Create the ClusterRoleBinding:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
- Create the Prometheus resource object
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  podMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false
serviceMonitorSelector and podMonitorSelector determine which ServiceMonitor and PodMonitor objects take effect. An empty selector ({}) selects all objects.
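The selection rule can be illustrated with a small model. The sketch below is not the operator's code; it is a minimal Python approximation that considers only matchLabels (matchExpressions is ignored), and the function name selector_matches is hypothetical.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """Return True if an object's labels satisfy a matchLabels-only selector.

    Simplified model: matchExpressions is not handled.
    """
    required = selector.get("matchLabels", {})
    # An empty selector ({}) has no requirements, so every object matches.
    return all(labels.get(key) == value for key, value in required.items())


prometheus_selector = {"matchLabels": {"team": "frontend"}}

print(selector_matches(prometheus_selector, {"team": "frontend"}))  # True
print(selector_matches(prometheus_selector, {"team": "backend"}))   # False
print(selector_matches({}, {"team": "backend"}))                    # True
```

With the Prometheus object above, only ServiceMonitor and PodMonitor objects labelled team: frontend would pass this check.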
- Deploy your own application
An example follows.
Create a Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: example-app
        image: fabxc/instrumented_app
        ports:
        - name: web
          containerPort: 8080
This assumes the application exposes its metrics on port 8080.
Next, create a Service through which the metrics are reached:
kind: Service
apiVersion: v1
metadata:
  name: example-app
  labels:
    app: example-app
spec:
  selector:
    app: example-app
  ports:
  - name: web
    port: 8080
- Create a ServiceMonitor
In this step Prometheus needs to read the metrics exposed by the Service created in the previous step. A ServiceMonitor takes care of this.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web
Note:
- The selector here must match the labels of the Service created in the previous step.
- The port field of an endpoint must be the name of a port defined in the Service; a number cannot be used.
- Make sure that the serviceMonitorSelector and serviceMonitorNamespaceSelector of the Prometheus object match the ServiceMonitor created in this step.
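The first two notes can be checked mechanically. The following Python sketch treats the Service and ServiceMonitor manifests above as plain dictionaries and reports what would stop them from lining up. It is a simplified model (only matchLabels and named ports are considered), and check_service_monitor is a hypothetical helper, not part of any Kubernetes or Prometheus Operator library.

```python
def check_service_monitor(service_monitor: dict, service: dict) -> list:
    """Return a list of problems that would stop the ServiceMonitor from
    scraping the Service; an empty list means the two line up.

    Simplified model: only matchLabels and named ports are checked.
    """
    problems = []

    # 1. The ServiceMonitor selector must match the Service's labels.
    wanted = service_monitor["spec"]["selector"].get("matchLabels", {})
    labels = service["metadata"].get("labels", {})
    for key, value in wanted.items():
        if labels.get(key) != value:
            problems.append(f"service is missing label {key}={value}")

    # 2. Each endpoint must reference a port by the name used in the Service.
    port_names = {p.get("name") for p in service["spec"]["ports"]}
    for endpoint in service_monitor["spec"]["endpoints"]:
        if endpoint["port"] not in port_names:
            problems.append(f"no service port named {endpoint['port']!r}")

    return problems


service = {
    "metadata": {"name": "example-app", "labels": {"app": "example-app"}},
    "spec": {"ports": [{"name": "web", "port": 8080}]},
}
service_monitor = {
    "metadata": {"name": "example-app", "labels": {"team": "frontend"}},
    "spec": {
        "selector": {"matchLabels": {"app": "example-app"}},
        "endpoints": [{"port": "web"}],
    },
}

print(check_service_monitor(service_monitor, service))  # []
```

Changing the endpoint to port: 8080 in this model produces a "no service port named" problem, which mirrors the named-port requirement.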
- Expose the Prometheus port
Perform this step if Prometheus needs to be reachable from outside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30900
    port: 9090
    protocol: TCP
    targetPort: web
  selector:
    prometheus: prometheus
This creates a Service of type NodePort.
The Prometheus resource object
The Prometheus resource object acts as the central configuration point for the whole Prometheus deployment.
A Prometheus resource object looks like this:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  creationTimestamp: "2020-02-12T04:38:38Z"
  generation: 1
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
  resourceVersion: "3745"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/prometheuses/k8s
  uid: 3d66375e-b8fb-453b-bcd2-a9ef1fd75387
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.15.2
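Four of these fields restrict the monitoring scope:
- podMonitorNamespaceSelector: selects the namespaces that are scanned for PodMonitor objects. If empty ({}), all namespaces are scanned.
- serviceMonitorNamespaceSelector: selects the namespaces that are scanned for ServiceMonitor objects. If empty ({}), all namespaces are scanned.
- podMonitorSelector: selects which PodMonitor objects are picked up. If empty ({}), all PodMonitor objects are picked up.
- serviceMonitorSelector: selects which ServiceMonitor objects are picked up. If empty ({}), all ServiceMonitor objects are picked up.
In addition there is ruleSelector: only PrometheusRule objects that match this selector are loaded. With the default Prometheus configuration shown above, any PrometheusRule you create therefore needs the following two labels:
  prometheus: k8s
  role: alert-rules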
Specifying remote storage for Prometheus
In production, Prometheus monitoring data should be persisted to a database.
InfluxDB is recommended here: of the backends supported by the remote storage adapter described below, it is the one that supports both reading and writing.
Installing InfluxDB
InfluxDB website: https://www.influxdata.com/
Download it, install it, and start the service.
# Start InfluxDB
systemctl start influxdb
# Open the InfluxDB shell
influx
Create a database named prometheus:
curl -XPOST http://localhost:8086/query --data-urlencode "q=CREATE DATABASE prometheus"
Building and running the remote storage adapter
To use InfluxDB as remote storage, Prometheus needs a remote_storage_adapter. remote_storage_adapter supports Graphite, InfluxDB and OpenTSDB; of these, InfluxDB supports both read and write modes.
After cloning the source code with Git, build it with go build.
Then run the remote storage adapter:
./remote_storage_adapter --influxdb-url=http://localhost:8086/ --influxdb.database=prometheus --influxdb.retention-policy=autogen
Note: 8086 is the default InfluxDB port, and the database used here is named prometheus.
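Configuring the Prometheus resource object
The relevant fields are:
- remoteRead: the URL that data is read from
- remoteWrite: the URL that data is written to
Edit the manifest of the Prometheus resource object and add:
spec:
  remoteRead:
  - url: "http://localhost:9201/read"
  remoteWrite:
  - url: "http://localhost:9201/write"
Note: 9201 is the default port that remote_storage_adapter listens on. Because Prometheus runs inside a pod, localhost only works if the adapter runs in the same pod; otherwise replace it with an address at which the adapter is reachable from the Prometheus pod.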
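As an alternative to editing the manifest by hand, the same two fields can be generated as a JSON merge patch. The Python sketch below only builds the patch text; the function name remote_storage_patch is hypothetical, and the adapter address passed in is an assumption that has to be replaced with the real one.

```python
import json


def remote_storage_patch(adapter_base_url: str) -> str:
    """Build a JSON merge patch that sets remoteRead and remoteWrite
    on a Prometheus object, pointing both at one adapter."""
    base = adapter_base_url.rstrip("/")
    patch = {
        "spec": {
            "remoteRead": [{"url": f"{base}/read"}],
            "remoteWrite": [{"url": f"{base}/write"}],
        }
    }
    return json.dumps(patch)


# 9201 is the adapter's default port; the host is whatever address the
# adapter is reachable at from the Prometheus pod.
print(remote_storage_patch("http://localhost:9201"))
```

The printed JSON could then be applied with kubectl, for example kubectl --namespace monitoring patch prometheus k8s --type merge -p '<json>', where k8s and monitoring are the object name and namespace from the kube-prometheus example earlier in this article.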
PS: the equivalent settings in a native Prometheus configuration file are:
# Remote write configuration (for Graphite, OpenTSDB, or InfluxDB).
remote_write:
  - url: "http://localhost:9201/write"
# Remote read configuration (for InfluxDB only at the moment).
remote_read:
  - url: "http://localhost:9201/read"
The ServiceMonitor resource object
A ServiceMonitor configures Prometheus to read metrics from a Service.
First define a Service that specifies the port on which the metrics are exposed:
kind: Service
apiVersion: v1
metadata:
  name: example-app
  labels:
    app: example-app
spec:
  selector:
    app: example-app
  ports:
  - name: web
    port: 8080
The metrics are exposed on port 8080 of the pods behind this Service.
Then create a ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web
The port field here must refer to a named port.
PodMonitor
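A PodMonitor configures Prometheus to read metrics from a Pod.
Note: the effect of each field has not been fully verified here; the skeleton below lists only some of the available fields, with their values left empty.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-pod-monitor
  namespace: default
  labels:
    app: example
spec:
  podMetricsEndpoints:
  selector:
  podTargetLabels:
  sampleLimit:
  jobLabel: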
PrometheusRule
Used to configure alerting rules.
An example:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: prometheus-k8s-rules
spec:
  groups:
  - name: k8s.rules
    rules:
    - alert: KubeletDown
      annotations:
        message: Kubelet has disappeared from Prometheus target discovery.
      expr: |
        absent(up{job="kubelet"} == 1)
      for: 15m
      labels:
        severity: critical
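The for: 15m field in this rule means that the expression has to stay true for 15 minutes before the alert fires; until then the alert is pending. The Python sketch below is a simplified model of that behaviour, not Prometheus's implementation. The function name alert_states and the sample evaluations are invented for illustration.

```python
def alert_states(evaluations, for_seconds):
    """Map (timestamp_seconds, expr_is_true) evaluations to alert states.

    Simplified model of the `for` clause: an alert is "pending" once the
    expression becomes true and "firing" once it has stayed true for at
    least `for_seconds`; any false evaluation resets it to "inactive".
    """
    states = []
    active_since = None
    for timestamp, is_true in evaluations:
        if not is_true:
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held_for = timestamp - active_since
        states.append("firing" if held_for >= for_seconds else "pending")
    return states


# One evaluation every 5 minutes; the expression is true from minute 5 on.
samples = [(0, False), (300, True), (600, True), (900, True), (1200, True)]
print(alert_states(samples, for_seconds=15 * 60))
# ['inactive', 'pending', 'pending', 'pending', 'firing']
```

In this model a single false evaluation in the middle of the series would reset the timer, so a briefly recovering kubelet target would delay KubeletDown by another 15 minutes.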
Using Prometheus with an Ingress
Besides exposing the Prometheus service outside the cluster with a NodePort, it can also be exposed through an Ingress.
The Ingress is configured as follows:
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: monitoring
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: "/$1"
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: prometheus
          servicePort: 9090
        path: /prometheus/(.*)
This Ingress maps /prometheus/ to the prometheus Service, so the Prometheus server can be reached at http://hostname/prometheus/. There is one problem, however: the static assets of the page fail to load.
To fix this, a context path has to be configured for the Prometheus server.
The Prometheus object has an externalUrl field that covers the context path. It must be set to the complete externally exposed URL, as shown here:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: main
spec:
  replicas: 2
  version: v2.15.2
  externalUrl: http://hostname/prometheus/
  resources:
    requests:
      memory: 400Mi
For more detailed usage, see:
https://coreos.com/operators/prometheus/docs/latest/user-guides/exposing-prometheus-and-alertmanager.html
Usage example
https://github.com/coreos/kube-prometheus/blob/master/examples/example-app/