Building a Monitoring System with Grafana + InfluxDB + Prometheus

Monitoring system

  • Data visualization: Grafana

  • Data storage: InfluxDB/Prometheus

  • Data collection: Telegraf/NodeExporter

Grafana

The Grafana website offers many ready-made dashboards for displaying the state of operating systems, databases and applications.

I chose the following dashboards:

The system and database dashboards chosen here use InfluxDB as their data source, and InfluxDB usually receives its data from Telegraf.

The Java application dashboard uses Prometheus as its data source. Prometheus usually collects host metrics through NodeExporter; for Java applications, metrics can be collected with micrometer.

References:

Grafana installation:

https://grafana.com/docs/grafana/latest/installation/rpm/#install-manually-with-yum

Grafana basics, including creating data sources and dashboards:

https://grafana.com/tutorials/grafana-fundamentals/#1

InfluxDB

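InfluxDB concepts

| Concept  | Database | Table       | Record | Retention (how long, how many copies) | Indexed field  | Regular field | Timestamp |
|----------|----------|-------------|--------|---------------------------------------|----------------|---------------|-----------|
| InfluxDB | database | measurement | point  | retention policy                      | tag            | field         | timestamp |
| MySQL    | database | table       | row    | -                                     | indexed column | column        | -         |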
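InfluxDB 1.x line protocol is the text format used for writes. In it, these concepts always appear in the same order: the measurement, then comma-separated tags, a space, the fields, a space, and finally the timestamp. The Java sketch below assembles one point to show where each concept goes.

`LineProtocolSketch` and its methods are hypothetical names. The sketch handles only float fields. It sorts tags by key, which InfluxDB recommends for write performance but does not require.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: formats one InfluxDB 1.x line-protocol point,
//   <measurement>[,<tag_key>=<tag_value>...] <field_key>=<field_value>[,...] <timestamp>
class LineProtocolSketch {

    // Measurement names: escape commas and spaces.
    static String escapeMeasurement(String s) {
        return s.replace(",", "\\,").replace(" ", "\\ ");
    }

    // Tag keys, tag values and field keys: escape commas, equals signs and spaces.
    static String escapeKey(String s) {
        return s.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ");
    }

    // Only float fields are supported; integer, string and boolean fields
    // have their own syntax and are left out of this sketch.
    static String toLine(String measurement, Map<String, String> tags,
                         Map<String, Double> fields, long timestamp) {
        StringBuilder sb = new StringBuilder(escapeMeasurement(measurement));
        // Tags (indexed), sorted by key
        for (Map.Entry<String, String> t : new TreeMap<>(tags).entrySet()) {
            sb.append(',').append(escapeKey(t.getKey()))
              .append('=').append(escapeKey(t.getValue()));
        }
        // Fields (not indexed), separated from the tags by a space
        String sep = " ";
        for (Map.Entry<String, Double> f : fields.entrySet()) {
            sb.append(sep).append(escapeKey(f.getKey())).append('=').append(f.getValue());
            sep = ",";
        }
        // Timestamp; its unit must match the write precision (e.g. precision=s)
        return sb.append(' ').append(timestamp).toString();
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("location", "santa_monica");
        Map<String, Double> fields = new LinkedHashMap<>();
        fields.put("water_level", 2.064);
        // 1439856000 s = 2015-08-18T00:00:00Z
        System.out.println(toLine("h2o_feet", tags, fields, 1439856000L));
    }
}
```

Running `main` prints `h2o_feet,location=santa_monica water_level=2.064 1439856000`: measurement `h2o_feet`, tag `location`, field `water_level`, timestamp in seconds.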

References:

https://docs.influxdata.com/influxdb/v1.8/concepts/key_concepts/

Sample Data

  • Create the database (in the influx CLI)
    CREATE DATABASE NOAA_water_database
  • Download and write the data
    curl https://s3.amazonaws.com/noaa.water-database/NOAA_data.txt -o NOAA_data.txt
    influx -import -path=NOAA_data.txt -precision=s -database=NOAA_water_database
  • Test queries
    > SHOW measurements
    name: measurements
    ------------------
    name
    average_temperature
    h2o_feet
    h2o_pH
    h2o_quality
    h2o_temperature

    > SELECT COUNT("water_level") FROM h2o_feet
    name: h2o_feet
    --------------
    time                   count
    1970-01-01T00:00:00Z   15258

    > SELECT * FROM h2o_feet LIMIT 2
    name: h2o_feet
    --------------
    time                   level description      location       water_level
    2015-08-18T00:00:00Z   below 3 feet           santa_monica   2.064
    2015-08-18T00:00:00Z   between 6 and 9 feet   coyote_creek   8.12
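The same test queries can also be sent over the InfluxDB 1.x HTTP API. The endpoint is `GET /query`, with the database in the `db` parameter and the InfluxQL statement in `q`.

The Java 11+ sketch below assumes InfluxDB listens on the default port 8086 with authentication disabled. `InfluxQuerySketch` is a hypothetical name. The program only sends the request when started with a hypothetical `--send` argument; otherwise it just prints the URL.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for the InfluxDB 1.x /query endpoint (Java 11+).
class InfluxQuerySketch {

    // Builds GET <base>/query?db=<db>&q=<url-encoded InfluxQL>
    static URI queryUri(String baseUrl, String db, String influxQl) {
        return URI.create(baseUrl + "/query?db=" + URLEncoder.encode(db, StandardCharsets.UTF_8)
                + "&q=" + URLEncoder.encode(influxQl, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        URI uri = queryUri("http://localhost:8086", "NOAA_water_database",
                "SELECT * FROM h2o_feet LIMIT 2");
        System.out.println(uri);
        // Only contact the server when explicitly asked to
        if (args.length > 0 && args[0].equals("--send")) {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }
}
```

`URLEncoder` encodes spaces as `+`, which the server decodes back to spaces in query parameters.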

References:

https://docs.influxdata.com/influxdb/v1.8/query_language/sample-data/

Explore Schema

  • SHOW DATABASES

  • SHOW MEASUREMENTS

  • SHOW TAG KEYS

  • SHOW FIELD KEYS

References:

https://docs.influxdata.com/influxdb/v1.8/query_language/explore-schema/

Explore Data

  • The SELECT statement
    SELECT <field_key>[,<field_key>,<tag_key>] FROM <measurement_name>[,<measurement_name>]
  • The WHERE clause
    SELECT_clause FROM_clause WHERE <conditional_expression> [(AND|OR) <conditional_expression> [...]]
  • The GROUP BY clause
    SELECT_clause FROM_clause [WHERE_clause] GROUP BY [* | <tag_key>[,<tag_key>]]
  • ORDER BY time DESC (return the newest points first; by default results are in ascending time order)

  • The LIMIT and SLIMIT clauses (LIMIT <N> returns at most N points, SLIMIT <N> at most N series)
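These clauses are combined in a fixed order: SELECT, FROM, WHERE, GROUP BY, ORDER BY, LIMIT. The sketch below composes a query in that order; `InfluxQlBuilderSketch.build` is a hypothetical helper.

It double-quotes identifiers (measurement, field and tag names). String values inside a WHERE condition must instead be single-quoted, a common source of InfluxQL errors.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical InfluxQL builder; clause order follows the InfluxDB 1.x docs.
class InfluxQlBuilderSketch {

    // Identifiers (measurement, field and tag names) go in double quotes.
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\\\"") + "\"";
    }

    // where: raw condition or null; limit <= 0 means no LIMIT clause
    static String build(List<String> fields, String measurement, String where,
                        List<String> groupByTags, boolean newestFirst, int limit) {
        StringJoiner columns = new StringJoiner(",");
        for (String field : fields) {
            columns.add(quote(field));
        }
        StringBuilder sb = new StringBuilder("SELECT ").append(columns)
                .append(" FROM ").append(quote(measurement));
        if (where != null && !where.isEmpty()) {
            sb.append(" WHERE ").append(where);
        }
        if (!groupByTags.isEmpty()) {
            StringJoiner tags = new StringJoiner(",");
            for (String tag : groupByTags) {
                tags.add(quote(tag));
            }
            sb.append(" GROUP BY ").append(tags);
        }
        if (newestFirst) {
            sb.append(" ORDER BY time DESC");
        }
        if (limit > 0) {
            sb.append(" LIMIT ").append(limit);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Tag value in single quotes, identifiers in double quotes
        System.out.println(build(List.of("water_level"), "h2o_feet",
                "\"location\" = 'santa_monica'", List.of(), true, 3));
    }
}
```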

References:

https://docs.influxdata.com/influxdb/v1.8/query_language/explore-data/

Functions

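  • Aggregations (e.g. COUNT, MEAN, SUM): compute one value from many points

  • Selectors (e.g. MAX, MIN, FIRST, LAST): return one of the existing points

  • Transformations (e.g. DERIVATIVE, DIFFERENCE): compute a new value for each point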
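The three categories differ in the shape of their result. The plain-Java sketch below mimics one representative of each on a list of values:

  • `mean`, like the aggregation `MEAN()`
  • `max`, like the selector `MAX()`
  • `difference`, like the transformation `DIFFERENCE()`

The class name is hypothetical. Timestamps and grouping, which the real InfluxQL functions handle, are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of InfluxQL function categories on plain values.
class InfluxFunctionsSketch {

    // Aggregation: many values in, one computed value out (like MEAN()).
    static double mean(List<Double> values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Selector: returns one of the existing values (like MAX()); assumes a non-empty list.
    static double max(List<Double> values) {
        double best = values.get(0);
        for (double v : values) {
            best = Math.max(best, v);
        }
        return best;
    }

    // Transformation: a new series, one result per consecutive pair (like DIFFERENCE()).
    static List<Double> difference(List<Double> values) {
        List<Double> out = new ArrayList<>();
        for (int i = 1; i < values.size(); i++) {
            out.add(values.get(i) - values.get(i - 1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Double> levels = List.of(2.0, 4.0, 3.0);
        System.out.println(mean(levels) + " " + max(levels) + " " + difference(levels));
    }
}
```

For `[2.0, 4.0, 3.0]` this gives a mean of 3.0, a maximum of 4.0 and differences `[2.0, -1.0]`.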

References:

https://docs.influxdata.com/influxdb/v1.8/query_language/functions/

Telegraf

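Telegraf collects metrics and writes them to InfluxDB.

Telegraf can collect system and database metrics out of the box; it only needs a little configuration in /etc/telegraf/telegraf.conf.

When writing data, Telegraf adds a host tag to every point so you can tell which machine reported it. The host value can be set with the hostname option in the [agent] section of telegraf.conf. If that option is left empty, Telegraf uses the Linux hostname, so changing the hostname also works.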
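The host tagging works in two steps: resolve the host name, then add `host=<name>` to each point. The configured name wins; otherwise the machine's hostname is used.

Below is a hypothetical Java sketch of that idea, not Telegraf's code. It assumes a well-formed line-protocol point whose measurement name contains no escaped commas or spaces.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch of per-point host tagging; not Telegraf's implementation.
class HostTagSketch {

    // Configured name wins; otherwise fall back to the OS hostname.
    static String resolveHost(String configured) {
        if (configured != null && !configured.isEmpty()) {
            return configured;
        }
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "localhost";
        }
    }

    // Inserts ",host=<name>" right after the measurement name.
    // Assumes the measurement name has no escaped commas or spaces.
    static String addHostTag(String line, String host) {
        int comma = line.indexOf(',');
        int space = line.indexOf(' ');
        // The measurement ends at the first comma (if tags follow) or at the first space
        int end = (comma >= 0 && comma < space) ? comma : space;
        return line.substring(0, end) + ",host=" + host + line.substring(end);
    }

    public static void main(String[] args) {
        String host = resolveHost(""); // empty, so the OS hostname is used
        System.out.println(addHostTag("mem used_percent=42.5 1600000000", host));
    }
}
```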

### OUTPUT

# Configuration for influxdb server to send metrics to
[[outputs.influxdb]]
 ## InfluxDB HTTP endpoint; 8086 is the default HTTP port (8089 is InfluxDB's default UDP port)
 urls = ["http://localhost:8086"]
 database = "telegraf_metrics"

 ## Retention policy to write to. Empty string writes to the default rp.
 retention_policy = ""
 ## Write consistency (clusters only), can be: "any", "one", "quorum", "all"
 write_consistency = "any"

 ## Write timeout (for the InfluxDB client), formatted as a string.
 ## If not provided, will default to 5s. 0s means no timeout (not recommended).
 timeout = "5s"

# Read metrics about cpu usage
[[inputs.cpu]]
 ## Whether to report per-cpu stats or not
 percpu = true
 ## Whether to report total system cpu stats or not
 totalcpu = true
 ## Comment this line if you want the raw CPU time metrics
 fielddrop = ["time_*"]

# Read metrics about disk usage by mount point
[[inputs.disk]]
 ## By default, telegraf gathers stats for all mountpoints.
 ## Setting mountpoints will restrict the stats to the specified mountpoints.
 # mount_points = ["/"]

 ## Ignore some mountpoints by filesystem type. For example (dev)tmpfs (usually
 ## present on /run, /var/run, /dev/shm or /dev).
 ignore_fs = ["tmpfs", "devtmpfs"]

# Read metrics about disk IO by device
[[inputs.diskio]]
 ## By default, telegraf will gather stats for all devices including
 ## disk partitions.
 ## Setting devices will restrict the stats to the specified devices.
 # devices = ["sda", "sdb"]
 ## Uncomment the following line if you need disk serial numbers.
 # skip_serial_number = false

# Get kernel statistics from /proc/stat
[[inputs.kernel]]
 # no configuration

# Read metrics about memory usage
[[inputs.mem]]
 # no configuration

# Get the number of processes and group them by status
[[inputs.processes]]
 # no configuration

# Read metrics about swap memory usage
[[inputs.swap]]
 # no configuration

# Read metrics about system load & uptime
[[inputs.system]]
 # no configuration

# Read metrics about network interface usage
[[inputs.net]]
 # collect data only about specific interfaces
 # interfaces = ["eth0"]

# Read TCP/UDP connection statistics
[[inputs.netstat]]
 # no configuration

# Read metrics from MySQL servers
[[inputs.mysql]]
 ## DSN format: user:password@tcp(host:port)/
 servers = ["root:root@tcp(127.0.0.1:3306)/"]

Prometheus

Architecture

(Figure: Prometheus architecture diagram)

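Concepts

| Concept    | Database | Table       | Record      | Retention (how long, how many copies) | Indexed field  | Regular field | Timestamp |
|------------|----------|-------------|-------------|---------------------------------------|----------------|---------------|-----------|
| Prometheus | -        | metric      | time series | -                                     | label          | value         | timestamp |
| InfluxDB   | database | measurement | point       | retention policy                      | tag            | field         | timestamp |
| MySQL      | database | table       | row         | -                                     | indexed column | column        | -         |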

Differences between Prometheus and InfluxDB:

  • A Prometheus record consists of several labels plus one value, and Prometheus metrics have types: Counter, Gauge, Histogram and Summary. InfluxDB measurements make no such type distinction.

  • Prometheus pulls (scrapes) data from its targets, whereas data is pushed into InfluxDB (for example by Telegraf).

  • A Prometheus record usually carries only one value. Take CPU metrics as an example. An InfluxDB measurement has 3 fields [usage_idle, usage_system, usage_user] and stores them as 1 record [97, 2, 1]. A Prometheus metric instead has 1 label [mode] and 3 records ['idle', 97], ['system', 2], ['user', 1].
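The third difference can be sketched as a conversion: each InfluxDB field becomes one Prometheus sample, and the field name turns into a label value.

The metric name `cpu_usage` and the helper class are hypothetical. The real Node Exporter reports CPU as cumulative seconds in `node_cpu_seconds_total` with a `mode` label, not as percentages.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: one InfluxDB point with several fields
// becomes several Prometheus samples that differ only in one label.
class FieldsToLabelsSketch {

    static List<String> toSamples(String metric, String labelName, String fieldPrefix,
                                  Map<String, Double> fields) {
        List<String> samples = new ArrayList<>();
        for (Map.Entry<String, Double> field : fields.entrySet()) {
            String key = field.getKey();
            // "usage_idle" -> label value "idle"
            String labelValue = key.startsWith(fieldPrefix)
                    ? key.substring(fieldPrefix.length()) : key;
            // Prometheus text format: metric{label="value"} sample_value
            samples.add(metric + "{" + labelName + "=\"" + labelValue + "\"} " + field.getValue());
        }
        return samples;
    }

    public static void main(String[] args) {
        // One InfluxDB record: cpu usage_idle=97,usage_system=2,usage_user=1
        Map<String, Double> fields = new LinkedHashMap<>();
        fields.put("usage_idle", 97.0);
        fields.put("usage_system", 2.0);
        fields.put("usage_user", 1.0);
        toSamples("cpu_usage", "mode", "usage_", fields).forEach(System.out::println);
    }
}
```

The one InfluxDB record becomes three lines, `cpu_usage{mode="idle"} 97.0`, `cpu_usage{mode="system"} 2.0` and `cpu_usage{mode="user"} 1.0`, matching the three records listed above.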

References:

https://prometheus.io/docs/concepts/metric_types/

Querying data

Prometheus has a built-in web UI for querying data, by default at http://your_host:9090.

The instances (scrape targets) to pull data from are added in ${Prometheus_home}/prometheus.yml. The built-in up metric shows the state of every instance: 1 if its last scrape succeeded, 0 if it failed.
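Besides the web UI, the same query can be issued through the Prometheus HTTP API with `GET /api/v1/query?query=<PromQL>`, which returns JSON.

The Java 11+ sketch below assumes Prometheus is on the default port 9090. `PromQuerySketch` is a hypothetical name, and the request is only sent when the program is started with a hypothetical `--send` argument.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for the Prometheus instant-query HTTP API (Java 11+).
class PromQuerySketch {

    // Builds GET <base>/api/v1/query?query=<url-encoded PromQL>
    static URI instantQueryUri(String baseUrl, String promQl) {
        return URI.create(baseUrl + "/api/v1/query?query="
                + URLEncoder.encode(promQl, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        // "up == 0" keeps only the instances whose last scrape failed
        URI uri = instantQueryUri("http://localhost:9090", "up == 0");
        System.out.println(uri);
        if (args.length > 0 && args[0].equals("--send")) {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }
}
```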

References:

https://prometheus.io/docs/prometheus/latest/querying/examples/

Micrometer

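micrometer collects metrics from Java applications and can report them to most mainstream monitoring systems, such as Prometheus and InfluxDB. It is somewhat like SLF4J, which sits in front of many logging systems, except that micrometer targets application metrics.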
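The SLF4J analogy can be shown with a toy facade. Application code only increments a counter, and a pluggable backend decides how the value is exported.

This is NOT micrometer's API. `MeterFacadeSketch`, `Backend` and `Counter` are hypothetical names, and the two output formats are simplified versions of the Prometheus text format and InfluxDB line protocol.

```java
// Toy facade in the spirit of micrometer; hypothetical, not micrometer's API.
class MeterFacadeSketch {

    // A backend turns one measurement into its own wire format.
    interface Backend {
        String format(String name, String tagKey, String tagValue, double value);
    }

    // Simplified Prometheus text format: name{key="value"} number
    static final Backend PROMETHEUS =
            (name, key, tag, value) -> name + "{" + key + "=\"" + tag + "\"} " + value;

    // Simplified InfluxDB line protocol without timestamp: name,key=value value=number
    static final Backend INFLUX =
            (name, key, tag, value) -> name + "," + key + "=" + tag + " value=" + value;

    // Application code depends only on this class, not on any backend.
    static class Counter {
        private final String name;
        private final String tagKey;
        private final String tagValue;
        private double count;

        Counter(String name, String tagKey, String tagValue) {
            this.name = name;
            this.tagKey = tagKey;
            this.tagValue = tagValue;
        }

        void increment() {
            count += 1;
        }

        String export(Backend backend) {
            return backend.format(name, tagKey, tagValue, count);
        }
    }

    public static void main(String[] args) {
        Counter requests = new Counter("http_requests", "uri", "/home");
        requests.increment();
        requests.increment();
        System.out.println(requests.export(PROMETHEUS)); // http_requests{uri="/home"} 2.0
        System.out.println(requests.export(INFLUX));     // http_requests,uri=/home value=2.0
    }
}
```

Switching monitoring systems then means choosing a different backend, while the `increment()` calls in application code stay unchanged.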

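Providing metrics to Prometheus from a Spring application. The controller serves them at /prometheus. Prometheus scrapes /metrics by default, so the matching job in prometheus.yml needs metrics_path: /prometheus.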

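// Package names for micrometer before 1.13; 1.13+ moved the registry to io.micrometer.prometheusmetrics
import io.micrometer.core.instrument.binder.jvm.ClassLoaderMetrics;
import io.micrometer.core.instrument.binder.jvm.JvmGcMetrics;
import io.micrometer.core.instrument.binder.jvm.JvmMemoryMetrics;
import io.micrometer.core.instrument.binder.jvm.JvmThreadMetrics;
import io.micrometer.core.instrument.binder.system.ProcessorMetrics;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;
import lombok.Getter;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
// On Spring Boot 3 / Jakarta EE, use jakarta.* instead of javax.*
import javax.annotation.PostConstruct;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;

@Controller
@RequestMapping(value = "/prometheus")
public class PrometheusController {

    @Getter
    private PrometheusMeterRegistry registry;

    @PostConstruct
    private void init() {
        // Use the default Prometheus registry settings
        this.registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
        // Tag every metric with the application name so dashboards can filter by it
        this.registry.config().commonTags("application", "myAppName");
        // Built-in binders: class loading, JVM memory, GC, CPU and threads
        new ClassLoaderMetrics().bindTo(this.registry);
        new JvmMemoryMetrics().bindTo(this.registry);
        new JvmGcMetrics().bindTo(this.registry);
        new ProcessorMetrics().bindTo(this.registry);
        new JvmThreadMetrics().bindTo(this.registry);
    }

    // Prometheus scrapes this endpoint (with GET) and reads the text exposition format
    @RequestMapping(method = { RequestMethod.GET, RequestMethod.POST })
    public void index(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.setContentType("text/plain; version=0.0.4; charset=utf-8");
        resp.getWriter().write(registry.scrape());
        resp.getWriter().flush();
    }
}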

References:

https://micrometer.io/docs

© Copyright belongs to the author. For reprints or content partnerships, please contact the author.
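Platform statement: the content of this article (including any images or videos) was uploaded and published by the author and represents only the author's own views. Jianshu is an information publishing platform and provides only information storage services.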