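I'm a brick: I get moved wherever I'm needed! As the number of projects grew, we kept hitting the same problem: when a service failed, the customer noticed first, and the report then travelled layer by layer back to the developers. That is a poor user experience, and sometimes a service had been down for so long before anyone noticed that the logs had already been purged automatically, making it impossible to track down the bug. Because of this, our team set up ELK. But ELK alone is not enough; notifying the right people promptly once a failure occurs matters just as much.
Following the principles of as little development work as possible, as many features as possible, and as low a memory footprint as possible, I compared several log-based alerting systems found online. They fall roughly into the following: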
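1. CAT: the alerting system open-sourced by Dianping. Powerful, but relatively heavyweight. Does not meet our needs!
2. Kafka + Spark Streaming: everything has to be developed in-house. Does not meet our needs!
3. Sentinl: a Kibana plugin with a friendly web UI that is very convenient to manage, but it only supports sending email.
Installation is very simple:
First download the Sentinl package matching your Kibana version from https://github.com/sirensolutions/sentinl/releases/tag, then install it: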
./kibana-plugin install file:./sentinl-v6.0.1.zip
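Then restart Kibana, and Sentinl appears in the Kibana UI, as shown in the figure below:
Sentinl is very simple to install and use, but it only supports email, and the email body cannot include the content returned by the ES query. Does not meet our needs!
4. ElastAlert: no development work; alerts can go out by email, DingTalk, WeChat, custom alerters, and more; the content returned by the ES query can be used flexibly. Meets our needs!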
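1) Installation
First download the source package. Posts online say that master does not support ES 5 and that you need to switch to an es5 branch, but I could not find such a branch, so I used the es6 branch here; the ES version used in this article is 5.4.0. The ElastAlert version used here only supports Python 2. Upload the downloaded package to the server and unpack it.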
cd elastalert
pip install -r requirements.txt
python setup.py install
cp config.yaml.example config.yaml
Edit config.yaml:
# This is the folder that contains the rule yaml files
# Any .yaml file will be loaded as a rule
#rules_folder: example_rules
# Rule directory; the rules folder can hold multiple rules
rules_folder: rules
# How often ElastAlert will query Elasticsearch
# The unit can be anything from weeks to seconds
run_every:
  #minutes: 1
  # Query ES every 3 seconds
  seconds: 3
# ElastAlert will buffer results from the most recent
# period of time, in case some log sources are not in real time
buffer_time:
  # Logs reach ES with some delay; this is the time range each query covers
  minutes: 15
# The Elasticsearch hostname for metadata writeback
# Note that every rule can have its own Elasticsearch host
es_host: 200.200.200.65
# The Elasticsearch port
es_port: 9200
# Connect with TLS to Elasticsearch
#use_ssl: True
# Option basic-auth username and password for Elasticsearch
#es_username: someusername
#es_password: somepassword
# The index on es_host which is used for metadata storage
# This can be a unmapped index, but it is recommended that you run
# elastalert-create-index to set a mapping
writeback_index: elastalert_status
#writeback_index: logstash-2018.06.25
# If an alert fails for some reason, ElastAlert will retry
# sending the alert until this time period has elapsed
alert_time_limit:
  # minutes: 2
  days: 2
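Explanation of the fields above:
rules_folder: the folder from which rule files are loaded; the default is example_rules
run_every: how often ElastAlert sends a query to Elasticsearch
buffer_time: the time range covered by each query; reportedly 45 minutes by default, set to 15 minutes in the config above
es_host: the address of the Elasticsearch host
es_port: the corresponding Elasticsearch port
use_ssl: optional; whether to connect to ES over SSL, true or false
verify_certs: optional; whether to verify TLS certificates, true or false; the default is true
es_username: the username for ES authentication
es_password: the password for ES authentication
es_url_prefix: optional; the URL prefix of the Elasticsearch endpoint (a path prefix, not the http or https scheme)
es_send_get_body_as: optional; the method used to query ES; the default is GET
writeback_index: the index in Elasticsearch where ElastAlert stores its own status and logs
alert_time_limit: the time limit for retrying failed alerts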
After editing, run elastalert-create-index. It creates the elastalert_status index in ES, which stores the result of every run of each rule.
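To make run_every and buffer_time more concrete, here is a minimal Python sketch of how such duration mappings can be turned into a polling interval and a query window. It is an illustration under assumptions, not ElastAlert's actual code: the helper name query_window is hypothetical, and the timestamp is an arbitrary example.

```python
from datetime import datetime, timedelta

# Values copied from the config.yaml above.
config = {
    "run_every": {"seconds": 3},
    "buffer_time": {"minutes": 15},
}


def query_window(now, buffer_time):
    """Return the (start, end) range one query would cover.

    Hypothetical helper: the window stretches back buffer_time from now.
    """
    span = timedelta(**buffer_time)  # {"minutes": 15} -> timedelta(minutes=15)
    return now - span, now


run_every = timedelta(**config["run_every"])
start, end = query_window(datetime(2018, 6, 25, 12, 0, 0), config["buffer_time"])
print("query every", run_every.total_seconds(), "seconds")
print("window:", start.isoformat(), "to", end.isoformat())
```

With a 3-second interval and a 15-minute window, consecutive queries overlap heavily, which is what allows logs that reach ES late to still be picked up.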
2) Configuring alert rules
ElastAlert supports 11 rule types. This article covers frequency; the other types will be added later if I end up using them.
First make a copy of the default example:
cp example_rules/example_frequency.yaml rules/test_frequency.yaml
Pick any index on the existing ES for testing, as shown in the figure below:

The goal is to send an email whenever _type is syslog. Edit test_frequency.yaml accordingly:
# Alert when the rate of events exceeds a threshold
# (Optional)
# Elasticsearch host
es_host: 200.200.200.65
# (Optional)
# Elasticsearch port
es_port: 9200
# (Optional) Connect with SSL to Elasticsearch
#use_ssl: True
# (Optional) basic-auth username and password for Elasticsearch
#es_username: someusername
#es_password: somepassword
# (Required)
# Rule name, must be unique
name: "The servers are down and you are still asleep"
# (Required)
# Type of alert.
# the frequency rule type alerts when num_events events occur with timeframe time
type: frequency
#use_strftime_index: true
# (Required)
# Index to search, wildcard supported
#index: logstash-*
index: logstash-*
# (Required, frequency specific)
# Alert when this many documents matching the query occur within a timeframe
# Trigger when N matching events occur within the configured time range
num_events: 1
# (Required, frequency specific)
# num_events must occur within this amount of time to trigger an alert
timeframe:
  hours: 3
# (Required)
# A list of Elasticsearch filters used for find events
# These filters are joined with AND and nested in a filtered query
# For more info: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html
# Keep only documents whose _type is syslog
filter:
- term:
    _type: "syslog"
#    some_field: "some_value"
#- query_string:
#    query: "_type: syslog"
# (Required)
# The alert is use when a match is found
# Set the alert method to email
alert:
- "email"
# Alert email subject and the arguments that fill its placeholders, matched in order
alert_subject: "Error {} @{}"
alert_subject_args:
- name
- "@timestamp"
# Send only the contents of alert_text
alert_text_type: alert_text_only
# Email body
alert_text: |
  > "Hello, this is the handsome Xiaoxiao"
  > Name: {}
  > Message: {}
  > Host: {} ({})
alert_text_args:
- name
- message
- port
- host
smtp_host: smtp.163.com
smtp_port: 25
# SMTP auth file; it needs two properties, user and password
# (smtp_auth_file.yaml, whose contents are shown after this rule)
smtp_auth_file: /home/elk/test-zx/smtp_auth_file.yaml
email_reply_to: xxx@163.com
from_addr: xxx@163.com
# (required, email specific)
# a list of email addresses to send alerts to
email:
- "xxx@163.com"
/home/elk/test-zx/smtp_auth_file.yaml holds the SMTP account and password for the mailbox:
user: "xxx@163.com"
password: "xxx"
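The frequency rule's behaviour, as described by the comments in the rule file (alert when num_events matching documents occur within timeframe), can be sketched as a sliding window over event timestamps. This is a simplified illustration, not ElastAlert's implementation: the class name FrequencyWindow and its add method are hypothetical, timestamps are assumed to arrive in order, and the window is not reset after firing.

```python
from collections import deque
from datetime import datetime, timedelta


class FrequencyWindow:
    """Hypothetical sliding-window counter mimicking a frequency rule."""

    def __init__(self, num_events, timeframe):
        self.num_events = num_events
        self.timeframe = timeframe
        self.events = deque()  # timestamps still inside the window

    def add(self, timestamp):
        """Record one matching event; return True if the rule would fire."""
        self.events.append(timestamp)
        # Drop events older than `timeframe` relative to the newest one.
        while timestamp - self.events[0] > self.timeframe:
            self.events.popleft()
        return len(self.events) >= self.num_events


# num_events: 1 and timeframe: hours: 3, as in test_frequency.yaml.
rule = FrequencyWindow(num_events=1, timeframe=timedelta(hours=3))
fired = rule.add(datetime(2018, 6, 25, 12, 0, 0))

# A stricter variant for comparison: three events within three hours.
strict = FrequencyWindow(num_events=3, timeframe=timedelta(hours=3))
base = datetime(2018, 6, 25, 12, 0, 0)
results = [strict.add(base + timedelta(hours=h)) for h in (0, 1, 5, 6, 7)]
print(fired, results)
```

With num_events set to 1, every matching document fires immediately, which suits the "any syslog entry sends an email" test in this article. The stricter variant only fires once three events fall inside the same three-hour span.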
The rule above can be tested in either of the following two ways:
elastalert-test-rule --config config.yaml rules/test_frequency.yaml
python -m elastalert.elastalert --debug --config config.yaml --rule rules/test_frequency.yaml
Content of the triggered alert:

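The way alert_text and alert_text_args combine in the rule file can be sketched in a few lines of Python: each {} placeholder is filled, in order, with the field named at the same position in the argument list. This is only an illustration under assumptions. The render helper, the sample document, and the placeholder used for missing fields are all hypothetical, not ElastAlert's code.

```python
# Template and argument names mirror alert_text / alert_text_args in the rule.
alert_text = "Name: {}\nMessage: {}\nHost: {} ({})"
alert_text_args = ["name", "message", "port", "host"]

# Hypothetical matched document; the field values are made up for illustration.
match = {
    "name": "demo-rule",
    "message": "disk failure on /dev/sda",
    "port": 514,
    "host": "web-01",
}


def render(template, arg_names, document, missing="<MISSING VALUE>"):
    """Fill the {} placeholders in order with fields looked up in document."""
    values = [document.get(name, missing) for name in arg_names]
    return template.format(*values)


body = render(alert_text, alert_text_args, match)
print(body)
```

Because the placeholders are positional, the order of alert_text_args matters: with the list used in the rule, the Host line renders the port first and the host in parentheses.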
For a real run, simply use:
python -m elastalert.elastalert --config config.yaml
Only one alert method is covered here; ElastAlert also supports command, DingTalk, WeChat, and custom extensions.