前言
監(jiān)控集成選型的 Telegraf 探針,最近需要實(shí)現(xiàn)對 Oracle 數(shù)據(jù)庫的做實(shí)時監(jiān)控,查了下 Telegraf 竟然還不支持 Oracle 監(jiān)控,WTF?于是自己研究了下,通過 Python + SQL 腳本折中解決了,此文去且當(dāng)作小結(jié)。
實(shí)現(xiàn)的效果
預(yù)備知識
Oracle動態(tài)性能視圖
動態(tài)性能視圖屬于數(shù)據(jù)字典,它們的所有者為SYS,并且多數(shù)動態(tài)性能視圖只能由特權(quán)用戶和DBA用戶查詢。當(dāng)數(shù)據(jù)庫處于不同狀態(tài)時,可以訪問的動態(tài)性能視圖有所不同。啟動例程時,ORACLE會自動建立動態(tài)性能視圖;停止例程時,ORACLE會自動刪除動態(tài)性能視圖。數(shù)據(jù)字典信息是從數(shù)據(jù)文件中獲得,而動態(tài)性能視圖信息是從SGA和控制文件取得。所以,兩者所反映的信息還是有很大差異的。數(shù)據(jù)庫管理員利用這些動態(tài)性能視圖,可以了解數(shù)據(jù)庫運(yùn)行的一些基本信息,為我們進(jìn)行數(shù)據(jù)庫維護(hù)以及數(shù)據(jù)庫性能優(yōu)化提供一些數(shù)據(jù)上的支持。所有動態(tài)性能視圖都是以V_$開始的,Oracle 為每個動態(tài)性能視圖提供了相應(yīng)的同義詞(V$開頭)
通過查詢V$FIXED_TABLE,可以列出所有可用的動態(tài)性能視圖和動態(tài)性能表。
SQL> select? * from V$FIXED_TABLE? where name like 'V$%';
NAME? ? ? ? ? ? ? ? ? ? ? ? ? ? OBJECT_ID TYPE? TABLE_NUM
------------------------------ ---------- ----- ----------
V$WAITSTAT? ? ? ? ? ? ? ? ? ? 4294950915 VIEW? ? ? 65537
V$BH? ? ? ? ? ? ? ? ? ? ? ? ? 4294951406 VIEW? ? ? 65537
V$GC_ELEMENT? ? ? ? ? ? ? ? ? 4294951794 VIEW? ? ? 65537
V$CR_BLOCK_SERVER? ? ? ? ? ? ? 4294951796 VIEW? ? ? 65537
V$CURRENT_BLOCK_SERVER? ? ? ? 4294952095 VIEW? ? ? 65537
V$POLICY_HISTORY? ? ? ? ? ? ? 4294953128 VIEW? ? ? 65537
V$ENCRYPTED_TABLESPACES? ? ? ? 4294952996 VIEW? ? ? 65537
V$GC_ELEMENTS_WITH_COLLISIONS? 4294951798 VIEW? ? ? 65537
V$FILE_CACHE_TRANSFER? ? ? ? ? 4294951800 VIEW? ? ? 65537
V$TEMP_CACHE_TRANSFER? ? ? ? ? 4294951802 VIEW? ? ? 65537
V$CLASS_CACHE_TRANSFER? ? ? ? 4294951804 VIEW? ? ? 65537
V$INSTANCE_CACHE_TRANSFER? ? ? 4294952151 VIEW? ? ? 65537
V$LOCK_ELEMENT? ? ? ? ? ? ? ? 4294951408 VIEW? ? ? 65537
V$BSP? ? ? ? ? ? ? ? ? ? ? ? ? 4294951594 VIEW? ? ? 65537
V$LOCKS_WITH_COLLISIONS? ? ? ? 4294951410 VIEW? ? ? 65537
V$FILE_PING? ? ? ? ? ? ? ? ? ? 4294951412 VIEW? ? ? 65537
V$TEMP_PING? ? ? ? ? ? ? ? ? ? 4294951532 VIEW? ? ? 65537
V$CLASS_PING? ? ? ? ? ? ? ? ? 4294951414 VIEW? ? ? 65537
V$LOCK_ACTIVITY? ? ? ? ? ? ? ? 4294951437 VIEW? ? ? 65537
V$ROWCACHE? ? ? ? ? ? ? ? ? ? 4294950916 VIEW? ? ? 65537
以下是不同類型的指標(biāo)視圖的快速表格比較:
該表的第一行是經(jīng)典的等待事件和統(tǒng)計視圖。以下幾行是度量標(biāo)準(zhǔn)視圖。度量標(biāo)準(zhǔn)視圖是在Oracle10g中引入的。
度量視圖計算增量和速率,這極大地簡化了解決簡單問題的能力,比如 “現(xiàn)在我的數(shù)據(jù)庫的I/O速率是多少?” 這個問題,在10g之前,處理起來出奇的乏味。要回答這個問題,你必須查詢v$sysstat,例如:
Select value from v$sysstat where name='physical reads';
但是僅查詢一次v$sysstat不能解決問題,而是“自數(shù)據(jù)庫啟動以來已完成了多少I / O?”的問題。要回答原始問題,必須兩次查詢v$sysstat并接受兩個值之間的增量:
在時間A取值
在時間B取值
Delta = (B-A)
and/or get Rate = (B-A)/elapsed time
獲得這些差值和速率可能是一項(xiàng)艱巨的工作。然后10gOracle引入了度量標(biāo)準(zhǔn)表,這些度量表可以在一個查詢中解決問題。
等待事件視圖為(系統(tǒng)級別)
V$SYSTEM_EVENT– 自啟動以來累積的等待事件
V$EVENTMETRIC- 等待事件增量持續(xù)60秒
DBA_HIST_SYSTEM_EVENT– 自啟動以來累計的上周按快照(小時)的等待事件
等待事件匯總到稱為等待類的組中。對于等待類,有以下視圖:
V$SYSTEM_WAIT_CLASS– 自啟動以來累積
V$WAITCLASSMETRIC– 持續(xù)60秒增量
V$WAITCLASSMETRIC_HISTORY– 最后一小時的60秒增量
注意:DBA_HIST_WAITCLASSMETRIC_HISTORY用于警報或基準(zhǔn),而不是日常值。
其他的就不一一展開了,具體可以參考下文:
http://datavirtualizer.com/wait-event-and-wait-class-metrics-vs-vsystem_event/
cx_Oracle
cx_Oracle 是一個 Python 擴(kuò)展模塊,可以訪問 Oracle 數(shù)據(jù)庫。它符合 Python 數(shù)據(jù)庫API 2.0 規(guī)范。
基本要求
要在 Python 和 Oracle 數(shù)據(jù)庫中使用cx_Oracle7,需要滿足以下條件:
Python 2.7或 3.5 及更高版本。
Oracle 客戶端庫。
Oracle 數(shù)據(jù)庫。Oracle的標(biāo)準(zhǔn)客戶端 - 服務(wù)器版本互操作性允許cx_Oracle連接到較舊和較新的數(shù)據(jù)庫。(推薦)
快速安裝
在 Linux 上安裝 cx_Oracle 的一般方法是使用 Python 的 Pip 包從 PyPI 安裝cx_Oracle:
從 PyPI 安裝 cx_Oracle:
python -m pip install cx_Oracle --upgrade
將 Oracle 客戶端庫添加到操作系統(tǒng)庫搜索路徑,例如 Linux 的 LDLIBRARYPATH
如果你的數(shù)據(jù)庫位于遠(yuǎn)程計算機(jī)上,請下 適用于你的操作系統(tǒng)體系結(jié)構(gòu)的免費(fèi)Oracle Instant Client “Basic” 或 “Basic Light” 包
至于具體的OracleClient安裝,可以參考下文:
https://cx-oracle.readthedocs.io/en/latest/user_guide/installation.html#installing-cx-oracle-on-linux
解決方案
Python:收集 Oracle 指標(biāo)數(shù)據(jù)
Telegraf:收集 Python 打印的性能指標(biāo)數(shù)據(jù)
InfluxDB:存儲時間序列 Oracle 性能指標(biāo)數(shù)據(jù)
Grafana:可視化 Dashboard
安裝
具體的安裝可以參考官方文檔:
Telegraf:https://docs.influxdata.com/telegraf/v1.12/introduction/installation/
InfluxDB:https://docs.influxdata.com/influxdb/v1.7/introduction/installation/
Grafana:https://grafana.com/docs/installation/rpm/
具體設(shè)置
在 InfluxDB 中創(chuàng)建一個 Telegraf 數(shù)據(jù)庫:
[root@zuozewei ~]# influx
Connected to http://localhost:8086 version 1.6.2
InfluxDB shell version: 1.6.2
> create user "telegraf" with password 'telegraf'
> create database telegraf
> show databases
name: databases
name
----
_internal
telegraf
編寫 python+sql 腳本以收集 oracle 指標(biāo)。腳本的輸出內(nèi)容很重要,必須是 InfluxDB line-protocol。該腳本查詢v$ SYSMETRIC和v$eventmetric,獲得最后一分鐘時,等待類和等待事件指標(biāo)。
python代碼是:
import socket,argparse,subprocess,re,cx_Oracle
fqdn = socket.getfqdn()
class OraStats():
? ? def __init__(self, user, passwd, sid):
? ? ? ? self.user = user
? ? ? ? self.passwd = passwd
? ? ? ? self.sid = sid
? ? ? ? self.delengine = "none"
? ? ? ? connstr=self.user+'/'+self.passwd+'@'+self.sid
? ? ? ? self.connection = cx_Oracle.connect(connstr)
? ? ? ? cursor = self.connection.cursor()
? ? ? ? cursor.execute("select distinct(SVRNAME)? from v$dnfs_servers")
? ? ? ? rows = cursor.fetchall()
? ? ? ? for i in range(0, cursor.rowcount):
? ? ? ? ? ? self.dengine_ip = rows[i][0]
? ? ? ? ? ? proc = subprocess.Popen(["nslookup", self.dengine_ip], stdout=subprocess.PIPE)
? ? ? ? ? ? lookupresult = proc.communicate()[0].split('\n')
? ? ? ? ? ? for line in lookupresult:
? ? ? ? ? ? ? ? if 'name=' in re.sub(r'\s', '', line):
? ? ? ? ? ? ? ? ? ? self.delengine = re.sub('\..*$', '', re.sub(r'^.*name=', '', re.sub(r'\s', '', re.sub(r'.$', '', line))))
? ? # 等待類別
? ? def waitclassstats(self, user, passwd, sid, format):
? ? ? ? cursor = self.connection.cursor()
? ? ? ? cursor.execute("""
? ? ? ? select n.wait_class, round(m.time_waited/m.INTSIZE_CSEC,3) AAS
? ? ? ? from? v$waitclassmetric? m, v$system_wait_class n
? ? ? ? where m.wait_class_id=n.wait_class_id and n.wait_class != 'Idle'
? ? ? ? union
? ? ? ? select? 'CPU', round(value/100,3) AAS
? ? ? ? from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2
? ? ? ? union select 'CPU_OS', round((prcnt.busy*parameter.cpu_count)/100,3) - aas.cpu
? ? ? ? from
? ? ? ? ( select value busy
? ? ? ? from v$sysmetric
? ? ? ? where metric_name='Host CPU Utilization (%)'
? ? ? ? and group_id=2 ) prcnt,
? ? ? ? ( select value cpu_count from v$parameter where name='cpu_count' )? parameter,
? ? ? ? ( select? 'CPU', round(value/100,3) cpu from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2) aas
? ? ? ? """)
? ? ? ? for wait in cursor:
? ? ? ? ? ? wait_name = wait[0]
? ? ? ? ? ? wait_value = wait[1]
? ? ? ? ? ? print ("oracle_wait_class,fqdn={0},delphix={1},db={2},wait_class={3} wait_value={4}".format(fqdn, self.delengine, sid, re.sub(' ', '_', wait_name), wait_value))
? ? # 系統(tǒng)指標(biāo)
? ? def sysmetrics(self, user, passwd, sid, format):
? ? ? ? cursor = self.connection.cursor()
? ? ? ? cursor.execute("""
? ? ? ? select METRIC_NAME,VALUE,METRIC_UNIT from v$sysmetric where group_id=2
? ? ? ? """)
? ? ? ? for metric in cursor:
? ? ? ? ? ? metric_name = metric[0]
? ? ? ? ? ? metric_value = metric[1]
? ? ? ? ? ? print ("oracle_sysmetric,fqdn={0},delphix={1},db={2},metric_name={3} metric_value={4}".format(fqdn,self.delengine,sid,re.sub(' ', '_', metric_name),metric_value))
? ? # 在閃回恢復(fù)區(qū)中有關(guān)磁盤配額和當(dāng)前磁盤使用情況
? ? def fraused(self, user, passwd, sid, format):
? ? ? ? cursor = self.connection.cursor()
? ? ? ? cursor.execute("""
? ? ? ? select round((SPACE_USED-SPACE_RECLAIMABLE)*100/SPACE_LIMIT,1) from? V$RECOVERY_FILE_DEST
? ? ? ? """)
? ? ? ? for frau in cursor:
? ? ? ? ? ? fra_used = frau[0]
? ? ? ? ? ? print ("oracle_fra_pctused,fqdn={0},delphix={1},db={2} fra_pctused={3}".format(fqdn,self.delengine,sid,fra_used))
? ? # 磁盤使用狀態(tài)
? ? def fsused(self):
? ? fss = ['/oracle', '/data']
? ? for fs in fss:
? ? ? ? ? ? df = subprocess.Popen(["df","-P",fs], stdout=subprocess.PIPE)
? ? ? ? ? ? output = df.communicate()[0]
? ? ? ? ? ? total = re.sub('%','',output.split("\n")[1].split()[1])
? ? ? ? ? ? used = re.sub('%','',output.split("\n")[1].split()[2])
? ? ? ? ? ? pctused = re.sub('%','',output.split("\n")[1].split()[4])
? ? ? ? ? ? print("oracle_fs_pctused,fqdn={0},fs_name={1} oraclefs_pctused={2},oraclefs_alloc={3},oraclefs_used={4}".format(fqdn,fs,pctused,total,used))
? ? # 等待狀態(tài)
? ? def waitstats(self, user, passwd, sid, format):
? ? ? ? cursor = self.connection.cursor()
? ? ? ? cursor.execute("""
? ? ? ? select /*+ ordered use_hash(n) */
? ? ? ? n.wait_class wait_class,
? ? ? ? n.name wait_name,
? ? ? ? m.wait_count? cnt,
? ? ? ? nvl(round(10*m.time_waited/nullif(m.wait_count,0),3) ,0) avg_ms
? ? ? ? from v$eventmetric m,
? ? ? ? v$event_name n
? ? ? ? where m.event_id=n.event_id
? ? ? ? and n.wait_class <> 'Idle' and m.wait_count > 0 order by 1""")
? ? ? ? for wait in cursor:
? ? ? ? ? ? wait_class = wait[0]
? ? ? ? ? ? wait_name = wait[1]
? ? ? ? ? ? wait_cnt = wait[2]
? ? ? ? ? ? wait_avgms = wait[3]
? ? ? ? ? ? print ("oracle_wait_event,fqdn={0},delphix={1},db={2},wait_class={3},wait_event={4} count={5},latency={6}".format(fqdn, self.delengine,sid,re.sub(' ', '_', wait_class), re.sub(' ','_',wait_name),wait_cnt,wait_avgms))
? ? # 表空間使用狀態(tài)
? ? def tbsstats(self, user, passwd, sid, format):
? ? ? ? cursor = self.connection.cursor()
? ? ? ? cursor.execute("""
? ? ? ? select /*+ ordered */ tablespace_name,
? ? ? ? ? ? round(used_space),
? ? ? ? ? ? round(max_size-used_space) free_space,
? ? ? ? ? ? round(max_size),
? ? ? ? ? ? round(used_space*100/max_size,2) percent_used
? ? ? ? ? ? from (
? ? ? ? ? ? ? ? select m.tablespace_name,
? ? ? ? ? ? ? ? m.used_space*t.block_size/1024/1024 used_space,
? ? ? ? ? ? ? ? (case when t.bigfile='YES' then power(2,32)*t.block_size/1024/1024
? ? ? ? ? ? ? ? ? ? ? ? else tablespace_size*t.block_size/1024/1024 end) max_size
? ? ? ? ? ? from dba_tablespace_usage_metrics m, dba_tablespaces t
? ? ? ? where m.tablespace_name=t.tablespace_name)
? ? ? ? """)
? ? ? ? for tbs in cursor:
? ? ? ? ? ? tbs_name = tbs[0]
? ? ? ? ? ? used_space_mb = tbs[1]
? ? ? ? ? ? free_space_mb = tbs[2]
? ? ? ? ? ? max_size_mb = tbs[3]
? ? ? ? ? ? percent_used = tbs[4]
? ? ? ? ? ? print ("oracle_tablespaces,fqdn={0},delphix={1},db={2},tbs_name={3} used_space_mb={4},free_space_mb={5},percent_used={6},max_size_mb={7}".format(fqdn, self.delengine, sid, re.sub(' ', '_', tbs_name), used_space_mb,free_space_mb,percent_used,max_size_mb))
if __name__ == "__main__":
? ? parser = argparse.ArgumentParser()
? ? parser.add_argument('-f', '--format', help="Output format, default influx", choices=['kafka', 'influx'], default='influx')
? ? subparsers = parser.add_subparsers(dest='stat')
? ? parser_all = subparsers.add_parser('ALL', help="Get all database stats")
? ? parser_all.add_argument('-u', '--user', help="Username with sys views grant", required=True)
? ? parser_all.add_argument('-p', '--passwd', required=True)
? ? parser_all.add_argument('-s', '--sid', help="tnsnames SID to connect", required=True)
? ? args = parser.parse_args()
? ? if args.stat == "ALL":
? ? ? ? stats = OraStats(args.user, args.passwd, args.sid)
? ? ? ? stats.waitclassstats(args.user, args.passwd, args.sid, args.format)
? ? ? ? stats.waitstats(args.user, args.passwd, args.sid, args.format)
? ? ? ? stats.sysmetrics(args.user, args.passwd, args.sid, args.format)
? ? ? ? stats.tbsstats(args.user, args.passwd, args.sid, args.format)
? ? ? ? stats.fraused(args.user, args.passwd, args.sid, args.format)
? ? ? ? stats.fsused()
輸出格式化為 InfluxDB line-protocol
[root@localhost tools]# ./oracle.sh
oracle_wait_class,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=Application wait_value=0
oracle_wait_class,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=CPU wait_value=0.003
oracle_wait_class,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=CPU_OS wait_value=0.778
oracle_wait_class,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=Commit wait_value=0
oracle_wait_class,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=Concurrency wait_value=0.001
oracle_wait_class,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=Configuration wait_value=0
oracle_wait_class,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=Network wait_value=0
oracle_wait_class,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=Other wait_value=0
oracle_wait_class,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=System_I/O wait_value=0.001
oracle_wait_class,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=User_I/O wait_value=0
oracle_wait_event,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=Commit,wait_event=log_file_sync count=2,latency=0.122
oracle_wait_event,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=Concurrency,wait_event=os_thread_startup count=2,latency=21.595
oracle_wait_event,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=Network,wait_event=SQL*Net_message_to_client count=17,latency=0.001
oracle_wait_event,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=Other,wait_event=asynch_descriptor_resize count=4,latency=0.001
oracle_wait_event,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=System_I/O,wait_event=db_file_parallel_write count=2,latency=0.081
oracle_wait_event,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=System_I/O,wait_event=control_file_parallel_write count=24,latency=0.268
oracle_wait_event,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=System_I/O,wait_event=control_file_sequential_read count=71,latency=0.716
oracle_wait_event,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=System_I/O,wait_event=log_file_parallel_write count=7,latency=0.076
oracle_wait_event,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=User_I/O,wait_event=Disk_file_operations_I/O count=16,laten
定義一個 shell 腳本執(zhí)行 Python 腳本
#!/usr/bin/env bash
python /home/oracle/scripts/oracle_metrics.sh -f "influx" "ALL" "-u" "system" "-p" "xxxx" "-s" "172.16.106.251:1521/orcl"
在 oracle主機(jī)上,配置 telegraf 以60秒的間隔執(zhí)行 python sh,然后將輸出發(fā)送到 InfluxDB。編輯/etc/telegraf/telegraf.conf配置文件:
# Telegraf configuration
# Telegraf is entirely plugin driven. All metrics are gathered from the
# declared inputs, and sent to the declared outputs.
# Plugins must be declared in here to be active.
# To deactivate a plugin, comment out the name and any variables.
# Use 'telegraf -config telegraf.conf -test' to see what metrics a config
# file would generate.
# Global tags can be specified here in key="value" format.
[global_tags]
? # dc = "us-east-1" # will tag all metrics with dc=us-east-1
? # rack = "1a"
? host="Dprima"
? collector="telegraf"
# Configuration for telegraf agent
[agent]
? ## Default data collection interval for all inputs
? interval = "10s"
? ## Rounds collection interval to 'interval'
? ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
? round_interval = true
? ## Telegraf will cache metric_buffer_limit metrics for each output, and will
? ## flush this buffer on a successful write.
? metric_buffer_limit = 10000
? ## Flush the buffer whenever full, regardless of flush_interval.
? flush_buffer_when_full = true
? ## Collection jitter is used to jitter the collection by a random amount.
? ## Each plugin will sleep for a random time within jitter before collecting.
? ## This can be used to avoid many plugins querying things like sysfs at the
? ## same time, which can have a measurable effect on the system.
? collection_jitter = "0s"
? ## Default flushing interval for all outputs. You shouldn't set this below
? ## interval. Maximum flush_interval will be flush_interval + flush_jitter
? flush_interval = "60s"
? ## Jitter the flush interval by a random amount. This is primarily to avoid
? ## large write spikes for users running a large number of telegraf instances.
? ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
? flush_jitter = "0s"
? ## Run telegraf in debug mode
? debug = false
? ## Run telegraf in quiet mode
? quiet = false
? ## Override default hostname, if empty use os.Hostname()
? hostname = "Dprima"
###############################################################################
#? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? OUTPUTS? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? #
###############################################################################
# Configuration for influxdb server to send metrics to
[[outputs.influxdb]]
? urls = ["http://influxgraf:8086"] # required
? database = "telegraf" # required
? precision = "s"
? timeout = "5s"
[[outputs.influxdb]]
? urls = ["http://localhost:9092"] # required
? database = "kapacitor" # required
? precision = "s"
? retention_policy = "default"
? timeout = "5s"
#[[outputs.file]]
#? files=["/home/oracle/scripts/telegraf_debug.txt"]
###############################################################################
#? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? INPUTS? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? #
###############################################################################
# Oracle metrics
[[inputs.exec]]
? # Shell/commands array
? commands = ["/home/oracle/scripts/oracle_metrics.sh"]
? # Data format to consume. This can be "json", "influx" or "graphite" (line-protocol)
? # NOTE json only reads numerical measurements, strings and booleans are ignored.
? data_format = "influx"
? interval = "60s"
###############################################################################
#? ? ? ? ? ? ? ? ? ? ? ? ? ? ? SERVICE INPUTS? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? #
###############################################################################
啟動 telegraf:
telegraf -config /etc/telegraf/telegraf.conf
數(shù)據(jù)可視化
查詢 InfluxDB 數(shù)據(jù)庫
[root@localhost log]# influx
Connected to http://localhost:8086 version 1.7.4
InfluxDB shell version: 1.7.4
Enter an InfluxQL query
> show databases
name: databases
name
----
_internal
telegraf
> use telegraf
Using database telegraf
> show measurements
name: measurements
name
----
oracle_fra_pctused
oracle_sysmetric
oracle_tablespaces
oracle_wait_class
oracle_wait_event
> select * from oracle_sysmetric limit 5
name: oracle_sysmetric
time? ? ? ? ? ? ? ? db? ? ? ? ? ? ? ? ? ? ? delphix fqdn? ? ? ? ? ? ? ? ? host? ? ? ? ? ? ? ? ? metric_name? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? metric_value
----? ? ? ? ? ? ? ? --? ? ? ? ? ? ? ? ? ? ? ------- ----? ? ? ? ? ? ? ? ? ----? ? ? ? ? ? ? ? ? -----------? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ------------
1554277680000000000 172.16.14.251:1521/orcl none? ? localhost.localdomain localhost.localdomain Active_Parallel_Sessions? ? ? ? ? ? ? ? ? ? ? 0
1554277680000000000 172.16.14.251:1521/orcl none? ? localhost.localdomain localhost.localdomain Active_Serial_Sessions? ? ? ? ? ? ? ? ? ? ? ? 1
1554277680000000000 172.16.14.251:1521/orcl none? ? localhost.localdomain localhost.localdomain Average_Active_Sessions? ? ? ? ? ? ? ? ? ? ? 0.0138029495084
1554277680000000000 172.16.14.251:1521/orcl none? ? localhost.localdomain localhost.localdomain Average_Synchronous_Single-Block_Read_Latency 0.5875
1554277680000000000 172.16.14.251:1521/orcl none? ? localhost.localdomain localhost.localdomain Background_CPU_Usage_Per_Sec? ? ? ? ? ? ? ? ? 0.104149308449
>
Grafana 效果圖如下:
小結(jié)
通過結(jié)合 Python 腳本開發(fā)的方式,我們可以擴(kuò)展部分 Telegraf 不支持的監(jiān)控項(xiàng),本文簡單提供了一種思路。
相關(guān)資料:
https://github.com/7DGroup/JMeter-examples/tree/master/Performance%20Monitoring/Telegraf-InfluxDB-Grafana-Python-Oracle
參考資料:
[1]:https://cx-oracle.readthedocs.io/en/latest/index.html
[2]:http://datavirtualizer.com/wait-event-and-wait-class-metrics-vs-vsystem_event/
[3]:https://docs.influxdata.com/influxdb/v1.7/write_protocols/
備注:文章來自微信公眾號:Zee_7DGroup