需求描述
一般在生產(chǎn)環(huán)境中,有時候需要自動的檢測指標值狀態(tài),如果發(fā)生異常,需要提前預警的,比如發(fā)郵件告知,本篇就介紹如果通過Power shell實現(xiàn)狀態(tài)值監(jiān)控。
監(jiān)控值范圍
根據(jù)經(jīng)驗,作為DBA一般需要監(jiān)控如下系統(tǒng)能行指標。
cpu:
Processor(_Total)% Processor Time
Processor(_Total)% Privileged Time
SQLServer:SQL StatisticsBatch Requests/sec
SQLServer:SQL StatisticsSQL Compilations/sec
SQLServer:SQL StatisticsSQL Re-Compilations/sec
SystemProcessor Queue Length
SystemContext Switches/sec
Memory:
MemoryAvailable Bytes
MemoryPages/sec
MemoryPage Faults/sec
MemoryPages Input/sec
MemoryPages Output/sec
Process(sqlservr)Private Bytes
SQLServer:Buffer ManagerBuffer cache hit ratio
SQLServer:Buffer ManagerPage life expectancy
SQLServer:Buffer ManagerLazy writes/sec
SQLServer:Memory ManagerMemory Grants Pending
SQLServer:Memory ManagerTarget Server Memory (KB)
SQLServer:Memory ManagerTotal Server Memory (KB)
Disk:
PhysicalDisk(_Total)% Disk Time
PhysicalDisk(_Total)Current Disk Queue Length
PhysicalDisk(_Total)Avg. Disk Queue Length
PhysicalDisk(_Total)Disk Transfers/sec
PhysicalDisk(_Total)Disk Bytes/sec
PhysicalDisk(_Total)Avg. Disk sec/Read
PhysicalDisk(_Total)Avg. Disk sec/Write
SQL Server:
SQLServer:Access MethodsFreeSpace Scans/sec
SQLServer:Access MethodsFull Scans/sec
SQLServer:Access MethodsTable Lock Escalations/sec
SQLServer:Access MethodsWorktables Created/sec
SQLServer:General StatisticsProcesses blocked
SQLServer:General StatisticsUser Connections
SQLServer:LatchesTotal Latch Wait Time (ms)
SQLServer:Locks(_Total)Lock Timeouts (timeout > 0)/sec
SQLServer:Locks(_Total)Lock Wait Time (ms)
SQLServer:Locks(_Total)Number of Deadlocks/sec
SQLServer:SQL StatisticsBatch Requests/sec
SQLServer:SQL StatisticsSQL Re-Compilations/sec
上述指標含義,可以參照我上一篇文章:SQL Server需要監(jiān)控哪些計數(shù)器
監(jiān)控腳本
$server = "(local)"
$uid = "sa"
$db="master"
$pwd="password"
$mailprfname = "SendEmail"
$recipients = ""
$subject = "數(shù)據(jù)庫指標異常了!"
$computernamexml = "f:computername.xml"
$alter_cpuxml = "f:alter_cpu.xml"
function GetServerName($xmlpath)
{
$xml = [xml] (Get-Content $xmlpath)
$return = New-Object Collections.Generic.List[string]
for($i = 0;$i -lt $xml.computernames.ChildNodes.Count;$i++)
{
if ( $xml.computernames.ChildNodes.Count -eq 1)
{
$cp = [string]$xml.computernames.computername
}
else
{
$cp = [string]$xml.computernames.computername[$i]
}
$return.Add($cp.Trim())
}
$return
}
function GetAlterCounter($xmlpath)
{
$xml = [xml] (Get-Content $xmlpath)
$return = New-Object Collections.Generic.List[string]
$list = $xml.counters.Counter
$list
}
function CreateAlter($message)
{
$SqlConnection = New-Object System.Data.SqlClient.SqlConnection
$CnnString ="Server = $server; Database = $db;User Id = $uid; Password = $pwd"
$SqlConnection.ConnectionString = $CnnString
$CC = $SqlConnection.CreateCommand();
if (-not ($SqlConnection.State -like "Open")) { $SqlConnection.Open() }
$cc.CommandText=" EXEC msdb..sp_send_dbmail
@profile_name = '$mailprfname'
,@recipients = '$recipients'
,@body = '$message'
,@subject = '$subject'
"
$cc.ExecuteNonQuery()|out-null
$SqlConnection.Close();
}
$names = GetServerName($computernamexml)
$pfcounters = GetAlterCounter($alter_cpuxml)
foreach($cp in $names)
{
$p = New-Object Collections.Generic.List[string]
$report = ""
foreach ($pfc in $pfcounters)
{
$b = ""
$counter ="\"+$cp+$pfc.get_InnerText().Trim()
$p.Add($counter)
}
$count = Get-Counter $p
for ($i = 0; $i -lt $count.CounterSamples.Count; $i++)
{
$v = $count.CounterSamples.Get($i).CookedValue
$pfc = $pfcounters[$i]
#$pfc.get_InnerText()
$b = ""
$lg = ""
if($pfc.operator -eq "lt")
{
if ($v -ge [double]$pfc.alter)
{$b = "alter"
$lg = "Greater Than"}
}
elseif ($pfc.operator -eq "gt")
{
if( $v -le [double]$pfc.alter)
{$b = "alter"
$lg = "Less Than"}
}
if($b -eq "alter")
{
$path = "\"+$cp+$pfc.get_InnerText()
$item = "{0}:{1};{2} Threshold:{3}" -f $path,$v.ToString(),$lg,$pfc.alter.Trim()
$report += $item + "`n"
}
}
if($report -ne "")
{
#生產(chǎn)警告 參數(shù) 計數(shù)器,閥值,當前值
CreateAlter $report
}
}
其中涉及到2個配置文件:computernamexml,alter_cpuxml分別如下:
<computernames>
<computername>
wuxuelei-pc
</computername>
</computernames>
<Counters>
<Counter alter = "10" operator = "gt" >Processor(_Total)% Processor Time</Counter>
<Counter alter = "10" operator = "gt" >Processor(_Total)% Privileged Time</Counter>
<Counter alter = "10" operator = "gt" >SQLServer:SQL StatisticsBatch Requests/sec</Counter>
<Counter alter = "10" operator = "gt" >SQLServer:SQL StatisticsSQL Compilations/sec</Counter>
<Counter alter = "10" operator = "gt" >SQLServer:SQL StatisticsSQL Re-Compilations/sec</Counter>
<Counter alter = "10" operator= "lt" >SystemProcessor Queue Length</Counter>
<Counter alter = "10" operator= "lt" >SystemContext Switches/sec</Counter>
</Counters>
其中 alter 就是閥值,如第一條,如果 閥值 > 性能計數(shù)器值,就會發(fā)出警告。
其實這種自定義配置的方式,實現(xiàn)了靈活多變的自動化監(jiān)控標準:
1、比如可以檢測磁盤空間大小
2、檢測運行峰值狀態(tài)
3、定時的根據(jù)歷史運行值,更改生產(chǎn)系統(tǒng)中的閥值大小,也就是所謂的運行基線
警告實現(xiàn)方式
1、SQL Agent配置Job方式實現(xiàn)
2、計劃任務
以上兩種配置方式,可以靈活掌握,操作還是蠻簡單的,如果不會,可自行google。當然,如果不想干預正常的生產(chǎn)系統(tǒng),可以添加一個Server專門用來自動化運維檢測來用,實現(xiàn)遠程監(jiān)控。
后續(xù)文章中會分析關于Power Shell的遠程調(diào)用,并且能實現(xiàn)事故當前狀態(tài)下,自動化截圖….自動Send Email……為DBA現(xiàn)場取證第一手材料…方便診斷問題…
效果圖如下
以上只提供實現(xiàn)方式,如需要內(nèi)容更新,自己靈活更新。
更多信息請查看IT技術專欄