vROPs – Alarms and Alerts: what's the difference? Part 2

Link to part 1
Link to part 3

Following on from Part 1 – So what else can I do?

The key is to identify what is causing such a high level of alarms in your environment, but this can be tricky!

There is actually a way to show all the symptoms that are currently active in your environment, much like the Alerts views, and this can be useful. The real problem, though, is the closed alarms…

The total number of ‘Alarms’ generated and stored in the database is not displayed anywhere in the GUI or in any of the out-of-the-box (OOB) vROPs health dashboards (I do love these though 🙂). In fact, I failed to find a metric anywhere in vROPs that covers this. Happy to be corrected if one does exist…

Post7-Img1

*Feature request submitted  🙂

If the alert count is very low and we cannot see which Symptoms these alarms relate to, how do we identify which symptoms and/or alerts we need to tune?

Solution 1

You can run the queries mentioned in Part 1 in raw format and redirect the output to a file.

Alerts

  • su - postgres -c "/opt/vmware/vpostgres/9.3/bin/psql -d vcopsdb -A -t -c 'select * from alert'" > rawalert.txt

Alarms

  • su - postgres -c "/opt/vmware/vpostgres/9.3/bin/psql -d vcopsdb -A -t -c 'select * from alarm'" > rawalarm.txt

* To get the full picture, you will need to run that on ALL analytics nodes
* The commands may take a while to complete if your environment is large

You can then grab these files and work on the data contained within to perform some analysis.
I found this very difficult to work with.
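
If you do want to try working with the raw files, a first pass could be a simple count of rows grouped by one column. The following is only a minimal sketch: it assumes the fields are separated by '|' (the psql default for -A output) and that no field contains a '|' or a line break, which may not hold for every row. Get-RawDumpBreakdown and the column index are hypothetical; you would need to check the table layout yourself to pick the right column.

```powershell
# Sketch: count rows in a psql "-A -t" dump, grouped by the value of one column.
# Assumptions: '|' is the field separator and no field contains '|' or a newline.
# $ColumnIndex is zero-based and must be chosen by inspecting the table layout.
function Get-RawDumpBreakdown {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][int]$ColumnIndex
    )

    Get-Content -Path $Path |
        Where-Object { $_.Trim() } |                           # skip blank lines
        ForEach-Object { ($_ -split '\|')[$ColumnIndex] } |    # pick one field per row
        Group-Object -NoElement |                              # count identical values
        Sort-Object -Property Count -Descending |
        Select-Object -Property Count, Name
}
```

For example, `Get-RawDumpBreakdown -Path .\rawalarm.txt -ColumnIndex 3` would list the most frequent values in the fourth column; the index 3 here is purely illustrative.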

Solution 2

I wanted to narrow down my search to the Symptoms that were generating the highest volume of alarms. After all, it could have been just one Symptom…

What I came up with was a PowerShell script that pulls this information directly out of vROPs via the REST API.

The script gives you the total volume of Alarms in the database, plus the alarm count for each combination of resource type and criticality. It writes the results to the screen and exports them to a CSV file.

Here is example output from a small Lab environment.
* The runtime columns just represent how long it took to pull the data from the API. I had been having some timeout issues…

Post7-Img2

Post7-Img3
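
Once the CSV exists, it can be rolled up further, for example into one total per resource type. The following is a minimal sketch, assuming the file has the ResourceType and AlarmVolume columns that the script exports and holds a single run (the script appends to the file, so repeated runs would be summed together). Get-AlarmVolumeByResourceType is a hypothetical helper name, not part of the original script.

```powershell
# Sketch: sum AlarmVolume per ResourceType from the exported CSV.
# Assumes the CSV contains one run only; the "Total Volume" summary row is skipped.
function Get-AlarmVolumeByResourceType {
    param([Parameter(Mandatory)][string]$Path)

    Import-Csv -Path $Path |
        Where-Object { $_.ResourceType -ne 'Total Volume' } |
        Group-Object -Property ResourceType |
        ForEach-Object {
            [pscustomobject]@{
                ResourceType = $_.Name
                # Import-Csv returns strings; Measure-Object converts them for the sum
                AlarmVolume  = [int]($_.Group | Measure-Object -Property AlarmVolume -Sum).Sum
            }
        } |
        Sort-Object -Property AlarmVolume -Descending
}
```

Calling `Get-AlarmVolumeByResourceType -Path 'c:\temp\AlarmBreakdown.csv' | Format-Table -AutoSize` should then show which resource types account for most of the alarms.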

And here is the script:

#####################################
# ScriptName:    vROPs-AlarmBreakdown
# Script Author: James Gill
# Last Updated:  28/05/17
# Version:       1
#####################################


#Collect credential details
$cred = Get-Credential


if($resourceTypes) {Clear-variable resourceTypes}
if($criticalityTypes) {Clear-Variable criticalityTypes}
if($vROPsAddress) {Clear-Variable vROPsAddress}
if($APICall) {Clear-Variable APICall}
if($APIMethod) {Clear-Variable APIMethod}

#region manual input params
#Enter your vROPs address here
$vRopsAddress = @('manvrops01.vcit.local')

#Enter your report path and report name here
$ReportPath = 'c:\temp\AlarmBreakdown.csv'

#endregion manual input params


$APICall = "suite-api/api/symptoms/query"
$APIMethod = 'Post'

#region ResourceTypes and Criticality Types

#Resource Types
$ResourceTypes = @(
        "ClusterComputeResource",
        "ComputeResource",
        "CustomDatacenter",
        "Datacenter",
        "Datastore",
        "StoragePod",
        "DatastoreFolder",
        "VM Entity Status",
        "Folder",
        "HostFolder",
        "HostSystem",
        "NetworkFolder",
        "ResourcePool",
        "VMwareAdapter Instance",
        "VirtualMachine",
        "VMFolder",
        "DistributedVirtualPortgroup",
        "VmwareDistributedVirtualSwitch",
        "vSphere World"
)

#Criticality Types
$CriticalityTypes = @(
        "CRITICAL",
        "IMMEDIATE",
        "WARNING",
        "INFORMATION"
)
#endregion ResourceTypes and Criticality Types


#region foreach loops
$alertvolumes = @()

foreach ($resourcetype in $ResourceTypes) {

    foreach ($Criticalitytype in $CriticalityTypes) {


#region JSON payload construction, headers and run

# Define required headers to supply to the Web request
$headers = @{
    "Accept"       = "application/json"
    "Content-Type" = "application/json"
}
	

$JSONfile01 = @{
                  compositeOperator = 'AND';
                  resourcequery = @{
                    resourceKind = @("$ResourceType")
                  }
                  alarmCriticality =  @("$CriticalityType");
                  activeOnly = 'false'
                }

#Convert the hash table to json
$JSONBODY = ConvertTo-Json $JSONFile01

# The JSON body needs a '-' in the property name resource-query. An unquoted hyphen is not valid in a hash table key,
# so the key was written without it above and we put the hyphen back here
$JSONBODY = $JSONBODY -replace "resourcequery", "resource-query"


#endregion JSON payload construction and headers

#region SSL bypass

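#vROPs SSL certificate trust: accepts ANY certificate, so only use this against hosts you trust (e.g. self-signed lab certificates)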
add-type @"
     using System.Net;
     using System.Security.Cryptography.X509Certificates;
     public class TrustAllCertsPolicy : ICertificatePolicy {
         public bool CheckValidationResult(
             ServicePoint srvPoint, X509Certificate certificate,
             WebRequest request, int certificateProblem) {
             return true;
         }
     }
"@
[System.Net.ServicePointManager]::CertificatePolicy = New-Object TrustAllCertsPolicy
#endregion SSL bypass

#region access API

#set the API URI
$URI = "https://$vROPsAddress/$APICall"

Write-Output "Retrieving : $ResourceType : $CriticalityType" 
$Date= Get-Date
$AlarmResults = Invoke-RestMethod -Method $APIMethod -Uri $URI -timeoutsec 600 -Headers $headers -body $JSONBody -Credential $cred 
$InvocationCompleted = Get-Date
$RunTime = ($InvocationCompleted - $Date)

$RuntimeMin = $Runtime.Minutes
$RuntimeSec = $Runtime.seconds
$RuntimeMil = $Runtime.milliseconds

#endregion access API

#region Present Results
[int]$TotalAlarms = $AlarmResults.pageInfo.totalCount.ToString()


 $alertvolumes += New-Object PSObject -Property @{

                    Date                = $Date
                    ResourceType        = $ResourceType
                    Criticality         = $CriticalityType
                    AlarmVolume         = $TotalAlarms
                    RunTimeMin         = $RunTimeMin
                    RunTimeSec         = $RunTimeSec
                    RunTimeMil         = $RuntimeMil
                                        }
                
}

}
$SUMTotal = ((($alertvolumes | select AlarmVolume) | measure AlarmVolume -Sum).Sum)

 $alertvolumes += New-Object PSObject -Property @{

                    Date                = $Date
                    ResourceType        = "Total Volume"
                    Criticality         = "All"
                    AlarmVolume         = $SumTotal
                    RunTimeMin         = "-"
                    RunTimeSec         = "-"
                    RunTimeMil         = "-"
                                        }



$alertvolumes | Sort -Property AlarmVolume -Descending | select Date, ResourceType, Criticality, AlarmVolume, RunTimeMin, RunTimeSec, RunTimeMil | FT -AutoSize
$alertvolumes | Sort -Property AlarmVolume -Descending | select Date, ResourceType, Criticality, AlarmVolume, RunTimeMin, RunTimeSec, RunTimeMil | export-csv $ReportPath -NoTypeInformation -Append


#endregion Present Results

#endregion foreach loops
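
One detail worth a note is the resourcequery to resource-query string swap in the script above. PowerShell hash table literals do accept a hyphenated key as long as it is quoted, so a possible alternative is to build the body with the real property name from the start. This is only a sketch: New-SymptomQueryBody is a hypothetical helper name, and the payload fields simply mirror the ones used in the script.

```powershell
# Sketch: build the symptom query body with a quoted 'resource-query' key,
# avoiding the -replace step. Field names and values mirror the script above.
function New-SymptomQueryBody {
    param(
        [Parameter(Mandatory)][string]$ResourceKind,
        [Parameter(Mandatory)][string]$Criticality
    )

    $body = @{
        compositeOperator = 'AND'
        'resource-query'  = @{ resourceKind = @($ResourceKind) }   # quoted key keeps the hyphen
        alarmCriticality  = @($Criticality)
        activeOnly        = 'false'                                 # same value as the original script
    }

    # -Depth is set explicitly so nested values are never truncated
    ConvertTo-Json -InputObject $body -Depth 5
}
```

Inside the loops this could be used as `$JSONBody = New-SymptomQueryBody -ResourceKind $ResourceType -Criticality $CriticalityType`, with the rest of the request left unchanged.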

 

How do I identify the actual ‘Symptom’ definitions? I will cover this in Part 3.
