Today’s post will be pretty short, however hopefully very informative and helpful. Actually, the content of it is not mine, but I am sharing it, because I would like this information to reach as many people as possible.
If you have been working with the SQL management pack for System Center Operations Manager, then you surely had to deal with the situation, where the PowerShell scripts, used within the various monitors and rules cannot be executed on the managed systems. In most of the cases this behaviour is encountered on heavily loaded SQL Servers, which are experiencing resource shortage. Often, at least in my experience, the issue is tightly related to querying WMI and the inability of the system to allocate enough resources for the query to succeed.
The alerts, containing the error message are usually generated by specific rule, which traces failed monitoring workflows and is named like this:
MSSQL <version>: Monitoring failed
where the version reflects the monitored SQL version, which is affected, respectively the management pack, containing this rule. Example:
MSSQL 2014: Monitoring failed
MSSQL 2016: Monitoring failed
But can also be triggered by the respective AlwaysOn rule:
MSSQL 2014: Always On monitoring script failed rule
MSSQL 2016: Always On monitoring script failed rule
The common thing between those is the Reason and the detailed error output for the failing workflow, which is found in the alert description:
Reason: Out of memory
Detailed error output: Out of memory
The workaround covered here addresses this particular case respectively error message. While searching for a solution, I came across a Microsoft Social Technet Forum Post, where Alex, who is also involved in the SQL Management Pack development, presents us with a neat solution for this.
Here is the link to the post and also his action plan:
MSSQL: Monitoring failed with Event ID 4001 Out of Memory
The workaround contains two actions:
- isolating the SQL Server WMI provider;
- increasing the WMI UnloadTimeout.
Here the complete action plan, Alex suggested and also his comments:
“We found a workaround to get rid of the „out of memory“ errors. Try to take the following steps and then share your results here or at email@example.com. Pay attention that we suggest you make two changes at once as we don’t know if it’s enough just one of them to solve the problem. The only thing is known at the moment—if you apply both changes, the errors will disappear.
So, two changes are:
- to isolate the SQL Server WMI provider;
- to increase the UnloadTimeout.
To isolate the provider in its own host please follow these steps from an elevated PowerShell (the namespace ComputerManagement11 is for SQL 2012, 10 is for 2008 & 2008 R2, 12 is for 2014, 13is for 2016):
$a = [WMI]'Root\Microsoft\SqlServer\ComputerManagement11:__Win32Provider.name="MSSQL_ManagementProvider"' $a.HostingModel = "NetworkServiceHost:SQL" $a.put()
To revert the change:
$a = [WMI]'Root\Microsoft\SqlServer\ComputerManagement11:__Win32Provider.name="MSSQL_ManagementProvider"' $a.HostingModel = "NetworkServiceHost" $a.put()
To increase the unload timeout to 30 minutes please follow these steps:
- Open WBEMTEST
- Click on Connect
- In Namespace type Root\Microsoft\SqlServer\ComputerManagement11 then the Connect button
- Click on the Query button
- Type select * from __win32provider where name = ‚MSSQL_ManagementProvider‘ then click Apply
- Double click on the resulting row
- Double click on the value UnloadTimeout
- Select Not NULL then type 00000000003000.000000:000 then click on Save Property
- Click on Save Object
- Click Close”
The one and only thing I could add to this, would be a personal request of mine: Please, try to take a couple of minutes of your time to give feedback to the those guys (the mail is in the comment of Alex), because they are actively using it to make the management pack better for all of us. Your contribution matters!