
CRAIGDWILSON

SMParse - Performance and Health Check Utility

One of the key items to examine on a Primary Server to determine whether there is a load issue on one of the three back-end components (Primary, Database, or LDAP Server) is how long it takes to process assignment requests for users and devices. Tracking and analyzing spikes in the time required to process requests can help determine whether one of these components is under load at particular times of the day. Since each environment is different, it is better to compare peak times against off-peak times than to rely on fixed timings to judge load issues. Once a load issue is isolated, steps can be taken to address the performance of that specific area; this article focuses on identifying potential areas of concern rather than on resolutions for each of the three areas.

Assignment requests can be broken down into User Assignment Requests and Device Assignment Requests. If a user assignment request takes a long time, it indicates a possible load issue with the Primary Server, the Database Server, or the LDAP Server. If a device assignment request takes a long time, it indicates a possible load issue with the Primary Server or the Database Server; in general, LDAP source issues should not directly impact device assignment requests.

If the delay is only in user assignments and not device assignments, it normally indicates a bottleneck with the LDAP user source. If there are multiple primary servers and the performance issue tends to occur on some servers but not others, it tends to indicate a load or other issue on those servers rather than on the database itself, since requests from the other servers are handled promptly. If both user and device requests are slow from both high-use and low-use servers, it generally indicates a database performance issue. (Note: User requests should always tend to have a slightly higher average, since resolving user assignments requires querying LDAP along with the database, versus simply querying the database for device requests.)
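The rules of thumb above can be written down as a small decision function. This is only a sketch of the reasoning in this article, not anything SMParse itself does; the function name and its three boolean inputs are made up for illustration, and checking the "only some servers" case first is an assumption about how to order the rules.

```python
def suspect_component(user_slow: bool, device_slow: bool, only_some_servers: bool) -> str:
    """Return the back-end component most likely under load (heuristic only)."""
    if not user_slow and not device_slow:
        return "none"
    # Slow on some primaries but not others: the shared database is
    # answering promptly elsewhere, so suspect the slow servers themselves.
    if only_some_servers:
        return "primary server"
    # Only user requests are slow: device requests do not touch LDAP.
    if user_slow and not device_slow:
        return "LDAP user source"
    # Both kinds are slow on every server.
    if user_slow and device_slow:
        return "database"
    # Only device requests are slow: LDAP is not involved in those.
    return "primary server or database"


print(suspect_component(user_slow=True, device_slow=False, only_some_servers=False))
```

The result is a starting point for investigation, not a diagnosis.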

The primary server's services-messages.log can be used to find out how long assignment requests are taking to be fulfilled. Search the log for the phrase getAllAssignments complete, time: and examine how long the request took to be fulfilled. If the value for object: on that line contains a ~ separating two strings, then the object is a user. If object: does not contain a ~, then it is a device object. Below is an example of a device entry followed by a user entry:

[5/14/16 8:19:21 AM] [Assignment Web Service] [getAllAssignments complete, time: 121ms, object: 5b39027b57e2c14d8b31ff2e89c89bea]
[5/14/16 8:19:21 AM] [Assignment Web Service] [getAllAssignments complete, time: 695ms, object: e87125e4f07e7588a1c4d283bafb2594~7006cf90e2a6de118b3c001e4f2ba32a]
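For anyone who wants to pull these values out by hand, here is a minimal Python sketch that extracts the time and classifies the object as a user or a device. The pattern is modeled only on the sample lines shown here, so it assumes the time is always reported as whole milliseconds with an "ms" suffix; the names ASSIGNMENT_RE and parse_assignment_line are illustrative and are not part of SMParse.

```python
import re
from typing import NamedTuple, Optional

# Modeled on the sample lines in this article; real logs may carry
# extra fields, so the pattern searches rather than anchoring the line.
ASSIGNMENT_RE = re.compile(
    r"getAllAssignments complete, time: (?P<ms>\d+)ms, object: (?P<obj>[^\]]+)\]"
)


class AssignmentEntry(NamedTuple):
    millis: int
    kind: str  # "user" or "device"
    obj: str


def parse_assignment_line(line: str) -> Optional[AssignmentEntry]:
    """Return the parsed entry, or None if the line is not an assignment line."""
    match = ASSIGNMENT_RE.search(line)
    if match is None:
        return None
    obj = match.group("obj").strip()
    # A "~" separating two strings marks a user object; otherwise a device.
    kind = "user" if "~" in obj else "device"
    return AssignmentEntry(int(match.group("ms")), kind, obj)


sample = ("[5/14/16 8:19:21 AM] [Assignment Web Service] "
          "[getAllAssignments complete, time: 121ms, "
          "object: 5b39027b57e2c14d8b31ff2e89c89bea]")
print(parse_assignment_line(sample))
```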

In general, so long as an assignment request does not exceed two minutes, it should not cause any issues beyond slowness on a client device. Assignment requests exceeding two minutes may in some cases result in an agent not receiving all assignments due to timeouts on the client side. However, if a large number of requests take an excessive amount of time to complete, it could result in a backlog and a cascading performance issue. Expected performance results can vary between implementations depending on the complexity of the ZCM design, hardware, database type, LDAP type, and numerous other factors. Comparing peak and off-peak usage performance results is the best way to determine peak load performance degradation.
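One way to make the peak versus off-peak comparison concrete is sketched below. It takes two lists of request times in milliseconds, compares their medians, and counts requests beyond the two-minute mark mentioned above. Using the median rather than the average is a choice made for this sketch, and the function and key names are hypothetical.

```python
from statistics import median

CLIENT_TIMEOUT_MS = 120_000  # two minutes, the client-side limit discussed above


def compare_windows(peak_ms, off_peak_ms):
    """Compare request times (in ms) from a peak and an off-peak window."""
    peak, off = list(peak_ms), list(off_peak_ms)
    if not peak or not off:
        raise ValueError("both windows need at least one sample")
    peak_med, off_med = median(peak), median(off)
    return {
        "peak_median_ms": peak_med,
        "off_peak_median_ms": off_med,
        # How many times slower the typical peak request is.
        "slowdown": peak_med / off_med if off_med else float("inf"),
        # Requests long enough to risk a client-side timeout.
        "over_timeout": sum(1 for ms in peak + off if ms > CLIENT_TIMEOUT_MS),
    }


print(compare_windows([400, 900, 130000], [100, 300, 200]))
```

A slowdown well above 1 during busy hours points at load rather than a constant problem.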

--

SMParse.exe is a utility that parses the Services-Messages.log file from a primary server, provides a summary analysis of general performance and load over time, and extracts common "exceptions" for analysis.

To use the utility, copy SMParse.exe and the Services-Messages.log to a temporary working directory and launch the utility.
There will be a prompt for a "String Match" string.
If this optional field is populated, the utility will output all lines containing that string to a file called "STRING_Match.txt".
This is useful for gathering details about a particular user, device, PID, or any other specific detail that should be parsed out for specific analysis.
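The string match is conceptually a plain substring filter over the log. The sketch below shows the idea in Python; it is not the utility's actual code, the function name is made up, and the case-sensitive comparison is an assumption, since the matching rules SMParse uses are not documented here.

```python
def string_match(lines, needle):
    """Return every line containing needle; an empty needle matches nothing."""
    if not needle:
        return []
    # Case-sensitive substring test (assumed behavior).
    return [line for line in lines if needle in line]


log_lines = [
    "[Assignment Web Service] [getAllAssignments complete, time: 121ms, object: abc]",
    "[Assignment Web Service] [getAllAssignments complete, time: 695ms, object: def~123]",
]
for hit in string_match(log_lines, "def~123"):
    print(hit)
```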

After pressing 'OK', the services-messages.log will be processed and a "Summary.txt" file will be created for review. (Sample below)

Total LDAP Connection Create Exceptions: 0
Total DB Connection Exceptions: 0
Fast Bundle Content Lookup(under 10 seconds): 735
Slow Bundle Content Lookup(over 10 seconds): 0

Total Device Assignment Requests: 869
Number Completed in Excess of 1 second: 338
Number Completed in Excess of 5 seconds: 28
Number Completed in Excess of 10 seconds: 4
Number Completed in Excess of 20 seconds: 2
Number Completed in Excess of 30 seconds: 0
Number Completed in Excess of 60 seconds: 0

Total User Assignment Requests: 763
Number Completed in Excess of 1 second: 12
Number Completed in Excess of 5 seconds: 8
Number Completed in Excess of 10 seconds: 1
Number Completed in Excess of 20 seconds: 1
Number Completed in Excess of 30 seconds: 1
Number Completed in Excess of 60 seconds: 0

Probable 10 Sec Server Side Sleep due to Test/Chained Assignments. Not a Concern
* User Assignments Completed in 10.0 to 10.5 Seconds : 24
* Device Assignments Completed in 10.0 to 10.5 Seconds : 1
* Number of String Matches '' : 0


HOURLY RESULTS
______________

[05/07/2014 13] UserCount:380 DeviceCount:391
[05/07/2014 14] UserCount:66 DeviceCount:75
[05/07/2014 15] UserCount:306 DeviceCount:391
[05/07/2014 16] UserCount:11 DeviceCount:12


-------------------------------------------------------------------------

Separate text files will be created in the same directory that contain the associated lines from the log file for both LDAP and DB Exceptions.
(Note: Do not be concerned about the occasional error; the system is designed to handle such events.)

"Content Lookup" has nothing to do with physical data files.
This is how long the database spends gathering the details regarding which content files are used by a bundle.
The most common issue this will identify is a "Corrupt" bundle, which causes the Lookup to take minutes.
While this is very rare, ZPM bundles are the type most likely to show the issue.
The resolution is to delete that specific patch bundle and download it again.

Assignment Requests that take less than 1 second are not logged in detail, but all others are broken down into files named Userxx.txt and Devicexx.txt for further analysis.
These files can then be analyzed to see if there are changes in performance levels during specific times of the day.

(Note: The Server Side Code will intentionally enter a 10 second "sleep" state while processing certain assignments, causing some assignments to take just over 10 seconds.
This sleep state is not an indication of a performance issue. Thus any assignment that falls in the very narrow range of 10.0 to 10.5 seconds will be treated as taking 1 second for counts, but
is broken out into UserSleep.txt and DeviceSleep.txt.)
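The counting rules can be sketched as follows. This is a reconstruction from the description and the sample summary, not the utility's source: the thresholds are the ones printed in Summary.txt, and requests in the 10.0 to 10.5 second sleep window are tallied separately and kept out of every "in excess of" count, which matches the sample above, where 24 user requests fall in the sleep window but only 12 are listed as exceeding 1 second. Whether the sleep requests are included in the total is an assumption.

```python
THRESHOLDS_S = (1, 5, 10, 20, 30, 60)   # the "in excess of" rows in Summary.txt
SLEEP_RANGE_MS = (10_000, 10_500)       # the intentional server-side sleep window


def summarize(durations_ms):
    """Count requests exceeding each threshold, setting aside sleep-window requests."""
    durations = list(durations_ms)
    over = {t: 0 for t in THRESHOLDS_S}
    sleep = 0
    for ms in durations:
        if SLEEP_RANGE_MS[0] <= ms <= SLEEP_RANGE_MS[1]:
            sleep += 1          # reported separately, not a performance concern
            continue
        for t in THRESHOLDS_S:
            if ms > t * 1000:   # counts are cumulative, as in the sample
                over[t] += 1
    # Assumption: the total includes the sleep-window requests.
    return {"total": len(durations), "over": over, "sleep": sleep}


print(summarize([500, 1500, 10200, 6000, 25000]))
```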

At the bottom of summary.txt, the assignment totals are broken out on an hourly level to also assist in determining load during different times of the day.
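A hedged sketch of how such an hourly breakdown could be produced is shown below. It assumes timestamps in the format of the sample log lines earlier in this article (for example 5/14/16 8:19:21 AM) and an English locale for the AM/PM marker; the function name and the (timestamp, kind) input shape are illustrative only.

```python
from collections import Counter
from datetime import datetime

STAMP_FORMAT = "%m/%d/%y %I:%M:%S %p"  # matches e.g. "5/14/16 8:19:21 AM"


def hourly_counts(entries):
    """entries: iterable of (timestamp string, "user" or "device") pairs."""
    users, devices = Counter(), Counter()
    for stamp, kind in entries:
        # Truncate the timestamp to the hour so requests group together.
        hour = datetime.strptime(stamp, STAMP_FORMAT).replace(minute=0, second=0)
        (users if kind == "user" else devices)[hour] += 1
    return [
        f"[{hour:%m/%d/%Y %H}] UserCount:{users[hour]} DeviceCount:{devices[hour]}"
        for hour in sorted(set(users) | set(devices))
    ]


for line in hourly_counts([
    ("5/14/16 8:19:21 AM", "device"),
    ("5/14/16 8:19:21 AM", "user"),
    ("5/14/16 1:05:00 PM", "user"),
]):
    print(line)
```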

Note: The Primary server is not required to be in debug for the server to log the information necessary for this utility.


SMParse9.exe (Version 9.0) -> https://drive.google.com/file/d/0B0o...ew?usp=sharing


Updated 27-Sep-2016 at 11:10 AM by CRAIGDWILSON
