XPERF is a great Windows diagnostics tool!

Cisco C210 M2 problems with Hardware Interrupts

We were seeing the SQL Server alerts below with increasing frequency. A week ago we’d get 1 or 2 a day, and yesterday we were seeing spikes of up to 10 an hour.
The error text is:
ERROR: 18056, Severity: 20, State: 29, or “The client was unable to reuse a session with SPID 70, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.”
CPU stats in PerfMon showed nothing out of the ordinary. CPU stats in Resource Monitor showed an interesting cascade-type pattern where all the CPUs (24 on this machine) would spike to 80% and then slowly fade back to 0%. CPU0 on Node 1, however, showed a steadier pattern of spiking to 80% and then dropping back to 0%.
In the interests of finding out what was on CPU0 on Node 1, I downloaded Process Explorer (PE). PE showed me nothing that could be tied to a single core, but I did notice that the CPU time of the Interrupts process was about two-thirds greater than that of the System Idle Process.
I couldn’t glean any additional information from PE, so I downloaded XPERF, a Microsoft tool to dig into kernel performance. There I could see that CPU 12 (CPU 0 on Node 1) displayed a regular spike to 100% and back to 0%.
xperf -on Diag -stackwalk PROFILE
Let that run for a while to capture the needed data.
xperf -d foo.etl to stop and merge the trace results into file foo.etl.
xperf foo.etl to see the results.
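If you need to capture traces repeatedly, the start/wait/stop sequence above can be wrapped in a small script. This is only a sketch: the function names and the 60 second duration are my own choices for illustration, and it assumes xperf is on the PATH and that the script runs from an elevated prompt, since kernel tracing needs administrator rights.

```python
import subprocess
import time


def xperf_start_cmd(kernel_group="Diag", stackwalk="PROFILE"):
    """Build the command that starts kernel tracing with stack walking."""
    return ["xperf", "-on", kernel_group, "-stackwalk", stackwalk]


def xperf_stop_cmd(etl_path="foo.etl"):
    """Build the command that stops tracing and merges results into etl_path."""
    return ["xperf", "-d", etl_path]


def capture_trace(seconds, etl_path="foo.etl", run=subprocess.run, sleep=time.sleep):
    """Start a trace, wait, then stop and merge it (hypothetical helper).

    `run` and `sleep` are injectable so the sequence can be dry-run
    on a machine without xperf installed.
    """
    run(xperf_start_cmd(), check=True)
    try:
        sleep(seconds)
    finally:
        # Always stop the kernel logger, even if the wait is interrupted.
        run(xperf_stop_cmd(etl_path), check=True)
    return etl_path


if __name__ == "__main__":
    # 60 seconds is an arbitrary example duration.
    print("Trace written to", capture_trace(60))
```

The resulting file is then opened with xperf foo.etl exactly as before; the script only automates the capture.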
A stack walk showed that NDIS.sys was the leading cause of the interrupts. That told me that I should check for new network drivers, so I installed the latest Cisco-supplied Intel drivers.
Below is a before and after of the DPC CPU Usage. The differences are night and day.
BEFORE

AFTER

Now, let’s see if we get any more nasty ERROR: 18056, Severity: 20, State: 29 errors logged in SQL Server. In the logs, each of these is followed by “The client was unable to reuse a session with SPID 70, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.”
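To check whether the error rate really drops after the driver update, something like the following could tally 18056 / State 29 entries per hour from the SQL Server error log. It is a rough sketch: the function name is hypothetical, and the line layout in the regular expression (timestamp, source such as spid70, then "Error: …, Severity: …, State: …") is an assumption about the log format that you should verify against your own ERRORLOG file.

```python
import re
from collections import Counter

# Assumed line shape (check against your own log):
# "2012-03-01 10:15:32.45 spid70      Error: 18056, Severity: 20, State: 29."
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2}\.\d+\s+\S+\s+"
    r"Error: (?P<error>\d+), Severity: (?P<severity>\d+), State: (?P<state>\d+)"
)


def count_errors_per_hour(lines, error=18056, state=29):
    """Count matching error entries, keyed by 'YYYY-MM-DD HH:00'."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and int(m["error"]) == error and int(m["state"]) == state:
            counts[f"{m['date']} {m['hour']}:00"] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines for illustration only, not real log output.
    sample = [
        "2012-03-01 10:15:32.45 spid70      Error: 18056, Severity: 20, State: 29.",
        "2012-03-01 10:47:01.10 spid71      Error: 18056, Severity: 20, State: 29.",
        "2012-03-01 11:02:13.00 spid70      Error: 18056, Severity: 20, State: 29.",
    ]
    for hour, n in sorted(count_errors_per_hour(sample).items()):
        print(hour, n)
```

To run it against a real log, pass the opened file as lines. The error log is often stored as UTF-16, so opening it with encoding="utf-16" may be necessary.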

XPERF can be installed as part of the Windows SDK. Make sure you select the Developer Tools; you can then find the xperf MSIs under the installation directory.
