DQ Networks

Debugging PIX High CPU Utilisation

This is the product of a heart-stopping moment when a normally trouble-free PIX saw its CPU utilisation suddenly jump to 100% from its normal level of around 25%-30%.

You can determine the CPU utilisation from the command line:-

CW-PIX# show cpu usage CPU utilization for 5 seconds = 24%; 1 minute: 27%; 5 minutes: 26%

The next thing you will probably want to know is which processes are unusually busy. The output of "show processes" looks like this:-

PC SP STATE Runtime SBASE Stack Process Hsi 001f02c9 02a0871c 0056ed50 40 02a07794 3404/4096 arp_timer Lsi 001f5a95 02b8ba64 0056ed50 270 02b8aaec 3696/4096 FragDBGC Cwe 001100e2 02ba904c 02ba66d8 111450 02ba80f4 2108/4096 VAC poll Hsi 001104b5 02baa0ec 0056ed50 30 02ba91a4 3616/4096 VAC statistics Lwe 0011a13f 02bab0bc 005724b8 0 02baa254 3688/4096 dbgtrace Lwe 003fb2fd 02bad24c 00567688 250576030 02bab304 6832/8192 Logger Hsi 003ff455 02bb0344 0056ed50 10 02bae3cc 7856/8192 tcp_fast ...etc

The only field that is remotely useful here is the "Runtime" field which is the total CPU time that process has used since monitoring began (see The PIX show processes Command). According to Monitoring PIX Performance, the only way to determine which process is hogging the CPU is to run the command twice in succession, about a minute apart and to compare the two. This is difficult to say the least as there are 68 processes and the numbers are large and unwieldy. I have written a Perl script to compare two consecutive "show processes". You can download it from here or click here to use an on-line version.

The output of the script is something like this:-

arp_timer 0 FragDBGC 0 VAC poll 0 VAC statistics 0 dbgtrace 0 Logger 760 tcp_fast 0 tcp_slow 0 xlate clean 0 uxlate clean 0 tcp_intercept_timer_p 0 route_process 0 PIX Garbage Collector 40 isakmp_time_keeper 0 perfmon 0 IPsec timer handler 0 qos_metric_daemon 0 IP Background 0 pix/trace 0 pix/tconsole 0 ci/console 0 lu_ctl 0 update_cpu_usage 0 uauth_in 0 uauth_thread 0 udp_timer 0 i82543_timer 82680 557mcfix 0 557poll 174240 557timer 0 pix/intf0 70410 riprx/0 0 riptx/0 0 pix/intf1 160240 riprx/1 0 riptx/1 0 pix/intf2 0 riprx/2 0 ...

The number is the difference in "Runtime" value between the two consecutive "show processes". The trick is to then look for the one(s) with the big numbers. Some of them are supposed to be big (e.g. 557poll, i82543_timer - read the Cisco documents referenced above for details). In this particular case, the culprit is "pix/intf1" which is the inside interface on the PIX, indicating that the problem came from the inside.

References