[netperf-talk] CPU Utilization Issues
Lentz, Benjamin A.
blentz at cswg.com
Fri Feb 5 10:21:32 PST 2010
Greetings!
I am working with one of my company's vendors on confirming the health
of an Oracle RAC interconnect between two Red Hat 5.3 x86_64 systems
(kernel 2.6.18-128.el5).
We're seeing very high CPU utilization when performing the following
test with netperf-2.4.0-rc2:
netperf -l 5 -H 192.168.100.11 -i 10,2 -I 99,10 -t UDP_STREAM -- -m 1472 -s 32768 -S 32768
The CPU utilization is so high, in fact, that Oracle Clusterware begins
node eviction procedures when this test is performed while Clusterware
is running.
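For anyone reproducing this, the options in the netperf command above can be spelled out roughly as follows. This is only a sketch: `build_netperf_cmd` is a hypothetical helper of my own, not part of netperf, and the option descriptions in the comments are my reading of the netperf manual.

```python
import shlex


def build_netperf_cmd(host, seconds=5, msg_bytes=1472, sock_bytes=32768):
    """Assemble the UDP_STREAM invocation from this post as an argv list.

    Hypothetical helper for illustration; it only builds the command line
    and does not run netperf.
    """
    return [
        "netperf",
        "-l", str(seconds),      # test length in seconds
        "-H", host,              # remote host running netserver
        "-i", "10,2",            # max,min iterations for the confidence interval
        "-I", "99,10",           # 99% confidence level, 10% wide interval (+/-5%)
        "-t", "UDP_STREAM",      # test type
        "--",                    # test-specific options follow
        "-m", str(msg_bytes),    # send message size in bytes
        "-s", str(sock_bytes),   # local socket buffer size in bytes
        "-S", str(sock_bytes),   # remote socket buffer size in bytes
    ]


cmd = build_netperf_cmd("192.168.100.11")
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run` on a host where netperf is installed; `shlex.join` needs Python 3.8 or later.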
The system is an IBM 3850 M2 / x3950 M2 -[7141AC1]- with a single quad-core
Intel Xeon CPU X7350 @ 2.93GHz and 32GB of memory. The Ethernet interfaces
are PCI-e Intel e1000 cards.
Our vendor has another, much smaller system, with a single dual-core
Intel CPU. However, when they run this exact same test in their
environment, they do not have node eviction.
The interconnect is GbE, connected through Cisco gear. With Clusterware
down, we get fairly good results (950 Mbit/s on GbE) running netperf. We've
also verified with fping, iperf, traceroute, ping, etc., and found good
results.
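As a sanity check on the 950 Mbit/s figure, here is a back-of-the-envelope calculation of the best UDP payload rate a GbE link could carry. It is a sketch under assumptions that I have not verified on our network: a 1500-byte MTU, IPv4 headers without options, no VLAN tag, and standard Ethernet framing overhead.

```python
# Assumed values (not measured on our hardware).
MTU = 1500          # bytes of IP packet per Ethernet frame
IP_HEADER = 20      # IPv4 header without options
UDP_HEADER = 8

# Largest UDP payload that fits in one unfragmented frame.
payload = MTU - IP_HEADER - UDP_HEADER

# Per-frame Ethernet overhead on the wire:
# 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap.
ETH_OVERHEAD = 14 + 4 + 8 + 12
wire_bytes = MTU + ETH_OVERHEAD

LINK_MBIT = 1000    # GbE line rate
max_payload_mbit = LINK_MBIT * payload / wire_bytes

print(payload)                      # 1472
print(round(max_payload_mbit, 1))   # 957.1
```

Under those assumptions the payload size matches the -m 1472 used in the test, and 950 Mbit/s is roughly 99% of the theoretical ceiling, so the link itself appears to be running close to line rate.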
We are trying very desperately to answer these questions:
- Why do we get node eviction on our IBM 3850 system and not on the
vendor's dual-core system?
- Does having netperf-2.4.0-rc2 versus netperf-2.4.5 make a difference?
Do we need to upgrade netperf to the current version in order to get
usable results in our environment?
- How can we actually be confident in the stability, throughput, and
latency of our testing?
- Can netperf certify its use on Red Hat 5.3 x86_64 or on IBM 3850
hardware with Intel e1000 cards?
We've tried asking Red Hat about this, and they've reported that they do
not support netperf. They suggested iperf.
We've tried asking Oracle about this, since it is their Metalink
documentation that refers our vendor to netperf. In one instance,
they reported that if Red Hat doesn't support netperf, then they don't
support it either. In another instance, they reported that they would find
fault with the operating system, network interconnect, or network gear
if the netperf testing indicates success in one environment and not the
other.
In order to eliminate the network piece, I retested the same UDP stream
test against the localhost interface and witnessed very high CPU
utilization. This was done with Oracle Clusterware down, so as not to cause
node eviction and a reboot of the nodes.
$ sudo ./netperf/netperf-2.4.0-rc2/src/netperf -l 60 -H 127.0.0.1 -i 10,2 -I 99,10 -t UDP_STREAM -- -m 1472 -s 32768 -S 32768
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 127.0.0.1 (127.0.0.1) port 0 AF_INET : +/-5.0% @ 99% conf.
top - 11:25:38 up 28 days, 18:56, 7 users, load average: 1.10, 0.42, 0.25
Tasks: 169 total, 3 running, 166 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.7%us, 25.8%sy, 0.0%ni, 56.6%id, 0.0%wa, 0.0%hi, 16.9%si, 0.0%st
Mem: 32959572k total, 15081576k used, 17877996k free, 563688k buffers
Swap: 33554424k total, 684k used, 33553740k free, 13854536k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3008 root      25   0  6472  708  536 R 99.8  0.0   0:43.41 netperf
 3009 root      15   0  6476  400  284 R 44.6  0.0   0:18.97 netserver
 2229 root      15   0 12736 1196  820 S  0.3  0.0   0:00.53 top
    1 root      15   0 10344  684  572 S  0.0  0.0   1:13.46 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:06.02 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:04.21 ksoftirqd/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    5 root      RT  -5     0    0    0 S  0.0  0.0   0:09.09 migration/1
    6 root      34  19     0    0    0 S  0.0  0.0   0:02.18 ksoftirqd/1
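To relate the summary line to the per-process figures in the top output above, the numbers can be converted into "cores busy". This is a rough sketch that assumes four logical CPUs (one quad-core X7350) and that top is in its default mode, where per-process %CPU is relative to a single CPU while the Cpu(s) line is averaged over all of them.

```python
import re

# Summary line copied from the top output in this post.
cpu_line = ("Cpu(s): 0.7%us, 25.8%sy, 0.0%ni, 56.6%id, "
            "0.0%wa, 0.0%hi, 16.9%si, 0.0%st")

# Map each state name (us, sy, id, ...) to its percentage.
fields = {name: float(val)
          for val, name in re.findall(r"([\d.]+)%(\w+)", cpu_line)}

NUM_CPUS = 4  # assumption: single quad-core CPU, no hyperthreading

# Everything that is neither idle nor waiting on I/O counts as busy.
busy_pct = 100.0 - fields["id"] - fields["wa"]
busy_cores = busy_pct / 100 * NUM_CPUS

# Per-process %CPU for netperf and netserver, from the process list.
proc_cores = (99.8 + 44.6) / 100

print(round(busy_cores, 2))   # 1.74
print(round(proc_cores, 2))   # 1.44
```

Read this way, the 56.6% idle in the summary still corresponds to netperf keeping one core essentially saturated, with netserver taking a bit under half of another.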
If there's anyone who's used netperf to confirm the health or status of
a network interconnect, we'd like to inquire as to what version, what
testing methods, and what hardware and operating system combination is
being used.
I understand that this utility does not come with commercial support,
but our company has a substantial project riding on the build-out of
this RAC cluster and our vendor will not proceed with the build without
full, undeniable verification of the network interconnect using the
netperf utility. Even a successful netperf run is insufficient, it
seems, given that we have a conflict with Oracle Clusterware that does
not exist in our vendor's smaller environment.
Thank you for your time.