[netperf-talk] CPU Utilization Issues
Rick Jones
rick.jones2 at hp.com
Fri Feb 5 13:12:23 PST 2010
Lentz, Benjamin A. wrote:
> Greetings!
>
> I am working with one of my company's vendors on confirming the health
> of an Oracle RAC interconnect between two Red Hat 5.3 x86_64 systems
> (kernel 2.6.18-128.el5).
>
> We're seeing very high CPU utilization when performing the following
> test with netperf-2.4.0-rc2:
>
> netperf -l 5 -H 192.168.100.11 -i 10,2 -I 99,10 -t UDP_STREAM -- -m 1472
> -s 32768 -S 32768
>
> The CPU utilization is so high, in fact, that Oracle Clusterware begins
> node eviction procedures when this test is performed while Clusterware
> is running.
How high is "so high?"
The netperf UDP_STREAM test has no flow control of its own. Netperf
will sit there and send UDP datagrams as fast as it can. If there is no
intra-stack flow control, that alone will take a core to 100% CPU
utilization. Even on a stack with intra-stack flow control, if the path
length through the stack is long enough relative to the "oomph" of the
core, it will still go to 100% utilization on a core, because that
bottleneck is reached before the link itself.
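One way to put a number on "how high" is to have netperf measure CPU
utilization itself, via the -c (local) and -C (remote) global options -
for example, your original command with those two options added and
everything else carried over as-is:

  netperf -l 5 -H 192.168.100.11 -i 10,2 -I 99,10 -t UDP_STREAM -c -C \
    -- -m 1472 -s 32768 -S 32768

That will add local and remote CPU utilization, and the corresponding
service demand, to the results netperf reports.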
> System is an IBM 3850 M2 / x3950 M2 -[7141AC1]- with a single quad-core
> Intel Xeon CPU X7350 @ 2.93GHz and 32GB of memory. Ethernet interfaces
> are PCI-e Intel e1000 cards.
>
> Our vendor has another, much smaller system, with a single dual-core
> Intel CPU. However, when they run this exact same test in their
> environment, they do not have node eviction.
Does Oracle Clusterware try to send traffic through the same NIC while
the test is running?
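One quick way to check would be to watch the per-interface counters on
the interconnect NIC while Clusterware is up but netperf is not running
- for instance with sar from the sysstat package, or just by sampling
/proc/net/dev:

  sar -n DEV 1
  cat /proc/net/dev

If the interconnect NIC is also carrying Clusterware heartbeat traffic,
an unthrottled UDP_STREAM over that same NIC could easily crowd it out.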
> Interconnect is GbE, connected through Cisco gear. With Clusterware
> down, we get fairly good results (950 Mbit/s on a GbE) running netperf.
> We've also verified with fping, iperf, traceroute, ping, etc., and
> found good results.
>
> We are trying very desperately to answer these questions:
> - Why do we get node eviction on our IBM 3850 system and not on the
> vendor's dual-core system?
> - Does having netperf-2.4.0-rc2 versus netperf-2.4.5 make a difference?
> Do we need to upgrade netperf to the current version in order to get
> usable results in our environment?
> - How can we actually be confident in the stability, throughput, and
> latency of our testing?
Netperf can be asked to try to achieve a given level of confidence in
the result it reports with the -i and -I options. In your command,
-I 99,10 asks for 99% confidence that the reported mean lies within a
10% wide interval (i.e. +/- 5%), and -i 10,2 gives netperf between 2 and
10 test iterations to get there:
http://www.netperf.org/svn/netperf2/tags/netperf-2.4.5/doc/netperf.html#Global-Options
Out of curiosity, what led you to want to run a UDP_STREAM test rather
than, say, a TCP_STREAM test?
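If you do want a TCP point of comparison, something along these lines
should do it - TCP_STREAM has TCP's flow control behind it, so it will
not blast the link the way UDP_STREAM does. The hostname and socket
buffer sizes here are simply carried over from your command:

  netperf -l 60 -H 192.168.100.11 -i 10,2 -I 99,10 -t TCP_STREAM -c -C \
    -- -s 32768 -S 32768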
> - Can netperf certify its use on Red Hat 5.3 x86_64 or on IBM 3850
> hardware with Intel e1000 cards?
Not sure what you mean by certify. I have used netperf scores of times
under RHEL5.3 x86_64, with a number of different NICs, but not with an
IBM 3850.
> We've tried asking Red Hat about this, and they've reported that they do
> not support netperf. They suggested iperf.
I'll have to have a talk with some of the RHEL folks about that... :) I
do know there are at least a few folks at Red Hat who are quite fond of
netperf.
> We've tried asking Oracle about this, since it is their Metalink
> documentation that refers our vendor to netperf. In one instance,
> they report that if Red Hat doesn't support netperf, then they don't
> support it either. In another instance, they report that they find
> fault with the operating system, network interconnect, or network gear
> if the netperf testing does indeed indicate success in one environment
> and not the other.
>
> In order to eliminate the network piece, I retested the same UDP stream
> test against the localhost interface and witnessed very high CPU
> utilization. This was done with Oracle Clusterware down, so as not to
> cause node eviction and a reboot of the nodes.
>
> $ sudo ./netperf/netperf-2.4.0-rc2/src/netperf -l 60 -H 127.0.0.1 -i
> 10,2 -I 99,10 -t UDP_STREAM -- -m 1472 -s 32768 -S 32768
> UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 127.0.0.1 (127.0.0.1) port 0 AF_INET : +/-5.0% @ 99% conf.
>
> top - 11:25:38 up 28 days, 18:56, 7 users, load average: 1.10, 0.42, 0.25
> Tasks: 169 total, 3 running, 166 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.7%us, 25.8%sy, 0.0%ni, 56.6%id, 0.0%wa, 0.0%hi, 16.9%si,
> 0.0%st
> Mem: 32959572k total, 15081576k used, 17877996k free, 563688k buffers
> Swap: 33554424k total, 684k used, 33553740k free, 13854536k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 3008 root 25 0 6472 708 536 R 99.8 0.0 0:43.41 netperf
> 3009 root 15 0 6476 400 284 R 44.6 0.0 0:18.97 netserver
> 2229 root 15 0 12736 1196 820 S 0.3 0.0 0:00.53 top
> 1 root 15 0 10344 684 572 S 0.0 0.0 1:13.46 init
> 2 root RT -5 0 0 0 S 0.0 0.0 0:06.02 migration/0
> 3 root 34 19 0 0 0 S 0.0 0.0 0:04.21 ksoftirqd/0
> 4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
> 5 root RT -5 0 0 0 S 0.0 0.0 0:09.09 migration/1
> 6 root 34 19 0 0 0 S 0.0 0.0 0:02.18 ksoftirqd/1
>
That is pretty much "normal" - netperf has gone to 100% CPU utilization
on its core. Netserver might go there too, but over loopback some of the
"inbound" processing probably happened in netperf's context rather than
netserver's.
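If you want to see where the cycles go on a per-core basis, something
like the following should work - mpstat comes from the sysstat package,
and taskset is only there to pin netperf and netserver to known cores so
the utilization is easy to attribute:

  taskset -c 0 netserver
  taskset -c 1 netperf -l 60 -H 127.0.0.1 -t UDP_STREAM -- -m 1472 -s 32768 -S 32768
  mpstat -P ALL 1

The %sys and %soft columns in the mpstat output show how much of each
core is going to the stack and softirq processing rather than to user
space.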
> If there's anyone who's used netperf to confirm the health or status of
> a network interconnect, we'd like to inquire as to what version, what
> testing methods, and what hardware and operating system combination is
> being used.
>
> I understand that this utility does not come with commercial support,
> but our company has a substantial project riding on the build-out
> of this RAC cluster and our vendor will not proceed with the build
> without full undeniable verification of the network interconnect using
> the netperf utility. Even a successful netperf run is insufficient, it
> seems, given that we have a conflict with Oracle Clusterware that does
> not exist in our vendor's smaller environment.
>
> Thank you for your time.
>
>