[netperf-talk] CPU Utilization Issues
Lentz, Benjamin A.
blentz at cswg.com
Mon Feb 8 08:51:54 PST 2010
> If you add a -c and -C option to netperf, what CPU utilization does it
> report?
I have contacted the folks necessary in order to bring down the
clustering services again for another test... We'll see. Do we need to
specify a [rate] value, or just leave it empty?
> BTW, is this the system on which netperf is running that is thought to
> be dead, or the one that is the target of the netperf test (ie where
> netserver is running)?
This is a tricky question. I believe the Oracle node eviction process
uses several mechanisms to determine node death, and we've seen all
combinations in our testing. Another problem we're facing is that in a
two-node cluster, it is difficult for the system to determine which node
is actually malfunctioning. Adding a third node (or using an odd number
of nodes) solves this problem, but we do not have the hardware for that
at this time.
> Perhaps. Can you provide information such as the output of
> /proc/interrupts and then /proc/irq/<irq>/smp_affinity for some of the
> NICs?
Sure, here's one of our systems.
$ cat /proc/interrupts | grep eth
59: 103 326966 1241556 1240838 PCI-MSI eth3
67: 47 302 458 3928039 PCI-MSI eth5
146: 206 0 3655016 4794 PCI-MSI eth0
178: 143 2179700 405245 224575 PCI-MSI eth1
210: 103 296887 1163469 1343220 PCI-MSI eth2
218: 34 655377 2258 767 PCI-MSI eth4
$ for irq in 59 67 146 178 210 218; do sudo cat /proc/irq/$irq/smp_affinity; done
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000008
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000008
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000004
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000002
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000004
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000002
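For anyone reading those masks: each smp_affinity value is a hexadecimal CPU bitmask, printed as comma-separated 32-bit words with the most significant word first, and bit N set means CPU N may service the IRQ. A minimal sketch of decoding one follows; the helper name mask_to_cpus is made up for illustration and is not part of any standard tool.

```shell
#!/bin/sh
# mask_to_cpus: hypothetical helper (not a standard utility).
# Takes one smp_affinity value and prints the CPU numbers whose bits
# are set, separated by spaces, lowest CPU first.
mask_to_cpus() {
    # Drop the commas so the mask is one long hex string.
    hex=$(printf '%s' "$1" | tr -d ',')
    cpu=0
    out=""
    # Walk the hex digits from the right (least significant) end.
    while [ -n "$hex" ]; do
        digit=${hex##"${hex%?}"}    # last remaining hex digit
        hex=${hex%?}                # everything before it
        val=$(( 0x$digit ))
        # Each hex digit covers four consecutive CPUs.
        for bit in 1 2 4 8; do
            if [ $(( val & bit )) -ne 0 ]; then
                out="$out${out:+ }$cpu"
            fi
            cpu=$(( cpu + 1 ))
        done
    done
    printf '%s\n' "$out"
}

# Example with a shortened mask (leading all-zero words omitted):
mask_to_cpus 00000000,00000008
```

Read this way, the six masks above put IRQs 59 and 67 (eth3, eth5) on CPU 3, IRQs 146 and 210 (eth0, eth2) on CPU 2, and IRQs 178 and 218 (eth1, eth4) on CPU 1.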
> Is the irqbalance service running?
Yes, it is.
> You are 100% certain they are running a UDP_STREAM and not a
> TCP_STREAM test? Are you using the exact same command line as they
> are? They haven't done something like ./configure --enable-intervals
> and "paced" the UDP_STREAM test?
Well, I have to believe what I am told, but yeah, the best information I
have available is that we're running the exact same command between the
two systems (one crashing Oracle, the other not). I have no clue about
the ./configure options used, but I've posed the question to them via
email.
> Out of curiosity, what led you to want to run a UDP_STREAM test over
> say a TCP_STREAM test?
FWIW, all the Oracle inter-node communication is done over UDP inside
the Oracle configuration.
> This is a good question. Our vendor is following Oracle Metalink
> Documents 810394.1 and 563566.1, which appear to make reference to
> netperf.
>
> Oddly enough, all TCP_STREAM testing the vendor did seems to work
> fine.
TCP_STREAM benefits from TCP's flow control. It will not run wild and
potentially fill the driver's transmit queue. Any other traffic will
then have a much better chance of actually getting through.
Versus a UDP_STREAM test, which has no flow control, so other,
non-netperf traffic (such as that used by Oracle to run OCFS2 or RAC
clustering) will have a *difficult* time getting through? This would
explain why TCP can co-exist and why UDP cannot.
> I love to see people using netperf. However I have to remind folks
> that netperf is a *performance benchmark* - it is *not* a functional
> test tool. Even though it can often uncover functional problems in a
> stack or NIC or whatnot.
>
> If the desire is to see that the interconnect is working properly, I
> would think that "successful" netperf tests without the clustering
> software running would be sufficient. Of course I am a networking
> guy, not a clustering or database type :)
Good answer :-)
Thanks for all your help thus far. It's greatly appreciated.