[netperf-talk] Which is the best hardware configuration for Netperf-Server?
Rick Jones
rick.jones2 at hp.com
Tue Oct 27 10:22:26 PDT 2009
Andrew Gallatin wrote:
> Frank Schuster wrote:
> > Hello,
> >
>> after I got netperf running successfully on a test system, I want to install a
>> large server running netserver (netperf 2.4.5) in my backbone.
>>
>> But my question is: what is the best hardware configuration, apart from the
>> network interface? The network interface can do more than 10 Gbit/s.
>
> I'm sorry, I cannot really understand what you are asking. If you're
> asking how to configure your NIC, usually the best answer is "don't
> configure it at all". Are you asking what arguments to pass to netperf?
Further, what is the goal of this netperf server? To allow those accessing it
to see how fast *their* connection to the server might be?
> > Does netperf support multicore processors?
>
> Netperf 2.x is single threaded. The netserver server forks a new
> child for each connection. This means you can run multiple processes
> easily. Note that netperf2 does not coordinate between multiple
> processes, so you need to be careful not to rely on the numbers
> produced by those multiple processes, as they may be inaccurate (say
> process A starts 0.5 seconds before process B -- it will have the
> network to itself for 0.5 seconds at the start, and B will have the
> network to itself for 0.5 seconds at the end, and your test will
> report overly optimistic bandwidth). Either use system tools
> (netstat -i 1) to calculate b/w from multiple processes, or use a
> different tool (netperf4, uperf, or iperf as a last resort) that was
> intended for multiple threads.
Netperf4 is indeed the intended tool for concurrent testing. However, there is a
way to (ab)use the confidence intervals functionality in netperf2 to be
reasonably confident that skew error from the lack of synchronization isn't
excessive.
http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Using-Netperf-to-Measure-Aggregate-Performance
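As a rough sketch of that approach ("remotehost" below is just a placeholder, and
the numbers are only for illustration), one can launch several concurrent streams,
each asked to iterate until it is confident of its own result, and then sum the
reported throughputs by hand:

  # four concurrent TCP_STREAM tests, each iterating between 3 and 10 times
  # until it hits a 99% confidence interval within +/- 2.5% of the mean;
  # -P 0 suppresses the test banners to keep the output manageable
  netperf -H remotehost -t TCP_STREAM -l 30 -i 10,3 -I 99,5 -P 0 &
  netperf -H remotehost -t TCP_STREAM -l 30 -i 10,3 -I 99,5 -P 0 &
  netperf -H remotehost -t TCP_STREAM -l 30 -i 10,3 -I 99,5 -P 0 &
  netperf -H remotehost -t TCP_STREAM -l 30 -i 10,3 -I 99,5 -P 0 &
  wait

If each instance manages to hit its confidence interval, skew error from the
unsynchronized starts and stops is unlikely to be terribly large.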
> I'll let Rick talk about netperf4. I'm too dumb to figure out how
> to use it, so even though I'm a netperf2 "expert", I've never used
> netperf4.
To be perfectly frank, I have trouble using netperf4 too :) It does rather need
some UI improvements. The XML config files are quite powerful, but a trifle
cumbersome.
> You can easily utilize more than one core by using netperf's CPU
> binding options (-T$locCPUnum,$remoteCPUnum) and binding the
> NIC's interrupt handler(s) to different cores than the
> netperf/netserver processes.
Expanding a bit, because I've been remiss in updating doc/netperf.texi to
properly document the global -T option:
-T N # bind netperf and netserver to CPU id N on their respective systems
-T N, # bind netperf only, let netserver run where it may
-T ,M # bind netserver only, let netperf run where it may
-T N,M # bind netperf to CPU N and netserver to CPU M
BTW, this is a netperf option, not a netserver option.
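So, by way of a sketch only - the IRQ number and CPU ids below are invented, and
the smp_affinity bit is Linux-specific:

  # keep the NIC's interrupts on CPU 0 (the mask is in hex; NN is the NIC's IRQ number)
  echo 1 > /proc/irq/NN/smp_affinity
  # then bind netperf to CPU 1 locally and netserver to CPU 2 remotely
  netperf -H remotehost -t TCP_STREAM -T 1,2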
> Or you can run multiple netperfs
> and use the binding options to bind them to different cores.
>
> > --> If yes, how can I configure netperf to use more than one processor?
> > --> If yes or no, how many MHz should each core have?
>
> End-station network performance as measured by netperf is not really
> about "MHz". It is more about memory bandwidth and apparent MTU size.
I would put it as "MTU size and CPU horsepower vs stack pathlength" but I agree
completely - avoid MHz. Alas, even that exposes a common bias in network
performance measurement: that bulk Mbit/s is the be-all and end-all, which does
not speak to latency :)[1] For network latency, "bandwidth" is perhaps not as
key, nor is MTU (at least when speaking of small-packet latency); what matters is
the relationship between the stack path-length, the CPU horsepower, and, these
days, the behaviour of the NIC/driver's interrupt coalescing implementation.
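Which is a long-winded way of saying that if latency is what one cares about, a
request/response test is the thing to run rather than a bulk-transfer test - for
example ("remotehost" again being a placeholder):

  # one transaction outstanding at a time, 1-byte request and 1-byte response;
  # the reported transaction rate is, roughly, the inverse of the round-trip latency
  netperf -H remotehost -t TCP_RR -l 30 -- -r 1,1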
> Eg, I can easily saturate a 10GbE link with standard frames on a 2GHz
> Nehalem Xeon using less than 1/2 of a CPU core when the OS and driver
> support TSO and LRO. Under the same load, a 3GHz P4 would choke because
> it doesn't have the memory bandwidth. Similarly, turn off TSO and
> LRO on the Nehalem, and you might be able to saturate the link, but it
> would take multiple parallel netperfs.
Each consuming all of one of the cores in the Nehalem :) Indeed, one should
eschew the Myth of Megahertz - when I "normalize" netperf results to something
else I tend to use either SPECint_2006 or SPECint_rate2006. I use that
benchmark as a proxy for "hardware horsepower."
> > How much RAM is preferred for a long-running test and, optionally,
> > more than one thread (I think up to 10 threads from one or more clients)?
>
> Netperf itself uses almost no memory. There was a discussion earlier
> on this list regarding how many buffers it uses. I've run
> multiple copies of netperf on machines with as little as 128MB of RAM.
Apart from what will be allocated in the stack for socket buffers, and what will
be used for the process stack, netperf/netserver will allocate as much as one
send/recv size-worth more than the size of the socket buffer. So, if there is a
128 KB SO_SNDBUF and netperf is sending 4KB at a time, it will allocate a "send
ring" of 132KB of 4KB buffers. This can be overridden with the global -W
option, specifying the number of buffers in the rings directly.
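In concrete terms, sticking with the illustrative numbers above ("remotehost" is a
placeholder):

  # ask for a 128KB local socket buffer and 4KB sends; netperf will size the send
  # ring at one send more than SO_SNDBUF on its own (33 x 4KB = 132KB here)
  netperf -H remotehost -t TCP_STREAM -- -s 131072 -m 4096

  # or pin the send and receive rings at, say, 8 buffers each with the global -W
  netperf -H remotehost -W 8,8 -t TCP_STREAM -- -s 131072 -m 4096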
As for long running tests, in theory there are no unplugged memory leaks, but of
course, writing software means never having to say you've found all the bugs :)
> > I think the hard disk is not so interesting, or am I wrong? I think
> > netperf needs only space on the hard disk for swapping.
>
> If you're using netperf -t TCP_SENDFILE and sending a file which is
> not cached in the server's memory, then the disk speed will matter.
> Otherwise, disk speed is largely irrelevant.
And since we are talking about "classic" netperf2 tests, not "omni", since we
are talking about the server side here, and since there is no TCP_ELIFDNES
(sendfile backwards) test (in contrast to the TCP_MAERTS test, which sends from
netserver to netperf), there will be no sendfile() calls on the central server.
So, indeed, apart from the needs/desires of the system's virtual memory subsystem
and its swap policies, there is no need for disc I/O performance in a netperf2
netserver.
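(For completeness, and with the caveat that the file name below is only a
placeholder, the sendfile() path gets exercised on the netperf side along these
lines:

  # sendfile() the named file from the netperf side to netserver
  netperf -H remotehost -t TCP_SENDFILE -F /path/to/somefile
  # bulk transfer the other way - netserver sending to netperf via plain send()s
  netperf -H remotehost -t TCP_MAERTS

so it is the netperf side's disc that might matter for TCP_SENDFILE, not the
netserver's.)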
happy benchmarking,
rick jones
> Best regards,
>
> Drew
> _______________________________________________
> netperf-talk mailing list
> netperf-talk at netperf.org
> http://www.netperf.org/cgi-bin/mailman/listinfo/netperf-talk
[1] gather 'round children, time for a story :) Long long ago, in an
organization far far away, there was a brand-new EISA FDDI NIC. Being FDDI,
this NIC provided an MTU three times that of a 10 Mbit/s Ethernet interface, and
coincidentally it provided three times the bandwidth on a netperf TCP_STREAM
test. All was happiness and joy, and the virtues of this NIC were touted far
and wide. "Three times the bandwidth! Upgrade your servers and go three times
faster!" was the cry from the Heralds of Marketing. Then John Q. Customer
upgraded his NFS server from 10Mbit/s EISA Ethernet to the whizzy new EISA FDDI
card. Much to his dismay and disgust his NFS server ran slower with the 3X
bandwidth EISA FDDI card. "How could this be!?!" demanded the customer. Well,
the Heralds of Marketing had overlooked something seen by the Network Wizard
while peering into his cauldron of performance results. They overlooked that
the EISA FDDI card also had three times the latency of the EISA 10 Mbit/s
Ethernet card, and that while everyone quoted MB/s figures for NFS Read and NFS
Write, they forgot that NFS serving was request/response and included more than
just reads and writes.