[netperf-talk] global question concerning Netperf test and SMP support

Simon Duboue Simon.Duboue at ces.ch
Fri Mar 30 01:39:52 PDT 2012


Hello and thank you for your enthusiasm.
This helps me a lot.
Forgive me for the lack of command lines…

>rick: Tests against lo are only that - tests against lo.  I never can
>recall exactly where the looping-back takes place, but I know it includes
>no driver path.  I would consider it merely a measure of CPU performance.
>I suppose if loopback didn't do more than say 5 Gbit you wouldn't expect
>to get > 5 Gbit with a "real" NIC, but seeing say 24 Gbit/s does not
>guarantee one will get 10 Gbit/s through a 10GbE NIC.
>
>hangbin: I think lo test only affects the TCP/IP stack, no relation with
>NIC drivers.

OK, this could explain the low performance I see with the NIC.
I ran 'netperf -H 127.0.0.1' on my server and on my client; here are the
results:
Server: 
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 127.0.0.1 
(127.0.0.1) port 0 AF_INET
Recv   Send    Send 
Socket Socket  Message  Elapsed 
Size   Size    Size     Time     Throughput 
bytes  bytes   bytes    secs.    10^6bits/sec 

 87380  16384  16384    10.00    7040.84

Client:
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 127.0.0.1 
(127.0.0.1) port 0 AF_INET
Recv   Send    Send 
Socket Socket  Message  Elapsed 
Size   Size    Size     Time     Throughput 
bytes  bytes   bytes    secs.    10^6bits/sec 

 87380  16384  16384    10.00    20641.52 

Based on your considerations, it seems that my server will be the limiting side.
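Since the loopback test is essentially a CPU measurement, I suppose I could also repeat it with netperf's CPU-utilization reporting turned on, for example:

  netperf -H 127.0.0.1 -c -C

so the output also shows how much CPU the stack itself consumes (on loopback, local and remote are of course the same machine).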

>rick: I'm not sure that UDP sockets get autotuned.  They are what they
>are, and what netperf reports will be what they are.  What message size
>are you sending?
>
>You should look at per-CPU utilization, and the udp statistics in
>netstat -s output - particularly on the receiver.  For completeness you
>should also look at the ethtool -S statistics for the interfaces on
>either side.

>hangbin: Our TCP_STREAM and UDP_STREAM test could reach > 9.5G/s on local
>lab with 10G switch and NICs. you can try to enable gro or something
>else. And please paste your command lines and NIC drivers.

OK for the netstat and ethtool statistics. They are a good alternative to the CPU utilization reported by netperf, and I will look in that direction.
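If I understood correctly, the checks you suggest would look something like this (eth0 and the 1-second interval are only placeholders for my real interface and sampling period):

  netstat -s -u           # UDP counters, especially receive errors on the receiving side
  ethtool -S eth0         # per-NIC driver statistics, on both machines
  ethtool -k eth0         # check whether GRO (and the other offloads) are enabled
  ethtool -K eth0 gro on  # enable GRO, as hangbin suggested
  mpstat -P ALL 1         # per-CPU utilization while a test is running

Please correct me if these are not the statistics you had in mind.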
I use message sizes from 18 bytes to 8900 bytes with an MTU of 9000. Here are the results of a basic netperf run without changing the message size:

netperf -H ip_addr -t UDP_STREAM
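(For the runs at the other message sizes I simply add the test-specific send-size option, for example "netperf -H ip_addr -t UDP_STREAM -- -m 8900"; 8900 is just one of the sizes I try.)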

From client to server:
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
10.0.17.200 (10.0.17.200) port 0 AF_INET
Socket  Message  Elapsed      Messages 
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

112640   65507   10.00       82227      0    4309.12
108544           10.00       40416           2118.01

From server to client:
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
10.0.17.11 (10.0.17.200) port 0 AF_INET
Socket  Message  Elapsed      Messages 
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

108544   65507   10.00       89012      0    4664.69
112640           10.00       79607           4171.82

>> In TCP STREAM test, I also run two tests: a standard TCP STREAM and a
>> standard TCP MAERTS and the results are very different with a 10x ratio
>> for the TCP MAERTS. How is it possible?

>rick: In addition to repeating the things to check from above, Please
>provide the specific command lines being used.

Here are the results of a basic netperf test:

netperf -H ip_addr -t TCP_STREAM (or -t TCP_MAERTS)

From client to server:
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.17.200 
(10.0.17.200) port 0 AF_INET
Recv   Send    Send 
Socket Socket  Message  Elapsed 
Size   Size    Size     Time     Throughput 
bytes  bytes   bytes    secs.    10^6bits/sec 

 87380  16384  16384    10.19     738.73 

From server to client:
TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.17.200 
(10.0.17.200) port 0 AF_INET
Recv   Send    Send 
Socket Socket  Message  Elapsed 
Size   Size    Size     Time     Throughput 
bytes  bytes   bytes    secs.    10^6bits/sec 

 87380  16384  16384    10.00    4449.95 

How could it be faster in TCP than in UDP? Is my server really that limiting?
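If it helps, I can rerun both directions with the same -c/-C CPU-utilization reporting as above, for example:

  netperf -H 10.0.17.200 -t TCP_STREAM -c -C
  netperf -H 10.0.17.200 -t TCP_MAERTS -c -C

so we can see whether one side is saturating a CPU in the slow direction.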

>rick: Based on how I interpret your question, the TCP/IP stack is fully
>SMP.  However...  a single "flow" (eg TCP connection) will not make use
>of the services of more than one or possibly two CPUs on either end.
>One unless one binds the netperf/netserver to a CPU other than the one
>taking interrupts from the NIC.

OK for this, but I have read that it is better to have the TCP connection and the NIC interrupts on the same CPU, or the same group of CPUs, for memory-access locality.
On my server, the interrupts are spread across my 8 cores for architectural reasons.
On my client, the interrupts are handled by a single CPU.
Is it the spinlock that determines which core processes the TCP/IP stack?
One last question about the TCP/IP stack: TCP/IP input and output processing are distinct; could they, and should they, run on separate cores?
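For reference, here is how I understand the binding could be done on my side; the CPU numbers and the IRQ number are only placeholders for my real configuration:

  # bind netperf (local) to CPU 2 and netserver (remote) to CPU 3
  netperf -H 10.0.17.200 -t TCP_STREAM -T 2,3

  # or steer a given NIC interrupt to CPU 0 (IRQ 42 is hypothetical)
  echo 1 > /proc/irq/42/smp_affinity

Please tell me if this is not what you meant by binding netperf/netserver to a CPU.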

>happy benchmarking,

>rick jones

I hope this is clearer than my first message.

Thank you in advance and have a nice day.

Simon Duboué
