[netperf-talk] global question concerning Netperf test and SMP support
Simon Duboue
Simon.Duboue at ces.ch
Fri Mar 30 01:39:52 PDT 2012
Hello, and thank you for your enthusiastic replies.
They help me a lot.
Forgive me for the lack of command lines in my first message…
>rick: Tests against lo are only that - tests against lo. I never can recall
>exactly where the looping-back takes place, but I know it includes no
>driver path. I would consider it merely a measure of CPU performance.
>I suppose if loopback didn't do more than say 5 Gbit you wouldn't expect
>to get > 5 Gbit with a "real" NIC, but seeing say 24 Gbit/s does not
>guarantee one will get 10 Gbit/s through a 10GbE NIC.
>
>hangbin: I think lo test only affects the TCP/IP stack, no relation with NIC
>drivers.
Ok, this could explain the low performance I am seeing with the NIC.
I ran 'netperf -H 127.0.0.1' on both my server and my client; here are the
results:
Server:
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 127.0.0.1 (127.0.0.1) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    7040.84
Client:
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 127.0.0.1 (127.0.0.1) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00   20641.52
Going by your reasoning, it seems my server will be the limiting side.
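If it helps, I can re-run the loopback test with netperf's CPU utilization
reporting enabled (just the standard -c/-C options, nothing specific to my
setup), to see whether a single saturated core is what bounds these numbers:

netperf -H 127.0.0.1 -c -C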
>rick: I'm not sure that UDP sockets get autotuned. They are what they are,
>and what netperf reports will be what they are. What message size are
>you sending?
>
>You should look at per-CPU utilization, and the udp statistics in
>netstat -s output - particularly on the receiver. For completeness you
>should also look at the ethtool -S statistics for the interfaces on
>either side.
>hangbin: Our TCP_STREAM and UDP_STREAM test could reach > 9.5G/s on local
>lab with 10G switch and NICs. you can try to enable gro or something else.
>And please paste your command lines and NIC drivers.
Ok for the netstat and ethtool statistics; they are a good complement to the
CPU utilization reported by netperf. I will look in that direction.
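Concretely, I plan to capture something like the following on both sides
(eth0 here is only a placeholder for my actual 10GbE interface name):

netstat -s        (the Udp: counters, especially receive errors on the receiver)
ethtool -S eth0   (driver-specific per-interface counters)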
I use message sizes from 18 bytes to 8900 bytes with an MTU of 9000. Here are
the results of a basic netperf run without changing the message size:
netperf -H ip_addr -t UDP_STREAM
From client to server:
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.17.200 (10.0.17.200) port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

112640   65507   10.00       82227      0    4309.12
108544           10.00       40416           2118.01
From server to client:
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.17.11 (10.0.17.200) port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

108544   65507   10.00       89012      0    4664.69
112640           10.00       79607           4171.82
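For the sweep from 18 to 8900 bytes I simply vary the UDP message size with
the test-specific -m option, for example (10.0.17.200 being my server):

netperf -H 10.0.17.200 -t UDP_STREAM -- -m 8900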
>> In TCP STREAM test, I also run two tests: a standard TCP STREAM and a
>> standard TCP MAERTS and the results are very different with a 10x ratio
>> for the TCP MAERTS. How is it possible?
>rick: In addition to repeating the things to check from above, please
>provide the specific command lines being used.
Here are the results of a basic netperf test:
netperf -H ip_addr -t TCP_STREAM (or -t TCP_MAERTS)
From client to server:
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.17.200 (10.0.17.200) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.19     738.73
From server to client:
TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.17.200 (10.0.17.200) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    4449.95
How can TCP be faster than UDP here? Is my server really that much of a
bottleneck?
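To dig into the TCP_STREAM vs TCP_MAERTS gap I can also re-run both directions
with CPU utilization reporting and explicitly requested socket buffers (the
262144/65536 values below are only a guess of mine to take autotuning out of
the picture, not something anyone recommended):

netperf -H 10.0.17.200 -t TCP_STREAM -c -C -- -s 262144 -S 262144 -m 65536
netperf -H 10.0.17.200 -t TCP_MAERTS -c -C -- -s 262144 -S 262144 -m 65536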
>rick: Based on how I interpret your question, the TCP/IP stack is fully SMP.
>However... a single "flow" (eg TCP connection) will not make use of the
>services of more than one or possibly two CPUs on either end. One,
>unless one binds the netperf/netserver to a CPU other than the one
>taking interrupts from the NIC.
Ok for this, but I read that it is better to have the TCP connection and the
NIC interrupts handled on the same CPU, or the same group of CPUs, for
memory-access locality.
On my server, the interrupts are spread across my 8 cores for architectural
reasons.
On my client, the interrupts are handled by a single CPU.
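To experiment with this, I will probably try pinning netperf and netserver to
chosen CPUs with the global -T option and compare against where the NIC
interrupts land (CPU number 2 below is only an example, and <irq> stands for
whichever IRQ line the NIC actually uses):

netperf -H 10.0.17.200 -T 2,2 -t TCP_STREAM
cat /proc/interrupts
cat /proc/irq/<irq>/smp_affinity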
Is it a spinlock that determines which core processes the TCP/IP stack?
One last question about the TCP/IP stack: the input and output paths are
distinct, so could and should they run on separate cores?
>happy benchmarking,
>rick jones
I hope this is clearer than my first message.
Thank you in advance and have a nice day.
Simon Duboué