<font size=2 face="Courier New">Hello and thank you for your enthusiasm.</font>
<br><font size=2 face="Courier New">This helps me a lot.</font>
<br><font size=2 face="Courier New">Forgive me for the lack of command
lines…</font>
<br>
<br><font size=2 face="Courier New">>rick: Tests against lo are only
that - tests against lo. I never can recall </font>
<br><font size=2 face="Courier New">>exactly where the looping-back
takes place, but I know it includes no </font>
<br><font size=2 face="Courier New">>driver path. I would consider
it merely a measure of CPU performance. </font>
<br><font size=2 face="Courier New">>I suppose if loopback didn't do
more than say 5 Gbit you wouldn't expect </font>
<br><font size=2 face="Courier New">>to get > 5 Gbit with a "real"
NIC, but seeing say 24 Gbit/s does not </font>
<br><font size=2 face="Courier New">>guarantee one will get 10 Gbit/s
through a 10GbE NIC.</font>
<br><font size=2 face="Courier New">></font>
<br><font size=2 face="Courier New">>hangbin: I think lo test only affects
the TCP/IP stack, no relation with NIC</font>
<br><font size=2 face="Courier New">>drivers.</font>
<br>
<br><font size=2 face="Courier New">Ok, this consideration could be an
answer to my low performance with the NIC.</font>
<br><font size=2 face="Courier New">I perform ‘netperf –H 127.0.0.1’
in my host and in my client, here are the results:</font>
<br><font size=2 face="Courier New">Server: </font>
<br><font size=2 face="Courier New">TCP STREAM TEST from 0.0.0.0 (0.0.0.0)
port 0 AF_INET to 127.0.0.1 (127.0.0.1) port 0 AF_INET</font>
<br><font size=2 face="Courier New">Recv Send Send
</font>
<br><font size=2 face="Courier New">Socket Socket Message Elapsed
</font>
<br><font size=2 face="Courier New">Size Size Size
Time Throughput </font>
<br><font size=2 face="Courier New">bytes bytes bytes
secs. 10^6bits/sec </font>
<br>
<br><font size=2 face="Courier New"> 87380 16384 16384
10.00 7040.84</font>
<br>
<br><font size=2 face="Courier New">Client:</font>
<br><font size=2 face="Courier New">TCP STREAM TEST from 0.0.0.0 (0.0.0.0)
port 0 AF_INET to 127.0.0.1 (127.0.0.1) port 0 AF_INET</font>
<br><font size=2 face="Courier New">Recv Send Send
</font>
<br><font size=2 face="Courier New">Socket Socket Message Elapsed
</font>
<br><font size=2 face="Courier New">Size Size Size
Time Throughput </font>
<br><font size=2 face="Courier New">bytes bytes bytes
secs. 10^6bits/sec </font>
<br>
<br><font size=2 face="Courier New"> 87380 16384 16384
10.00 20641.52 </font>
<br>
<br><font size=2 face="Courier New">It seems that my server will be limiting
under your considerations.</font>
<br>
<br><font size=2 face="Courier New">>rick: I'm not sure that UDP sockets
get autotuned. They are what they are, </font>
<br><font size=2 face="Courier New">>and what netperf reports will be
what they are. What message size are </font>
<br><font size=2 face="Courier New">>you sending?</font>
<br><font size=2 face="Courier New">></font>
<br><font size=2 face="Courier New">>You should look at per-CPU utilization,
and the udp statistics in </font>
<br><font size=2 face="Courier New">>netstat -s output - particularly
on the receiver. For completeness you </font>
<br><font size=2 face="Courier New">>should also look at the ethtool
-S statistics for the interfaces on </font>
<br><font size=2 face="Courier New">>either side.</font>
<br>
<br><font size=2 face="Courier New">>hangbin: Our TCP_STREAM and UDP_STREAM
test could reach > 9.5G/s on local >lab with 10G switch and NICs.
you can try to enable gro or something else.</font>
<br><font size=2 face="Courier New">>And please paste your command lines
and NIC drivers.</font>
<br>
<br><font size=2 face="Courier New">Ok for the nestat and ethtool stat.
This is a good alternative to CPU utilization provided by Netperf. I will
watch in this direction.</font>
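
Concretely, I plan to capture these on both sides while a test is running (eth0 stands in for my actual 10GbE interface, and mpstat comes from the sysstat package):

netstat -su       # UDP statistics, especially receive errors on the receiving side
ethtool -S eth0   # per-interface driver/hardware counters
mpstat -P ALL 1   # per-CPU utilization, refreshed every second
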
<br><font size=2 face="Courier New">I use packet size from 18 bytes to
8900 with a MTU of 9000. Here are the results of a basic Netperf without
changing packet size:</font>

netperf -H ip_addr -t UDP_STREAM

<br><font size=2 face="Courier New">From client to server:</font>
<br><font size=2 face="Courier New">UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0
(0.0.0.0) port 0 AF_INET to 10.0.17.200 (10.0.17.200) port 0 AF_INET</font>
<br><font size=2 face="Courier New">Socket Message Elapsed
Messages
</font>
<br><font size=2 face="Courier New">Size Size
Time Okay Errors Throughput</font>
<br><font size=2 face="Courier New">bytes bytes secs
# #
10^6bits/sec</font>
<br>
<br><font size=2 face="Courier New">112640 65507 10.00
82227 0 4309.12</font>
<br><font size=2 face="Courier New">108544
10.00 40416
2118.01</font>
<br>
<br><font size=2 face="Courier New">From server to client:</font>
<br><font size=2 face="Courier New">UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0
(0.0.0.0) port 0 AF_INET to 10.0.17.11 (10.0.17.200) port 0 AF_INET</font>
<br><font size=2 face="Courier New">Socket Message Elapsed
Messages
</font>
<br><font size=2 face="Courier New">Size Size
Time Okay Errors Throughput</font>
<br><font size=2 face="Courier New">bytes bytes secs
# #
10^6bits/sec</font>
<br>
<br><font size=2 face="Courier New">108544 65507 10.00
89012 0 4664.69</font>
<br><font size=2 face="Courier New">112640
10.00 79607
4171.82</font>
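
For the other message sizes, I vary the send size with the test-specific -m option (after the "--" separator); for example, for the 8900-byte case:

netperf -H ip_addr -t UDP_STREAM -- -m 8900
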

>> In TCP STREAM test, I also run two tests: a standard TCP STREAM and a
>> standard TCP MAERTS and the results are very different with a 10x ratio
>> for the TCP MAERTS. How is it possible?

<br><font size=2 face="Courier New">>rick: In addition to repeating
the things to check from above, Please provide the specific command lines
being used.</font>
<br>
<br><font size=2 face="Courier New">Here are the results of a basic Netperf
test:</font>
<br>
<br><font size=2 face="Courier New">Netperf –H ip_addr –t TCP_STREAM
(or –t TCP_MAERTS)</font>
<br>
<br><font size=2 face="Courier New">From client to server</font>
<br><font size=2 face="Courier New">TCP STREAM TEST from 0.0.0.0 (0.0.0.0)
port 0 AF_INET to 10.0.17.200 (10.0.17.200) port 0 AF_INET</font>
<br><font size=2 face="Courier New">Recv Send Send
</font>
<br><font size=2 face="Courier New">Socket Socket Message Elapsed
</font>
<br><font size=2 face="Courier New">Size Size Size
Time Throughput </font>
<br><font size=2 face="Courier New">bytes bytes bytes
secs. 10^6bits/sec </font>
<br>
<br><font size=2 face="Courier New"> 87380 16384 16384
10.19 738.73 </font>
<br>
<br><font size=2 face="Courier New">From server to client:</font>
<br><font size=2 face="Courier New">TCP MAERTS TEST from 0.0.0.0 (0.0.0.0)
port 0 AF_INET to 10.0.17.200 (10.0.17.200) port 0 AF_INET</font>
<br><font size=2 face="Courier New">Recv Send Send
</font>
<br><font size=2 face="Courier New">Socket Socket Message Elapsed
</font>
<br><font size=2 face="Courier New">Size Size Size
Time Throughput </font>
<br><font size=2 face="Courier New">bytes bytes bytes
secs. 10^6bits/sec </font>
<br>
<br><font size=2 face="Courier New"> 87380 16384 16384
10.00 4449.95 </font>
<br>
<br><font size=2 face="Courier New">How could it be faster in TCP than
in UDP... Does my server is so limiting?</font>
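
To narrow this down, I will rerun both tests with netperf's CPU utilization reporting enabled (-c for the local side, -C for the remote side), which should show whether one end is saturating a core:

netperf -H ip_addr -t TCP_STREAM -c -C
netperf -H ip_addr -t TCP_MAERTS -c -C
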

> rick: Based on how I interpret your question, the TCP/IP stack is fully SMP.
> However... a single "flow" (eg TCP connection) will not make use of the
> services of more than one or possibly two CPUs on either end. One
> unless one binds the netperf/netserver to a CPU other than the one
> taking interrupts from the NIC.

<br><font size=2 face="Courier New">Ok for this, but I read that it is
better to get the TCP connection and the NIC interrupts on the same CPU
or group of CPU for memory access.</font>
<br><font size=2 face="Courier New">For my server, the interrupts are shared
out between my 8 cores due to architecture considerations.</font>
<br><font size=2 face="Courier New">For my client, the interrupts are located
on a single CPU.</font>
<br><font size=2 face="Courier New">Is it the spinlock which determines
which core processes TCP/IP stack? </font>
<br><font size=2 face="Courier New">A last question concerning TCP/IP stack:
TCP/IP input and TCP/IP output are distinct, could and should they run
in a separate core?</font>
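
If I understand the manual correctly, netperf's global -T option binds netperf and netserver to the given local/remote CPU ids, so I could test your suggestion like this (the CPU number is just an example, and eth0 again stands in for my real interface):

netperf -H ip_addr -t TCP_STREAM -T 2,2    # bind netperf locally and netserver remotely to CPU 2
grep eth0 /proc/interrupts                 # check which cores are taking the NIC interrupts
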

> happy benchmarking,
>
> rick jones

I hope this is clearer than my first message.

Thank you in advance and have a nice day.

Simon Duboué