[netperf-talk] netperf port numbers

Rick Jones rick.jones2 at hp.com
Thu Aug 25 13:51:13 PDT 2011


On 08/25/2011 12:06 PM, Vishal Ahuja wrote:
> Hi Rick,
> Thanks! It was indeed the packet size causing the low throughput for
> netperf. When I changed it to 1470 bytes, I got decent throughput.

Good.

> I am observing an interesting phenomenon, but am not able to decipher
> why it could be happening. For all the scenarios below, irqbalance
> is off and there are two netperf flows.
> Scenario 1: Only a single core is enabled at the receiver. The
> throughput is very low and this is understandable. I see ksoftirqd/0
> running, which indicates that the core is overwhelmed because of the
> interrupts.
> Scenario 2: Now we have two cores (core0 and core1) active on the
> receiver. NIC interrupts have been affinitised to core0. Using the
> netperf client, I specified the core bindings so that both the
> netserver instances would run on core0. I was expecting the
> throughput to be the same as Scenario 1, but it is not so. Using *top*,
> I could see that both netserver instances were running on core1,
> not core0 (as specified at the time of execution). Can you please help
> me understand why this could be happening?

Looking at the netperf command lines in Scenario 2, I see nothing that 
would cause them to be bound to CPU 0.  While it is indeed true that 
Linux will try to launch a process on the same CPU as the one that 
initiated the wakeup, that is not a hard-and-fast rule, and an otherwise 
unbound process can run elsewhere.  So, seeing the netservers running on 
core 1 isn't really that big a surprise.

To force them to run on CPU 0, you should add a suitable *global* -T option.

Yes, you do have a -T option in the command lines, but you have placed 
it in the test-specific portion, which means it will not be parsed as an 
affinity-setting option.  You need to move it to the global portion of 
the command line, before the "--", and then it should behave as you 
expect.  The same applies to Scenario 3.
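For example, a sketch reusing your Scenario 2 invocation (addresses and 
parallellaunch.py as in your mail), with -T moved in front of the "--":

  ./parallellaunch.py \
    "netperf -H 192.168.0.8 -L 192.168.0.9 -t UDP_STREAM -l 30 -T 0,0 -- -m 1470" \
    "netperf -H 192.168.0.8 -L 192.168.0.9 -t UDP_STREAM -l 30 -T 0,0 -- -m 1470"

The first number binds the local netperf, the second the remote netserver; 
-T ,0 would bind just the netserver end.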

> Scenario 3: Is the same as Scenario 2, except the NIC interrupts have been
> affinitised to core1, and the core bindings have been specified so that
> netserver(s) run on core1. What is surprising is that now I see
> ksoftirqd/0 running. According to my understanding, shouldn't it be
> ksoftirqd/1 that is running? Sorry, this is more of a Linux question.

Soft interrupts raised by the stack, unless you have RPS going, target the 
current CPU, not another one.  So: take the hard interrupt on CPU0, do some 
quantity of work, find there is still more to do, launch the soft interrupt 
to get off the hard interrupt context, and move on.  At least that is my 
understanding.  Indeed, asking in a more Linux-specific forum would be the 
way to confirm that and see what details I got wrong.
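If you want to watch where the hard and soft interrupt work actually lands 
while a test is running, something along these lines should do (eth0 is a 
stand-in for whatever your 10G interface is called; mpstat comes from the 
sysstat package):

  # which CPU(s) are fielding the NIC's hard interrupts
  grep eth0 /proc/interrupts
  # per-CPU softirq counts - watch the NET_RX row climb
  watch -n 1 'cat /proc/softirqs'
  # per-CPU utilization, including %soft time
  mpstat -P ALL 1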

> Please find below the command lines (parallellaunch is a python script
> to launch multiple instances of a program in parallel):
> 
> _Scenario 1_
> vishal@vishal-desktop:~/Desktop$ ./parallellaunch.py "netperf
> -H 192.168.0.8 -L 192.168.0.9 -t UDP_STREAM -l 30 -- -m 1470" "netperf
> -H 192.168.0.8 -L 192.168.0.9 -t UDP_STREAM -l 30 -- -m 1470"
> UDP UNIDIRECTIONAL SEND TEST from 192.168.0.9 (192.168.0.9) port 0
> AF_INET to 192.168.0.8 (192.168.0.8) port 0 AF_INET : demo
> UDP UNIDIRECTIONAL SEND TEST from 192.168.0.9 (192.168.0.9) port 0
> AF_INET to 192.168.0.8 (192.168.0.8) port 0 AF_INET : demo
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
> 112640    1470   30.00     14361451      0    5629.69
> 112640           30.00         432              0.17
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
> 112640    1470   30.00     14257730      0    5589.03
> 112640           30.00         466              0.18
> 
> _Scenario 2_
> vishal@vishal-desktop:~/Desktop$ ./parallellaunch.py "netperf
> -H 192.168.0.8 -L 192.168.0.9 -t UDP_STREAM -l 30 -- -m 1470 -T0,0"
> "netperf -H 192.168.0.8 -L 192.168.0.9 -t UDP_STREAM -l 30 -- -m 1470 -T0,0"
> UDP UNIDIRECTIONAL SEND TEST from 192.168.0.9 (192.168.0.9) port 0
> AF_INET to 192.168.0.8 (192.168.0.8) port 0 AF_INET : demo
> UDP UNIDIRECTIONAL SEND TEST from 192.168.0.9 (192.168.0.9) port 0
> AF_INET to 192.168.0.8 (192.168.0.8) port 0 AF_INET : demo
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
> 112640    1470   30.00     14434023      0    5658.13
> 112640           30.00     8033032           3148.95
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
> 112640    1470   30.00     14471895      0    5672.98
> 112640           30.00     7985987           3130.51
> 
> _Scenario 3_
> vishal@vishal-desktop:~/Desktop$ ./parallellaunch.py "netperf
> -H 192.168.0.8 -L 192.168.0.9 -t UDP_STREAM -l 30 -- -m 1470 -T1,1"
> "netperf -H 192.168.0.8 -L 192.168.0.9 -t UDP_STREAM -l 30 -- -m 1470 -T1,1"
> UDP UNIDIRECTIONAL SEND TEST from 192.168.0.9 (192.168.0.9) port 0
> AF_INET to 192.168.0.8 (192.168.0.8) port 0 AF_INET : demo
> UDP UNIDIRECTIONAL SEND TEST from 192.168.0.9 (192.168.0.9) port 0
> AF_INET to 192.168.0.8 (192.168.0.8) port 0 AF_INET : demo
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
> 112640    1470   30.00     14004780      0    5489.87
> 112640           30.00     8072317           3164.35
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
> 112640    1470   30.00     14057073      0    5510.37
> 112640           30.00     7913977           3102.28
> 
> Thank you for your time,
> Vishal
>
>
> On Wed, Aug 24, 2011 at 4:39 PM, Rick Jones <rick.jones2 at hp.com> wrote:
>
>     On 08/24/2011 04:12 PM, Vishal Ahuja wrote:
>
>         Hi Rick,
>         I am running some netperf experiments using UDP and TCP over a
>         10 Gbps
>         link - the machines are connected back to back. Am only running
>         a single
>         netperf client, and on the sender side there are multiple cores
>         enabled.
>         A single TCP flow manages up to 6.5 Gbps, which is fine. When
>         using UDP,
>         the problem is that the throughput on the sender side is around 4.1
>         Gbps, but the throughput on the receive side is 0 Gbps. The same
>         experiment with iperf achieves around 2.35 Gbps on the receive side.
>         Using top, I observed that while the experiment was running,
>         netserver
>         was never scheduled on any of the CPUs. Even if I run it with a nice
>         value of -20, it does not get scheduled. Can you please help me
>         understand why this could be happening? My guess is that all the
>         traffic is being directed to a single core, which gets overwhelmed
>         by the interrupts, due to which the netserver application never
>         gets a chance. Is that correct? If yes, then why does it not happen
>         for TCP, considering that the RTT in my setup is negligible?
>
>
>     TCP has end-to-end flow control.  The TCP window advertised by the
>     receiver and honored by the sender, and the congestion window
>     calculated by the sender, both work to minimize the times when the
>     sender overwhelms the receiver.
>
>     UDP has no end-to-end flow control, and netperf at least makes no
>     attempt to provide any.  It does though offer a way to throttle
>     netperf if you included --enable-intervals in the ./configure prior
>     to compiling netperf.
>
>     Also, depending on the model of NIC, there can be more offloads for
>     TCP than UDP - such as Large Receive Offload or Generic Receive
>     Offload and TSO/GSO - which take advantage of TCP being a byte
>     stream protocol that does not need to preserve message boundaries.
>       Some NICs support UDP Fragmentation Offload, but I do not know if
>     there is a corresponding UDP Reassembly Offload.  If "UFO" is
>     supported on the sender, but no UDP fragment reassembly offload at
>     the receiver, then sending a maximum size UDP datagram that becomes
>     45 or so fragments/packets on the wire/fibre is not much more
>     expensive than sending a 1024 byte one that is only one packet on
>     the wire/fibre, but will have significantly greater overhead on the
>     receiver - what takes one trip down the protocol stack at the sender
>     is sort of 45 trips up the protocol stack at the receiver.
>
>     Even without UDP Fragmentation Offload, I believe that sending a
>     fragmented UDP datagram is cheaper than reassembling it, so there is
>     still a disparity.
>
>     As for iperf vs netperf, they probably default to different send
>     sizes - on my Linux system for example, where the default UDP socket
>     buffer is something like 128KB, netperf will send 65507 byte
>     messages.  I don't know if iperf sends messages that size by
>     default.  And, since you didn't provide any of the command lines :)
>     we don't know if you told them to use the same send size.
>
>     I doubt that the traffic of a single stream (iperf or netperf, UDP
>     or TCP or whatever) would ever be directed to more than one core -
>     things like receive side scaling or receive packet steering will hash
>     the headers, and if there is just the one flow, there will be just
>     the one queue/core used.  At times I have had some success
>     explicitly affinitising netperf/netserver to a core other than the
>     one taking interrupts from the NIC.  By default in Linux anyway, an
>     attempt is made to run the application on the same CPU as where the
>     wakeup happened.
>
>     happy benchmarking,
>
>     rick jones
>
>
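(A postscript on the --enable-intervals bit quoted above, in case you go 
down that path: once netperf has been rebuilt with intervals enabled, the 
global -b and -w options become available to pace the sender.  A rough 
sketch - the burst size and interval are placeholders to tune, and check 
the manual in case I am misremembering the units on -w:

  # rebuild netperf with pacing support
  ./configure --enable-intervals && make && make install

  # send bursts of 10 messages, waiting ~10 milliseconds between bursts
  netperf -H 192.168.0.8 -t UDP_STREAM -l 30 -b 10 -w 10 -- -m 1470

That keeps the UDP sender from simply firehosing the receiver.)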


