[netperf-talk] netperf 2.4.4 on linux, socket size issue?

Andrew Gallatin gallatin at cs.duke.edu
Tue Apr 22 13:44:23 PDT 2008


mark wagner writes:
 > Hi Drew
 > 
 > Thanks for the quick response. I'm actually working on a paper on 10G 
 > tuning so things are at the defaults now so I can show the effects of 
 > the changes.  More comments in-line.
 > 
 > Andrew Gallatin wrote:
 > > mark wagner writes:
 > >  > above.  So, is the drop in performance because I'm really only using an 
 > >  > 8K buffer even though it's getting reported as 16K, or is something else 
 > >  > going on here that I'm oblivious to?
 > >
 > > The doubling is a red herring.  By default, linux will auto-tune the
 > > socket buffer sizes.  If the application sets the socket buffer size,
 > > this auto-tuning is disabled.   So in your -- -s 8K example, you're
 > > disabling the auto-tuning, and throttling transmit.
 > >
 > >   
 > So even though netperf reports the message and buffer sizes to be the 
 > same in both runs, should I assume that those are just incorrect and 
 > have no bearing on what is really going on?

Yes, as Rick has said.
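
Purely as an illustration (the hostname below is just a placeholder),
these two runs behave quite differently even when the reported sizes
look similar:

  # socket buffer sizes left alone: the stack autotunes them as needed
  netperf -H remotehost -t TCP_STREAM

  # -s/-S make netperf call setsockopt() with 8K: autotuning is
  # disabled and transmit is throttled
  netperf -H remotehost -t TCP_STREAM -- -s 8K -S 8K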

 > Please keep in mind that I need to be able to present these results and 
 > compare / contrast different values. So maybe it's best for me to 
 > explicitly specify the sizes on every run, and report the sizes I asked 
 > for and the measured throughput, rather than whatever netperf said the 
 > buffers were?

Using the omni version may help, as Rick suggested.
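
With the omni test (you will likely need a newer netperf than 2.4.4 on
both ends for that), you can ask for the requested and the final socket
buffer sizes explicitly.  A sketch, assuming the usual output selector
names are available in your build:

  # report throughput plus requested and ending send/receive buffer sizes
  netperf -H remotehost -t omni -- \
      -O THROUGHPUT,LSS_SIZE_REQ,LSS_SIZE_END,RSR_SIZE_END

That way the paper can show the value you asked for and the value the
stack actually ended up using, side by side.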

 > > BTW, there are some other settings you should tweak for good 10GbE
 > > performance (like increasing the default socket buffer limits).  
 > >
 > > Try adding the following lines to
 > > /etc/sysctl.conf and execute the command "sysctl -p /etc/sysctl.conf".
 > >
 > > net.core.rmem_max = 16777216
 > > net.core.wmem_max = 16777216
 > > net.ipv4.tcp_rmem = 4096 87380 16777216
 > > net.ipv4.tcp_wmem = 4096 65536 16777216
 > > net.core.netdev_max_backlog = 250000
 > >
 > > What NIC are you using?  4Gb/s is very low for 10GbE.
 > >   
 > While I shouldn't specify the vendors at this point, with the default 
 > packet / socket sizes, an MTU of 1500 and playing with the affinity I 
 > can get this cardset over 7G before I run out of CPU. Higher with more 
 > tricks and netperf instances....

If this is an Intel Core2 machine (e.g., a modern Xeon), Linux kernels
prior to 2.6.19 choose a suboptimal memory copy routine, which will
limit transmit bandwidth to something less than 8Gb/s.  To work around
this, you can:

1) Use sendfile (netperf -t TCP_SENDFILE -F /boot/vmlinuz).  This
uses the sendfile() system call (as web, ftp and smb servers do) to
eliminate copies on the transmit side; see the example invocations
below.

2) Upgrade to a 2.6.19 or newer kernel. 

3) Convince RedHat to apply the fix to their 2.6.18 series.
It is a 1 line fix, and does not break any ABIs:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=27fbe5b28a2ffef171c6005f304ea4f80fcdcc01

Since you work at RedHat, I'm hoping you might have some standing to
get this patch folded into their 2.6.18 kernels. 
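
For the sendfile comparison, something like the following pair of runs
(hostname and test length are placeholders; -c/-C just add local and
remote CPU utilization to the report) should make the copy overhead
visible:

  # ordinary copy-based send
  netperf -H remotehost -t TCP_STREAM -l 30 -c -C

  # sendfile()-based send, using any large file as the source
  netperf -H remotehost -t TCP_SENDFILE -F /boot/vmlinuz -l 30 -c -C

If the memcpy routine really is the bottleneck, the TCP_SENDFILE run
should show noticeably higher throughput and/or lower sender CPU
utilization.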

Also, if this is a Core2 Xeon, you should disable C1E ("enhanced"
idle).  

Last, if the vendor's NIC supports it (Intel, Myricom, and maybe
Chelsio AFAIK), you should enable DCA. 
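
A rough way to check whether DCA is in play (the exact log messages are
driver-specific, so treat this only as a sanity check):

  # is the generic dca module loaded at all?
  lsmod | grep dca

  # did the NIC driver say anything about DCA when it loaded?
  dmesg | grep -i dca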

Cheers,

Drew

