<tt><font size=2>>I'm a bit surprised that 128-byte send sizes are faster
than 16K, but I <br>
>suppose if the system is small enough, or has a small enough page size
<br>
>(?) there may be some issues with ease of buffer allocation.<br>
</font></tt>
<br><tt><font size=2>Here is the meminfo output from both sides.</font></tt>
<br><tt><font size=2>For the server (8-core processor):</font></tt>
<br><tt><font size=2>/proc# cat meminfo</font></tt>
<br><tt><font size=2>MemTotal: 3762812 kB</font></tt>
<br><tt><font size=2>MemFree: 3576576 kB</font></tt>
<br><tt><font size=2>Buffers: 896 kB</font></tt>
<br><tt><font size=2>Cached: 8108 kB</font></tt>
<br><tt><font size=2>SwapCached: 0 kB</font></tt>
<br><tt><font size=2>Active: 4832 kB</font></tt>
<br><tt><font size=2>Inactive: 5068 kB</font></tt>
<br><tt><font size=2>Active(anon): 1304 kB</font></tt>
<br><tt><font size=2>Inactive(anon): 8 kB</font></tt>
<br><tt><font size=2>Active(file): 3528 kB</font></tt>
<br><tt><font size=2>Inactive(file): 5060 kB</font></tt>
<br><tt><font size=2>Unevictable: 0 kB</font></tt>
<br><tt><font size=2>Mlocked: 0 kB</font></tt>
<br><tt><font size=2>HighTotal: 3145724 kB</font></tt>
<br><tt><font size=2>HighFree: 2985268 kB</font></tt>
<br><tt><font size=2>LowTotal: 617088 kB</font></tt>
<br><tt><font size=2>LowFree: 591308 kB</font></tt>
<br><tt><font size=2>SwapTotal: 0 kB</font></tt>
<br><tt><font size=2>SwapFree: 0 kB</font></tt>
<br><tt><font size=2>Dirty: 0 kB</font></tt>
<br><tt><font size=2>Writeback: 0 kB</font></tt>
<br><tt><font size=2>AnonPages: 1092 kB</font></tt>
<br><tt><font size=2>Mapped: 1300 kB</font></tt>
<br><tt><font size=2>Shmem: 124 kB</font></tt>
<br><tt><font size=2>Slab: 17112 kB</font></tt>
<br><tt><font size=2>SReclaimable: 2124 kB</font></tt>
<br><tt><font size=2>SUnreclaim: 14988 kB</font></tt>
<br><tt><font size=2>KernelStack: 560 kB</font></tt>
<br><tt><font size=2>PageTables: 304 kB</font></tt>
<br><tt><font size=2>NFS_Unstable: 0 kB</font></tt>
<br><tt><font size=2>Bounce: 0 kB</font></tt>
<br><tt><font size=2>WritebackTmp: 0 kB</font></tt>
<br><tt><font size=2>CommitLimit: 1881404 kB</font></tt>
<br><tt><font size=2>Committed_AS: 3548 kB</font></tt>
<br><tt><font size=2>VmallocTotal: 241504 kB</font></tt>
<br><tt><font size=2>VmallocUsed: 135464 kB</font></tt>
<br><tt><font size=2>VmallocChunk: 105308 kB</font></tt>
<br><tt><font size=2>HugePages_Total: 0</font></tt>
<br><tt><font size=2>HugePages_Free: 0</font></tt>
<br><tt><font size=2>HugePages_Rsvd: 0</font></tt>
<br><tt><font size=2>HugePages_Surp: 0</font></tt>
<br><tt><font size=2>Hugepagesize: 4096 kB</font></tt>
<br>
<br><tt><font size=2>For the client (dual-core Xeon):</font></tt>
<br><font size=2 face="Courier New">cat /proc/meminfo </font>
<br><font size=2 face="Courier New">MemTotal: 1897676 kB</font>
<br><font size=2 face="Courier New">MemFree: 1449996 kB</font>
<br><font size=2 face="Courier New">Buffers: 16768 kB</font>
<br><font size=2 face="Courier New">Cached: 247420 kB</font>
<br><font size=2 face="Courier New">SwapCached: 0 kB</font>
<br><font size=2 face="Courier New">Active: 150448 kB</font>
<br><font size=2 face="Courier New">Inactive: 217108 kB</font>
<br><font size=2 face="Courier New">Active(anon): 105576 kB</font>
<br><font size=2 face="Courier New">Inactive(anon): 32 kB</font>
<br><font size=2 face="Courier New">Active(file): 44872 kB</font>
<br><font size=2 face="Courier New">Inactive(file): 217076 kB</font>
<br><font size=2 face="Courier New">Unevictable: 0 kB</font>
<br><font size=2 face="Courier New">Mlocked: 0 kB</font>
<br><font size=2 face="Courier New">HighTotal: 1178632 kB</font>
<br><font size=2 face="Courier New">HighFree: 782452 kB</font>
<br><font size=2 face="Courier New">LowTotal: 719044 kB</font>
<br><font size=2 face="Courier New">LowFree: 667544 kB</font>
<br><font size=2 face="Courier New">SwapTotal: 2104472 kB</font>
<br><font size=2 face="Courier New">SwapFree: 2104472 kB</font>
<br><font size=2 face="Courier New">Dirty: 136 kB</font>
<br><font size=2 face="Courier New">Writeback: 0 kB</font>
<br><font size=2 face="Courier New">AnonPages: 103412 kB</font>
<br><font size=2 face="Courier New">Mapped: 51612 kB</font>
<br><font size=2 face="Courier New">Shmem: 2240 kB</font>
<br><font size=2 face="Courier New">Slab: 21148 kB</font>
<br><font size=2 face="Courier New">SReclaimable: 12400 kB</font>
<br><font size=2 face="Courier New">SUnreclaim: 8748 kB</font>
<br><font size=2 face="Courier New">KernelStack: 1592 kB</font>
<br><font size=2 face="Courier New">PageTables: 2420 kB</font>
<br><font size=2 face="Courier New">NFS_Unstable: 0 kB</font>
<br><font size=2 face="Courier New">Bounce: 0 kB</font>
<br><font size=2 face="Courier New">WritebackTmp: 0 kB</font>
<br><font size=2 face="Courier New">CommitLimit: 3053308 kB</font>
<br><font size=2 face="Courier New">Committed_AS: 337596 kB</font>
<br><font size=2 face="Courier New">VmallocTotal: 122880 kB</font>
<br><font size=2 face="Courier New">VmallocUsed: 30804 kB</font>
<br><font size=2 face="Courier New">VmallocChunk: 83100 kB</font>
<br><font size=2 face="Courier New">HardwareCorrupted: 0 kB</font>
<br><font size=2 face="Courier New">HugePages_Total: 0</font>
<br><font size=2 face="Courier New">HugePages_Free: 0</font>
<br><font size=2 face="Courier New">HugePages_Rsvd: 0</font>
<br><font size=2 face="Courier New">HugePages_Surp: 0</font>
<br><font size=2 face="Courier New">Hugepagesize: 2048 kB</font>
<br><font size=2 face="Courier New">DirectMap4k: 10232 kB</font>
<br><font size=2 face="Courier New">DirectMap2M: 890880 kB</font>
<br><tt><font size=2><br>
>I might have guessed something to do with processor data cache <br>
>residency, but presumably the size of the send ring will be, by default,
<br>
>one more buffer than fits in the SO_SNDBUF size when the data socket
is <br>
>created - I suppose if the cache is &lt; 32KB the 128 B send case would
<br>
>have the buffer ring fit and 16KB sends would not. Testing that
would <br>
>call for HW counter information from the processor. Going back
through <br>
>the string you mentioned something with 32KB L1 data cache and a 128
KB <br>
>L2 cache - is that on both sides or just one?<br>
</font></tt>
<br><font size=2 face="sans-serif">For the P4080 (server), each of the 8 cores has a 32 kB L1 instruction cache, a 32 kB L1 data cache and a 128 kB L2 cache.</font>
<br><font size=2 face="sans-serif">For the dual-core Intel Xeon (client), each of the 2 cores has a 16 kB L1 instruction cache and a 16 kB L1 data cache, and there is a 6144 kB L2 cache.</font>
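<br><font size=2 face="sans-serif">To make the send-ring reasoning quoted above concrete, here is a minimal Python sketch of the arithmetic. It assumes a ring holding one more buffer than fits in SO_SNDBUF, as described above, and an SO_SNDBUF of 16384 bytes; that buffer size is my assumption, not a value measured on these machines. The 32 kB L1 data cache is the P4080 figure. The helper names ring_width and ring_bytes are hypothetical.</font>

```python
def ring_width(sndbuf: int, send_size: int) -> int:
    """Buffers in a send ring sized to hold one more buffer than fits in sndbuf."""
    return sndbuf // send_size + 1


def ring_bytes(sndbuf: int, send_size: int) -> int:
    """Total bytes touched when cycling through the whole send ring."""
    return ring_width(sndbuf, send_size) * send_size


SNDBUF = 16384        # assumed SO_SNDBUF in bytes; not measured in this thread
L1D = 32 * 1024       # per-core L1 data cache of the P4080

for send_size in (128, 16384):
    footprint = ring_bytes(SNDBUF, send_size)
    # "Fits" means strictly smaller than L1D, since other data also competes for the cache.
    verdict = "fits in" if footprint < L1D else "does not fit in"
    print(f"-m {send_size:>5}: {ring_width(SNDBUF, send_size):>3} buffers, "
          f"{footprint:>6} bytes, {verdict} a {L1D}-byte L1D")
```

<br><font size=2 face="sans-serif">With these assumed values the 128-byte ring spans 129 buffers (16512 bytes) while the 16 KB ring spans 2 buffers (32768 bytes), which would be consistent with only the 128-byte case staying inside the L1 data cache. Confirming that would still require the processor's hardware counters.</font>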
<br><tt><font size=2><br>
>Some experimentation with different socket buffer sizes, explicitly
set <br>
>with test-specific -s and -S might be a good thing.<br>
</font></tt>
<br><font size=2 face="sans-serif">I will try that, setting the socket buffer sizes explicitly with -s and -S.</font>
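<br><font size=2 face="sans-serif">Netperf's test-specific -s and -S options request the local and remote socket buffer sizes. The Python sketch below shows the equivalent socket calls for one endpoint and reads back what the kernel actually granted. The function name request_buffers and the sweep of sizes are hypothetical, and the doubling mentioned in the comments is Linux-specific behavior.</font>

```python
import socket


def request_buffers(sock: socket.socket, size: int) -> tuple[int, int]:
    """Request `size` bytes for SO_SNDBUF and SO_RCVBUF and return what the kernel reports."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    # On Linux the request is clamped to net.core.wmem_max / rmem_max and then
    # doubled for bookkeeping overhead, so always read the granted value back.
    snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return snd, rcv


if __name__ == "__main__":
    # Hypothetical sweep of sizes to try; the options are set before any connect().
    for size in (8192, 16384, 32768, 65536, 131072):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            snd, rcv = request_buffers(s, size)
            print(f"requested {size:>6}: SO_SNDBUF={snd}, SO_RCVBUF={rcv}")
```

<br><font size=2 face="sans-serif">Setting the sizes before connect() or listen() matters on the receive side, since the TCP window scale is negotiated during the handshake. The sizes netperf reports should be read as what the kernel granted, which may differ from what was requested.</font>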
<br><tt><font size=2><br>
>That the remote CPU utilization is coming back as a negative value
is <br>
>quite troubling and requires further investigation.<br>
><br>
>I wouldn't expect redirecting netperf output to cause a big performance
<br>
>change, certainly not one that would vary with the parms to netperf
- <br>
>the quantity of output will be the same whether -m is 128 or 16K or
<br>
>anything else.<br>
><br>
>happy benchmarking,<br>
><br>
>rick jones<br>
</font></tt>
<br>