[netperf-talk] global question concerning Netperf test and SMP support

Simon Duboue Simon.Duboue at ces.ch
Wed Apr 18 01:17:00 PDT 2012


>I'm a bit surprised that 128 byte send sizes is faster than 16K, but I 
>suppose if the system is small enough, or has a small enough page size 
>(?) there may be some issues with ease of buffer allocation.

Here is the /proc/meminfo output from both sides.
For the server (8-core processor):
/proc# cat meminfo
MemTotal:        3762812 kB
MemFree:         3576576 kB
Buffers:             896 kB
Cached:             8108 kB
SwapCached:            0 kB
Active:             4832 kB
Inactive:           5068 kB
Active(anon):       1304 kB
Inactive(anon):        8 kB
Active(file):       3528 kB
Inactive(file):     5060 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:       3145724 kB
HighFree:        2985268 kB
LowTotal:         617088 kB
LowFree:          591308 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:          1092 kB
Mapped:             1300 kB
Shmem:               124 kB
Slab:              17112 kB
SReclaimable:       2124 kB
SUnreclaim:        14988 kB
KernelStack:         560 kB
PageTables:          304 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1881404 kB
Committed_AS:       3548 kB
VmallocTotal:     241504 kB
VmallocUsed:      135464 kB
VmallocChunk:     105308 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       4096 kB

For the client (Xeon dual core):
cat /proc/meminfo 
MemTotal:        1897676 kB
MemFree:         1449996 kB
Buffers:           16768 kB
Cached:           247420 kB
SwapCached:            0 kB
Active:           150448 kB
Inactive:         217108 kB
Active(anon):     105576 kB
Inactive(anon):       32 kB
Active(file):      44872 kB
Inactive(file):   217076 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:       1178632 kB
HighFree:         782452 kB
LowTotal:         719044 kB
LowFree:          667544 kB
SwapTotal:       2104472 kB
SwapFree:        2104472 kB
Dirty:               136 kB
Writeback:             0 kB
AnonPages:        103412 kB
Mapped:            51612 kB
Shmem:              2240 kB
Slab:              21148 kB
SReclaimable:      12400 kB
SUnreclaim:         8748 kB
KernelStack:        1592 kB
PageTables:         2420 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     3053308 kB
Committed_AS:     337596 kB
VmallocTotal:     122880 kB
VmallocUsed:       30804 kB
VmallocChunk:      83100 kB
HardwareCorrupted:     0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       10232 kB
DirectMap2M:      890880 kB
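The quoted question mentions page size, but /proc/meminfo does not report the base page size. Hugepagesize (4096 kB on the server, 2048 kB on the client) is the huge page size, and HugePages_Total is 0 on both sides, so huge pages are not in use. The sketch below is a minimal illustration, assuming a Linux host with Python 3. It parses meminfo-style text and reads the base page size through os.sysconf. The helper names parse_meminfo and base_page_size are hypothetical.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Sized fields are in kB; HugePages_* fields are plain page counts.
    """
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank lines and shell-prompt lines such as "cat /proc/meminfo"
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def base_page_size():
    # meminfo only reports Hugepagesize; the normal page size comes from sysconf.
    return os.sysconf("SC_PAGE_SIZE")


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print("MemTotal (kB):", info["MemTotal"])
    print("Hugepagesize (kB):", info.get("Hugepagesize"))
    print("base page size (bytes):", base_page_size())
```

Running it on each side, or simply running `getconf PAGESIZE`, would answer the page-size part of the question directly.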

>I might have guessed something to do with processor data cache 
>residency, but presumably the size of the send ring will be, by default, 
>one more buffer than fits in the SO_SNDBUF size when the data socket is 
>created - I suppose if the cache is < 32KB the 128 B send case would 
>have the buffer ring fit and 16KB sends would not.  Testing that would 
>call for HW counter information from the processor.  Going back through 
>the string you mentioned something with 32KB L1 data cache and a 128 KB 
>L2 cache - is that on both sides or just one?

The P4080 has 32 kB instruction and 32 kB data L1 caches plus a 128 kB L2 
cache per core (8 cores).
The Intel Xeon dual core has 16 kB instruction and 16 kB data L1 caches per 
core (2 cores) and a 6144 kB L2 cache.
So the 32 kB L1 / 128 kB L2 figures apply to the P4080 (server) side only.
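The cache-residency argument quoted above comes down to simple arithmetic. The send ring holds one more buffer than fits in SO_SNDBUF, and the bytes it cycles through can be compared with each L1 data cache size given here. The sketch below is a minimal illustration. It assumes a hypothetical SO_SNDBUF of 16384 bytes (replace it with the send socket size netperf reports for the run). The "strictly smaller than the cache" test is also an assumption, meant to leave room for everything else competing for the cache.

```python
# Hypothetical SO_SNDBUF value; substitute the send socket size netperf reports.
ASSUMED_SNDBUF = 16 * 1024

# Per-core L1 data cache sizes given above.
L1D_BYTES = {"P4080": 32 * 1024, "Xeon dual core": 16 * 1024}


def send_ring_buffers(sndbuf: int, send_size: int) -> int:
    """Buffers in the send ring: one more than fit in SO_SNDBUF."""
    return sndbuf // send_size + 1


def ring_footprint(sndbuf: int, send_size: int) -> int:
    """Total bytes touched while cycling through the whole send ring."""
    return send_ring_buffers(sndbuf, send_size) * send_size


def ring_fits(sndbuf: int, send_size: int, cache_bytes: int) -> bool:
    # Strictly smaller: the cache must also hold stack, socket and kernel data.
    return ring_footprint(sndbuf, send_size) < cache_bytes


if __name__ == "__main__":
    for send_size in (128, 16 * 1024):
        footprint = ring_footprint(ASSUMED_SNDBUF, send_size)
        for cpu, l1d in L1D_BYTES.items():
            print(f"-m {send_size:>5}: ring {footprint} B, fits {cpu} L1D: "
                  f"{ring_fits(ASSUMED_SNDBUF, send_size, l1d)}")
```

With the assumed 16 KB SO_SNDBUF, 128-byte sends give a 129-buffer ring of 16512 bytes, which fits in 32 kB but not in 16 kB. 16 KB sends give a 2-buffer ring of 32768 bytes, which fits in neither. Confirming whether this actually matters would still take the processor HW counter data mentioned above.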

>Some experimentation with different socket buffer sizes, explicitly set 
>with test-specific -s and -S might be a good thing.

I will try that.
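One way to organize the socket buffer experiment is to generate the full grid of command lines up front, with -s and -S set explicitly and the send size varied with -m. The sketch below only builds and prints netperf TCP_STREAM command lines, so they can be run by hand or pasted into a script. The remote address (a TEST-NET placeholder), the buffer and send sizes, the 30-second run length and the helper names netperf_argv and sweep are all hypothetical choices for illustration.

```python
from itertools import product

# Hypothetical values: adjust host, buffer sizes and send sizes to the setup.
REMOTE_HOST = "192.0.2.1"  # placeholder address (TEST-NET-1)
SOCKET_SIZES = (16384, 65536, 262144)
SEND_SIZES = (128, 16384)


def netperf_argv(host, sock_size, send_size, seconds=30):
    """Build one TCP_STREAM command line with explicit socket buffer sizes."""
    return [
        "netperf", "-H", host, "-t", "TCP_STREAM", "-l", str(seconds),
        "-c", "-C",              # report local and remote CPU utilization
        "--",                    # test-specific options follow
        "-s", str(sock_size),    # local (netperf side) socket buffer size
        "-S", str(sock_size),    # remote (netserver side) socket buffer size
        "-m", str(send_size),    # send size
    ]


def sweep(host):
    """Every socket-size / send-size combination, socket size varying slowest."""
    return [netperf_argv(host, s, m) for s, m in product(SOCKET_SIZES, SEND_SIZES)]


if __name__ == "__main__":
    for argv in sweep(REMOTE_HOST):
        print(" ".join(argv))
```

Keeping -s and -S equal keeps the grid small. Varying them independently would show whether the sending or the receiving side's buffer is the one that matters.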

>That the remote CPU utilization is coming back as a negative value is 
>quite troubling and requires further investigation.
>
>I wouldn't expect redirecting netperf output to cause a big performance 
>change, certainly not one that would vary with the parms to netperf - 
>the quantity of output will be the same whether -m is 128 or 16K or 
>anything else.
>
>happy benchmarking,
>
>rick jones
