[netperf-talk] global question concerning Netperf test and SMP support
Simon Duboue
Simon.Duboue at ces.ch
Wed Apr 18 01:17:00 PDT 2012
>I'm a bit surprised that 128 byte send sizes is faster than 16K, but I
>suppose if the system is small enough, or has a small enough page size
>(?) there may be some issues with ease of buffer allocation.
Here is the /proc/meminfo information for both sides.
For the server (8-core processor):
/proc# cat meminfo
MemTotal: 3762812 kB
MemFree: 3576576 kB
Buffers: 896 kB
Cached: 8108 kB
SwapCached: 0 kB
Active: 4832 kB
Inactive: 5068 kB
Active(anon): 1304 kB
Inactive(anon): 8 kB
Active(file): 3528 kB
Inactive(file): 5060 kB
Unevictable: 0 kB
Mlocked: 0 kB
HighTotal: 3145724 kB
HighFree: 2985268 kB
LowTotal: 617088 kB
LowFree: 591308 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 1092 kB
Mapped: 1300 kB
Shmem: 124 kB
Slab: 17112 kB
SReclaimable: 2124 kB
SUnreclaim: 14988 kB
KernelStack: 560 kB
PageTables: 304 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1881404 kB
Committed_AS: 3548 kB
VmallocTotal: 241504 kB
VmallocUsed: 135464 kB
VmallocChunk: 105308 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 4096 kB
For the client (dual-core Xeon):
cat /proc/meminfo
MemTotal: 1897676 kB
MemFree: 1449996 kB
Buffers: 16768 kB
Cached: 247420 kB
SwapCached: 0 kB
Active: 150448 kB
Inactive: 217108 kB
Active(anon): 105576 kB
Inactive(anon): 32 kB
Active(file): 44872 kB
Inactive(file): 217076 kB
Unevictable: 0 kB
Mlocked: 0 kB
HighTotal: 1178632 kB
HighFree: 782452 kB
LowTotal: 719044 kB
LowFree: 667544 kB
SwapTotal: 2104472 kB
SwapFree: 2104472 kB
Dirty: 136 kB
Writeback: 0 kB
AnonPages: 103412 kB
Mapped: 51612 kB
Shmem: 2240 kB
Slab: 21148 kB
SReclaimable: 12400 kB
SUnreclaim: 8748 kB
KernelStack: 1592 kB
PageTables: 2420 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 3053308 kB
Committed_AS: 337596 kB
VmallocTotal: 122880 kB
VmallocUsed: 30804 kB
VmallocChunk: 83100 kB
HardwareCorrupted: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 10232 kB
DirectMap2M: 890880 kB
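The base page size itself does not show up in /proc/meminfo (only Hugepagesize does); if it matters for the buffer-allocation question, I believe it can be checked on both machines with getconf, along these lines:

# print the MMU base page size in bytes (typically 4096)
getconf PAGE_SIZE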
>I might have guessed something to do with processor data cache
>residency, but presumably the size of the send ring will be, by default,
>one more buffer than fits in the SO_SNDBUF size when the data socket is
>created - I suppose if the cache is < 32KB the 128 B send case would
>have the buffer ring fit and 16KB sends would not. Testing that would
>call for HW counter information from the processor. Going back through
>the string you mentioned something with 32KB L1 data cache and a 128 KB
>L2 cache - is that on both sides or just one?
For the P4080 (8 cores), each core has 32 kB/32 kB instruction/data L1 caches and a 128 kB L2 cache.
For the dual-core Intel Xeon, each core has 16 kB/16 kB instruction/data L1 caches, and there is a 6144 kB L2 cache.
>Some experimentation with different socket buffer sizes, explicitly set
>with test-specific -s and -S might be a good thing.
I will try experimenting along those lines.
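Something like the following is what I have in mind (the TCP_STREAM test type, the 30-second run length and the buffer sizes are just my own picks to bracket the defaults, not anything you suggested):

# SERVER is a placeholder for the address of the machine running netserver
SERVER=remote-host
# sweep explicit local (-s) and remote (-S) socket buffer sizes for a
# fixed send size (-m); -c/-C also report local and remote CPU utilization
for SB in 32768 65536 131072 262144; do
    netperf -H $SERVER -t TCP_STREAM -l 30 -c -C -- -m 128 -s $SB -S $SB
done

and then the same sweep again with -m 16384 to compare the two send sizes.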
>That the remote CPU utilization is coming back as a negative value is
>quite troubling and requires further investigation.
>
>I wouldn't expect redirecting netperf output to cause a big performance
>change, certainly not one that would vary with the parms to netperf -
>the quantity of output will be the same whether -m is 128 or 16K or
>anything else.
>
>happy benchmarking,
>
>rick jones