[netperf-talk] Reported CPU Utilization

Rick Jones rick.jones2 at hp.com
Mon Jan 31 13:23:49 PST 2011


Andrew Gallatin wrote:
> On 01/31/11 12:55, Peter F. Klemperer wrote:
> 
>> Hi All,
>>
>> I've seen some interesting results for CPU utilization when running
>> netperf that I can't explain. For message size 1 I see a high CPU
>> utilization and then a sharp dropoff with increased message size that
>> then increases (see attached graph). I did not expect to see that
>> initial peak and drop. Can anyone help me understand this?
> 
> I'm much more knowledgeable in *nix than windows, but in general,
> small message sizes are going to have a much higher cost per byte
> than large message sizes.  This is because of fixed, per-message
> costs to do things like perform a system call, lock the socket
> buffer for a connection, copy data from userspace to kernel's socket
> buffer, etc.

I prefer to put it that all packets have the same "per-packet" cost and the 
larger ones simply add additional per-byte costs, but I think we are describing 
two different paths to the same goal :)

So, while that is true, this is a "paced" test (the -w and -b global options are 
present) - so in theory the pps rate should be the same, at least going from the 
message size of 1 to the message size of 1024, and one would therefore expect the 
1024-byte sends to be more expensive than the 1-byte sends.
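
To make that concrete, a paced pair of runs might look something like the sketch 
below.  The host name, test length, burst size and wait interval are just 
placeholders, and the -b/-w global options are only present when netperf was 
built with its interval/burst support enabled:

  netperf -H testhost -t TCP_STREAM -l 30 -b 1 -w 10 -- -m 1     # 1-byte sends
  netperf -H testhost -t TCP_STREAM -l 30 -b 1 -w 10 -- -m 1024  # 1024-byte sends

With the same burst size and wait interval, both runs should be issuing sends at 
roughly the same rate, so any CPU difference between them should come down to the 
per-byte costs.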

Now, above 1024 (4096, 6144, 8192) the pps rate will be increasing, since each 
send will typically span more than one packet, and we do see something of a 
corresponding CPU utilization increase going from 1024 to the higher sizes.

I'd be inclined to repeat the five tests, but with the confidence intervals 
included, to get some idea of the variability.  I'd toss in a global -i 30,3 and 
run again.

I might also be inclined to further explore the space between 1 and 1024 byte 
messages.
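
A minimal sketch of both of those, keeping whatever paced -b/-w settings were 
used for the original runs; the message-size list, host name and test length 
below are purely illustrative:

  for m in 1 64 128 256 512 1024; do
      netperf -H testhost -t TCP_STREAM -l 30 -i 30,3 -b 1 -w 10 -- -m $m
  done

The -i 30,3 tells netperf to run up to 30 iterations (and no fewer than 3) while 
trying to hit the confidence interval (adjustable via -I), and it will warn if 
the desired confidence was not achieved.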

In some cases, stacks (e.g. drivers) have felt it "better" to copy small packets 
into already-allocated and already-mapped buffers on the outbound path (and the 
reverse of that inbound).  The thought was that copying a small packet was 
cheaper than setting up a new mapping.  Perhaps something along those lines is 
happening here, and by 1024 bytes one has crossed the "copy vs. map" threshold.

There may be similar thresholds where Windows decides to copy or not from the 
user's buffer into the stack.

> I'm not sure what profiling options you have with windows
> in a VM, but I encourage you to profile the different
> scenarios if you can.

Speaking of this being a test in a VM...  "In the beginning" I eschewed 
per-process CPU statistics (e.g. getrusage()) for netperf because so much of the 
network processing could take place in a context that did not get charged 
reliably to the process - hence netperf measures overall system CPU utilization.
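
For reference, that system-wide measurement is what the -c and -C global options 
ask for; a quick sketch, with the host name again a placeholder:

  # -c asks for local (netperf-side) CPU utilization, -C for remote
  # (netserver-side); both are system-wide figures rather than per-process
  # accounting along the lines of getrusage().
  netperf -H testhost -t TCP_STREAM -l 30 -c -C -- -m 1024

Of course, in a guest those "system-wide" figures still only cover what the 
guest itself can see.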

I don't have much direct experience with virtualization, but in my simple "the 
guest is really just a process in the eyes of the hypervisor" way of thinking, I 
would be similarly leery of CPU utilization reported by guest-centric means. 
How we/I might address that in netperf2 I'm not really sure.  I think it 
requires separating out CPU utilization from the infrastructure and making it a 
test of its own (a la the path taken in netperf4).

happy benchmarking

rick jones

> 
> Drew


