[netperf-talk] Reported CPU Utilization
Rick Jones
rick.jones2 at hp.com
Mon Jan 31 13:23:49 PST 2011
Andrew Gallatin wrote:
> On 01/31/11 12:55, Peter F. Klemperer wrote:
>
>> Hi All,
>>
>> I've seen some interesting results for CPU utilization when running
>> netperf that I can't explain. For message size 1 I see a high CPU
>> utilization and then a sharp dropoff with increased message size that
>> then increases (see attached graph). I did not expect to see that
>> initial peak and drop. Can anyone help me understand this?
>
> I'm much more knowledgeable in *nix than windows, but in general,
> small message sizes are going to have a much higher cost per byte
> than large message sizes. This is because of fixed, per-message
> costs to do things like perform a system call, lock the socket
> buffer for a connection, copy data from userspace to kernel's socket
> buffer, etc.
I prefer to put it that all packets have the same "per-packet" cost and the
larger ones simply add additional per-byte costs, but I think we are describing
two different paths to the same goal :)
So, while that is true, this is a "paced" test (the -w and -b global options are present), so in theory the packets-per-second (PPS) rate should be the same at least going from a message size of 1 to a message size of 1024, and so one would expect the 1024-byte messages to be more expensive than the 1-byte messages, not cheaper.
Now, above 1024 (4096, 6144, 8192) the PPS rate will be increasing, since each of those sends has to become multiple packets on the wire, and we do sort of see a corresponding CPU utilization increase going from 1024 to the larger sizes.
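As a rough back-of-the-envelope illustration - this assumes a TCP stream test with a typical ~1460-byte MSS, which may not match the actual setup here, and ignores any segment coalescing:

    1-byte message     -> 1 segment per send
    1024-byte message  -> 1 segment per send
    4096-byte message  -> ~3 segments per send
    6144-byte message  -> ~5 segments per send
    8192-byte message  -> ~6 segments per send

So at a fixed send rate the packet rate, and with it the per-packet overhead, only starts climbing once the message size exceeds the MSS.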
I'd be inclined to repeat the five tests, but with confidence intervals included, to get some idea of the variability. I'd toss in a global -i 30,3 and run again.
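Something along these lines, for example - the host name, test type and pacing values here are placeholders, so substitute whatever the original runs used:

    netperf -H testhost -t TCP_STREAM -i 30,3 -w 10 -b 4 -- -m 1024

The -i 30,3 tells netperf to run up to 30 (and at least 3) iterations trying to hit its confidence criteria, and it will emit a warning if it cannot.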
I might also be inclined to further explore the space between 1 and 1024 byte
messages.
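For instance, a quick shell loop (same caveats about placeholder values) would fill in some of that space:

    # sweep intermediate message sizes at the same pacing; -P 0 trims the banners
    for m in 1 64 128 256 512 768 1024; do
        netperf -P 0 -H testhost -t TCP_STREAM -w 10 -b 4 -- -m $m
    done

That would show whether the CPU utilization falls off gradually or whether there is a distinct knee somewhere between 1 and 1024.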
In some cases, stacks (e.g. drivers) have felt it "better" to copy small packets into already-allocated and already-mapped buffers (outbound; the reverse of that inbound). The thought was that copying a small packet was cheaper than remapping. Perhaps something along those lines is happening here, and by 1024 one has crossed the "copy vs other" threshold.
There may be similar thresholds where Windows decides to copy or not from the
user's buffer into the stack.
> I'm not sure what profiling options you have with windows
> in a VM, but I encourage you to profile the different
> scenarios if you can.
Speaking of this being a test in a VM... "In the beginning" I eschewed per-process CPU statistics (e.g. getrusage()) for netperf because so much of the "networking processing" could take place in a context that didn't get charged reliably to the process - hence netperf measures overall system CPU utilization.
I don't have much direct experience with virtualization, but in my simple "the guest is really just a process in the eyes of the hypervisor" way of thinking, I would be similarly leery of CPU utilization reported by guest-centric means.
How we/I might address that in netperf2 I'm not really sure. I think it
requires separating out CPU utilization from the infrastructure and making it a
test of its own (a la the path taken in netperf4).
happy benchmarking
rick jones
>
> Drew