[netperf-talk] negative CPU utilization

Rick Jones rick.jones2 at hp.com
Wed May 27 09:39:47 PDT 2009


Andrew Gallatin wrote:
> Rick Jones wrote:
> 
>  > I wonder if the "tick rate" of what gets reported in /proc/cpuinfo is no
>  > longer tied to:
>  >
>  >   return sysconf (_SC_CLK_TCK);
> 
> 
> 
> I think the tick rate might vary, depending on the CPU.  If everything
> is idle, and I look at the changes to the /proc/stat idle counts, then
> it appears that the rate should be 200.  If I hardcode it to 200, then
> run a netperf with the IRQ and netperf bound to CPU0, netperf (and my
> subtraction of idle counts) shows the load at ~40%.  E.g.:
> 
> proc_stat_buf 'cpu  85594 2455 186305 5051551404 327558 557841 701228 
> 14897 0
> cpu0 49773 632 93387 848141616 7376 298781 380909 4011 0
> cpu1 16123 503 53120 1565456086 317817 37764 40600 5660 0
> cpu2 8407 750 17645 1142567937 1048 158141 214179 2633 0
> cpu3 11289 569 22151 1495385763 1316 63153 65538 2592 0
> 
> <...>
> 
> proc_stat_buf 'cpu  85618 2455 187131 5051556006 327559 557928 701292 
> 14899 0
> cpu0 49797 632 94212 848141616 7376 298868 380973 4012 0
> cpu1 16123 503 53120 1565457084 317818 37764 40601 5660 0
> cpu2 8407 750 17646 1142569647 1048 158141 214179 2633 0
> cpu3 11289 569 22151 1495387657 1316 63153 65538 2592 0
> <...>
> lib_local_maxrate = 200.000000
> calc_cpu_util: actual_rate on processor 0 is 0.099998 start 328d9d30 end 
> 328d9d31 util 99.950005 cf 1.000000
> calc_cpu_util: actual_rate on processor 1 is 99.798332 start 5d4ef6d6 
> end 5d4efabc util 50.100834 cf 1.000000
> calc_cpu_util: actual_rate on processor 2 is 170.997147 start 441a3401 
> end 441a3aaf util 14.501427 cf 1.000000
> calc_cpu_util: actual_rate on processor 3 is 189.396835 start 5921c6a3 
> end 5921ce09 util 5.301582 cf 1.000000
> <...>
> 2097152    8972   10.00     1343019      0     9639.5     42.46    1.443
> 2097152           10.00     1343019            9639.5     0.71     0.024
> 
> That's still wrong.  According to mpstat, CPU0 is the only CPU with any load
> (and it's at 100%).  So I guess maybe the different CPUs are changing
> their clock rate based on load, and that's causing chaos.

I admire your faith in mpstat :)

> I've really never understood your whole CPU utilization calculation's
> dependency on tick rate...

That is shrouded in the mists of netperf history :)  Long ago and far away - in 
1992 let us say - when netperf was still an HP-internal tool for HP-UX, there was 
a conditional compilation option in the kernel for an "idle counter."  This was a 
counter that incremented while the kernel was in its idle loop.  CPU utilization 
was calculated by comparing the rate at which the idle counter incremented when 
the system was ostensibly idle to the rate at which it incremented while the test 
was running.  Thus was born the method based on comparing tick rates :)
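
In code terms, modulo the elapsed-time correction factor, that boils down to 
something like the following sketch (the names echo the ones in netperf's 
calc_cpu_util_internal(), but this is an illustration, not the actual code):

  /* rate-based CPU utilization from an idle counter */
  static float
  idle_counter_util(float maxrate,      /* idle increments/sec, calibrated idle */
                    float actual_rate)  /* idle increments/sec during the test  */
  {
    /* the closer the loaded rate is to the calibrated idle rate, the
       lower the CPU utilization */
    return (maxrate - actual_rate) / maxrate * 100.0;
  }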

That it ended up being applied to platforms/situations which did not require that 
method is probably simply a matter of cut-and-paste development.

> Can I suggest something simpler: just sum
> up all the CPU states, and express the per-CPU utilization as:
> 
>    per_cpu_util = 100 - (100 * (idle_ticks / all_ticks))
> 
> I just made a patch which does that, and I'm seeing the correct
> utilization (as confirmed with mpstat, and with what I "know" is
> happening on the systems) on a whole smattering of formerly
> problematic systems (ppc64, x86_64, kernels from 2.6.27 - up to
> net-next).  I've attached my patch.

I'll give it a look and likely apply it.  If the tick rate varies across the CPUs, is 
even that correct, though?  (Matching mpstat notwithstanding :)  How does it 
compare, say, with using the looper method in netperf on your platform?  For an 
example of why I am skeptical about defining correctness as matching a system 
tool's output, just look at some of the commentary in the netcpu_kstat code.
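
For what it is worth, the arithmetic does check out against the debug output 
further below.  The nine diff fields are the /proc/stat columns (user nice sys 
idle iowait irq softirq steal guest), so for processor 1, with diff = 21 0 818 
64 0 1 1 138 0:

   all_ticks  = 21 + 0 + 818 + 64 + 0 + 1 + 1 + 138 + 0 = 1043
   idle_ticks = 64
   util       = 100 - (100 * 64 / 1043) = 93.86%

which matches the reported "util 93.863853".  The machine-wide figure is then 
just the average across the four CPUs: (6.754967 + 93.863853 + 0.000000 + 
0.108994) / 4 = 25.18%.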

> 
> I have a question about the original code...  You handle wrapping in a
> way that I think is incorrect:
> 
> -    actual_rate = (lib_end_count[i] > lib_start_count[i]) ?
> -      (float)(lib_end_count[i] - lib_start_count[i])/lib_elapsed :
> -      (float)(lib_end_count[i] - lib_start_count[i] +
> -             MAXLONG)/ lib_elapsed;
> 
> 1) these counters are uint64_t, so MAXLONG is going to be 2^31-1
> while the max uint64_t value (wrap point) is 2^64-1.
> 2) these things are unsigned, so wrapping doesn't even matter.
> 
> I preserved this behavior in my "tick_subtract" function.  If
> you agree with me that it is unneeded, we can remove that
> function and just do simple subtraction.

Given that code likely got cut-and-pasted from places where counters were 
initially 32 bits, you are likely quite correct.
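
A quick standalone check confirms the modular-arithmetic point - with unsigned 
types, plain subtraction already yields the right delta across a wrap, so no 
MAXLONG correction is needed:

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
    uint64_t start = UINT64_MAX - 5;  /* counter just before wrapping */
    uint64_t end   = 4;               /* counter just after wrapping  */

    /* end - start is computed mod 2^64, so this prints the true delta, 10 */
    printf("delta = %llu\n", (unsigned long long)(end - start));
    return 0;
  }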

> 
> Please let me know if you think my approach is viable.  If so,
> I'll do wider testing (back to older systems with 8-field
> /proc/stat lines, uniprocessor systems, etc.).
> 

Go for it :)


> Thanks,
> 
> Drew
> 
> PS: Here's full output from the same pair of systems in the
> above, but with my utilization method.
> ./src/netperf -Hdell2950b-m -tUDP_STREAM -T1,0 -C -c -d -- -m 8972 -S 1M 
> -s 1M
> scan_sockets_args called with the following argument vector
> ./src/netperf -Hdell2950b-m -tUDP_STREAM -T1 -C -c -d -- -m 8972 -S 1M 
> -s 1M

That's odd - where is the ",0" for the -T option in the scan_sockets_args output?

> Program name: ./src/netperf
> Local send alignment: 8
> Local recv alignment: 8
> Remote send alignment: 8
> Remote recv alignment: 8
> Report local CPU 1
> Report remote CPU 1
> Verbosity: 1
> Debug: 1
> Port: 12865
> Test name: UDP_STREAM
> Test bytes: 0 Test time: 10 Test trans: 0
> Host name: dell2950b-m
> 
> installing catcher for all signals
> Could not install signal catcher for sig 32, errno 22
> Could not install signal catcher for sig 33, errno 22
> Could not install signal catcher for sig 65, errno 22
> remotehost is dell2950b-m and port 12865
> establish_control called with host 'dell2950b-m' port '12865' remfam 
> AF_UNSPEC
>                 local '0.0.0.0' port '0' locfam AF_UNSPEC
> getaddrinfo returned the following for host 'dell2950b-m' port '12865' 
> family AF_UNSPEC
>         cannonical name: 'dell2950b-m.sw.myri.com'
>         flags: 2 family: AF_INET: socktype: SOCK_STREAM protocol 
> IPPROTO_TCP addrlen 16
>         sa_family: AF_INET sadata: 50 65 10 0 130 51
> getaddrinfo returned the following for host '0.0.0.0' port '0' family 
> AF_UNSPEC
>         cannonical name: '0.0.0.0'
>         flags: 3 family: AF_INET: socktype: SOCK_STREAM protocol 
> IPPROTO_TCP addrlen 16
>         sa_family: AF_INET sadata: 0 0 0 0 0 0
> bound control socket to 0.0.0.0 and 0
> successful connection to remote netserver at dell2950b-m and 12865
> complete_addrinfo using hostname dell2950b-m port 0 family AF_UNSPEC 
> type SOCK_DGRAM prot IPPROTO_UDP flags 0x0
> getaddrinfo returned the following for host 'dell2950b-m' port '0' 
> family AF_UNSPEC
>         cannonical name: 'dell2950b-m.sw.myri.com'
>         flags: 2 family: AF_INET: socktype: SOCK_DGRAM protocol 
> IPPROTO_UDP addrlen 16
>         sa_family: AF_INET sadata: 0 0 10 0 130 51
> local_data_address not set, using local_host_name of '0.0.0.0'
> complete_addrinfo using hostname 0.0.0.0 port 0 family AF_UNSPEC type 
> SOCK_DGRAM prot IPPROTO_UDP flags 0x1
> getaddrinfo returned the following for host '0.0.0.0' port '0' family 
> AF_UNSPEC
>         cannonical name: '0.0.0.0'
>         flags: 3 family: AF_INET: socktype: SOCK_DGRAM protocol 
> IPPROTO_UDP addrlen 16
>         sa_family: AF_INET sadata: 0 0 0 0 0 0
> UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
> dell2950b-m.sw.myri.com (10.0.130.51) port 0 AF_INET : cpu bind
> create_data_socket: socket 4 obtained...
> netperf: get_sock_buffer: send socket size determined to be 2097152
> netperf: get_sock_buffer: receive socket size determined to be 2097152
> calibrate: lib_num_loc_cpus 4 lines 5 CPU_LINE_LENGTH 108 
> proc_stat_buflen 540
> recv_response: received a 256 byte response
> send_udp_stream: remote data connection done.
> About to start a timer for 10 seconds.
> cpu_util_init enter, proc_stat_fd 5 proc_stat_buf 0x10096d60
> proc_stat_buf 'cpu  85904 2456 192194 5056775225 327974 558430 701669 
> 15179 0
> cpu0 50002 632 97535 849439433 7436 299350 381332 4018 0
> cpu1 16189 503 54819 1566754531 318165 37773 40609 5930 0
> cpu2 8415 750 17664 1143884630 1048 158147 214183 2635 0
> cpu3 11297 569 22175 1496696629 1323 63158 65543 2594 0
> intr 70972758 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 119 0 2242184 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 24807236 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 17338275 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0'
> res[50002] is 632 97535 849439433 7436 299350 381332 4018 0 269053280
> res[16189] is 503 54819 1566754531 318165 37773 40609 5930 0 269053280
> res[8415] is 750 17664 1143884630 1048 158147 214183 2635 0 269053280
> res[11297] is 569 22175 1496696629 1323 63158 65543 2594 0 269053280
> proc_stat_buf 'cpu  85926 2456 193013 5056780270 327976 558494 701707 
> 15317 0
> cpu0 50002 632 97535 849440841 7436 299414 381370 4018 0
> cpu1 16210 503 55637 1566754595 318165 37774 40610 6068 0
> cpu2 8415 750 17664 1143886370 1048 158147 214183 2635 0
> cpu3 11297 569 22175 1496698462 1325 63158 65543 2594 0
> intr 71105574 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 119 0 2242532 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 24939676 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 17338275 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0'
> res[50002] is 632 97535 849440841 7436 299414 381370 4018 0 269053280
> res[16210] is 503 55637 1566754595 318165 37774 40610 6068 0 269053280
> res[8415] is 750 17664 1143886370 1048 158147 214183 2635 0 269053280
> res[11297] is 569 22175 1496698462 1325 63158 65543 2594 0 269053280
> recv_response: received a 256 byte response
> send_udp_stream: remote results obtained
> lib_local_maxrate = 100.000000
> calc_cpu_util: util on processor 0, diff = 0 0 0 1408 0 64 38 0 0 util 
> 6.754967 cf 1.000000
> calc_cpu_util: util on processor 1, diff = 21 0 818 64 0 1 1 138 0 util 
> 93.863853 cf 1.000000
> calc_cpu_util: util on processor 2, diff = 0 0 0 1740 0 0 0 0 0 util 
> 0.000000 cf 1.000000
> calc_cpu_util: util on processor 3, diff = 0 0 0 1833 2 0 0 0 0 util 
> 0.108994 cf 1.000000
> calc_service_demand called:  units_sent = 12406409824.000000
>                              elapsed_time = 0.000000
>                              cpu_util = 0.000000
>                              num cpu = 0
> calc_service_demand using:   units_sent = 12406409824.000000
>                              elapsed_time = 10.000217
>                              cpu_util = 25.181953
>                              num cpu = 4
> calc_service_demand got:     thruput = 1211537.115985
>                              servdem = 0.831405
> calc_service_demand called:  units_sent = 12406409824.000000
>                              elapsed_time = 0.000000
>                              cpu_util = 13.033341
>                              num cpu = 4
> calc_service_demand using:   units_sent = 12406409824.000000
>                              elapsed_time = 10.000217
>                              cpu_util = 13.033341
>                              num cpu = 4
> calc_service_demand got:     thruput = 1211537.115985
>                              servdem = 0.430308
> calculate_confidence: itr  1; time 10.000217; res  9924.912054
>                                lcpu 25.181953; rcpu 13.033341
>                                lsdm 0.831405; rsdm 0.430308
> Socket  Message  Elapsed      Messages                   CPU      Service
> Size    Size     Time         Okay Errors   Throughput   Util     Demand
> bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
> 
> 2097152    8972   10.00     1382792      0     9924.9     25.18    0.831
> 2097152           10.00     1382792            9924.9     13.03    0.430
> 
> shutdown_control: shutdown of control connection requested.
> 
> 
> 
> ------------------------------------------------------------------------
> 
> Index: src/netcpu_procstat.c
> ===================================================================
> --- src/netcpu_procstat.c	(revision 299)
> +++ src/netcpu_procstat.c	(working copy)
> @@ -40,12 +40,29 @@
>     idle. The rate at which this increments during a test is compared
>     with a previous calibrarion to arrive at a CPU utilization
>     percentage. raj 2005-01-26 */
> -static uint64_t  lib_start_count[MAXCPUS];
> -static uint64_t  lib_end_count[MAXCPUS];
>  
> +#define IDLE_IDX 4
> +#define CPU_STATES 9
>  
> +typedef struct cpu_states
> +{
> +  uint64_t     	user;
> +  uint64_t     	nice;
> +  uint64_t     	sys;
> +  uint64_t     	idle;
> +  uint64_t     	iowait;
> +  uint64_t     	hard_irq;
> +  uint64_t     	soft_irq;
> +  uint64_t     	steal;
> +  uint64_t     	guest;  
> +} cpu_states_t;
> +
> +static cpu_states_t  lib_start_count[MAXCPUS];
> +static cpu_states_t  lib_end_count[MAXCPUS];
> +
> +
>  /* The max. length of one line of /proc/stat cpu output */
> -#define CPU_LINE_LENGTH ((8 * sizeof (long) / 3 + 1) * 4 + 8)
> +#define CPU_LINE_LENGTH ((CPU_STATES * sizeof (long) / 3 + 1) * 4 + 8)
>  #define PROC_STAT_FILE_NAME "/proc/stat"
>  #define N_CPU_LINES(nr) (nr == 1 ? 1 : 1 + nr)
>  
> @@ -141,7 +158,7 @@
>  }
>  
>  void
> -get_cpu_idle (uint64_t *res)
> +get_cpu (cpu_states_t *res)
>  {
>    int space;
>    int i;
> @@ -158,15 +175,30 @@
>    /* Skip first line (total) on SMP */
>    if (n > 1) p = strchr (p, '\n');
>  
> -  /* Idle time is the 4th space-separated token */
>    for (i = 0; i < n; i++) {
> -    for (space = 0; space < 4; space ++) {
> -      p = strchr (p, ' ');
> -      while (*++p == ' ');
> -    };
> -    res[i] = strtoul (p, &p, 10);
> +    memset(&res[i], 0, sizeof (res[i]));
> +    p = strchr (p, ' ');
> +    sscanf(p, "%llu %llu %llu %llu %llu %llu %llu %llu %llu",
> +	   &res[i].user,
> +	   &res[i].nice,
> +	   &res[i].sys,
> +	   &res[i].idle,
> +	   &res[i].iowait,
> +	   &res[i].hard_irq,
> +	   &res[i].soft_irq,
> +	   &res[i].steal,
> +	   &res[i].guest);
>      if (debug) {
> -      fprintf(where,"res[%d] is %llu\n",i,res[i]);
> +      fprintf(where,"res[%d] is %llu %llu %llu %llu %llu %llu %llu %llu %llu\n",
> +	   res[i].user,
> +	   res[i].nice,
> +	   res[i].sys,
> +	   res[i].idle,
> +	   res[i].iowait,
> +	   res[i].hard_irq,
> +	   res[i].soft_irq,
> +	   res[i].steal,
> +	   res[i].guest);
>        fflush(where);
>      }
>      p = strchr (p, '\n');
> @@ -181,23 +213,35 @@
>  measure_cpu_start()
>  {
>    cpu_method = PROC_STAT;
> -  get_cpu_idle(lib_start_count);
> +  get_cpu(lib_start_count);
>  }
>  
>  /* collect final CPU utilization raw data */
>  void
>  measure_cpu_stop()
>  {
> -  get_cpu_idle(lib_end_count);
> +  get_cpu(lib_end_count);
>  }
>  
> +uint64_t
> +tick_subtract(uint64_t start, uint64_t end)
> +{
> +  uint64_t ret;
> +
> +  if (end >= start)
> +    return (end - start);
> +
> +  return (end - start + MAXLONG);
> +}
> +
>  float
>  calc_cpu_util_internal(float elapsed_time)
>  {
> -  int i;
> +  int i, j;
>  
> -  float actual_rate;
>    float correction_factor;
> +  cpu_states_t diff;
> +  uint64_t total_ticks;
>  
>    lib_local_cpu_util = (float)0.0;
>    /* It is possible that the library measured a time other than */
> @@ -214,6 +258,10 @@
>      correction_factor = (float) 1.0;
>    }
>  
> +  if (debug) {
> +    fprintf(where,
> +	    "lib_local_maxrate = %f\n", lib_local_maxrate);
> +  }
>    for (i = 0; i < lib_num_loc_cpus; i++) {
>  
>      /* it would appear that on some systems, in loopback, nice is
> @@ -222,25 +270,34 @@
>       does not go south. raj 6/95 and if we run completely out of idle,
>       the same thing could in theory happen to the USE_KSTAT path. raj
>       8/2000 */ 
> -    
> -    if (lib_end_count[i] == lib_start_count[i]) {
> -      lib_end_count[i]++;
> -    }
> -    
> -    actual_rate = (lib_end_count[i] > lib_start_count[i]) ?
> -      (float)(lib_end_count[i] - lib_start_count[i])/lib_elapsed :
> -      (float)(lib_end_count[i] - lib_start_count[i] +
> -	      MAXLONG)/ lib_elapsed;
> -    lib_local_per_cpu_util[i] = (lib_local_maxrate - actual_rate) /
> -      lib_local_maxrate * 100;
> +
> +    /* Find the difference in all CPU stat fields */
> +    diff.user = tick_subtract(lib_start_count[i].user, lib_end_count[i].user);
> +    diff.nice = tick_subtract(lib_start_count[i].nice, lib_end_count[i].nice);
> +    diff.sys = tick_subtract(lib_start_count[i].sys, lib_end_count[i].sys);
> +    diff.idle = tick_subtract(lib_start_count[i].idle, lib_end_count[i].idle);
> +    diff.iowait = tick_subtract(lib_start_count[i].iowait, lib_end_count[i].iowait);
> +    diff.hard_irq = tick_subtract(lib_start_count[i].hard_irq, lib_end_count[i].hard_irq);
> +    diff.soft_irq = tick_subtract(lib_start_count[i].soft_irq, lib_end_count[i].soft_irq);
> +    diff.steal = tick_subtract(lib_start_count[i].steal, lib_end_count[i].steal);
> +    diff.guest = tick_subtract(lib_start_count[i].guest, lib_end_count[i].guest);
> +    total_ticks = diff.user + diff.nice + diff.sys + diff.idle + diff.iowait 
> +      + diff.hard_irq + diff.soft_irq + diff.steal + diff.guest;
> +
> +    /* calculate idle time as a percentage of all CPU states */
> +    lib_local_per_cpu_util[i] = 100.0 * ((float) diff.idle / (float) total_ticks);
> +
> +    /* invert percentage to reflect non-idle time */
> +    lib_local_per_cpu_util[i] = 100.0 - lib_local_per_cpu_util[i];
> +
> +    /* apply correction factor */
>      lib_local_per_cpu_util[i] *= correction_factor;
>      if (debug) {
>        fprintf(where,
> -              "calc_cpu_util: actual_rate on processor %d is %f start %llx end %llx util %f cf %f\n",
> +              "calc_cpu_util: util on processor %d, diff = %llu %llu %llu %llu %llu %llu %llu %llu %llu util %f cf %f\n",
>                i,
> -              actual_rate,
> -              lib_start_count[i],
> -              lib_end_count[i],
> +	      diff.user, diff.nice, diff.sys, diff.idle, diff.iowait,
> +	      diff.hard_irq, diff.soft_irq, diff.steal, diff.guest,
>  	      lib_local_per_cpu_util[i],
>  	      correction_factor);
>      }
> @@ -255,12 +312,12 @@
>  void
>  cpu_start_internal(void)
>  {
> -  get_cpu_idle(lib_start_count);
> +  get_cpu(lib_start_count);
>    return;
>  }
>  
>  void
>  cpu_stop_internal(void)
>  {
> -  get_cpu_idle(lib_end_count);
> +  get_cpu(lib_end_count);
>  }


