NAME
	  RPM -	Reprogrammable Performance Monitoring (RPM) counters.

     SYNTAX

	  #include <i860paragon/rpm.h>

     DESCRIPTION
	  The /usr/include/i860paragon/rpm.h file specifies the	pro-
	  gramming interface to	the Reprogrammable Performance Moni-
	  toring (RPM) hardware	counters and the Mach kernel software
	  (RPM soft) counters. Both the	hardware and soft counters are
	  memory mapped	into every process. The	counters may be	read
	  but not written. By defining a pointer to the	structures
	  defined in rpm.h and then setting the	pointer	to a specific
	  address, the RPM hardware and	soft counters can be accessed.

	  The RPM hardware counters are	specified by the structure
	  rpm, which contains the following fields (hardware
	  counters):

	  rpm_control
		    32-bit control register used to reset the
		    counters. All counters except rpm_time can be
		    reset. The rpm_time	global clock is	reset via the
		    diagnostic station.	The process must have root
		    access and perform an authentication process to
		    gain access	to a writable page in order to write
		    to the control register.

	  rpm_time  10 MHz 56-bit global clock, accurate to 100
		    nanoseconds local to the node and 1 microsecond
		    across all nodes in a system.

	  rpm_cpu0  Number of 50 MHz cycles that CPU 0 is bus master.

	  rpm_cpu1  Number of 50 MHz cycles that CPU 1 is bus master.

	  rpm_ltu   Number of 50 MHz cycles that the ltu is bus master.

	  rpm_exp   Number of 50 MHz cycles that the expansion card is
		    bus master.

	  rpm_north One	count for every	64 bytes moving	North on the
		    mesh.

	  rpm_south One	count for every	64 bytes moving	South on the
		    mesh.

	  rpm_east  One	count for every	64 bytes moving	East on	the
		    mesh.

	  rpm_west  One	count for every	64 bytes moving	West on	the
		    mesh.
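
	  The units above can be turned into bytes and bus occupancy
	  with a little arithmetic. The following is a minimal sketch,
	  not part of rpm.h: the helper names and macros are
	  hypothetical, and the counters are assumed to be 32-bit
	  values sampled over a known interval.

```c
#include <stdint.h>

#define RPM_BUS_CLOCK_HZ 50000000.0  /* bus counters tick at 50 MHz */
#define RPM_MESH_UNIT    64u         /* one mesh count per 64 bytes */

/* Bytes represented by a mesh counter value (rpm_north, rpm_south,
 * rpm_east, or rpm_west). Hypothetical helper. */
static uint64_t mesh_bytes(uint32_t mesh_count)
{
    return (uint64_t)mesh_count * RPM_MESH_UNIT;
}

/* Fraction of an interval during which a unit was bus master, given
 * the number of 50 MHz cycles counted in that interval.
 * Hypothetical helper. */
static double bus_master_fraction(uint32_t bus_cycles,
                                  double interval_seconds)
{
    return (double)bus_cycles / (RPM_BUS_CLOCK_HZ * interval_seconds);
}
```

	  For example, 25000000 cycles counted over one second
	  correspond to a bus master fraction of 0.5.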

	  To access the	RPM hardware counters, define a	pointer	to the
	  structure rpm, set the pointer to the	address
	  RPM_BASE_VADDR, and access the counter via the structure.
	  For example:

	    struct rpm *rpm;
	    rpm_timer_t	global_time;
	    rpm	= (struct rpm *) RPM_BASE_VADDR;
	    global_time	= rpm->rpm_time;

	  The rpm->rpm_control register	is used	to reset the RPM
	  counters to zero. A task must	first obtain a writable	page
	  to the RPM counters. Normally	the RPM_BASE_VADDR address
	  specifies a read-only	page mapped into every task. In	order
	  to obtain a writable page, the task must have	root authority
	  (executed by the superuser) and perform an authentication
	  process. The authentication steps are	to open	the NORMA dev-
	  ice rpm0, obtain a pager port	for the	mapped rpm device, and
	  map the pager	port into the task's address space. The	end
	  result is an address to a writable page that is substituted
	  for the RPM_BASE_VADDR address. The RPM counters can then be
	  reset	to zero	by setting the rpm->rpm_control	register to
	  0xFFFF0000.

				     NOTE

	       The SPV data collection daemons reset the RPM counters
	       once every second (see Limitations and Workarounds).

	  The rpm->rpm_time global clock cannot be reset by the
	  rpm->rpm_control register. The global	clock is reset at the
	  diagnostic station using the diagnostic commands
	  /u/paragon/diag/gclock and /u/paragon/diag/greset. The RPM
	  global clock is also used by the NX dclock() call which
	  returns a double-precision time interval in seconds since
	  the system was booted.

	  The RPM soft counters	are maintained by the Mach kernel on
	  each CPU configured into the kernel. The RPM soft counters
	  are specified	by the structure rpmsoft, which	contains the
	  following fields:

	  rpms_idle Number of double precision seconds the CPU has
		    been idle.

	  Large	grain trap handler statistics:

	  rpms_alltraps
		    Number of traps.

	  rpms_it   Number of trap instructions.

	  rpms_int  Number of interrupts.

	  rpms_iat  Number of instruction access traps.

	  rpms_dat  Number of data access traps.

	  rpms_ft   Number of floating point traps.

	  Data access trap statistics:

	  rpms_datld
		    Number of data access traps	on ld.x.

	  rpms_datst
		    Number of data access traps	on st.x.

	  rpms_datfldfst
		    Number of data access traps	on fld.x or fst.x.

	  rpms_datpst
		    Number of data access traps	on pst.

	  rpms_datpfld
		    Number of data access traps	on pfld.y.

	  rpms_datauto
		    Number of data access traps	on fld.x++, fst.x++,
		    and	pfld.x++.

	  Data access page fault statistics:

	  rpms_notdirty
		    Number of page faults for a	store to a clean page.

	  rpms_notref
		    Number of page faults for an access	to an unrefer-
		    enced page while locked.

	  rpms_notwr
		    Number of page faults for a	store to a read-only
		    page.

	  rpms_pdenotu
		    Number of page directory entry access violations.

	  rpms_ptenotu
		    Number of page table entry access violations.

	  rpms_pdenotp
		    Number of page directory entry invalid traps.

	  rpms_ptenotp
		    Number of page table entry invalid traps.

	  Locked sequence related trap statistics:

	  rpms_lockseq
		    Number of traps while in a locked sequence.

	  rpms_lockres
		    Number of restarted	locked sequences.

	  rpms_lockexp
		    Number of expired locked sequences.

	  Floating-point exception statistics:

	  rpms_fpe_si
		    Number of floating-point sticky inexact excep-
		    tions.

	  rpms_fpe_se
		    Number of floating-point source exceptions.

	  rpms_fpe_mu
		    Number of floating-point multiplier	underflow
		    exceptions.

	  rpms_fpe_mo
		    Number of floating-point multiplier	overflow
		    exceptions.

	  rpms_fpe_mi
		    Number of floating-point multiplier	inexact	excep-
		    tions.

	  rpms_fpe_ma
		    Number of floating-point multiplier	add-one	excep-
		    tions.

	  rpms_fpe_au
		    Number of floating-point adder underflow exceptions.

	  rpms_fpe_ao
		    Number of floating-point adder overflow exceptions.

	  rpms_fpe_ai
		    Number of floating-point adder inexact exceptions.

	  rpms_fpe_aa
		    Number of floating-point adder add-one exceptions.

	  For every CPU	configured into	the Mach kernel	there is a
	  corresponding	rpmsoft	structure that contains	the statistics
	  for the CPU. At the address RPMSOFT_BASE_VADDR is an array
	  of structures, one for each CPU configured into the Mach
	  kernel. The number of	CPUs configured	into the Mach kernel
	  can be found by using	the host_info()	kernel call. To	access
	  the RPM soft data, increment a pointer through the array of
	  the rpmsoft structures. For example:

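	    struct rpmsoft *rpmsoft;
	    double idle_sum_time;	/* rpms_idle is in double-precision seconds */
	    int i;
	    rpmsoft = (struct rpmsoft *) RPMSOFT_BASE_VADDR;
	    idle_sum_time = 0.0;
	    /* num_cpus is the CPU count obtained from host_info() */
	    for ( i = 0; i < num_cpus; ++i ) {
	       idle_sum_time += (rpmsoft + i)->rpms_idle;
	    }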

	  The RPM soft counters cannot be written or reset. The
	  counters represent summed statistics since the system	was
	  booted.
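
	  Because the soft counters only accumulate, activity over an
	  interval is found by sampling twice and subtracting. The
	  sketch below illustrates this under stated assumptions: the
	  soft_sample structure and both helpers are hypothetical
	  stand-ins that hold copies of rpms_idle and rpms_alltraps,
	  and the integer type used for the trap count is a guess, not
	  the layout declared in rpm.h.

```c
/* Hypothetical stand-in for the two rpmsoft fields used here; the
 * real layout comes from <i860paragon/rpm.h>. */
struct soft_sample {
    double idle_seconds;    /* copy of rpms_idle */
    unsigned long traps;    /* copy of rpms_alltraps (type assumed) */
};

/* Fraction of wall_seconds that one CPU spent idle between samples. */
static double idle_fraction(const struct soft_sample *before,
                            const struct soft_sample *after,
                            double wall_seconds)
{
    return (after->idle_seconds - before->idle_seconds) / wall_seconds;
}

/* Number of traps taken between the two samples. */
static unsigned long traps_in_interval(const struct soft_sample *before,
                                       const struct soft_sample *after)
{
    return after->traps - before->traps;
}
```

	  The wall-clock interval between the two samples could be
	  taken from the global clock or from dclock().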

	  For a	GP node, the first rpmsoft structure in	the array con-
	  tains	the statistics for the application CPU,	while the
	  second rpmsoft structure in the array	contains the statis-
	  tics for the message co-processor.

     EXAMPLES
	  The following	example	reads the RPM global clock and con-
	  verts	the 56-bit time	to double-precision seconds (the
	  dclock() functionality).

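	    #define RPM_CLOCK_FREQ (10000000)	/* 10 MHz global clock */
	    #define _2_to_52d (4503599627370496.0)	/* 2**52 */
	    #define OR_EXPONENT (0x4330)	/* exponent field 0x433 */
	    #define MASK_EXPONENT (0x000F)	/* keeps fraction bits 51-48 */
	    double hz;
	    struct rpm *rpm;
	    union {
	       unsigned short wordwise[4];
	       double value;
	    } t;
	    rpm = (struct rpm *) RPM_BASE_VADDR;
	    t.value = rpm->rpm_time;	/* raw 56-bit count */
	    hz = 1.0/RPM_CLOCK_FREQ;	/* seconds per tick */
	    /* wordwise[3] holds the sign, exponent, and top 4 fraction bits */
	    t.wordwise[3] = (t.wordwise[3] & MASK_EXPONENT) | OR_EXPONENT;
	    t.value = hz * (t.value - _2_to_52d);	/* remove hidden 1, scale */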

	  This code converts the 56-bit integer count into a 64-bit
	  double-precision value representing seconds. The code
	  ignores the highest 4 bits of the 56-bit counter (a 52-bit
	  counter counting at 10 MHz can count for 14.28 years before
	  wraparound occurs).

	  Consider the representation of a double. Doubles have	three
	  fields: a sign field,	an exponent field and a	fraction
	  field. The sign field	is a single bit, 0 for positive	and 1
	  for negative.	The exponent field is bits 62 to 52, which can
	  hold integers	from 0 to 2047.	The actual value of the
	  exponent is the value	in the exponent	field minus 1023. The
	  fraction field is bits 51 to 0. The actual value of the
	  fraction is 1.f, where f is the value	of the integer in the
	  fraction field. (For information on the hidden 1, refer to
	  IEEE standard	854 for	radix-independent floating-point
	  arithmetic.)
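
	  The three fields can be inspected directly. The following is
	  a minimal sketch, assuming a 64-bit IEEE double as described
	  above; the double_fields structure and split_double()
	  function are hypothetical names introduced for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of a double. */
struct double_fields {
    unsigned sign;      /* bit 63: 0 positive, 1 negative */
    unsigned exponent;  /* bits 62 to 52: stored value, 0 to 2047 */
    uint64_t fraction;  /* bits 51 to 0 */
};

/* Split a double into its sign, exponent, and fraction fields. */
static struct double_fields split_double(double value)
{
    uint64_t bits;
    struct double_fields f;

    memcpy(&bits, &value, sizeof bits);  /* copy out the raw 64 bits */
    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FF);
    f.fraction = bits & ((UINT64_C(1) << 52) - 1);
    return f;
}
```

	  For the value 1.0 this yields an exponent field of 1023 and
	  a fraction field of 0; for 4503599627370496.0 the exponent
	  field is 1075.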

	  To convert the 52-bit integer to floating point, the value
	  of the exponent must be set to 52, and the value 1 x 2**52
	  (the hidden 1) must be subtracted. To set the value of the
	  exponent to 52, the value of the exponent field is set to
	  1023 + 52, or 1075 (0x433 hex). To subtract the hidden 1,
	  the value 4503599627370496.0 (1 x 2**52) is subtracted. At
	  this point the 52-bit number has been converted to a
	  floating-point representation of the same number. To convert
	  the floating-point representation of the 52-bit (10 MHz)
	  counter to seconds, divide by the clock frequency of 10 MHz,
	  that is, multiply by 1.0/10000000 (the hz factor in the
	  code above).
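
	  The same conversion can be written without the union and
	  without depending on byte order. This is a sketch only: the
	  function name rpm_ticks_to_seconds() and its macros are
	  hypothetical, and the raw count is assumed to be available
	  as a 64-bit integer. It scales by dividing by the clock
	  frequency, which has the same effect as the factor
	  hz = 1.0/RPM_CLOCK_FREQ in the example.

```c
#include <stdint.h>
#include <string.h>

#define RPM_CLOCK_HZ 10000000.0          /* 10 MHz global clock */
#define TWO_TO_52    4503599627370496.0  /* 2**52, the hidden 1 */

/* Convert a raw global clock count to seconds, using the low 52 bits. */
static double rpm_ticks_to_seconds(uint64_t ticks)
{
    uint64_t bits;
    double value;

    /* Keep the low 52 bits as the fraction field and set the
     * exponent field to 0x433 (1023 + 52). */
    bits = (ticks & ((UINT64_C(1) << 52) - 1)) | (UINT64_C(0x433) << 52);
    memcpy(&value, &bits, sizeof value);  /* value is now 2**52 + count */
    return (value - TWO_TO_52) / RPM_CLOCK_HZ;
}
```

	  A count of 10000000 ticks converts to 1.0 second. Bits above
	  bit 51 of the count are discarded, as in the example.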

     LIMITATIONS AND WORKAROUNDS
	  The SPV tool uses the	RPM hardware to	collect	mesh and
	  memory bus utilization information. Once every second	the
	  RPM hardware counters	are collected and reset	on every node
	  in the system. The SPV data collection daemon	must be
	  stopped if you want an individual application	to collect and
	  interpret the	RPM hardware counters. To stop the SPV daemon
	  the root user	must either select the SPV File	menu Data col-
	  lection command to temporarily stop the SPV data collection
	  or stop the SPV daemon itself. This is done by invoking
	  /etc/init.d/spv stop or modifying the	/etc/init.d/spv	script
	  to not start the daemon during system	boot.

	  The RPM hardware counters wrap around	to zero	when the max
	  32-bit count value has been reached.
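
	  A difference between two samples stays correct across a
	  single wraparound if it is computed in unsigned 32-bit
	  arithmetic. The helper below is a hypothetical sketch, not
	  part of rpm.h. It assumes that the counter was not reset and
	  wrapped at most once between the two samples; a 50 MHz bus
	  counter that counts every cycle wraps after roughly 86
	  seconds.

```c
#include <stdint.h>

/* Counts accumulated between two samples of a 32-bit counter.
 * Unsigned subtraction is modulo 2**32, so one wraparound between
 * the samples is handled automatically. Hypothetical helper. */
static uint32_t rpm_counter_delta(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}
```

	  For example, samples of 0xFFFFFFF0 and 0x00000010 give a
	  difference of 0x20 counts.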

	  The original intent of the RPM bus counters (rpm_cpu0,
	  rpm_cpu1, rpm_ltu, and rpm_exp) was to report	bus utiliza-
	  tion information. The	original concept was to	count bus
	  cycles when a	module becomes a bus master. Subsequent	per-
	  formance investigations indicated that becoming a bus	master
	  is an	expensive operation in terms of	bus cycles. Thus, the
	  default bus master is	rpm_cpu0 whether the CPU is using the
	  bus or not. The rpm_cpu1, rpm_ltu, and rpm_exp counters
	  correctly denote the amount of bus utilization for the mes-
	  sage coprocessor, the	ltu, and when the expansion card is a
	  bus master. However, the rpm_cpu0 bus counter does not
	  correctly reflect the bus usage of the application CPU.

	  The total utilization	of the bus counters (rpm_cpu0,
	  rpm_cpu1, rpm_ltu, and rpm_exp) is always a little over 97%
	  because about	2% of the bus is consumed for memory refresh.

     SEE ALSO
	  spv, dclock()

	  System Performance Visualization Tool	User's Guide

	  C System Calls Reference Manual