Using Process Yield Calls in the CH_SHMEM device

The ch_shmem device uses shared memory to communicate between processes. It is fast, and it is convenient both on shared-memory platforms and for program development on a single system (even one with only one processor). During communication, however, a process may need to wait until data arrives. The implementation handles this with a backoff strategy combined with calls to a yield function. This note describes the behavior of several implementations of the yield function and compares them with the performance when no yield function is used.
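The combination of backoff and yielding can be illustrated with a small wait loop. This is a minimal sketch, not the ch_shmem code: the limits `BACKOFF_START` and `BACKOFF_MAX`, the `ready_fn` predicate, and the `countdown` demo are all hypothetical. The loop busy-waits for a growing number of iterations and calls `sched_yield()` between rounds so that another process (for example, the sender) can run.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical backoff limits; the real ch_shmem values are not given here. */
#define BACKOFF_START 1u
#define BACKOFF_MAX   1024u

/* Predicate that reports whether the awaited data has arrived. */
typedef bool (*ready_fn)(void *ctx);

/* Wait until ready(ctx) is true. Each round busy-waits for `spins`
   iterations (doubling up to BACKOFF_MAX) and then yields the processor.
   Returns the number of sched_yield() calls made. */
static long wait_with_backoff(ready_fn ready, void *ctx)
{
    assert(ready != NULL);
    long yields = 0;
    unsigned spins = BACKOFF_START;
    while (!ready(ctx)) {
        /* Short busy-wait; volatile keeps the compiler from deleting the loop. */
        for (volatile unsigned i = 0; i < spins; i++)
            ;
        if (spins < BACKOFF_MAX)
            spins *= 2;
        /* Give other processes, possibly the sender, a chance to run. */
        sched_yield();
        yields++;
    }
    return yields;
}

/* Demo predicate (hypothetical): data "arrives" after `remaining` failed checks. */
struct countdown { int remaining; };

static bool countdown_ready(void *ctx)
{
    struct countdown *c = ctx;
    return c->remaining-- <= 0;
}
```

With `struct countdown c = { 5 };`, the call `wait_with_backoff(countdown_ready, &c)` checks six times and yields five times before returning.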

The examples were configured, built, and run with the following commands on shakey, a dual-processor Linux system:
POSIX sched_yield
   configure --with-device=ch_shmem -opt=-O --enable-yield=sched_yield
   make
   cd examples/perftest
   make 
   mpptest -np 4 -bisect -gnuploteps -givedy -fname sched
   gnuplot sched > sched.eps
select
   configure --with-device=ch_shmem -opt=-O --enable-yield=select
   make
   cd examples/perftest
   make 
   mpptest -np 4 -bisect -gnuploteps -givedy -fname select
   gnuplot select > select.eps
No yield
   configure --with-device=ch_shmem -opt=-O --disable-yield
   make
   cd examples/perftest
   make 
   mpptest -np 4 -bisect -gnuploteps -givedy -fname noyield
   gnuplot noyield > noyield.eps
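The three configurations differ only in which call, if any, the waiting process makes. A sketch of how such a compile-time choice could look follows. The macro names `USE_SCHED_YIELD`, `USE_SELECT_YIELD`, and `USE_NO_YIELD` are hypothetical stand-ins for whatever `--enable-yield=...` and `--disable-yield` actually define, and the zero timeout passed to `select` is an assumption.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical configuration macros; default to sched_yield if none is set. */
#if !defined(USE_SCHED_YIELD) && !defined(USE_SELECT_YIELD) && !defined(USE_NO_YIELD)
#define USE_SCHED_YIELD 1
#endif

/* Called by a waiting process between polls.
   Returns 0 on success, -1 on failure (errno set by the underlying call). */
static int process_yield(void)
{
#if defined(USE_SCHED_YIELD)
    /* POSIX: relinquish the processor until the process is rescheduled. */
    return sched_yield();
#elif defined(USE_SELECT_YIELD)
    /* No descriptors and a zero timeout (assumed): the call enters the kernel
       and returns at once; whether another process gets to run is OS-dependent. */
    struct timeval tv = { 0, 0 };
    return select(0, NULL, NULL, NULL, &tv) < 0 ? -1 : 0;
#else
    /* No yield: the caller simply keeps polling on its processor. */
    return 0;
#endif
}
```

Compiling with `-DUSE_SELECT_YIELD` or `-DUSE_NO_YIELD` switches the variant; with no flag the sketch uses `sched_yield`.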

The POSIX sched_yield function is described in section 13.3.5 of the POSIX standard. In quick tests, I found working implementations on Linux, Solaris, Tru64, and IRIX; none of them returned ENOSYS, the POSIX error code for "not supported".
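A check along the lines of those quick tests might look like this. It is a sketch only (the actual test programs are not part of this note): it calls `sched_yield` once and distinguishes success, `ENOSYS`, and any other error. The function name `sched_yield_supported` is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Returns 1 if sched_yield() succeeds, 0 if it reports ENOSYS
   ("not supported"), and -1 for any other failure. */
static int sched_yield_supported(void)
{
    errno = 0;
    if (sched_yield() == 0)
        return 1;
    return errno == ENOSYS ? 0 : -1;
}
```

On a system with a working implementation, such as current Linux, the function returns 1.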

The results are shown below:
[Figures: mpptest results with sched_yield, with select, and with no yield]

The key point to notice is that while the minimum times are nearly the same, the average and maximum times are far greater when either select or no yield is used than when sched_yield is used.

Even when only two processes are used on a (shared) dual-processor system, using sched_yield is advantageous:
[Figures: sched_yield; no yield; no yield plotted with the same y range as sched_yield]
These results suggest (but don't prove, since this is on a shared system) that using sched_yield, at least under Linux, is always a good idea.

Using Process Yield Calls in the CH_P4 Device

Here are similar results for using sched_yield in the ch_p4 device (in the routine recv_message). These were run on a two-processor Linux machine.
[Figures: ch_p4 results with sched_yield and with no yield, for 4 processes and for 2 processes]
Note that the "no yield" mode is faster when there are at least as many processors as processes, but shows very poor average and worst-case performance when there are more processes than processors.
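The change in recv_message amounts to yielding inside a polling receive loop. The sketch below is illustrative only: `struct msg_queue`, `message_available`, and `blocking_receive` are hypothetical names, not the p4 code, and a message is simulated as "arriving" after a fixed number of empty polls. The `use_yield` flag selects between the two modes compared above.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-in for the p4 incoming-message queue. */
struct msg_queue {
    int polls_until;   /* demo only: a message "arrives" after this many polls */
};

/* Hypothetical non-blocking probe of the queue. */
static bool message_available(struct msg_queue *q)
{
    if (q->polls_until > 0) {
        q->polls_until--;   /* nothing has arrived yet */
        return false;
    }
    return true;
}

/* Blocking receive: poll, and when the queue is empty optionally yield
   before polling again. Returns the number of empty polls. */
static long blocking_receive(struct msg_queue *q, bool use_yield)
{
    long empty_polls = 0;
    while (!message_available(q)) {
        empty_polls++;
        if (use_yield)
            sched_yield();  /* let a sender sharing this CPU make progress */
    }
    return empty_polls;
}
```

With `use_yield` false, a waiting process keeps its processor until its time slice ends. That costs nothing when every process has its own processor, but it can delay the sender when processes outnumber processors, which is consistent with the ch_p4 results above.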