Fine-grained parallel applications require all their processes to
  run simultaneously on distinct processors to achieve good
  efficiency. This is typically accomplished by space slicing, wherein
  nodes are dedicated for the duration of the run, or by gang
  scheduling, wherein time slicing is coordinated across processors.
  Both schemes suffer from fragmentation: processors are left idle
  because jobs cannot be packed perfectly, which reduces utilization
  and degrades performance.  Flexible coscheduling (FCS) addresses
  this problem by
  monitoring each job's granularity and communication activity, and
  using gang scheduling only for those jobs that require it.
  Processes from other jobs, which can be scheduled without any
  constraints, are used as filler to reduce fragmentation.  In
  addition, inefficiencies due to load imbalance and hardware
  heterogeneity are reduced because each process is classified
  individually.  FCS has been fully implemented as part of the
  STORM resource manager and has been shown to be competitive with
  gang scheduling and implicit coscheduling.
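
  The per-process classification described above can be illustrated
  with a minimal sketch. It is not the STORM implementation: the
  names ProcessStats, needs_gang_scheduling and split_for_time_slot,
  the two thresholds, and the rule that combines them are all
  assumptions made for illustration. The sketch treats a process as
  needing gang scheduling when it communicates at fine granularity
  and spends a noticeable share of its time waiting on communication;
  every other process is treated as unconstrained filler.

```python
from dataclasses import dataclass


@dataclass
class ProcessStats:
    """Measurements for one process over a monitoring window (hypothetical)."""
    compute_us: float   # time spent computing, in microseconds
    wait_us: float      # time spent blocked on communication, in microseconds
    comm_events: int    # number of communication operations observed


def granularity_us(stats):
    """Average compute time between communication operations."""
    if stats.comm_events == 0:
        return float("inf")  # never communicates: infinitely coarse-grained
    return stats.compute_us / stats.comm_events


def needs_gang_scheduling(stats, max_granularity_us=1000.0, min_wait_fraction=0.1):
    """Assumed rule: fine-grained AND visibly slowed by communication waits.

    Both thresholds are illustrative values, not taken from the paper.
    """
    total = stats.compute_us + stats.wait_us
    if stats.comm_events == 0 or total <= 0:
        return False
    wait_fraction = stats.wait_us / total
    return (granularity_us(stats) <= max_granularity_us
            and wait_fraction >= min_wait_fraction)


def split_for_time_slot(procs):
    """Partition process ids into (gang-scheduled, filler) lists.

    `procs` maps a process id to its ProcessStats. The decision is made
    per process, so processes of one job may land in different lists.
    """
    gang, filler = [], []
    for pid, stats in procs.items():
        (gang if needs_gang_scheduling(stats) else filler).append(pid)
    return gang, filler
```

  Because the decision is made per process rather than per job, a
  slow or lightly loaded process can be treated differently from its
  peers in the same job, which is the property the abstract credits
  for reducing the cost of load imbalance and heterogeneity.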