Many scientific and high-performance computing applications consist of multiple processes running on different processors that communicate frequently. Because of their synchronization needs, these applications can suffer severe performance penalties if their processes are not all coscheduled to run together. Two common approaches to coscheduling jobs are batch scheduling, in which nodes are dedicated for the duration of the run, and gang scheduling, in which time slicing is coordinated across processors. Both work well when jobs are load-balanced and use the entire parallel machine. However, these conditions are rarely met, and most realistic workloads consequently suffer from both internal and external fragmentation, in which resources and processors are left idle because jobs cannot be packed with perfect efficiency. This situation leads to reduced utilization and suboptimal performance. Flexible CoScheduling (FCS) addresses this problem by monitoring each job's computation granularity and communication pattern and scheduling jobs based on their synchronization and load-balancing requirements. In particular, jobs that do not require stringent synchronization are identified and are not coscheduled; instead, their processes are used to reduce fragmentation. FCS has been fully implemented on top of the STORM resource manager on a 256-processor Alpha cluster and compared to batch, gang, and implicit coscheduling algorithms. This paper describes in detail the implementation of FCS and its performance evaluation with a variety of workloads, including large-scale benchmarks, scientific applications, and dynamic workloads. The experimental results show that FCS saturates at higher loads than the other algorithms (up to 54% higher in some cases) and exhibits lower response times and slowdown than the other algorithms in nearly all scenarios.

Keywords: Cluster computing, load balancing, job scheduling, gang scheduling, parallel architectures, flexible coscheduling
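To make the classification idea concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of how per-process measurements of computation granularity and blocking time might be turned into a scheduling decision: processes that communicate at fine granularity and rarely block are kept coscheduled, while the rest are released to the local scheduler to fill fragmentation. The struct fields, thresholds, and class names here are illustrative assumptions, not FCS's actual data structures or tuning values.

```c
#include <stdio.h>

/* Hypothetical per-process statistics gathered over one time slice. */
typedef struct {
    double avg_granularity_ms;  /* mean computation time between messages */
    double wait_fraction;       /* fraction of the slice spent blocked    */
} proc_stats_t;

typedef enum {
    CLASS_COSCHEDULE,   /* fine-grained and balanced: keep gang-scheduled   */
    CLASS_DONT_CARE     /* coarse-grained or imbalanced: may fill idle slots */
} sched_class_t;

/* Illustrative thresholds; real values would be tuned per machine. */
#define GRANULARITY_THRESHOLD_MS 5.0
#define WAIT_THRESHOLD           0.25

/* Classify a process from its measured behavior. */
static sched_class_t classify(const proc_stats_t *s)
{
    if (s->avg_granularity_ms < GRANULARITY_THRESHOLD_MS &&
        s->wait_fraction < WAIT_THRESHOLD)
        return CLASS_COSCHEDULE;
    return CLASS_DONT_CARE;
}

int main(void)
{
    proc_stats_t tightly_coupled = { 1.2, 0.05 };   /* frequent messages */
    proc_stats_t coarse_grained  = { 80.0, 0.40 };  /* long compute bursts */

    printf("tightly coupled -> %s\n",
           classify(&tightly_coupled) == CLASS_COSCHEDULE
               ? "coschedule" : "use to fill fragmentation");
    printf("coarse grained  -> %s\n",
           classify(&coarse_grained) == CLASS_COSCHEDULE
               ? "coschedule" : "use to fill fragmentation");
    return 0;
}
```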