Administrator's Guide



Backfill Scheduling


About Backfill Scheduling

By default, a reserved job slot cannot be used by another job. To make better use of resources and improve the performance of LSF, you can configure backfill scheduling. Backfill scheduling allows other jobs to use the reserved job slots, as long as they do not delay the start of the big parallel job. Together with processor reservation, backfilling allows large parallel jobs to run without leaving resources underutilized.

In a busy cluster, processor reservation helps to schedule big parallel jobs sooner. However, by default, reserved processors remain idle until the big job starts. This degrades the performance of LSF because the reserved resources are idle while jobs are waiting in the queue.

Backfill scheduling allows the reserved job slots to be used by small jobs that can run and finish before the big job starts. This improves the performance of LSF because it increases the utilization of resources.

How Backfilling Works

For backfill scheduling, LSF assumes that a job will run until its run limit expires. Backfill scheduling works most efficiently when all the jobs in the cluster have a run limit.

Since jobs with a shorter run limit have a better chance of being scheduled as backfill jobs, users who specify appropriate run limits in a backfill queue will be rewarded with improved turnaround time.

Once the big parallel job has reserved sufficient job slots, LSF calculates the start time of the big job, based on the run limits of the jobs currently running in the reserved slots. LSF cannot backfill if the big job is waiting for a job that has no run limit defined.

If LSF can backfill the idle job slots, only jobs with run limits that expire before the start time of the big job will be allowed to use the reserved job slots. LSF cannot backfill with a job that has no run limit.
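The rules above can be illustrated with a minimal sketch. This is not LSF's implementation; the function names, the use of plain numbers for times, and the treatment of a job that would finish exactly at the big job's start time are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def estimate_start(now: float,
                   remaining_limits: Sequence[Optional[float]]) -> Optional[float]:
    """Estimate when the reserving (big) job can start.

    remaining_limits holds, for each running job the big job is waiting
    on, the time left until that job's run limit expires. None means the
    job has no run limit, so no start time can be calculated.
    """
    if any(r is None for r in remaining_limits):
        return None  # hypothetical convention: None means "unknown"
    # The big job starts once the last job it waits on reaches its limit.
    return now + max(remaining_limits, default=0.0)


def can_backfill(now: float,
                 candidate_run_limit: Optional[float],
                 big_job_start: Optional[float]) -> bool:
    """Decide whether a candidate job may use the reserved slots."""
    if big_job_start is None or candidate_run_limit is None:
        return False  # no backfill without run limits on both sides
    # Assumption: finishing exactly at the start time counts as in time.
    return now + candidate_run_limit <= big_job_start


start = estimate_start(now=0, remaining_limits=[90])
print(can_backfill(0, 60, start))    # True
print(can_backfill(0, 120, start))   # False
print(can_backfill(0, None, start))  # False
```

Times here are in minutes, but any consistent unit works.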

Example

In this scenario, assume the cluster consists of a 4-CPU multiprocessor host.

  1. A sequential job (job1) with a run limit of 2 hours is submitted and gets started at 8:00 am (figure (a)).
  2. Shortly afterwards, a parallel job (job2) requiring all 4 CPUs is submitted. It cannot start right away because of job1, so it reserves the remaining 3 processors (figure (b)).
  3. At 8:30 am, another parallel job (job3) is submitted, requiring only two processors and with a run limit of 1 hour. Since job2 cannot start until 10:00 am (when job1 finishes), its reserved processors can be backfilled by job3 (figure (c)). job3 will complete by 9:30 am, before job2's start time, so it makes use of the otherwise idle processors without delaying job2.
  4. job3 will finish at 9:30 am and job1 at 10:00 am, allowing job2 to start shortly after 10:00 am.

In this example, if job3's run limit were 2 hours, job3 could run until 10:30 am, past job2's 10:00 am start time. It would therefore not be able to backfill job2's reserved slots, and would have to run after job2 finishes.
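The arithmetic in this example can be checked with a few lines of Python. The calendar date is arbitrary, and the helper name fits is hypothetical; only the clock times and run limits come from the example.

```python
from datetime import datetime, timedelta

day = datetime(2001, 2, 28)                    # arbitrary date
job1_start = day.replace(hour=8)               # job1 starts at 8:00 am
job2_start = job1_start + timedelta(hours=2)   # job1's run limit ends at 10:00 am
job3_submit = day.replace(hour=8, minute=30)   # job3 is submitted at 8:30 am


def fits(run_limit_hours: float) -> bool:
    """True if job3, started at 8:30 am, ends by job2's start time."""
    return job3_submit + timedelta(hours=run_limit_hours) <= job2_start


print(fits(1))  # True: job3 would end at 9:30 am
print(fits(2))  # False: job3 would end at 10:30 am
```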

Limitations

Configuring Backfill Scheduling

Backfill scheduling is enabled at the queue level. Only jobs in a backfill queue can backfill reserved job slots. If the backfill queue also allows processor reservation, then backfilling can occur among jobs within the same queue.

Configure a Backfill Queue

To configure a backfill queue, define BACKFILL in lsb.queues.

Specify Y to enable backfilling. To disable backfilling, specify N or leave the value blank.

Example

BACKFILL=Y

Enforcing Run Limits

Backfill scheduling works most efficiently when all the jobs in a cluster have a run limit specified at the job level (bsub -W). You can use the external submission executable, esub, to make sure that all users specify a job-level run limit.

Otherwise, you can specify ceiling and default run limits at the queue level (lsb.queues RUNLIMIT).
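As an illustration of the esub approach, the following sketch rejects submissions that carry no job-level run limit. It assumes that LSF passes the submission options to esub in a file named by the LSB_SUB_PARM_FILE environment variable, that the file holds NAME=value lines with the run limit under LSB_SUB_RLIMIT_RUN, and that exiting with the value of LSB_SUB_ABORT_VALUE aborts the submission. Verify these names against the esub documentation for your LSF version before relying on them. The function names are hypothetical.

```python
import os
import sys


def has_run_limit(parm_text: str) -> bool:
    """Return True if the option text sets a non-empty run limit.

    Assumption: options are NAME=value lines and the run limit is
    stored under LSB_SUB_RLIMIT_RUN.
    """
    for line in parm_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == "LSB_SUB_RLIMIT_RUN":
            if value.strip().strip('"'):
                return True
    return False


def main() -> int:
    """Return the exit code the esub script should use."""
    parm_file = os.environ.get("LSB_SUB_PARM_FILE")
    if not parm_file:
        return 0  # not invoked by LSF; accept and do nothing
    with open(parm_file) as fh:
        text = fh.read()
    if has_run_limit(text):
        return 0  # accept the submission
    print("Please specify a run limit with bsub -W.", file=sys.stderr)
    # Assumption: this exit value tells LSF to abort the submission.
    return int(os.environ.get("LSB_SUB_ABORT_VALUE", "1"))
```

To use this as an esub script, end the file with sys.exit(main()).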

Viewing Information About Job Start Time

Use bjobs -l or xlsbatch to view the estimated start time of a job.


      Date Modified: February 28, 2001
Platform Computing Corporation: www.platform.com

Copyright © 2001 Platform Computing Corporation. All rights reserved.