Administrator's Guide
Backfill Scheduling
About Backfill Scheduling
By default, a reserved job slot cannot be used by another job. To make better use of resources and improve performance of LSF, you can configure backfill scheduling. Backfill scheduling allows other jobs to use the reserved job slots, as long as the other jobs will not delay the start of the big parallel job. Backfilling, together with processor reservation, allows large parallel jobs to run without leaving resources underutilized.
In a busy cluster, processor reservation helps to schedule big parallel jobs sooner. However, by default, reserved processors remain idle until the big job starts. This degrades the performance of LSF because the reserved resources are idle while jobs are waiting in the queue.
Backfill scheduling allows the reserved job slots to be used by small jobs that can run and finish before the big job starts. This improves the performance of LSF because it increases the utilization of resources.
How Backfilling Works
For backfill scheduling, LSF assumes that a job will run until its run limit expires. Backfill scheduling works most efficiently when all the jobs in the cluster have a run limit.
Since jobs with a shorter run limit have a better chance of being scheduled as backfill jobs, users who specify appropriate run limits in a backfill queue will be rewarded with improved turnaround time.
Once the big parallel job has reserved sufficient job slots, LSF calculates the start time of the big job, based on the run limits of the jobs currently running in the reserved slots. LSF cannot backfill if the big job is waiting for a job that has no run limit defined.
If LSF can backfill the idle job slots, only jobs with run limits that expire before the start time of the big job will be allowed to use the reserved job slots. LSF cannot backfill with a job that has no run limit.
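The rules above can be sketched in a few lines of Python. This is an illustrative model only, not LSF's scheduler code: the RunningJob class and the estimated_start and can_backfill functions are hypothetical names, times are minutes on an arbitrary clock, and an undefined run limit is represented as None. Treating a run limit that expires exactly at the big job's start time as acceptable is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RunningJob:
    """A job occupying slots the big job is waiting for (hypothetical model)."""
    start: int                 # start time, in minutes
    run_limit: Optional[int]   # run limit in minutes, or None if undefined


def estimated_start(now: int, blockers: Sequence[RunningJob]) -> Optional[int]:
    """Estimate when the big job can start.

    Every running job is assumed to run until its run limit expires.
    Returns None if any blocking job has no run limit, because the start
    time cannot then be calculated and backfilling is not possible.
    """
    if any(job.run_limit is None for job in blockers):
        return None
    end_times = [job.start + job.run_limit for job in blockers]
    return max([now] + end_times)


def can_backfill(now: int,
                 candidate_run_limit: Optional[int],
                 big_job_start: Optional[int]) -> bool:
    """Decide whether a candidate job may use the reserved slots."""
    # No estimate for the big job, or no run limit on the candidate:
    # backfilling is not possible in either case.
    if big_job_start is None or candidate_run_limit is None:
        return False
    # Assumption: finishing exactly at the start time does not delay the big job.
    return now + candidate_run_limit <= big_job_start
```

For instance, with one blocking job started at minute 0 with a 120-minute run limit, a candidate arriving at minute 30 with a 60-minute run limit qualifies, while a candidate with no run limit never does.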
In this scenario, assume the cluster consists of a 4-CPU multiprocessor host.
- A sequential job (job1) with a run limit of 2 hours is submitted and gets started at 8:00 am (figure (a)).
- Shortly afterwards, a parallel job (job2) requiring all 4 CPUs is submitted. It cannot start right away because of job1, so it reserves the remaining 3 processors (figure (b)).
- At 8:30 am, another parallel job (job3) is submitted requiring only two processors and with a run limit of 1 hour. Since job2 cannot start until 10:00 am (when job1 finishes), its reserved processors can be backfilled by job3 (figure (c)). Therefore job3 can complete before job2's start time, making use of the idle processors.
- job3 will finish at 9:30 am and job1 at 10:00 am, allowing job2 to start shortly after 10:00 am.
In this example, if job3's run limit was 2 hours, it would not be able to backfill job2's reserved slots, and would have to run after job2 finishes.
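The arithmetic in this scenario can be checked with a short Python sketch. The calendar date is arbitrary, and the job3_can_backfill helper and variable names are hypothetical; the sketch only reproduces the timeline described above and does not call any LSF interface.

```python
from datetime import datetime, timedelta

TOTAL_CPUS = 4  # the 4-CPU multiprocessor host in the scenario

# job1: sequential, starts at 8:00 am with a 2-hour run limit.
job1_start = datetime(2001, 2, 28, 8, 0)   # the date itself is arbitrary
job1_end = job1_start + timedelta(hours=2)  # assumed to run until 10:00 am

# job2 needs all 4 CPUs, so it waits for job1 and reserves the other 3.
job2_start = job1_end
reserved_cpus = TOTAL_CPUS - 1


def job3_can_backfill(submit_time, cpus, run_limit):
    """True if the job fits in the reserved CPUs and ends by job2's start."""
    finish = submit_time + run_limit
    return cpus <= reserved_cpus and finish <= job2_start


job3_submit = datetime(2001, 2, 28, 8, 30)

# 1-hour run limit: finishes at 9:30 am, before 10:00 am.
print(job3_can_backfill(job3_submit, 2, timedelta(hours=1)))  # True

# 2-hour run limit: would finish at 10:30 am, delaying job2.
print(job3_can_backfill(job3_submit, 2, timedelta(hours=2)))  # False
```

The second call corresponds to the closing remark of the example: with a 2-hour run limit, job3 cannot use the reserved slots.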
- A job will not have an estimated start time immediately after MBD is reconfigured.
- You can never preempt jobs in a backfill queue. A job in a backfill queue might be running in a reserved job slot, and starting a new job in that slot might delay the start of the big parallel job.
- A backfill job borrows a job slot that is already taken by another job. The backfill job will not run at the same time as the job that reserved the job slot first. Backfilling can take place even if the job slot limits for a host or processor have been reached. However, backfilling cannot take place if the job slot limits for users or queues have been reached.
Configuring Backfill Scheduling
Backfill scheduling is enabled at the queue level. Only jobs in a backfill queue can backfill reserved job slots. If the backfill queue also allows processor reservation, then backfilling can occur among jobs within the same queue.
To configure a backfill queue, define BACKFILL in lsb.queues. Specify Y to enable backfilling. To disable backfilling, specify N or blank space.
BACKFILL=Y
Enforcing Run Limits
Backfill scheduling works most efficiently when all the jobs in a cluster have a run limit specified at the job level (bsub -W). You can use the external submission executable, esub, to make sure that all users specify a job-level run limit.
Otherwise, you can specify ceiling and default run limits at the queue level (RUNLIMIT in lsb.queues).
Viewing Information About Job Start Time
Use bjobs -l or xlsbatch to view the estimated start time of a job.
Date Modified: February 28, 2001
Platform Computing Corporation: www.platform.com
Copyright © 2001 Platform Computing Corporation. All rights reserved.