Future Technologies Group
Berkeley Lab Computing Sciences


Berkeley Lab Checkpoint/Restart (BLCR)

Future Technologies Group researchers are developing a hybrid kernel/user implementation of checkpoint/restart. Their goal is to provide a robust, production-quality implementation that checkpoints a wide range of applications without requiring changes to application code. This work focuses on checkpointing parallel applications that communicate through MPI, and on compatibility with the software suite produced by the SciDAC Scalable Systems Software ISIC. The work is divided into four main areas:

  • Checkpoint/Restart for Linux (CR)
  • Checkpointable MPI Libraries
  • Resource Management Interface to Checkpoint/Restart
  • Development of Process Management Interfaces

News

January 12, 2009
Version 0.8.0 is now available from the Checkpoint Downloads page.
This version adds new features, fixes several bugs, and extends support to kernels through 2.6.28.
August 12, 2008
Version 0.7.3 is now available from the Checkpoint Downloads page.
This version fixes several bugs seen in 0.7.2.
July 28, 2008
Version 0.7.2 is now available from the Checkpoint Downloads page.
This version fixes several bugs seen in 0.7.1.
July 14, 2008
Version 0.7.1 has been verified to work correctly with the final 2.6.26 kernel (released yesterday).
June 25, 2008
Version 0.7.1 is now available from the Checkpoint Downloads page.
This version fixes several bugs and extends support to 2.6.26-rc7 kernels (we don't normally track the -rc kernels, but are hoping 0.7.1 will support the upcoming full 2.6.26 release).
May 30, 2008
Version 0.7.0 is now available from the Checkpoint Downloads page.
This version adds several useful features to the checkpoint and restart utilities, extends the range of supported kernels, and fixes numerous bugs. Support for PPC32 platforms is a new experimental feature in this release. As previously announced, this release drops support for LinuxThreads; NPTL is now the only supported pthreads implementation. For a complete list of changes since 0.6.5, please see the NEWS file.
February 29, 2008
Version 0.6.5 is now available from the Checkpoint Downloads page.
This version fixes two potential kernel panics.
January 28, 2008
Version 0.6.4 is now available from the Checkpoint Downloads page.
This version fixes a potential kernel panic when checkpointing mmap()s of HUGETLBFS files with some kernels.
January 22, 2008
Version 0.6.3 is now available from the Checkpoint Downloads page.
This version fixes a serious floating-point corruption bug on the x86-64 platform, present in all BLCR releases since 0.4.2. Users of BLCR on the x86-64 architecture are strongly encouraged to upgrade.
January 14, 2008
Version 0.6.2 is now available from the Checkpoint Downloads page.
This version fixes significant bugs in 0.6.1 and adds support for 2.6.23 kernels (and some vendors' 2.6.22.x kernels).
September 25, 2007
Version 0.6.1 is now available from the Checkpoint Downloads page.
This version fixes minor bugs in 0.6.0.
September 10, 2007
Version 0.6.0 is now available from the Checkpoint Downloads page.
This version adds support for checkpoint/restart of memory shared via mmap(MAP_SHARED), of open unlinked files, and of pending signals; extends the range of supported kernels; greatly expands the test suite; and fixes numerous bugs. Support for PPC64 and ARM platforms, and for cross-compilation, are new experimental features in this release.
August 28, 2007
Announcing deprecated support for LinuxThreads and for Linux 2.4.X kernels
  • Starting with the 0.6.0 release, new bug reports that one cannot reproduce under NPTL + Linux 2.6.x will receive little or none of our attention. However, we will try to distribute user-contributed fixes for such bugs. Note that the 0.6.0 release is expected to pass the BLCR test-suite under LinuxThreads and/or 2.4.x kernels on the developers' x86 systems.
  • Beginning with the next "full" release (0.7.0) we will begin to remove code in BLCR that exists only to support LinuxThreads and/or Linux 2.4.x.
  • We have not yet decided the fate of support for those 2.4.x kernels which include Red Hat's backport of NPTL support (RHL9.0, RHEL, RHAS, etc.).
  • If anybody cares enough about 2.4.x and/or LinuxThreads to volunteer to take over testing and maintenance of BLCR on such platforms, let us know.
July 11, 2007
Version 0.5.6 is now available from the Checkpoint Downloads page.
This version fixes a bug that could lead to corrupted restores of data buffered in pipes. All BLCR users with kernel versions 2.6.14 or newer are strongly encouraged to upgrade.
April 27, 2007
Version 0.5.5 is now available from the Checkpoint Downloads page.
This version adds support for 2.6.21 kernels.
April 20, 2007
Version 0.5.4 is now available from the Checkpoint Downloads page.
This version fixes some problems reported in 0.5.3.
March 29, 2007
Version 0.5.3 is now available from the Checkpoint Downloads page.
This version fixes minor problems reported in 0.5.2.
March 23, 2007
Version 0.5.2 is now available from the Checkpoint Downloads page.
This version fixes minor problems reported in 0.5.0 and 0.5.1.
March 20, 2007
Version 0.5.1 is now available from the Checkpoint Downloads page.
This version adds support for newer kernel versions, including 2.6.20 and 2.6.17-5mdv, and fixes some minor problems reported in 0.5.0.
March 2, 2007
Version 0.5.0 is now available from the Checkpoint Downloads page.
This version adds support for checkpoint/restart of groups of related processes, extends the range of supported kernels, improves I/O performance, and fixes numerous bugs.
November 23, 2005
Version 0.4.2 is now available from the Checkpoint Downloads page.
This version adds support for x86_64 (Opteron/EM64T) processors and stable support for Linux 2.6, and includes a number of miscellaneous bug fixes.
February 18, 2005
Version 0.4.0 is now available from the Checkpoint Downloads page.
This version adds experimental support for Linux 2.6.x kernels.
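
The 0.6.0 entry above mentions memory shared via mmap(MAP_SHARED) and open unlinked files. As a rough illustration of what that state looks like inside a process, here is a minimal sketch in Python that creates both. It uses only standard Unix calls and does not use any BLCR interface; the function names build_state and main are hypothetical and chosen for this example only.

```python
import mmap
import os
import tempfile


def build_state():
    """Create an open-but-unlinked file and a MAP_SHARED mapping of it.

    Illustration only: this is ordinary Unix process state, not BLCR code.
    """
    fd, path = tempfile.mkstemp()
    os.write(fd, b"checkpoint me")
    # Remove the name. The data stays reachable through the open descriptor.
    os.unlink(path)
    # Map the whole file (length 0) as shared, readable and writable memory.
    shared = mmap.mmap(
        fd, 0, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ | mmap.PROT_WRITE
    )
    return fd, path, shared


def main():
    fd, path, shared = build_state()
    shared[0:1] = b"C"      # write through the shared mapping
    shared.flush()          # push the change to the underlying file
    os.lseek(fd, 0, os.SEEK_SET)
    print("name still exists:", os.path.exists(path))
    print("file contents:", os.read(fd, 13))
    shared.close()
    os.close(fd)


if __name__ == "__main__":
    main()
```

A process in this condition has a file with no path on disk and a mapping whose contents belong to that file, so neither can be recovered after a restart simply by reopening a file name.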

Documentation

Publications

Downloads

Other Resources

Features

  • Fully SMP safe
  • Rebuilds the virtual address space and restores registers
  • Supports the NPTL implementation of POSIX threads (LinuxThreads is no longer supported)
  • Restores file descriptors, and state associated with an open file
  • Restores signal handlers, signal mask, and pending signals
  • Restores the process ID (PID), thread group ID (TGID), parent process ID (PPID), and process tree to their previous state
  • Supports saving and restoring groups of related processes and the pipes that connect them
  • Should work with nearly any x86 Linux system that uses either a 2.4 or 2.6 kernel, or any x86_64 system with a 2.6 kernel (see the FAQ for the most recent information). Verified to work on SuSE Linux 9.x/10.0; Red Hat 7.2 through 9; Red Hat Enterprise Linux version 3; CentOS 3.1; Fedora Core 2, 3 and 4; many vanilla Linux kernels (from kernel.org) from 2.4.0 on up; and many more.
  • Experimental support is present for PPC, PPC64 and ARM architectures. We consider this support experimental mainly because of our limited ability to test it.
  • Tested with the GNU C library (glibc) versions 2.1 through 2.6
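
Several of the features above refer to per-process state: signal handlers, the signal mask, pending signals, PID/PPID, and pipes between related processes. The following is a minimal sketch in Python of a process that holds each of these, using only standard Unix calls. It does not call BLCR, and the names on_usr1, snapshot_state and pipe_roundtrip are hypothetical helpers invented for this illustration.

```python
import os
import signal
import threading


def on_usr1(signum, frame):
    """No-op handler; installed only so the process has handler state."""


def snapshot_state():
    """Install a handler, block SIGUSR1, leave one instance pending, report."""
    signal.signal(signal.SIGUSR1, on_usr1)                       # signal handler
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})   # signal mask
    # Send the signal to the calling thread. It is blocked, so it stays pending.
    signal.pthread_kill(threading.get_ident(), signal.SIGUSR1)
    return {
        "pid": os.getpid(),
        "ppid": os.getppid(),
        "blocked": signal.pthread_sigmask(signal.SIG_BLOCK, set()),
        "pending": signal.sigpending(),
    }


def pipe_roundtrip():
    """Fork a child that reports its parent's PID back through a pipe."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: write the PPID it sees, then exit without Python cleanup.
        os.close(read_end)
        os.write(write_end, str(os.getppid()).encode())
        os._exit(0)
    os.close(write_end)
    data = os.read(read_end, 64)
    os.close(read_end)
    os.waitpid(pid, 0)
    return int(data)


if __name__ == "__main__":
    print(snapshot_state())
    print("child saw parent PID:", pipe_roundtrip())
```

The pipe example also shows why the process tree matters: the child identifies its parent by PID, so that relationship only remains meaningful after a restart if the original IDs are restored.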

  • For more information, check these pages, or send e-mail to checkpoint-NO SPAM@lbl . gov