Fermilab


MINOS Offline Documentation

MemCheck




Summary

MemCheck is a leak checker built into ROOT. It was announced in the ROOT 3.02/06 Release Notes, which incorporated the document "MemCheck: Memory Usage check for ROOT-based Applications".

To use it, you have to link in ROOT's libNew.so, which replaces the standard new and delete operators. You run your application as normal but, on exit, it dumps a log of all new and delete operations, which is then analysed by the ROOT utility memprobe. memprobe looks for leaks and runs gdb on your application to present the results as symbolic stack dumps, so that you see not only the leaking location but also the stack of callers that produced it. The checker can also be used when running macros directly from ROOT. It checks leaks of all classes; the objects don't have to inherit from TObject.
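The mechanism libNew.so relies on, replacing the global operator new and operator delete, can be illustrated with a minimal sketch. This is not libNew.so's implementation: the counters gNewCalls and gDeleteCalls and the helper outstanding() are hypothetical, and the real library records a stack trace for every call and writes memcheck.out rather than keeping two totals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counters; libNew.so instead records a stack trace per call.
namespace leakcount {
std::size_t gNewCalls = 0;     // calls to operator new
std::size_t gDeleteCalls = 0;  // calls to operator delete with a non-null pointer
}  // namespace leakcount

// A program may supply its own global operator new/delete; these definitions
// then replace the standard library's versions for the whole program.
// The default array forms (new[]/delete[]) forward to these two.
// Over-aligned forms (C++17) are separate overloads and are not replaced here.
void* operator new(std::size_t size) {
  ++leakcount::gNewCalls;
  if (void* p = std::malloc(size == 0 ? 1 : size)) return p;
  throw std::bad_alloc();
}

void operator delete(void* p) noexcept {
  if (p == nullptr) return;
  ++leakcount::gDeleteCalls;
  std::free(p);
}

// Sized deallocation (C++14) forwards to the unsized form.
void operator delete(void* p, std::size_t) noexcept { ::operator delete(p); }

// Allocations not yet released: new calls minus delete calls.
std::size_t outstanding() {
  assert(leakcount::gNewCalls >= leakcount::gDeleteCalls);
  return leakcount::gNewCalls - leakcount::gDeleteCalls;
}
```

Any count still outstanding at exit corresponds to what MemCheck would report as a leak, although only a per-call stack record, as libNew.so keeps, can tell you where it came from.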

How to use MemCheck

  1. It is strongly recommended that you build the ROOT libraries from source and switch on debug:-
        cd $ROOTSYS
       ./configure <target_platform> --build=debug
       e.g. ./configure linuxegcs --build=debug
       gmake distclean
       gmake
       gmake install 
    
  2. Use SRT to build applications as normal, but then rebuild loon with the gmake variable USE_ROOT_NEW set to a non-zero value, e.g.:-
      rm `which loon`
      gmake USE_ROOT_NEW=1 Loon.bin
    
    This will include ROOT's libNew.so when building.

  3. To enable the leak checker, edit your .rootrc (or create one in your work directory) so that it contains the line:-
      Root.MemCheck: 1
    

  4. Run your application as normal.

  5. After your job has completed, you should find that it has created the file:-
      memcheck.out
    

To analyse the output, you need to run the utility memprobe. ROOT provides one that works, except that the stack dumps are incomplete. memprobe presents leak information in source form: to translate stack addresses into source code, it runs the loon job under gdb and, when the main program is reached, queries gdb using the list command. However, our loon executable doesn't contain all the code; most of the application libraries are loaded by the job script. This means that memprobe cannot decode many of the more important addresses and simply removes them from the stack dumps, which makes the dumps of limited value.

So we have our own version which is slightly modified in several respects including:-

  1. The ability to run loon up to the point where all the libraries have been loaded in before attempting to translate addresses.
  2. Where it still cannot translate an address, it doesn't remove it from the stack listing but instead records it with the warning that the source is not available.
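The second modification can be sketched as follows, assuming the address-to-source information has already been collected into a table. memprobe_minos gets this information from gdb; the SymbolTable map, the CodeRange struct and the function names here are hypothetical. Unresolved addresses stay in the stack, flagged, instead of being dropped.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical symbol table entry: a code range and the source it maps to.
struct CodeRange {
  std::uintptr_t end;    // one past the last address in the range
  std::string location;  // e.g. "JobCPathModule.cxx:42"
};
// Keyed by the first address of each range.
using SymbolTable = std::map<std::uintptr_t, CodeRange>;

std::string ToHex(std::uintptr_t addr) {
  std::ostringstream os;
  os << "0x" << std::hex << addr;
  return os.str();
}

// Return the source location for addr, or "" if no range covers it.
std::string Translate(const SymbolTable& table, std::uintptr_t addr) {
  auto it = table.upper_bound(addr);  // first range starting after addr
  if (it == table.begin()) return "";
  --it;                               // last range starting at or before addr
  return addr < it->second.end ? it->second.location : "";
}

// Translate a whole stack, keeping untranslatable addresses with a warning
// rather than removing them, so the chain of callers stays complete.
std::vector<std::string> TranslateStack(const SymbolTable& table,
                                        const std::vector<std::uintptr_t>& stack) {
  std::vector<std::string> out;
  for (std::uintptr_t addr : stack) {
    std::string loc = Translate(table, addr);
    out.push_back(loc.empty() ? ToHex(addr) + " (source not available)" : loc);
  }
  return out;
}
```

Keeping the placeholder entry preserves the depth and order of the stack, so a reader can still see that a caller exists in a library gdb could not decode.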
You should find our version of memprobe, called memprobe_minos to avoid any possible clash with memprobe, installed in the standard SRT bin directory. Run memprobe_minos as follows:-
  memprobe_minos -e loon JobCPathModule::Create -bq job-script root-data-file
where:-
  1. -e loon
    tells memprobe_minos the name of the executable.

  2. JobCPathModule::Create
    is the name of a function at which to break. There are probably several suitable ones, but the function has to be part of loon itself, because the breakpoint is set as soon as the main program is reached, and it must be called far enough into execution that any libraries loaded by the job script are already in memory.

  3. -bq job-script root-data-file
    is just the remainder of the standard loon job arguments.

Caution: memprobe_minos runs gdb in a separate shell. This has a couple of consequences:-
  • Make sure you run your application with your default environment. For example, if you don't have $SRT_PRIVATE_CONTEXT set by default when you log in, but set it before running your application, then memprobe_minos will use the wrong libraries and the symbolic addresses will be invalid.

  • The working directory will be your login directory, so make sure any file names specified in the loon job arguments are absolute.

memprobe_minos produces the following files:-

  1. leak.info
    For every memory leak, ordered in decreasing size, it shows:
    • A stack dump. Each stack entry has 2 lines:-
      • The file and line number
      • The source statement
    • Number of leaking memory allocations
    • Total size of the memory leak

  2. memcheck.info
    Gives statistics about the use of operator new and operator delete.
    The format is:
    • Count N1 (N2)
      • N1 = number of calls to operator new
      • N2 = number of calls to new - number of calls to delete
    • Size N1 (N2)
      • N1 = total size allocated via calls to operator new
      • N2 = size allocated - size deleted

  3. multidelete.info
    Gives info about multiple deletion in your code in the format:
    • Multiple deletion: N1
      • N1: negative integer defining the number of multiple calls to operator delete
      • file and line number where it occurred
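The relationship between the numbers in memcheck.info and the ordering in leak.info can be seen by replaying a sequence of new/delete operations. This is a simplified sketch, not the memprobe algorithm: the Event record is hypothetical, a single site string stands in for a full stack trace, and deletes of addresses that are not currently allocated are counted in a single total, whereas multidelete.info reports them against a source line with a negative count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one logged operation (not the memcheck.out format).
struct Event {
  bool isNew;        // true for operator new, false for operator delete
  const void* addr;  // heap address involved
  std::size_t size;  // bytes requested (used only for new)
  std::string site;  // allocation site, standing in for a full stack trace
};

// Per-site numbers in the spirit of memcheck.info.
struct SiteStats {
  long newCount = 0;   // Count N1: calls to operator new
  long liveCount = 0;  // Count N2: calls to new - calls to delete
  long newSize = 0;    // Size N1: total bytes allocated
  long liveSize = 0;   // Size N2: bytes allocated - bytes deleted
};

struct Report {
  std::map<std::string, SiteStats> sites;
  long multipleDeletes = 0;  // deletes of addresses not currently allocated
};

Report Replay(const std::vector<Event>& events) {
  Report r;
  // Currently allocated blocks: address -> (size, allocating site).
  std::map<const void*, std::pair<std::size_t, std::string>> live;
  for (const Event& e : events) {
    if (e.isNew) {
      SiteStats& s = r.sites[e.site];
      ++s.newCount;
      ++s.liveCount;
      s.newSize += static_cast<long>(e.size);
      s.liveSize += static_cast<long>(e.size);
      live[e.addr] = std::make_pair(e.size, e.site);
      continue;
    }
    auto it = live.find(e.addr);
    if (it == live.end()) {  // already deleted (or never allocated)
      ++r.multipleDeletes;
      continue;
    }
    SiteStats& s = r.sites[it->second.second];
    --s.liveCount;
    s.liveSize -= static_cast<long>(it->second.first);
    live.erase(it);
  }
  return r;
}

// Sites still holding memory, ordered by decreasing leaked size, as in leak.info.
std::vector<std::pair<std::string, long>> LeaksBySize(const Report& r) {
  std::vector<std::pair<std::string, long>> leaks;
  for (const auto& kv : r.sites)
    if (kv.second.liveSize > 0) leaks.emplace_back(kv.first, kv.second.liveSize);
  std::stable_sort(leaks.begin(), leaks.end(),
                   [](const std::pair<std::string, long>& a,
                      const std::pair<std::string, long>& b) { return a.second > b.second; });
  return leaks;
}
```

A site whose N2 values are zero has released everything it allocated; any positive N2 is a candidate leak.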

Filtering the Output

Some "leaks" are beyond our control; for example, the Dictionary member function that rootcint adds via the ClassDef/ClassImp macros appears as a small leak. Indeed, there are many small "benign" leaks, i.e. leaks of objects that have to persist for the lifetime of the job, so failing to delete them is not a memory loss. However, they can produce a lot of output in leak.info and, even though leaks are ordered by decreasing size, they make looking for genuine leaks harder. To deal with this, memprobe_minos can filter them out: you tell it to suppress selected reports by providing a filter file. If the file filter contains:-
  ::Dictionary
then typing:-
  memprobe_minos -f filter ...
will exclude all reports where any line in the stack trace contains the text "::Dictionary". Each filter line is matched against both lines of a stack entry. For example, a filter file containing:-
  THashTable.cxx:98
  Rehash(fEntries)
would suppress any report where the stack included either line 98 of THashTable.cxx or the Rehash source statement.
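The filtering rule described above amounts to a substring match against both lines of every stack entry. A minimal sketch, assuming the reports have already been parsed into the hypothetical StackEntry/LeakReport types; skipping empty filter lines is an assumption about how memprobe_minos treats them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One stack entry in leak.info: location line and source line (hypothetical type).
struct StackEntry {
  std::string location;  // e.g. "THashTable.cxx:98"
  std::string source;    // e.g. "Rehash(fEntries)"
};
using LeakReport = std::vector<StackEntry>;  // one report = one stack trace

// True if any non-empty filter line occurs in either line of any stack entry.
bool Suppressed(const LeakReport& report, const std::vector<std::string>& filters) {
  for (const StackEntry& entry : report) {
    for (const std::string& f : filters) {
      if (f.empty()) continue;  // assumption: blank filter lines are ignored
      if (entry.location.find(f) != std::string::npos ||
          entry.source.find(f) != std::string::npos)
        return true;
    }
  }
  return false;
}

// Keep only the reports that no filter line matches.
std::vector<LeakReport> ApplyFilter(const std::vector<LeakReport>& reports,
                                    const std::vector<std::string>& filters) {
  std::vector<LeakReport> kept;
  for (const LeakReport& r : reports)
    if (!Suppressed(r, filters)) kept.push_back(r);
  return kept;
}
```

Because the match is a plain substring test, a filter line such as "THashTable.cxx:98" also matches when leak.info prints the full directory path in front of the file name.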

Looking for Continuous Leaks

As explained in the previous section, benign leaks may make finding serious leaks harder, particularly small but continuous ones, which can ultimately kill a program. One way to identify such leaks is to run a job twice, first for only a few events and then for a slightly longer run, and to compare the two outputs. Many benign leaks occur during initialisation and the first event cycle, and so are subtracted out. The utility compare_memcheck_info can be used to compare two memcheck.info files. The procedure is as follows:-
  1. Run a job for just a few events.
  2. Run memprobe_minos and rename memcheck.info to memcheck_1.info.
  3. Run the job again for a few more events.
  4. Run memprobe_minos again and rename memcheck.info to memcheck_2.info.
  5. Compare the two files using compare_memcheck_info:-
      compare_memcheck_info memcheck_2.info memcheck_1.info
    
This produces a list of the additional leaks produced in the longer job ordered in decreasing size. Each leak starts with a header line e.g.:-
  Leak total: 225976 number: 6 average size:  37662.67 (leak rate 100.0 %)
This shows that the total leak is 225976 bytes, resulting from 6 heap allocations with an average size of 37662.67 bytes. A non-integral average size means that not all objects were of the same size. The leak rate was 100%, i.e. there were no other heap allocations from this point in the code that were subsequently released.
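The header line can be reproduced from the per-stack numbers of the two runs. In this sketch the StackCounts struct is hypothetical, the leak rate is assumed to be the extra leaked allocations divided by the extra allocations at that stack, and the output format simply imitates the example above; it is not the compare_memcheck_info implementation.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Per-stack numbers read from one memcheck.info file (hypothetical struct).
struct StackCounts {
  long allocCount;  // Count N1: calls to operator new
  long liveCount;   // Count N2: calls to new - calls to delete
  long liveSize;    // Size N2: bytes allocated - bytes deleted
};

// Header for the extra leak at one stack in the longer run.
// ASSUMPTION: leak rate = extra leaked allocations / extra allocations.
std::string LeakHeader(const StackCounts& longer, const StackCounts& shorter) {
  long total = longer.liveSize - shorter.liveSize;       // extra bytes leaked
  long number = longer.liveCount - shorter.liveCount;    // extra leaked allocations
  long allocs = longer.allocCount - shorter.allocCount;  // extra allocations
  assert(number > 0 && allocs >= number);
  char buf[160];
  std::snprintf(buf, sizeof buf,
                "Leak total: %ld number: %ld average size: %9.2f (leak rate %5.1f %%)",
                total, number, static_cast<double>(total) / number,
                100.0 * number / allocs);
  return buf;
}
```

With a shorter run of {10, 4, 150656} and a longer run of {16, 10, 376632}, the differences are 225976 bytes from 6 allocations out of 6 new calls, giving the header shown above.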

compare_memcheck_info prints the stack in a slightly different format from that of leak.info. In particular, each stack entry normally occupies only one line rather than two, and the full directory path of the source file is suppressed, which makes for easier reading. Although its primary function is to compare two memcheck.info files, it can also be used to reformat a single set of output by giving /dev/null as the second file:-

  compare_memcheck_info memcheck_2.info /dev/null

High Water Mark (HWM) Analysis

We have conjectured, but not yet encountered, a memory consumption problem that isn't strictly a leak, because all the objects are still owned, but is nonetheless a bug that eventually causes the program to crash. This is the problem of the Greedy Client, i.e. a client that requests access to, but not ownership of, objects from a service and, instead of releasing them, keeps asking for more. Such a problem isn't detectable by memory checkers such as Valgrind because, at program tear down, the service removes all the objects.

One way to study such a problem is to do HWM analysis. This involves recording, for each allocation point, the maximum amount of memory held at any one time throughout a job. If two jobs with different numbers of events are compared, greedy clients reveal themselves because their HWM values increase with the length of the run.
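The effect HWM analysis looks for can be shown with a toy model. The SiteInfo struct below is a hypothetical stand-in for TStackInfo, reduced to the currently allocated size and its high water mark; the two client functions are also hypothetical and simulate a greedy client, whose objects are only freed by the service at tear down, and a client that releases each object before the next event.

```cpp
#include <algorithm>
#include <cassert>

// Stand-in for TStackInfo, reduced to what HWM analysis needs (hypothetical).
struct SiteInfo {
  long allocSize = 0;     // bytes currently allocated from this site
  long hwmAllocSize = 0;  // largest value allocSize has reached so far

  void Inc(long bytes) {
    allocSize += bytes;
    hwmAllocSize = std::max(hwmAllocSize, allocSize);  // track the peak
  }
  void Dec(long bytes) {
    assert(bytes <= allocSize);
    allocSize -= bytes;
  }
};

// Greedy client: one object requested per event and never handed back; the
// service deletes all of them at tear down, so nothing leaks at exit.
SiteInfo GreedyClient(int nEvents, long objectSize) {
  SiteInfo site;
  for (int i = 0; i < nEvents; ++i) site.Inc(objectSize);
  for (int i = 0; i < nEvents; ++i) site.Dec(objectSize);  // tear down
  return site;
}

// Well-behaved client: each object is released before the next event.
SiteInfo PoliteClient(int nEvents, long objectSize) {
  SiteInfo site;
  for (int i = 0; i < nEvents; ++i) {
    site.Inc(objectSize);
    site.Dec(objectSize);
  }
  return site;
}
```

Both clients finish with allocSize equal to zero, so a leak checker reports nothing for either; only the greedy client's HWM grows with the number of events.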

HWM analysis requires changes to the following ROOT files:-

./newdelete/src/MemCheck.cxx
./newdelete/inc/MemCheck.h
./include/MemCheck.h
The following has worked at Oxford in August 2004, but should be regarded as development notes.
  1. Modify MemCheck.h
    In the definition of class TStackInfo, add:-
        Int_t       fHWMAllocSize;     //high water mark size of allocated memory
    
    At the end of TStackInfo::Inc(Int_t memSize), after fAllocSize has been updated, add:-
        if ( fHWMAllocSize < fAllocSize ) fHWMAllocSize = fAllocSize;
    
  2. Modify MemCheck.cxx
    In TStackInfo::Init replace:-
        fTotalAllocCount = fTotalAllocSize = fAllocCount = fAllocSize = 0;
    with:-
        fTotalAllocCount = fTotalAllocSize = fAllocCount = fAllocSize = fHWMAllocSize = 0;
    
    In TMemHashTable::Dump() replace:-
              fprintf(fp, "size %d:%d:%d:%d  ",
                      info->fTotalAllocCount, info->fTotalAllocSize,
                      info->fAllocCount, info->fAllocSize);
    with:-
              fprintf(fp, "size %d:%d:%d:%d:%d  ",
                      info->fTotalAllocCount, info->fTotalAllocSize,
                      info->fAllocCount, info->fAllocSize, info->fHWMAllocSize);
    
  3. Run loon job as normal.
  4. Run memprobe_minos as normal (it automatically detects the additional HWM information).
  5. Rename the memcheck.info file to memcheck_1.info and repeat the last two steps with a longer run, this time renaming the file to memcheck_2.info.
  6. Compare the two outputs using compare_memcheck_info but specify HWM analysis:-
      setenv COMPARE_MEMCHECK_INFO_HWM_ANAL 1
      compare_memcheck_info memcheck_2.info memcheck_1.info
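The final comparison step looks for stacks whose HWM grew between the two runs. A minimal sketch, assuming the HWM values have already been extracted from memcheck_1.info and memcheck_2.info into a map keyed by stack; the HwmTable type and HwmGrowth function are hypothetical, a stack absent from the shorter run is treated as having an HWM of zero, and this is not the compare_memcheck_info implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// HWM per stack for one run, keyed by a stack description (hypothetical type).
using HwmTable = std::map<std::string, long>;

// Stacks whose HWM is larger in the longer run, largest growth first.
std::vector<std::pair<std::string, long>> HwmGrowth(const HwmTable& longer,
                                                    const HwmTable& shorter) {
  std::vector<std::pair<std::string, long>> grown;
  for (const auto& kv : longer) {
    auto it = shorter.find(kv.first);
    long before = (it == shorter.end()) ? 0 : it->second;  // assumed 0 if absent
    if (kv.second > before) grown.emplace_back(kv.first, kv.second - before);
  }
  std::stable_sort(grown.begin(), grown.end(),
                   [](const std::pair<std::string, long>& a,
                      const std::pair<std::string, long>& b) { return a.second > b.second; });
  return grown;
}
```

Stacks whose HWM is the same in both runs, typically allocations made once during initialisation, drop out, leaving the candidates for a greedy client at the top of the list.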
    



Last Modified: $Date: 2004/10/12 17:12:50 $
Contact: n.west1@physics.ox.ac.uk
Page viewed from http://www-numi.fnal.gov/offline_software/srt_public_context/WebDocs/MemCheck.html