Summary
MemCheck is a leak checker built into ROOT. It was announced in the
3.02/06 Release Notes, which incorporated
MemCheck: Memory Usage check for ROOT-based Applications
To use it you have to link in ROOT's libNew.so which replaces the standard
new and delete operators. You run your application
as normal but, on exit, it dumps a log of all new and delete operations
which is then analysed by the ROOT utility memprobe. This
looks for leaks and runs gdb on your application to
present the results as symbolic stack dumps, so that you can see not only
the leaking location but also the stack of callers that produced it.
The checker can also be used when running macros directly from ROOT.
It checks for leaks of all classes; the objects do not have to inherit
from TObject.
How to use MemCheck
- It is strongly recommended that you build the ROOT libraries from source
and switch on debug:-
cd $ROOTSYS
./configure <target_platform> --build=debug
e.g. ./configure linuxegcs --build=debug
gmake distclean
gmake
gmake install
- Use SRT to build applications as normal but then rebuild loon with
the gmake variable USE_ROOT_NEW set to a non-zero value, e.g.
rm `which loon`
gmake USE_ROOT_NEW=1 Loon.bin
This will include ROOT's libNew.so when building.
- To enable the leak checker edit your .rootrc (or create one
in your work directory), so that it has the line:-
Root.MemCheck: 1
- Run your application as normal.
- After your job has completed you should find it has created the file:-
memcheck.out
To analyse the output, you need to run the utility memprobe. ROOT
provides one which works O.K., apart from the fact that the stack
dumps are incomplete. memprobe presents leak information in source
form. To translate stack addresses into source code it runs the loon
job under gdb and, when the main program is reached, queries gdb using
the list command. However our loon executable doesn't contain all the
code; most of the application libraries are loaded as part of the job
script. This means that memprobe isn't able to decode many of the
more important addresses, and simply removes them from the stack
dumps, making them of limited value.
So we have our own version, which is slightly modified in several
respects, including:-
- The ability to run loon up to the point where all the libraries
have been loaded before attempting to translate addresses.
- Where it still cannot translate an address, it doesn't remove it
from the stack listing but instead records it with the warning that
the source is not available.
You should find our version of memprobe, called memprobe_minos
to avoid any possible clash with memprobe, installed in the standard
SRT bin directory.
Run memprobe_minos as follows:-
memprobe_minos -e loon JobCPathModule::Create -bq job-script root-data-file
where:-
- -e loon
tells memprobe_minos the name of the executable.
- JobCPathModule::Create
is the name of a function at which to break. There are probably
several suitable ones but it has to be part of loon as the breakpoint
is set as soon as the main program is reached and it needs to be
sufficiently far into execution that any script-loaded libraries
are in memory.
- -bq job-script root-data-file
is just the remainder of the standard loon job arguments.
Caution:
memprobe_minos will run gdb in a separate shell
This has a couple of consequences:-
- Make sure you run your application with your default
environment. For example, if you don't have
$SRT_PRIVATE_CONTEXT set by default when you log in,
but set it before you run your application, then memprobe_minos will use
the wrong libraries and the symbolic addresses will be invalid.
- The default directory will be whatever is the one at login,
so make sure any file names specified in the loon job arguments
are absolute paths.
memprobe_minos produces the following files:-
- leak.info
For every memory leak, ordered in decreasing size, it shows:
- A stack dump. Each stack entry has 2 lines:-
- The file and line number
- The source statement
- The number of leaking allocations
- The total size of the memory leak
- memcheck.info
Gives statistics about the use of operator new and operator
delete
The format is:
- Count N1 (N2)
- N1 = number of calls to operator new
- N2 = number of calls to new - number of calls to delete
- Size N1 (N2)
- N1 = total size allocated via calls to operator new
- N2 = size allocated - size deleted
- multidelete.info
Gives info about multiple deletion in your code in the format:
- Multiple deletion: N1
- N1: a negative integer giving the number of multiple calls to
operator delete
- The file and line number where it occurred
Filtering the Output
Some "leaks" are beyond our control, for example the Dictionary member
function that rootcint adds via the CLASSDEF/IMP macro appears to be a
small leak. Indeed there are a lot of small "benign" leaks i.e. leaks
of objects that have to persist for the lifetime of the job, so
failure to delete them is not a memory loss. However they can produce
a lot of output in leak.info and, even though leaks are ordered in
decreasing size, make looking for genuine leaks harder. To deal
with this, memprobe_minos has a facility to filter these out.
You can tell memprobe_minos to suppress selected reports by providing a filter
file. If the file filter contains:-
::Dictionary
then typing:-
memprobe_minos -f filter ...
will exclude all reports where any line in the stack trace contains
the text ::Dictionary. Filter lines match against both lines of a stack entry.
For example:-
THashTable.cxx:98
Rehash(fEntries)
would suppress any report where the stack included either line 98
of THashTable.cxx or the Rehash source statement.
Looking for Continuous Leaks
As explained in the previous section, benign leaks may make finding
serious leaks harder, particularly small but continuous ones which can
ultimately kill a program. A way to identify such leaks is to run a
job twice, first for only a few events and then again for a slightly
longer run and compare the two outputs. Many benign leaks occur during
initialisation and the first event cycle so will be subtracted out.
The utility compare_memcheck_info can be used to compare two
sets of memcheck.info files. The procedure is as follows:-
- Run a job for just a few events.
- Run memprobe_minos and rename memcheck.info to memcheck_1.info .
- Run the job again for a few more events.
- Run memprobe_minos again and rename memcheck.info to memcheck_2.info .
- Compare the two files using compare_memcheck_info:-
compare_memcheck_info memcheck_2.info memcheck_1.info
This produces a list of the additional leaks produced in the longer
job ordered in decreasing size. Each leak starts with a header line
e.g.:-
Leak total: 225976 number: 6 average size: 37662.67 (leak rate 100.0 %)
This shows that the total leak is 225976 bytes resulting from 6 heap
allocations giving an average size of 37662.67. A non-integral average
size means that not all objects were of the same size. The leak was
100% i.e. there were no other heap allocations from this point in the code
that were subsequently released.
compare_memcheck_info prints out the stack in a slightly
different format to that given in leak.info. In particular, each
stack entry normally only occupies one line rather than two and the
full directory path of the source file is suppressed, which makes for
easier reading. Although its primary function is to compare two
memcheck.info files, it can also be used to reformat a single
set of output by making the second file a null:-
compare_memcheck_info memcheck_2.info /dev/null
High Water Mark (HWM) Analysis
We have conjectured, but not yet encountered, a memory consumption
problem that isn't strictly a leak, because all the objects are still
owned, but is nonetheless a bug that eventually causes the program to
crash. The problem is that of the Greedy Client, i.e. some client
that requests access to, but not ownership of, objects from a service.
Instead of releasing them,
the client asks for more and more. Such a problem isn't detectable in
memory checkers such as Valgrind because,
at program tear down, the server removes all the objects.
A way to study such a problem is to do HWM analysis. This involves
recording the maximum number of objects created at any time throughout
a job. By comparing two jobs with different numbers of events, greedy
clients would show themselves by the fact that the HWM value increases
as the run length increases.
Doing HWM analysis involves changes to ROOT's:-
./newdelete/src/MemCheck.cxx
./newdelete/inc/MemCheck.h
./include/MemCheck.h
The following has worked at Oxford in August 2004, but should be regarded
as development notes.
- Modify MemCheck.h
In the definition of class TStackInfo add:-
Int_t fHWMAllocSize; //high water mark size of allocated memory
In TStackInfo::Inc(Int_t memSize) add:-
if ( fHWMAllocSize < fAllocSize ) fHWMAllocSize = fAllocSize;
- Modify MemCheck.cxx
In TStackInfo::Init replace:-
fTotalAllocCount = fTotalAllocSize = fAllocCount = fAllocSize = 0;
with:-
fTotalAllocCount = fTotalAllocSize = fAllocCount = fAllocSize = fHWMAllocSize = 0;
In TMemHashTable::Dump() replace:-
fprintf(fp, "size %d:%d:%d:%d ",
info->fTotalAllocCount, info->fTotalAllocSize,
info->fAllocCount, info->fAllocSize);
with:-
fprintf(fp, "size %d:%d:%d:%d:%d ",
info->fTotalAllocCount, info->fTotalAllocSize,
info->fAllocCount, info->fAllocSize, info->fHWMAllocSize);
- Run loon job as normal.
- Run memprobe_minos as normal (it automatically detects the
additional HWM information).
- Rename the memcheck.info file to memcheck_1.info and
repeat last two steps with a longer run this time renaming to
memcheck_2.info.
- Compare the two outputs using compare_memcheck_info but
specify HWM analysis:-
setenv COMPARE_MEMCHECK_INFO_HWM_ANAL 1
compare_memcheck_info memcheck_2.info memcheck_1.info