If you feel strongly that further changes are necessary, contact
Nick WestSee also the hot list of the most essential rules.
The purpose of this document is to collect together a list of standards which should be followed when writing C++ code. It assumes that the code is hand-written and not `generated' otherwise a different set of standards would be needed for the input to the code generator.
Standards are needed for different purposes. Some things which are permissible in C++ are relics of C which most people would consider to be best avoided. Some standards relate just to the appearance (and so the readability) of the code, but have no effect upon the output of the compiler. These are here because the code has to to be reviewed and should last a long time-long after some programmers have left particle physics to get a real job.
These standards should cause no problem to new programmers, but they might be perceived as a nuisance for established C and C++ programmers. These standards were arrived at by a long process within the ATLAS-OO group. In most cases some kind of consensus was reached, but in others a vote was needed. Some of the decisions are somewhat arbitrary, however some things do not matter (e.g. which side of the road we drive on) provided that we all do it the same. The standards comprise a set of rules which are to be followed for all code which is to to be reviewed; this should include all code which has any impact upon a published physics result. None of these rules is absolute. A rule can be waived, but only by prior agreement. Some of the rules could be checked by a program but others will require a human being-a reviewer.
We imagine that these rules will be modified a little in the future, either to restrict practices which prove to be dangerous, or to allow things which had been banned. Relaxing or removing a rule has no impact upon existing code, but strengthening a rule, or adding a new rule, will upset programmers because it may invalidate existing code. For this reason we propose to start with this rather detailed set of rules.
After this introduction comes the main body of the standards. The standard is divided into a series of sections each with its own set of numbered rules. So rule LAY-3 is rule 3 on layout. Each rule is followed by its source (in parentheses), which is one of:
Avoid writing FORTRAN.and
Code adhering to the FORTRAN standard shall not be written.mean the same.
The next section is a set of guidelines for writing good code. These are practices which we consider to be good but which cannot be defined as a clear rule.
Finally there is a summary of the rules, generated directly from the Standard.
A set of ELisp Macros have been developed by Brett Viren that can be used in conjunction with the emacs editor help layout header and implementation files in a standard way.
We will consistently use one formatting style to ensure that it is easy to read each others code-this is especially important for reviews. The layout described here is quite compact, and is also used in a number of text books, though it cannot be said to be the `best' on any objective grounds.
Unprintable characters can cause errors in code compilation and execution which are almost impossible to diagnose. Also this ensures easy transmission over networks.
emacs tip: one can force this using the .emacs line:
| (setq-default indent-tabs-mode nil)
Short lines generally enhance readability. This is why newspapers use multiple columns rather than a single column 60cm wide. There is also still a need to be able to print source code, view it in editors and send it in email (sometimes quoted) without having the lines wrapped.
This is subjective, however a reviewer should be able to spot what is not easy to read.
else
, and the
case
and default
keywords for
a switch statement) with the start of the control structure. The
closing brace will be on a line of its own.
The spacing must be consistent within a
single level of a block although nested blocks can be different if
absolutely necessary. The aim is to maximise readability - more spaces
help unless you start having to split lines.
(MINOS)
Example of a simple for loop:
for (int i = 0; i < size; i++) { sum += something(i); }Example of an
if
statement:
if (x < 0) { y = -x; cout << "x was negative" << endl; } else { y = x; }Example of nested ifs:
if (c == 'a') { cout << "It was an a" << endl; a++; } else if (c == 'z') { cout << "It was a z" << endl; z++; } else { cout << "It was neither a nor z" << endl; }The same logic with a switch statement:
switch (c) { case 'a': cout << "It was an a" << endl; a++; break; case 'z': cout << "It was a z" << endl; z++; break; default: cout << "It was neither a nor z" << endl; break; }
emacs tip: In emacs one can set the default indent in C++ mode using:
| ; set some stuff *after* cc-mode has started | ; thereby overriding the defaults | (setq c++-mode-hook | '(lambda () | ; possible styles: gnu, k&4, bsd, stroustrup, whitesmith, ellemtel, linux | (setq-default c-indentation-style "gnu") | (setq-default c-basic-offset 3) | ) | )
This is also subjective, and so relies upon the good judgement of the reviewer.
if
, for
, do
or while
) does not use a compound statement it must be coded
on one line if it fits within the column maximum.
(Based on B&N p32)
For example:
if (x < 0) x = -x;Which might also have been coded:
if (x < 0) { x = -x; }but not
if (x < 0) x = -x;
Some find it harder to identify the definitions with the control structure brace convention.
For example:
class MyClass { public: ... Number x() const { return m_x; } ... protected: ... Private: ... };
int array[20] = {
27,54,78,92,12, 76,34,21,98,0,
34,675,87,9,23, 87,23,65,78,0
}
instead of:
int array[20] = { 27,54,78,
92,12,76,34,21,98,0,34,675,87,9,
23,87,23,65,78,
0
}
Name the header file after the class, using the same case and use the suffix `.h'. This is rule NAM-4
This rule makes navigation in the code easier.
#ifndef FILENAME_H #define FILENAME_H // The text of the header goes in here ... #endif // FILENAME_H
The symbol, FILENAME_H, is constructed by taking the text which
normally follows the #include
where the file is included, and
converting `-', `/' and `.' to `_' and all letters to capitals.
(Atlas)
The Support tools can generate
this automatically.
This protects the header file from multiple inclusion and circular-references.
For example the file normally included by:
#include "e-calorimeter/seed.h"
would take the form:
#ifndef E_CALORIMETER_SEED_H #define E_CALORIMETER_SEED_H // The text of the header goes in here ... #endif // E_CALORIMETER_SEED_HNote the use of the same symbol as a comment on the
#endif
line. This is required if the text is more than five lines as is
explained later.
In ROOT, a method comment string is a part of a "ROOT meta-language". This "language" could be considered as an extension of C++. It's used to guess DataMember from comment string, see TMethod:FindDataMember, - to declare the method as a context-menu item. The list of active words is "*TOGGLE", "*MENU", "*SUBMENU"
Currently this is a short list, but it could be extended in the future and it is dangerous to mix it with other comments. (Minos)
The example:
class Point { public: Point(Number x, Number y); Number distance(Point point) const; Number distance(Line line) const; };is easier to understand than:
class Point { public: Point(Number, Number); Number distance(Point) const; Number distance(Line) const; };
This ensures that data members are only accessed from within member functions. Hiding data makes it easier to change implementation and provides a more uniform interface to the object.
For example:
class Point { public: Number x() const; // Return the x coordinate private: Number m_x; // The x coordinate (safely hidden) };The fact that the class Point has a data member m_x which holds the x coordinate is hidden.
For example:
class Point { public: Number x() const { return m_x; } // Return the x coordinate private Number m_x; };
func (...) const;
For example:
class Path { friend Path & operator + (const Path &, const Path &); public: Path(); ~Path(); protected: void draw(); private: class Internal { // Path::Internal declarations go here ... }; };As the default at the beginning of a class is
private
, this
means that friend declarations appear within a private
section. However friend declarations are unaffected by the
private
, protected
and public
keywords, so
placing them at the beginning seems most natural.An exception to the ordering is the placement of the members generated by the ClassDef macro; this macro must be placed at the very end of the class definition i.e. just before the closing }. However since these member are not visible to the reader of the header, this trangession is also invisible!
A full description should be placed in the implementation file, see rule IMP-4
Short comments are helpful in case names of data members are not enough self-descriptive. They should conform to Doxygen's convention on comments"
Name the implementation file after the class, using the same case using the suffix `.cxx'. This is rule NAM-5.
#include "directory_name/file_name.h"
.
Normally at least one level of directory should be specified.
(Atlas)
For example:
class Line; class Point { public: Number distance(Line line) const; // Distance from a line };Here it is sufficient to say that Line is a class, without giving details which are inside its header. This saves time in compilation and avoids an apparent dependency upon the Line header file. It also avoiding polluting the global name space which is very important for long-lived project
A global variable is a variable declared outside all blocks and classes
and without the static keyword, it has file scope and can have
external or internal linkage. A global variable with the external linkage
can be accessed and modified by any part of the program, that with
the internal linkage can be access from all functions in the file
but not from outside.
Class scope static objects provide the same scope control and better
reflect the organization of our programs.
The object of any class that must, by design, only have a single instantiation, is called a singleton. A program that consists of a single object is procedural, not OO, so any class that can only produce a single object should rightly be regarded with suspicion. Experience shows that singletons are both attractive and dangerous as a way of short-curcuiting a full OO design. Their existence in the framework can be justified where they manage some resource e.g. the database interface; if the resource is unique then it is reasonable that a unique object manages it.
Friend classes break encapsulation; they are usually the symptom of a bad design. If an object has a public face (for its clients) and a private face (for classes that cooperate with it to serve those clients) this can be achieved using Abstract Base Classes, which should reduce the need for friends.
Often it is not possible for the object to sensibly recover from some pre- or post- condition violation. This is particulary true in an OO program. Eventually the error handling throw ... catch may make it possible to do some event level recovery but such systems are not yet solid over a full range of platforms. Simply to do nothing and wait for the resulting segmentation violation is not acceptible as it gives no clue as to the error.
Passing an object by value creates a new copy of the object, which is fine if the object is small. Passing by constant reference passes the address of the object however the compiler will not allow the arguments to be modified.
For example:
class Point { public: Number distance(Point point) const; // Distance to a point Number distance(const Line & line) const; // Distance from a line };Here a Line may be large so it is passed by constant reference.
By implication, the omission of the const modifier implies that the argument may be modified. If the previous example had been written:
class Point { public: Number distance(Point point) const; // Distance to a point Number distance(Line & line) const; // Distance from a line };the member function, Number distance(Line &) const, would be allowed to modify the Line passed to it.
mutable
keyword should be
used where appropriate.)
(B&N p145)
For example:
class Point { public: Number distance(const Line & line) const; // Distance from line void translate(const Vector & vector); // Shift a point };here we see that distance is
const
; it does not change the state
of Point. Conversely translate is not declared const
,
because it does.
Declaring member functions to be const
where they are not
meant to modify member data makes the code safer, and is useful in
understanding what a class is supposed to do.
This is rather subjective, so the reviewer must decide if the use made of this feature is worth the risk. The risk is that if you miss off an argument by mistake, you will not be told by the compiler.
For example with:
class Point { public: Point(Number x, Number y = 0); // Create from (x,y) };then
Point p(10);would be valid and equivalent to:
Point p(10, 0);This might be what you want to happen, or it might be a typing error.
The compiler cannot check unspecified arguments. This facility was
used by the C library function, printf
. Now cout
should be used instead.
References are more convenient than pointers, however sometimes a pointer must be used for example when the function you call retains a reference (an alias) to the object you are passing in, such as when you construct a dynamic data structure.
Another example of an unavoidable pointer argument are C functions such as scanf that return results via their arguments, for example:-
scanf("%d", &num);
although this example infringes rule
DANG-1.
Even with the systematic use of the const
qualifier, it is
still worth designing functions which are free of side effects.
This makes the interface better to understand.
For example:
float f(1.23F); double d(1.23E-308);
Warning: Only globals and class static variables of arithmetic or pointer type are initialized to 0 by default; all others, including pointers, will have junk!! Always initialise your variables!!
Of course it is perfectly fine to use default constructors for objects e.g.
Number n;because any well written class would ensure all its constructors left objects properly and fully initialised (see INIT-3).
For example:
Number n(0); // a number used for ... Point p(20,10); // a point ...This example also shows a trailing comment for saying what each variable is.
Don't expect the user to call open or close functions. The destructor should carry out all close operations.
File scope is a complication we can do without.
This is because the order of the initialisers (when they are made explicit in the constructor definition) is ignored. So it is best to arrange them in the order in which they will be executed, which is base classes in the order in which the bases appear in the class definition followed by members in the order in which they are declared.
See the explanation with the previous rule.
explicit
keyword.
(Based on B&N p155)
This is to avoid possible confusing behaviour with assignments. A non-converting constructor constructs objects just as converting constructors, but does so only where a constructor call is explicitly indicated by the syntax.
For example with the definition of the constructors for Z:
class Z { public: explicit Z(int); // ... };the following assignment would be illegal:
Z a1 = 1;however
Z a1(1);would work, as would:
Z a1 = Z(1); Z* p = new Z(1);
For example
class Line { public: Line (const Line &); // copy constructor Line & operator= (const Line &); // assignment operator // ... };
Without these the default copy constructor and assignment operator would perform shallow copies and so produce two references to the same object. This is one of the easier ways to make a program crash.
If you do not think it likely that anyone will want to copy an object of the class you are working on, then just provide private declarations of the copy constructor and assignment operator then the object will be uncopyable.
This ensures that the object being copied is not altered by the copy or assign.
See the previous example.
This ensures that
a = b = c;will assign
c
to b
and then b
to a
as is
the case with built in objects.
This requires some care when writing assignment code, as the case when left and right operands are the same may require that most of the code is bypassed.
This is a minimum requirement to keep control of memory.
This makes sure that when the object goes out of scope and is destroyed, the pointer is not still trying to point to it.
delete
operator on
any pointer passed to it as an argument.
(Based on B&N p193).
This is also to avoid dangling pointers, i.e. pointers to memory which has been given back. Such code will often continue to work until the memory is re-allocated for another object.
For example if you define the operator +
, then it should do
something like addition.
This is the only way to get conversions of the left operand of binary operations to work. It is common in implementing the symmetric operator to call the corresponding asymmetric binary operator. The problem comes with situations like:-
Complex a,b;
a = b * 3.0;
if there is a converting constructor to convert 3.0 to Complex then this
will work if there is a member operator* function - the compiler tries
b.operator*(Complex(3.0)). However:-
a = 3.0 * b;
fails because (3.0).operator*(b) makes no sense. With a global function it will
try operator*(Complex(3.0),b) and all is well. Note however, that
converting constructors are banned by CTOR-1.
This ensures that objects can be properly deleted when referenced by base class pointers.
(Minos)
Guidelines:
GN1. Methods that make a new object which the caller
must delete should start with "Create".
GN2. Methods that accept an object the caller has
allocated and take responsibility for eventually
deleting it must start with "Adopt".
It would be very dangerous to use such a member function. A user many only mean to interrogate the object but could, be using the wrong calling sequence, end up changing it. In any case, overloading should only be used to group together logically equivalent operations and manifestly interrogation and modification are fundementally different.
Examples:
const float kKineticEnergy;
enum EColor { kRed, kGreen, kBlue };
Public enums should be included to global scope by adding to LinkDef.h e.g.
#pragma link C++ enum EColor;
This will make them available during an interactive CINT session.
Hardwired names are clearly unacceptible, but the natural alternative, environmental variables, can quickly become unmanagable. So MINOS will develop something along the lines of LHCb's Job Option service so that all this information is managed at a single point which provides a primary point of control to the job user. Until this service is available, the use of environmental variables is acceptible, but be prepared to rework code that accesses them.
The general message of this section is only to use the preprocessor for doing things which cannot be done by other means.
#define
to define symbolic constants.
(Based on B&N p76)
Use const
for individual constants or enum
for
enumeration.
#else
and #endif
directives should
carry a comment that tells what the corresponding #if
was
about if the conditional section is longer than five lines.
(Atlas)
For example:
#ifndef GEOMETRY_POINT_H #define GEOMETRY_POINT_H class Point { public: Point(Number x, Number y); // Create from (x,y) Number distance(Point point) const; // Distance to a point Number distance(const Line & line) const; // Distance from a line void translate(const Vector & vector); // Shift a point }; #endif // GEOMETRY_POINT_H
Following this rule ensures portability for different platforms.
Compiler warnings usually point out to imperfections that can be eliminated.
This is for portability reasons, to give consistent numerical results and for compatibility with persistence schemes.
EXIT_SUCCESS
or
EXIT_FAILURE
and no other value.
(Atlas)
For example:
#include <cstdlib> main () { bool success(true); // Do something which might clear success return success? EXIT_SUCCESS : EXIT_FAILURE; }
For example
switch (expression) { case constant: statements; break; default: statements; break; }
For example a file with the header:
#include <iostream.h>might have:
Number n; cin >> n; cout << "Value entered was " << n << endl;
In C it was normal to use the macro NULL. This might be for example
((void *) 0)
.
new
and delete
instead of
malloc()
and free()
(B&N p78)
struct
is deprecated. It must not be used in
the class interface and should only be used internally under exceptional
circumstances.
(Minos)
The struct is identical to the class except that by default its contents are public rather than private. As such it violates rule HEAD-5. By definition a struct represents a concept that is more global than its components and some processes will act directly on this concept. These processes should be bound to it i.e. as member functions rather than explicitly assuming a particular representation. This is particularly true of the interface; passing structs starts to expose an object's implimentation to its clients.
bool
type of C++ for booleans.
(Based on Taligent)
C programmers may tend to use int
instead of bool
as
that was all they had.
const char*
where possible.
(Minos)
(Type)variable
, nor the keyword reinterpret_cast
, and
only use const_cast
when obliged to do so when calling code
which has not used the const
keyword correctly.
(Atlas)
The dynamic_cast
is safe as it make use of run time type
information. If you have to make much use of the dynamic_cast
,
this could be a sign of poor design.
Unions offer a further means to shoot yourself in the foot. A union is similar to a class but all the data members have the same address. If you are a C programmer and are tempted to use a union think about polymorphism instead.
enum
for related constants rather than
const
.
(Taligent)
The enum
construct allows a new type to be defined and
hides the numerical values of the enumeration constants.
For example:
enum State {halted, starting, running, halting, paused};
[]
'
can only be used if the code is protected against out of bounds
access.
(Minos)
The built in arrays offer no protection from falling off the end, and they are rarely the most suitable container. There are situations, for example translation tables, where arrays are perfectly acceptible. However, as soon as arrays hold data passed to an object by its clients there is a risk that the arrays will overflow. For example, suppose the object will hold a file name that will be passed to it as character string, how big an array is really "safe"?
The minimal level of protection in such dynamic circumstances is as follows:-
For arrays of primitive data types ROOT offers a number of simple 1 dimensional arrays, for example TArrayF, an array of floats. These can be handled in a very intuitive way, for example:-
// Create an array of 10 floats
TArrayF my_floats{10);
// Use them
my_floats[0] = 0.;
for ( Int_t i=1; i<10, i++ ) my_floats[i] = my_floats[i-1] + 1;
For higher dimensional arrays, a standard library like CLHEP can be used.
Don't use the C style cast e.g.:-
float f = 3.14159; int i = 10 * (int) f;Instead choose the appropriate C++ style:-
Beside sets of coding rules, both ATLAS and BABAR have produced lists of good programming practices. Perhaps MINOS should merge these is a single list for our use?
B&N promote the separation of interface and implementation. They define an interface base class as a base class used to specify virtual functions that are defined in derived classes and called through pointers or references to the base class. Generally an interface base class has no data members and all its member functions are virtual.
class MyClassPrivate; class MyClass { private: MyClassPrivate* fPrivate; // hidden in implementation part public: .... };
put it in the implementaion part to avoid polluting the global name spaceclass MyClassPrivate { ... // put here the members of the MyClass };
assert(bool)
macro for implementing
preconditions, postconditions and class invariants. (Atlas)
else
, and the
case
and default
keywords for
a switch statement) with the start of the control structure. The
closing brace will be on a line of its own.
The spacing must be consistent within a
single level of a block although nested blocks can be different if
absolutely necessary. The aim is to maximise readability - more spaces
help unless you start having to split lines.
(MINOS)
if
, for
, do
or while
) does not use a compound statement it must be coded
on one line if it fits within the column maximum.
(Based on B&N p32)
int array[20] = {
27,54,78,92,12, 76,34,21,98,0,
34,675,87,9,23, 87,23,65,78,0
}
instead of:
int array[20] = { 27,54,78,
92,12,76,34,21,98,0,34,675,87,9,
23,87,23,65,78,
0
}
#ifndef FILENAME_H #define FILENAME_H // The text of the header goes in here ... #endif // FILENAME_H
func (...) const;
#include "directory_name/file_name.h"
.
Normally at least one level of directory should be specified.
(Atlas)
mutable
keyword should be
used where appropriate.)
(B&N p145)
explicit
keyword.
(Based on B&N p155)
delete
operator on
any pointer passed to it as an argument.
(Based on B&N p193).
#define
to define symbolic constants.
(Based on B&N p76)
#else
and #endif
directives should
carry a comment that tells what the corresponding #if
was
about if the conditional section is longer than five lines.
(Atlas)
EXIT_SUCCESS
or
EXIT_FAILURE
and no other value.
(Atlas)
new
and delete
instead of
malloc()
and free()
(B&N p78)
struct
is deprecated. It must not be used in
the class interface and should only be used internally under exceptional
circumstances.
(Minos)
bool
type of C++ for booleans.
(Based on Taligent)
const char*
where possible.
(Minos)
(Type)variable
, nor the keyword reinterpret_cast
, and
only use const_cast
when obliged to do so when calling code
which has not used the const
keyword correctly.
(Atlas)
enum
for related constants rather than
const
.
(Taligent)
[]
'
can only be used if the code is protected against out of bounds
access.
(Minos)