When developing new modules or enhancements to the code, the developer must pay careful attention to design and coding style. This chapter offers guidance on general design requirements and defines coding style rules. In addition, the specific considerations for inserting a new module into the code are presented in detail.
The complexity of NWChem and the large number of developers working with the code make it highly advisable to consider very carefully the effect of even minor changes in the code. This is particularly the case when considering changes that may impact the performance of the code in a parallel computing environment. The first, last, and only rule is: think before you code! Then think again. Nothing will be as simple as you thought at first.
However, the code is not likely to develop new capability on its own, so you must do something, sooner or later. The following list of design guidelines should be followed when adding code to NWChem.
Remember, fortune favors the bold, but you will live longer and happier if you consult regularly with other NWChem developers and the NWChem Program Manager.
In a project this large, it is necessary to impose some standards on the coding style employed by developers. The primary goal of these standards is not to constrain developers, but to enhance both the quality of the final product and its functionality.
Code quality is somewhat subjective, but clearly embraces the ideas of
The single most important thing you can do to achieve quality code has little to do with programming style. It is the design -- putting in the necessary thought and effort before even a single line of code is written.
The following subsections present the recommended "Do's and Don'ts" for programming modules and modifications in NWChem. The recommendations are organized by a 'top-down' logic, to reflect the most efficient order in thinking about the various considerations the developer must keep in mind when designing a new piece of code.
Each source file should include a comment line that contains the CVS revision and date information. This is accomplished by including a comment line containing the string $Id$. CVS substitutes the correct version information each time the file is checked out or updated. These lines are processed from the source and can be output at runtime to aid in bug-tracking.
In order to allow for automatic configuration of various modules in a compilation of NWChem (to control the size of the executable in memory-critical situations), all top-level modules must have a standard interface. Currently it looks like this:
      logical function MODULE(rtdb)
The argument rtdb is the handle for the run-time database. The function should return .true. on success or .false. on failure.
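The interface above can be illustrated with a minimal sketch. The module name mymod, the driver program, and the integer type and value of the handle are assumptions made for illustration only; a real module would query the run-time database instead of simply testing the handle.

```fortran
      program demo
      implicit none
      integer luout
      parameter (luout = 6)
      logical mymod
      external mymod
      integer rtdb
c     A made-up handle value; a real one comes from the RTDB library
      rtdb = 1
      if (mymod(rtdb)) then
         write(luout,*) 'mymod: success'
      else
         write(luout,*) 'mymod: failure'
      endif
      end
c
      logical function mymod(rtdb)
      implicit none
      integer rtdb              ! [input] run-time database handle
c
c     Hypothetical top-level module. A real module would read its
c     parameters from the database here and do its work.
c     Returns true on success, false on failure.
c
      mymod = rtdb .ge. 0
      end
```

Because every top-level module shares this shape, a driver can call any of them without knowing what the module needs: everything else is looked up through the handle.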
The only sources of information for a module are the database, or files with names that can be inferred from data in the database or from defaults. Furthermore, the naming of database entries is standardized such that:
scf;...;end block and the prefix used in the database is scf. This is so that the user can delete all state information using the UNSET directive.
scf:energy.
Use of global variables (e.g., common blocks) is generally a bad idea. Such variables break modularity, form hidden dependencies and make code hard to reuse and maintain. Do not use common blocks to pass data between routines.
However, common blocks are very useful in supporting a modular programming style which encourages code reuse and improves maintainability. To this end common blocks can be used to hide data behind a subroutine interface so that access to the common is limited to a few tightly integrated routines. The benefits of using common blocks (smaller argument lists, static data allocation, contiguous memory layout) can thus, with care, be realized without any problems. Examples of this include the basis, geometry, RTDB, integral, symmetry, global array, message passing, SCF, optimizer, input, and MP2 libraries.
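A minimal sketch of hiding a common block behind a subroutine interface follows. All names (mymod_set_tol, mymod_get_tol, mymod_data, mymod_tol) are hypothetical, and the common block is declared inline in each routine only to keep the sketch self-contained.

```fortran
      program demo
      implicit none
      integer luout
      parameter (luout = 6)
      double precision mymod_get_tol
      external mymod_get_tol
c     The caller never sees the common block, only the interface
      call mymod_set_tol(1.0d-6)
      write(luout,*) 'tolerance = ', mymod_get_tol()
      end
c
      subroutine mymod_set_tol(tol)
      implicit none
      double precision tol      ! [input] new tolerance
      double precision mymod_tol
      common /mymod_data/ mymod_tol
      save /mymod_data/
      mymod_tol = tol
      end
c
      double precision function mymod_get_tol()
      implicit none
      double precision mymod_tol
      common /mymod_data/ mymod_tol
      save /mymod_data/
      mymod_get_tol = mymod_tol
      end
```

Only the two access routines reference the common block, so its layout can change without touching any caller.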
To avoid name clashes and for easy identification, prefix all subroutine, function and common block names with the name of the module they are associated with. For instance,
All common block definitions, including typing of variables in the common, are to be made once only in a single file (a .fh file), that is included in other source using the C preprocessor. The include file should document the meaning of all variables. This helps ensure that variables in a common block are consistently named and dependencies of routines on common blocks are easily generated and maintained.
All include files should be named using the following conventions:
.fh -- for files that can be included only by Fortran routines
.h -- for files that can be included by C routines only, or for files that are included by both C and Fortran routines
A very important distinction hinges on the seemingly trivial difference between the two different include forms,
#include "filename"
#include <filename>
"If the filename is quoted, searching for the file typically begins where the source program was found; if it is not found there, or if the name is enclosed in < and >, searching follows an implementation-defined rule to find the file."
For this reason, and by common convention, only system-defined include files are included using angle brackets. Those include files that are defined within an application are included using quotes. The automatic generation of dependencies of source files upon include files within NWChem relies upon this convention.
The command implicit none should appear at the top of every routine in the NWChem code. No other implicit statements are permitted and all variables must be explicitly declared. This rule should be religiously observed in new code. It
When integrating existing code, this rule may seem to be more work than it is worth, but several bugs in existing code have been found in this fashion.
REAL*8 is not standard Fortran. DOUBLE PRECISION is the standard: it is usually what you want, it is more portable, and standardizing declarations enables us to perform necessary code transformations more readily.
NWChem uses the ANSI C preprocessor to handle machine dependencies and other conditional compilation requirements. By forcing all C macros to be upper case the code is made more readable and we also avoid potential accidental munging of Fortran source. This practice is consistent with conventional use of the preprocessor in C programs.
This convention is complementary to the above C macro convention. If there are no fully upper-case Fortran tokens then there can be no accidental conflict with the C preprocessor.
So that these critical variables are immediately recognizable, the following conventions are recommended.
Alternatively, you can insert comment lines describing the variables at the point of declaration, if you do not want to follow these conventions.
All references to Fortran I/O units should be done with parameters or variables instead of hardwired constants. For the "standard I/O" units, corresponding to the C stdin, stdout, and stderr, you should include the file stdio.fh and use the variables luin, luout, and luerr instead of 5, 6, and 0.
The code uses very few other files, and there is no organized list of parameter names for non-standard I/O units. Users are free to use parameter names that make sense to them, so long as they adhere to the convention. Using parameters rather than hardwired integer constants helps ensure that I/O unit designations can be changed easily if needed, and may facilitate moving to a more general convention in a future version of the code.
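The unit convention can be sketched as follows. In NWChem the names luin, luout, and luerr come from stdio.fh; here they are defined locally as parameters, which is an assumption made only so that the sketch compiles on its own. The energy value is made up.

```fortran
      program demo
      implicit none
c     Stand-ins for the definitions normally taken from stdio.fh
      integer luin, luout, luerr
      parameter (luin = 5, luout = 6, luerr = 0)
      double precision energy
      energy = -76.0d0
c     Named units instead of write(6,...) and write(0,...)
      write(luout,10) energy
 10   format(' energy = ', f12.6)
      write(luerr,*) 'warning: this value is only an illustration'
      end
```

If a unit number ever has to change, only the single definition is edited, not every write statement.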
All modules should understand the PRINT directive and accept at least the following keywords:
none -- no output whatsoever except for error messages
low -- minimal output; e.g., title, critical parameters and a final energy
medium (default) -- usual output
high -- extra verbose output
debug -- anything useful for diagnosing problems
Ideally all applications should control most printing via the print control routines (see Section 7.4). A uniform look and feel is important.
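The idea of print levels can be sketched with a simple threshold test. This is not the NWChem print control API: the function demo_print, the lvl_ parameters, and their numeric values are all hypothetical stand-ins chosen for illustration.

```fortran
      program demo
      implicit none
      integer luout
      parameter (luout = 6)
c     Hypothetical numeric levels, ordered from quiet to verbose
      integer lvl_none, lvl_low, lvl_medium, lvl_high, lvl_debug
      parameter (lvl_none = 0, lvl_low = 1, lvl_medium = 2,
     $     lvl_high = 3, lvl_debug = 4)
      logical demo_print
      external demo_print
      integer level
c     Level requested by the user; medium is the default
      level = lvl_medium
      if (demo_print(level, lvl_low))
     $     write(luout,*) 'final energy (shown at low and above)'
      if (demo_print(level, lvl_debug))
     $     write(luout,*) 'debug detail (hidden at medium)'
      end
c
      logical function demo_print(level, threshold)
      implicit none
      integer level             ! [input] level requested by user
      integer threshold         ! [input] level this output needs
c
c     Returns true if output tagged with threshold should appear
c
      demo_print = level .ge. threshold
      end
```

With level set to medium, only the first message is written. Routing every write through one test of this kind is what gives modules a uniform response to the PRINT directive.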
All fatal errors should result in a call to errquit() (see Section 7.3.2), which prints out the string and status value to both standard error and standard output and attempts to kill all parallel processes and to tidy any allocated system resources (e.g., System V shared memory).
The use of comment lines is strongly recommended in all coding. Commented code is easier to read, and often is easier to debug, maintain, and modify. Liberal use of comments is particularly important in NWChem, since it is used by a large and diverse group of people, it is constantly being modified as capabilities are added and refined, and it has only a limited amount of detailed documentation.
Requirements for in-source documentation are given in detail in Chapter 11 but the general recommendation for comment lines in the code is the more the merrier. At a minimum the source code should be able to provide the following information,
In some circumstances, comments at the top of a routine can be quite lengthy since this is a very good place to store details of the algorithm. Automatic generation of documentation from code comments is being designed, but this will produce useful documentation only if developers write clear and concise commentary in the code as they work.
The following partial listings show examples of minimalist in-source documentation using comment lines. It would not be difficult to say more. The rule of thumb should be "from those who have much, more will be expected". The more important a routine is to a particular algorithm, and the more it does in the way of carrying out the solution, the more detailed and voluminous its comment lines should be.
Example of comments in a simple routine:
      logical function bas_numbf(basis,nbf)
      implicit none
      integer basis             ! [input] basis set handle
      integer nbf               ! [output] number of basis functions
*
*     nbf returns the total number of functions.
*     Returns true on success, false if the handle is invalid
*
Example of comments in a less simple routine:
      subroutine sym_symmetrize(geom, basis, odensity, g_a)
C$Id: codingsty.tex,v 1.4 1998/12/15 16:22:36 d35162 Exp $
      implicit none
      integer geom, basis       ! [input] Handles
      integer g_a               ! [input] Handle to input/output GA
      logical odensity          ! [input] True if matrix is a density
c
c     Symmetrize a skeleton matrix (in a global array) in the
c     given basis set.
c
c     A <- (1/2h) * sum(R) [RT * (A + AT) * R]
c
c     where h = the order of the group and R = operators of the
c     group (including the identity)
c
c     Note that density matrices transform according to slightly
c     different rules to Hamiltonian matrices if components
c     of a shell (e.g., Cartesian d's) are not orthonormal.
c     (see Dupuis and King, IJQC 11, 613-625, 1977)
The use of tags/IDs/types on messages is strongly suggested. If all messages within the program have distinct types and the message-passing software forces the types of messages to match between sender and receiver, then there is a way to prove that messages are being sent and received correctly. If they are not, a runtime error will be detected. This is especially important to NWChem since the code makes use of many third party linear algebra libraries that do a lot of message passing.
Modules which do a significant amount of messaging should reserve a section of the message ID space for their own use (e.g., GA or PEIGS). Most modules, however, do only a small amount of messaging. For these, the include file msgids.fh should be used to reserve individual message IDs. This file defines Fortran parameters for the message IDs used in most of NWChem. Hardwired message IDs should not be used in any NWChem routine.
The following bitwise operations (see Table 9.1 for definitions) are the recommended standards for use in NWChem.
ior(i,j) -- inclusive OR
ieor(i,j) -- exclusive OR
iand(i,j) -- AND
not(i) -- NOT or one's complement
rshift(i,nbits) -- right shift with zero fill
lshift(i,nbits) -- left shift with zero fill
These operations are readily generated using in-line functions from most other definitions. The shift examples in Table 3.1 use an eight bit word written with the most significant bit on the left.
All operations operate on full integer words (32 or 64 bit as necessary) and produce integer results. The declarations and any necessary statement functions are in bitops.fh. The presence of data statements makes it impossible to have a single include file make declarations and define statement functions. To circumvent this the declarations are in bitops_decls.fh and the statement functions are in bitops_funcs.fh.
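As a rough sketch of how the shift operations can be generated from another definition, the example below builds zero-fill shifts from the ISHFT intrinsic using statement functions. The names demo_rshift and demo_lshift are hypothetical, chosen to avoid clashing with any compiler-provided rshift or lshift; this is not the content of bitops.fh.

```fortran
      program demo
      implicit none
      integer luout
      parameter (luout = 6)
      integer i, j
      integer k, nbits
      integer demo_rshift, demo_lshift
c     Statement functions built on the ISHFT intrinsic, which shifts
c     with zero fill in both directions.
      demo_rshift(k, nbits) = ishft(k, -nbits)
      demo_lshift(k, nbits) = ishft(k, nbits)
c
c     i = 00001100, j = 00001010 in the low eight bits
      i = 12
      j = 10
      write(luout,*) 'ior   = ', ior(i, j)            ! 14
      write(luout,*) 'ieor  = ', ieor(i, j)           ! 6
      write(luout,*) 'iand  = ', iand(i, j)           ! 8
      write(luout,*) 'not   = ', not(i)               ! -13
      write(luout,*) 'right = ', demo_rshift(i, 2)    ! 3
      write(luout,*) 'left  = ', demo_lshift(i, 2)    ! 48
      end
```

The declarations precede the statement functions, which in turn precede the first executable statement; that ordering constraint is part of why declarations and function definitions end up in separate include files.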
At least one machine (the CRAY-T3D) discards all symbols that are not explicitly referenced, even if other symbols from the same .o file are used. Thus, BLOCK DATA subprograms are not linked in. One fix to this is to declare each BLOCK DATA subprogram as an undefined external on the link command, but this makes the link command depend on the list of modules being built. An alternative mechanism that works on the T3D is to reference each BLOCK DATA subprogram in an EXTERNAL statement within a SUBROUTINE or FUNCTION that is guaranteed to be linked if any reference is to be made to the COMMON block being initialized.
This is being redesigned.
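The EXTERNAL mechanism described above can be sketched as follows. The names mymod_bdata, mymod_cdata, mymod_maxiter, and mymod_get_max are hypothetical, and the value 30 is arbitrary.

```fortran
      program demo
      implicit none
      integer luout
      parameter (luout = 6)
      integer mymod_get_max
      external mymod_get_max
      write(luout,*) 'maxiter = ', mymod_get_max()
      end
c
      integer function mymod_get_max()
      implicit none
c     Naming the BLOCK DATA subprogram here gives the linker an
c     explicit reference to it whenever this routine is linked.
      external mymod_bdata
      integer mymod_maxiter
      common /mymod_cdata/ mymod_maxiter
      mymod_get_max = mymod_maxiter
      end
c
      block data mymod_bdata
      implicit none
      integer mymod_maxiter
      common /mymod_cdata/ mymod_maxiter
      data mymod_maxiter /30/
      end
```

Any routine that reads the common block is a reasonable home for the EXTERNAL statement, since the initialization is only needed when such a routine is present.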