Next: 10. Testing the Program Up: NWCHEM Programmer's Guide, Release Previous: 8. Installing NWChem Contents


9. Developing New Modules and Enhancements

When developing new modules or enhancements to the code, the developer must pay careful attention to design and coding style. This chapter offers guidance on general design requirements and defines coding style rules. In addition, the specific considerations for inserting a new module into the code are presented in detail.

9.1 General Design Guidelines

The complexity of NWChem and the large number of developers working with the code make it highly advisable to consider very carefully the effect of even minor changes. This is particularly true of changes that may affect the performance of the code in a parallel computing environment. The first, last, and only rule is: think before you code! Then think again. Nothing will be as simple as you thought at first.

However, the code is not likely to develop new capability on its own, so you must do something, sooner or later. The following list of design guidelines should be followed when adding code to NWChem.

Design your code before you start adding code.
Set up a performance model that will effectively estimate the CPU, communication and I/O costs.
Use the model to guide the development of the code. Several algorithms may need to be developed, but the programmer should aim for an algorithm that scales in CPU, memory and I/O.
Use the interfaces and APIs that are defined. DO NOT use any of the lower level routines that are used by the APIs. If you deem it necessary to use a lower level routine, first talk to one of the primary NWChem developers.
When possible and appropriate, think about creating objects instead of just data structures.
If an object is not appropriate, think about creating an API that isolates details from other programmers.
Create well-defined modules. When possible, create and/or use "generic" routines to emphasize reuse of code.
Don't be afraid to ask questions. It is better to ask and move in a sensible direction than to not ask and have to redo some or all of the programming.

Remember, fortune favors the bold, but you will live longer and happier if you consult regularly with other NWChem developers and the NWChem Program Manager.

9.2 Coding Style

In a project this large, it is necessary to impose some standards on the coding style employed by developers. The primary goal of these standards is not to constrain developers, but to enhance both the quality of the final product and its functionality.

Code quality is somewhat subjective, but clearly embraces the ideas of

correctness
maintainability
efficiency
readability
re-usability
modularity
ease of integration with other packages
speed of development
density of bugs
ease of debugging
detection of errors at run time
exposure of available functionality
ease-of-use of the API

Compromise is clearly necessary. We are interested in high performance, so some key kernels may sacrifice readability (but perhaps not modularity) for efficiency. Most code, however (roughly 99.9% of it), is not an inner loop in need of such optimization, provided the overall structure is correct.

The single most important thing you can do to achieve quality code has little to do with programming style. It is the design -- putting in the necessary thought and effort before even a single line of code is written.

The following subsections present the recommended "Do's and Don'ts" for programming new modules and modifications in NWChem. The recommendations are organized top-down, reflecting the most efficient order in which to consider the various issues the developer must keep in mind when designing a new piece of code.

9.2.1 Version information

Each source file should include a comment line that contains the CVS revision and date information. This is accomplished by including a comment line containing the string $Id$. CVS substitutes the correct version information each time the file is checked out or updated. These lines are processed from the source and can be output at runtime to aid in bug-tracking.

9.2.2 Standard interface for top-level modules

In order to allow for automatic configuration of various modules in a compilation of NWChem (to control the size of the executable in memory-critical situations), all top-level modules must have a standard interface. Currently it looks like this:

  logical function MODULE(rtdb)

The argument rtdb is the handle for the run-time database. The function should return .true. on success and .false. on failure.

The only sources of information for a module are the database, or files whose names can be inferred from data in the database or from defaults. Furthermore, the naming of database entries is standardized such that:

The string with which database entries are prefixed must be lowercase and match the module name used in the input. E.g., input for the SCF module appears in the scf;...;end block and the prefix used in the database is scf. This allows the user to delete all state information for a module using the UNSET directive.
Common quantities (such as energy, gradient, ...) should be stored using that name. E.g., scf:energy.
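As an illustration of the naming rule, the sketch below builds a database entry name from a module name and a quantity. The routine db_key and its arguments are hypothetical (it is not part of NWChem, which manipulates entries through its run-time database library); it only shows the intended result, e.g. scf:energy, and assumes ASCII characters and module names of at most 64 characters.

```fortran
      subroutine db_key(modname, quantity, key)
      implicit none
      character*(*) modname   ! [input] module name, any case
      character*(*) quantity  ! [input] e.g. 'energy'
      character*(*) key       ! [output] e.g. 'scf:energy'
!
!     Hypothetical helper, not an NWChem routine: builds the entry
!     name <prefix>:<quantity>, where the prefix is the module
!     name forced to lower case so that it matches the input block.
!     Assumes ASCII and module names of at most 64 characters.
!
      character*64 prefix
      integer i, ic
!
      prefix = modname
      do i = 1, len_trim(prefix)
!        ASCII codes 65..90 are 'A'..'Z'; add 32 to get lower case
         ic = iachar(prefix(i:i))
         if (ic .ge. 65 .and. ic .le. 90) prefix(i:i) = achar(ic+32)
      end do
      key = prefix(1:len_trim(prefix)) // ':' // trim(quantity)
      end
```

Calling db_key('SCF', 'energy', key) yields scf:energy, the same prefix the user would name with the UNSET directive.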

9.2.3 No globally defined common blocks

Use of global variables (e.g., common blocks) is generally a bad idea. Such variables break modularity, form hidden dependencies and make code hard to reuse and maintain. Do not use common blocks to pass data between routines.

However, common blocks are very useful in supporting a modular programming style which encourages code reuse and improves maintainability. To this end common blocks can be used to hide data behind a subroutine interface so that access to the common is limited to a few tightly integrated routines. The benefits of using common blocks (smaller argument lists, static data allocation, contiguous memory layout) can thus, with care, be realized without any problems. Examples of this include the basis, geometry, RTDB, integral, symmetry, global array, message passing, SCF, optimizer, input, and MP2 libraries.
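A minimal sketch of this pattern, assuming a hypothetical cnt_ "counter" library: the common block /cnt_com/ is referenced only by the three routines that form its interface, so callers never touch it directly. In real NWChem code the common block would be defined once in a .fh include file (see Section 9.2.5); it is written out in each routine here only to keep the example self-contained.

```fortran
!     Hypothetical library: /cnt_com/ is private to the cnt_ routines
      subroutine cnt_init()
      implicit none
      integer ncount
      common /cnt_com/ ncount
      ncount = 0
      end

      subroutine cnt_incr(n)
      implicit none
      integer n               ! [input] amount to add to the count
      integer ncount
      common /cnt_com/ ncount
      ncount = ncount + n
      end

      integer function cnt_get()
      implicit none
!     Returns the current count held in /cnt_com/
      integer ncount
      common /cnt_com/ ncount
      cnt_get = ncount
      end
```

Callers see only cnt_init, cnt_incr and cnt_get, so the layout of the common block can change without touching any code outside these three routines.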

9.2.4 Naming of routines and common blocks

To avoid name clashes and for easy identification, prefix all subroutine, function and common block names with the name of the module they are associated with. For instance,

rtdb_... -- run-time database
ma_... -- memory allocator
ga_... -- global array
scf_... -- SCF
stpr_... -- Stepper (geometry optimization)

9.2.5 Inclusion of common block definitions

All common block definitions, including typing of variables in the common, are to be made once only, in a single file (a .fh file) that is included in other source using the C preprocessor. The include file should document the meaning of all variables. This helps ensure that variables in a common block are consistently named and that dependencies of routines on common blocks are easily generated and maintained.

9.2.6 Convention for naming `include` files

All include files should be named using the following conventions,

Use .fh for files that can be included only by Fortran routines
Use .h for files that can be included by C routines only, or for files that are included by both C and Fortran routines

9.2.7 Syntax for including files using the C preprocessor

A very important distinction hinges on the seemingly trivial difference between the two different include forms,

#include "filename"
#include <filename>

According to Kernighan and Ritchie:

"If the filename is quoted, searching for the file typically begins where the source program was found; if it is not found there, or if the name is enclosed in < and >, searching follows an implementation-defined rule to find the file."

For this reason, and by common convention, only system-defined include files are included using angle brackets. Those include files that are defined within an application are included using quotes. The automatic generation of dependencies of source files upon include files within NWChem relies upon this convention.

9.2.8 No implicitly typed variables

The statement implicit none should appear at the top of every routine in the NWChem code. No other implicit statements are permitted, and all variables must be explicitly declared. This rule should be religiously observed in new code. It

lets the compiler help you find typos and other errors
makes the code more readable and more maintainable
provides a natural point to document arguments and local variables
makes silly variable names like iii, ii1 both obvious and even more embarrassing when others catch you doing it

When integrating existing code, this rule may seem to be more work than it is worth, but several bugs in existing code have been found in this fashion.

9.2.9 Use `double precision` rather than `real*8`

REAL*8 is not standard Fortran. DOUBLE PRECISION is the standard: it is usually what you want, it is more portable, and standardized declarations make it easier to perform necessary code transformations.
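The following hypothetical routine (vec_sumsq is not an NWChem routine) follows the rules of this and the previous subsection: it starts with implicit none, declares every variable, uses double precision rather than real*8, and labels each argument as input or output.

```fortran
      double precision function vec_sumsq(n, x)
      implicit none
      integer n                ! [input] length of x
      double precision x(n)    ! [input] vector
!
!     Returns the sum of squares of the elements of x, i.e. the
!     squared Euclidean norm.  Hypothetical illustration only.
!
      integer i                ! loop index
      double precision s       ! running sum
!
      s = 0.0d0
      do i = 1, n
         s = s + x(i)*x(i)
      end do
      vec_sumsq = s
      end
```

With implicit none in force, a misspelling such as xx(i) in the loop is reported by the compiler instead of silently creating a new variable.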

9.2.10 C macro definitions should be in upper case

NWChem uses the ANSI C preprocessor to handle machine dependencies and other conditional compilation requirements. By forcing all C macros to be upper case the code is made more readable and we also avoid potential accidental munging of Fortran source. This practice is consistent with conventional use of the preprocessor in C programs.

9.2.11 Fortran source should be in lower or mixed case

This convention is complementary to the above C macro convention. If there are no fully upper-case Fortran tokens then there can be no accidental conflict with the C preprocessor.

9.2.12 Naming of variables holding handles/pointers obtained from MA/GA

So that these critical variables are immediately recognizable, the following conventions are recommended.

handles obtained from MA should be prefaced with l_
pointers (into dbl_mb(), etc.) obtained from MA should be prefaced with k_
handles obtained from GA should be prefaced with g_

Alternatively, you can insert comment lines describing the variables at the point of declaration, if you do not want to follow these conventions.

9.2.13 Fortran unit numbers

All references to Fortran I/O units should be done with parameters or variables instead of hardwired constants. For the "standard I/O" units, corresponding to the C stdin, stdout, and stderr, you should include the file stdio.fh and use the variables luin, luout, and luerr instead of 5, 6, and 0.

The code uses very few other files, and there is no organized list of parameter names for non-standard I/O units. Developers are free to choose parameter names that make sense to them, as long as they follow the convention of using parameters. Using parameters rather than hardwired integer constants helps ensure that I/O unit designations can be changed easily if needed, and may ease a move to a more general convention in a future version of the code.
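A short sketch of the convention for a non-standard unit, assuming a hypothetical parameter name lutmp and unit number 17: the number appears once, in a parameter statement, and every I/O statement refers to the name. The function tmp_roundtrip is likewise hypothetical. For standard input, output and error, the text above prescribes luin, luout and luerr from stdio.fh instead.

```fortran
      double precision function tmp_roundtrip(x)
      implicit none
      double precision x       ! [input] value to write
!
!     Writes x to a scratch file and reads it back.  The unit is
!     the named parameter lutmp (hypothetical), so changing the
!     unit number means editing a single line.
!
      integer lutmp
      parameter (lutmp = 17)
      double precision y
!
      open(unit=lutmp, status='scratch', form='unformatted')
      write(lutmp) x
      rewind(lutmp)
      read(lutmp) y
      close(lutmp)
      tmp_roundtrip = y
      end
```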

9.2.14 Use standard print control

All modules should understand the PRINT directive and accept at least the following keywords for it:

none -- no output whatsoever except for error messages
low -- minimal output; e.g., title, critical parameters and a final energy
medium -- usual output (the default)
high -- extra verbose output
debug -- anything useful for diagnosing problems

Ideally all applications should control most printing via the print control routines (see Section 7.4). A uniform look and feel is important.
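One way a module might interpret these keywords is sketched below. The routine names prt_level and prt_ok and the integer values assigned to each level are assumptions made for illustration; they are not NWChem's print-control routines described in Section 7.4. The only property relied on is that the levels increase from none to debug, so output requested at a given level appears whenever the user's setting is at least that high.

```fortran
      integer function prt_level(keyword)
      implicit none
      character*(*) keyword   ! [input] none|low|medium|high|debug
!
!     Hypothetical mapping of the PRINT keywords onto increasing
!     integer levels (values are illustrative only).  The keyword
!     is assumed to be lower case already.  Unknown keywords give
!     -1 so the caller can report an input error.
!
      if (keyword .eq. 'none') then
         prt_level = 0
      else if (keyword .eq. 'low') then
         prt_level = 10
      else if (keyword .eq. 'medium') then
         prt_level = 20
      else if (keyword .eq. 'high') then
         prt_level = 30
      else if (keyword .eq. 'debug') then
         prt_level = 100
      else
         prt_level = -1
      end if
      end

      logical function prt_ok(wanted, current)
      implicit none
      integer wanted    ! [input] level at which the output is printed
      integer current   ! [input] level selected by the user
!     True if output belonging to level "wanted" should appear
      prt_ok = current .ge. wanted
      end
```

With the default setting (medium), output tagged low is printed and output tagged high is suppressed; with none, only error messages (which bypass this check) remain.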

9.2.15 Error handling

All fatal errors should result in a call to errquit() (see Section 7.3.2), which prints out the string and status value to both standard error and standard output and attempts to kill all parallel processes and to tidy any allocated system resources (e.g., System V shared memory).

9.2.16 Comments

The use of comment lines is strongly recommended in all coding. Commented code is easier to read, and often is easier to debug, maintain, and modify. Liberal use of comments is particularly important in NWChem, since it is used by a large and diverse group of people, it is constantly being modified as capabilities are added and refined, and it has only a limited amount of detailed documentation.

Requirements for in-source documentation are given in detail in Chapter 11, but the general recommendation for comment lines in the code is "the more the merrier". At a minimum, the source code should provide the following information:

terse comments at the top of each subroutine to describe (accurately!) its function,
documentation of dependencies/effects on state that are not passed directly through its argument list (e.g., files, the database, common blocks)
descriptions of all arguments, including the flow of information (i.e., label arguments as input or output, or input-output)
documentation of local variables with functions that are not apparent from their names, or which have an algorithmic role that is opaque or obscure

In some circumstances, comments at the top of a routine can be quite lengthy since this is a very good place to store details of the algorithm. Automatic generation of documentation from code comments is being designed, but this will produce useful documentation only if developers write clear and concise commentary in the code as they work.

The following partial listings show examples of minimalist in-source documentation using comment lines. It would not be difficult to say more. The rule of thumb should be "from those who have much, more will be expected": the more important a routine is to a particular algorithm, and the more it does toward carrying out the solution, the more detailed and voluminous its comment lines should be.

Example of comments in a simple routine:

      logical function bas_numbf(basis,nbf)
      implicit none
      integer basis   ! [input] basis set handle
      integer nbf     ! [output] number of basis functions
*
*  nbf returns the total number of functions.
*  Returns true on success, false if the handle is invalid
*

Example of comments in a less simple routine:

      subroutine sym_symmetrize(geom, basis, odensity, g_a)
C$Id: codingsty.tex,v 1.4 1998/12/15 16:22:36 d35162 Exp $
      implicit none
      integer geom, basis  ! [input] Handles
      integer g_a          ! [input] Handle to input/output GA
      logical odensity     ! [input] True if matrix is a density
c
c     Symmetrize a skeleton matrix (in a global array) in the
c     given basis set.
c
c     A <- (1/(2h)) * sum(R) [R^T * (A + A^T) * R]
c
c     where h = the order of the group and R = operators of the
c     group (including the identity)
c
c     Note that density matrices transform according to slightly
c     different rules to Hamiltonian matrices if components
c     of a shell (e.g., Cartesian d's) are not orthonormal.
c     (see Dupuis and King, IJQC 11, 613-625, 1977)

9.2.17 Message IDs

The use of tags/IDs/types on messages is strongly suggested. If all messages within the program have distinct types and the message-passing software forces the types of messages to match between sender and receiver, then there is a way to prove that messages are being sent and received correctly. If they are not, a runtime error will be detected. This is especially important to NWChem since the code makes use of many third party linear algebra libraries that do a lot of message passing.

Modules which do a significant amount of messaging should reserve a section of the message ID space for their own use (e.g., GA or PEIGS). Most modules, however, do only a small amount of messaging. For these, the include file msgids.fh should be used to reserve individual message IDs. This file defines Fortran parameters for message IDs used in most NWChem modules. Hardwired message IDs should not be used in any NWChem routine.
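Because the argument above depends on every message type being distinct, a module that reserves its own IDs can check that property cheaply. The function msgids_distinct below is a hypothetical sketch, not an NWChem routine; the IDs it receives would in practice be the named parameters from msgids.fh or from the module's reserved range.

```fortran
      logical function msgids_distinct(n, ids)
      implicit none
      integer n        ! [input] number of message IDs
      integer ids(n)   ! [input] message IDs to check
!
!     Hypothetical sanity check: returns .true. only if no two IDs
!     in the list are equal, the property that lets type matching
!     in the message-passing layer catch mismatched send/receive
!     pairs.  O(n**2), which is fine for the few IDs per module.
!
      integer i, j
      msgids_distinct = .true.
      do i = 1, n
         do j = i + 1, n
            if (ids(i) .eq. ids(j)) msgids_distinct = .false.
         end do
      end do
      end
```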

9.2.18 Bit operations -- `bitops.fh`

The following bitwise operations (see Table 9.1 for definitions) are the recommended standards for use in NWChem.

ior(i,j) -- inclusive OR
ieor(i,j) -- exclusive OR
iand(i,j) -- AND
not(i) -- NOT or one's complement
rshift(i,nbits) -- right shift with zero fill
lshift(i,nbits) -- left shift with zero fill

Table 9.1: Effect of Bit Operations

              ior    ieor   iand   not    lshift     rshift
i             110    110    110    10     10111011   10111011
j or nbits    100    100    100    n/a    2 bits     2 bits
result        110    010    100    01     11101100   00101110

These operations are readily generated as in-line functions from most other definitions. The shift examples in Table 9.1 use an eight-bit word written with the most significant bit on the left. All operations act on full integer words (32 or 64 bits as necessary) and produce integer results. The declarations and any necessary statement functions are provided in bitops.fh. The presence of data statements makes it impossible for a single include file to both make the declarations and define the statement functions. To circumvent this, the declarations are in bitops_decls.fh and the statement functions are in bitops_funcs.fh.
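The entries of Table 9.1 can be checked with standard Fortran 90 intrinsics. This sketch assumes that the lshift and rshift of bitops.fh behave like the standard ishft intrinsic with a positive (left) or negative (right) shift count, both filling with zeros; bitops.fh itself is not reproduced. Because the intrinsics act on full integer words, results are masked to the 2 or 8 bits shown in the table. The function name bit_table_ok is hypothetical.

```fortran
      logical function bit_table_ok()
      implicit none
!
!     Reproduces Table 9.1 with F90 intrinsics (hypothetical check).
!     m2 and m8 keep only the low 2 and 8 bits of a full word.
!
      integer m2, m8
      parameter (m2 = 3, m8 = 255)
!
      bit_table_ok = .true.
!     110 and 100 in binary are 6 and 4
      if (ior(6, 4) .ne. 6) bit_table_ok = .false.
      if (ieor(6, 4) .ne. 2) bit_table_ok = .false.
      if (iand(6, 4) .ne. 4) bit_table_ok = .false.
!     not(10) is 01 once the rest of the word is masked off
      if (iand(not(2), m2) .ne. 1) bit_table_ok = .false.
!     10111011 (187) shifted 2 bits: 11101100 (236), 00101110 (46)
      if (iand(ishft(187, 2), m8) .ne. 236) bit_table_ok = .false.
      if (ishft(187, -2) .ne. 46) bit_table_ok = .false.
      end
```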

9.2.19 Blockdata statements and linking

At least one machine (the CRAY-T3D) discards all symbols that are not explicitly referenced, even if other symbols from the same .o file are used. Thus, BLOCK DATA subprograms are not linked in. One fix to this is to declare each BLOCK DATA subprogram as an undefined external on the link command, but this makes the link command depend on the list of modules being built. An alternative mechanism that works on the T3D is to reference each BLOCK DATA subprogram in an EXTERNAL statement within a SUBROUTINE or FUNCTION that is guaranteed to be linked if any reference is to be made to the COMMON block being initialized.

This is being redesigned.
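The EXTERNAL workaround described above might look like the following sketch; the names xyz_data, /xyz_com/ and xyz_get_nmax are hypothetical. Any program that calls xyz_get_nmax, the routine that reads the common block, also references the BLOCK DATA subprogram that initializes it. On most current platforms the BLOCK DATA is linked anyway when it sits in the same object file, so the example shows the pattern rather than reproducing the T3D behavior.

```fortran
      block data xyz_data
!     Hypothetical BLOCK DATA that initializes /xyz_com/
      implicit none
      integer xyz_nmax
      common /xyz_com/ xyz_nmax
      data xyz_nmax /42/
      end block data xyz_data

      integer function xyz_get_nmax()
      implicit none
!     Naming the BLOCK DATA in an EXTERNAL statement here means it
!     is referenced whenever this routine, which reads the common,
!     is linked into the executable.
      external xyz_data
      integer xyz_nmax
      common /xyz_com/ xyz_nmax
      xyz_get_nmax = xyz_nmax
      end
```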


Dunyou Wang 2009-03-13