Next: 10. Testing the Program Up: NWCHEM Programmer's Guide, Release Previous: 8. Installing NWChem   Contents


9. Developing New Modules and Enhancements

When developing new modules or enhancements to the code, the developer must pay careful attention to design and coding style. This chapter offers guidance on general design requirements and defines coding style rules. In addition, the specific considerations for inserting a new module into the code are presented in detail.


9.1 General Design Guidelines

The complexity of NWChem and the large number of developers working with the code make it highly advisable to consider very carefully the effect of even minor changes in the code. This is particularly the case for changes that may affect the performance of the code in a parallel computing environment. The first, last, and only rule is: think before you code! Then think again. Nothing will be as simple as you thought at first.

However, the code is not likely to develop new capability on its own, so you must do something, sooner or later. The following list of design guidelines should be followed when adding code to NWChem.

  1. Design your code before you start adding code.
  2. Set up a performance model that will effectively estimate the CPU, communication and IO costs.
  3. Use the model to guide the development of the code. Remember that several algorithms may need to be developed, but the programmer should try to develop an algorithm that will scale in CPU, memory and IO.
  4. Use the interfaces and APIs that are defined. DO NOT use any of the lower level routines that are used by the APIs. If you deem it necessary to use a lower level routine, first talk to one of the primary NWChem developers.
  5. When possible and appropriate, think about creating objects instead of just data structures.
  6. If an object is not appropriate, think about creating an API that isolates details from other programmers.
  7. Create well defined modules. When possible create and/or use ``generic'' routines to emphasize reuse of code.
  8. Don't be afraid to ask questions. It is better to ask and move in a sensible direction than to not ask and have to redo some or all of the programming.

Remember, fortune favors the bold, but you will live longer and happier if you consult regularly with other NWChem developers and the NWChem Program Manager.


9.2 Coding Style

In a project this large, it is necessary to impose some standards on the coding style employed by developers. The primary goal of these standards is not to constrain developers, but to enhance both the quality of the final product and its functionality.

Code quality is somewhat subjective, but clearly embraces the ideas of

Compromise is clearly necessary. We are interested in high performance, so some key kernels may sacrifice readability (but perhaps not modularity) for efficiency. Most code (i.e., 99.9%), however, is not an inner loop in need of such optimization, provided the overall structure is correct.

The single most important thing you can do to achieve quality code has little to do with programming style. It is the design -- putting in the necessary thought and effort before even a single line of code is written.

The following subsections present the recommended "Do's and Don'ts" for programming modules and modifications in NWChem. The recommendations are organized by a 'top-down' logic, to reflect the most efficient order in thinking about the various considerations the developer must keep in mind when designing a new piece of code.

9.2.1 Version information

Each source file should include a comment line that contains the CVS revision and date information. This is accomplished by including a comment line containing the string $Id$. CVS substitutes the correct version information each time the file is checked out or updated. These lines are processed from the source and can be output at runtime to aid in bug-tracking.

9.2.2 Standard interface for top-level modules

In order to allow automatic configuration of the modules included in a compilation of NWChem (to control the size of the executable in memory-critical situations), all top-level modules must have a standard interface. Currently it looks like this:

  logical function MODULE(rtdb)
The argument rtdb is the handle for the run-time database. The function should return .true. or .false. on success or failure respectively.
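As an illustration, a top-level module following this interface might be laid out as in the minimal sketch below. The module name mymod is hypothetical, and the steps where a real module would read its input and do its work are left as comments, since the actual database calls are not needed to show the interface itself.

```fortran
      logical function mymod(rtdb)
      implicit none
      integer rtdb   ! [input] run-time database handle
!
!     Hypothetical top-level module "mymod" (sketch only).
!     1. Read input from the database identified by rtdb, or from
!        files whose names can be inferred from it or from defaults.
!     2. Perform the calculation.
!     On failure, set mymod = .false. and return instead.
!
      mymod = .true.
      end
```

The calling code only ever sees the logical result, so modules can be added to or dropped from a build without changing how they are invoked.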

The only sources of information for a module are the database and files whose names can be inferred from data in the database or from defaults. Furthermore, the naming of database entries is standardized such that:

9.2.3 No globally defined common blocks

Use of global variables (e.g., common blocks) is generally a bad idea. Such variables break modularity, form hidden dependencies and make code hard to reuse and maintain. Do not use common blocks to pass data between routines.

However, common blocks are very useful in supporting a modular programming style which encourages code reuse and improves maintainability. To this end common blocks can be used to hide data behind a subroutine interface so that access to the common is limited to a few tightly integrated routines. The benefits of using common blocks (smaller argument lists, static data allocation, contiguous memory layout) can thus, with care, be realized without any problems. Examples of this include the basis, geometry, RTDB, integral, symmetry, global array, message passing, SCF, optimizer, input, and MP2 libraries.
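A minimal sketch of this data-hiding pattern, assuming a hypothetical module mymod that keeps a convergence tolerance in a private common block /cmymod/. Only the two routines below touch the common; every other routine goes through mymod_set_tol and mymod_get_tol. The names also carry the module prefix described in Section 9.2.4.

```fortran
      subroutine mymod_set_tol(tol)
      implicit none
      double precision tol   ! [input] new convergence tolerance
!     Normally the common is declared once, in the module's include
!     file (Section 9.2.5); it is written inline here only to keep
!     the sketch self-contained.
      double precision mymod_tol
      common /cmymod/ mymod_tol
      save /cmymod/
      mymod_tol = tol
      end

      double precision function mymod_get_tol()
      implicit none
      double precision mymod_tol
      common /cmymod/ mymod_tol
      save /cmymod/
      mymod_get_tol = mymod_tol
      end
```

The SAVE statement keeps the common's contents defined between calls even when no active routine references it; if a common is saved in one routine it must be saved in every routine that declares it.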

9.2.4 Naming of routines and common blocks

To avoid name clashes and for easy identification, prefix all subroutine, function and common block names with the name of the module they are associated with. For instance,

9.2.5 Inclusion of common block definitions

All common block definitions, including typing of variables in the common, are to be made once only in a single file (a .fh file), that is included in other source using the C preprocessor. The include file should document the meaning of all variables. This helps ensure that variables in a common block are consistently named and dependencies of routines on common blocks are easily generated and maintained.

9.2.6 Convention for naming include files

All include files should be named using the following conventions,

9.2.7 Syntax for including files using the C preprocessor

A very important distinction hinges on the seemingly trivial difference between the two include forms, #include "filename" and #include <filename>.

According to Kernighan and Ritchie:
"If the filename is quoted, searching for the file typically begins where the source program was found; if it is not found there, or if the name is enclosed in < and >, searching follows an implementation-defined rule to find the file."
For this reason, and by common convention, only system-defined include files are included using angle brackets. Those include files that are defined within an application are included using quotes. The automatic generation of dependencies of source files upon include files within NWChem relies upon this convention.

9.2.8 No implicitly typed variables

The statement implicit none should appear at the top of every routine in the NWChem code. No other implicit statements are permitted, and all variables must be explicitly declared. This rule should be religiously observed in new code.

When integrating existing code, this rule may seem to be more work than it is worth, but several bugs in existing code have been found in this fashion.

9.2.9 Use double precision rather than real*8

REAL*8 is not standard Fortran. DOUBLE PRECISION is the standard form: it is usually what you want, it is more portable, and standardizing declarations lets us perform necessary code transformations more readily.
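The two rules above can be seen together in a small routine. This is a sketch only; the function name mymod_norm2 is hypothetical. Every variable is declared explicitly under implicit none, and all floating-point quantities use DOUBLE PRECISION with d-exponent constants.

```fortran
      double precision function mymod_norm2(n, x)
      implicit none
      integer n               ! [input] vector length
      double precision x(n)   ! [input] vector
!     With implicit none, a misspelling such as "sunsq" for "sumsq"
!     is a compile-time error instead of a silently created,
!     uninitialized REAL variable.
      integer i
      double precision sumsq
!     0.0d0 (not 0.0) keeps the constant in double precision
      sumsq = 0.0d0
      do i = 1, n
         sumsq = sumsq + x(i)*x(i)
      end do
      mymod_norm2 = sqrt(sumsq)
      end
```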

9.2.10 C macro definitions should be in upper case

NWChem uses the ANSI C preprocessor to handle machine dependencies and other conditional compilation requirements. By forcing all C macros to be upper case the code is made more readable and we also avoid potential accidental munging of Fortran source. This practice is consistent with conventional use of the preprocessor in C programs.

9.2.11 Fortran source should be in lower or mixed case

This convention is complementary to the above C macro convention. If there are no fully upper-case Fortran tokens then there can be no accidental conflict with the C preprocessor.

9.2.12 Naming of variables holding handles/pointers obtained from MA/GA

So that these critical variables are immediately recognizable, the following conventions are recommended.

Alternatively, you can insert comment lines describing the variables at the point of declaration, if you do not want to follow these conventions.

9.2.13 Fortran unit numbers

All references to Fortran I/O units should be done with parameters or variables instead of hardwired constants. For the ``standard I/O'' units, corresponding to the C stdin, stdout, and stderr, you should include the file stdio.fh and use the variables luin, luout, and luerr instead of 5, 6, and 0.

The code uses very few other files, and there is no organized list of parameter names for non-standard I/O units. Developers are free to use parameter names that make sense to them, as long as they follow the convention. Using parameters rather than hardwired integer constants helps ensure that I/O unit designations can be changed easily if needed, and may make it easier to move to a more general convention in a future version of the code.
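A minimal sketch, assuming a hypothetical module mymod that writes a private data file. The unit number lives in a named parameter (lu_mymod, with an arbitrarily chosen value), so changing it means editing one line. The two-line luout declaration only stands in for including stdio.fh, taking luout to be unit 6 as stated above.

```fortran
      subroutine mymod_save(n, x)
      implicit none
      integer n
      double precision x(n)   ! [input] values to write
!     Stand-in for including stdio.fh (assumed: luout is unit 6)
      integer luout
      parameter (luout = 6)
!     Hypothetical unit parameter for this module's private file
      integer lu_mymod
      parameter (lu_mymod = 47)
      integer i
      open(unit=lu_mymod, file='mymod.dat', status='replace')
      do i = 1, n
         write(lu_mymod, *) x(i)
      end do
      close(lu_mymod)
      write(luout, *) 'mymod_save: wrote ', n, ' values'
      end
```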

9.2.14 Use standard print control

All modules should understand the PRINT directive and accept at least the following keywords for this

Ideally all applications should control most printing via the print control routines (see Section 7.4). A uniform look and feel is important.

9.2.15 Error handling

All fatal errors should result in a call to errquit() (see Section 7.3.2), which prints out the string and status value to both standard error and standard output and attempts to kill all parallel processes and to tidy any allocated system resources (e.g., system V shared memory).

9.2.16 Comments

The use of comment lines is strongly recommended in all coding. Commented code is easier to read, and often is easier to debug, maintain, and modify. Liberal use of comments is particularly important in NWChem, since it is used by a large and diverse group of people, it is constantly being modified as capabilities are added and refined, and it has only a limited amount of detailed documentation.

Requirements for in-source documentation are given in detail in Chapter 11 but the general recommendation for comment lines in the code is the more the merrier. At a minimum the source code should be able to provide the following information,

In some circumstances, comments at the top of a routine can be quite lengthy since this is a very good place to store details of the algorithm. Automatic generation of documentation from code comments is being designed, but this will produce useful documentation only if developers write clear and concise commentary in the code as they work.

The following partial listings show examples of minimalist in-source documentation using comment lines; it would not be difficult to say more. The rule of thumb should be "from those who have much, more will be expected": the more important a routine is to a particular algorithm, and the more of the solution it carries out, the more detailed and voluminous its comments should be.

Example of comments in a simple routine:

      logical function bas_numbf(basis,nbf)
      implicit none
      integer basis   ! [input] basis set handle
      integer nbf     ! [output] number of basis functions
*
*  nbf returns the total number of functions.
*  Returns true on success, false if the handle is invalid
*

Example of comments in a less simple routine:

      subroutine sym_symmetrize(geom, basis, odensity, g_a)
C$Id: codingsty.tex,v 1.4 1998/12/15 16:22:36 d35162 Exp $
      implicit none
      integer geom, basis  ! [input] Handles
      integer g_a          ! [input] Handle to input/output GA
      logical odensity     ! [input] True if matrix is a density
c
c     Symmetrize a skeleton matrix (in a global array) in the
c     given basis set.
c
c     A <- (1/2h) * sum(R) [RT * (A + AT) * R]
c
c     where h = the order of the group and R = operators of the
c     group (including the identity)
c
c     Note that density matrices transform according to slightly
c     different rules to Hamiltonian matrices if components
c     of a shell (e.g., Cartesian d's) are not orthonormal.
c     (see Dupuis and King, IJQC 11, 613-625, 1977)

9.2.17 Message IDs

The use of tags/IDs/types on messages is strongly suggested. If all messages within the program have distinct types, and the message-passing software forces the types of messages to match between sender and receiver, then there is a way to prove that messages are being sent and received correctly: if they are not, a runtime error will be detected. This is especially important in NWChem, since the code uses many third-party linear algebra libraries that do a lot of message passing.

Modules that do a significant amount of messaging should reserve a section of the message ID space for their own use (e.g., GA or PEIGS). Most modules, however, do only a small amount of messaging. For these, the include file msgids.fh should be used to reserve individual message IDs; this file defines Fortran parameters for the message IDs used in most of NWChem. Hardwired message IDs should not be used in any NWChem routine.

9.2.18 Bit operations -- bitops.fh

The following bitwise operations (see Table 9.1 for definitions) are the recommended standards for use in NWChem.


Table 9.1: Effect of Bit Operations

              ior   ieor  iand  not  lshift    rshift
  operand 1   110   110   110   10   10111011  10111011
  operand 2   100   100   100        2 bits    2 bits
  result      110   010   100   01   11101100  00101110


These operations can readily be generated as in-line functions from most other definitions. The shift examples in Table 9.1 use an eight-bit word written with the most significant bit on the left. All operations act on full integer words (32 or 64 bits as necessary) and produce integer results. The declarations and any necessary statement functions are in bitops.fh. The presence of data statements makes it impossible for a single include file to both make the declarations and define the statement functions. To get around this, the declarations are in bitops_decls.fh and the statement functions are in bitops_funcs.fh.
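The entries in Table 9.1 can be checked with the standard Fortran 90 intrinsics ior, ieor, iand, not and ishft. This sketch uses ishft in place of the lshift and rshift names provided through bitops.fh, whose exact definitions are not reproduced here; the routine name mymod_bits is hypothetical. Because the operations act on full integer words, the results of not and of the left shift are masked back to the 2 and 8 bits shown in the table.

```fortran
      subroutine mymod_bits(res)
      implicit none
      integer res(6)   ! [output] results of the six operations
!     Table 9.1 operands in decimal:
!       6 = 110, 4 = 100, 2 = 10, 187 = 10111011 (8 bits)
      res(1) = ior(6, 4)                 ! 110 -> 6
      res(2) = ieor(6, 4)                ! 010 -> 2
      res(3) = iand(6, 4)                ! 100 -> 4
!     not() flips every bit of the full word, so mask to 2 bits
      res(4) = iand(not(2), 3)           ! 01 -> 1
!     ishft: positive count shifts left, negative shifts right
      res(5) = iand(ishft(187, 2), 255)  ! 11101100 -> 236
      res(6) = ishft(187, -2)            ! 00101110 -> 46
      end
```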

9.2.19 Blockdata statements and linking

At least one machine (the CRAY-T3D) discards all symbols that are not explicitly referenced, even if other symbols from the same .o file are used. Thus, BLOCK DATA subprograms are not linked in. One fix to this is to declare each BLOCK DATA subprogram as an undefined external on the link command, but this makes the link command depend on the list of modules being built. An alternative mechanism that works on the T3D is to reference each BLOCK DATA subprogram in an EXTERNAL statement within a SUBROUTINE or FUNCTION that is guaranteed to be linked if any reference is to be made to the COMMON block being initialized.

This is being redesigned.
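The EXTERNAL-statement workaround described above might look like the following sketch, using a hypothetical BLOCK DATA unit mymod_bd that initializes the common block /cmymod/. Because the accessor mymod_get_thresh names mymod_bd in an EXTERNAL statement, any program that links the accessor also pulls in the initialization, even on linkers that would otherwise discard an unreferenced BLOCK DATA unit.

```fortran
      block data mymod_bd
      implicit none
      double precision thresh
      common /cmymod/ thresh
      data thresh /1.0d-6/
      end

      double precision function mymod_get_thresh()
      implicit none
!     Naming the BLOCK DATA unit here creates a reference to it, so
!     it is linked whenever this routine is linked.
      external mymod_bd
      double precision thresh
      common /cmymod/ thresh
      mymod_get_thresh = thresh
      end
```

This also fits the data-hiding style of Section 9.2.3: only the accessor needs to know that the value comes from a common block initialized by BLOCK DATA.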


Dunyou Wang 2009-03-13