SeqQuest Programming Style Manual
By: Peter A. Schultz <paschul@sandia.gov>
Sandia National Laboratories
Albuquerque, NM 87185
Mission statement
The purpose of this Style Manual is to enforce clarity and consistency
in coding style for SeqQuest development. Clear code is easier to
develop further - rebugging (to add features), and debugging (to find
and fix the bugs that inevitably arise in an actively developed and used
code) - and a consistency of style is a vital aspect of code clarity.
SeqQuest may have many conventions that are idiosyncratic, code that is
not optimally efficient, and coding practices that may not reflect the
latest thought in computer science. It is, however, very consistent in
its application of its idiosyncratic conventions, it is highly portable,
scales well, and, key for the future viability of the code, is easily
read, understood, and modified. SeqQuest has its own distinctive style,
that will be different from any other code, as all other codes will be
different from each other. Any changes to SeqQuest should respect the
coding conventions that already exist, so that a future developer will
continue to have a code as consistent, accessible, and understandable as
the current version. Mimic the patterns you see in the code, to the
extent possible. This manual will codify many of these conventions,
and describe the basic rules to be followed in development of the
current version of SeqQuest.
Overview
SeqQuest is a code for electronic structure calculations using density
functional theory with pseudopotentials and contracted Gaussian basis
sets. SeqQuest is written in Fortran 77, and a very vanilla, portable
f77 at that. SeqQuest has a very "flat" structure, with a
single main program that does very little computation itself. The
main program manages memory and program flow, and calls a sequence of
(shallow) subroutines to do the computational work. Data is
communicated via passed argument lists, not common blocks. Those
familiar with BLAS/LAPACK libraries will recognize the style. There
is one large central workspace in the main program, wk(maxwkd),
from which all significant storage is taken, and maxwkd is set
in the parameter file. There is one main input file, and one main user
output file, and a multitude of larger temporary binary files the
program uses to do its work.
The Laws
- Consistency and clarity of code are paramount. Only on very rare
occasions can clarity be subordinated to other considerations.
You will not see those occasions.
- Conform to the existing style in the code, not your own,
not the latest fad in computer science. Consistency is the
key to the long-term viability of any code. Mixing styles is fatal.
- Clean up any new code. Changes are NOT done when they work, they
are done when they work AND they are clean.
- The language is FORTRAN 77, and a very vanilla f77.
- "fixed form" 72-character lines (i.e. NO "extended source" a la f90)
- No machine dependent code.
- No dynamic memory allocations, no pointers, no structures,
no "clever" coding where a plainer implementation will serve.
- Sacrifice performance if necessary.
- Variable declarations will be consistent with IMPLICIT DOUBLE PRECISION (a-h,o-z).
Apologies to all the strong typing types, but that's the rules.
All integer variable names will begin with i-n,
and no variables beginning with i-n will be reals.
All real variable names will begin with a-h,o-z,
and no variables beginning with a-h,o-z will be integers.
Use of IMPLICIT NONE is permitted/encouraged in new routines,
but all variable types will conform to the above naming conventions.
- NO include statements. SeqQuest has only one include statement,
in the main program to bring in dimension parameters, and there
will be no other include statements anywhere in the code.
- NO creation of O(N**2) arrays anywhere in the program, where O(N)
parameters are: natmd, norbd, nkd, nlatd. Space for all such
large arrays will be taken from central workspace in main program,
an array called wk().
- Original declaration of O(N) arrays occurs in main program only.
None of the dimensioning parameters in the parameter file may be
used to create arrays in any subroutine, i.e., all O(N) arrays
used in subroutines must be passed in from main program.
- NO new common blocks.
Data is to be communicated to subroutines via passed arguments.
- NO file unit numbers done by explicit integers: all file
access shall be via integer variables. E.g. "write(6,fmt)" must
be replaced by "write(IWR,fmt)", where IWR is an integer
variable that has been set to refer to this file.
- NO direct use of Fortran OPEN or CLOSE. All file manipulation will be
done using FLxxxx routines (that manage unit numbers).
- NO passed real constants. E.g., "call FOO(1.0)" should be
written as "call FOO( one )" where "one" is a real
variable set earlier in the code. Passing integers is ok.
- NO statement functions, beyond the few that already exist.
- Comments only on lines beginning with "c", not "!" or other
funky characters. White space should begin with "c", i.e. no
completely blank lines within a routine.
- NO tabs; indenting is done using spaces.
- Test thoroughly.
- Only those who write the rules can change the rules.
The Basics
Many of the conventions described in this section will be apparent
from inspecting the code itself. Hence, the best approach is to use
existing code, to the extent possible, as a template/model for new
development. This section will highlight some of the less obvious
conventions, and reiterate the more important ones. Details will
be expanded on in the next section.
- Document carefully. Describe the purpose of larger modules, and
explain complicated coding sections that may not be obvious to
someone looking at the code years from now after you have left to
go on to bigger and better things. Focus on the why (and why not).
- Use generic calls to library routines where possible. There are
exceptions. Use DERF/ERF explicitly as there are no safe generics
for the error function, and IABS for integers and ABS for reals.
- Avoid using complex variables. Use doubly dimensioned reals
instead where practical.
- Avoid using ENTRY except in special cases.
- All I/O of longer records (orbital matrices and grid fields) is
done through special "big" routines that break the records into smaller
pieces. This is because write/read of long records can be faulty
on some machines.
- Do not use BACKSPACE().
- Avoid using explicit constants in the executable code. Use named
constants (e.g. "zero", "half", "one") instead that have been set in
declarations via parameter or data statements.
- Use careful indenting to format source code (two spaces).
- When writing data to the output listing file, always include a
descriptive string in the output line, something that can be found
with a "grep".
- Looping is by do ... continue or do ... enddo only.
NO do while loops.
- Line continuation character is a "$"
- Labels should be properly sequenced, except for emphasis.
- All labels are right justified, NOT left justified.
- Use "call MKZERO( len, arr )" to clear (set to zeroes) an array
"arr(len)"
- Branching by if/elseif/else/endif preferred.
- Avoid compound calls. E.g., avoid "call FOO( BAR( x ) )"
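A few of these basics can be seen together in a short, stand-alone sketch. It is only an illustration: the routine SCALEV and the parameter nvecd are invented here, and IWR is set directly because the FLxxxx routines are not available outside SeqQuest.

```fortran
      program BASICS
c
c Hypothetical sketch; SCALEV and nvecd are invented names.
c
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      PARAMETER  ( nvecd = 3 )
      DIMENSION  vec(nvecd)
      DATA  half,one / 0.5d0,1.d0 /
c
c >>>> EXECUTABLE CODE:
c
c Stand-in for "call FLGETIWR( IWR )":
      IWR = 6
c
      do 100 i=1,nvecd
        vec(i) = one
  100 continue
c
c Named constant "half" is passed, not the literal 0.5d0:
      call SCALEV( nvecd, half, vec )
c
c Descriptive, "grep"-able output line:
      write(IWR,9000) 'SCALED vector =',vec
 9000 format(1x,a,3f10.4)
c
c    That's all, Folks!
c
      STOP
      END
c
      subroutine SCALEV( nvec, fac, vec )
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      DIMENSION  vec(nvec)
c
      do 100 i=1,nvec
        vec(i) = fac*vec(i)
  100 continue
c
      RETURN
      END
```

The constants are set once in a DATA statement, the write goes to a named unit, and the output line carries its own label.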
Conventions
The program does its internal work in:
- Rydberg energy units (1 Ry = 0.5 Hartree = 13.605 eV)
- Bohr distance units (1 bohr = 0.52918 Ang = 0.052918 nm)
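As a small illustration of these units, the sketch below converts an energy from Ry to eV and a distance from bohr to Angstrom, using the rounded factors quoted above. The program and variable names (UNITS, rytoev, bohrang) are invented for this example and are not taken from SeqQuest.

```fortran
      program UNITS
c
c Hypothetical illustration; names are not taken from SeqQuest.
c Conversion factors are the rounded values quoted in the text.
c
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      PARAMETER  ( rytoev = 13.605d0, bohrang = 0.52918d0 )
      DATA  two / 2.d0 /
c
c >>>> EXECUTABLE CODE:
c
c Stand-in for "call FLGETIWR( IWR )":
      IWR = 6
c
c Internal values are in Ry and bohr:
      engyry = -two
      distbohr = two
c
c Convert only for reporting:
      engyev = engyry*rytoev
      distang = distbohr*bohrang
      write(IWR,9000) 'energy (Ry), (eV) =',engyry,engyev
      write(IWR,9000) 'distance (bohr), (Ang) =',distbohr,distang
 9000 format(1x,a,2f14.6)
c
c    That's all, Folks!
c
      STOP
      END
```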
Main program
The main program concentrates on memory management and flow control.
It calls a sequence of subroutines, controlled by various flags, and
manages the memory required by the large arrays used by SeqQuest.
The main routine is special, and looks very different
from any subroutine.
- Computation should be in subroutines, not the main routine.
- do-loops are heartily discouraged in the main routine.
- write statements are self-documenting (no orphaned numbers in output!).
- Memory is tightly controlled
- File open/close done using FLxxxx routines, NOT the fortran
primitive OPEN/CLOSE.
- Unit numbers for file i/o are named integer variables.
No undirected write(*,*) and no write(14,*).
- All data communicated to subroutines through passed arguments
Memory
The principal limiting factor in SeqQuest is usually the amount of
memory needed to run a problem, rather than the amount of time it
takes to run a problem. I.e., SeqQuest is more memory-bound than
cpu-bound. Hence, the use of memory is very tightly controlled in
SeqQuest.
All large-scale memory is taken from a single large workspace,
wk(maxwkd), using pointer-like integers; maxwkd is a parameter set in
the main parameter file. The routine
WKMEM is the most important routine in the code: it partitions
memory within wk(), and checks for memory sufficiency FOR THE ENTIRE
CODE. Hence, WKMEM should be consulted before attempting any use of
space within wk().
The wk() array is sectioned into pieces at the beginning of the code by
pointer-like integer variables i01-i12, with spacing dictated by the size
of big arrays. The first four, i01-i04, reserve enough space for either
orbital matrices (nmat=nk*norb**2) or grid fields (nptr=n1r*n2r*n3r).
(NB: this assumption is subject to change, check WKMEM for latest).
The last eight spaces, i05-i12, only guarantee enough space for grid
fields. Each of these spaces may be used as temporary space,
and the intermittent comments "MEM" enumerate the contents of all
the active "pointers" at that point in the code.
- No dynamic memory.
- Orbital matrices/grid fields must come from main workspace wk() and
be given descriptive local names (and documented in main program).
- O(N) work arrays should be passed from wksmX() in main program.
- Memory is tightly scheduled - be careful not to overstep boundaries.
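The workspace scheme described above can be sketched in miniature. This is an assumption-laden toy, not the actual WKMEM logic: the sizes, the three-segment layout, and the routines FILLV and COPYV are invented for illustration.

```fortran
      program WKDEMO
c
c Hypothetical sketch of carving a central workspace into pieces
c with pointer-like integers.  Sizes and routine names are
c invented; in SeqQuest the partitioning is done by WKMEM.
c
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      PARAMETER  ( maxwkd = 1000 )
      DIMENSION  wk(maxwkd)
c
c >>>> EXECUTABLE CODE:
c
c Stand-in for "call FLGETIWR( IWR )":
      IWR = 6
c
c Each segment must hold one grid field of nptr points:
      n1r = 4
      n2r = 5
      n3r = 6
      nptr = n1r*n2r*n3r
      i01 = 1
      i02 = i01 + nptr
      i03 = i02 + nptr
      memwk = i03 + nptr - 1
      if( memwk .gt. maxwkd ) goto 1300
c
c MEM: i01=rho, i02=vpot, i03=free
      call FILLV( nptr, wk(i01) )
      call COPYV( nptr, wk(i01), wk(i02) )
c -->               rho-i    vpot-o
      write(IWR,9000) 'last element of vpot =',wk(i02+nptr-1)
 9000 format(1x,a,f12.4)
      goto 999
c
 1300 continue
      write(IWR,9130) '***** ERROR: wk() too small, need =',memwk
 9130 format(1x,a,i10)
c
  999 continue
      STOP
      END
c
      subroutine FILLV( nptr, rho )
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      DIMENSION  rho(nptr)
      do 100 i=1,nptr
        rho(i) = DBLE( i )
  100 continue
      RETURN
      END
c
      subroutine COPYV( nptr, rho, vpot )
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      DIMENSION  rho(nptr),vpot(nptr)
      do 100 i=1,nptr
        vpot(i) = rho(i)
  100 continue
      RETURN
      END
```

The subroutines never see wk() itself. They receive segments under descriptive local names, and the sufficiency check happens once, before any segment is used.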
Input/output
The user listing output file should be self-documenting. It is not just
the source code, but the output file the user sees which needs to have
documentation in order to be readable and understandable by humans.
I/O to all files is carefully structured to try and reduce the size
of files, and to make data more easily accessible to the program without
churning up disk. The FLxxxx routines have been set up to manage files.
They allocate available (free) unit numbers (unit numbers are not to be
hard-coded!), connect to correct directory structure (as needed), and
complete the file names.
- Files will be explicitly opened using FLxxxx routines, not a
fortran OPEN/CLOSE.
- I/O is always to named units (IRD,IWR,isetdat,ianalyz, etc.) and
not to explicit numbers. I.e. write(6,*) is forbidden. Sole
exception: exception/error handling in a deep routine that does not
have IWR locally can write diagnostics using "write(*,fmt)".
- NO "print" statements, use "write" instead.
- standard input unit is IRD [from call FLGETIRD( IRD )].
- standard output unit is IWR [from call FLGETIWR( IWR )].
- Format labels are 8xxx for a read, 9xxx for a write.
- A write to the user output files should be carefully formatted and
documented with a descriptive string (preferably one that is uniquely
"grep"-able). For example,
      write(IWR,*) engy
should be replaced with something more descriptive:
      write(IWR,9020) 'TOTAL energy (Ry) =',engy
 9020 format(1x,a,f20.10)
- The preference is to do I/O from the top-level program, rather
than called routines.
- Read/write of long records (grid fields, orbital matrices)
should be done through a special set of routines in the code
(WRITBIG/READBIG/BACKBIG/READSKP) that break up the long
record into smaller pieces (The read/write of long records
has been a problem on some Alpha-based machines).
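The idea of breaking a long record into smaller pieces can be sketched as follows. WRTCHK and RDCHK are invented stand-ins, not the WRITBIG/READBIG routines, whose argument lists are not given in this manual. The direct OPEN/CLOSE and the hard-wired unit number appear only because the FLxxxx routines are unavailable in a stand-alone example.

```fortran
      program BIGDEMO
c
c Hypothetical sketch: a long array written and read in pieces.
c WRTCHK/RDCHK are invented, NOT the SeqQuest "big" routines.
c
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      PARAMETER  ( nptrd = 1000, lenrec = 256 )
      DIMENSION  fldout(nptrd),fldin(nptrd)
      DATA  zero / 0.d0 /
c
c >>>> EXECUTABLE CODE:
c
c Stand-ins for FLxxxx calls, which would assign and open units:
      IWR = 6
      iscr = 21
      OPEN( unit=iscr, status='SCRATCH', form='UNFORMATTED' )
c
      do 100 i=1,nptrd
        fldout(i) = DBLE( i )
  100 continue
c
      call WRTCHK( iscr, lenrec, nptrd, fldout )
      REWIND( unit=iscr )
      call RDCHK( iscr, lenrec, nptrd, fldin )
c
      errmax = zero
      do 200 i=1,nptrd
        diff = ABS( fldin(i) - fldout(i) )
        if( diff .gt. errmax ) errmax = diff
  200 continue
      write(IWR,9000) 'max read-back error =',errmax
 9000 format(1x,a,e12.4)
c
      CLOSE( unit=iscr )
      STOP
      END
c
      subroutine WRTCHK( iunit, lenrec, len, arr )
c Write arr(len) as records of at most lenrec words each
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      DIMENSION  arr(len)
      do 100 i1=1,len,lenrec
        i2 = MIN( i1+lenrec-1, len )
        write(iunit) (arr(i),i=i1,i2)
  100 continue
      RETURN
      END
c
      subroutine RDCHK( iunit, lenrec, len, arr )
c Read arr(len) back using the same record layout as WRTCHK
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      DIMENSION  arr(len)
      do 100 i1=1,len,lenrec
        i2 = MIN( i1+lenrec-1, len )
        read(iunit) (arr(i),i=i1,i2)
  100 continue
      RETURN
      END
```

The reader must use the same record length as the writer, which is one reason to keep such logic in a single matched pair of routines.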
Variable names
Consistent naming conventions are important to being able to follow
code easily. The style used in SeqQuest is rather old-fashioned
fortran: short, but descriptive names, with few underscores and using
lower case (very specific exceptions are all upper case). The virtue
is that it leads to code that, while quite dense, is rather readable.
In general, single-character or double-character variable names are
used only as very local temporary variables, such as the loop index in
a very short do-loop (a notable exception is that "i" and "j" are used
frequently to index basis functions in the big orbital matrices).
Keep names short, to preserve code-space on a fortran 72-character
line, and to conform to naming conventions used in other routines.
- All lower case.
- Restrict to eight characters or less (this convention may be
relaxed at a future time, but, for now, keep them short).
- All variables with names beginning with i-n are integers, and vice versa.
Exception: logicals may begin with a capital "L".
- Dimensioning parameters end in "d", e.g. nkd, norbd, natmd.
- For passed variables, use the same name in the subroutine as was
used in the calling routine, except for big arrays from main
workspace wk(), which should be given a descriptive local name.
- Local variables will have the same names, where available, as they
do in other routines. E.g., iatm and jatm are used to index atoms,
not ia and ja, because the former are conventionally used elsewhere
in the code. Always best to use existing routines as a model.
- No underscore characters, except for logicals.
- Logicals should have names that lead to descriptive use in an "if"
statement. E.g. "if( do_neb )then" or "if( EVEN )then".
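The naming rules above might look like this in practice. The sketch is hypothetical: natmd, chg, chgtot and Leven are illustrative names chosen to follow the conventions, not variables known to exist in SeqQuest.

```fortran
      program NAMES
c
c Hypothetical sketch of the naming conventions.
c
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
c Dimensioning parameter ends in "d":
      PARAMETER  ( natmd = 4 )
c Logical begins with a capital "L":
      LOGICAL  Leven
      DIMENSION  chg(natmd)
      DATA  zero,one / 0.d0,1.d0 /
c
c >>>> EXECUTABLE CODE:
c
c Stand-in for "call FLGETIWR( IWR )":
      IWR = 6
c
c Integers begin with i-n, reals with a-h,o-z; atoms use iatm:
      natm = natmd
      chgtot = zero
      do 100 iatm=1,natm
        chg(iatm) = one
        chgtot = chgtot + chg(iatm)
  100 continue
c
c Logical name reads naturally in the if-statement:
      Leven = MOD( natm, 2 ) .eq. 0
      if( Leven )then
        write(IWR,9000) 'even atom count, total charge =',chgtot
      else
        write(IWR,9000) 'odd atom count, total charge =',chgtot
      endif
 9000 format(1x,a,f10.4)
c
      STOP
      END
```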
Upper/lower case
- Upper case used for:
  - subroutine and function names (call FOO( x ))
  - file manipulations (e.g., REWIND)
  - declarations (DIMENSION/DATA/PARAMETER/CHARACTER, etc.)
  - routine exits (STOP/RETURN/END)
  - standard unit numbers (IRD,IWR being standard in/out)
  - first character "L" in logical variable ("Loptall")
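- Lower case used for everything else, e.g. variable names and
executable statements (call, do, if, goto, read, write).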
Subroutines/function calls
- Subroutine and function names are all upper case.
- Subroutine and function names try to be eight characters or less.
- Use generics for math calls where possible. For the error
function, there is not a safe generic, so use ERF or DERF as
necessary. Also, using IABS for integers and ABS for reals, to
distinguish integer arguments, is encouraged.
- Use "call MKZERO( len, arr )" to clear array arr(len).
- Use explicit arithmetic for 3-vectors and 2-vectors (complex)
rather than function/subroutine calls. E.g., prefer this:
      adotb = a(1)*b(1) + a(2)*b(2) + a(3)*b(3)
rather than
      adotb = DDOT( 3, a,1, b,1 )
- Communicate data through passed arguments rather than common.
- Avoid compound calls. E.g., replace "call FOO( BAR(x) )"
with "call FOO( barx )" where "barx = BAR( x )".
- There should be a space on the inside of the parentheses around
arguments.
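A short sketch of these calling conventions follows. VECLEN and DIVVEC are hypothetical routines written for this example only. The point is the explicit 3-vector arithmetic and the function result stored in a variable before it is passed on.

```fortran
      program CALLS
c
c Hypothetical sketch; VECLEN and DIVVEC are invented routines.
c
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      DIMENSION  a(3),b(3)
      DATA  a / 1.d0, 2.d0, 2.d0 /
      DATA  b / 3.d0, 6.d0, 9.d0 /
c
c >>>> EXECUTABLE CODE:
c
c Stand-in for "call FLGETIWR( IWR )":
      IWR = 6
c
c Explicit 3-vector arithmetic rather than a library call:
      adotb = a(1)*b(1) + a(2)*b(2) + a(3)*b(3)
      write(IWR,9000) 'a dot b =',adotb
c
c Not the compound "call DIVVEC( VECLEN( a ), b )":
      alen = VECLEN( a )
      call DIVVEC( alen, b )
      write(IWR,9000) 'b divided by length of a =',b
 9000 format(1x,a,3f10.4)
c
      STOP
      END
c
      function VECLEN( vec )
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      DIMENSION  vec(3)
      vsq = vec(1)*vec(1) + vec(2)*vec(2) + vec(3)*vec(3)
      VECLEN = SQRT( vsq )
      RETURN
      END
c
      subroutine DIVVEC( fac, vec )
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      DIMENSION  vec(3)
      vec(1) = vec(1)/fac
      vec(2) = vec(2)/fac
      vec(3) = vec(3)/fac
      RETURN
      END
```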
Routine structure
Subroutine and function source code is very highly structured. Use
an existing routine as a template if building a new routine, as it is
easier to ensure conformance to style that way than to build on the
basis of what will be necessarily incomplete instructions about style.
The idea is to make all the routines look as similar to one another as
possible, and, therefore, by inspection, notice bugs because patterns
are violated. Deviation from any pattern should be done only with
compelling reason, and should be documented.
- The typical routine has the following elements:
  - name banner (five comment lines, middle line >>>>> {name})
  - the routine declaration: subroutine/function FOO( .... )
  - main documentation section: purpose/author/revision history
  - more detailed comments/notes/variable descriptor comments
  - IMPLICIT DOUBLE PRECISION (a-h,o-z)
    or IMPLICIT NONE
  - variable declarations, by kind (input/output/scratch/local)
  - DATA declarations
  - statement functions (HIGHLY discouraged)
  - c >>>> EXECUTABLE CODE:
  - executable part of code
  - c That's all, Folks!
  - RETURN/END
- These elements are formatted in detail; use an existing routine
as a model.
- Passed arguments take the same name as in the calling routine,
except that big arrays passed from big workspace "wk()" in
main routine should be given names descriptive of their use.
- I/O units should be passed first, dimension variables next,
data arrays next, and arrays from workspace wk() last. Try
to conform to order seen in other calls. Arrays out of workspace
should be documented in calling program as to name and type (i-input,
o-output, s-scratch). E.g.:
      call VESSLO( IWR, ibndpot, n1r,n2r,n3r,nptr, weight,
     $ ws(is1),ws(is2),ws(is3),ftarray,
     $ wk(i11), wk(i06), wk(i08) )
c -->  rhoslo-io espot-2os gvecmag-i
where i11 = slow density, i06 = electrostatic potential, and
i08 = g-vector magnitudes.
- RETURN from anywhere other than the end of a routine is discouraged.
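The element list above can be assembled into a skeleton like the one below. The exact banner layout and the wording of the comment headers are guesses based on the description in this section, and VECSUM is an invented routine, so an existing SeqQuest routine remains the authoritative template. A small driver is included only to make the sketch complete.

```fortran
c
c
c>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> VECSUM
c
c
      subroutine VECSUM( nvec, vec, vsum )
c---------------------------------------------------------------
c Purpose: sum the elements of a vector (illustration only)
c
c Written: hypothetical example, not part of SeqQuest
c
c Revision history:
c  none
c---------------------------------------------------------------
c
c Notes:
c  Layout of this header is a guess at the SeqQuest pattern.
c
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
c
c Input declarations:
      DIMENSION  vec(nvec)
c Output: vsum
c
c Local declarations:
      DATA  zero / 0.d0 /
c
c >>>> EXECUTABLE CODE:
c
      vsum = zero
      do 100 i=1,nvec
        vsum = vsum + vec(i)
  100 continue
c
c    That's all, Folks!
c
      RETURN
      END
c
c
c>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DRIVER
c
c
      program DRIVER
c Minimal driver so that the sketch can be compiled and run
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      PARAMETER  ( nvecd = 4 )
      DIMENSION  vec(nvecd)
      DATA  vec / 1.d0, 2.d0, 3.d0, 4.d0 /
c
c >>>> EXECUTABLE CODE:
c
c Stand-in for "call FLGETIWR( IWR )":
      IWR = 6
      nvec = nvecd
      call VECSUM( nvec, vec, vsum )
      write(IWR,9000) 'sum of vector elements =',vsum
 9000 format(1x,a,f12.4)
c
      STOP
      END
```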
Routine families
Certain sets of subroutines have similar internal structure, and
code is written to emphasize the similarities where possible.
Two major distinct families exist. First, the analytic "two-center"
(SIJ=overlap, TIJ=kinetic, FRC2CTR=forces) and "three-center" routines
(VLOCMAT, VLOCMII, VLOCFRC, NLOCMAT, NLOCFRC) for local and non-local
integrals share much internal structure in common. For example,
label numbers are used to emphasize the similarities between routines
rather than to have strict numerical ordering within routines.
Second, the grid matrix element routines (GRDOVLP, GRIDRHO, VSLOMAT,
VSLOFRC, ESLOFRC) share much common internal structure. There are
smaller sets of similar routines. If developing a new routine,
try to follow the model of an existing routine in that family.
Loops
Looping should be done through do-loops, rather than do-while,
if/goto, or other constructs.
- Loops are via do/continue or do/enddo, not do while
- Do NOT pass loop index into subroutine, e.g.,
      do i=1,n
        call VERYBAD( i )
      enddo
is to be avoided at all costs. Some optimizing compilers will
get this wrong.
- No compound label-sharing loops. E.g.,
      do 10 j=1,n
      do 10 i=1,n
        s1
   10 continue
should be written as two loops:
      do 20 j=1,n
        do 10 i=1,n
          s1
   10   continue
   20 continue
- do/enddo used for short loops without jumps to outside of loops.
- labelled loops otherwise (long loops, branches, complicated).
This is fine:
      do i=1,n
        s1
      enddo
But, this should be changed:
      do i=1,n
        if( foobar ) goto 11
      enddo
   11 continue
to a labelled loop:
      do 10 i=1,n
        if( foobar ) goto 11
   10 continue
   11 continue
- Note that labels are right-justified, and indenting is by two spaces.
- Note that exception goto label is indexed 11 to the loop's 10.
- Labelled do-loop ends with "continue" statement, not active statement.
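Both loop forms can be seen side by side in a small, self-contained sketch. The data and variable names are invented for illustration. The do/enddo form is used for the short loop with no jumps, and the labelled form for the search loop that exits early.

```fortran
      program LOOPS
c
c Hypothetical sketch of the two permitted loop forms.
c
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      PARAMETER  ( nvald = 5 )
      DIMENSION  val(nvald)
      DATA  zero / 0.d0 /
      DATA  val / 3.d0, 1.d0, -2.d0, 4.d0, -5.d0 /
c
c >>>> EXECUTABLE CODE:
c
c Stand-in for "call FLGETIWR( IWR )":
      IWR = 6
c
c Short loop, no jumps: do/enddo is fine
      vsum = zero
      do i=1,nvald
        vsum = vsum + val(i)
      enddo
      write(IWR,9000) 'sum of values =',vsum
 9000 format(1x,a,f10.4)
c
c Loop with a jump out: labelled loop, exit label 11 to loop's 10
      ineg = 0
      do 10 i=1,nvald
        if( val(i) .lt. zero )then
          ineg = i
          goto 11
        endif
   10 continue
   11 continue
      write(IWR,9010) 'index of first negative value =',ineg
 9010 format(1x,a,i4)
c
      STOP
      END
```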
Labels
Use of labels is highly conventional, and there are some special
numbers to respect. Code clarity is the goal. "Big numbers" such
as 1000,2000 or 100,200, etc. should be used to denote "important"
branch points or do-loops, with smaller increments for less
important loops/branch points. Use labels to highlight code,
not simply to denote sequence.
- Labels are right-justified, NOT left-justified.
- Special label numbers reserved for specific purposes:
  - 8xxx reserved for "read" format statements
  - 9xxx reserved for "write" format statements
  - "13" reserved for error/exception handling (e.g., 13xx, xx13)
  - 999 reserved for final jump location before routine exit
  - 99x reserved for exit with some final processing
- Labels should be in numerical order, but with special treatment
of I/O labels, and other exceptional labels.
- read/write labels 8xxx/9xxx should be in numerical order as well,
where the "xxx" suggests the location within routine.
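The reserved label numbers might be combined as in the sketch below. It is an illustration under assumptions: the program and its variables are invented, and an internal file (a character variable) stands in for the input unit IRD so that the example runs without an input file.

```fortran
      program LABELS
c
c Hypothetical sketch of the label conventions; names invented.
c
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      CHARACTER*8  line
c
c >>>> EXECUTABLE CODE:
c
c Stand-in for "call FLGETIWR( IWR )":
      IWR = 6
c
c An internal file stands in for the input file on unit IRD.
c 8xxx label for the read format, 9xxx for the write format:
      line = '      42'
      read(line,8000,err=1300) natm
 8000 format(i8)
      write(IWR,9000) 'number of atoms =',natm
 9000 format(1x,a,i8)
      goto 999
c
c Error handling, labelled with a "13":
 1300 continue
      write(IWR,9130) '***** ERROR: bad atom count, input =',line
 9130 format(1x,a,a)
c
c Final jump location before exit:
  999 continue
      STOP
      END
```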
Indenting
Consistent indenting leads to more readable code. Everyone has their
own conventions; the convention for SeqQuest is as follows:
- Code indent is by TWO spaces.
- NO tabs
- Open the first line after a "do" statement, close last line before
the final continue in a do-loop:
      do 10 i=1,n
        s1
        ...
   10 continue
- Open within the branch points of an if-block:
      if( log1 )then
        s1
      elseif( log2 )then
        s2
      else
        s3
      endif
- Indent in continuation is one space:
      call FOO( arga,argb, ... ,argn,
     $ argo, ... ,argz )
- Note: continuation character is a "$"
- On (very) rare occasions, loops get deep enough that there is
little code space left on a fortran 72-character line. In those
cases, forego indenting in top level loops, or, better yet, try
to use subroutine calls. Use good judgment, with code clarity
being the goal rather than mindless adherence to a rule.
Spacing
Use of space in source code is highly conventional, and is designed
for ease in reading code. While not strictly obeyed, the conventions
are followed rather closely barring some compelling reason not to.
The following lists common cases, but as always, it is better to
inspect existing code and conform to conventions seen in the code.
- Two space indenting for code, one space for continuation
- One space inside (but not outside) parentheses around routine
arguments: "call FOO( bar )" and not "call FOO (bar)".
- One space inside (but not outside) if-statements:
"if( log )then" and not "if (log) then"
- However, "if( log ) s1", when not an if-block.
- Single space around labels in do-statement: "do 10 i=1,n"
- Two spaces inside no-label do-statement: "do^^i=1,n"
(each "^" marks one space)
- Eliminate internal spaces in fortran terms, i.e.,
USE: / goto / enddo / elseif / endif /,
NOT: / go to / end do / else if / end if /.
- One space on either side of "=",
e.g. "a = b", not "a=b"
EXCEPT for do-statement, where there are no spaces: "do i=1,n"
- One space either side of "+" or "-": "a = b + c"
- No space around "*" or "/", except for emphasis:
"a = b*c"
- Space after read/write: "read(IRD,8000) label"
- No space around array index: "wk(i01)"
- When in doubt, check existing code for examples.
- These rules can be bent when confronted with the 72-character line
limit in fortran. However, preference is to break into a continued
line rather than trying to squeeze everything onto one line.
Send questions and comments to: Peter Schultz at paschul@sandia.gov
Last updated: December 15, 2007