Phoenix Issues, Problems, Workarounds
This page is intended to keep users informed of current problems,
issues, and workarounds on Phoenix. It will be updated regularly.
Issues
Although we have set up HDF5 and netcdf modules on the X1, the
ftn compiler cannot find the Fortran modules included in those
packages. This is not a problem unless your code "use"s one of
those Fortran modules.
The only workaround at this time is to add the "-em -p<path>"
options to your command line, where <path> is the location of the
Fortran module files. Run "module display <module>" to see
information about the module you are using, including its path,
and then pass the appropriate path with the "-p" option.
Even with this limitation, loading these modules is encouraged if
you need to use these libraries, because you will link against the
appropriate SSP or MSP libraries automatically, depending on whether
you use the -hssp switch. Other environment variables will also be
set properly.
Added Mar 24, 2005.
Procedure call overhead has been observed to be relatively large.
If you have a procedure call inside a loop, you may see a large
performance penalty.
The best way to deal with this is to have the compiler inline the
procedure if at all possible, or to inline it yourself. Alternatively,
you can push the loop down into the procedure.
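As a rough illustration of pushing the loop down into the procedure, the C sketch below shows the same update written both ways. The function names (scale_add, update_per_element, scale_add_all) are hypothetical and chosen only for this example; the same restructuring applies to Fortran subroutines. Whether a given compiler would inline the small function on its own is not something this sketch tries to show.

```c
#include <stddef.h>

/* Small procedure that does the work for a single element. */
double scale_add(double a, double x, double y)
{
    return a * x + y;
}

/* Call inside the loop: one procedure call per element.
 * This is the pattern that may pay the call overhead n times. */
void update_per_element(double a, const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = scale_add(a, x[i], y[i]);
    }
}

/* Loop pushed down into the procedure: the caller makes a single
 * call and the loop body contains no procedure call at all. */
void scale_add_all(double a, const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Both versions compute the same result; a caller would replace the loop of scale_add calls with one call such as scale_add_all(a, x, y, n).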
Added Oct 28, 2003.
The external32 format in MPI I/O does not work properly on the X1.
The workaround is to use the native data representation instead.
Added Oct 28, 2003.
Occasionally runtime bus errors are generated when a code is
recompiled. The error can be corrected by removing the
current executable and relinking or recompiling the code.
The error can be avoided by always removing the existing
executable before recompiling or relinking the code.
Resolved
There is a ten-minute time limit on "true" interactive runs.
Previously, when you reached this limit, there was no error message
to indicate that the run had been killed. Instead, you got error messages
like "segmentation fault" or "memory fault", which might have led you
to believe there was a problem with the code when there might
not have been.
This is now fixed, and the message reads something like
./a.out.[13]: 101087 Exceeded CPU time limit
Note, however, that this message is mixed in with other messages such
as segmentation fault or memory fault, or even with the stack trace
(if you set TRACEBK).
If you need more than 10 minutes, use
"qsub -I -lwalltime=2:00:00,mppe=<x>"
to get a longer interactive job with <x> processors.
Added Nov 18, 2004. Resolved March 2005.
It has been observed that the 5.3.0.1 and 5.2.0.x C compilers
do not work when -hlist is used with very long path names.
This is fixed in 5.3.0.2. With the older compilers, the problem
can be worked around by not using -hlist.
Added Feb 18, 2005.
It has been observed that the 5.2 Fortran compiler does not
always produce correct multistreaming code. Symptoms include
incorrect answers and MSYNC errors. The problem appears in
5.2.0.1 through 5.2.0.5. The 5.2.0.6 version is much better.
The best way to deal with this is to switch PrgEnv levels
via "module swap PrgEnv PrgEnv.5301", to switch
cftn levels via "module switch cftn cftn.5301", or to go
back to PrgEnv.5105.
Added Sep 13, 2004. Updated Feb 17, 2005.
Unicos/mp 2.3 has a bug where page faults caused by system interrupts
(such as compiling on the OS node) force page faults on the application
nodes, thereby slowing down running applications.
This is to be fixed in Unicos/mp 2.4.x where x > 0, probably sometime
after Jan 1, 2004.
Added Oct 28, 2003.
It has been identified that the PE cannot deal with the large
inode numbers associated with the 4 TB workspace. This may prevent
you from being able to compile or link your code, especially if you
include files from more than one path. It may also affect
pat_build and pat_report. If you encounter this, and doing these
operations in your home space is not a viable option, please let
consult@ccs.ornl.gov know.
Fixed when PE 5.2 was installed.
Added Feb 16, 2004. Updated May 4, 2004.
Under certain circumstances PZGETRF might hang. A fix has been
successfully tested, but not put into a PE yet.
Fixed when a new libsci was installed.
Added 16, 2004. Updated May 4, 2004.
Under certain circumstances PZSWAP might hang. A fix has been
successfully tested, but not put into a PE yet.
Fixed when a new mpt was installed.
Added 16, 2004. Updated May 4, 2004.
There are many reported cases of core dumps when the -Omodinline
compiler option is used. The only workaround at this time is to reduce
the optimization to -Oinline0. This workaround lets the compilation
proceed, but leaves much to be desired with respect to optimization.
Fixed in PE 5.2.
Added Oct 28, 2003. Updated May 4, 2004.
The BLACS gridmap function did not work properly.
This was fixed.
Added Oct 28, 2003. Resolved late 2004.
When the -eo option is used in a PBS batch script to merge
standard out and standard error into a single file, the standard
error file is lost and only the standard out file is retained.
When using "scp" to transfer files, the file transfer rates are
slow. You can avoid the slow transfers by using ftp or by
transferring the file via another CCS system (eagle, cheetah,
HPSS, or home).
A pre-release of the new openssh was installed, which produced a 3X
improvement.
Documentation for the Cray X1 is available at
the following URL:
http://cray.com/craydoc/.
For convenience, copies of the Cray documents have also been
placed in /dfs/doc/x1_mp20/doc/. However, the documentation
here may not be as up-to-date as the documentation available at the
URL indicated above.