
Phoenix Issues, Problems, Workarounds

Introduction

This page is intended to keep users informed of current problems, issues, and workarounds on Phoenix. It will be updated regularly.



Issues

Can't find Fortran modules when loading HDF5 or netcdf modules

Although we have set up HDF5 and netcdf modules on the X1, the ftn compiler cannot find the "Fortran modules" included in those packages. This is not a problem if you are not "use"ing a Fortran module.

The only workaround at this time is to add the "-em -p<path>" options to your command line, where <path> is a path you must determine yourself. Run "module display <module>" to see information about the module you are using, including its path, and then pass the appropriate path to the "-p" option.
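As a rough illustration of this workaround, the following shell sketch assembles the extra ftn options from a directory name. The helper name ftn_module_flags is hypothetical, and the directory must be supplied by you from the output of "module display <module>"; nothing here is taken from the actual Phoenix module files.

```shell
# Hypothetical helper: print the extra ftn options described above for a
# given directory. The directory is an assumption supplied by the caller;
# take it from the output of "module display <module>".
ftn_module_flags() {
    printf '%s\n' "-em -p$1"
}
```

For example, "ftn $(ftn_module_flags /path/from/module/display) -c mycode.f90" would expand to "ftn -em -p/path/from/module/display -c mycode.f90". Both the path and the source file name are placeholders.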

Even with this limitation, loading these modules is encouraged if you need these libraries: you will automatically link against the appropriate SSP or MSP libraries depending on whether or not you use the -hssp switch, and other environment variables will be set properly.

Added Mar 24, 2005.

Large Procedure Call Overhead

Procedure calls on the X1 have been observed to carry a relatively large overhead. If you have a procedure call inside a loop, you may see a large performance penalty.

The best way to deal with this is to have the compiler inline the procedure if at all possible, or to inline it yourself. Alternatively, you can push the loop down into the procedure, so that the procedure is called once rather than once per iteration.

Added Oct 28, 2003.

MPI I/O bug with external32 format

The external32 format in MPI I/O does not work properly on the X1. The workaround is to use the native format instead.

Added Oct 28, 2003.

Runtime Bus Error

Occasionally runtime bus errors are generated when a code is recompiled. The error can be corrected by removing the current executable and relinking or recompiling the code. The error can be avoided by always removing the existing executable before recompiling or relinking the code.
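The habit of removing the executable first can be wrapped in a small shell function. This is only a sketch: the function name rebuild_clean is made up for illustration, and the build command in the usage line is a placeholder for however you normally compile or link.

```shell
# Hypothetical helper: delete the existing executable, then run the build
# command that follows it on the command line.
rebuild_clean() {
    exe=$1
    shift
    rm -f -- "$exe"   # remove the stale executable; no error if it is absent
    "$@"              # run the compile or link command exactly as given
}
```

A call might look like "rebuild_clean a.out ftn -o a.out main.f90", where a.out and main.f90 stand in for your own file names.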



Resolved

Interactive run dies mysteriously

There is a ten-minute time limit on "true" interactive runs. Previously, when a run reached this limit, there was no error message to indicate that it had been killed. Instead you got error messages like "segmentation fault" or "memory fault", which might have led you to believe there was a problem with the code when there was none.

This is now fixed, and the system prints a message to the effect of

./a.out.[13]: 101087 Exceeded CPU time limit

Note, however, that this message is mixed in with other messages such as segmentation fault or memory fault, or even with the stack trace (if you set TRACEBK).

If you need more than ten minutes, use "qsub -I -lwalltime=2:00:00,mppe=<x>" to get a longer interactive job with <x> processors.

Added Nov 18, 2004. Resolved March 2005.

cc -hlist does not work with long path names

It has been observed that the 5.3.0.1 and 5.2.0.x C compilers do not work when -hlist is used with very long path names.

This is fixed in 5.3.0.2. With the older compilers, the workaround is to not use -hlist.

Added Feb 18, 2005.

Multistreaming with PrgEnv 5.2

It has been observed that the 5.2 Fortran compiler does not always produce correct multistreaming code. Symptoms have been incorrect answers and MSYNC errors. The problem appears in 5.2.0.1 to 5.2.0.5. The 5.2.0.6 version is much better.

The best way to deal with this is to switch PrgEnv levels via "module swap PrgEnv PrgEnv.5301", to switch cftn levels via "module switch cftn cftn.5301", or to go back to PrgEnv.5105.

Added Sep 13, 2004. Updated Feb 17, 2005.

Application slowdown due to system interrupts

Unicos/mp 2.3 has a bug in which page faults due to system interrupts (such as those caused by compiling on the OS node) force page faults on the application nodes, thereby slowing down running applications.

This is to be fixed in Unicos/mp 2.4.x where x > 0 - probably sometime after Jan 1, 2004.

Added Oct 28, 2003.

Programming Environment has trouble in 4tb workspace area

It has been determined that the PE cannot deal with the large inode numbers associated with the 4tb workspace. This may prevent you from compiling or linking your code, especially if you have more than one path from which to include files. It may also affect pat_build and pat_report. If you encounter this, and doing these operations in your home space is not a viable option, please let consult@ccs.ornl.gov know.

Fixed when PE 5.2 was installed.

Added Feb 16, 2004. Updated May 4, 2004.

PZGETRF hangs

Under certain circumstances PZGETRF might hang. A fix has been successfully tested, but not put into a PE yet.

Fixed when a new libsci was installed.

Added 16, 2004. Updated May 4, 2004.

PZSWAP hangs

Under certain circumstances PZSWAP might hang. A fix has been successfully tested, but not put into a PE yet.

Fixed when a new mpt was installed.

Added 16, 2004. Updated May 4, 2004.

-Omodinline core dumps

There are many reported cases of core dumps when the -Omodinline compiler option is used. The only workaround at this time is to reduce the optimization to -Oinline0. This workaround lets the compilation proceed, but leaves much to be desired with respect to optimization.

Fixed in PE 5.2.

Added Oct 28, 2003. Updated May 4, 2004.

BLACS Gridmap not working properly

The BLACS gridmap function did not work properly. This was fixed.

Added Oct 28, 2003. Resolved late 2004.

The PBS option #PBS -j eo doesn't work

When the "-j eo" option is used in a PBS batch script to merge standard output and standard error into a single file, the standard error output is lost and only the standard output is retained.

Slow scp transfers

When using "scp" to transfer files, the file transfer rates are slow. You can avoid the slow transfers by using ftp or by transferring the file via another CCS system (eagle, cheetah, HPSS, or home).

A pre-release of the new OpenSSH was installed, which produced a 3X improvement.


Documentation

Documentation for the Cray X1 is available at the following URL:

http://cray.com/craydoc/.

For convenience, copies of the Cray documents have also been placed in /dfs/doc/x1_mp20/doc/. However, these local copies may not be as up to date as the documentation available at the URL indicated above.



URL http://www.ccs.ornl.gov/Phoenix/issues.html
Updated: Wednesday, 23-Mar-2005 15:46:06 EST
consult@ccs.ornl.gov