NAME
	  CORE - File or directory produced by faulting	process(es).

     SYNOPSIS
	  #include <sys/core.h>

     DESCRIPTION
	  The operating	system writes a	core file when a process in an
	  application terminates with an error.	A process that ter-
	  minates with an error	is called a faulting process. The fol-
	  lowing support is provided for core files:

	  o Core files are created for parallel	and non-parallel
	    applications.

	  o Environment	variables control where	and how	core files are
	    generated.

	  o The	coreinfo command displays information about core
	    files.

	  The most common errors that generate core files for applica-
	  tions	are memory violations, illegal instructions, bus
	  errors, and user-generated quit signals. The following sig-
	  nals may result in core files	being written: SIGQUIT,
	  SIGILL, SIGTRAP, SIGABRT, SIGFPE, SIGBUS, SIGSEGV, SIGSYS,
	  SIGIOT, and SIGEMT. See the signal(4)	manual page for	more
	  information.
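
	  As a rough illustration of the signal list above, the
	  following sketch tests whether a signal number is one of the
	  listed signals. The helper name may_dump_core() is
	  hypothetical and is not part of any system header. Signals
	  that some systems do not define are guarded with #ifdef.

```c
#include <signal.h>
#include <stddef.h>

/* Signals this page lists as able to produce a core file. SIGSYS,
 * SIGIOT, and SIGEMT are not defined everywhere, so they are guarded. */
static const int core_signals[] = {
    SIGQUIT, SIGILL, SIGTRAP, SIGABRT, SIGFPE, SIGBUS, SIGSEGV,
#ifdef SIGSYS
    SIGSYS,
#endif
#ifdef SIGIOT
    SIGIOT,
#endif
#ifdef SIGEMT
    SIGEMT,
#endif
};

/* Hypothetical helper: returns 1 if signo is in the list, else 0. */
int may_dump_core(int signo)
{
    size_t i;

    for (i = 0; i < sizeof core_signals / sizeof core_signals[0]; i++)
        if (core_signals[i] == signo)
            return 1;
    return 0;
}
```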

	  If a faulting	process	is in a	regular	non-parallel applica-
	  tion,	the operating system writes a core file	named core. If
	  a faulting process is	in a parallel application, the follow-
	  ing happens by default:

	  o The	entire application is terminated.

	  o A directory	named core is created.

	  o In the core	directory, the operating system	creates	a core
	    file for the first faulting	process. The core file gets a
	    name with the following form: core.pid, where pid is the
	    process ID (PID) for the process.

	  o In the core	directory, the operating system	writes the
	    file named allocinfo containing global information about
	    the	parallel application.
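
	  The naming rule above can be sketched as follows. The helper
	  parallel_core_path() is hypothetical. It only builds the
	  expected path name of the per-process core file inside the
	  core directory and does not touch the file system.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: writes "<dir>/core/core.<pid>" into buf.
 * Returns 0 on success, -1 if buf is too small. */
int parallel_core_path(const char *dir, pid_t pid, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%s/core/core.%ld", dir, (long)pid);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```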

	  If a core file or directory exists at	the time of a fault,
	  it is	silently removed (if permissions allow)	and a new one
	  created. A message is	displayed only if there	is insuffi-
	  cient	file space to write the	core file or if	the core
	  directory was	created	but cannot be written into due to
	  access permission.

	  If the core action environment variables are set such	that
	  more than one	faulting process dumps core, the bootmagic
	  string PARACORE_MAX_PROCESSES	limits how many	processes dump
	  core.

     SPECIFYING	THE LOCATION OF	CORE FILES
	  By default, the core file or directory is created in the
	  current working directory for	the application. The environ-
	  ment variable	CORE_PATH can be used to change	this default
	  location.

	  CORE_PATH
		    Directory pathname where the core file or direc-
		    tory is created. The default is the	working	direc-
		    tory in which you execute an application.

	  Since	it is possible to change directories or	to change an
	  environment variable from within an application, the follow-
	  ing rules apply as to	their effect on	the placement of core:

	  o For non-parallel applications, any change of directory or
	    environment variable (for example, chdir() or putenv())
	    that occurs prior to the fault can affect the location of
	    core.

	  o For parallel applications (compiled with -nx), any change
	    of directory or environment variable within the
	    application does not affect the location of core.

	  o For host/node applications (compiled with -lnx), any
	    change of directory or environment variable that is
	    performed by the host prior to a fault in any process can
	    affect the location of core.

	  The following	example	specifies that a directory
	  /usr/develop/corefiles is where core files or	directories
	  should be placed:

	    % setenv CORE_PATH /usr/develop/corefiles

				      NOTE

	       Writing core files to a PFS directory is	not allowed.
	       If CORE_PATH specifies a	directory in a PFS file	sys-
	       tem, a core file	or core-file directory will NOT	be
	       created when an application faults.
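
	  A program can predict where core would be placed with a
	  lookup like the one sketched below. The helper
	  core_location() is hypothetical. Treating an empty CORE_PATH
	  the same as an unset one is an assumption, and the sketch
	  does not check whether the directory is in a PFS file system.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: copies the directory where core would be placed
 * into buf. Uses CORE_PATH when it is set and non-empty (assumption),
 * otherwise the current working directory.
 * Returns 0 on success, -1 on failure. */
int core_location(char *buf, size_t len)
{
    const char *p = getenv("CORE_PATH");

    if (p != NULL && *p != '\0') {
        int n = snprintf(buf, len, "%s", p);
        return (n < 0 || (size_t)n >= len) ? -1 : 0;
    }
    return getcwd(buf, len) != NULL ? 0 : -1;
}
```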

     DEFINING CORE FILE TYPES, FREQUENCIES, AND ACTIONS

	  More than one	process	in a parallel application may ter-
	  minate with an error.	This happens because there is a	period
	  of time between when the first faulting process terminates
	  and the other	processes in the application terminate.	You
	  can use the following	environment variables to specify how
	  and when core	files are created:

	  CORE_ACTION_FIRST
		    Specifies how the operating	system handles the
		    first faulting process in a	parallel or
		    non-parallel application.

	  CORE_ACTION_FAULT
		    Specifies how the operating	system handles fault-
		    ing	processes other	than the first faulting	pro-
		    cess in a parallel application.

	  CORE_ACTION_OTHER
		    Specifies how the operating	system handles
		    non-faulting processes in a	parallel application.

	  You can specify the following	values for the environment
	  variables CORE_ACTION_FIRST, CORE_ACTION_FAULT, and
	  CORE_ACTION_OTHER:

	  FULL	    Creates a full core	file that includes the entire
		    data region. Default action	for first faulting
		    process (both parallel and non-parallel applica-
		    tions).

	  TRACE	    Creates a partial core that	includes the register
		    and	stack information only.

	  KILL	    Stops the application without creating core	files.
		    Default action for faulting	(except	the first
		    faulting process) and non-faulting processes.

	  You can also specify the following value for the environment
	  variable CORE_ACTION_OTHER:

	  CONT	    Continue executing.	Do not stop or create core
		    files.

	  The following	example	specifies how core files or core-file
	  directories are created:

	    % setenv CORE_ACTION_FIRST FULL
	    % setenv CORE_ACTION_FAULT TRACE
	    % setenv CORE_ACTION_OTHER CONT

	  This example specifies creating a full core file for the
	  first faulting process in an application, a core file with
	  trace information only for other faulting processes, and no
	  core files for non-faulting processes.
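
	  The sketch below shows one way a program might read these
	  variables for its own diagnostics. The enum and both function
	  names are hypothetical. Falling back to the caller's default
	  when a value is unset or unrecognized is an assumption, not
	  documented system behavior.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation of the documented action values. */
enum core_action { ACT_FULL, ACT_TRACE, ACT_KILL, ACT_CONT, ACT_INVALID };

/* Maps a value string to an action; ACT_INVALID if not recognized. */
enum core_action parse_action(const char *s)
{
    if (s == NULL)
        return ACT_INVALID;
    if (strcmp(s, "FULL") == 0)
        return ACT_FULL;
    if (strcmp(s, "TRACE") == 0)
        return ACT_TRACE;
    if (strcmp(s, "KILL") == 0)
        return ACT_KILL;
    if (strcmp(s, "CONT") == 0)
        return ACT_CONT;
    return ACT_INVALID;
}

/* Reads one core action variable. Returns dflt when the variable is
 * unset or holds an unrecognized value (assumption). */
enum core_action action_from_env(const char *name, enum core_action dflt)
{
    enum core_action a = parse_action(getenv(name));

    return a == ACT_INVALID ? dflt : a;
}
```

	  For example, action_from_env("CORE_ACTION_FIRST", ACT_FULL)
	  yields the documented default of FULL when the variable is
	  not set.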

	  The default values for the environment variables maximize
	  the debug information	available and minimize the file	space
	  needed for core files. The environment variables allow you
	  to adjust the	amount of core dumped based on your knowledge
	  of the application size (number of processes used) and the
	  file space available.

	  Use the following guidelines when setting the	environment
	  variables for	core files or directories:

	  o The	CONT action can	be specified only for
	    CORE_ACTION_OTHER. If CONT is specified for	either
	    CORE_ACTION_FIRST or CORE_ACTION_FAULT, the	error will not
	    be caught until an application faults. When	the applica-
	    tion faults, a warning is displayed	on the console and the
	    default action for each core action	environment variable
	    is assumed.

	  o If CORE_ACTION_FIRST, CORE_ACTION_FAULT, and
	    CORE_ACTION_OTHER are all set to not create	core files
	    (that is KILL or CONT), then no core file or core direc-
	    tory is created and	an existing one	is untouched.

	  o For	parallel applications, combinations where
	    CORE_ACTION_FIRST is set to	KILL but CORE_ACTION_FAULT and
	    CORE_ACTION_OTHER are not set to KILL may result in	a core
	    directory being created with an allocinfo file and nothing
	    else. For example, if CORE_ACTION_FIRST and
	    CORE_ACTION_OTHER are set to KILL and CORE_ACTION_FAULT is
	    set	to FULL	when a single process in an application
	    faults, a core directory is	created	with nothing but an
	    allocinfo file.

	  o For	parallel applications executing	on a system that has
	    the	PARACORE_MAX_PROCESSES bootmagic string	set to 1 (the
	    default), only CORE_ACTION_FIRST is	effective. The system
	    ignores CORE_ACTION_FAULT and CORE_ACTION_OTHER. Under
	    these conditions, the first	faulting process is the	only
	    process that can dump core.

	  o For	parallel applications executing	on a system that has
	    the	PARACORE_MAX_PROCESSES bootmagic string	set to a value
	    greater than one, the system applies CORE_ACTION_FAULT and
	    CORE_ACTION_OTHER only when	the application	size is	less
	    than or equal to the bootmagic string's setting.

	  o For	parallel applications executing	on a system that has
	    the	PARACORE_MAX_PROCESSES bootmagic string	set to -1
	    (unlimited), the system applies CORE_ACTION_FIRST,
	    CORE_ACTION_FAULT and CORE_ACTION_OTHER as described in
	    the	first four bulleted text items.
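
	  The guidelines above can be modelled roughly as shown below.
	  This is a sketch of the documented rules, not the operating
	  system's implementation. All names are hypothetical, and
	  modelling an ignored CORE_ACTION_FAULT or CORE_ACTION_OTHER
	  as its default value (KILL) is an assumption.

```c
/* Hypothetical representation of the documented action values. */
enum core_action { ACT_FULL, ACT_TRACE, ACT_KILL, ACT_CONT };

struct core_actions {
    enum core_action first;   /* CORE_ACTION_FIRST */
    enum core_action fault;   /* CORE_ACTION_FAULT */
    enum core_action other;   /* CORE_ACTION_OTHER */
};

/* Returns the actions that would take effect for the requested
 * settings. max_procs models PARACORE_MAX_PROCESSES (-1 = unlimited);
 * app_size is the number of processes in the application. */
struct core_actions effective_actions(struct core_actions req,
                                      long max_procs, long app_size)
{
    const struct core_actions dflt = { ACT_FULL, ACT_KILL, ACT_KILL };

    /* CONT is valid only for CORE_ACTION_OTHER; otherwise the default
     * is assumed for every variable. */
    if (req.first == ACT_CONT || req.fault == ACT_CONT)
        return dflt;

    /* With a limit of 1, or an application larger than the limit,
     * only CORE_ACTION_FIRST is honored (ignored = default: assumption). */
    if (max_procs != -1 && (max_procs <= 1 || app_size > max_procs)) {
        req.fault = dflt.fault;
        req.other = dflt.other;
    }
    return req;
}

/* Returns 1 if any action would write a core file, else 0. */
int writes_any_core(struct core_actions a)
{
    return a.first == ACT_FULL || a.first == ACT_TRACE ||
           a.fault == ACT_FULL || a.fault == ACT_TRACE ||
           a.other == ACT_FULL || a.other == ACT_TRACE;
}
```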

     MAXIMUM SIZE FOR CORE FILES
	  The maximum size of each individual core file	is limited by
	  the setrlimit() function. Core files that exceed the limit
	  are not created.
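
	  A process can lower its own limit before running code that
	  might fault. The sketch below uses the standard getrlimit()
	  and setrlimit() calls with RLIMIT_CORE. The helper name
	  limit_core_size() is hypothetical, and it is an assumption
	  that RLIMIT_CORE is the resource this page refers to.

```c
#include <sys/resource.h>

/* Hypothetical helper: sets the soft core-file size limit to max_bytes,
 * clamped to the hard limit. Returns 0 on success, -1 on failure. */
int limit_core_size(rlim_t max_bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    /* The soft limit may not exceed the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && max_bytes > rl.rlim_max)
        max_bytes = rl.rlim_max;

    rl.rlim_cur = max_bytes;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

	  Calling limit_core_size(0) requests that no core file be
	  written for the calling process.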

     EXAMINING CORE FILES
	  You can use the coreinfo command and the IPD tool to examine
	  core files. The coreinfo command displays summary informa-
	  tion about processes that have dumped	core. The IPD tool
	  displays stack tracebacks and	data stored in core files.

	  See the coreinfo(1) and ipd manual pages for more informa-
	  tion about this command and tool.

     CORE FILE FORMAT
	  The format of a core file and a brief description of its
	  content are given below. Refer to the indicated include
	  files for structure details.

	  Core header
		    Identifies the core	file, the process that gen-
		    erated the file (relative to the partition), and
		    the	offset from the	start of the core file to the
		    beginning of each of the other sections in the
		    core file. The following structure is defined in
		    the	include	file core.h.
     struct core_hdr {
	  int	   c_magic;	    /* core file magic number */
	  unsigned short c_swap;    /* byte swap field */
	  short	   c_version;	    /* core file version */
	  int	   c_type;	    /* core file type */
	  struct timeval c_timdat;  /* time & date core	file created */
	  int	   c_signo;	    /* signal that killed process */
	  int	   c_sigcode;	    /* signal subcode */
	  long	   c_numnodes;	    /* number of nodes in the partition	*/
	  long	   c_node;	 /* logical node number	of the dumping process */
	  long	   c_ptype;	    /* ptype of	the dumping process */
	  off_t	   c_procinfo;	    /* offset of process info */
	  off_t	   c_applinfo;	    /* offset of application info (APPLINFO_T) */
	  long	   c_nregions;	    /* number of region	descriptors */
	  off_t	   c_firstreg;	    /* offset of first region desc in core file	*/
	  long	   c_nthreads;	    /* number of active	threads	*/
	  off_t	   c_firstthread;   /* offset of first thread_info in core file	*/
	  long	   c_activethread;  /* index of	last active (faulting) thread */
     };

	  Process information
		    Identifies the process (relative to	other
		    processes),	the executable,	and its	arguments. The
		    program name is stored as it is given to the exec
		    system call, thus, the root	and current directory
		    information	is needed so that a full path name can
		    be constructed. Offsets are	relative to the	start
		    of the core	file. The following structure is
		    defined in the include file	core.h.
     struct core_proc_info {
	  pid_t	    c_pid;	 /* process id */
	  pid_t	    c_ppid;	 /* parent process id */
	  pid_t	    c_pgid;	 /* process group leader id */
	  long	    c_prglen;	 /* length of program path */
	  off_t	    c_prgname;	 /* offset of relative path name of program */
	  long	    c_rootlen;	 /* length of exec root	directory path */
	  off_t	    c_rootname;	 /* offset of exec root	directory path */
	  long	    c_cwdlen;	 /* length of exec current directory path */
	  off_t	    c_cwdname;	 /* offset of exec current directory path */
     };
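
	  The path construction mentioned above might look like the
	  following sketch. The helper full_program_path() is
	  hypothetical. It assumes the three strings have already been
	  read from the core file into NUL-terminated buffers, and that
	  the current directory path is expressed relative to the exec
	  root directory. Neither assumption is confirmed by this page.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: joins the exec root, the exec current directory,
 * and the program name (as given to exec) into one path in buf.
 * Assumes cwd is expressed relative to root.
 * Returns 0 on success, -1 if buf is too small. */
int full_program_path(const char *root, const char *cwd, const char *prg,
                      char *buf, size_t len)
{
    int n;
    size_t cwdlen = strlen(cwd);
    const char *sep = (cwdlen > 0 && cwd[cwdlen - 1] == '/') ? "" : "/";

    if (strcmp(root, "/") == 0)
        root = "";                  /* avoid a doubled leading slash */

    if (prg[0] == '/')              /* absolute name: cwd not needed */
        n = snprintf(buf, len, "%s%s", root, prg);
    else                            /* relative name: resolve via cwd */
        n = snprintf(buf, len, "%s%s%s%s", root, cwd, sep, prg);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```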

	  Application information
		    Identifies the execution control characteristics.
		    All	other application information and partition
		    information	is contained in	a separate file	called
		    allocinfo written in the core directory. The fol-
		    lowing structure is	defined	in the include file
		    mcmsg/mcmsg_appl.h.

     typedef struct applinfo {
	  unsigned long	    app;	     /*	application id */
	  unsigned long	    process_lock;    /*	lock process data in memory */
	  unsigned long	    pkt_size;	     /*	message	packet size */
	  unsigned long	    memory_buffer;   /*	total message buffer to	allocate */
	  unsigned long	    memory_export;   /*	total buffer for other nodes */
	  unsigned long	    memory_each;     /*	buffer available for each node */
	  unsigned long	    send_threshold;  /*	send multiple packet threshold */
	  unsigned long	    send_count;	     /*	pkts to	send when send_threshold */
	  unsigned long	    give_threshold;  /*	send give message threshold */
	  unsigned long	    noc;	     /*	number of correspondents */
	  unsigned long	    rows;	     /*	number of rows in application */
	  unsigned long	    columns;	     /*	number of columns in application */
	  unsigned long	    unused[12];
     } APPLINFO_T;

	  Region descriptors
		    Describes all regions for a	process. The offset
		    value is null if the contents of the region were
		    not written to the region section of the core
		    file. The following	structure is defined in	the
		    include file core.h.
     struct core_region_desc {
	  off_t		   r_offset;	 /* offset of VM region	in core	file */
	  vm_address_t	   r_vaddr;	 /* virtual address of region start */
	  vm_size_t	   r_size;	 /* region size	(bytes)	*/
	  vm_prot_t	   r_prot;	 /* VM protection (e.g.	VM_PROT_READ) */
     };
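
	  A tool that walks the region descriptors could skip regions
	  whose contents were not written, as in the sketch below. The
	  structure region_desc_sketch is a hypothetical stand-in for
	  struct core_region_desc that uses plain C types in place of
	  off_t and the Mach vm_* types, so that the example compiles
	  without the system headers. It assumes a null offset is
	  stored as zero.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct core_region_desc (plain C types). */
struct region_desc_sketch {
    long          r_offset;   /* offset of VM region in core file */
    unsigned long r_vaddr;    /* virtual address of region start */
    unsigned long r_size;     /* region size (bytes) */
    int           r_prot;     /* VM protection */
};

/* Sums the sizes of the regions whose contents were written to the
 * core file, that is, those with a non-zero offset. */
unsigned long dumped_bytes(const struct region_desc_sketch *d, long n)
{
    unsigned long total = 0;
    long i;

    for (i = 0; i < n; i++)
        if (d[i].r_offset != 0)
            total += d[i].r_size;
    return total;
}
```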

	  Thread information
		    Contains state (register) information for each
		    Mach thread	in the process.	The structure consists
		    of a list of all the registers and is defined in
		    mach/machine/thread_status.h.

	  Region contents
		    Contains memory image of regions described in the
		    region descriptors section.

     ALLOCINFO FILE FORMAT
	  Information for the entire application that is not locally
	  available to a node's	server is written to the file allo-
	  cinfo	within the core	directory. The following structures
	  show the format and content of the allocinfo file. These
	  structures are defined in the	include	file allocinfo.h.

     struct allocinfo_hdr {
	  int		     a_magic;	     /*	allocinfo file magic number */
	  unsigned short     a_swap;	     /*	byte swap field	*/
	  short		     a_version;	     /*	allocinfo file version */
	  nx_part_info_t     a_partition;    /*	NX partition info */
	  nx_app_info_t	     a_application;  /*	NX application info */
     };
     typedef struct {
	  uid_t		  uid;	      /* User Id */
	  gid_t		  gid;	      /* Group Id */
	  int		  access;     /* Access	Permissions */
	  int		  sched;      /* NX_STD	or NX_GANG */
	  unsigned long	  rq;	      /* Rollin	Quantum	*/
	  int		  epl;	      /* Effective priority limit */
	  int		  nodes;      /* Number	of nodes in the	partition */
     /* NOTES: mesh_x and mesh_y are only set if the mesh is a contiguous */
     /* rectangle. Otherwise they are -1. */
	  int		  mesh_x;     /* X dimension of	partition */
	  int		  mesh_y;     /* Y dimension of	partition */
     /* The enclose_mesh_x and enclose_mesh_y are the
     * minimum rectangular dimensions that will enclose
     * the partition. These dimensions may enclose
     * nodes that are not part of the specified partition.
     */
	  int		  enclose_mesh_x;
	  int		  enclose_mesh_y;
	  int		  flags_or_size;  /* Internal Use only */
	  int		  part_id;	  /* Internal Use only */
	  int		  free;		  /* Internal Use only */
	  int		  reserved[7];
     } nx_part_info_t;

     typedef struct {
	  int		  size;	      /* Number	of nodes in application	*/
	  int		  nrows;      /* X dimension of	application, 1 if
				       * nodes are not contiguous
				       */
	  int		  ncols;      /* Y dimension of	application,set	to size
				       * if nodes are not contiguous
				       */
	  int		  priority;   /* Priority of application */
	  unsigned long	  rolled_in;  /* Milliseconds this appl	rolled in */
	  unsigned long	  elapsed;    /* Milliseconds this appl	rolled in */
	  uid_t		  uid;	      /* UID of	user running application */
	  gid_t		  nx_acctid;  /* NX account id (MACS)
				       * of application	*/
	  struct tm	  start_time; /* Time stamp of when
				       * application started
				       */
     } nx_app_info_t;

     LIMITATIONS AND WORKAROUNDS
	  There	is no means of limiting	the size of a core directory
	  as a whole.

	  Once an application faults and core file creation begins, it
	  cannot be interrupted.

     SEE ALSO
	  commands: coreinfo, bootmagic, pspart, nx_pspart

	  OSF/1	Programmer's Reference:	setrlimit(2), signal(4)