Timestamp:
06/19/07 11:31:30
Author:
voran
Message:
  • started updating manual for 0.97
Files:

  • trunk/doc/operations.xml

    r305 r526  
    2626            new job. Aspects of this request are checked both on the 
    2727            server side, and in <filename>cqsub</filename>, for better 
    28             user error messages. Whenever a job is created or changes 
    29             state, appropriate events are emitted. These events can be 
    30             seen using the <filename>eminfo.py</filename> command. Any 
    31             client that has subscribed to this sort of event will 
    32             receive a copy. 
     28            user error messages. <!-- Whenever a job is created or changes --> 
     29<!--        state, appropriate events are emitted. These events can be --> 
     30<!--        seen using the <filename>eminfo.py</filename> command. Any --> 
     31<!--        client that has subscribed to this sort of event will --> 
     32<!--        receive a copy. --> 
    3333          </para> 
    3434        </listitem> 
     
    171171 
    172172    <para> 
    173       Job accounting log messages are logged to syslog. The basic 
    174       messages are logged by the queue-manager for job queuing (Q), 
    175       execution (S), and exit (E). 
     173      Job accounting log messages are logged to files in the directory 
     174      specified by <filename>log_dir</filename> in the [cqm] section 
     175      of the config file. Basic messages are logged by the 
     176      queue-manager for job queuing (Q), execution (S), and exit 
     177      (E). Additional messages include the location where the job is 
     178      running and the exit code. 
    176179 
    177180        <programlisting> 
    178181Q;jobid;user;queue 
    179182S;jobid;user;job name;nodes;processors;mode;walltime 
     183Job jobid/user/Q:queue: Running job on location 
    180184E;jobid;user;walltime 
     185Job jobid/user on nodes nodes done. queue:[queuetime]s 
     186  user:[walltime]s  exit:exitcode 
    181187        </programlisting> 
    182  
    183     </para> 
    184  
     188    </para> 
    185189    <para> 
    186190      Example: 
    187191      <programlisting> 
    188 Dec 15 17:29:36 localhost cqm[4152]: Q;25537;voran;default 
    189 Dec 15 17:29:39 localhost cqm[4152]: S;25537;voran;N/A;32;32;co;10 
    190 Dec 15 17:29:59 localhost cqm[4152]: E;25537;voran;10 
     1922007-06-06 12:43:50 Q;59;bob;default 
     1932007-06-06 12:44:56 S;59;bob;N/A;32;32;co;20 
     1942007-06-06 12:44:56 Job 59/bob/4539/Q:default: Running job on 32_R000_J108_N3 
     1952007-06-06 12:44:56 Job 59/bob using kernel default 
     1962007-06-06 12:45:08 E;59;bob;14 
     1972007-06-06 12:45:09 Job 59/bob on 32 nodes done. queue:65.88s user:11.98s  exit:0 
    191198      </programlisting> 
    192199 
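The record layout in the updated text (Q, S, and E lines with semicolon-separated fields) lends itself to a small parser. The following is a minimal sketch, not part of the changeset. It assumes each line begins with a `YYYY-MM-DD HH:MM:SS` timestamp, as in the new example output. The function name `parse_accounting_line` and the dictionary keys are hypothetical and chosen for illustration. Field values are kept as strings because the manual does not state their units.

```python
from datetime import datetime

# Field names per record type, following the format listing in the manual.
# The key names themselves are illustrative, not defined by Cobalt.
FIELDS = {
    "Q": ("jobid", "user", "queue"),
    "S": ("jobid", "user", "job_name", "nodes", "processors", "mode", "walltime"),
    "E": ("jobid", "user", "walltime"),
}


def parse_accounting_line(line):
    """Return a dict for a Q/S/E record, or None for any other line."""
    # Assumed layout: "<date> <time> <message>".
    parts = line.strip().split(" ", 2)
    if len(parts) != 3:
        return None
    date, time, message = parts

    # Free-form "Job ..." messages contain no semicolon, so the whole
    # message becomes rec_type and is rejected by the lookup below.
    rec_type, _, rest = message.partition(";")
    names = FIELDS.get(rec_type)
    if names is None:
        return None

    values = rest.split(";")
    if len(values) != len(names):
        return None

    try:
        stamp = datetime.strptime(date + " " + time, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None

    record = dict(zip(names, values))
    record["type"] = rec_type
    record["timestamp"] = stamp
    return record
```

Applied to the example above, the `S;59;bob;N/A;32;32;co;20` line yields a record with `type` set to `"S"` and `nodes` set to `"32"`, while the free-form `Job 59/bob using kernel default` line returns `None`.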
     
    206213      Example: 
    207214      <programlisting> 
    208 Dec 15 17:29:39 localhost bgsched[4760]: Job 25537/voran: Scheduling 
     215Dec 15 17:29:39 localhost bgsched[4760]: Job 25537/bob: Scheduling 
    209216  job 25537 on partition 32wayN0 
    210 Dec 15 17:29:39 localhost cqm[4152]: Job 25537/voran using kernel default 
    211 Dec 15 17:29:39 localhost bgpm[4124]: Job 25537/voran: ProcessGroup 1 
     217Dec 15 17:29:39 localhost cqm[4152]: Job 25537/bob using kernel default 
     218Dec 15 17:29:39 localhost bgpm[4124]: Job 25537/bob: ProcessGroup 1 
    212219  Started on partition 32wayN0. pid: 4220 
    213 Dec 15 17:29:39 localhost bgpm[4220]: Job 25537/voran: Running 
     220Dec 15 17:29:39 localhost bgpm[4220]: Job 25537/bob: Running 
    214221  /bgl/BlueLight/ppcfloor/bglsys/bin/mpirun mpirun -np 32 -partition  
    215   32wayN0 -mode co -cwd /home/voran/tests -exe /home/voran/tests/ring-hello 
    216 Dec 15 17:35:49 localhost bgpm[4124]: Job 25537/voran: ProcessGroup 1 
     222  32wayN0 -mode co -cwd /home/bob/tests -exe /home/bob/tests/ring-hello 
     223Dec 15 17:35:49 localhost bgpm[4124]: Job 25537/bob: ProcessGroup 1 
    217224  Finshed with exit code 0. pid 4220 
    218 Dec 15 17:35:59 localhost cqm[4152]: Job 25537/voran on 32 nodes 
     225Dec 15 17:35:59 localhost cqm[4152]: Job 25537/bob on 32 nodes 
    219226  done. queue:2.99s user:10.18s 
    220227      </programlisting> 
     
    234241      down to a text stream (using Python's cPickle module), and saves 
    235242      this data in a file in the directory 
    236       <filename>/var/spool/sss</filename>. The filenames in this 
     243      <filename>/var/spool/cobalt/</filename>. The filenames in this 
    237244      directory correspond to the component implementation name. This 
    238245      is the name that appears in syslog log messages (ie cqm, bgpm, 
     
    240247    </para> 
    241248 
    242     <para>
    243       This data can be manipulated from a python interpreter using the 
    244       <filename>cddbg.py</filename>. This should not be attempted 
    245       unless you really know what you are doing. 
    246     </para>
     249<!--     <para> -->
     250<!--       This data can be manipulated from a python interpreter using the --> 
     251<!--       <filename>cddbg.py</filename>. This should not be attempted --> 
     252<!--       unless you really know what you are doing. --> 
     253<!--     </para> -->
    247254  </section> 
    248255  <section>