Changeset 526 for trunk/doc/operations.xml
- Timestamp: 06/19/07 11:31:30
- Files: trunk/doc/operations.xml (modified) (5 diffs)
trunk/doc/operations.xml
--- trunk/doc/operations.xml (r305)
+++ trunk/doc/operations.xml (r526)
@@ -26,9 +26,9 @@
 new job. Aspects of this request are checked both on the
 server side, and in <filename>cqsub</filename>, for better
-user error messages. Whenever a job is created or changes
-state, appropriate events are emitted. These events can be
-seen using the <filename>eminfo.py</filename> command. Any
-client that has subscribed to this sort of event will
-receive a copy.
+user error messages. <!-- Whenever a job is created or changes -->
+<!-- state, appropriate events are emitted. These events can be -->
+<!-- seen using the <filename>eminfo.py</filename> command. Any -->
+<!-- client that has subscribed to this sort of event will -->
+<!-- receive a copy. -->
 </para>
 </listitem>
@@ -171,22 +171,29 @@
 
 <para>
-Job accounting log messages are logged to syslog. The basic
-messages are logged by the queue-manager for job queuing (Q),
-execution (S), and exit (E).
+Job accounting log messages are logged to files in the directory
+specified by <filename>log_dir</filename> in the [cqm] section
+of the config file. Basic messages are logged by the
+queue-manager for job queuing (Q), execution (S), and exit
+(E). Additional messages include the location where the job is
+running and the exit code.
 
 <programlisting>
 Q;jobid;user;queue
 S;jobid;user;job name;nodes;processors;mode;walltime
+Job jobid/user/Q:queue: Running job on location
 E;jobid;user;walltime
+Job jobid/user on nodes nodes done. queue:[queuetime]s
+user:[walltime]s exit:exitcode
 </programlisting>
-
-</para>
-
+</para>
 <para>
 Example:
 <programlisting>
-Dec 15 17:29:36 localhost cqm[4152]: Q;25537;voran;default
-Dec 15 17:29:39 localhost cqm[4152]: S;25537;voran;N/A;32;32;co;10
-Dec 15 17:29:59 localhost cqm[4152]: E;25537;voran;10
+2007-06-06 12:43:50 Q;59;bob;default
+2007-06-06 12:44:56 S;59;bob;N/A;32;32;co;20
+2007-06-06 12:44:56 Job 59/bob/4539/Q:default: Running job on 32_R000_J108_N3
+2007-06-06 12:44:56 Job 59/bob using kernel default
+2007-06-06 12:45:08 E;59;bob;14
+2007-06-06 12:45:09 Job 59/bob on 32 nodes done. queue:65.88s user:11.98s exit:0
 </programlisting>
 
@@ -206,15 +213,15 @@
 Example:
 <programlisting>
-Dec 15 17:29:39 localhost bgsched[4760]: Job 25537/voran: Scheduling
+Dec 15 17:29:39 localhost bgsched[4760]: Job 25537/bob: Scheduling
 job 25537 on partition 32wayN0
-Dec 15 17:29:39 localhost cqm[4152]: Job 25537/voran using kernel default
-Dec 15 17:29:39 localhost bgpm[4124]: Job 25537/voran: ProcessGroup 1
+Dec 15 17:29:39 localhost cqm[4152]: Job 25537/bob using kernel default
+Dec 15 17:29:39 localhost bgpm[4124]: Job 25537/bob: ProcessGroup 1
 Started on partition 32wayN0. pid: 4220
-Dec 15 17:29:39 localhost bgpm[4220]: Job 25537/voran: Running
+Dec 15 17:29:39 localhost bgpm[4220]: Job 25537/bob: Running
 /bgl/BlueLight/ppcfloor/bglsys/bin/mpirun mpirun -np 32 -partition
-32wayN0 -mode co -cwd /home/voran/tests -exe /home/voran/tests/ring-hello
-Dec 15 17:35:49 localhost bgpm[4124]: Job 25537/voran: ProcessGroup 1
+32wayN0 -mode co -cwd /home/bob/tests -exe /home/bob/tests/ring-hello
+Dec 15 17:35:49 localhost bgpm[4124]: Job 25537/bob: ProcessGroup 1
 Finshed with exit code 0. pid 4220
-Dec 15 17:35:59 localhost cqm[4152]: Job 25537/voran on 32 nodes
+Dec 15 17:35:59 localhost cqm[4152]: Job 25537/bob on 32 nodes
 done. queue:2.99s user:10.18s
 </programlisting>
@@ -234,5 +241,5 @@
 down to a text stream (using Python's cPickle module), and saves
 this data in a file in the directory
-<filename>/var/spool/sss</filename>. The filenames in this
+<filename>/var/spool/cobalt/</filename>. The filenames in this
 directory correspond to the component implementation name. This
 is the name that appears in syslog log messages (ie cqm, bgpm,
@@ -240,9 +247,9 @@
 </para>
 
-<para>
-This data can be manipulated from a python interpreter using the
-<filename>cddbg.py</filename>. This should not be attempted
-unless you really know what you are doing.
-</para>
+<!-- <para> -->
+<!-- This data can be manipulated from a python interpreter using the -->
+<!-- <filename>cddbg.py</filename>. This should not be attempted -->
+<!-- unless you really know what you are doing. -->
+<!-- </para> -->
 </section>
 <section>
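
Note on the accounting format introduced in r526: the Q, S, and E records are semicolon-separated and, in the new example, prefixed by a date and a time. The following is a minimal sketch of how such lines might be parsed. It is not part of Cobalt; the function name parse_accounting_line and the dictionary keys are hypothetical, and the field layout is taken only from the format listing in the diff above. It assumes no field (for example the job name) contains a semicolon.

```python
# Field order per record type, as given in the format listing of the diff.
# Key names such as "jobname" are illustrative assumptions, not Cobalt API.
FIELDS = {
    "Q": ("jobid", "user", "queue"),
    "S": ("jobid", "user", "jobname", "nodes", "processors", "mode", "walltime"),
    "E": ("jobid", "user", "walltime"),
}


def parse_accounting_line(line):
    """Parse one 'YYYY-MM-DD HH:MM:SS X;field;...' record.

    Returns a dict for Q, S, and E records, or None for any other line
    (for example the free-form 'Job 59/bob ...' messages).
    """
    parts = line.strip().split(" ", 2)
    if len(parts) != 3:
        return None
    date, time, payload = parts
    fields = payload.split(";")
    names = FIELDS.get(fields[0])
    # Reject unknown record types and records with an unexpected field count.
    if names is None or len(fields) != len(names) + 1:
        return None
    record = dict(zip(names, fields[1:]))
    record["type"] = fields[0]
    record["timestamp"] = date + " " + time
    return record
```

Applied to the example lines in the diff, the Q, S, and E lines yield dictionaries, while the free-form "Job ..." lines are skipped because they do not start with a known record type.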