Halt file
The halt file tracks various conditions that determine whether a run
should continue to execute. The halt file is kept in the prefix
directory. The library updates it during the run, and it can also be
updated externally through the scr_halt command.
Internally, the data of the halt file is organized as a hash. Here are
the contents of an example halt file.
  CheckpointsLeft
    7
  ExitAfter
    1298937600
  ExitBefore
    1298944800
  HaltSeconds
    1200
  ExitReason
    SCR_FINALIZE_CALLED
The CheckpointsLeft field counts the number of checkpoints that should
be completed before SCR stops the job. With each checkpoint, the library
decrements this counter, and the run stops when it reaches 0.
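The counter logic can be sketched in Python, assuming the halt data has already been read into a dict keyed by the field names shown in the example. The function `complete_checkpoint` is hypothetical and is not part of the SCR API; it only models the decrement-and-test behavior described above.

```python
def complete_checkpoint(halt: dict) -> bool:
    """Decrement CheckpointsLeft after a checkpoint; return True if the run should stop.

    Hypothetical helper: models the documented behavior, not SCR's own code.
    """
    if "CheckpointsLeft" not in halt:
        # No checkpoint limit is set, so this condition never halts the run.
        return False
    halt["CheckpointsLeft"] -= 1
    # Stop once the counter reaches 0 (<= guards against a counter that started at 0).
    return halt["CheckpointsLeft"] <= 0


halt = {"CheckpointsLeft": 2}
print(complete_checkpoint(halt))  # False: one checkpoint left
print(complete_checkpoint(halt))  # True: counter reached 0
```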
The ExitAfter field records a timestamp (seconds since the UNIX epoch).
At various points during the run, SCR compares the current time to this
timestamp, and it halts the run as soon as the current time exceeds it.
The ExitBefore field, combined with the HaltSeconds field, informs SCR
that the run should be halted a specified number of seconds before a
specified time. SCR computes this time by subtracting the HaltSeconds
value from the ExitBefore timestamp (seconds since the UNIX epoch), and
again compares the current time against it. If the current time is
equal to or greater than this time, SCR halts the run.
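As a worked example with the values from the example halt file, the halt time is 1298944800 - 1200 = 1298943600 seconds since the epoch, that is, 01:40 UTC on 1 March 2011, twenty minutes before the ExitBefore time of 02:00 UTC. The sketch below models this check; `exit_before_reached` is a hypothetical helper, and treating the condition as inactive unless both fields are present is an assumption.

```python
def exit_before_reached(halt: dict, now: int) -> bool:
    """Return True if now >= ExitBefore - HaltSeconds (all values in seconds).

    Hypothetical helper; assumes the condition applies only when both
    ExitBefore and HaltSeconds are present in the halt data.
    """
    if "ExitBefore" not in halt or "HaltSeconds" not in halt:
        return False
    deadline = halt["ExitBefore"] - halt["HaltSeconds"]
    return now >= deadline


halt = {"ExitBefore": 1298944800, "HaltSeconds": 1200}
print(halt["ExitBefore"] - halt["HaltSeconds"])  # 1298943600
print(exit_before_reached(halt, 1298943599))     # False: one second early
print(exit_before_reached(halt, 1298943600))     # True: equality counts as reached
```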
Finally, the ExitReason field records the reason the job is or should
be halted. If SCR ever detects that this field is set, it halts the job.
A user can add, modify, and remove halt conditions on a running job
using the scr_halt command. Each time an application completes a
dataset, SCR checks the settings in the halt file. If any halt condition
is satisfied, SCR flushes the most recent checkpoint, and then each
process calls exit(). Control is not returned to the application.
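The end-of-dataset check can be summarized in one sketch that evaluates every condition described above. It models the documented behavior rather than SCR's implementation: `halt_condition`, `after_dataset`, and the `flush` callback are hypothetical names, the order in which conditions are tested is arbitrary, and the exit status of 0 is an assumption. In a real job every process exits; here `sys.exit` stands in for that on a single process.

```python
import sys
import time
from typing import Callable, Optional


def halt_condition(halt: dict, now: int) -> Optional[str]:
    """Return a description of the first satisfied halt condition, or None.

    Hypothetical helper; the order of the tests below is an arbitrary choice.
    """
    if "ExitReason" in halt:
        # Any recorded reason halts the job.
        return "ExitReason=" + str(halt["ExitReason"])
    if "CheckpointsLeft" in halt and halt["CheckpointsLeft"] <= 0:
        return "no checkpoints left"
    if "ExitAfter" in halt and now > halt["ExitAfter"]:
        # ExitAfter uses a strict comparison: the current time must exceed it.
        return "past ExitAfter"
    if "ExitBefore" in halt and "HaltSeconds" in halt:
        if now >= halt["ExitBefore"] - halt["HaltSeconds"]:
            return "within HaltSeconds of ExitBefore"
    return None


def after_dataset(halt: dict, flush: Callable[[], None], now: Optional[int] = None) -> None:
    """Check the halt data after a dataset completes; flush and exit if halted."""
    if now is None:
        now = int(time.time())
    if halt_condition(halt, now) is not None:
        flush()  # hypothetical stand-in for flushing the most recent checkpoint
        sys.exit(0)  # assumed exit status; control never returns to the caller


example = {"CheckpointsLeft": 7, "ExitAfter": 1298937600,
           "ExitBefore": 1298944800, "HaltSeconds": 1200,
           "ExitReason": "SCR_FINALIZE_CALLED"}
print(halt_condition(example, now=1298930000))  # ExitReason=SCR_FINALIZE_CALLED
```

Because `sys.exit` raises `SystemExit`, nothing after the call to `after_dataset` runs when a condition is satisfied, mirroring the fact that control is not returned to the application.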