
  
2.10 More about how Condor vacates a job

Whenever Condor needs to vacate a job from a machine, it sends the job the asynchronous signal named in the ``KillSig'' attribute of the job's ClassAd. The user can set this attribute at submit time by placing the kill_sig command in the submit-description file given to condor_submit.

If a program needs to do some special work each time Condor removes it from a machine, it only has to set up a signal handler for some trappable signal to act as a ``cleanup'' signal. When submitting the job, specify this cleanup signal with kill_sig. Whatever cleanup work the job does must be quick, however: if the job takes too long to exit after Condor tells it to, Condor follows up with a SIGKILL signal, which immediately terminates the process.
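As a rough sketch of the cleanup pattern described above, the C fragment below installs a handler that only records that the signal arrived, leaving the actual cleanup to the job's main loop. The names on_vacate, install_vacate_handler and vacate_was_requested are hypothetical, and the use of SIGUSR1 in the usage note is an assumed choice of cleanup signal, not something Condor requires.

```c
#define _POSIX_C_SOURCE 200809L  /* for sigaction() under strict compilers */
#include <signal.h>
#include <string.h>

/* Set by the handler, polled by the job's main loop. */
static volatile sig_atomic_t vacate_requested = 0;

/* Hypothetical handler: does nothing but record that the signal arrived. */
static void on_vacate(int signo)
{
    (void)signo;
    vacate_requested = 1;
}

/* Install on_vacate for the chosen cleanup signal.
   Returns 0 on success and -1 on failure, as sigaction() does. */
int install_vacate_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_vacate;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}

/* Nonzero once the cleanup signal has been received. */
int vacate_was_requested(void)
{
    return vacate_requested != 0;
}
```

A job might call install_vacate_handler(SIGUSR1) at startup, test vacate_was_requested() between units of work, and finish its cleanup and exit promptly once the result is nonzero. The same signal would then be named with kill_sig at submit time. Restricting the handler to setting a flag keeps it async-signal-safe and keeps the slow work out of signal context.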

A job that was linked with the Condor libraries via the condor_compile command and then submitted into the Standard Universe will checkpoint and exit upon receipt of a SIGTSTP signal. Thus, SIGTSTP is the default value for KillSig when submitting into the Standard Universe. However, the user's code can also checkpoint itself at any time by calling one of the following functions exported by the Condor libraries:

ckpt()
Performs a checkpoint and then returns.
ckpt_and_exit()
Performs a checkpoint and exits; Condor will restart the process later, potentially on a different machine.
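One possible way to use these calls is to checkpoint at regular intervals inside a long-running loop. The sketch below takes the checkpoint routine as a function pointer so that it can be tried outside Condor; run_with_checkpoints, stub_checkpoint and stub_calls are hypothetical names. The text above does not give the prototype of ckpt(), so passing it here assumes it is declared as void ckpt(void).

```c
#include <stddef.h>

/* Stand-in for ckpt() when experimenting outside Condor:
   it only counts how many times it was called. */
static int stub_calls = 0;

static void stub_checkpoint(void)
{
    stub_calls++;
}

/* Hypothetical work loop: sums 0..n_steps-1 and calls checkpoint()
   after every `interval` completed steps. Passing NULL, or an
   interval that is not positive, disables checkpointing. */
long run_with_checkpoints(long n_steps, long interval,
                          void (*checkpoint)(void))
{
    long total = 0;

    for (long i = 0; i < n_steps; i++) {
        total += i;  /* stand-in for one unit of real work */
        if (checkpoint != NULL && interval > 0 && (i + 1) % interval == 0)
            checkpoint();
    }
    return total;
}
```

In a job built with condor_compile, the call would presumably become run_with_checkpoints(n, interval, ckpt), again assuming that prototype. Checkpointing only at step boundaries means a restart resumes from a point where the program's state is consistent.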

For jobs submitted into the Vanilla Universe, the default value for KillSig is SIGTERM, which is the usual method to nicely terminate a program in Unix.


condor-admin@cs.wisc.edu