condor_submit
Queue jobs for execution on remote machines
condor_submit [-v] [-n schedd_name] [-r schedd_name] submit-description file
condor_submit is the program for submitting jobs to Condor.
condor_submit requires a submit-description file which contains commands
to direct the queuing of jobs. One description file may contain
specifications for the queuing of many Condor jobs at once. All jobs queued by a
single invocation of condor_submit must share the same executable, and
are referred to as a ``job cluster''. It is advantageous to submit
multiple jobs as a single cluster because:
- Only one copy of the checkpoint file is needed to
represent all jobs in a cluster until they begin execution.
- There is much less overhead involved for Condor to start the next
job in a cluster than for Condor to start a new cluster. This can make
a big difference if you are submitting lots of short running jobs.
SUBMIT DESCRIPTION FILE COMMANDS
Each Condor submit-description file describes one cluster of jobs to be
placed in the Condor execution pool. All jobs in a cluster must share
the same executable, but they may have different input and output files,
different program arguments, and so on. The submit-description file is then
given to condor_submit on the command line.
The submit-description file must contain one executable command and at least one
queue command. All of the other commands have default actions.
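As a minimal sketch of those two required commands, a description file could be as short as the following. The executable name my_program is hypothetical; because the Standard Universe is the default, it is assumed here to have been re-linked with condor_compile.

```
# Minimal sketch: one executable command and one queue command.
# "my_program" is a hypothetical, condor_compile'd executable.
executable = my_program
queue
```

Since no input, output, or error commands are given, all three standard files default to /dev/null for this job.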
The commands which can appear in the submit-description file are:
- executable = <name>
- The name of the executable file for this
job cluster. Only one executable command may be present in a description
file. If submitting into the Standard Universe, which is the default,
then the named executable must have been re-linked with the Condor
libraries (such as via the condor_compile command). If submitting into
the Vanilla Universe, then the named executable need not be re-linked and
can be any process which can run in the background (shell scripts work
fine as well).
- input = <pathname>
- Condor assumes that its jobs are
long-running, and that the user will not wait at the terminal for their
completion. Because of this, the standard files which normally access
the terminal (stdin, stdout, and stderr) must refer to files. Thus,
the filename specified with input should contain any keyboard
input the program requires (i.e. this file becomes stdin). If not
specified, the default value of /dev/null is used.
- output = <pathname>
- The output filename will capture
any information the program would normally write to the screen (i.e.
this file becomes stdout). If not specified, the default value of
/dev/null is used.
- error = <pathname>
- The error filename will capture any
error messages the program would normally write to the screen (i.e. this
file becomes stderr). If not specified, the default value of /dev/null
is used.
- arguments = <argument_list>
- List of arguments to be supplied
to the program on the command line.
- initialdir = <directory-path>
- Used to specify the current
working directory for the Condor job. Should be a path to a preexisting
directory. If not specified, condor_submit will automatically insert
the user's current working directory at the time condor_submit was run
as the value for initialdir.
- requirements = <ClassAd Boolean Expression>
- The requirements
command is a boolean ClassAd expression which uses C-like operators. In
order for any job in this cluster to run on a given machine, this
requirements expression must evaluate to true on the given machine. For
example, to require that whatever machine executes your program has at
least 64 megabytes of RAM and a MIPS performance rating greater than 45,
use:
requirements = Memory >= 64 && Mips > 45
Only one requirements command may be present in a
description file. By default, condor_submit
appends the following clauses to the requirements expression:
- 1. Arch and OpSys are set equal to the Arch and OpSys of the
submit machine. In other words, unless you request otherwise, Condor will give your
job machines with the same architecture and operating system version as
the machine running condor_submit.
- 2. Disk > ExecutableSize. This ensures there is enough disk space on the
target machine for Condor to copy over your executable.
- 3. VirtualMemory >= ImageSize. This ensures the target machine
has enough virtual memory to run your job.
- 4. If Universe is set to Vanilla, FileSystemDomain is set equal to
the submit machine's FileSystemDomain.
You can view the requirements of a job
which has already been submitted (along with everything else about the
job ClassAd) with the command condor_q -l; see the command reference for
condor_q. Also, see the Condor User's
Manual for complete information on the syntax and available attributes
that can be used in the ClassAd expression.
- rank = <ClassAd Float Expression>
- A ClassAd Floating-Point
expression that states how to rank machines which have already met the requirements
expression. Essentially, rank expresses preference. A higher numeric value
equals better rank. Condor will give the job the machine with the
highest rank. For example,
requirements = Memory > 60
rank = Memory
asks Condor to find all available machines with more than 60 megabytes of memory
and give the job the one with the most memory. See the Condor User's
Manual for complete information on the syntax and available attributes
that can be used in the ClassAd expression.
- priority = <priority>
- Condor job priorities range from -20 to
+20, with 0 being the default. Jobs with higher numerical priority will
run before jobs with lower numerical priority. Note that this priority
is on a per user basis; setting the priority will determine the order in
which your own jobs are executed, but will have no effect on whether or
not your jobs will run ahead of another user's jobs.
- notification = <when>
- Owners of Condor jobs are notified by
email when certain events occur. If when is set to
``ALWAYS'', the owner will be notified whenever the job is
checkpointed, and when it completes. If when is set to
``COMPLETE'' (the default), the owner will be notified when the
job terminates. If when is set to ``ERROR'', the owner will
only be notified if the job terminates abnormally. Finally, if
when is set to ``NEVER'', the owner will not be mailed,
regardless of what happens to the job.
- notify_user = <email-address>
- Used to specify the email
address to use when Condor sends email about a job. If not specified,
Condor will default to using:
job-owner@UID_DOMAIN
where UID_DOMAIN is specified by the Condor site administrator. If
UID_DOMAIN has not been specified, Condor will send the email
to:
job-owner@submit-machine-name
- getenv = <True | False>
- If getenv is set to
True, then condor_submit will copy all of the user's current
shell environment variables at the time of job submission into the job
ClassAd. The job will therefore execute with the same set of environment
variables that the user had at submit time. Defaults to False.
- environment = <parameter_list>
- List of environment variables
of the form:
<parameter> = <value>
Multiple environment variables can be specified by separating them with a
semicolon (`` ; ''). These environment variables will be placed into the
job's environment before execution. The total length of the environment
specification is currently limited to 4096 characters.
- log = <pathname>
- Use log to specify a filename where
Condor will write a log file of what is happening with this job cluster.
For example, Condor will log into this file when and where the job
begins running, when the job is checkpointed and/or migrated, when the
job completes, etc. Most users find specifying a log file to be very
handy; its use is recommended. If no log entry is specified,
Condor does not create a log for this cluster.
- universe = <vanilla | standard | pvm | scheduler>
- Specifies
which Condor Universe to use when running this job. The Condor Universe
specifies a Condor execution environment. The standard Universe
is the default, and tells Condor that this job has been re-linked via
condor_compile with the Condor libraries and therefore supports
checkpointing and remote system calls. The vanilla Universe is an
execution environment for jobs which have not been linked with the
Condor libraries. Note: use the vanilla Universe to
submit shell scripts to Condor. The pvm Universe is for a
parallel job written with PVM 3.3, and scheduler is for a job that
should act as a metascheduler. See the Condor User's Manual for more
information about using Universe.
- image_size = <size>
- This command tells Condor the maximum
virtual image size to which you believe your program will grow during
its execution. Condor will then execute your job only on machines which
have enough resources (such as virtual memory) to support executing
your job. If you do not specify the image size of your job in the
description file, Condor will automatically make a (reasonably accurate)
estimate about its size and adjust this estimate as your program runs.
If the image size of your job is underestimated, it may crash due to
inability to acquire more address space, e.g. malloc() fails. If the image
size is overestimated, Condor may have difficulty finding machines which
have the required resources. size must be in kbytes, e.g. for
an image size of 8 megabytes, use a size of 8000.
- machine_count = <min..max>
- If machine_count is
specified, Condor will not start the job until it can simultaneously
supply the job with min machines. Condor will continue to try to provide up
to max machines, but will not delay starting of the job to do so.
If the job is started with fewer than max machines, the job
will be notified via a usual PvmHostAdd notification as additional
hosts come on line.
Important: only use machine_count if and only if
submitting into the PVM Universe. At this time, machine_count
must be used only with a parallel PVM application.
- coresize = <size>
- Should the user's program abort and produce
a core file, coresize specifies the maximum size in bytes of the
core file which the user wishes to keep. If coresize is not
specified in the command file, the system's user resource limit
``coredumpsize'' is used (except on HP-UX).
- nice_user = <True | False>
- Normally, when a machine
becomes available to Condor, Condor decides which job to run based upon
user and job priorities. Setting nice_user equal to True
tells Condor not to use your regular user priority, but that this job
should have last priority amongst all users and all jobs. So jobs
submitted in this fashion run only on machines which no other
non-nice_user job wants -- a true ``bottom-feeder'' job! This is very
handy if a user has some jobs they wish to run, but does not wish to use
resources that could instead be used to run other people's Condor jobs. Jobs
submitted in this fashion have ``nice-user.'' prepended to
the owner name when viewed from condor_q or condor_userprio. The
default value is False.
- kill_sig = <signal-number>
- When Condor needs to kick a job
off of a machine, it will send the job the signal specified by
signal-number. signal-number needs to be an integer which
represents a valid signal on the execution machine. For jobs submitted
to the Standard Universe, the default value is the number for
SIGTSTP, which tells the Condor libraries to initiate a checkpoint
of the process. For jobs submitted to the Vanilla Universe, the default
is SIGTERM, which is the standard way to terminate a program in UNIX.
- +<attribute> = <value>
- A line which begins with a '+'
(plus) character instructs condor_submit to simply insert the
following attribute into the job ClassAd with the given
value.
- queue [number-of-procs]
- Places one or more copies of the job into
the Condor queue. If desired, new input, output,
error, initialdir, arguments, nice_user,
priority, kill_sig, coresize, or image_size
commands may be issued between queue commands. This is very handy
when submitting multiple runs into one cluster with one submit file; for
example, by issuing an initialdir between each queue
command, each run can work in its own subdirectory. The optional
argument number-of-procs specifies how many times to submit the
job to the queue, and defaults to 1.
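As a sketch of issuing commands between queue commands, the following description file gives each run its own working directory and arguments. The executable name my_program and the directories run_1 and run_2 are hypothetical, and the vanilla Universe is chosen only so that the example does not assume a re-linked executable.

```
# Sketch only: "my_program", "run_1", and "run_2" are hypothetical names.
# Both directories must already exist before condor_submit is run.
executable = my_program
universe   = vanilla

# First run: one job working in run_1.
initialdir = run_1
arguments  = 15 2000
queue

# Second run: two identical jobs working in run_2.
initialdir = run_2
arguments  = 30 2000
queue 2
```

This places three jobs into a single cluster: one submitted with the first set of commands and two with the second.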
In addition to commands, the submit-description file can contain macros
and comments:
- Macros
- Parameterless macros in the form of $(macro_name)
may be inserted anywhere in Condor submit-description files. Macros can be
defined by lines in the form of
<macro_name> = <string>
Two pre-defined macros are supplied by the description file parser. The
$(Cluster) macro supplies the number of the job cluster, and the
$(Process) macro supplies the number of the job. These macros are
intended to aid in the specification of input/output files, arguments,
etc., for clusters with lots of jobs, and/or could be used to supply a
Condor process with its own cluster and process numbers on the command
line.
- Comments
- Blank lines and lines beginning with a '#' (pound-sign)
character are ignored by the submit-description file parser.
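To illustrate macros and comments together, the following sketch defines one macro and uses the two pre-defined ones. The macro name run_name, the executable my_program, and the file names are hypothetical, chosen only for illustration.

```
# Sketch only: all names below are hypothetical.
run_name   = trial
executable = my_program

# Pass each job its own cluster and process numbers on the command line.
arguments  = $(Cluster) $(Process)
output     = $(run_name).$(Process).out
error      = $(run_name).$(Process).err
queue 3
```

Assuming process numbering starts at 0, as in Example 2, the three jobs would write their standard output to trial.0.out, trial.1.out, and trial.2.out.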
Supported options are as follows:
- -v
- Verbose output: display the created job ClassAd.
- -n schedd_name
- Submit to the specified schedd. This option is used when there is more than one schedd running on the submitting machine.
- -r schedd_name
- Submit to a remote schedd. The jobs
will be submitted to the schedd on the specified remote host, and their
owner will be set to ``nobody''.
condor_submit will exit with a status value of 0 (zero) upon success, and a
non-zero value upon failure.
Example 1: The example below queues three jobs for
execution by Condor. The first will be given command line arguments of
'15' and '2000', and will write its standard output to 'foo.out1'. The
second will be given command line arguments of '30' and '2000', and will
write its standard output to 'foo.out2'. Similarly, the third will have
arguments of '45' and '6000', and will use 'foo.out3' for its standard
output. Standard error output (if any) from the three programs will
appear in 'foo.err1', 'foo.err2', and 'foo.err3', respectively.
####################
#
# Example 1: queueing multiple jobs with differing
# command line arguments and output files.
#
####################
Executable = foo
Arguments = 15 2000
Output = foo.out1
Error = foo.err1
Queue
Arguments = 30 2000
Output = foo.out2
Error = foo.err2
Queue
Arguments = 45 6000
Output = foo.out3
Error = foo.err3
Queue
Example 2: This submit-description file example queues 150
runs of program 'foo' which must have been compiled and linked for
Silicon Graphics workstations running IRIX 6.x. Condor will not attempt
to run the processes on machines which have less than 32 megabytes of
physical memory, and will prefer machines which have at least 64
megabytes if such machines are available. Stdin, stdout, and stderr will
refer to ``in.0'', ``out.0'', and ``err.0'' for the first run of this program
(process 0). Stdin, stdout, and stderr will refer to ``in.1'', ``out.1'',
and ``err.1'' for process 1, and so forth. A log file containing entries
about where/when Condor runs, checkpoints, and migrates processes in this
cluster will be written into file ``foo.log''.
####################
#
# Example 2: Show off some fancy features including
# use of pre-defined macros and logging.
#
####################
Executable = foo
Requirements = Memory >= 32 && OpSys == "IRIX6" && Arch == "SGI"
Rank = Memory >= 64
# image_size is given in kbytes (28 megabytes here)
Image_Size = 28000
Error = err.$(Process)
Input = in.$(Process)
Output = out.$(Process)
Log = foo.log
Queue 150
- For security reasons, Condor will refuse to run any jobs submitted
by user root (UID = 0) or by a user whose default group is group wheel
(GID = 0). Jobs submitted by user root or a user with a default group of
wheel will appear to sit forever in the queue in an unexpanded state.
- All pathnames specified in the submit-description file must be
less than 256 characters in length, and command line arguments must be
less than 4096 characters in length; otherwise, condor_submit gives a
warning message but the jobs will not execute properly.
- Somewhat understandably, behavior gets bizarre if the user makes
the mistake of requesting multiple Condor jobs to write to the
same file, and/or if the user alters any files that need to be accessed
by a Condor job which is still in the queue (e.g. compressing data or
output files before a Condor job has completed is a common mistake).
Condor User Manual
Condor Team, University of Wisconsin-Madison
Copyright © 1990-1998 Condor Team, Computer Sciences Department,
University of Wisconsin-Madison, Madison, WI. All Rights Reserved.
No use of the Condor Software Program is authorized
without the express consent of the Condor Team. For more information
contact: Condor Team, Attention: Professor Miron Livny,
7367 Computer Sciences, 1210 W. Dayton St., Madison, WI 53706-1685,
(608) 262-0856 or miron@cs.wisc.edu.
U.S. Government Rights Restrictions: Use, duplication, or disclosure
by the U.S. Government is subject to restrictions as set forth in
subparagraph (c)(1)(ii) of The Rights in Technical Data and Computer
Software clause at DFARS 252.227-7013 or subparagraphs (c)(1) and
(2) of Commercial Computer Software-Restricted Rights at 48 CFR
52.227-19, as applicable, Condor Team, Attention: Professor Miron
Livny, 7367 Computer Sciences, 1210 W. Dayton St., Madison,
WI 53706-1685, (608) 262-0856 or miron@cs.wisc.edu.
See the Condor Version 6.0.3 Manual for
additional notices.