3.4 Configuring Condor
This section describes how to configure all parts of the Condor
system. First, we describe some general information about the config
files, their syntax, etc. Then, we describe settings that affect all
Condor daemons and tools. Finally, we have a section describing the
settings for each part of Condor. The only exceptions are the
settings that control the policy under which Condor will start,
suspend, resume, vacate, or kill jobs. These settings (and other
important concepts from the condor_startd) are described in
section 3.5 on ``Configuring Condor's Job
Execution Policy''.
3.4.1 Introduction to Config Files
The Condor configuration files are used to customize how Condor
operates at a given site. The basic configuration as shipped with
Condor should work well for most sites, with a few exceptions of
things that might need special customization. Please see the
Installation section of this manual for details on where Condor's
config files are found.
Each condor program will, as part of its initialization process,
``configure'' itself by calling a library routine which parses the
various config files that might be used including pool-wide,
platform-specific, machine-specific, and root-owned config files. The
result is a list of constants and expressions which the program may
evaluate as needed at run time.
Definitions in the configuration file come in two flavors, macros and
expressions. Macros provide string valued constants which remain
static throughout the execution of the program. Expressions can be
arithmetic, boolean, or string valued, and can be evaluated
dynamically at run time.
The order in which Macros and Expressions are defined is important,
since you cannot define anything in terms of something else that
hasn't been defined yet. This is particularly important if you break
up your config files using the LOCAL_CONFIG_FILE setting
described in sections 3.4.2
and 3.9.2 below.
3.4.1.1 Config File Macros
Macro definitions are of the form:
<macro_name> = <macro_definition>
NOTE: You must have whitespace between the macro name, the
``='' sign, and the macro definition. Macro invocations are of the
form:
$(macro_name)
Macro definitions may contain references to previously defined
macros. Nothing in a config file can reference Macros which have
not yet been defined. Thus,
A = xxx
C = $(A)
is a legal set of macro definitions, and the resulting value of ``C'' is
``xxx''. Note that ``C'' is actually bound to ``$(A)'', not its value, thus
A = xxx
C = $(A)
A = yyy
is also a legal set of macro definitions and the resulting value of
``C'' is ``yyy''. However,
A = $(C)
C = xxx
is not legal, and will result in the Condor daemons and tools exiting
when they try to parse their config files.
3.4.1.2 Config File Expressions
Expression definitions are of the form:
<expression_name> : <expression>
NOTE: You must have whitespace between the expression name,
the ``:'' sign, and the expression definition. Expressions may
contain constants, operators, and other expressions. Macros may also
be used to aid in writing expressions. Constants may be booleans,
denoted by ``true'' (or ``t'') or ``false'' (or ``f''), signed
integers, floating point values, or strings enclosed in double quotes
("). All config file expressions are simply inserted into various
ClassAds. Please see the appendix on ClassAds for details about
ClassAd expression operators, and how ClassAd expressions are
evaluated.
Note that expressions which contain references to other expressions
are bound to the expression's definition, not its current value, while
expressions which contain macro invocations are bound to the current
value of the macro. Thus
X : "xxx"
Y : X
X : "yyy"
will result in ``Y'' being evaluated as ``yyy'' at run time, but
X = "xxx"
Y : $(X)
X = "yyy"
will result in ``Y'' having a run time value of ``xxx''.
3.4.1.3 Other Syntax
Other than macros and expressions, a Condor config file can contain
comments or continuations. A comment is any line beginning with a
``#''. A continuation is any entry (either macro or expression) that
continues across multiple lines. This is accomplished with the
``\'' sign at the end of any line that you wish to continue onto
another. For example,
START : (KeyboardIdle > 15 * $(MINUTE)) && \
((LoadAvg - CondorLoadAvg) <= 0.3)
or,
ADMIN_MACHINES = condor.cs.wisc.edu, raven.cs.wisc.edu, \
stork.cs.wisc.edu, ostrich.cs.wisc.edu, \
bigbird.cs.wisc.edu
HOSTALLOW_ADMIN = $(ADMIN_MACHINES)
3.4.1.4 Pre-Defined Macros and Expressions
Condor provides a number of pre-defined macros and expressions that
help you configure Condor. Pre-defined macros are listed as
$(macro_name), while pre-defined expressions are just listed as
expression_name, to denote how they should be referenced in
other macros or expressions.
The first set are special entries whose values are determined at
runtime and cannot be overridden. These are inserted automatically by
the library routine which parses the config files.
- CurrentTime
- This expression
provides the current result of the system call time(2). This
is an integer containing the number of seconds since an arbitrary
date defined by UNIX as the ``beginning of time'', hereafter
referred to as the UNIX date.
- $(FULL_HOSTNAME)
- This is the
fully qualified hostname of the local machine (hostname plus domain
name).
- $(HOSTNAME)
- This is just the
hostname of the local machine (no domain name).
- $(TILDE)
- This is the full path to the
home directory of user ``condor'', if such a user exists on the
local machine.
- $(SUBSYSTEM)
- This is the ``subsystem''
name of the daemon or tool that is evaluating the macro. The
different subsystem names are described in
section 3.4.1.5 below.
The final set are entries whose default values are determined
automatically at runtime but which can be overridden.
- $(ARCH)
- This setting defines the string
used to identify the architecture of the local machine to Condor.
The condor_startd will advertise itself with this attribute so
that users can submit binaries compiled for a given platform and
force them to run on the correct machines. condor_submit will
automatically append a requirement to the job ClassAd that it must
run on the same ARCH and OPSYS of the machine where
it was submitted, unless the user specifies ARCH and/or
OPSYS explicitly in their submit file. See the
condor_submit(1) man page for details.
- $(OPSYS)
- This setting defines the
string used to identify the operating system of the local machine to
Condor. See the entry on ARCH above for more information.
If this setting is not defined in the config file, Condor will
automatically insert the operating system of this machine as
determined by uname.
- $(FILESYSTEM_DOMAIN)
- This parameter defaults to the fully
qualified hostname of the machine it is evaluated on. See
section 3.4.5 on ``Shared
Filesystem Config File Entries'' below for the full description of
its use and under what conditions you would want to override it.
- $(UID_DOMAIN)
- This parameter defaults to the fully
qualified hostname of the machine it is evaluated on. See
section 3.4.5 on ``Shared
Filesystem Config File Entries'' below for the full description of
its use and under what conditions you would want to override it.
Since ARCH and OPSYS will automatically be set to the
correct values, we recommend that you do not override them yourself.
Only do so if you know what you are doing.
3.4.1.5 Condor Subsystem Names
IMPORTANT NOTE: Many of the entries in the config file will
be named with the subsystem of the various Condor daemons.
This is a unique string which identifies a given daemon within the
Condor system. The possible subsystem names are:
- STARTD
- SCHEDD
- MASTER
- COLLECTOR
- NEGOTIATOR
- KBDD
- SHADOW
- STARTER
- CKPT_SERVER
- SUBMIT
In the description of the actual config file entries, ``SUBSYS'' will
stand for one of these possible subsystem names.
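To sketch how this substitution works, an entry documented here as
SUBSYS_LOG would appear once per daemon in an actual config file,
with the subsystem name filled in (the log file names shown are only
illustrative):
STARTD_LOG = $(LOG)/StartLog
SCHEDD_LOG = $(LOG)/SchedLog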
3.4.2 Condor-wide Config File Entries
This section describes settings which affect all parts of the Condor
system.
- CONDOR_HOST
- This macro is
just used to define the NEGOTIATOR_HOST and
COLLECTOR_HOST macros. Normally, the condor_collector
and condor_negotiator would run on the same machine. If for some
reason they weren't, CONDOR_HOST would not be needed. Some
of the host-based security macros use CONDOR_HOST by
default. See section 3.7 on ``Setting up
IP/host-based security in Condor'' for details.
- COLLECTOR_HOST
- The
hostname of the machine where the condor_collector is running for
your pool. Normally it would just be defined with the
CONDOR_HOST macro described above.
- NEGOTIATOR_HOST
- The
hostname of the machine where the condor_negotiator is running for
your pool. Normally it would just be defined with the
CONDOR_HOST macro described above.
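For example, if your central manager were a (hypothetical) machine
named cm.example.edu, you could define all three entries as:
CONDOR_HOST = cm.example.edu
COLLECTOR_HOST = $(CONDOR_HOST)
NEGOTIATOR_HOST = $(CONDOR_HOST)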
- RELEASE_DIR
- The full path to
the Condor release directory, which holds the bin, etc, lib, and
sbin directories. Other macros are defined relative to this one.
- BIN
- This directory points to the
Condor bin directory, where user-level programs are installed. It
is usually just defined relative to the RELEASE_DIR macro.
- LIB
- This directory points to the
Condor lib directory, where libraries used to link jobs for Condor's
standard universe are stored. The condor_compile program uses
this macro to find these libraries, so it must be defined.
LIB is usually just defined relative to the
RELEASE_DIR macro.
- SBIN
- This directory points to the
Condor sbin directory, where Condor's system binaries (such as the
binaries for the Condor daemons) and administrative tools are
installed. Whatever directory SBIN points to should
probably be in the PATH of anyone who is acting as a Condor
administrator.
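As a sketch, assuming a (hypothetical) installation under
/usr/local/condor, these directory macros are typically defined
relative to RELEASE_DIR:
RELEASE_DIR = /usr/local/condor
BIN = $(RELEASE_DIR)/bin
LIB = $(RELEASE_DIR)/lib
SBIN = $(RELEASE_DIR)/sbin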
- LOCAL_DIR
- The location of the
local Condor directory on each machine in your pool. One common
option is to use the condor user's home directory which you could
specify with $(tilde). For example:
LOCAL_DIR = $(tilde)
On machines with a shared filesystem, where either the
$(tilde) directory or another directory you want to use is
shared among all machines in your pool, you might use the
$(hostname) macro and have a directory with many
subdirectories, one for each machine in your pool, each named by
hostnames. For example:
LOCAL_DIR = $(tilde)/hosts/$(hostname)
or:
LOCAL_DIR = $(release_dir)/hosts/$(hostname)
- LOG
- This entry is used to specify the
directory where each Condor daemon writes its log files. The names
of the log files themselves are defined with other macros, which use
the LOG macro by default. The log directory also acts as
the current working directory of the Condor daemons as they run, so
if one of them should drop a core file for any reason, it would wind
up in the directory defined by this macro. Normally, LOG is
just defined in terms of $(LOCAL_DIR).
- SPOOL
- The spool directory is where
certain files used by the condor_schedd are stored, such as the
job queue file, and the initial executables of any jobs that have
been submitted. In addition, if you are not using a checkpoint
server, all the checkpoint files from jobs that have been submitted
from a given machine will be stored in that machine's spool
directory. Therefore, you will want to ensure that the spool
directory is located on a partition with enough disk space. If a
given machine is only set up to execute Condor jobs and not submit
them, it would not need a spool directory (or this macro defined).
Normally, SPOOL is just defined in terms of
$(LOCAL_DIR).
- EXECUTE
- This directory acts as
the current working directory of any Condor job that is executing on
the local machine. If a given machine is only set up to submit
jobs and not execute them, it would not need an execute directory
(or this macro defined). Normally, EXECUTE is just defined
in terms of $(LOCAL_DIR).
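The LOG, SPOOL, and EXECUTE entries described above are commonly
defined together in terms of $(LOCAL_DIR), for example:
LOG = $(LOCAL_DIR)/log
SPOOL = $(LOCAL_DIR)/spool
EXECUTE = $(LOCAL_DIR)/execute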
- LOCAL_CONFIG_FILE
- The
location of the local, machine-specific config file for each machine
in your pool. The two most common options would be putting this
file in the $(LOCAL_DIR) you just defined, or putting all
local config files for your pool in a shared directory, each one
named by hostname. For example:
LOCAL_CONFIG_FILE = $(LOCAL_DIR)/condor_config.local
or:
LOCAL_CONFIG_FILE = $(release_dir)/etc/$(hostname).local
or, not using your release directory:
LOCAL_CONFIG_FILE = /full/path/to/configs/$(hostname).local
Beginning with Condor version 6.0.1, the LOCAL_CONFIG_FILE
is treated as a list of files, not a single file. So, you can use
either a comma or space separated list of files as its value. This
allows you to specify multiple files as the ``local config file''
and each one will be processed in order (with parameters set in
later files overriding values from previous files). This allows you
to use one global config file for multiple platforms in your pool,
define a platform-specific config file for each platform, and
finally use a local config file for each machine. For more
information on this, see section 3.9.2 on
``Configuring Condor for Multiple Platforms''.
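For example, a (hypothetical) setup with one platform-specific file
and one machine-specific file could be written as:
LOCAL_CONFIG_FILE = $(RELEASE_DIR)/etc/$(OPSYS).config, \
                    $(RELEASE_DIR)/etc/$(HOSTNAME).local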
- CONDOR_ADMIN
- This is the email
address that Condor will send mail to when something goes wrong in
your pool. For example, if a daemon crashes, the condor_master
can send an obituary to this address with the last few lines
of that daemon's log file and a brief message that describes what
signal or exit status that daemon exited with.
- MAIL
- This is the full path to a mail
sending program that understands that ``-s'' means you wish to
specify a subject to the message you're sending. On all platforms,
the default shipped with Condor should work. Only if you have
installed things in a non-standard location on your system would you
need to change this setting.
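For example (the address and path here are purely illustrative):
CONDOR_ADMIN = condor-admin@example.edu
MAIL = /usr/bin/mail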
- RESERVED_SWAP
- This setting
determines how much swap space you want to reserve for your own
machine. Condor will not start up more condor_shadow processes if
the amount of free swap space on your machine falls below this
level.
- RESERVED_DISK
- This setting
determines how much disk space you want to reserve for your own
machine. When Condor is reporting the amount of free disk space in
a given partition on your machine, it will always subtract this
amount. For example, the condor_startd advertises the amount of
free space in the EXECUTE directory described above.
- LOCK
- Condor needs to create a few
lock files to synchronize access to various log files. Because of
problems we've had with network filesystems and file locking over
the years, we highly recommend that you put these lock
files on a local partition on each machine. If you don't have your
LOCAL_DIR on a local partition, be sure to change this
entry. Whatever user (or group) condor is running as needs to have
write access to this directory. If you're not running as root, this
is whatever user you started up the condor_master as. If you are
running as root, and there's a condor account, it's probably condor.
Otherwise, it's whatever you've set in the CONDOR_IDS
environment variable. See section 3.10.2 on ``UIDs in
Condor'' for details on this.
- HISTORY
- This entry defines the
location of the Condor history file, which stores information about
all Condor jobs that have completed on a given machine. This entry
is used by both the condor_schedd which appends the information,
and condor_history, the user-level program that is used to view
the history file.
- DEFAULT_DOMAIN_NAME
- If you don't use a fully qualified name in your /etc/hosts
file (or NIS, etc.) for either your official hostname or as an
alias, Condor wouldn't normally be able to use fully qualified names
in places that it'd like to. You can set this parameter to the
domain you'd like appended to your hostname, if changing your host
information isn't a good option. This parameter must be set in the
global config file (not the LOCAL_CONFIG_FILE specified
above). The reason for this is that the FULL_HOSTNAME
special macro is used by the config file code in Condor which needs
to know the full hostname. So, for DEFAULT_DOMAIN_NAME to
take effect, Condor must already have read in its value. However,
Condor must set the FULL_HOSTNAME special macro since you
might use that to define where your local config file is. So, after
reading the global config file, Condor figures out the right values
for HOSTNAME and FULL_HOSTNAME and inserts them
into its configuration table.
- CREATE_CORE_FILES
- Condor can be told whether or not you want the Condor daemons to
create a core file if something really bad happens. This just sets
the resource limit for the size of a core file. By default, we
don't do anything, and leave in place whatever limit was in effect
when you started the Condor daemons (normally the condor_master).
If this parameter is set to ``True'', we increase the limit to as
large as it gets. If it's set to ``False'', we set the limit at 0
(which means that no core files are even created). Core files
greatly help the Condor developers debug any problems you might be
having. By using the parameter, you don't have to worry about
tracking down where in your boot scripts you need to set the core
limit before starting Condor, etc. You can just set the parameter
to whatever behavior you want Condor to enforce. This parameter has
no default value, and is commented out in the default config file.
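For example, to have the daemons raise the core file size limit as
high as possible, you would uncomment the parameter and set:
CREATE_CORE_FILES = True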
3.4.3 Daemon Logging Config File Entries
These entries control how and where the Condor daemons write their log
files. All of these entries are named with the subsystem (as
described in section 3.4.1.5 above) of the daemon
you wish to control logging for.
- SUBSYS_LOG
- This is the name of
the log file for the given subsystem. For example,
STARTD_LOG gives the location of the log file for the
condor_startd. These entries are defined relative to the
LOG macro described above. The actual names of the files
are also used in the VALID_LOG_FILES entry used by
condor_preen, which is described below. If you change one of the
filenames with this setting, be sure to change the
VALID_LOG_FILES entry as well, or condor_preen will
delete your newly named log files.
- MAX_SUBSYS_LOG
- This
setting controls the maximum length in bytes to which the various
logs will be allowed to grow. Each log file will grow to the
specified length, then be saved to a ``.old'' file. The ``.old''
files are overwritten each time the log is saved, thus the maximum
space devoted to logging for any one program will be twice the
maximum length of its log file.
- TRUNC_SUBSYS_LOG_ON_OPEN
- If this macro is defined and set
to ``True'' the affected log will be truncated and started from an
empty file with each invocation of the program. Otherwise, new
invocations of the program will simply append to the previous log
file. By default this setting is turned off for all daemons.
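For example, the condor_startd's logging could be controlled with
entries like the following (the file name and the 640000 byte limit
are only illustrative):
STARTD_LOG = $(LOG)/StartLog
MAX_STARTD_LOG = 640000
TRUNC_STARTD_LOG_ON_OPEN = False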
- SUBSYS_DEBUG
- All of the
Condor daemons can produce different levels of output depending on
how much information you want to see. The various levels of
verbosity for a given daemon are determined by this entry. All
daemons have a default level, D_ALWAYS, and log messages for
that level will be printed to the daemon's log, regardless of what
you have set here. The other possible debug levels are:
- D_FULLDEBUG
- Generally, turning on
this setting provides very verbose output in the log files.
- D_DAEMONCORE
- This provides log
file entries for things that are specific to DaemonCore, such as
timers the daemons have set, the commands that are registered, and
so on. If both D_FULLDEBUG and D_DAEMONCORE are set,
you get VERY verbose output.
- D_PRIV
- This flag turns on log
messages about the privilege state switching that the daemons
do. See section 3.10.2 on UIDs in Condor for more details.
- D_COMMAND
- With this flag set, any
daemon that uses DaemonCore will print out a log message
whenever a command comes in. The name and integer of the command
are printed, whether the command was sent via UDP or TCP, and where
the command was sent from. Because the condor_kbdd works by
sending UDP commands to the condor_startd whenever there is
activity on the X server, we don't recommend turning on
D_COMMAND logging in the condor_startd, since you will get so
many messages that the log file will be fairly useless to you. On
platforms that use the condor_kbdd, this is turned off in the
condor_startd by default.
- D_LOAD
- The condor_startd keeps track
of the load average on the machine where it is running. Both the
general system load average, and the load average being generated by
Condor's activity there. With this flag set, the condor_startd
will print out a log message with the current state of both of these
load averages whenever it computes them. This flag only affects the
condor_startd.
- D_JOB
- When this flag is set, the
condor_startd will dump out to its log file the contents of any
job ClassAd that the condor_schedd sends to claim the
condor_startd for its use. This flag only affects the
condor_startd.
- D_MACHINE
- When this flag is set,
the condor_startd will dump out to its log file the contents of
its resource ClassAd when the condor_schedd tries to claim the
condor_startd for its use. This flag only affects the
condor_startd.
- D_SYSCALLS
- This flag is used to
make the condor_shadow log remote syscall requests and return
values. This can help track down problems a user is having with a
particular job since you can see what system calls the job is
performing, which, if any, are failing, and what the reason for the
failure is. The condor_schedd also uses this flag for the server
portion of the queue management code. So, with D_SYSCALLS
defined in SCHEDD_DEBUG you will see verbose logging of all
queue management operations the condor_schedd performs.
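For example, to get verbose condor_shadow logs that include remote
system call tracing, you might set:
SHADOW_DEBUG = D_FULLDEBUG D_SYSCALLS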
3.4.4 DaemonCore Config File Entries
Please read section 3.6 on ``DaemonCore'' for details
about what DaemonCore is. There are certain config file settings that
DaemonCore uses which affect all Condor daemons (except the checkpoint
server, shadow, and starter, none of which use DaemonCore yet).
- HOSTALLOW...
- All of the
settings that begin with either HOSTALLOW or
HOSTDENY are settings for Condor's host-based security.
Please see section 3.7 on ``Setting up
IP/host-based security in Condor'' for details on all of these
settings and how to configure them.
- SHUTDOWN_GRACEFUL_TIMEOUT
- This entry determines how long
you are willing to let daemons try their graceful shutdown methods
before they do a hard shutdown. It is defined in terms of seconds.
The default is 1800 (30 minutes).
- SUBSYS_ADDRESS_FILE
- Every Condor daemon that uses
DaemonCore has a command port where commands can be sent. The
IP/port of the daemon is put in that daemon's ClassAd so that other
machines in the pool can query the condor_collector (which listens
on a well-known port) to find the address of a given daemon on a
given machine. However, tools and daemons executing on the same
machine they wish to communicate with don't have to query the
collector, they can simply look in a file on the local disk to find
the IP/port. Setting this entry will cause daemons to write the
IP/port of their command socket to the file you specify. This way,
local tools will continue to operate, even if the machine running
the condor_collector crashes. Using this file will also generate
slightly less network traffic in your pool (since condor_q,
condor_rm, etc won't have to send any messages over the network to
locate the condor_schedd, for example). This entry is not needed
for the collector or negotiator, since their command sockets are at
well-known ports anyway.
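For example (the file names here are only a convention, not a
requirement):
SCHEDD_ADDRESS_FILE = $(LOG)/.schedd_address
STARTD_ADDRESS_FILE = $(LOG)/.startd_address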
- SUBSYS_EXPRS
- This entry
allows you to have any DaemonCore daemon advertise arbitrary
expressions from the config file in its ClassAd. Give the
comma-separated list of entries from the config file you want in the
given daemon's ClassAd.
NOTE: The condor_negotiator and condor_kbdd do not send
ClassAds now, so this entry does not affect them at all. The
condor_startd, condor_schedd, condor_master, and
condor_collector do send ClassAds, so those would be valid
subsystems to set this entry for.
OTHER NOTE: Setting SUBMIT_EXPRS has the slightly
different effect of having the named expressions inserted into all
the job ClassAds that condor_submit creates. This is equivalent
to the ``+'' syntax in submit files. See the
condor_submit(1) man page for details.
OTHER NOTE: Because of the different syntax of the config
file and ClassAds, you might have to do a little extra work to get a
given entry into the ClassAd. In particular, ClassAds require quote
marks (") around your strings. Numeric values can go in directly,
as can expressions or boolean macros. For example, if you wanted
the startd to advertise a macro that was a string, a numeric macro,
and a boolean expression, you'd have to do something like the
following:
STRING_MACRO = This is a string macro
NUMBER_MACRO = 666
BOOL_MACRO = True
EXPR : CurrentTime >= $(NUMBER_MACRO) || $(BOOL_MACRO)
MY_STRING_MACRO = "$(STRING_MACRO)"
STARTD_EXPRS = MY_STRING_MACRO, NUMBER_MACRO, BOOL_MACRO, EXPR
3.4.5 Shared Filesystem Config File Entries
These entries control how Condor interacts with various shared and
network filesystems. If you are using AFS as your shared filesystem,
be sure to read section 3.9.1 on ``Using Condor with
AFS''.
- UID_DOMAIN
- Often times,
especially if all the machines in the pool are administered by the
same organization, all the machines to be added into a Condor pool
share the same login account information. Specifically, does user X
have UID Y on all machines within a given Internet/DNS domain? This
is usually the case if a central authority creates user logins and
maintains a common /etc/passwd file on all machines (perhaps via
NIS/Yellow Pages, distributing the passwd file, etc). If this is the
case, then set this macro to the name of the Internet/DNS domain
where this is true. For instance, if all the machines in this Condor
pool within the Internet/DNS zone ``cs.wisc.edu'' have a common
passwd file, UID_DOMAIN would be set to ``cs.wisc.edu''. If
this is not the case you can comment out the entry and Condor will
automatically use the fully qualified hostname of each machine. If
you put in a ``*'', it is treated as a wildcard matching all domains,
and therefore all UIDs are honored, which is a dangerous idea.
Condor uses this information to determine if it should run a given
Condor job on the remote execute machine with the UID of whomever
submitted the job or with the UID of user ``nobody''. If you set
this to ``none'' or don't set it at all, then Condor jobs will
always execute with the access permissions of user ``nobody''. For
security purposes, it is not a bad idea to have Condor jobs that
migrate around on machines across an entire organization to run as
user ``nobody'', which by convention has very restricted access to
the disk files of a machine. Standard Universe Condor jobs are
perfectly happy to run as user nobody since all I/O is redirected
back via remote system calls to a shadow process running on the
submit machine which is authenticated as the user. If you only plan
on running Standard Universe jobs, then it is a good idea to simply
set this to ``none'' or don't define it. Vanilla Universe jobs,
however, cannot take advantage of Condor's remote system calls.
Vanilla Universe jobs are dependent upon NFS, RFS, AFS, or some
shared filesystem setup to read/write files as they bounce around
from machine to machine. If you want to run Vanilla jobs and your
shared filesystems are via AFS, then you can safely leave this as
``none'' as well. But if you wish to use Vanilla jobs with Condor
and you have shared filesystems via NFS or RFS, then you should
enter in a legitimate domain name where all your UIDs match (you
should be doing this with NFS anyway!) on all machines in the pool,
or else users in your pool who submit Vanilla jobs will have to make
their files world read/write (so that user nobody can access them).
Some gritty details for folks who want to know: If the submitting
machine and the remote machine about to execute the job both have
the same login name in the passwd file for a given UID, and the
UID_DOMAIN claimed by the submit machine is indeed found to
be a subset of what an inverse lookup to a DNS (domain name server)
or NIS reports as the fully qualified domain name for the submit
machine's IP address (this security measure safeguards against the
submit machine simply lying), THEN the job will run
with the same UID as the user who submitted the job. Otherwise it
will run as user ``nobody''.
Note: the UID_DOMAIN parameter is also used when Condor
sends email back to the user about a completed job; the address
Job-Owner@UID_DOMAIN is used, unless UID_DOMAIN
is ``none'', in which case Job-Owner@submit-machine is
used.
- SOFT_UID_DOMAIN
- This
setting is used in conjunction with the UID_DOMAIN setting
described above. If the UID_DOMAIN settings match on both
the execute and submit machines, but the uid of the user who
submitted the job isn't in the passwd file (or password info if NIS
is being used) of the execute machine, the condor_starter will
normally exit with an error. If you set SOFT_UID_DOMAIN
to be ``True'', Condor will simply start the job with the specified
uid, even if it's not in the passwd file.
- FILESYSTEM_DOMAIN
- This
setting is similar in concept to UID_DOMAIN, but here we
need the Internet/DNS domain name where all the machines within that
domain can access the same set of NFS file servers.
Often times, especially if all the machines in the pool are
administered by the same organization, all the machines to be added
into a Condor pool can mount the same set of NFS fileservers onto
the same place in the directory tree. Specifically, do all the
machines in the pool within a specific Internet/DNS domain mount the
same set of NFS file servers onto the same path mount-points? If
this is the case, then set this macro to the name of the
Internet/DNS domain where this is true. For instance, if all the
machines in the Condor pool within the Internet/DNS zone
``cs.wisc.edu'' have a common passwd file and mount the same volumes
from the same NFS servers, set FILESYSTEM_DOMAIN to
``cs.wisc.edu''. If this is not the case you can comment out the
entry, and Condor will automatically set it to the fully qualified
hostname of the local machine.
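Using the cs.wisc.edu example from the text, a pool with a common
passwd file and common NFS mounts could set both domains together:
UID_DOMAIN = cs.wisc.edu
FILESYSTEM_DOMAIN = cs.wisc.edu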
- HAS_AFS
- Set this to ``True'' if
all the machines you plan on adding in your pool all can access a
common set of AFS fileservers. Otherwise, set it to ``False''.
- FS_PATHNAME
- If you're using
AFS, Condor needs to know where the AFS ``fs'' command is located so
that it can verify the AFS cell-names of machines in the pool. The
default value of /usr/afsws/bin/fs is also the default that
AFS uses.
- VOS_PATHNAME
- If you're using
AFS, Condor needs to know where the AFS ``vos'' command is located
so that it can compare fileserver names of volumes. The default
value of /usr/afsws/etc/vos is also the default that AFS
uses.
- RESERVE_AFS_CACHE
- If
your machine is running AFS and the AFS cache lives on the same
partition as the other Condor directories, and you want Condor to
reserve the space that your AFS cache is configured to use, set this
entry to ``True''. It defaults to ``False''.
- USE_NFS
- This setting influences
how Condor jobs running in the Standard Universe will access their
files. Condor will normally always redirect the file I/O requests
of Standard Universe jobs back to be executed on the machine which
submitted the job. Because of this, as a Condor job migrates around
the network, the filesystem always appears to be identical to the
filesystem where the job was submitted. However, consider the case
where a user's data files are sitting on an NFS server. The machine
running the user's program will send all I/O over the network to the
machine which submitted the job, which in turn sends all the I/O
over the network a second time back to the NFS file server. Thus,
all of the program's I/O is being sent over the network twice.
If you set this macro to ``True'', then Condor will attempt to
read/write files without redirecting them back to the submitting
machine if both the submitting machine and the machine running the
job are both accessing the same NFS servers (if they're both in the
same FILESYSTEM_DOMAIN, as described above). The result is
I/O performed by Condor Standard Universe programs is only sent over
the network once.
While sending all file operations over the network twice might sound
really bad, practical experience reveals that, unless you are
operating over networks where bandwidth is at a very high premium,
this scheme offers very little real performance gain. There are also
some (fairly rare) situations where this scheme can break down.
Setting USE_NFS to ``False'' is always safe. It may result
in slightly more network traffic, but Condor jobs are ideally heavy
on CPU and light on I/O anyway. It also ensures that a remote
Standard Universe Condor job will always use Condor's remote system
calls mechanism to reroute I/O and therefore see the exact same
filesystem that the user sees on the machine where she/he submitted
the job.
Some gritty details for folks who want to know: if you set
USE_NFS to ``True'', and the FILESYSTEM_DOMAIN of
both the submitting machine and the remote machine about to execute
the job match, and the FILESYSTEM_DOMAIN claimed by the
submit machine is indeed found to be a subset of what an inverse
DNS (domain name server) lookup reports as the fully qualified
domain name for the submit machine's IP address (this security
measure guards against the submit machine simply lying),
THEN the job will access files via a local system call,
without redirecting them to the submitting machine (a.k.a. with
NFS). Otherwise, the system call will get routed back to the
submitting machine via Condor's remote system call mechanism.
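For example, at a site like the one described above, where all the
machines share the same NFS servers in the ``cs.wisc.edu'' domain
(that domain name is just the example used earlier), the relevant
entries might look like:
FILESYSTEM_DOMAIN = cs.wisc.edu
USE_NFS = True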
- USE_AFS
- If your machines have AFS
and the submit and execute machines are in the same AFS cell, this
setting determines whether Condor will use remote system calls for
Standard Universe jobs to send I/O requests to the submit machine,
or if it should use local file access on the execute machine (which
will then use AFS to get to the submitter's files). Read the
setting above on USE_NFS for a discussion of why you might
want to use AFS access instead of remote system calls.
One important difference between USE_NFS and
USE_AFS is the AFS cache. With USE_AFS set to
``True'', the remote Condor job executing on some machine will start
messing with the AFS cache, possibly evicting the machine owner's
files from the cache to make room for its own. Generally speaking,
since we try to minimize the impact of having a Condor job run on a
given machine, we don't recommend using this setting.
Setting USE_AFS to ``False'' is always safe. It may result
in slightly more network traffic, but Condor jobs are ideally heavy
on CPU and light on I/O anyway. ``False'' also ensures that a remote
Standard Universe Condor job will always see the exact same
filesystem that the user sees on the machine where he/she
submitted the job. Plus, it ensures that the machine where the
job executes doesn't have its AFS cache disturbed as a result of
the Condor job being there.
However, things may be different at your site, which is why the
setting is there.
3.4.6 Checkpoint Server Config File Entries
These entries control whether or not Condor uses a checkpoint server.
In addition, if you are using a checkpoint server, this section
describes the settings that the checkpoint server itself needs to have
defined. If you decide to use a checkpoint server, you must install
it separately (it is not included in the main Condor binary
distribution or installation procedure). See
section 3.3.5 on ``Installing a Checkpoint Server''
for details on installing and running a checkpoint server for your
pool.
NOTE: If you are setting up a machine to join the UW-Madison CS
Department Condor pool, you should configure the machine to
use a checkpoint server, and use ``condor-ckpt.cs.wisc.edu'' as the
checkpoint server host (see below).
- USE_CKPT_SERVER
- A boolean
which determines if you want a given machine to use the
checkpoint server for your pool.
- CKPT_SERVER_HOST
- The
hostname of the checkpoint server for your pool.
- CKPT_SERVER_DIR
- The
checkpoint server needs this macro defined to the full path of the
directory the server should use to store checkpoint files.
Depending on the size of your pool and the size of the jobs your
users are submitting, this directory (and its subdirectories) might
need to store many megabytes of data.
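Putting these three entries together, a pool using the UW-Madison
checkpoint server mentioned in the note above might have entries like
the following (the directory pathname here is purely illustrative):
USE_CKPT_SERVER = True
CKPT_SERVER_HOST = condor-ckpt.cs.wisc.edu
CKPT_SERVER_DIR = /scratch/ckpt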
3.4.7 condor_master Config File Entries
These settings control the condor_master.
- DAEMON_LIST
- This macro
determines what daemons the condor_master will start and keep its
watchful eyes on. The list is a comma or space separated list of
subsystem names (described above in
section 3.4.1). For example,
DAEMON_LIST = MASTER, STARTD, SCHEDD
NOTE: On your central manager, your DAEMON_LIST
will be different from your regular pool, since it will include
entries for the condor_collector and condor_negotiator.
NOTE: On machines running Digital Unix or IRIX, your
DAEMON_LIST will also include ``KBDD'', for the
condor_kbdd, which is a special daemon that runs to monitor
keyboard and mouse activity on the console. It is only with this
special daemon that we can acquire this information on those
platforms.
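For example, on a central manager that also submits and runs jobs, the
list might look like the following (exactly which entries you include
depends on what roles the machine plays in your pool):
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, STARTD, SCHEDD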
- SUBSYS
- Once you have defined which
subsystems you want the condor_master to start, you must provide
it with the full path to each of these binaries. For example:
MASTER = $(SBIN)/condor_master
STARTD = $(SBIN)/condor_startd
SCHEDD = $(SBIN)/condor_schedd
Generally speaking, these would be defined relative to the
SBIN macro.
- PREEN
- In addition to the daemons
defined in DAEMON_LIST, the condor_master also starts up
a special process, condor_preen to clean out junk files that have
been left lying around by Condor. This macro determines where the
condor_master finds the condor_preen binary. It also controls how
condor_preen behaves via the command-line arguments you pass to
it: ``-m'' means you want email about files condor_preen finds that it
thinks it should remove; ``-r'' means you want condor_preen to
actually remove these files. If you don't want condor_preen to run at
all, just comment out this setting.
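For example, to have condor_preen send email about the files it finds
and actually remove them, you might define it like this (assuming
condor_preen is installed under your SBIN directory):
PREEN = $(SBIN)/condor_preen -m -r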
- PREEN_INTERVAL
- This macro
determines how often condor_preen should be started. It is
defined in terms of seconds and defaults to 86400 (once a day).
- PUBLISH_OBITUARIES
- When a daemon crashes, the condor_master can send email to the
address specified by CONDOR_ADMIN with an obituary letting
the administrator know that the daemon died, what its cause of
death was (which signal or exit status it exited with), and
(optionally) the last few entries from that daemon's log file. If
you want these obituaries, set this entry to ``True''.
- OBITUARY_LOG_LENGTH
- If you're getting obituaries, this setting controls how many lines
of the log file you want to see.
- START_MASTER
- If this setting
is defined and set to ``False'' when the master starts up, the first
thing it will do is exit. This may seem strange, but perhaps you
just don't want Condor to run on certain machines in your pool, yet
the boot scripts for your entire pool are handled by a centralized
system that starts up the condor_master automatically. This is
certainly an entry you'd most likely find in a local config file,
not your global config file.
- START_DAEMONS
- This setting
is similar to the START_MASTER macro described above.
However, the condor_master doesn't exit, it just doesn't start any
of the daemons listed in the DAEMON_LIST. This way, you
could start up the daemons at some later time with a condor_on
command if you wished.
- MASTER_UPDATE_INTERVAL
- This entry determines how often
the condor_master sends a ClassAd update to the
condor_collector. It is defined in seconds and defaults to 300
(every 5 minutes).
- MASTER_CHECK_NEW_EXEC_INTERVAL
- This
setting controls how often the condor_master checks the timestamps
of the daemons it's running. If any daemons have been modified, the
master restarts them. It is defined in seconds and defaults to 300
(every 5 minutes).
- MASTER_NEW_BINARY_DELAY
- Once the condor_master has
discovered a new binary, this macro controls how long it waits
before attempting to execute the new binary. This delay is here
because the condor_master might notice a new binary while you're
in the process of copying in new binaries and the entire file might
not be there yet (in which case trying to execute it could yield
unpredictable results). The entry is defined in seconds and
defaults to 120 (2 minutes).
- SHUTDOWN_FAST_TIMEOUT
- This macro determines the maximum
amount of time you're willing to give the daemons to perform their
fast shutdown procedure before the condor_master just kills them
outright. It is defined in seconds and defaults to 120 (2 minutes).
- MASTER_BACKOFF_FACTOR
- If a daemon keeps crashing, we
use exponential backoff so we wait longer and longer before
restarting it. At the end of this section, there is an example that
shows how all these settings work. This setting is the base of the
exponent used to determine how long to wait before starting the
daemon again. It defaults to 2.
- MASTER_BACKOFF_CEILING
- This entry determines the maximum
amount of time you want the master to wait between attempts to start
a given daemon. (With 2.0 as the MASTER_BACKOFF_FACTOR,
you'd hit 1 hour in 12 restarts). This is defined in terms of
seconds and defaults to 3600 (1 hour).
- MASTER_RECOVER_FACTOR
- How long should a daemon run
without crashing before we consider it recovered. Once a
daemon has recovered, we reset the number of restarts so the
exponential backoff stuff goes back to normal. This is defined in
terms of seconds and defaults to 300 (5 minutes).
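Written out as config entries, the defaults described above are:
MASTER_BACKOFF_FACTOR = 2.0
MASTER_BACKOFF_CEILING = 3600
MASTER_RECOVER_FACTOR = 300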
Just for clarity, here's a little example of how all these exponential
backoff settings work. The example is worked out in terms of the
default settings.
When a daemon crashes, it is restarted in 10 seconds. If it keeps
crashing, we wait longer and longer before restarting it based on how
many times it's been restarted. We take the number of times the
daemon has restarted, take the MASTER_BACKOFF_FACTOR
(defaults to 2) to that power, and add 9. Sounds complicated, but
here's how it works:
1st crash: restarts == 0, so, 9 + 2^0 = 9 + 1 = 10 seconds
2nd crash: restarts == 1, so, 9 + 2^1 = 9 + 2 = 11 seconds
3rd crash: restarts == 2, so, 9 + 2^2 = 9 + 4 = 13 seconds
...
6th crash: restarts == 5, so, 9 + 2^5 = 9 + 32 = 41 seconds
...
9th crash: restarts == 8, so, 9 + 2^8 = 9 + 256 = 265 seconds
If the daemon kept dying and restarting, after the 13th crash, you'd
have:
13th crash: restarts == 12, so, 9 + 2^12 = 9 + 4096 = 4105 seconds
This is bigger than the MASTER_BACKOFF_CEILING, which
defaults to 3600, so the daemon would really be restarted after only
3600 seconds, not 4105. Assume a few hours went by like this, with
the condor_master trying again every hour (the computed delays would
keep growing, but would always be capped by the ceiling).
Eventually, imagine that daemon finally started and didn't crash (for
example, after the email you got about the daemon crashing, you
realized that you had accidentally deleted its binary so you
reinstalled it). If it stayed alive for
MASTER_RECOVER_FACTOR seconds (defaults to 5 minutes), the
count of how many restarts this daemon has performed would be reset.
So, if it died again 15 minutes later, it would be restarted in 10
seconds, not 1 hour.
The moral of the story is that this is some relatively complicated
stuff. The defaults we have work quite well, and you probably
won't want to change them for any reason.
- MASTER_EXPRS
- This setting is
described above in section 3.4.4 as
SUBSYS_EXPRS.
- MASTER_DEBUG
- This setting
(and other settings related to debug logging in the master) is
described above in section 3.4.3 as
SUBSYS_DEBUG.
- MASTER_ADDRESS_FILE
- This setting is described above in
section 3.4.4 as
SUBSYS_ADDRESS_FILE
3.4.8 condor_startd Config File Entries
These settings control general operation of the condor_startd.
Information on how to configure the condor_startd to start, suspend,
resume, vacate and kill remote Condor jobs can be found in a separate
top-level section, section 3.5 on
``Configuring The Startd Policy''. In there, you will find
information on the startd's states and activities. If
you see entries in the config file that are not described here, it is
because they control state or activity transitions within the
condor_startd and are described in
section 3.5.
- STARTER
- This macro holds the full
path to the regular condor_starter binary the startd should
spawn. It is normally defined relative to $(SBIN).
- ALTERNATE_STARTER_1
- This macro holds the full path to the special condor_starter.pvm
binary the startd spawns to service PVM jobs. It is normally
defined relative to $(SBIN), since by default,
condor_starter.pvm is installed in the regular Condor release
directory.
- POLLING_INTERVAL
- When a
startd is claimed, this setting determines how often we should poll
the state of the machine to see if we need to suspend, resume,
vacate or kill the job. Defined in terms of seconds and defaults to
5.
- UPDATE_INTERVAL
- This
entry determines how often the startd should send a ClassAd update
to the condor_collector. The startd also sends an update on any
state or activity change, or if the value of its START expression
changes. See section 3.5.5 on ``condor_startd
States'', section 3.5.6 on ``condor_startd
Activities'', and section 3.5.3 on ``condor_startd
START expression'' for details on states, activities, and the
START expression respectively. This entry is defined in
terms of seconds and defaults to 300 (5 minutes).
- STARTD_HAS_BAD_UTMP
- Normally, when the startd is computing the idle time of all the
users of the machine (both local and remote), it checks the
utmp file to find all the currently active ttys, and only
checks access time of the devices associated with active logins.
Unfortunately, on some systems, utmp is unreliable, and the
startd might miss keyboard activity by doing this. So, if your
utmp is unreliable, set this setting to ``True'' and the
startd will check the access time on all tty and pty devices.
- CONSOLE_DEVICES
- This
macro allows the startd to monitor console (keyboard and mouse)
activity by checking the access times on special files in
/dev. Activity on these files shows up as ``ConsoleIdle''
time in the startd's ClassAd. Just give a comma-separated list of
the names of devices you want considered the console, without the
``/dev/'' portion of the pathname. The defaults vary from
platform to platform, and are usually correct.
One possible exception to this is that on Linux, we use ``mouse'' as
one of the entries here. Normally, Linux installations put in a
soft link from /dev/mouse that points to the appropriate
device (for example, /dev/psaux for a PS/2 bus mouse, or
/dev/tty00 for a serial mouse connected to com1). However,
if your installation doesn't have this soft link, you will either
need to put it in (which you'll be glad you did), or change this
setting to point to the right device.
Unfortunately, there are no such devices on Digital Unix or IRIX
(don't be fooled by /dev/keyboard0, etc, the kernel does not
update the access times on these devices) so this entry is not
useful there, and we must use the condor_kbdd to get this
information by connecting to the X server.
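For example, on a Linux machine with the /dev/mouse link described
above, a reasonable setting might be:
CONSOLE_DEVICES = mouse, console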
- STARTD_JOB_EXPRS
- When
the startd is claimed by a remote user, it can also advertise
arbitrary attributes from the ClassAd of the job it is working on.
Just list the attribute names you want advertised. Note: since
these are already ClassAd expressions, you don't have to do anything
funny with strings, etc.
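For example, to advertise the image size and universe of the job the
startd is running (these particular attribute names are just
illustrations; list whichever job attributes you care about):
STARTD_JOB_EXPRS = ImageSize, JobUniverse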
- STARTD_EXPRS
- This setting is
described above in section 3.4.4 as
SUBSYS_EXPRS.
- STARTD_DEBUG
- This setting
(and other settings related to debug logging in the startd) is
described above in section 3.4.3 as
SUBSYS_DEBUG.
- STARTD_ADDRESS_FILE
- This setting is described above in
section 3.4.4 as
SUBSYS_ADDRESS_FILE
3.4.9 condor_schedd Config File Entries
These settings control the condor_schedd.
- SHADOW
- This macro determines the
full path of the condor_shadow binary that the condor_schedd
spawns. It is normally defined in terms of $(SBIN).
- SHADOW_PVM
- This macro
determines the full path of the special condor_shadow.pvm binary
used for supporting PVM jobs that the condor_schedd spawns. It is
normally defined in terms of $(SBIN).
- MAX_JOBS_RUNNING
- This
setting controls the maximum number of condor_shadow processes
you're willing to let a given condor_schedd spawn. The actual
number of condor_shadow processes might be less than this if you
have reached your RESERVED_SWAP limit.
- MAX_SHADOW_EXCEPTIONS
- This setting controls the maximum
number of times that a condor_shadow process can have a fatal
error (exception) before the condor_schedd will simply relinquish
the match associated with the dying shadow. Defaults to 5.
- SCHEDD_INTERVAL
- This
entry determines how often the condor_schedd should send a ClassAd
update to the condor_collector. It is defined in terms of seconds
and defaults to 300 (every 5 minutes).
- JOB_START_DELAY
- When the
condor_schedd has finished negotiating and has a lot of new
condor_startd's that it has claimed, the condor_schedd can wait
a certain delay before starting up a condor_shadow for each job
it's going to run. This prevents a sudden, large load on the submit
machine as it spawns many shadows simultaneously, and having to deal
with their startup activity all at once. This macro determines
how long the condor_schedd should wait in between spawning each
condor_shadow. Defined in terms of seconds and defaults to 2.
- ALIVE_INTERVAL
- This
setting determines how often the schedd should send a keep alive
message to any startd it has claimed. When the schedd claims a
startd, it tells the startd how often it's going to send these
messages. If the startd doesn't get one of these messages after 3
of these intervals have passed, the startd releases the claim, and
the schedd is no longer paying for the resource (in terms of
priority in the system). The macro is defined in terms of seconds
and defaults to 300 (every 5 minutes).
- SHADOW_SIZE_ESTIMATE
- This entry is the estimated virtual memory size of each
condor_shadow process. Specified in kilobytes. The default
varies from platform to platform.
- SHADOW_RENICE_INCREMENT
- When the schedd spawns a new
condor_shadow, it can do so with a nice-level. This is a
mechanism in UNIX where you can assign your own processes a lower
priority so that they don't interfere with interactive use of the
machine. This is very handy for keeping a submit machine that is
running lots of shadows still responsive to the owner of the machine.
The entry can be any integer between 1 and 19. It defaults to 10.
- QUEUE_CLEAN_INTERVAL
- The schedd maintains the job queue on a given machine. It does so
in a persistent way such that if the schedd crashes, it can recover
a valid state of the job queue. The mechanism it uses is a
transaction-based log file (this is the job_queue.log file,
not the SchedLog file). This file contains some initial
state of the job queue, and a series of transactions that were
performed on the queue (such as new jobs submitted, jobs completing,
checkpointing, whatever). Periodically, the schedd will go through
this log, truncate all the transactions and create a new file with
just the new initial state of the log. This is a somewhat expensive
operation, but it speeds up schedd restarts, since there are
fewer transactions to replay to figure out what state the job
queue is really in. This macro determines how often the schedd
should do this ``queue cleaning''. It is defined in terms of
seconds and defaults to 86400 (once a day).
- ALLOW_REMOTE_SUBMIT
- Starting with Condor Version 6.0, users can run condor_submit on
one machine and actually submit jobs to another machine in the
pool. This is called a remote submit. Jobs submitted in
this way are entered into the job queue owned by user ``nobody''.
This entry determines whether you want to allow such a thing to
happen to a given schedd. It defaults to ``False''.
- QUEUE_SUPER_USERS
- This
macro determines what usernames on a given machine have
super-user access to your job queue, meaning that they can
modify or delete the job ClassAds of other users. (Normally, you
can only modify or delete ClassAds that you own from the job queue).
Whatever username corresponds with the UID that Condor is running as
(usually ``condor'') will automatically get included in this list
because that is needed for Condor's proper functioning. See
section 3.10.2 on ``UIDs in Condor'' for more details on
this. By default, we give ``root'' the ability to remove other
user's jobs, in addition to user ``condor''.
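The default described above corresponds to:
QUEUE_SUPER_USERS = root, condor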
- SCHEDD_LOCK
- This entry
specifies what lock file should be used for access to the
SchedLog file. It must be a separate file from the
SchedLog, since the SchedLog may be rotated and you
want to be able to synchronize access across log file rotations.
This macro is defined relative to the LOCK macro described
above. If, for some strange reason, you decide to change this
setting, be sure to change the VALID_LOG_FILES entry that
condor_preen uses as well.
- SCHEDD_EXPRS
- This setting is
described above in section 3.4.4 as
SUBSYS_EXPRS.
- SCHEDD_DEBUG
- This setting
(and other settings related to debug logging in the schedd) is
described above in section 3.4.3 as
SUBSYS_DEBUG.
- SCHEDD_ADDRESS_FILE
- This setting is described above in
section 3.4.4 as
SUBSYS_ADDRESS_FILE
3.4.10 condor_shadow Config File Entries
These settings affect the condor_shadow.
- MAX_DISCARDED_RUN_TIME
- If the shadow is unable to read a
checkpoint file from the checkpoint server, it keeps trying only if
the job has accumulated more than this many seconds of CPU usage.
Otherwise, the job is started from scratch. Defaults to 3600 (1
hour). This setting is only used if USE_CKPT_SERVER is
True.
- SHADOW_LOCK
- This entry
specifies what lock file should be used for access to the
ShadowLog file. It must be a separate file from the
ShadowLog, since the ShadowLog may be rotated and you
want to be able to synchronize access across log file rotations.
This macro is defined relative to the LOCK macro described
above. If, for some strange reason, you decide to change this
setting, be sure to change the VALID_LOG_FILES entry that
condor_preen uses as well.
- SHADOW_DEBUG
- This setting
(and other settings related to debug logging in the shadow) is
described above in section 3.4.3 as
SUBSYS_DEBUG.
3.4.11 condor_shadow.pvm Config File Entries
These settings control the condor_shadow.pvm, the special shadow
that supports PVM jobs inside Condor. See the section on
``Installing PVM Support in Condor'' for details.
- PVMD
- This macro holds the full path
to the special condor_pvmd, the Condor PVM Daemon. This daemon is
installed in the regular Condor release directory by default, so the
macro is usually defined in terms of $(SBIN).
- PVMGS
- This macro holds the full
path to the special condor_pvmgs, the Condor PVM Group Server
Daemon, which is needed to support PVM groups. This daemon is
installed in the regular Condor release directory by default, so the
macro is usually defined in terms of $(SBIN).
- SHADOW_DEBUG
- This setting
(and other settings related to debug logging in the shadow) is
described above in section 3.4.3 as
SUBSYS_DEBUG.
3.4.12 condor_starter Config File Entries
These settings affect the condor_starter.
- JOB_RENICE_INCREMENT
- When the starter spawns a Condor job, it can do so with a
nice-level. This is a mechanism in UNIX where you can assign
your own processes a lower priority so that they don't interfere
with interactive use of the machine. If you have machines with lots
of real memory and swap space so the only scarce resource is CPU
time, you could use this setting in conjunction with a policy that
always allowed Condor to start jobs on your machines so that Condor
jobs would always run, but interactive response on your machines
would never suffer. You probably wouldn't even notice Condor was
running jobs. See section 3.5 on
``Configuring The Startd Policy'' for full details on setting up a
policy for starting and stopping jobs on a given machine. The entry
can be any integer between 1 and 19. It is commented out by
default.
- STARTER_LOCAL_LOGGING
- This macro determines whether the
starter should do local logging to its own log file, or send debug
information back to the condor_shadow where it will end up in the
ShadowLog. It defaults to ``True''.
- STARTER_DEBUG
- This setting
(and other settings related to debug logging in the starter) is
described above in section 3.4.3 as
SUBSYS_DEBUG.
3.4.13 condor_submit Config File Entries
If you want condor_submit to automatically append an expression to
the Requirements expression or Rank expression of jobs at your site,
use the following entries:
- APPEND_REQ_VANILLA
- Expression to append to vanilla job requirements.
- APPEND_REQ_STANDARD
- Expression to append to standard job requirements.
- APPEND_RANK_STANDARD
- Expression to append to standard job rank.
- APPEND_RANK_VANILLA
- Expression to append to vanilla job rank.
IMPORTANT NOTE: The APPEND_RANK_STANDARD and
APPEND_RANK_VANILLA macros were called
``APPEND_PREF_STANDARD'' and
``APPEND_PREF_VANILLA'' in previous versions of Condor.
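For example, to keep vanilla jobs off one particular machine, a site
might append a clause like the following (the hostname here is purely
made up for illustration):
APPEND_REQ_VANILLA = (Machine != "bigbird.example.edu")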
In addition, you can provide default Rank expressions if your users
don't specify their own:
- DEFAULT_RANK_VANILLA
- Default Rank for vanilla jobs.
- DEFAULT_RANK_STANDARD
- Default Rank for standard jobs.
Both of these macros default to the jobs preferring machines where
there is more main memory than the image size of the job, expressed
as:
((Memory*1024) > Imagesize)
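In config file form, those defaults are equivalent to:
DEFAULT_RANK_VANILLA = ((Memory*1024) > Imagesize)
DEFAULT_RANK_STANDARD = ((Memory*1024) > Imagesize)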
3.4.14 condor_preen Config File Entries
These settings control condor_preen.
- PREEN_ADMIN
- This entry
determines what email address condor_preen will send email to (if
it's configured to send email at all... see the entry for
PREEN above). Defaults to $(CONDOR_ADMIN).
- VALID_SPOOL_FILES
- This
entry contains a (comma or space separated) list of files that
condor_preen considers valid files to find in the SPOOL
directory. Defaults to all the files that are valid. If you change
the HISTORY setting above, you'll want to change this
setting as well.
- VALID_LOG_FILES
- This
entry contains a (comma or space separated) list of files that
condor_preen considers valid files to find in the LOG
directory. Defaults to all the files that are valid. If you change
the names of any of the log files above, you'll want to change this
setting as well. In addition the defaults for the
SUBSYS_ADDRESS_FILE are listed here, so if you change
those, you'll need to change this entry, too.
3.4.15 condor_collector Config File Entries
These settings control the condor_collector.
- CLASSAD_LIFETIME
- This
macro determines how long a ClassAd can remain in the collector
before it is discarded as stale information. The ClassAds sent to
the collector might also have an attribute that says how long the
lifetime should be for that specific ad. If that attribute is
present the collector will either use it or the
CLASSAD_LIFETIME, whichever is greater. The macro is
defined in terms of seconds, and defaults to 900 (15 minutes).
- MASTER_CHECK_INTERVAL
- This setting defines how often the
collector should check for machines that have ClassAds from some
daemons, but not from the condor_master (orphaned daemons),
and send email about it. Defined in seconds, defaults to 10800 (3
hours).
- CLIENT_TIMEOUT
- Network
timeout when talking to daemons that are sending an update. Defined
in seconds, defaults to 30.
- QUERY_TIMEOUT
- Network
timeout when talking to anyone doing a query. Defined in seconds,
defaults to 60.
- CONDOR_DEVELOPERS
- Condor will send email once per week to this address with the output
of the condor_status command, which simply lists how many machines
are in the pool and how many are running jobs. Use the default
value of ``condor-admin@cs.wisc.edu''. This default will send the
weekly status message to the Condor Team at University of
Wisconsin-Madison, the developers of Condor. The Condor Team uses
these weekly status messages in order to have some idea as to how
many Condor pools exist in the world. We would really appreciate
getting the reports as this is one way we can convince funding
agencies that Condor is being used in the ``real world''. If you do
not wish this information to be sent to the Condor Team, you could
enter ``NONE'' which disables this feature, or put in some other
address that you want the weekly status report sent to.
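The default described above is simply:
CONDOR_DEVELOPERS = condor-admin@cs.wisc.edu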
- COLLECTOR_NAME
- This parameter is used to specify a short description of your pool.
It should be about 20 characters long. For example, the name of the
UW-Madison Computer Science Condor Pool is ``UW-Madison CS''.
- CONDOR_DEVELOPERS_COLLECTOR
- By default, every pool sends
periodic updates to a central condor_collector at UW-Madison with
basic information about the status of your pool. This includes only
the number of total machines, the number of jobs submitted, the
number of machines running jobs, the hostname of your central
manager, and the COLLECTOR_NAME specified above. These
updates help us see how Condor is being used around the world. By
default, they will be sent to condor.cs.wisc.edu. If you don't want
these updates to be sent from your pool, set this entry to
``NONE''.
- COLLECTOR_DEBUG
- This setting
(and other settings related to debug logging in the collector) is
described above in section 3.4.3 as
SUBSYS_DEBUG.
3.4.16 condor_negotiator Config File Entries
These settings control the condor_negotiator.
- NEGOTIATOR_INTERVAL
- How often should the negotiator start a negotiation cycle? Defined
in seconds, defaults to 300 (5 minutes).
- NEGOTIATOR_TIMEOUT
- What timeout should the negotiator use on its network connections
to the schedds and startds? Defined in seconds, defaults to 30.
- PRIORITY_HALFLIFE
- This
entry defines the half-life of the user priorities. See
section 2.8.2
on User Priorities for more details. Defined in seconds, defaults
to 86400 (1 day).
- PREEMPTION_HOLD
- If the
PREEMPTION_HOLD expression evaluates to true, the
condor_negotiator won't preempt the job running on a given machine
even if a user with a higher priority has jobs they want to run.
This helps prevent thrashing. The default is to wait 2 hours
before preempting any job.
- NEGOTIATOR_DEBUG
- This setting
(and other settings related to debug logging in the negotiator) is
described above in section 3.4.3 as
SUBSYS_DEBUG.