This section contains the instructions for installing Condor at your site. The installation sets up a default configuration, which the sections that follow explain how to customize.
Please read the copyright and disclaimer information in the manual, or in the
file LICENSE.TXT, before proceeding. Installation and
use of Condor is acknowledgement that you have read and agreed to these
terms.
The Condor binary distribution is packaged in the following 5 files and 2 directories:
Before you install, please consider joining the condor-world mailing list. Traffic on this list is kept to an absolute minimum. It is only used to announce new releases of Condor. To subscribe, send a message to majordomo@cs.wisc.edu with the body:
subscribe condor-world
Before you install Condor at your site, there are a few important decisions you must make about the basic layout of your pool. These are:
If you feel you already know the answers to these questions, you can skip to the 'Installation Procedure' section below, section 3.2.2. If you are unsure about any of them, read on.
One machine in your pool must be the Central Manager. You should set up and install Condor on this machine first. This is the centralized information repository for the Condor pool and is also the machine that does match-making between available machines and waiting jobs. If the Central Manager machine crashes, any currently active matches in the system will keep running, but no new matches will be made. Moreover, most Condor tools will stop working. Because of the importance of this machine for the proper functioning of Condor, we recommend you install it on a machine that is likely to stay up all the time, or at the very least, one that will be rebooted quickly if it does crash. Also, because all the daemons will send updates (by default every 5 minutes) to this machine, it is advisable to consider network traffic and your network layout when choosing your Central Manager.
We strongly recommend that you start up the Condor daemons as root.
Otherwise, Condor can do very little to enforce security and policy
decisions. If you don't have root access and would like to install
Condor, on most platforms you can run Condor as any user you'd
like. However, doing so has serious security consequences.
Please see section 3.10.1 in the manual for details on running Condor as non-root.
Either root will be administering Condor directly, or someone else will act as the Condor administrator. If root has delegated the responsibility to another person but doesn't want to grant that person root access, root can specify a condor_config.root file that will override settings in the other Condor config files. This way, the global condor_config file can be owned and controlled by whoever is condor-admin, and the condor_config.root can be owned and controlled only by root. Settings that would compromise root security (such as which binaries are started as root) can be specified in the condor_config.root file, while other settings that only control policy or Condor-specific settings can still be controlled without root access.
To simplify installation of Condor at your site, we recommend that you create a 'condor' user on all machines in your pool. The condor daemons will create files (such as the log files) owned by this user, and the home directory can be used to specify the location of files and directories needed by Condor. The home directory of this user can either be shared among all machines in your pool, or could be a separate home directory on the local partition of each machine. Both approaches have advantages and disadvantages. Having the directories centralized can make administration easier, but also concentrates the resource usage such that you potentially need a lot of space for a single shared home directory. See the section below on machine-specific directories for more details.
If you choose not to create a condor user, you must specify via the
CONDOR_IDS environment variable which uid.gid pair should be used for
the ownership of various Condor files. See section 3.10.2,
``UIDs in Condor'', in the Administrator's
Manual for details.
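As a minimal sketch, the variable can be set in the shell that will start the daemons. Using the invoking user's numeric ids is only an illustration; substitute the uid and gid of whichever account should own Condor's files.

```shell
# Build a uid.gid pair from the current user's numeric ids.
# Choosing the invoking user is an assumption made for illustration.
CONDOR_IDS="$(id -u).$(id -g)"
export CONDOR_IDS
echo "CONDOR_IDS=$CONDOR_IDS"
```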
Condor needs a few directories that are unique on every machine in your pool. These are 'spool', 'log', and 'execute'. Generally, all three are subdirectories of a single machine-specific directory called the 'local directory' (specified by the LOCAL_DIR parameter in the config file).
If you have a 'condor' user with a local home directory on each machine, the LOCAL_DIR could just be user condor's home directory ('LOCAL_DIR = $(TILDE)' in the config file). If this user's home directory is shared among all machines in your pool, you would want to create a directory for each host (named by hostname) for the local directory ('LOCAL_DIR = $(TILDE)/hosts/$(HOSTNAME)' for example). If you don't have a condor account on your machines, you can put these directories wherever you'd like. However, where to place them will require some thought, as each one has its own resource needs:
Generally speaking, we recommend that you do not put these directories (except lock) on the same partition as /var: if the Condor directories fill that partition, /var fills up as well, which will cause lots of problems for your machines. Ideally, you'd have a separate partition for the Condor directories, so that the only consequence of filling it up would be that Condor malfunctions, not your whole machine.
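One way to check a candidate location before committing to it is to compare mount points, roughly as sketched here. The helper names and the candidate path are hypothetical, and the sketch assumes mount points contain no spaces.

```shell
# Print the mount point of the filesystem that holds the given path.
# Assumes mount points contain no spaces (df -P prints it as field 6).
mount_point_of() {
    df -P "$1" | awk 'NR == 2 { print $6 }'
}

# Succeed when both paths live on the same mounted filesystem.
same_partition() {
    [ "$(mount_point_of "$1")" = "$(mount_point_of "$2")" ]
}

# Hypothetical candidate; replace with the directory you plan to use.
candidate="${TMPDIR:-/tmp}"
if same_partition "$candidate" /var; then
    echo "warning: $candidate shares a partition with /var"
else
    echo "$candidate is on a different partition from /var"
fi
```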
In addition, because we recommend that you start the Condor daemons as
root, we allow you to create config files that are owned and
controlled by root that will override any other Condor settings. This
way, if the Condor administrator isn't root, the regular Condor config
files can be owned and writable by condor-admin, but root doesn't have
to grant root access to this person. See
section 3.10.3 in the
manual for a detailed discussion of the root config files, whether you
should use them, and what settings should be in them.
In general, there are a number of places that Condor will look to find its config files. The first file it looks for is the global config file. The following locations are searched in order until a config file is found. If none contains a valid config file, Condor will print an error message and exit:
Next, Condor tries to load the machine-specific, or local config file. The only way to specify the local config file is in the global config file, with the LOCAL_CONFIG_FILE macro. If that macro isn't set, no local config file is used. Beginning with Condor version 6.0.1, this macro can be a list of files instead of a single file.
The root config files come last. The global root config file is searched for in the following places:
The local root config file is found with the LOCAL_ROOT_CONFIG_FILE macro. If that isn't set, no local root config file is used. Beginning with Condor version 6.0.1, this macro can also be a list of files instead of a single file.
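The lookup described above is a "first match wins" search. The sketch below illustrates only that behavior; the function name is hypothetical and the paths are throwaway placeholders, not the locations Condor actually searches.

```shell
# Print the first readable file among the candidates, in the order given.
# Returns non-zero when none exists, mirroring "print an error and exit".
first_config() {
    for candidate in "$@"; do
        if [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "error: no config file found" >&2
    return 1
}

# Demonstration with throwaway files; these paths are placeholders.
demo=$(mktemp -d)
touch "$demo/second" "$demo/third"
found=$(first_config "$demo/first" "$demo/second" "$demo/third")
echo "using $found"
```

Because the search stops at the first hit, a file later in the list is ignored whenever an earlier one exists.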
Every binary distribution contains a 'release.tar' file that contains four subdirectories: 'bin', 'etc', 'lib' and 'sbin'. Wherever you choose to install these four directories we call the 'release directory' (specified by the 'RELEASE_DIR' parameter in the config file). Each release directory contains platform-dependent binaries and libraries, so you will need to install a separate one for each kind of machine in your pool.
All of the files in the 'bin' directory are programs that Condor end users should expect to have in their path. You could either put them in a well-known location (such as /usr/local/condor/bin) which you have Condor users add to their PATH environment variable, or copy those files directly into a well-known place already in users' PATHs (such as /usr/local/bin). With the above examples, you could also leave the binaries in /usr/local/condor/bin and put in soft links from /usr/local/bin that point to each program.
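The soft-link approach can be sketched as a short loop. Temporary directories and two fake programs (tool_a, tool_b) stand in for the real release directory and its contents, so the sketch is safe to run as-is; in a real pool the two directories might be /usr/local/condor and /usr/local/bin.

```shell
# Stand-ins for the release directory and a directory already in users' PATH.
RELEASE_DIR=$(mktemp -d)
LINK_DIR=$(mktemp -d)

# Fake two user programs so the loop has something to link.
mkdir -p "$RELEASE_DIR/bin"
for name in tool_a tool_b; do
    printf '#!/bin/sh\necho %s\n' "$name" > "$RELEASE_DIR/bin/$name"
    chmod 755 "$RELEASE_DIR/bin/$name"
done

# Create one soft link per program, replacing stale links from an older install.
for prog in "$RELEASE_DIR"/bin/*; do
    ln -sf "$prog" "$LINK_DIR/$(basename "$prog")"
done

ls -l "$LINK_DIR"
```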
All of the files in the 'sbin' directory are Condor daemons and agents, or programs that only the Condor administrator would need to run. Therefore, we recommend that you only add these programs to the PATH of the Condor administrator.
The files in the 'lib' directory are the Condor libraries that must be linked in with user jobs for all of Condor's checkpointing and migration features to be used. 'lib' also contains scripts used by the condor_compile program to help relink jobs with the Condor libraries. These files should be placed in a location that is world-readable, but they do not need to be placed in anyone's PATH. The condor_compile script checks the config file for the location of the lib directory.
'etc' contains an 'examples' subdirectory which holds various example config files and other files used for installing Condor. 'etc' is the recommended location to keep the master copy of your config files. You can put in soft links from one of the places mentioned above that Condor checks automatically to find its global config file.
The documentation provided with Condor is currently available only in HTML, PostScript and PDF (Adobe Acrobat). It can be locally installed wherever is customary at your site. You can also find the Condor documentation on the web at: http://www.cs.wisc.edu/condor/manual.
If you are using AFS at your site, be sure to read
section 3.9.1 in the
manual. Condor does not currently have a way to authenticate itself
to AFS. We're working on a solution, but it is not ready for version
6.0. This means that you are probably not going to want
to have the LOCAL_DIR for Condor on AFS. However, you can
(and probably should) have the Condor RELEASE_DIR on AFS, so
that you can share one copy of those files and upgrade them in a
centralized location. You will also have to do something special if
you submit jobs to Condor from a directory on AFS. Again, read manual
section 3.9.1 for all the gory details.
The Condor release directory takes up a fair amount of space. This is another reason why it's a good idea to have it on a shared filesystem. The rough size requirements for the release directory on various platforms are listed in table 3.1.
In addition, you will need a lot of disk space in the local directory of any machines that are submitting jobs to Condor. See question 5 above for details on this.
IF YOU HAVE DECIDED TO CREATE A 'condor' USER AND GROUP, YOU SHOULD DO THAT ON ALL YOUR MACHINES BEFORE YOU DO ANYTHING ELSE.
The easiest way to install Condor is to use one or both of the scripts provided to help you: condor_install and condor_init. You should run these scripts as the user that you are going to run the Condor daemons as. First, run condor_install on the machine that will be a fileserver for shared files used by Condor, such as the release directory, and possibly the condor user's home directory. When you do, choose the ``full-install'' option in step #1 described below.
Once you have run condor_install on a file server to set up your release directory and configure Condor for your site, you should run condor_init on any other machines in your pool to create any locally used files that aren't created by condor_install. In the simplest case, where nearly all of Condor is installed on a shared file system, condor_install will create nearly all the files and directories you need, but you will still need to use condor_init to create the LOCK directory on the local disk of each machine. If you have a shared release directory, but the LOCAL_DIR is local on each machine, condor_init will create all the directories and files needed in LOCAL_DIR. In addition, condor_init will create any soft links on each machine that are needed so that Condor can find its global config file.
If you don't have a shared filesystem, you will need to run condor_install on each machine in your pool to set up Condor. In this case, there is no need to run condor_init at all.
In addition, you will want to run condor_install on your central
manager machine if that machine is different from your file server,
using the ``central-manager'' option in step #1 described below. Run
condor_install on your file server first, then on your central
manager. If this step fails for some reason (NFS permissions, etc.),
you can do it manually quite easily. All this does is copy the
condor_config.local.central.manager file from
<release_dir>/etc/examples to the proper location for the local config
file of your central manager machine. If your central manager is an
Alpha or an SGI, you might want to add ``KBDD'' to the
DAEMON_LIST parameter. See
section 3.4, ``Configuring Condor'', in the manual for details.
condor_install assumes you have perl installed in /usr/bin/perl. If this is not the case, you can either edit the script to put the right path in, or you will have to invoke perl directly from your shell (assuming perl is in your PATH):
% perl condor_install
condor_install breaks down the installation procedure into various steps. Each step is clearly numbered. The following section explains what each step is for, and suggests how to answer the questions condor_install will ask you for each one.
There are three types of Condor installation you might choose: 'submit-only', 'full-install', and 'central-manager'. A submit-only machine can submit jobs to a Condor pool, but Condor jobs will not run on it. A full-install machine can both submit and run Condor jobs.
If you are planning to run Condor jobs on your machines, you should either install and run Condor as root, or as user 'condor'.
If you are planning to set up a submit-only machine, you can either install Condor machine-wide as root or user 'condor', or you can install Condor as yourself into your home directory.
The other possible installation type is setting up a machine as a central manager. If you do a full-install and you say that you want the local host to be your central manager, this step will be done automatically. You should only choose the central-manager option at step 1 if you have already run condor_install on your file server and you now want to run condor_install on a different machine that will be your central manager.
If you are installing Condor for multiple machines, and you have a shared file system, condor_install will prompt you for the hostnames of each machine you want to add to your Condor pool. If you don't have a shared file system, you will have to run condor_install locally on each machine, anyway, so it doesn't bother asking you for the names. If you provide a list, it will use the names to automatically create directories and files later. At the end, condor_install will dump out this list to a 'roster' file which can be used by scripts to help maintain your Condor pool.
If you are only installing Condor on one machine, you would just answer 'no' to the first question, and move on.
If you have multiple machines with a shared filesystem that will be running Condor, you should put the release directory on that shared filesystem so you only have one copy of all the binaries, and so that when you update them, you can do so in one place. Note that the release directory is architecture dependent, so you will need to download separate binary distributions for every platform in your pool.
condor_install tries to find an already installed release directory. If it can't find one, it asks if you have installed one already. If you have not installed one, it tries to do so for you by untarring the release.tar file from the binary distribution.
NOTE: If you are only setting up a central manager (you chose 'central-manager' in step 1), step 3 is the last question you will need to answer.
Various parts of Condor will send email to a condor administrator if something goes wrong that needs human attention. You will need to specify the email address of this administrator.
You will also need to specify the full path to a mail program that Condor will use to send the email. This program needs to understand the '-s' option, which is how you specify a subject for the outgoing message. The default on most platforms will probably be correct. On Linux machines, since there is such variation in Linux distributions and installations, you should verify that the default works. If the script complains that it cannot find the mail program that was specified, you can try 'which mail' from your shell prompt to see what 'mail' program is currently in your PATH. If there is none, try 'which mailx'. If you still can't find anything, ask your system administrator. You should verify that the program you end up using supports '-s'. The man page for that program will probably tell you.
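The 'which mail' / 'which mailx' check can be scripted along these lines. The function name is hypothetical, and the sketch only locates a candidate program; it does not verify '-s' support, which still has to be confirmed from the man page.

```shell
# Print the location the shell reports for the first candidate found in PATH.
find_program() {
    for name in "$@"; do
        if path=$(command -v "$name" 2>/dev/null); then
            printf '%s\n' "$path"
            return 0
        fi
    done
    return 1
}

if mailer=$(find_program mail mailx); then
    echo "candidate mail program: $mailer"
    echo "confirm in its man page that it accepts -s for the subject"
else
    echo "no mail or mailx in PATH; ask your system administrator"
fi
```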
It is recommended that you install the user-level Condor programs in the release directory (where they go by default). This way, when you want to install a new version of the Condor binaries, you can just replace your release directory and everything will be updated at once. So, one option is to have Condor users add <release_dir>/bin to their PATH, so that they can access the programs. However, we recommend putting in soft links from some directory already in their PATH (such as /usr/local/bin) that point back to the Condor user programs. condor_install will do this for you; all you have to do is tell it what directory to put these links into. This way, users don't have to change their PATH to use Condor but you can still have the binaries installed in their own location.
If you are installing Condor as neither root nor condor, condor_install automatically creates a perl script wrapper for all the Condor tools, which sets some appropriate environment variables and automatically passes certain options to the tools. You need to tell condor_install where to put this perl script. The script is linked to itself under many different names, since it is the name that determines the behavior of the script. This script should go somewhere that is in your PATH already, if possible (such as ~/bin).
At this point, the remaining steps are different depending on what kind of installation you are doing. Skip to the appropriate section depending on what kind of installation you selected in STEP 1 above.
Simply type in the full hostname of the machine you have chosen for your central manager. If condor_install can't find information about the host you typed by querying your nameserver, it will print out an error message and ask you to confirm.
This is the directory discussed in question #5 from the introduction. condor_install tries to make some educated guesses as to what directory you want to use for the purpose. Simply agree to the correct guess, or (when condor_install has run out of guesses) type in what you want. Since this directory needs to be unique, it is common to use the hostname of each machine in its name. When typing in your own path, you can use '$(HOSTNAME)' which condor_install (and the Condor config files) will expand to the hostname of the machine you are currently on. condor_install will try to create the corresponding directories for all the machines you told it about in STEP 2 above.
Once you have selected the local directory, condor_install creates all the needed subdirectories of each one with the proper permissions. They should have the following permissions and ownerships:
drwxr-xr-x 2 condor root 1024 Mar 6 01:30 execute/
drwxr-xr-x 2 condor root 1024 Mar 6 01:30 log/
drwxr-xr-x 2 condor root 1024 Mar 6 01:30 spool/
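If you ever need to recreate these directories by hand, something like the following sketch would produce the same layout and mode. The base directory and the host names (alpha, beta) are placeholders; a real pool would use its own local directory path and the host names given in STEP 2.

```shell
# Placeholder base directory and host names for illustration only.
BASE=$(mktemp -d)
HOSTS="alpha beta"

# One local directory per host, each with the three subdirectories
# set to mode 755 (drwxr-xr-x), matching the listing above.
for host in $HOSTS; do
    for sub in execute log spool; do
        mkdir -p "$BASE/hosts/$host/$sub"
        chmod 755 "$BASE/hosts/$host/$sub"
    done
done

# Ownership is left alone here; changing it to the condor user needs root.
ls -ld "$BASE"/hosts/*/*
```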
If your local directory is on a shared file system, condor_install will prompt you for the location of your lock files, as discussed in question #5 above. In this case, when condor_install is finished, you will have to run condor_init on each machine in your pool to create the lock directory before you can start up Condor.
As discussed in question #6 above, there are a few different levels of Condor config files. There's the global config file that will be installed in <release_dir>/etc/condor_config, and there are machine-specific, or local config files that override the settings in the global file. If you are installing on multiple machines or are configuring your central manager machine, you must select a location for your local config files.
The two main options are to have a single directory that holds all the local config files, each one named '$(HOSTNAME).local', or to have the local config files go into the individual local directories for each machine. Given a shared filesystem, we recommend the first option, since it makes it easier to configure your pool from a centralized location.
Since there are a few known places Condor looks to find your config file, we recommend that you put a soft link from one of them to point to <release_dir>/etc/condor_config. This way, you can keep your Condor configuration in a centralized location, but all the Condor daemons and tools will be able to find their config files. Alternatively, you can set the CONDOR_CONFIG environment variable to contain <release_dir>/etc/condor_config.
condor_install will ask you if you want to create a soft link from either of the two fixed locations that Condor searches.
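Done by hand, the two alternatives look roughly like this. Both directories are temporary placeholders so the sketch can be run harmlessly; FIXED_DIR stands in for one of the fixed locations Condor searches, which is not named here.

```shell
# Stand-ins: RELEASE_DIR for the release directory, FIXED_DIR for one of
# the fixed locations Condor searches. Both are temporary placeholders.
RELEASE_DIR=$(mktemp -d)
FIXED_DIR=$(mktemp -d)
mkdir -p "$RELEASE_DIR/etc"
: > "$RELEASE_DIR/etc/condor_config"

# Option 1: soft link from the fixed location to the master copy.
ln -s "$RELEASE_DIR/etc/condor_config" "$FIXED_DIR/condor_config"

# Option 2: point the environment variable at the master copy.
CONDOR_CONFIG="$RELEASE_DIR/etc/condor_config"
export CONDOR_CONFIG
```

The soft link applies to every user on the machine, whereas the environment variable only affects processes that inherit it.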
Once you have completed STEP 9, you're done. condor_install prints out a message describing what to do next. Please skip to section 3.2.3.
Now that Condor has been installed on your machine(s), there are a few things you should check before you start up Condor.
To start up the Condor daemons, all you need to do is execute <release_dir>/sbin/condor_master. This is the Condor master, whose only job in life is to make sure the other Condor daemons are running. The master keeps track of the daemons, restarts them if they crash, and periodically checks to see if you have installed new binaries (and if so, restarts the affected daemons).
If you're setting up your own pool, you should start Condor on your central manager machine first. If you have done a submit-only installation and are adding machines to an existing pool, the order in which you start them does not matter.
To ensure that Condor is running, you can run either:
ps -ef | egrep condor_
or
ps -aux | egrep condor_
depending on your flavor of Unix. On your central manager machine you should have processes for:
Once you're sure the Condor daemons are running, check to make sure that they are communicating with each other. You can run condor_status to get a one line summary of the status of each machine in your pool.
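Both checks can be combined into a small script, sketched below under a few assumptions: the function name is hypothetical, 'ps aux' is used as the BSD-style fallback, and condor_status is only invoked if it is found in PATH.

```shell
# List running processes whose command line contains "condor_".
# Writing the pattern as [c]ondor_ keeps the grep process itself
# out of the results.
list_pool_daemons() {
    { ps -ef 2>/dev/null || ps aux; } | grep '[c]ondor_'
}

if list_pool_daemons; then
    echo "Condor processes are running"
else
    echo "no Condor processes found"
fi

# Only query the pool when the tool is actually installed and in PATH.
if command -v condor_status >/dev/null 2>&1; then
    condor_status
else
    echo "condor_status is not in PATH"
fi
```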
Once you're sure Condor is working properly, you should add ``condor_master'' to your startup/bootup scripts (e.g., /etc/rc) so that your machine runs condor_master upon bootup. condor_master will then fire up the necessary Condor daemons whenever your machine is rebooted.
If your system uses System-V style init scripts, you can look in <release_dir>/etc/examples/condor.boot for a script that can be used to start and stop Condor automatically by init. Normally, you would install this script as /etc/init.d/condor and put in soft links from various directories (for example, /etc/rc2.d) that point back to /etc/init.d/condor. The exact location of these scripts and links will vary on different platforms.
If your system uses BSD style boot scripts, you probably have an /etc/rc.local file. Just add a line in there to start up <release_dir>/sbin/condor_master and you're done.
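A cautious way to add that line is to append it only when it is not already there, as in this sketch. RC_FILE is a temporary stand-in for /etc/rc.local and the release path is a placeholder, so the example does not touch any real boot script.

```shell
# RC_FILE stands in for /etc/rc.local; the release path is a placeholder.
RC_FILE=$(mktemp)
START_LINE="/usr/local/condor/sbin/condor_master"

# Append the start-up line only if an identical line is not already
# present, so repeated runs do not add it twice.
add_start_line() {
    grep -qxF "$START_LINE" "$RC_FILE" || printf '%s\n' "$START_LINE" >> "$RC_FILE"
}

add_start_line
add_start_line
cat "$RC_FILE"
```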
Now that the Condor daemons are running, there are a few things you can and should do: